<a href="https://colab.research.google.com/github/gened1080/audio-fingerprinting/blob/master/Compare_with_Noise_Fall_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Comparing a signal with itself while adding noise

This notebook compares an audio signal with a noisy copy of itself, using the `AudioFP` class for fingerprinting. The idea is to get a sense of how noise affects audio fingerprints and the comparison of fingerprints.

In [2]:
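%%bash
# Only run this setup on Google Colab: exit early if the Colab package is not installed
!(stat -t /usr/local/lib/*/dist-packages/google/colab > /dev/null 2>&1) && exit
# Get a fresh copy of the repository that contains AudioFP.py
rm -rf audio-fingerprinting
git clone https://github.com/gened1080/audio-fingerprinting.git
pip install pydub datasketch
# System libraries needed to build PyAudio, plus ffmpeg for decoding audio files
sudo apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
pip install pyaudio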

Collecting pydub
  Downloading https://files.pythonhosted.org/packages/7b/d1/fbfa79371a8cd9bb15c2e3c480d7e6e340ed5cc55005174e16f48418333a/pydub-0.24.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.24.1
Collecting datasketch
  Downloading https://files.pythonhosted.org/packages/e9/7a/975274a59ab7e0117a8304ae97220b5800d31f69813c97753a79239dabac/datasketch-1.5.1-py2.py3-none-any.whl (73kB)
Installing collected packages: datasketch
Successfully installed datasketch-1.5.1
Reading package lists...
Building dependency tree...
Reading state information...
libasound2-dev is already the newest version (1.1.3-5ubuntu0.5).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  portaudio19-doc
The following NEW packages will be installed:
  libportaudio2 libportaudiocpp0 portaudio19-dev
0 upgr

Cloning into 'audio-fingerprinting'...


In [8]:
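# Import relevant packages
import sys
import warnings

from bokeh.io import output_notebook

warnings.filterwarnings('ignore')  # suppress library warnings, including import-time ones

# Make the cloned repository importable
sys.path.append('/content/audio-fingerprinting')
import AudioFP as afp

output_notebook()  # render Bokeh plots inline in the notebook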

### Fingerprint a song

We start by fingerprinting a song.

In [5]:
# Create AudioFP object for the first song.
# When prompted to read from an audio file or a saved fingerprint, enter "f":
# the raw signal is needed to add noise, and the same file is read again below.
song1 = afp.AudioFP(process='a')

Enter "f" to read from audio file or "s" to open saved fingerprint: f
Enter the filename you want to read (excluding the extension): SoundHelix-Song-1
Do you want to show all plots? Enter "y" or "n": n
Do you want to save the fingerprint to file for later use? Enter "y" or "n": y
Saving the fingerprint for: SoundHelix-Song-1


### Add noise to the signal and fingerprint the noisy track

Next, we create another `AudioFP` object. This time we proceed manually by setting the `process` argument to `'m'`, which creates an empty object. We set its `songname` based on the original song and read the same audio file again to obtain the raw signal and its `framerate`. Then we use the function `add_noise`, defined in `AudioFP.py`, to generate Gaussian white noise of a specified decibel level (entered when prompted) and add it to the signal. See [this page](https://chchearing.org/noise/common-environmental-noise-levels/) for common noise levels in decibels. The function `add_noise` takes the audio signal and its framerate as inputs, in that order, and returns the signal with the noise added. Finally, we generate a fingerprint of the noisy signal step by step: computing the spectrogram, finding its peaks, and building the fingerprint from those peaks.
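The implementation of `add_noise` lives in `AudioFP.py` and is not shown in this notebook. As a hedged illustration of what adding white noise "at a level in dB" can mean, here is a minimal NumPy sketch of two common conventions: an absolute level relative to a reference amplitude, and a level expressed as a signal-to-noise ratio (SNR). Neither is claimed to match `add_noise`; the function names `add_white_noise` and `add_noise_at_snr` and the `ref_amplitude` parameter (the amplitude taken as 0 dB) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: two common ways to add Gaussian white noise "at a level in dB".
# Neither is claimed to match AudioFP's add_noise.


def add_white_noise(signal, level_db, ref_amplitude=1.0, seed=None):
    """Add zero-mean Gaussian noise with RMS amplitude ref_amplitude * 10**(level_db / 20).

    ref_amplitude (the amplitude that corresponds to 0 dB) is an assumption; how
    dB maps to sample values depends on the recording's scaling.
    """
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    noise_rms = ref_amplitude * 10 ** (level_db / 20)  # amplitude dB -> linear factor
    return signal + rng.normal(0.0, noise_rms, size=signal.shape)


def add_noise_at_snr(signal, snr_db, seed=None):
    """Add Gaussian white noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)  # power dB -> linear factor
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)


# Usage: a 1 s, 440 Hz tone sampled at 48 kHz, with noise 20 dB below the signal power
framerate = 48000
tone = np.sin(2 * np.pi * 440 * np.arange(framerate) / framerate)
noisy = add_noise_at_snr(tone, 20.0, seed=0)
measured = 10 * np.log10(np.mean(tone ** 2) / np.mean((noisy - tone) ** 2))
print(f"measured SNR: {measured:.2f} dB")  # close to 20 dB
```

The SNR convention is independent of how the audio samples are scaled, which makes results easier to compare across songs; the absolute convention only makes sense once a reference amplitude is fixed.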

In [None]:
# Create a second AudioFP object in manual mode and fingerprint a noisy copy step by step
song2 = afp.AudioFP(process='m')
plot = False  # set to True to display intermediate plots
song2.songname = 'noisy_' + song1.songname
filename = song1.songname  # read the same audio file as the first song
channels, song2.framerate = afp.AudioFP.read_audiofile(song2, plot, filename)
# Add Gaussian white noise (the noise level in dB is requested interactively)
channels = afp.add_noise(channels, song2.framerate)
# Fingerprint the noisy signal: spectrogram -> spectral peaks -> fingerprint
f, t, sgram = afp.AudioFP.generate_spectrogram(song2, plot, channels, song2.framerate)
fp, tp, peaks = afp.AudioFP.find_peaks(song2, plot, f, t, sgram)
afp.AudioFP.generate_fingerprint(song2, plot, fp, tp, peaks)

Enter the noise level you want to add in dB: 2
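The cell above fingerprints the noisy signal in three steps: spectrogram, peak finding, and fingerprint generation. To make those steps concrete, here is a generic NumPy-only sketch of such a pipeline using landmark (peak-pair) hashing, a widely used fingerprinting technique. It is not the `AudioFP` implementation: the function names (`stft_db`, `local_max_peaks`, `landmark_hashes`), their parameters, and the synthetic test signal are all hypothetical. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical sketch of a landmark-hashing pipeline; not AudioFP's implementation.


def stft_db(signal, framerate, n_fft=512, hop=256):
    """Magnitude spectrogram (dB) of a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # rows: frequency bins, columns: frames
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / framerate)  # bin centre frequencies in Hz
    times = (np.arange(n_frames) * hop + n_fft / 2) / framerate  # frame centres in seconds
    return freqs, times, 20 * np.log10(mag + 1e-10)


def local_max_peaks(sgram, size=9, threshold_db=-20.0):
    """(freq_bin, frame) points that are the maximum of their size x size neighbourhood
    and lie within threshold_db of the loudest point in the spectrogram."""
    half = size // 2
    padded = np.pad(sgram, half, mode="constant", constant_values=-np.inf)
    neighbourhood_max = sliding_window_view(padded, (size, size)).max(axis=(-2, -1))
    mask = (sgram == neighbourhood_max) & (sgram >= sgram.max() + threshold_db)
    return [(int(f), int(t)) for f, t in np.argwhere(mask)]


def landmark_hashes(peaks, fan_out=5, max_dt=40):
    """Pair each peak with up to fan_out later peaks; keep (f1, f2, dt) triples."""
    peaks = sorted(peaks, key=lambda p: (p[1], p[0]))  # order by frame, then bin
    hashes = set()
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.add((f1, f2, dt))
    return hashes


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical test signal: 8 "notes" of 0.25 s, each the sum of 3 random partials
framerate = 8000
rng = np.random.default_rng(0)
tt = np.arange(int(0.25 * framerate)) / framerate
notes = rng.uniform(200, 2000, size=(8, 3))
clean = np.concatenate([np.sin(2 * np.pi * np.outer(freqs, tt)).sum(axis=0) for freqs in notes])
noisy = clean + rng.normal(0.0, 0.3, size=clean.shape)

fps = [landmark_hashes(local_max_peaks(stft_db(x, framerate)[2])) for x in (clean, noisy)]
print(len(fps[0]), len(fps[1]), jaccard(fps[0], fps[1]))
```

Because each hash stores only frequency bins and a time difference, it does not depend on where in the track a landmark occurs; noise mainly changes which spectral points win the local-maximum test, which is what lowers the similarity.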


### Comparing fingerprints

To compare two fingerprints, we calculate what is known as the Jaccard similarity. Mathematically, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. Thus, two identical sets have a Jaccard similarity of 1, while sets with nothing in common have a similarity of 0. A value between 0 and 1 indicates partial similarity; however, there is no rule specifying how similar two songs with a Jaccard similarity of, say, 0.7 are. All we can say at this point is that the closer the Jaccard similarity of two songs is to 1, the more similar they are. One could use [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) to assess how significant a given similarity score is. The function `compare_fingerprints`, defined in `AudioFP.py`, reports the result using ranges chosen by intuition from a small set of songs. If you want to see how the ranges are defined, take a look at `AudioFP.py`.
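As a small illustration of the definition, the sketch below computes the exact Jaccard similarity of two Python sets standing in for fingerprints. Since the setup cell installs `datasketch`, a MinHash library, it also shows a toy MinHash estimator, which approximates the Jaccard similarity from fixed-size signatures instead of the full sets. This notebook does not show whether `compare_fingerprints` computes the exact value or a MinHash estimate; the score reported below, 0.98046875, equals 251/256, which would be consistent with a MinHash estimate over 256 hash functions. The function names, parameters, and example sets are hypothetical, and this is not `datasketch`'s implementation.

```python
import hashlib
import random

# Hypothetical sketch: exact Jaccard similarity and a toy MinHash estimate of it.

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the universal hash family


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_perm=256, seed=1):
    """Toy MinHash for a non-empty set: for each of num_perm random hash functions
    h(x) = (a*x + b) mod p, keep the minimum over the set. Use the same seed for both sets."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
              for _ in range(num_perm)]
    base = [int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], "little") for x in items]
    return [min((a * h + b) % MERSENNE_PRIME for h in base) for a, b in params]


def minhash_jaccard(sig_a, sig_b):
    """Fraction of hash functions whose minima agree: an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical fingerprints: the noisy copy loses 20 hashes and gains 20 spurious ones
clean = set(range(1000))
noisy = set(range(20, 1000)) | set(range(5000, 5020))

exact = jaccard(clean, noisy)  # 980 shared / 1020 in total
est = minhash_jaccard(minhash_signature(clean), minhash_signature(noisy))
print(round(exact, 3))  # 0.961
print(est)              # close to the exact value; a multiple of 1/256
```

The exact value needs both complete sets, while the MinHash estimate needs only two signatures of 256 integers each, at the cost of a standard error of roughly sqrt(J(1 - J)/256).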

In [10]:
afp.compare_fingerprints(song1, song2)

SoundHelix-Song-1 and SoundHelix-Song-1 are identical!
Jaccard similarity =  0.98046875
