# <center> MUSIC INFORMATION RETRIEVAL</center>
## <center> Audio fingerprinting - with audfprint</center>    

**Note**: *this notebook is based on the one prepared by **Marius Miron** for the MIR course.*

### About this notebook

This notebook uses `audfprint` by Dan Ellis, which takes a list of sound files and creates a database of landmarks. It can then take one or more query audio files and match them against that database. This can be used, for example, to "de-duplicate" a collection of music. The fingerprint is robust to time skews, different encoding schemes, and even added noise.

We also use `mirdata` to manage a dataset of audio recordings and `audiomentations` for audio transformations.

### How to run the notebook
You can download the notebook and run it locally on your computer.

You can also run it in Google Colab by using the following link.

<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/mrocamora/mir_course/blob/main/notebooks/MIR_course-audio_fingerprinting_with_audfprint.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

### Installation of packages

We install `mirdata` to manage the datasets and `audiomentations` for audio transformations such as adding noise, time stretching, and pitch shifting.

In [None]:
!pip install mirdata
!pip install audiomentations

We clone the `audfprint` repository, which provides an audio fingerprinting API that can build a database from a directory of audio files and query that database.

In [None]:
!git clone https://github.com/dpwe/audfprint.git
%cd audfprint/
!pip install -r requirements.txt

### Data preparation

We import the libraries, then initialize and download the Orchset dataset with `mirdata`.

In [None]:
import os
import mirdata
import soundfile as sf
import audiomentations
dataset = mirdata.initialize("orchset")

In [None]:
dataset.download()

In [None]:
import IPython.display as ipd
tracks = dataset.load_tracks()  
track = tracks['Beethoven-S5-I-ex1']
x, sr = track.audio_mono
ipd.Audio(x,rate=sr)

### Fingerprinting

This is the path where the mono audio files are located. 

In [None]:
os.path.join(dataset.data_home,'audio','mono')

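Using the path above, we can build a new database by ingesting all the WAV files it contains. The command hard-codes the path as it appears when running on Colab, so adjust it if your `data_home` is different.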

In [None]:
!python audfprint.py new --dbase fpdbase.pklz /root/mir_datasets/orchset/audio/mono/*.wav
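
Because the path in the command above is hard-coded, it only works when the dataset lives under `/root/mir_datasets`. As a minimal sketch, the file list could instead be built in Python from whatever directory holds the mono files. The helper name `build_new_db_command` is hypothetical and not part of audfprint; only the `new` subcommand and the `--dbase` option are taken from the command above. The demo uses a temporary directory with empty placeholder files so that it runs anywhere.

```python
import glob
import os
import tempfile

def build_new_db_command(audio_dir, dbase="fpdbase.pklz"):
    """Return the argument list for `audfprint.py new` over all WAV files in audio_dir.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    wav_files = sorted(glob.glob(os.path.join(audio_dir, "*.wav")))
    if not wav_files:
        raise FileNotFoundError(f"no .wav files found in {audio_dir}")
    return ["python", "audfprint.py", "new", "--dbase", dbase] + wav_files

# Demo with empty placeholder files standing in for the dataset audio
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.wav", "a.wav", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    cmd = build_new_db_command(tmp)
    print(cmd[:5], [os.path.basename(p) for p in cmd[5:]])
```

In the notebook, the directory would be `os.path.join(dataset.data_home, 'audio', 'mono')`, and the resulting list could be passed to `subprocess.run(cmd, check=True)` from inside the `audfprint` folder.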

We test the system with the stereo audio of one track in the database. Each channel is written to its own file, skipping the first 5 seconds. The system should be robust to receiving a single channel as input.

In [None]:
!mkdir -p /root/temp
x, sr = track.audio_stereo  # x has shape (2, num_samples)
# Write each channel to its own file, skipping the first 5 seconds
out_filename = os.path.join('/root/temp', track.track_id + '-L.wav')
print(out_filename)
sf.write(out_filename, x[0, int(5*sr):], sr)
out_filename = os.path.join('/root/temp', track.track_id + '-R.wav')
print(out_filename)
sf.write(out_filename, x[1, int(5*sr):], sr)

In [None]:
!python audfprint.py match --dbase fpdbase.pklz /root/temp/Beethoven-S5-I-ex1-L.wav 
%matplotlib
!python audfprint.py match --dbase fpdbase.pklz /root/temp/Beethoven-S5-I-ex1-R.wav -I

The two queries contain audio from Beethoven-S5-I-ex1.wav starting 5 seconds into the track, one for the left channel and one for the right. A total of 61 and 43 landmark hashes (left and right query, respectively) were shared between the query and that track. Generally, anything more than 5 or 6 consistently-timed matching hashes indicates a true match, and random chance will result in fewer than 1% of the raw common hashes being temporally consistent.
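
To make "consistently-timed" concrete, here is a toy sketch of the general idea: every hash shared between the query and a track votes for the time offset between its two occurrences, and a true match shows up as many votes for the same offset. This is not audfprint's actual matcher. The function name, the dictionary-based database, and the toy hash values are all assumptions made for illustration.

```python
from collections import Counter

def best_aligned_match(query_hashes, database):
    """Find the track whose common hashes agree most on a single time offset.

    query_hashes: list of (hash_value, query_time) pairs.
    database: dict mapping hash_value -> list of (track_id, track_time).
    Returns (track_id, offset, aligned_count, raw_count), or None if no hash is shared.
    """
    offset_votes = Counter()  # (track_id, offset) -> number of agreeing hashes
    raw_counts = Counter()    # track_id -> all hashes shared with the query
    for hash_value, query_time in query_hashes:
        for track_id, track_time in database.get(hash_value, []):
            offset_votes[(track_id, track_time - query_time)] += 1
            raw_counts[track_id] += 1
    if not offset_votes:
        return None
    (track_id, offset), aligned = offset_votes.most_common(1)[0]
    return track_id, offset, aligned, raw_counts[track_id]

# Toy data: times are frame indices, hash values are arbitrary integers
database = {
    101: [("track_a", 50)],
    102: [("track_a", 60), ("track_b", 7)],
    103: [("track_a", 75)],
    104: [("track_b", 90)],
    105: [("track_a", 5)],
}
query = [(101, 0), (102, 10), (103, 25), (104, 30), (105, 40)]
result = best_aligned_match(query, database)
print(result)
```

In this toy example `track_a` shares four hashes with the query, but only three of them agree on the same offset of 50 frames. The fourth is the kind of chance collision that the temporal-consistency check filters out.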

The system should also be robust to various transformations such as added noise, time stretching, and pitch shifting. We use `audiomentations` to compose a transformation by chaining different audio effects. The transformation is applied to a given audio track and the result is written to disk. We then use audfprint to identify this file in the database.

In [None]:
augment0 = audiomentations.Compose([
    audiomentations.AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    audiomentations.TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    audiomentations.PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    #audiomentations.Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])
x, sr = track.audio_mono
# Augment/transform/perturb the audio data
augmented_samples = augment0(samples=x, sample_rate=sr)
out_filename = os.path.join('/root/temp',track.track_id+'-aug0.wav')
print(out_filename)
sf.write(out_filename, augmented_samples, sr)
!python audfprint.py match --dbase fpdbase.pklz /root/temp/Beethoven-S5-I-ex1-aug0.wav
ipd.Audio(augmented_samples,rate=sr)

We can stress-test the system by gradually increasing the amount of noise until the track is no longer identified.

What is the impact of these transformations on methods that use landmarks/peaks, such as audfprint?

In [None]:
augment1 = audiomentations.Compose([
    audiomentations.AddGaussianNoise(min_amplitude=0.009, max_amplitude=0.09, p=0.99),
    #audiomentations.AddShortNoises()
])
x, sr = track.audio_mono
# Augment/transform/perturb the audio data
augmented_samples = augment1(samples=x[int(5*sr):], sample_rate=sr)
out_filename = os.path.join('/root/temp',track.track_id+'-aug1.wav')
print(out_filename)
sf.write(out_filename, augmented_samples, sr)
!python audfprint.py match --dbase fpdbase.pklz /root/temp/Beethoven-S5-I-ex1-aug1.wav
ipd.Audio(augmented_samples,rate=sr)
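
The cell above draws the noise amplitude at random from a range, so the exact noise level differs from run to run. For a more controlled stress test, one option is to add white Gaussian noise at a fixed signal-to-noise ratio (SNR) and lower the SNR step by step. The sketch below is an assumption-laden illustration: `add_noise_at_snr` is a hypothetical helper, not part of `audiomentations`, and a synthetic tone stands in for the track audio so the example is self-contained.

```python
import numpy as np

def add_noise_at_snr(x, snr_db, rng):
    """Return x plus white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

rng = np.random.default_rng(0)
sr = 11025
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s test tone standing in for a track

# Sweep from mild to severe noise and check the SNR actually obtained
for snr_db in (20, 10, 0, -10):
    noisy = add_noise_at_snr(x, snr_db, rng)
    measured = 10 * np.log10(np.mean(x ** 2) / np.mean((noisy - x) ** 2))
    print(f"target {snr_db:>4} dB, measured {measured:6.2f} dB")
```

In the notebook, each noisy version would be written with `sf.write` and passed to `audfprint.py match`, recording the lowest SNR at which the track is still identified.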


Transforming the original audio causes some spectral peaks to be lost and changes the distances between the peaks in each pair or triplet, so the affected landmarks generate different hashes.
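
The sketch below illustrates why a change in the distance between two peaks yields a different hash. It packs the first peak's frequency bin, the frequency difference, and the time difference into a single integer. The bit widths and packing order are hypothetical, chosen only for illustration, and are not audfprint's actual hash layout. The peak positions are made-up values.

```python
def landmark_hash(f1, f2, dt, f_bits=8, df_bits=6, dt_bits=6):
    """Pack a pair of peaks into one integer (hypothetical bit layout).

    f1, f2: frequency bins of the two peaks; dt: gap between them in frames.
    """
    df = (f2 - f1) % (1 << df_bits)  # wrap the bin difference into df_bits
    h = f1 % (1 << f_bits)
    h = (h << df_bits) | df
    h = (h << dt_bits) | (dt % (1 << dt_bits))
    return h

# Same pair of peaks before and after two transformations (made-up values)
original = landmark_hash(f1=40, f2=52, dt=20)
stretched = landmark_hash(f1=40, f2=52, dt=25)  # rate 0.8: 20 / 0.8 = 25 frames
shifted = landmark_hash(f1=45, f2=58, dt=20)    # pitch shift moves both bins

print(original == stretched, original == shifted)
```

Time stretching changes the time gap and pitch shifting changes the frequency bins, so in both cases the query hash no longer matches the one stored in the database. Added noise acts differently: it mainly removes peaks or introduces spurious ones, so fewer of the original pairs are formed at all.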

In [None]:
augment2 = audiomentations.Compose([
    audiomentations.AddGaussianNoise(min_amplitude=0.01, max_amplitude=0.1, p=0.99),
    #audiomentations.AddShortNoises()
])
x, sr = track.audio_mono
# Augment/transform/perturb the audio data
augmented_samples = augment2(samples=x[int(5*sr):], sample_rate=sr)
out_filename = os.path.join('/root/temp',track.track_id+'-aug2.wav')
print(out_filename)
sf.write(out_filename, augmented_samples, sr)
!python audfprint.py match --dbase fpdbase.pklz /root/temp/Beethoven-S5-I-ex1-aug2.wav
ipd.Audio(augmented_samples,rate=sr)

Let's take a look under the hood to see how audfprint works. 

In [None]:
import audfprint_analyze
import audfprint_match
import hash_table

matcher = audfprint_match.Matcher()

pat = '/root/mir_datasets/orchset/audio/mono/*wav'
qry = '/root/temp/Beethoven-S5-I-ex1-aug2.wav'

# Load the database built earlier. To build a new one from the glob pattern instead:
# hash_tab = audfprint_analyze.glob2hashtable(pat)
hash_tab = hash_table.HashTable('fpdbase.pklz')

# Analyzer that extracts the landmark hashes from the query
g2h_analyzer = audfprint_analyze.Analyzer(density=40.0)

rslts, dur, nhash = matcher.match_file(g2h_analyzer, hash_tab, qry)
# Duration of one analysis frame in seconds
# (assuming audfprint's default 256-sample hop at 11025 Hz)
t_hop = 0.02322
if len(rslts) > 0:
  # Report the best match: track name, time offset, and matching hash counts
  print("Matched", qry, "(", dur, "s,", nhash, "hashes)",
          "as", hash_tab.names[rslts[0][0]],
          "at", t_hop * float(rslts[0][2]), "with", rslts[0][1],
          "of", rslts[0][3], "hashes")
  %matplotlib inline
  matcher.illustrate_match(g2h_analyzer, hash_tab, qry)
else:
  print("No matches")