# Implementing Shazam from scratch
Shazam is a great application that can tell you the title of a song by listening to a short sample. We will implement a simplified version of this app using hashing algorithms. In particular, we implement a locality-sensitive hashing (LSH) algorithm that takes an audio track as input and finds the most relevant matches.

# 1. The dataset

We used a Kaggle dataset containing songs in MP3 format, which we convert to WAV:
https://www.kaggle.com/dhrumil140396/mp3s32k

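```python
from pathlib import Path

from tqdm import tqdm

# `f` is the project's helper module
data_folder = Path(f.PATH_SONGS_FOLDER)
mp3_tracks = data_folder.glob("*/*/*.mp3")
```

```python
# Convert every MP3 to WAV, then collect the WAV files that now exist
for track in tqdm(mp3_tracks, total=N_TRACKS):
    convert_mp3_to_wav(str(track))

tracks = list(data_folder.glob("*/*/*.wav"))
```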
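The notebook relies on a `convert_mp3_to_wav` helper whose code is not shown. As a minimal sketch, assuming the third-party `pydub` library (which needs `ffmpeg` installed), a conversion that writes each WAV file next to its MP3 could look like the following. `wav_path_for` and `convert_mp3_to_wav_sketch` are hypothetical names, not the project's own functions.

```python
from pathlib import Path


def wav_path_for(mp3_path):
    """Return the path of the WAV file that sits next to the given MP3 file."""
    return Path(mp3_path).with_suffix(".wav")


def convert_mp3_to_wav_sketch(mp3_path):
    """Hypothetical stand-in for convert_mp3_to_wav, built on pydub."""
    # Imported lazily so that wav_path_for works even without pydub installed
    from pydub import AudioSegment

    wav_path = wav_path_for(mp3_path)
    # Decode the MP3 (pydub delegates to ffmpeg) and re-encode it as WAV
    AudioSegment.from_mp3(str(mp3_path)).export(str(wav_path), format="wav")
    return wav_path
```

Keeping the WAV files beside the MP3s means the `*/*/*.wav` glob pattern finds them after the conversion.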

# 2. Fingerprint Hashing 
We want a representation of the audio signal that characterizes it by its peaks. Once we have it, we can apply a hashing function to obtain a fingerprint of each song.

#### First, we extract the peaks of each song
To apply LSH, it is important to round the extracted peaks, which act as our shingles. Rounding yields fewer and less discriminative shingles, so different recordings of the same song are more likely to share shingles and therefore to fall into the same buckets when we apply LSH.
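The internals of `f.extract_peaks` are not shown. The sketch below, a hypothetical `extract_peaks_sketch` written with NumPy only, assumes a peak is simply a local maximum of the amplitude above a threshold (a real fingerprinting pipeline would more likely look for peaks in a spectrogram). It illustrates why rounding helps: three distinct peak values collapse into two shingles.

```python
import numpy as np


def extract_peaks_sketch(signal, threshold=0.1, decimals=1, rounded=True):
    """Hypothetical stand-in for f.extract_peaks.

    Returns the amplitudes of the local maxima of a 1-D signal that are at
    least `threshold`, optionally rounded to `decimals` decimal places.
    """
    x = np.asarray(signal, dtype=float)
    centre = x[1:-1]
    # A sample is a peak if it is strictly larger than both neighbours
    is_peak = (centre > x[:-2]) & (centre > x[2:]) & (centre >= threshold)
    peaks = centre[is_peak]
    # Rounding collapses close values into the same shingle
    return np.round(peaks, decimals) if rounded else peaks


signal = [0.0, 0.52, 0.1, 0.48, 0.0, 0.91, 0.3]
print(sorted(set(extract_peaks_sketch(signal, rounded=False).tolist())))  # [0.48, 0.52, 0.91]
print(sorted(set(extract_peaks_sketch(signal).tolist())))                 # [0.5, 0.9]
```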

```python
song_peaks = f.extract_peaks(song_path, rounded=True)
```

#### Then we store all the unique shingles in an array
This allows us to build the shingle matrix, which has the shingles on the rows and the songs on the columns: cell **(i,j)** contains 1 if shingle **i** is present in song **j**, and 0 otherwise.

```python
shingles = f.unique_shingles(song_peaks)
```

#### Finally, we build the shingle matrix

```python
matrix = shingles_matrix(shingles, song_peaks)
```
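For illustration, here is a minimal sketch of what `f.unique_shingles` and `shingles_matrix` might do, assuming `song_peaks` is a dictionary mapping each song name to its list of rounded peaks. The function names and the toy data are hypothetical.

```python
import numpy as np


def unique_shingles_sketch(song_peaks):
    """Sorted array of every distinct shingle across all songs."""
    return np.array(sorted({s for peaks in song_peaks.values() for s in peaks}))


def shingles_matrix_sketch(shingles, song_peaks):
    """Binary matrix: rows are shingles, columns are songs.

    Cell (i, j) is 1 if shingle i appears in song j, else 0.
    """
    row_of = {s: i for i, s in enumerate(shingles.tolist())}
    matrix = np.zeros((len(shingles), len(song_peaks)), dtype=np.uint8)
    for j, peaks in enumerate(song_peaks.values()):
        for s in set(peaks):
            matrix[row_of[s], j] = 1
    return matrix


song_peaks = {"song_a": [0.5, 0.9, 0.5], "song_b": [0.9, 0.3]}
shingles = unique_shingles_sketch(song_peaks)
print(shingles.tolist())                                      # [0.3, 0.5, 0.9]
print(shingles_matrix_sketch(shingles, song_peaks).tolist())  # [[0, 1], [1, 0], [1, 1]]
```

A dictionary from shingle to row index keeps the construction linear in the total number of peaks instead of searching the shingle array for every peak.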

#### Hashing the shingle matrix

This technique (MinHash) consists of permuting the rows of the matrix and, for each column, taking the index of the first non-zero value in the permuted order. Each permutation produces one row of the hash matrix, so the hash matrix has as many rows as the number of permutations we apply, and each column is the fingerprint of a song.

It is important to set a random seed, because the same permutations must later be applied to the queries to compute their fingerprints.
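A minimal MinHash sketch of the permutation technique described above, assuming NumPy and a binary shingle matrix in which every column contains at least one 1. `minhash_signatures`, the default of 20 permutations and the seed value are illustrative choices, not necessarily what `f.hash_matrix` uses.

```python
import numpy as np


def minhash_signatures(matrix, n_permutations=20, seed=42):
    """MinHash signatures of the columns of a binary shingle matrix.

    Row k of the result holds, for every song (column), the position of the
    first row containing a 1 after applying the k-th random row permutation.
    Assumes every column has at least one 1.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> same permutations for queries
    n_rows, n_cols = matrix.shape
    signatures = np.empty((n_permutations, n_cols), dtype=np.int64)
    for k in range(n_permutations):
        perm = rng.permutation(n_rows)
        permuted = matrix[perm]  # reorder the rows
        # argmax returns the index of the first True in each column
        signatures[k] = np.argmax(permuted == 1, axis=0)
    return signatures
```

Because the permutations depend only on the seed and the number of rows, running the function on a query's 0/1 column (built over the same `shingles` array, with the same `n_permutations` and `seed`) reproduces exactly the permutations used for the database songs.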

```python
hash_matrix = f.hash_matrix(matrix, shingles, song_peaks)
```

```
100%|██████████| 20/20 [00:00<00:00, 110.78it/s]
```


# 3. Applying LSH
We suggest reading this article for a better understanding of the algorithm: (https://www.learndatasci.com/tutorials/building-recommendation-engine-locality-sensitive-hashing-lsh-python/).

The hash matrix is divided into **b** bands of **r** rows each. Within each band, the portion of a song's column is used as a bucket, and we build a dictionary that maps every bucket to all the songs in which it is present.

When processing a query, this lets us compare it only against the songs that share at least one bucket with it.

```python
buckets = f.db_buckets(hash_matrix, n_bands=5)
```
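A minimal sketch of the banding step, assuming the signature matrix has one column per song and that its number of rows is divisible by the number of bands. `lsh_buckets` and its bucket key `(band index, band values)` are hypothetical and may differ from what `f.db_buckets` does.

```python
from collections import defaultdict

import numpy as np


def lsh_buckets(signatures, n_bands, song_names):
    """Map every (band index, band values) bucket to the songs that fall in it.

    The signature matrix (n_permutations x n_songs) is split into n_bands
    bands of r = n_permutations // n_bands rows each.
    """
    n_rows, n_songs = signatures.shape
    if n_rows % n_bands != 0:
        raise ValueError("the number of rows must be divisible by n_bands")
    r = n_rows // n_bands
    buckets = defaultdict(set)
    for b in range(n_bands):
        band = signatures[b * r:(b + 1) * r]
        for j in range(n_songs):
            # The band index is part of the key, so equal values in different bands don't collide
            key = (b, tuple(band[:, j].tolist()))
            buckets[key].add(song_names[j])
    return dict(buckets)
```

For random permutations, two songs with Jaccard similarity $s$ share at least one bucket with probability $1 - (1 - s^r)^b$, so using more bands of fewer rows makes matching more permissive.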

#### Matching the songs

To match a song, we follow these steps:
   1. Convert the query to shingles.
   2. Apply MinHash and then LSH to the shingle set, which maps the query to one bucket per band.
   3. Run a similarity search between the query and the songs contained in those buckets.
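The three steps can be sketched as follows, assuming the query has already been converted into a MinHash signature with the same permutations as the database, and that `buckets` maps `(band index, tuple of band values)` keys to sets of song names. As an assumption, the similarity used here is the fraction of equal MinHash values, an estimate of the Jaccard similarity; `match_query` is hypothetical and `f.shazamLSH` may compare songs differently.

```python
import numpy as np


def match_query(query_signature, signatures, buckets, song_names, n_bands):
    """Return the best-matching song for a query signature, or None.

    Candidates are the songs sharing at least one LSH bucket with the query;
    they are ranked by the fraction of equal MinHash values.
    """
    r = len(query_signature) // n_bands
    candidates = set()
    for b in range(n_bands):
        key = (b, tuple(query_signature[b * r:(b + 1) * r].tolist()))
        candidates |= buckets.get(key, set())
    if not candidates:
        return None  # no shared bucket: nothing to compare against
    column_of = {name: j for j, name in enumerate(song_names)}
    scores = {
        name: float(np.mean(signatures[:, column_of[name]] == query_signature))
        for name in candidates
    }
    return max(scores, key=scores.get)
```

Only the candidates found in the query's buckets are scored, which is what makes LSH cheaper than comparing the query with every song in the database.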

```python
# Run the matcher on the ten test queries (1.wav to 10.wav)
for i in range(1, 11):
    f.shazamLSH(f.PATH_TEST_QUERY + f'{i}.wav', hash_matrix, shingles, buckets)
```

```
Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  dream on - aerosmith 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  i want to break free - queen 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  october - u2 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  ob-la-di ob-la-da - beatles 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  karma police - radiohead 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  heartbreaker - led zeppelin 
-----------------------

Im listening to your music, please dont make noise ...
Maybe you were looking for this song:  go your own way - fleetwood mac 
---------
```