# Audio Similarity Search using Vector Embeddings
This notebook demonstrates how to create vector embeddings of **audio files**, store them in a **LanceDB vector store**, and then find similar audio files.
We will use the [panns_inference package](https://github.com/qiuqiangkong/panns_inference) to tag the audio and create embeddings.
We'll also use this [HuggingFace dataset](https://huggingface.co/datasets/ashraq/esc50) (ESC-50) for the audio files. It contains 2,000 five-second environmental sound clips, each labeled with one of 50 categories.

In [46]:
!pip install panns-inference tqdm lancedb datasets torch torchvision torchaudio -q

In [47]:
from datasets import load_dataset
from panns_inference import AudioTagging
from tqdm import tqdm
from IPython.display import Audio, display
import numpy as np
import lancedb

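If this import fails with `FileNotFoundError: [Errno 2] No such file or directory: '.../panns_data/class_labels_indices.csv'`, as it did on the Windows machine this notebook was first run on, `panns_inference` could not fetch its label file. On first import the package tries to download `class_labels_indices.csv` into a `panns_data` folder in your home directory; if that download fails, place the file there manually and re-run the cell.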

On machines with CUDA installed, you can install a CUDA-enabled build of PyTorch (this command targets CUDA 11.8):
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
If you don't have CUDA or a GPU, or you are on a different OS or CUDA version, pick the matching install command at https://pytorch.org/get-started/locally/

In [8]:
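import torch
dataset = load_dataset("ashraq/esc50", split="train")
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU inference without a GPU
at = AudioTagging(checkpoint_path=None, device=device)  # None: download the default pretrained checkpoint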


In [19]:
dataset

Dataset({
    features: ['filename', 'fold', 'target', 'category', 'esc10', 'src_file', 'take', 'audio'],
    num_rows: 2000
})

Now, to create the data and embeddings! We start by splitting the dataset into batches of 100 rows, keeping only the two columns we need: `category` and `audio`.

In [20]:
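# Split the dataset into batches of 100 rows (2,000 rows -> 20 batches)
batches = [batch["audio"] for batch in dataset.iter(batch_size=100)]
meta_batches = [batch["category"] for batch in dataset.iter(batch_size=100)]
# Stack each batch of clips into one 2-D array; this works because every ESC-50 clip has the same length
audio_data = [np.array([audio["array"] for audio in batch]) for batch in batches]
meta_data = [np.array(batch) for batch in meta_batches]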
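As a quick sanity check on the batching, here is a small pure-Python sketch of the arithmetic. It assumes, as ESC-50 documents, that every clip is 5 seconds long at 44,100 Hz (the `sampling_rate` that appears in the search results later); equal clip lengths are what let `np.array` stack each batch into a single 2-D array. The helper `batch_sizes` is hypothetical and only mirrors how `dataset.iter()` keeps a final partial batch by default.

```python
# Hypothetical helper: sizes of the batches produced when iterating in chunks
# of `batch_size`, keeping a final partial batch (like dataset.iter's default).
def batch_sizes(num_items: int, batch_size: int) -> list:
    full, rest = divmod(num_items, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

SAMPLE_RATE = 44_100   # Hz, the sampling_rate reported for ESC-50 clips
CLIP_SECONDS = 5       # every ESC-50 clip is 5 seconds long
NUM_ROWS = 2_000       # num_rows of the dataset
BATCH_SIZE = 100

sizes = batch_sizes(NUM_ROWS, BATCH_SIZE)
samples_per_clip = SAMPLE_RATE * CLIP_SECONDS

print(len(sizes), "batches")                                        # 20 batches
print("shape of one batch array:", (sizes[0], samples_per_clip))   # (100, 220500)
```

The 20 batches line up with the `20/20` iterations reported by the progress bar when the table is populated.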

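We now iterate through these batches and use the `AudioTagging` model to extract an embedding for each audio clip. We store each embedding together with its audio array, sampling rate, and category name as one dictionary in a list. The embedding goes under the `vector` key because `vector` is the default column LanceDB searches when no other vector column is specified. Note that the PANNs models were trained on 32 kHz audio, while the ESC-50 clips are 44.1 kHz; this notebook passes the clips in without resampling, so resampling to 32 kHz first (for example with `torchaudio.functional.resample`) may yield embeddings closer to what the model expects.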

In [None]:
data = []  # accumulate rows from all batches, not just the last one
for i in tqdm(range(len(audio_data))):
    (_, embedding) = at.inference(audio_data[i])  # returns (clipwise_output, embedding)
    data.extend({"audio": clip["array"], "vector": vec, "sampling_rate": clip["sampling_rate"], "category": meta_data[i][j]}
                for j, (clip, vec) in enumerate(zip(batches[i], embedding)))
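The comprehension above pairs three aligned sequences: the clips in a batch, their embeddings, and their categories. The sketch below makes that pairing explicit with a single `zip`, using hypothetical stand-in data rather than real model output; `build_records` is an illustrative name, not part of LanceDB or `panns_inference`. The 2048-dimensional vectors match the embedding size of the Cnn14 model that `AudioTagging` loads by default; treat that constant as an assumption if you use a different checkpoint.

```python
EMBEDDING_DIM = 2048  # assumed: embedding size of the default PANNs Cnn14 model

def build_records(audio_batch, embeddings, categories):
    """Pair each clip with its embedding and label, one dict per table row.

    The 'vector' key holds the embedding, since that is the column LanceDB
    searches by default.
    """
    if not (len(audio_batch) == len(embeddings) == len(categories)):
        raise ValueError("batch, embeddings and categories must have the same length")
    return [
        {
            "audio": clip["array"],
            "vector": list(vec),
            "sampling_rate": clip["sampling_rate"],
            "category": cat,
        }
        for clip, vec, cat in zip(audio_batch, embeddings, categories)
    ]

# Hypothetical stand-in data: two tiny "clips" and dummy embeddings
fake_batch = [
    {"array": [0.0, 0.1, -0.1], "sampling_rate": 44_100},
    {"array": [0.2, 0.0, 0.3], "sampling_rate": 44_100},
]
fake_embeddings = [[0.0] * EMBEDDING_DIM, [1.0] * EMBEDDING_DIM]
records = build_records(fake_batch, fake_embeddings, ["dog", "rain"])
print(records[1]["category"], len(records[1]["vector"]))  # rain 2048
```

Zipping the category list alongside the clips avoids indexing `meta_data[i][j]` by position, and the length check catches misaligned batches early.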

Once we have this data list, we can create a LanceDB table: connect to a local directory with `lancedb.connect()`, then call `db.create_table()`. If the table already exists, we open it with `db.open_table()` and add the data instead.

In [None]:
# Connect to a LanceDB database stored in a local directory
db = lancedb.connect("data/audio-lancedb")
table_name = "audio-search"

if table_name not in db.table_names():
    tbl = db.create_table(table_name, data)
else:
    tbl = db.open_table(table_name)
    tbl.add(data)

We can now combine all of this into a single function:

In [7]:
def insert_audio():
    # Batch the audio clips and their category labels, 100 rows at a time
    batches = [batch["audio"] for batch in dataset.iter(batch_size=100)]
    meta_batches = [batch["category"] for batch in dataset.iter(batch_size=100)]
    audio_data = [np.array([audio["array"] for audio in batch]) for batch in batches]
    meta_data = [np.array(batch) for batch in meta_batches]
    print("Start")
    for i in tqdm(range(len(audio_data))):
        (_, embedding) = at.inference(audio_data[i])
        data = [
            {"audio": clip["array"], "vector": vec, "sampling_rate": clip["sampling_rate"], "category": meta_data[i][j]}
            for j, (clip, vec) in enumerate(zip(batches[i], embedding))
        ]
        # Create the table from the first batch, then append every later batch
        if table_name not in db.table_names():
            tbl = db.create_table(table_name, data)
        else:
            tbl = db.open_table(table_name)
            tbl.add(data)

In [8]:
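import shutil
# Delete any previous copy of the table so re-running doesn't duplicate rows;
# ignore_errors=True avoids a FileNotFoundError when the table doesn't exist yet
shutil.rmtree("data/audio-lancedb/audio-search.lance", ignore_errors=True)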

insert_audio()

Start


100%|██████████| 20/20 [05:02<00:00, 15.13s/it]


Great! We now have a fully populated table with all the necessary information. The next step is to query the table for similar audio files: we open the table, then pick the specific audio clip we want to search with.

In [22]:
tbl = db.open_table(table_name)
audio = dataset[123]['audio']['array']
category = dataset[123]['category']
display(Audio(audio, rate=dataset[123]['audio']['sampling_rate']))
print("Category:", category)

Category: washing_machine


Next, we run the same `AudioTagging` model on the query clip to get its embedding, then search the table with that vector. `audio[None, :]` adds a batch dimension, since `inference()` expects a batch of clips.

In [23]:
(_, embedding) = at.inference(audio[None, :])
result = tbl.search(embedding[0]).limit(5).to_df()
print(result)

                                               audio  \
0  [0.08984375, 0.0042724609375, -0.1806030273437...   
1  [-0.04083251953125, -0.0345458984375, -0.04095...   
2  [-0.15716552734375, -0.1749267578125, -0.17120...   
3  [0.313934326171875, 0.312774658203125, 0.31698...   
4  [-0.200286865234375, -0.19146728515625, -0.144...   

                                              vector  sampling_rate  \
0  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...          44100   
1  [0.0, 0.5637009, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...          44100   
2  [0.0, 0.46206585, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   
3  [0.0, 0.5878647, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...          44100   
4  [0.0, 0.706554, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...          44100   

          category         score  
0  washing_machine  4.816496e-12  
1  washing_machine  1.564546e+01  
2  washing_machine  2.403421e+01  
3  washing_machine  2.661950e+01  
4  washing_machine  3.144772e+01  
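Under the hood, `tbl.search(...).limit(5)` is a nearest-neighbour query. Without a vector index, LanceDB compares the query against every stored vector, using L2 distance by default, and returns the closest rows first. Below is a brute-force pure-Python sketch of the same idea on toy 3-dimensional vectors; `squared_l2` and `knn` are illustrative helpers, not LanceDB functions. It computes squared Euclidean distance, which we assume is what the `score` column reports. Because the query clip is itself stored in the table, its distance to itself is zero, which is why the top hit above scores about 4.8e-12.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(query, rows, k):
    """Exhaustive (brute-force) search: score every row, return the k nearest."""
    scored = [{**row, "score": squared_l2(query, row["vector"])} for row in rows]
    scored.sort(key=lambda row: row["score"])  # smallest distance first
    return scored[:k]

# Toy table with made-up 3-D vectors (real PANNs embeddings are much longer)
rows = [
    {"category": "washing_machine", "vector": [0.0, 0.9, 0.1]},
    {"category": "washing_machine", "vector": [0.0, 0.8, 0.2]},
    {"category": "insects", "vector": [0.7, 0.0, 0.0]},
    {"category": "rain", "vector": [0.1, 0.1, 0.9]},
]
query = rows[0]["vector"]  # like the notebook, the query is also in the table

for row in knn(query, rows, k=3):
    print(row["category"], round(row["score"], 4))
```

The query's own row comes back first with a score of exactly 0.0, followed by the other washing-machine vector; a real embedding index would approximate this exhaustive scan for speed.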


In [24]:
for i in range(len(result)):
    print(str(i) + ". Category:", result['category'][i])
    display(Audio(result['audio'][i], rate=result['sampling_rate'][i]))

0. Category: washing_machine


1. Category: washing_machine


2. Category: washing_machine


3. Category: washing_machine


4. Category: washing_machine


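Nice! All five results share the query's category, so the search seems to work. The top result, with a score of almost exactly zero, is the query clip itself, since it is also stored in the table. We can wrap these steps into another function that takes the index of the query clip, from 0 to 1,999.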

In [14]:
def search_audio(audio_id):
    # audio_id is the row index of the query clip in the dataset (0 to 1,999)
    tbl = db.open_table(table_name)
    audio = dataset[audio_id]['audio']['array']
    category = dataset[audio_id]['category']
    display(Audio(audio, rate=dataset[audio_id]['audio']['sampling_rate']))
    print("Category:", category)

    # Embed the query clip (None adds a batch dimension) and fetch the 5 nearest rows
    (_, embedding) = at.inference(audio[None, :])
    result = tbl.search(embedding[0]).limit(5).to_df()
    print(result)
    for i in range(len(result)):
        print(str(i) + ". Category:", result['category'][i])
        display(Audio(result['audio'][i], rate=result['sampling_rate'][i]))


In [28]:
search_audio(1235)

Category: insects
                                               audio  \
0  [0.0003662109375, 0.00079345703125, 0.00021362...   
1  [0.00909423828125, 0.0054931640625, 0.00555419...   
2  [0.047698974609375, 0.043212890625, 0.03265380...   
3  [-0.044281005859375, -0.04522705078125, -0.046...   
4  [-0.000335693359375, -0.00384521484375, -0.004...   

                                              vector  sampling_rate category  \
0  [0.0, 0.38009748, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100  insects   
1  [0.0, 0.018222854, 0.0, 0.0, 0.0, 0.0, 0.0, 0....          44100  insects   
2  [0.0, 0.06542123, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100  insects   
3  [0.0, 0.005919501, 0.0, 0.0, 0.0, 0.0, 0.0, 0....          44100  insects   
4  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...          44100  insects   

          score  
0  1.173890e-11  
1  5.864442e+01  
2  5.973289e+01  
3  5.973791e+01  
4  6.134641e+01  
0. Category: insects


1. Category: insects


2. Category: insects


3. Category: insects


4. Category: insects
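Eyeballing five category labels is a reasonable smoke test. A slightly more systematic check, not part of the original notebook, is to measure what fraction of the returned rows share the query's category. The helper below is a hypothetical sketch: it skips the first row by default because, as the near-zero scores above show, that row is the query clip itself.

```python
def category_precision(query_category, result_categories, skip_self=True):
    """Fraction of returned rows whose category matches the query's.

    With skip_self=True the first row is ignored, on the assumption that it is
    the query clip itself (it is stored in the table too, hence its ~0 score).
    """
    categories = list(result_categories)
    considered = categories[1:] if skip_self else categories
    if not considered:
        raise ValueError("no results left to evaluate")
    matches = sum(1 for c in considered if c == query_category)
    return matches / len(considered)

print(category_precision("insects", ["insects"] * 5))                   # 1.0
print(category_precision("dog", ["dog", "dog", "cat", "dog", "rain"]))  # 0.5
```

Inside `search_audio`, this could be called as `category_precision(category, result["category"])`, since `list()` accepts a pandas Series. Averaging it over many query ids would give a rough precision@k for the whole dataset.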
