# Audio Similarity Search
This example walks through the code required to perform audio similarity searches. It uses the PANNs model to extract audio features, which are then stored in Milvus to build a system that can perform the searches.

A deployable version of a reverse audio search can be found in this directory.

## Data

This example uses the TUT Acoustic Scenes 2017 Evaluation dataset, which contains 1622 10-second audio clips that fall within 15 categories: Bus, Cafe,
Car, City center, Forest path, Grocery store, Home, Lakeside beach, Library, Metro station, Office, Residential area, Train, Tram, and Urban park.

Dataset size: ~ 4.29 GB.


Directory Structure:  
The file loader used in this example requires all the data to be in .wav format due to librosa limitations. The way files are read also limits the structure to a single folder that contains all the data points.
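
Before embedding anything, it can help to check that the data folder really contains only .wav files. The following is a minimal sketch under that assumption; the helper name `split_by_extension` and the demo file names are hypothetical and not part of the original notebook.

```python
import os
import tempfile

def split_by_extension(data_dir, ext=".wav"):
    """Return (matching, other) lists of file paths found under data_dir."""
    matching, other = [], []
    for subdir, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            path = os.path.join(subdir, name)
            # Compare case-insensitively so "clip.WAV" also counts
            if name.lower().endswith(ext):
                matching.append(path)
            else:
                other.append(path)
    return matching, other

# Demo on a throwaway folder with hypothetical file names
demo_dir = tempfile.mkdtemp()
for name in ("a.wav", "b.WAV", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

wav_files, skipped = split_by_extension(demo_dir)
print(len(wav_files), "wav files,", len(skipped), "skipped")
```

Pointing `split_by_extension` at the real data directory would show which files, if any, the loader is likely to reject.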

## Requirements

| Packages        | Servers      |
|-----------------|--------------|
| pymilvus        | milvus-1.1.0 |
| redis           | redis        |
| librosa         |              |
| ipython         |              |
| numpy           |              |
| panns_inference |              |

A requirements.txt file is included to make installing the required packages easy.

## Up and Running


### Installing Packages
Install the required Python packages with `requirements.txt`.

In [1]:
! pip install -r requirements.txt



### Starting Milvus Server

This demo uses Milvus 1.1.0. Please refer to the [Install Milvus](https://milvus.io/docs/v1.1.0/install_milvus.md) guide to learn how to use this Docker container. For this example we won't be mapping any local volumes.

In [1]:
! docker run --name milvus_cpu_1.1.0 -d \
-p 19530:19530 \
-p 19121:19121 \
milvusdb/milvus:1.1.0-cpu-d050721-5e559c

docker: Error response from daemon: Conflict. The container name "/milvus_cpu_1.1.0" is already in use by container "4fb8cec9122862bcb864b6b782796d32dacfd0b8489bd9666224222cde746485". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.


### Starting Redis Server

We use Redis as a metadata storage service in this example. The code can easily be modified to use a Python dictionary instead, but that approach rarely holds up outside of quick examples. A metadata storage service is needed to map between embeddings and their corresponding audio clips.

In [2]:
! docker run --name redis -d -p 6379:6379 redis

docker: Error response from daemon: Conflict. The container name "/redis" is already in use by container "0e45df4657c651586ae5c80d0db1605206415a0bcb101573b690a434b5a4f7e8". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
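
For quick experiments, the Python dictionary alternative mentioned above could look roughly like the sketch below. The class `DictMetadataStore` is hypothetical; it only mimics the handful of Redis calls this example relies on (`set`, `get`, `mget`, `randomkey`, `flushdb`), and the IDs and paths are made up for illustration.

```python
import random

class DictMetadataStore:
    """Hypothetical in-memory stand-in for the Redis calls used in this example."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Keys are stored as strings, similar to how IDs are written to Redis
        self._data[str(key)] = value

    def get(self, key):
        return self._data.get(str(key))

    def mget(self, keys):
        return [self.get(k) for k in keys]

    def randomkey(self):
        return random.choice(list(self._data)) if self._data else None

    def flushdb(self):
        self._data.clear()

store = DictMetadataStore()
store.set(101, "audio/clip_a.wav")  # made-up ID and path
store.set(102, "audio/clip_b.wav")
print(store.get(101))
print(store.mget([101, 102]))
```

Unlike Redis, this store returns plain strings rather than bytes, so the `.decode("utf-8")` calls used later would not be needed, and its contents are lost when the process exits.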


### Confirm Running Servers

In [3]:
! docker logs milvus_cpu_1.1.0


    __  _________ _   ____  ______    
   /  |/  /  _/ /| | / / / / / __/    
  / /|_/ // // /_| |/ / /_/ /\ \    
 /_/  /_/___/____/___/\____/___/     

Welcome to use Milvus!
Milvus Release version: v1.1.0, built at 2021-05-06 14:50.43, with OpenBLAS library.
You are using Milvus CPU edition
Last commit id: 5e559cd7918297bcdb55985b80567cb6278074dd

Loading configuration from: /var/lib/milvus/conf/server_config.yaml
WARNNING: You are using SQLite as the meta data management, which can't be used in production. Please change it to MySQL!
Supported CPU instruction sets: avx2, sse4_2
FAISS hook AVX2
Milvus server started successfully!


In [4]:
! docker logs redis

1:C 18 May 2021 20:22:25.046 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 18 May 2021 20:22:25.046 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:M 18 May 2021 20:22:25.047 * monotonic clock: POSIX clock_gettime
1:M 18 May 2021 20:22:25.047 * Running mode=standalone, port=6379.
1:M 18 May 2021 20:22:25.048 # Server initialized
1:M 18 May 2021 20:22:25.048 * Ready to accept connections
1:M 18 May 2021 21:19:11.535 * 100 changes in 300 seconds. Saving...
1:M 18 May 2021 21:19:11.537 * Background saving started by pid 20
20:C 18 May 2021 21:19:11.541 * DB saved on disk
20:C 18 May 2021 21:19:11.542 * RDB: 0 MB of memory used by copy-on-write
1:M 18 May 2021 21:19:11.638 * Background saving terminated with success
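
Besides reading the container logs, a quick TCP check can confirm that both ports accept connections. This is a small sketch that assumes the servers run on localhost with the ports mapped above (19530 for Milvus, 6379 for Redis); the helper `port_is_open` is hypothetical and only tests reachability, not server health.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in (("Milvus", 19530), ("Redis", 6379)):
    state = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```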


### Downloading Data
These commands download and unzip the data.

In [None]:
!wget -O 'file1.zip' 'https://zenodo.org/record/1040168/files/TUT-acoustic-scenes-2017-evaluation.audio.1.zip?download=1' -q --show-progress
!wget -O 'file2.zip' 'https://zenodo.org/record/1040168/files/TUT-acoustic-scenes-2017-evaluation.audio.2.zip?download=1' -q --show-progress
!wget -O 'file3.zip' 'https://zenodo.org/record/1040168/files/TUT-acoustic-scenes-2017-evaluation.audio.3.zip?download=1' -q --show-progress
!wget -O 'file4.zip' 'https://zenodo.org/record/1040168/files/TUT-acoustic-scenes-2017-evaluation.audio.4.zip?download=1' -q --show-progress

!tar -xf file1.zip 
!tar -xf file2.zip 
!tar -xf file3.zip 
!tar -xf file4.zip 
!rm 'file1.zip' 'file2.zip' 'file3.zip' 'file4.zip'
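
Extracting a .zip archive with `tar -xf` depends on the tar implementation (bsdtar handles zip files, GNU tar does not). If the commands above fail, a portable alternative is Python's standard `zipfile` module. The sketch below assumes the four archives were downloaded under the names used above; `extract_archives` is a hypothetical helper.

```python
import os
import zipfile

def extract_archives(archive_paths, dest_dir=".", delete_after=False):
    """Extract each existing zip archive into dest_dir; return those extracted."""
    extracted = []
    for archive in archive_paths:
        if not os.path.exists(archive):
            print("Skipping missing archive:", archive)
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
        extracted.append(archive)
        if delete_after:
            os.remove(archive)
    return extracted

extract_archives(["file1.zip", "file2.zip", "file3.zip", "file4.zip"])
```

Passing `delete_after=True` mirrors the `rm` step by removing each archive once it has been extracted.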

## Code Overview

### Connecting to Servers
We start by connecting to the servers. In this case the Docker containers are running on localhost and use the default ports.

In [4]:
#Connecting to Milvus and Redis
import redis
import milvus

milv = milvus.Milvus(host = '127.0.0.1', port = 19530)
red = redis.Redis(host = '127.0.0.1', port=6379, db=0)

### Building Collection and Setting Index

The next step is creating a collection. A collection in Milvus is similar to a table in a relational database and is used for storing all the vectors. To create a collection, we must first choose a name, the dimension of the vectors stored in it, the index_file_size, and the metric_type. The index_file_size controls how large each data segment within the collection will be. More information on this can be found in the Milvus documentation. The metric_type is the distance formula used to calculate similarity. In this example we use the Euclidean distance.

In [9]:
#Creating collection

import time

collection_name = "audio_collection"
milv.drop_collection(collection_name) 
red.flushdb()
time.sleep(.1)

collection_param = {
            'collection_name': collection_name,
            'dimension': 2048,
            'index_file_size': 1024,  # optional
            'metric_type': milvus.MetricType.L2  # optional
            }

status, ok = milv.has_collection(collection_name)

if not ok:
    status = milv.create_collection(collection_param)
    print(status)

Status(code=0, message='Create collection successfully!')
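
To make the Euclidean metric chosen for the collection concrete, here is a small NumPy sketch that computes it for toy three-dimensional vectors. The helper `l2_distance` and the sample vectors are illustrative only; the example itself stores 2048-dimensional embeddings and leaves the distance computation to Milvus.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.linalg.norm(a - b))

query = [0.0, 0.0, 0.0]
near = [1.0, 2.0, 2.0]
far = [3.0, 4.0, 12.0]

print(l2_distance(query, near))  # 3.0
print(l2_distance(query, far))   # 13.0
```

A smaller distance means the two vectors, and therefore the two clips they represent, are more similar.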


After creating the collection we want to assign it an index type. This can be done before or after inserting the data. When done before, indexes are built as data comes in and fills the data segments. In this example we use IVF_SQ8, which requires the 'nlist' parameter. Each index type carries its own parameters. More information about this parameter can be found [here](https://milvus.io/docs/v1.0.0/index.md#CPU).

In [10]:
#Indexing collection

index_param = {
    'nlist': 512
}

status = milv.create_index(collection_name, milvus.IndexType.IVF_SQ8, index_param)
status, index = milv.get_index_info(collection_name)
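
The idea behind an IVF-style index can be illustrated with a toy NumPy version: vectors are grouped into `nlist` buckets around centroids, and a query scans only the `nprobe` closest buckets. This is a rough sketch and not Milvus's implementation. It picks centroids at random instead of training them, and it omits the 8-bit scalar quantization that the SQ8 part of IVF_SQ8 refers to. The function names and data are hypothetical.

```python
import numpy as np

def build_ivf(vectors, nlist, seed=0):
    """Pick nlist vectors as centroids and bucket every vector by nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
    # Squared distance from every vector to every centroid: shape (n, nlist)
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    buckets = [np.flatnonzero(assignments == c) for c in range(nlist)]
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, top_k):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    centroid_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_d)[:nprobe]
    candidates = np.concatenate([buckets[c] for c in probe])
    d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(d)[:top_k]
    return candidates[order], d[order]

rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 16)).astype(np.float32)
centroids, buckets = build_ivf(data, nlist=8)
ids, dists = ivf_search(data[0], data, centroids, buckets, nprobe=2, top_k=3)
print(ids, dists)
```

A larger `nprobe` scans more buckets, which trades speed for accuracy; with `nprobe` equal to `nlist` the search degenerates into an exhaustive scan.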

### Processing and Storing Audio Files
To store the audio tracks in Milvus, we must first compute their embeddings. We start by loading each audio file with librosa and then pass the loaded clip to the PANNs model. Here we use the panns_inference library to simplify importing and running the model. Once we receive the embedding, we push it into Milvus and store each unique ID and file path pair in Redis, so that we can later access the audio file when displaying the results.

In [None]:
import os
import librosa
import numpy as np
from panns_inference import SoundEventDetection, labels, AudioTagging

data_dir = './TUT-acoustic-scenes-2017-evaluation/audio'
at = AudioTagging(checkpoint_path=None, device='cpu')

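def embed_and_save(path, at):
    #Load the clip as 32 kHz mono and add a batch dimension
    audio, _ = librosa.core.load(path, sr=32000, mono=True)
    audio = audio[None, :]
    try:
        _, embedding = at.inference(audio)
        #Normalize the embedding to unit length before inserting
        embedding = embedding/np.linalg.norm(embedding)
        status, ids = milv.insert(collection_name=collection_name, records=embedding)
        if not status.OK():
            print("Insert failed: {}".format(status))
        else:
            #Map the Milvus ID to the file path for later lookup
            red.set(str(ids[0]), path)
    except Exception as e:
        print("failed: " + path + " (" + str(e) + ")")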


print("Starting Insert")
for subdir, dirs, files in os.walk(data_dir):
    for file in files:
        path = os.path.join(subdir, file)
        embed_and_save(path, at)
print("Insert Done")
        
        

Checkpoint path: /Users/filiphaltmayer/panns_data/Cnn14_mAP=0.431.pth
Using CPU.
Starting Insert
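
The insert code divides each embedding by its norm before storing it. One consequence, sketched below with random stand-in vectors rather than real PANNs embeddings, is that for unit-length vectors the squared Euclidean distance equals 2 minus twice the cosine similarity, so ranking by L2 distance gives the same order as ranking by cosine similarity. The helper `normalize` is hypothetical.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)

# Random stand-ins with the same dimension as the stored embeddings
rng = np.random.default_rng(0)
a = normalize(rng.normal(size=2048))
b = normalize(rng.normal(size=2048))

squared_l2 = np.sum((a - b) ** 2)
cosine = np.dot(a, b)

print(np.linalg.norm(a))
print(np.isclose(squared_l2, 2 - 2 * cosine))
```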


### Searching
In this example we perform a search on a few randomly selected audio clips. To perform the search we must first apply the same processing that was applied to the original audio clips. This gives us a set of embeddings to use as queries.

In [None]:
def get_embed(paths, at):
    embedding_list = []
    for x in paths:
        audio, _ = librosa.core.load(x, sr=32000, mono=True)
        audio = audio[None, :]
        try:
            _, embedding = at.inference(audio)
            embedding = embedding/np.linalg.norm(embedding)
            embedding_list.append(embedding)
        except Exception as e:
            print("Embedding Failed: " + x + " (" + str(e) + ")")
    #Drop the per-clip batch axis: (n, 1, dim) -> (n, dim), also when n is 1
    return np.array(embedding_list, dtype=np.float32).squeeze(axis=1)

random_ids = [int(red.randomkey()) for x in range(3)]
search_clips = [x.decode("utf-8") for x in red.mget(random_ids)]
embeddings = get_embed(search_clips, at)
print(embeddings.shape)

We can then use these embeddings to perform a search. The search requires a few arguments: the name of the collection, the vectors being searched for, how many of the closest vectors to return, and the parameters for the index, in this case nprobe. Once the search completes, this example displays the query clip and the result clips.

In [None]:
import IPython.display as ipd

def show_results(query, results, distances):
    print("Query: ")
    ipd.display(ipd.Audio(query))
    print("Results: ")
    for x in range(len(results)):
        print("Distance: " + str(distances[x]))
        ipd.display(ipd.Audio(results[x]))
    print("-"*50)

print(embeddings.shape)
search_sub_param = {
        "nprobe": 16
    }

search_param = {
    'collection_name': collection_name,
    'query_records': embeddings,
    'top_k': 3,
    'params': search_sub_param,
    }

start = time.time()
status, results = milv.search(**search_param)
elapsed = time.time() - start

print("Search took a total of: ", elapsed)

if status.OK():
    for x in range(len(results)):
        query_file = search_clips[x]
        result_files = [red.get(y.id).decode('utf-8') for y in results[x]]
        distances = [y.distance for y in results[x]]
        show_results(query_file, result_files, distances)
else:
    print("Search Failed.")
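
As a sanity check on what a top_k search returns, the same lookup can be written as an exhaustive NumPy search over a small set of vectors. This is a sketch with random stand-in data rather than the stored embeddings; `brute_force_top_k` is a hypothetical helper and is far slower than an indexed search on large collections.

```python
import numpy as np

def brute_force_top_k(queries, stored, top_k):
    """Return (indices, distances) of the top_k nearest stored vectors per query."""
    queries = np.atleast_2d(queries)
    # Pairwise Euclidean distances: shape (num_queries, num_stored)
    d = np.linalg.norm(queries[:, None, :] - stored[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :top_k]
    return idx, np.take_along_axis(d, idx, axis=1)

rng = np.random.default_rng(1)
stored = rng.normal(size=(200, 32)).astype(np.float32)
queries = stored[[5, 17, 42]]  # queries drawn from the stored set

idx, dist = brute_force_top_k(queries, stored, top_k=3)
print(idx[:, 0])
```

Because the queries here are taken from the stored vectors, just as the notebook picks its query clips from the inserted files, the closest result for each query is the query itself at distance 0.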

## Conclusion
This notebook shows how to search for similar audio clips. 

Check out our [demo system](https://zilliz.com/milvus-demos) to try out different solutions. 