# 🎧 Audio Similarity Search using PyTorch

### Assignment 2

**Objective:**  
Find audio files similar to an uploaded audio file by comparing it with stored audio files using deep learning–based audio features.

### Technologies Used
- PyTorch
- torchaudio
- MFCC Audio Features
- Cosine Similarity

### Workflow
1. Store multiple audio files
2. Upload a query audio file
3. Extract MFCC embeddings
4. Compare embeddings using cosine similarity
5. Return most similar audio files
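
Step 4 relies on cosine similarity, which measures the angle between two embedding vectors and ranges from -1 (opposite direction) to 1 (same direction). The following is a minimal plain-Python sketch of the formula, independent of PyTorch. The helper name `cosine` and the toy vectors are illustrative assumptions, not part of the notebook, and returning 0.0 for a zero vector is a choice made for this sketch only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; this sketch defines its similarity as 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

same = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # anti-parallel vectors
```

Because the score depends only on direction and not on magnitude, scaling an embedding does not change its ranking.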


In [79]:
"""
Libraries used:
- os: File and directory handling
- torch: Core PyTorch library
- torchaudio: Audio processing utilities built on PyTorch
- numpy: Numerical operations
- scipy.io.wavfile: Reading WAV audio files
- torch.nn.functional.cosine_similarity: Used to compute similarity between embeddings
"""

import os
import torch
import torchaudio
import numpy as np
from scipy.io.wavfile import read
from torch.nn.functional import cosine_similarity



In [80]:
class AudioEmbeddingExtractor:
    """
    AudioEmbeddingExtractor
    -----------------------
    This class is responsible for converting audio files into numerical
    embeddings using MFCC (Mel Frequency Cepstral Coefficients).

    These embeddings can later be used for similarity search or clustering.

    Key Responsibilities:
    - Load audio files
    - Extract MFCC features
    - Convert features into fixed-size embeddings
    """

    def __init__(self, sample_rate=16000, n_mfcc=40):
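        """
        Initializes the extractor with MFCC configuration.

        Args:
            sample_rate: Sample rate in Hz that the MFCC transform assumes.
            n_mfcc: Number of MFCC coefficients to compute.
        """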
        self.sample_rate = sample_rate
        self.mfcc_transform = torchaudio.transforms.MFCC(
            sample_rate=sample_rate,
            n_mfcc=n_mfcc
        )

    def extract_embedding(self, audio_path):
        """
        Extracts an embedding from a given audio file.

        Steps:
        ------
        1. Load WAV file
        2. Convert to mono if stereo
        3. Extract MFCC features
        4. Aggregate MFCCs to a single embedding vector
        """

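        # Load audio using SciPy; stereo files come back as (samples, channels)
        sr, waveform = read(audio_path)

        # Convert to float32
        waveform = waveform.astype(np.float32)

        # Convert to mono by averaging the channels (step 2 of the docstring)
        if waveform.ndim > 1:
            waveform = waveform.mean(axis=1)

        # Peak-normalize; skip silent clips to avoid division by zero
        peak = np.max(np.abs(waveform))
        if peak > 0:
            waveform = waveform / peak

        # Convert NumPy → Torch and add a channel dimension: (1, samples)
        waveform = torch.tensor(waveform).unsqueeze(0)

        # Resample if the file's rate differs from the rate the MFCC transform assumes
        if sr != self.sample_rate:
            waveform = torchaudio.functional.resample(waveform, sr, self.sample_rate)

        # Extract MFCC: shape (1, n_mfcc, time)
        mfcc = self.mfcc_transform(waveform)

        # Mean pooling over time axis yields a fixed-size vector
        embedding = mfcc.mean(dim=2).squeeze()

        return embedding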
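
Mean pooling over the time axis is what turns clips of different lengths into embeddings of the same size. The sketch below shows the idea in plain Python, without torchaudio. The helper `mean_pool` and the toy frames are hypothetical, and the data is laid out frame by frame here, whereas the notebook's tensor keeps time on the last axis.

```python
def mean_pool(frames):
    """Average per-frame coefficient lists into one vector (hypothetical helper)."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # For each coefficient index, average its value across all frames
    return [sum(frame[i] for frame in frames) / n_frames for i in range(n_coeffs)]

short_clip = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]  # 2 frames, 3 coefficients
long_clip = [[1.0, 2.0, 3.0]] * 5                # 5 frames, 3 coefficients

short_embedding = mean_pool(short_clip)
long_embedding = mean_pool(long_clip)
```

Both embeddings have three entries even though the clips have different numbers of frames. The trade-off is that averaging discards the ordering of frames, so two clips with the same average spectrum but different temporal structure will look alike.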


In [81]:
class AudioSimilarityEngine:
    """
    AudioSimilarityEngine
    ---------------------
    This class manages:
    - Indexing audio embeddings
    - Performing similarity search using cosine similarity

    It acts as a simple in-memory vector database for audio files.
    """

    def __init__(self, extractor):
        self.extractor = extractor
        self.audio_embeddings = {}


    def index_audio_database(self, audio_folder):
        """
        Reads all WAV files in the directory,
        extracts embeddings, and stores them in memory.
        """
        for file in os.listdir(audio_folder):
            if file.endswith(".wav"):
                path = os.path.join(audio_folder, file)
                embedding = self.extractor.extract_embedding(path)
                self.audio_embeddings[file] = embedding

        print(f"Indexed {len(self.audio_embeddings)} audio files.")

    def find_similar(self, query_audio_path, top_k=3):
        """
        Finds the most similar audio files to a query audio.

        Returns up to top_k (file name, score) pairs, highest score first.
        """
        query_embedding = self.extractor.extract_embedding(query_audio_path)

        results = []
        for name, embedding in self.audio_embeddings.items():
            score = cosine_similarity(
                query_embedding.unsqueeze(0),
                embedding.unsqueeze(0)
            ).item()
            results.append((name, score))

        results.sort(key=lambda x: x[1], reverse=True)
        return results[:top_k]
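
`find_similar` sorts every score and then keeps the first `top_k` entries. That is fine for a handful of files. For a larger index, a heap-based selection returns the same ranking without a full sort. A minimal sketch, assuming the scores have already been computed; the helper `top_k` and the toy scores are hypothetical and not taken from the notebook's results.

```python
import heapq

def top_k(scores, k=3):
    """Return the k highest-scoring (name, score) pairs, best first (hypothetical helper)."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Made-up scores for illustration only
toy_scores = {"a.wav": 0.91, "b.wav": -0.25, "c.wav": 0.40, "d.wav": 0.99}
best = top_k(toy_scores, k=2)
```

`heapq.nlargest` returns results in descending order, so the output has the same shape as `results[:top_k]` in the class above.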


In [82]:
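"""
This cell defines the MFCC configuration and initializes the
extractor and the AudioSimilarityEngine. Indexing happens in the next cell.
"""
SAMPLE_RATE = 16000  # Hz; same value as the AudioEmbeddingExtractor default
N_MFCC = 40          # number of MFCC coefficients; same value as the default

extractor = AudioEmbeddingExtractor(
    sample_rate=SAMPLE_RATE,
    n_mfcc=N_MFCC
)

engine = AudioSimilarityEngine(extractor)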


In [83]:
extractor = AudioEmbeddingExtractor()
engine = AudioSimilarityEngine(extractor)

engine.index_audio_database("audio_db")



Indexed 6 audio files.


In [84]:
"""
This cell performs a similarity search using a query audio file.
"""
results = engine.find_similar("query.wav")

print("Most Similar Audio Files:\n")
for audio, score in results:
    print(f"{audio} -> Similarity Score: {score:.4f}")



Most Similar Audio Files:

bird.wav -> Similarity Score: 0.9863
fish.wav -> Similarity Score: -0.2500
audio3.wav -> Similarity Score: -0.6737


In [85]:
"""
This cell demonstrates running a second query
to validate consistency of the similarity engine.
"""
results = engine.find_similar("second_query.wav")

print("Most Similar Audio Files:\n")
for audio, score in results:
    print(f"{audio} -> Similarity Score: {score:.4f}")

Most Similar Audio Files:

audio2.wav -> Similarity Score: 0.9977
audio1.wav -> Similarity Score: 0.9670
animal.wav -> Similarity Score: 0.9332
