# QVIM-AES Submission Template

This is the submission template for the Query by Vocal Imitation challenge at the 2025 AES International Conference on Artificial Intelligence and Machine Learning for Audio.

The content of this notebook is inspired by the template provided by the task organizers of the [Sound Scene Synthesis Task of the DCASE Challenge 2024](https://dcase.community/challenge2024/task-sound-scene-synthesis).

<div class="alert alert-block alert-warning"> 
<b>Confidentiality Statement</b><br> As the organizers of this contest, we assure all participants that their submitted models and code will be treated with strict confidentiality. Submissions will only be accessed by the designated review team for evaluation purposes and will not be shared, distributed, or used beyond the scope of this challenge. Participants retain full ownership of their work. We will not claim any rights over the submitted materials, nor will we use them for any purpose outside of the challenge evaluation process. We appreciate your participation in this challenge.
</div>

#### How to create your submission
- Get familiar with the existing code blocks and the example provided below.
- Set the root path of your environment and your dataset below ("TODO: DEFINE YOUR PATHS HERE.").
- Set up your project ("TODO: SETUP YOUR PROJECT HERE.").
- Implement the retrieval interface below ("TODO: ADD YOUR IMPLEMENTATION HERE.").
    - Use the provided helper functions (helpers) to download your source code, model checkpoints, etc.
- Instantiate your retrieval model ("TODO: INSTANTIATE YOUR MODEL HERE.").
- Before **submitting your notebook**, run it in a clean conda environment (with Python >= 3.10) on Ubuntu 24.04 and make sure the evaluation results are in line with your previous results.
- Submit your notebooks and the technical report as described on our [website](https://qvim-aes.github.io/).

##### Some Rules
- DO NOT modify the other code cells.
- DO NOT add new cells.
- Store your project WITHIN 'ROOT_PATH' and your data within 'DATA_PATH'.
- DO NOT use 'ROOT_PATH/output' folder; this is where we will store things.
- DO NOT change the working directory (e.g., `os.chdir('/path/to/a/dir/that/does/not/exist/on/my/machine')`).
- DO NOT use system commands (`!cd ~` or `os.system('cd ~')`, etc.) other than the ones used to set up your environment (i.e., install required packages with pip, conda, ...).
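
The two path rules above can be checked before submitting. The following is a minimal sketch, not part of the official template: the function name `is_allowed_project_path` is hypothetical, and it assumes that "within `ROOT_PATH`" means "inside the resolved `ROOT_PATH` directory, excluding `ROOT_PATH/output`".

```python
from pathlib import Path


def is_allowed_project_path(path, root_path="."):
    """Return True if `path` lies inside `root_path` but not inside `root_path/output`.

    Hypothetical helper for self-checking; the organizers' own checks may differ.
    """
    root = Path(root_path).resolve()
    target = Path(path).resolve()
    output_dir = root / "output"  # reserved for the organizers
    inside_root = target.is_relative_to(root)
    inside_output = target.is_relative_to(output_dir)
    return inside_root and not inside_output
```

For example, a checkpoint stored under `ROOT_PATH/checkpoints` passes this check, while anything written to `ROOT_PATH/output` or outside of `ROOT_PATH` does not.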

<div class="alert alert-block alert-danger"> 
Participants who submit malicious code will be disqualified.
</div>
    

In [None]:
"""
DO NOT MODIFY THIS BLOCK.
"""
# Install basic packages for template notebook.
! pip install librosa numpy pandas tqdm GitPython gdown==5.1.0

In [2]:
"""
DO NOT MODIFY THIS BLOCK.
"""
# some imports
import sys
import os

from abc import ABC, abstractmethod
from tqdm import tqdm
import numpy as np
import pandas as pd


## Description of the Retrieval Interface 
`QVIMModel` is the interface specification for all query by vocal imitation systems. Each submitted system is expected to subclass this interface and implement the `compute_similarities` method, which computes the similarities between all pairwise combinations of queries (vocal imitations) and items (reference sounds).

`compute_similarities` takes two dictionaries as input:
- `items` is a dictionary mapping ids of items to be retrieved to the corresponding file paths.
- `queries` is a dictionary mapping query ids to the corresponding file paths.

Participants are expected to load the sounds themselves, e.g., with `librosa.load`.
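
Because `compute_similarities` must return one score for every query-item pair, it can help to validate the returned dictionary before it is written to disk. The sketch below is an assumption about what a useful self-check looks like; `validate_scores` is a hypothetical name and not part of the template.

```python
import json
import math


def validate_scores(scores, items, queries):
    """Raise ValueError unless `scores` holds a finite float for every <query, item> pair."""
    missing_queries = set(queries) - set(scores)
    if missing_queries:
        raise ValueError(f"No scores for queries: {sorted(missing_queries)}")
    for query_id in queries:
        row = scores[query_id]
        missing_items = set(items) - set(row)
        if missing_items:
            raise ValueError(f"Query {query_id!r} lacks scores for items: {sorted(missing_items)}")
        for item_id in items:
            value = row[item_id]
            # np.float32 values fail this check; convert them with float(...) first
            if not isinstance(value, float) or not math.isfinite(value):
                raise ValueError(f"Score for <{query_id!r}, {item_id!r}> is not a finite float: {value!r}")
    # the template stores the scores as JSON, so they must be serializable
    json.dumps(scores)
```

Calling `validate_scores(scores, items, queries)` with the same dictionaries that were passed to `compute_similarities` either returns silently or points to the first missing or malformed entry.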

In [3]:
"""
DO NOT MODIFY THIS BLOCK.
"""

class QVIMModel(ABC):

    @abstractmethod
    def compute_similarities(
            self, items: dict[str, str], queries: dict[str, str]
    ) -> dict[str, dict[str, float]]:
        """Compute similarity scores between items to be retrieved and a set of queries.

        Each <query, item> pairing should be assigned a single floating point score, where higher
        scores indicate higher similarity.

        Args:
            items (dict[str, str]): A dictionary mapping ids of items to be retrieved to the corresponding file path
            queries (dict[str, str]): A dictionary mapping query ids to the corresponding file path

        Returns:
            scores (dict[str, dict[str, float]]): A dictionary mapping query ids to a dictionary of item
                ids and their corresponding similarity scores. E.g:
                {
                    "query_1": {
                        "item_1": 0.8,
                        "item_2": 0.6,
                        ...
                    },
                    "query_2": {
                        "item_1": 0.4,
                        "item_2": 0.9,
                        ...
                    },
                    ...
                }
        """
        pass

## Some Helper Functions

`helpers.py` contains some helpful functions for downloading code and model checkpoints from Google Drive, Git and public links.

The functions were taken (with slight modifications) from the submission template provided by the task organizers of [Task 7 of the DCASE Challenge 2024: Sound Scene Synthesis](https://dcase.community/challenge2024/task-sound-scene-synthesis).

In [4]:
import helpers
from helpers import google_drive_download, wget_download, git_clone_checkout, unpack_file

## Step 1: Set up your paths

- Define `ROOT_PATH`; this is where your project lives; for testing, we'll replace it with our custom ROOT_PATH. We recommend using the current working directory ('.').
- Define `DATA_PATH`; this is where your public development data lives; for testing, we'll replace it with our custom DATA_PATH. We recommend using 'data/qvim-dev'.
    

In [5]:
"""
TODO: DEFINE YOUR PATHS HERE.
"""

# replace this with your custom ROOT_PATH; this is where your code/ checkpoints will be downloaded to
ROOT_PATH = "."

# path to the evaluation data; can be in ROOT_PATH
DATA_PATH = os.path.join("data", "qvim-dev")

In [6]:
helpers.ROOT_PATH = ROOT_PATH
os.makedirs(ROOT_PATH, exist_ok=True)
os.makedirs(DATA_PATH, exist_ok=True)
sys.path.append(os.path.join(ROOT_PATH))

## Step 2: Set up your environment, download checkpoints, etc.

Set up your project and install the required packages here.
The easiest way is to:
1) convert your implementation into a package,
2) clone the repository and check out the specific branch and commit,
3) install your package with `pip install -e name_of_your_fancy_package`.


Hints:
- Make sure your link to the repository and other URLs are publicly available.
- Use **shared public URLs** (e.g. a shared Google Drive, Dropbox, Zenodo link) to download checkpoints into `ROOT_PATH`.
- Use the provided helper functions (`google_drive_download`, `wget_download`, `git_clone_checkout`, and `unpack_file`).

In [None]:
"""
TODO: SETUP YOUR PROJECT HERE.
"""

# TODO

## Step 3: Implement the QVIMModel Interface
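
As an illustration of what an implementation can look like, here is a minimal sketch of a toy system that compares average magnitude spectra with cosine similarity. It is not a recommended baseline. The class name `SpectrumBaseline`, the `load_fn` and `n_fft` parameters, and the use of `librosa.load` with its default settings are assumptions made for this example. The `QVIMModel` class is repeated as a stand-in so that the block runs on its own; in the notebook, subclass the `QVIMModel` defined above instead.

```python
from abc import ABC, abstractmethod

import numpy as np


class QVIMModel(ABC):
    """Stand-in copy of the template interface so that this sketch is self-contained."""

    @abstractmethod
    def compute_similarities(self, items, queries):
        pass


class SpectrumBaseline(QVIMModel):
    """Hypothetical toy system: cosine similarity between average magnitude spectra."""

    def __init__(self, load_fn=None, n_fft=1024):
        # load_fn maps a file path to a 1-D waveform; defaults to librosa (assumption)
        self.load_fn = load_fn or self._load_with_librosa
        self.n_fft = n_fft

    @staticmethod
    def _load_with_librosa(path):
        import librosa  # imported lazily so the class can be used with a custom loader

        audio, _ = librosa.load(path)  # librosa's default: mono, resampled to 22050 Hz
        return audio

    def _embed(self, path):
        audio = np.asarray(self.load_fn(path), dtype=np.float64)
        if len(audio) < self.n_fft:  # zero-pad very short sounds to one full frame
            audio = np.pad(audio, (0, self.n_fft - len(audio)))
        n_frames = len(audio) // self.n_fft
        frames = audio[: n_frames * self.n_fft].reshape(n_frames, self.n_fft)
        # windowed magnitude spectrum per frame, averaged over time
        spectrum = np.abs(np.fft.rfft(frames * np.hanning(self.n_fft), axis=1)).mean(axis=0)
        return spectrum / (np.linalg.norm(spectrum) + 1e-12)  # unit length for cosine similarity

    def compute_similarities(self, items, queries):
        item_embeddings = {item_id: self._embed(path) for item_id, path in items.items()}
        scores = {}
        for query_id, query_path in queries.items():
            query_embedding = self._embed(query_path)
            scores[query_id] = {
                item_id: float(query_embedding @ embedding)  # plain Python float for JSON
                for item_id, embedding in item_embeddings.items()
            }
        return scores
```

With such a class in place, Step 4 would reduce to something like `QBVIM_MODEL = SpectrumBaseline()`. Embedding each item once and reusing the result for all queries keeps the number of file loads linear in the number of files.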

In [None]:
"""
TODO: ADD YOUR IMPLEMENTATION HERE.
"""

# TODO

## Step 4: Create an Instance of your QVIMModel

In [7]:
"""
TODO: INSTANTIATE YOUR MODEL HERE.
"""

QBVIM_MODEL = None # store your model into this variable ...

## Create Predictions

To run this, download the development dataset and store it in `DATA_PATH`.

In [None]:
"""
DO NOT MODIFY THIS BLOCK.
"""
from glob import glob

items_path = os.path.join(DATA_PATH, "Items")
item_files = pd.DataFrame({'path': list(glob(os.path.join(items_path, "**", "*.wav"), recursive=True))})
item_files["Class"] = item_files['path'].transform(lambda x: x.split(os.path.sep)[-2])
item_files["Items"] = item_files['path'].transform(lambda x: x.split(os.path.sep)[-1])

queries_path = os.path.join(DATA_PATH, "Queries")
query_files = pd.DataFrame({'path': list(glob(os.path.join(queries_path, "**", "*.wav"), recursive=True))})
query_files["Class"] = query_files['path'].transform(lambda x: x.split(os.path.sep)[-2])
query_files["Query"] = query_files['path'].transform(lambda x: x.split(os.path.sep)[-1])

print("Total item files:", len(item_files))
print("Total query files:", len(query_files))

if len(query_files) == 0 or len(item_files) == 0:
    raise ValueError("No query or item files found! Download the development dataset and store it in 'DATA_PATH'.")

In [None]:
"""
DO NOT MODIFY THIS BLOCK.
"""

scores = QBVIM_MODEL.compute_similarities(
    items = {row["Items"]: row["path"] for i, row in item_files.iterrows()},
    queries = {row["Query"]: row["path"] for i, row in query_files.iterrows()}
)

In [12]:
"""
DO NOT MODIFY THIS BLOCK.
"""

import json

os.makedirs(os.path.join(ROOT_PATH, "output"), exist_ok=True)

with open(os.path.join(ROOT_PATH, "output", "similarities.json"), "w") as f:
    json.dump(scores, f)


## Evaluation on the Public Development Set

The following cell computes the Reciprocal Rank (RR) for each query in the public development set. The RR is the inverse $1/r_i$ of the rank $r_i$ of the correct item for query $i$. Submissions will be ranked by the Mean Reciprocal Rank (MRR) over the queries $Q$ of a hidden test set:

$$MRR = \frac{1}{\lvert Q \rvert} \sum_{i=1}^{\lvert Q\rvert} \frac{1}{r_i}$$
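
To make the formula concrete, the following sketch computes the MRR for a small, made-up example with one correct item per query. The function name `mean_reciprocal_rank` and the example scores are illustrative assumptions. Ties are broken by dictionary order here, which may differ from the evaluation cell below.

```python
def mean_reciprocal_rank(scores, ground_truth):
    """Average 1 / rank of the correct item over all queries (rank 1 = highest score)."""
    reciprocal_ranks = []
    for query_id, correct_item in ground_truth.items():
        row = scores[query_id]
        # sort item ids by similarity, highest first
        ranked_items = sorted(row, key=row.get, reverse=True)
        rank = ranked_items.index(correct_item) + 1
        reciprocal_ranks.append(1.0 / rank)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# made-up similarity scores for two queries and three items
scores = {
    "query_1": {"item_1": 0.8, "item_2": 0.6, "item_3": 0.1},
    "query_2": {"item_1": 0.4, "item_2": 0.9, "item_3": 0.5},
}
ground_truth = {"query_1": "item_1", "query_2": "item_3"}

# query_1 ranks its correct item first (RR = 1), query_2 ranks it second (RR = 1/2)
print(mean_reciprocal_rank(scores, ground_truth))  # (1 + 0.5) / 2 = 0.75
```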

In [None]:
"""
DO NOT MODIFY THIS BLOCK.
"""
import json

with open(os.path.join(ROOT_PATH, "output", "similarities.json"), "r") as f:
    scores = json.load(f)

rankings = pd.DataFrame(dict(
    **{ "id": [i for i in list(scores.keys())]},
    **{ k: [v[k] for v in  scores.values() ] for k in scores[list(scores.keys())[0]].keys()}
)).set_index("id")

df = pd.read_csv(
    os.path.join(DATA_PATH, "DEV Dataset.csv"), skiprows=1
)[['Label', 'Class', 'Items', 'Query 1', 'Query 2', 'Query 3']]

df = df.melt(
    id_vars=[col for col in df.columns if "Query" not in col],
    value_vars=["Query 1", "Query 2", "Query 3"],
    var_name="Query Type",
    value_name="Query"
).dropna()

# remove missing files
rankings = rankings.loc[df["Query"].unique(), df["Items"].unique()]

# build the ground truth, i.e., the query -> item mapping, from the 'Query' and 'Items' columns
ground_truth = {row['Query']: [row['Items']] for i, row in df.iterrows()}

# find the rank of the correct item (real recording) for each query (imitation)
position_of_correct = {}
missing_query_files = []
for query, correct_item_list in ground_truth.items():
    # Skip if query is not in the DataFrame
    if query not in rankings.index:
        missing_query_files.append(query)
        continue
    # Get row and sort items by similarity in descending order
    sorted_items = rankings.loc[query].sort_values(ascending=False)
    # Find rank of correct items
    position_of_correct[query] = {
        item: sorted_items.index.get_loc(item) for item in correct_item_list if item in sorted_items.index
    }
    assert len(position_of_correct[query]) == len(correct_item_list), f"Missing item! Got: {list(position_of_correct[query].keys())}. Expected: {correct_item_list}"

# compute MRR
normalized_rrs = []
for query, items_ranks in position_of_correct.items():
    rr, irr = [], [] # summed RR and ideal RR
    for i, (item, rank) in enumerate(items_ranks.items()):
        rr.append(1 / (rank + 1))
        irr.append(1 / (i + 1))
    normalized_rrs.append(sum(rr) / sum(irr)) # normalize MRR with ideal one
mrr = np.mean(normalized_rrs)

print("Missing query files: ", len(missing_query_files))
print("Missing query file names: ", missing_query_files)
print("MRR random:", round((1 / np.arange(1, len(df["Items"].unique()) + 1)).mean(), 4))
print("MRR       :", round(mrr, 4))

In [None]:
"""
DO NOT MODIFY THIS BLOCK.
"""

ground_truth = {
    row["Query"]: [row_["Items"] for j, row_ in df.drop_duplicates("Items").iterrows() if row_["Class"] == row["Class"]] for i, row in df.drop_duplicates("Query").iterrows()
}

position_of_correct = {}
missing_query_files = []
for query, correct_item_list in ground_truth.items():
    # Skip if query is not in the DataFrame
    if query not in rankings.index:
        missing_query_files.append(query)
        continue
    # Get row and sort items by similarity in descending order
    sorted_items = rankings.loc[query].sort_values(ascending=False)
    # Find rank of correct items
    position_of_correct[query] = {item: sorted_items.index.get_loc(item) for item in correct_item_list if item in sorted_items.index}
    assert len(position_of_correct[query]) == len(correct_item_list), f"Missing item!"

# compute MRR
normalized_rrs = []
for query, items_ranks in position_of_correct.items():
    rr, irr = [], [] # summed RR and ideal RR
    for i, (item, rank) in enumerate(items_ranks.items()):
        rr.append(1 / (rank + 1))
        irr.append(1 / (i + 1))
    normalized_rrs.append(sum(rr) / sum(irr)) # normalize MRR with ideal one
mrr = np.mean(normalized_rrs)

# compute NDCG
normalized_dcg = []
ndcgs = {}
for query, items_ranks in position_of_correct.items():
    dcg, idcg = [], [] # summed DCG and ideal DCG
    for i, (item, rank) in enumerate(items_ranks.items()):
        dcg.append(1 / np.log2(rank + 2))
        idcg.append(1 / np.log2(i + 2))
    normalized_dcg.append(sum(dcg) / sum(idcg)) # normalize DCG with ideal one
    ndcgs[query] = sum(dcg) / sum(idcg)
ndcg = np.mean(normalized_dcg)

print("Class-wise MRR :", round(mrr, 4))
print("Class-wise NDCG:", round(ndcg, 4))