
# Quickstart guide

This example shows how to build a simple content-based audio retrieval model and evaluate its retrieval accuracy on CAL500, a small song dataset. CAL500 consists of 502 Western popular songs by 499 unique artists. Each song was annotated by at least three listeners using a standard survey and a fixed vocabulary of 174 musical concepts (tags).

This package includes a utility that downloads and preprocesses the dataset, so loading it takes a single function call.

```python
from cbar.datasets.freesound import load_freesound

X, Y = load_freesound(512)
```

The first call to `fetch_cal500()` downloads the CAL500 dataset to a subfolder of your home directory. You can choose a different location with the `data_home` parameter (`fetch_cal500(data_home='path')`). Subsequent calls simply load the dataset from disk.

The raw dataset consists of about 10,000 39-dimensional feature vectors per minute of audio, created by

1. Sliding a half-overlapping short-time window of 12 milliseconds over each song's waveform.
2. Extracting 13 mel-frequency cepstral coefficients (MFCCs) from each window.
3. Appending the instantaneous first-order and second-order derivatives, which gives 3 × 13 = 39 dimensions per frame.
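The frame count and dimensionality above can be checked with a short sketch. It assumes a 6 ms hop (half of the 12 ms window) and uses random numbers in place of real MFCCs, since the MFCC extraction itself is not shown in this guide; `np.gradient` is a simple finite-difference stand-in for whatever delta filter was actually used to build the dataset.

```python
import numpy as np

WINDOW_MS = 12
HOP_MS = WINDOW_MS // 2   # half-overlapping windows -> 6 ms hop (assumption)
DURATION_MS = 60_000      # one minute of audio

# Number of full windows that fit into one minute of audio
n_frames = 1 + (DURATION_MS - WINDOW_MS) // HOP_MS

# Placeholder MFCCs (13 coefficients per frame); real values would come from audio
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((n_frames, 13))

# First- and second-order derivatives along the time axis (finite differences)
delta = np.gradient(mfcc, axis=0)
delta2 = np.gradient(delta, axis=0)

# Stack static and dynamic features: 13 + 13 + 13 = 39 dimensions per frame
features = np.hstack([mfcc, delta, delta2])
print(n_frames, features.shape)  # 9999 (9999, 39)
```

With these assumptions, one minute yields 9,999 frames, which matches the "about 10,000" figure.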

Each song is then represented as a *bag-of-frames* of exactly 10,000 randomly subsampled, real-valued feature vectors. These frames are further condensed into a single *k*-dimensional feature vector by encoding them with a codebook and pooling the codes into one compact vector.

Specifically, *k*-means clusters all frame vectors into *k* clusters, and the resulting cluster centers serve as the codewords of the codebook. Each frame vector is assigned to its closest cluster center, and a song is represented by the number of its frames assigned to each of the *k* cluster centers.
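A minimal NumPy sketch of this codebook encoding, using plain Lloyd's *k*-means on random data. It only illustrates the technique: the codebook behind `fetch_cal500()` was presumably trained on real frames, and the helpers `kmeans`, `assign`, and `encode` are hypothetical names, not part of this package.

```python
import numpy as np

def assign(frames, centers):
    """Index of the closest center (squared Euclidean distance) for every frame."""
    d2 = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(frames, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centers (the codebook)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct, randomly chosen frames
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(frames, centers)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def encode(frames, centers):
    """Bag-of-frames -> k-dimensional histogram of codeword counts."""
    return np.bincount(assign(frames, centers), minlength=len(centers))

rng = np.random.default_rng(1)
# Two hypothetical "songs", each a bag of 1,000 random 39-dimensional frames
songs = [rng.standard_normal((1000, 39)) for _ in range(2)]
codebook = kmeans(np.vstack(songs), k=8)
X_demo = np.array([encode(s, codebook) for s in songs])
print(X_demo.shape)        # (2, 8)
print(X_demo.sum(axis=1))  # [1000 1000]
```

Because every frame is counted exactly once, each song's histogram sums to its number of frames; with real data that would be 10,000 frames and `k = 512` codewords.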

By default, `fetch_cal500()` uses a codebook size of 512, but you can change it with the `codebook_size` parameter (`fetch_cal500(codebook_size=1024)`).

```python
X.shape, Y.shape
```

```
((227085, 512), (227085,))
```

Next, split the data into training and test sets, fit the model on the training data, and evaluate it on the test data. First, import and instantiate the model.

```python
from cbar.pamir import PAMIR

model = PAMIR(valid_interval=1000)
```

Then split the data and fit the model using the training data.

```python
from cbar.cross_validation import train_test_split_plus

(X_train, X_test,
 Y_train, Y_test,
 Q_vec, weights) = train_test_split_plus(X, Y)

%time model.fit(X_train, Y_train, Q_vec, X_test, Y_test)
```



```
iter:        0, P10: 0.002, AP: 0.002, loss: 0.000
iter:     1000, P10: 0.002, AP: 0.002, loss: 1.020
iter:     2000, P10: 0.002, AP: 0.002, loss: 1.003
iter:     3000, P10: 0.002, AP: 0.002, loss: 1.009
iter:     4000, P10: 0.002, AP: 0.002, loss: 1.033
iter:     5000, P10: 0.002, AP: 0.002, loss: 1.009
iter:     6000, P10: 0.002, AP: 0.002, loss: 0.997
iter:     7000, P10: 0.002, AP: 0.002, loss: 0.982
iter:     8000, P10: 0.002, AP: 0.002, loss: 1.014
iter:     9000, P10: 0.002, AP: 0.002, loss: 1.022
iter:    10000, P10: 0.002, AP: 0.002, loss: 1.010
iter:    11000, P10: 0.002, AP: 0.002, loss: 1.019
iter:    12000, P10: 0.002, AP: 0.002, loss: 0.964
iter:    13000, P10: 0.002, AP: 0.002, loss: 0.992
iter:    14000, P10: 0.002, AP: 0.002, loss: 1.036
iter:    15000, P10: 0.002, AP: 0.002, loss: 1.020
iter:    16000, P10: 0.002, AP: 0.002, loss: 1.001
iter:    17000, P10: 0.002, AP: 0.002, loss: 0.967
iter:    18000, P10: 0.002, AP: 0.002, loss: 0.983
iter:    19000, P10: 0.002, AP:

cbar.pamir.PAMIR(max_iter=100000, C=1.0, valid_interval=1000, max_dips=20)
```

Now predict a score for every test song under each query. Sorting the songs by score, from highest to lowest, gives the ranking for that query.

```python
Y_score = model.predict(Q_vec, X_test)
```
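Turning scores into rankings is a per-query sort. The sketch below assumes that `Y_score` has one row per query and one column per test song, and it computes toy scores with a bilinear form q<sup>T</sup>W x, the kind of scoring function PAMIR learns. The matrices `Q`, `W`, and `X` are made-up stand-ins, not the fitted model's parameters.

```python
import numpy as np

# Toy inputs: 2 one-hot queries over 2 tags, 3 songs with 3 features each
Q = np.eye(2)
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
X = np.eye(3)

# Bilinear relevance score for every (query, song) pair: Q @ W @ X.T
scores = Q @ W @ X.T

# Sort each row by descending score; a stable sort keeps tied songs in index order
ranking = np.argsort(-scores, axis=1, kind="stable")
print(ranking)
# [[2 0 1]
#  [1 0 2]]
```

Row *i* of `ranking` lists song indices for query *i*, best match first.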

Evaluate the predictions.

```python
from cbar.evaluation import Evaluator
from cbar.utils import make_relevance_matrix

n_relevant = make_relevance_matrix(Q_vec, Y_train).sum(axis=1)

evaluator = Evaluator()
evaluator.eval(Q_vec, weights, Y_score, Y_test, n_relevant)
evaluator.prec_at
```

```
defaultdict(list,
            {1: [0.0011557353366079169],
             2: [0.001878069921987865],
             3: [0.0017336030049118752],
             4: [0.0017336030049118754],
             5: [0.001906963305403063],
             6: [0.002167003756139844],
             7: [0.0020638131010855655],
             8: [0.0020225368390638545],
             9: [0.0020024719894699674],
             10: [0.002022536839063855],
             11: [0.0020120301541856006],
             12: [0.0019813856566240273],
             13: [0.0020036113361741794],
             14: [0.002010279040039225],
             15: [0.0020921436263822858],
             16: [0.0021487265022370485],
             17: [0.002185905488249252],
             18: [0.0022268613871261867],
             19: [0.0022681465234732382],
             20: [0.0022141615176185262]})
```
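For reference, precision-at-*k* is the fraction of relevant songs among the top *k* of a ranking, and average precision (presumably the `AP` value logged during training) averages the precision at each rank where a relevant song appears. The following is a minimal per-query sketch; how `Evaluator` aggregates these values across queries (for example, using `weights`) is not covered here, and `precision_at_k` and `average_precision` are hypothetical helper names.

```python
import numpy as np

def precision_at_k(scores, relevant, k):
    """Fraction of relevant songs among the k highest-scoring songs."""
    order = np.argsort(-scores, kind="stable")
    return relevant[order[:k]].mean()

def average_precision(scores, relevant):
    """Mean of precision@rank over the ranks at which relevant songs appear."""
    order = np.argsort(-scores, kind="stable")
    rel_sorted = relevant[order].astype(bool)
    if not rel_sorted.any():
        return 0.0
    hits = np.cumsum(rel_sorted)               # relevant songs seen so far
    ranks = np.arange(1, len(rel_sorted) + 1)  # 1-based rank positions
    return (hits / ranks)[rel_sorted].mean()

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
relevant = np.array([1, 0, 1, 0, 0])  # songs 0 and 2 are relevant to the query
print(precision_at_k(scores, relevant, 2))  # 0.5
print(average_precision(scores, relevant))  # (1/1 + 2/3) / 2, about 0.833
```

Mean average precision (MAP) is then the mean of the per-query AP values.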

## Cross-validation

The `cv` function in the `cross_validation` module offers an easy way to evaluate a retrieval method on multiple splits of the data. Let's run the same experiment on three folds.

```python
from cbar.cross_validation import cv

%time cv('freesound', 512, n_folds=3, method='pamir', valid_interval=1000)
```

```
2019-04-10 04:59:32,138 [MainThread  ] [INFO ]  Running CV with 3 folds ...
2019-04-10 04:59:40,756 [MainThread  ] [INFO ]  Validating fold 0 ...
iter:        0, P10: 0.002, AP: 0.002, loss: 0.000
iter:     1000, P10: 0.002, AP: 0.002, loss: 1.001
iter:     2000, P10: 0.002, AP: 0.002, loss: 0.999
iter:     3000, P10: 0.002, AP: 0.002, loss: 1.014
iter:     4000, P10: 0.002, AP: 0.002, loss: 0.994
iter:     5000, P10: 0.002, AP: 0.002, loss: 1.007
iter:     6000, P10: 0.002, AP: 0.002, loss: 1.010
iter:     7000, P10: 0.002, AP: 0.002, loss: 1.027
iter:     8000, P10: 0.002, AP: 0.002, loss: 0.989
iter:     9000, P10: 0.002, AP: 0.002, loss: 1.023
iter:    10000, P10: 0.002, AP: 0.002, loss: 1.010
```
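Conceptually, *k*-fold cross-validation partitions the data into *k* folds and uses each fold once as the test set while training on the remaining folds. The NumPy sketch below shows this for plain sample indices. It is a generic illustration, not a description of how `cv` partitions the data internally, and `kfold_indices` is a hypothetical helper.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut into n_folds nearly equal parts
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

for fold, (train_idx, test_idx) in enumerate(kfold_indices(10, n_folds=3)):
    print(fold, len(train_idx), len(test_idx))
# 0 6 4
# 1 7 3
# 2 7 3
```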


The cross-validation results, including the parameters of the retrieval method, are written to JSON files. For each dataset, three separate result files are written to disk: one for mean average precision (MAP), one for precision-at-*k*, and one for precision-at-10 as a function of the number of relevant training examples. Here are the mean average precision values of the last cross-validation run.

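```python
import json
import os
from cbar.settings import RESULTS_DIR

# Load the MAP results written by cv(); the with-block closes the file afterwards
with open(os.path.join(RESULTS_DIR, 'freesound_mean_ap.json')) as f:
    results = json.load(f)

# The last key holds the most recent cross-validation run
results[list(results.keys())[-1]]['precision']
```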

## Start cross-validation with the CLI

This package comes with a simple command-line interface (CLI) that makes it easy to start cross-validation experiments from a terminal. It lets you specify a dataset, a retrieval method, and additional options in a single command.

To start an experiment on the CAL500 dataset with the LORETA retrieval method, use the following command.

```
$ cbar crossval --dataset cal500 loreta 
```

This command uses LORETA's default parameters, but you can set any of them as options of the `loreta` command. To list the available options, ask for help:

```
$ cbar crossval loreta --help
Usage: cbar crossval loreta [OPTIONS]

Options:
  -n, --max-iter INTEGER        Maximum number of iterations
  -i, --valid-interval INTEGER  Rank of parameter matrix W
  -k INTEGER                    Rank of parameter matrix W
  --n0 FLOAT                    Step size parameter 1
  --n1 FLOAT                    Step size parameter 2
  -t, --rank-thresh FLOAT       Threshold for early stopping
  -l, --lambda FLOAT            Regularization constant
  --loss [warp|auc]             Loss function
  -d, --max-dips INTEGER        Maximum number of dips
  -v, --verbose                 Verbosity
  --help                        Show this message and exit.
```