# HEAR 2021 Evaluation Example

This notebook provides an example of downloading a pre-processed task and running evaluation using the Wav2Vec2 baseline model.

## 1. Install Dependencies
**Note: You may have to restart the runtime after installation.**

In [None]:
# TODO: replace these GitHub installs with a pip install from PyPI
!git clone -b 2021.1.0 https://github.com/neuralaudio/hear-eval-kit.git
!pip install -e ./hear-eval-kit/

!git clone -b 2021.1.0 https://github.com/neuralaudio/hear-baseline.git
!pip install -e ./hear-baseline/

Cloning into 'hear-eval-kit'...
remote: Enumerating objects: 5703, done.
remote: Counting objects: 100% (1190/1190), done.
remote: Compressing objects: 100% (261/261), done.
remote: Total 5703 (delta 994), reused 933 (delta 928), pack-reused 4513
Receiving objects: 100% (5703/5703), 898.54 KiB | 10.70 MiB/s, done.
Resolving deltas: 100% (4033/4033), done.
Obtaining file:///content/hear-eval-kit
Collecting dcase_util
  Downloading dcase_util-0.2.18.tar.gz (2.1 MB)
Collecting numpy==1.19.2
  Downloading numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
Collecting pynvml
  Downloading pynvml-11.4.1-py3-none-any.whl (46 kB)
Collecting pytorch-lightning
  Downloading pytorch_lightning-1.5.9-py3-none-any.whl (527 kB)
Collecting

Cloning into 'hear-baseline'...
remote: Enumerating objects: 679, done.
remote: Counting objects: 100% (679/679), done.
remote: Compressing objects: 100% (399/399), done.
remote: Total 679 (delta 409), reused 490 (delta 263), pack-reused 0
Receiving objects: 100% (679/679), 18.72 MiB | 30.72 MiB/s, done.
Resolving deltas: 100% (409/409), done.
Obtaining file:///content/hear-baseline
Collecting speechbrain
  Downloading speechbrain-0.5.11-py3-none-any.whl (408 kB)
Collecting transformers
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)
Collecting torchcrepe
  Downloading torchcrepe-0.0.15-py3-none-any.whl (72.3 MB)
Collecting torchopenl3
  Downloading torchopenl3-1.0.0-py2.py3-none-any.whl (14 kB)
Collecting hyperpyyaml
  Downloading HyperPyYAML-1.0.0-py3-none-any.whl
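Before moving on, it can help to confirm that both editable installs are visible to the running kernel; if they are not, restarting the runtime (as the note above suggests) is usually the fix. A minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, while `heareval` and `hearbaseline` are the module names the commands in this notebook import.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["heareval", "hearbaseline"])
    if missing:
        # Editable (-e) installs are registered via .pth files, which Python only
        # reads at startup, so a fresh runtime may be needed before they import.
        print("Not importable yet, try restarting the runtime:", missing)
    else:
        print("heareval and hearbaseline are importable")
```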

## 2. Download the Dataset

As an example, we download the Mridangam Tonic task, pre-processed at a sample rate of 16000 Hz.

In [None]:
!wget https://github.com/neuralaudio/HEAR2021-GTZAN-musicspeech/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
!tar -xzf hear2021-mridangam_tonic-v1.5-full-16000.tar.gz

--2022-01-21 02:09:11--  https://github.com/neuralaudio/HEAR2021-GTZAN-musicspeech/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/neuralaudio/hear2021-sample-datasets/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz [following]
--2022-01-21 02:09:11--  https://github.com/neuralaudio/hear2021-sample-datasets/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/neuralaudio/hear2021-sample-datasets/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz [following]
--2022-01-21 02:09:12--  https://raw.githubusercontent.com/neuralaudio/hear2021-sample-datasets/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Resolving
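The `wget` and `tar` commands above can also be run from Python, which avoids depending on shell tools. A minimal sketch using `urllib.request.urlretrieve` (which follows HTTP redirects such as the 301/302 hops in the log) and `tarfile`; the function name `download_and_extract` is hypothetical, and we assume, as the `--tasks-dir ./tasks/` argument in the next step suggests, that the archive unpacks into a `tasks/` directory.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_and_extract(url: str, dest_dir: str = ".") -> Path:
    """Download a .tar.gz archive (unless already present) and unpack it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # urlretrieve follows redirects, like the 301/302 responses wget reported.
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members (absolute paths, "..").
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return archive


# Example (downloads the archive used in this notebook):
# download_and_extract(
#     "https://github.com/neuralaudio/hear2021-sample-datasets/raw/main/"
#     "hear2021-mridangam_tonic-v1.5-full-16000.tar.gz"
# )
```

Skipping the download when the archive already exists makes the cell cheap to re-run after a runtime restart.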

## 3. Compute embeddings using Wav2Vec2

In [None]:
!python3 -m heareval.embeddings.runner hearbaseline.wav2vec2 --tasks-dir ./tasks/

Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-01-21 02:10:14.710225: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
embeddings
Importing hearbaseline.wav2vec2
Downloading: 100% 212/212 [00:00<00:00, 346kB/s]
Downloading: 100% 1.73k/1.73k [00:00<00:00, 2.71MB/s]
Downloading: 100% 1.18G/1.18G [00:19<00:00, 65.7MB/s]
Some weights of the model checkpoint at facebook/wav2vec2-large-100k-voxpopuli were not used when initializing Wav2Vec2Model: ['quantizer.weight_proj.weight', 'project_hid.bias', 'project_hid.weight', 'project_q.bias', 'project_q.weight', 'quantizer.weight_proj.bias', 'quantizer.codevectors']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a Be
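The runner imports `hearbaseline.wav2vec2` (see the log above) and embeds every audio file in the task. A tonic label applies to a whole clip, so the downstream classifier needs one vector per clip, whereas a frame-based model like Wav2Vec2 produces one vector per frame. A minimal sketch of one common way to bridge the two, averaging frame embeddings over time; we assume, without having confirmed it from this notebook, that the baseline's clip-level embeddings are computed this way. The shapes follow the HEAR convention `(n_sounds, n_timestamps, embedding_size)`, and the function name is hypothetical.

```python
import numpy as np


def mean_pool_over_time(timestamp_embeddings: np.ndarray) -> np.ndarray:
    """Collapse frame-level embeddings to one vector per clip by averaging over time.

    Input shape:  (n_sounds, n_timestamps, embedding_size)
    Output shape: (n_sounds, embedding_size)
    """
    if timestamp_embeddings.ndim != 3:
        raise ValueError(
            f"expected a 3-D array, got shape {timestamp_embeddings.shape}"
        )
    return timestamp_embeddings.mean(axis=1)


# Example with random data: 2 clips, 50 frames, 1024-dim frame embeddings
# (1024 is consistent with the first Linear layer in the model summary below).
frames = np.random.default_rng(0).normal(size=(2, 50, 1024))
print(mean_pool_over_time(frames).shape)  # (2, 1024)
```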

## 4. Evaluate
Train and predict with a shallow downstream classifier using the computed embeddings.
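The `%env` magic in the next cell sets `CUBLAS_WORKSPACE_CONFIG` in the notebook process, and the `!python3` command inherits it as a child process. PyTorch's reproducibility notes list `:4096:8` (or `:16:8`) as the settings that make cuBLAS deterministic on CUDA 10.2 and later. A minimal sketch of the same idea from plain Python, for example in a script; the helper name `run_with_deterministic_cublas` is hypothetical.

```python
import os
import subprocess
from typing import List


def run_with_deterministic_cublas(args: List[str]) -> subprocess.CompletedProcess:
    """Run a command whose environment includes CUBLAS_WORKSPACE_CONFIG=:4096:8.

    Similar to `%env CUBLAS_WORKSPACE_CONFIG=:4096:8` followed by `!cmd`,
    except that the calling process's own environment is left unchanged.
    """
    env = dict(os.environ, CUBLAS_WORKSPACE_CONFIG=":4096:8")
    return subprocess.run(args, env=env, check=True)


# Usage sketch (needs `import glob, sys`). Unlike the shell, subprocess does not
# expand globs such as `*`, so expand them explicitly:
# task_dirs = sorted(glob.glob("embeddings/hearbaseline.wav2vec2/*"))
# run_with_deterministic_cublas(
#     [sys.executable, "-m", "heareval.predictions.runner", *task_dirs]
# )
```

Passing `env=` to `subprocess.run` scopes the setting to the child process, which is handy when only the evaluation step should run with deterministic cuBLAS.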

In [None]:
# Required for determinism
%env CUBLAS_WORKSPACE_CONFIG=:4096:8

# Train and evaluate a classifier on the hearbaseline.wav2vec2 embeddings
!python3 -m heareval.predictions.runner embeddings/hearbaseline.wav2vec2/*

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
grid: 2it [00:24, 11.66s/it][Apredict - mridangam_stroke - 2022-01-21 02:12:18,024 - 24 - Grid point 3 of 8: {'batch_size': 1024, 'check_val_every_n_epoch': 3, 'dropout': 0.1, 'embedding_norm': <class 'torch.nn.modules.linear.Identity'>, 'hidden_dim': 1024, 'hidden_layers': 1, 'hidden_norm': <class 'torch.nn.modules.batchnorm.BatchNorm1d'>, 'initialization': <function xavier_normal_ at 0x7f3d27ea65f0>, 'lr': 0.001, 'max_epochs': 500, 'norm_after_activation': False, 'optim': <class 'torch.optim.adam.Adam'>, 'patience': 20}
  x = self.activation(x)
Layer (type:depth-idx)                   Output Shape              Param #
FullyConnectedPrediction                 --                        --
├─Sequential: 1-1                        [64, 1024]                --
│    └─Linear: 2-1                       [64, 1024]                1,049,600
│    └─BatchNorm1d: 2-2                  [64, 1024]                2,048
│    └─Dropout: 
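The parameter counts in the model summary follow directly from the layer sizes: a `Linear` layer with bias has in × out weights plus one bias per output, and `BatchNorm1d` has a learnable scale and shift per feature (its running mean and variance are buffers, not parameters). A quick check, assuming 1024-dimensional input embeddings, which is what the 1,049,600 count implies for the 1024-unit hidden layer (`'hidden_dim': 1024` in the grid point above). The helper names are hypothetical.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights (in x out) plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)


def batchnorm1d_param_count(num_features: int, affine: bool = True) -> int:
    """One learnable scale (weight) and one shift (bias) per feature when affine=True."""
    return 2 * num_features if affine else 0


embedding_dim = 1024  # assumed input width, implied by the summary's Linear count
hidden_dim = 1024     # from the grid point: 'hidden_dim': 1024

print(linear_param_count(embedding_dim, hidden_dim))  # 1049600
print(batchnorm1d_param_count(hidden_dim))            # 2048
```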

## 5. Results

In [None]:
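import json

# Aggregated test scores for the Mridangam Tonic task, written by heareval.predictions.runner
with open("embeddings/hearbaseline.wav2vec2/mridangam_tonic-v1.5-full/test.predicted-scores.json") as f:
    results = json.load(f)
results["aggregated_scores"]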

{'epoch_mean': 221.4,
 'epoch_std': 63.049980174461595,
 'test_aucroc_mean': 0.9673656254046769,
 'test_aucroc_std': 0.006645488022238807,
 'test_d_prime_mean': 3.225787954524603,
 'test_d_prime_std': 0.14826965861426444,
 'test_loss_mean': 0.5152512609958648,
 'test_loss_std': 0.05847410765629445,
 'test_mAP_mean': 0.8749861362956045,
 'test_mAP_std': 0.02735409543436427,
 'test_score_mean': 0.8294376134872437,
 'test_score_std': 0.019911728656792756,
 'test_top1_acc_mean': 0.8294376134872437,
 'test_top1_acc_std': 0.019911728656792756,
 'time_in_min_mean': 0.369238403638204,
 'time_in_min_std': 0.06784636156581825,
 'validation_score_mean': 0.8282931089401245,
 'validation_score_std': 0.01605677167624731}

In [None]:
results["aggregated_scores"]['test_score_mean']

0.8294376134872437
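Each metric in `aggregated_scores` comes as a `_mean`/`_std` pair. A small sketch that prints the test metrics as mean ± std; the function names are hypothetical and the code only reshapes the dictionary loaded above. In this run `test_score_mean` equals `test_top1_acc_mean`, which suggests top-1 accuracy is the primary score for this task.

```python
from typing import Dict, List, Optional, Tuple


def summarize_test_metrics(
    aggregated: Dict[str, float],
) -> List[Tuple[str, float, Optional[float]]]:
    """Collect (metric, mean, std) for every `test_<metric>_mean` key, sorted by name."""
    rows = []
    for key in sorted(aggregated):
        if key.startswith("test_") and key.endswith("_mean"):
            metric = key[len("test_"):-len("_mean")]
            rows.append((metric, aggregated[key], aggregated.get(f"test_{metric}_std")))
    return rows


def format_summary(rows: List[Tuple[str, float, Optional[float]]]) -> str:
    """Render one `metric: mean ± std` line per row (std omitted if missing)."""
    lines = []
    for metric, mean, std in rows:
        spread = f" ± {std:.4f}" if std is not None else ""
        lines.append(f"{metric}: {mean:.4f}{spread}")
    return "\n".join(lines)


# In the notebook:
# print(format_summary(summarize_test_metrics(results["aggregated_scores"])))
```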