# HEAR 2021 Evaluation Example

This notebook shows how to download a pre-processed HEAR 2021 task and evaluate the Wav2Vec2 baseline model on it.

## 1. Install Dependencies
**Note: You may have to restart the runtime after installation.**

In [1]:
!pip install heareval
!pip install hearbaseline

Collecting heareval
  Downloading heareval-2021.1.0-py3-none-any.whl (37 kB)
Collecting pynvml
  Downloading pynvml-11.4.1-py3-none-any.whl (46 kB)
Collecting pytorch-lightning
  Downloading pytorch_lightning-1.5.9-py3-none-any.whl (527 kB)
Collecting numba==0.48
  Downloading numba-0.48.0-1-cp37-cp37m-manylinux2014_x86_64.whl (3.5 MB)
Collecting torchinfo
  Downloading torchinfo-1.6.3-py3-none-any.whl (20 kB)
Collecting sed-eval
  Downloading sed_eval-0.2.1.tar.gz (21 kB)
Collecting spotty
  Downloading spotty-1.3.2

Collecting hearbaseline
  Downloading hearbaseline-2021.1.0-py3-none-any.whl (26 kB)
Collecting torchcrepe
  Downloading torchcrepe-0.0.15-py3-none-any.whl (72.3 MB)
Collecting speechbrain
  Downloading speechbrain-0.5.11-py3-none-any.whl (408 kB)
Collecting torchopenl3
  Downloading torchopenl3-1.0.0-py2.py3-none-any.whl (14 kB)
Collecting transformers
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting hyperpyyaml
  Downloading HyperPyYAML-1.0.0-py3-none-any.whl (15 kB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
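After installing, and restarting the runtime if prompted, you can confirm that the interpreter sees both packages. The sketch below uses only `importlib.util.find_spec`. `missing_packages` is a hypothetical helper and is not part of `heareval`. It checks that each module can be found on the path but does not import it.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found on sys.path.

    Hypothetical helper: it only checks discoverability and does not
    import the modules.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Both packages installed above should be discoverable after a restart
missing = missing_packages(["heareval", "hearbaseline"])
print("missing:", missing or "none")
```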

## 2. Download the Dataset

As an example, we download the Mridangam Tonic task, pre-processed at a sample rate of 16,000 Hz.

In [2]:
!wget https://github.com/hearbenchmark/hear2021-sample-datasets/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
!tar -xzf hear2021-mridangam_tonic-v1.5-full-16000.tar.gz

--2022-01-21 11:02:49--  https://github.com/neuralaudio/HEAR2021-GTZAN-musicspeech/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/neuralaudio/hear2021-sample-datasets/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz [following]
--2022-01-21 11:02:50--  https://github.com/neuralaudio/hear2021-sample-datasets/raw/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/neuralaudio/hear2021-sample-datasets/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz [following]
--2022-01-21 11:02:50--  https://raw.githubusercontent.com/neuralaudio/hear2021-sample-datasets/main/hear2021-mridangam_tonic-v1.5-full-16000.tar.gz
Resolving
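Outside Colab, `wget` and `tar` may not be available. In that case the same download-and-extract step can be sketched with the Python standard library. The sketch assumes the archive unpacks into `./tasks/`, which is the directory the embedding runner reads via `--tasks-dir`. `download` and `extract` are hypothetical helpers.

```python
import tarfile
import urllib.request
from pathlib import Path

# Archive URL from the cell above; assumed to unpack into ./tasks/
URL = (
    "https://github.com/hearbenchmark/hear2021-sample-datasets/raw/main/"
    "hear2021-mridangam_tonic-v1.5-full-16000.tar.gz"
)


def download(url, dest_dir="."):
    """Download url into dest_dir (redirects are followed) and return the local path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its top-level entries."""
    # Use the safe "data" extraction filter on Python versions that provide it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir, **kwargs)
        return sorted({m.name.split("/", 1)[0] for m in tar.getmembers()})
```

Usage: `extract(download(URL))`. On Python versions that support it, the `"data"` filter rejects archive members that would be written outside the destination directory.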

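## 3. Compute Embeddings Using Wav2Vec2

The `heareval` embedding runner imports the `hearbaseline.wav2vec2` module, computes embeddings for each task in `./tasks/`, and writes them under `embeddings/hearbaseline.wav2vec2/`.
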
In [3]:
!python3 -m heareval.embeddings.runner hearbaseline.wav2vec2 --tasks-dir ./tasks/

Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-01-21 11:03:06.220054: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
embeddings
Importing hearbaseline.wav2vec2
Downloading: 100% 212/212 [00:00<00:00, 247kB/s]
Downloading: 100% 1.73k/1.73k [00:00<00:00, 2.20MB/s]
Downloading: 100% 1.18G/1.18G [00:21<00:00, 59.5MB/s]
Some weights of the model checkpoint at facebook/wav2vec2-large-100k-voxpopuli were not used when initializing Wav2Vec2Model: ['project_q.weight', 'project_hid.bias', 'quantizer.weight_proj.weight', 'project_q.bias', 'project_hid.weight', 'quantizer.weight_proj.bias', 'quantizer.codevectors']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a Be
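The runner can load `hearbaseline.wav2vec2` because that module follows the HEAR 2021 common API. The module exposes `load_model`, `get_scene_embeddings` and `get_timestamp_embeddings`. The model returned by `load_model` reports `sample_rate`, `scene_embedding_size` and `timestamp_embedding_size`. The sketch below checks a module and a model for those names before you run the slow embedding step. `missing_api_functions` and `missing_model_attrs` are hypothetical helpers, not part of `heareval`. They inspect names only, not behavior.

```python
# Functions a HEAR 2021 module is expected to expose
REQUIRED_FUNCTIONS = ("load_model", "get_scene_embeddings", "get_timestamp_embeddings")
# Attributes expected on the model object returned by load_model()
REQUIRED_MODEL_ATTRS = ("sample_rate", "scene_embedding_size", "timestamp_embedding_size")


def missing_api_functions(module):
    """Return the required HEAR API functions that are absent or not callable."""
    return [n for n in REQUIRED_FUNCTIONS if not callable(getattr(module, n, None))]


def missing_model_attrs(model):
    """Return the required model attributes that are absent."""
    return [a for a in REQUIRED_MODEL_ATTRS if not hasattr(model, a)]


# Usage (requires hearbaseline; load_model() fetches the pretrained checkpoint):
#   import importlib
#   module = importlib.import_module("hearbaseline.wav2vec2")
#   print(missing_api_functions(module), missing_model_attrs(module.load_model()))
```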

## 4. Evaluate
Train and predict with a shallow downstream classifier using the computed embeddings.
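Mridangam Tonic is a multiclass task. For such tasks, `top1_acc` (logged as `val_top1_acc` during training) is the fraction of clips whose highest-scoring class matches the label. Below is a minimal stdlib sketch of that metric. It assumes one list of class scores per clip and integer class labels. `top1_accuracy` is a hypothetical name and is not the `heareval` implementation.

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose arg-max score equals the true class index.

    scores: list of per-class score lists, one per example
    labels: list of true class indices, same length as scores
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score; the first one wins on ties
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


print(top1_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]], [1, 2, 2]))
# → 0.6666666666666666
```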

In [4]:
# Required for determinism
%env CUBLAS_WORKSPACE_CONFIG=:4096:8

# Train and evaluate classifier using hearbaseline embeddings
!python3 -m heareval.predictions.runner embeddings/hearbaseline.wav2vec2/*

Streaming output truncated to the last 5000 lines.
  x = self.activation(x)
Epoch 56: 100% 7/7 [00:00<00:00, 70.87it/s, loss=0.183, v_num=3, val_loss=0.492, val_top1_acc=0.813, val_d_prime=3.300, val_aucroc=0.964, val_mAP=0.863]
Epoch 59:   0% 0/7 [00:00<?, ?it/s, loss=0.18, v_num=3, val_loss=0.492, val_top1_acc=0.813, val_d_prime=3.300, val_aucroc=0.964, val_mAP=0.863]
Validating: 0it [00:00, ?it/s]
  x = self.activation(x)
Epoch 59: 100% 7/7 [00:00<00:00, 63.93it/s, loss=0.176, v_num=3, val_loss=0.483, val_top1_acc=0.816, val_d_prime=3.290, val_aucroc=0.966, val_mAP=0.869]
Epoch 62:   0% 0/7 [00:00<?, ?it/s, loss=0.163, v_num=3, val_loss=0.483, val_top1_acc=0.816, val_d_prime=3.290, val_aucroc=0.966, val_mAP=0.869]
Validating: 0it [00:00, ?it/s]
  x = self.activation(x)
Epoch 62: 100% 7/7 [00:00<00:00, 66.13it/s, loss=0.161, v_num=3, val_loss=0.488, val_top1_acc=0.813, val_d_prime=3.300, val_aucroc=0.964, val_mAP=0.862]
Epoch 65:   0% 0/7 [00:00<?, ?it/s, loss=0.1
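Outside a notebook, the two lines of the evaluation cell can be reproduced with the standard library. The `%env` magic becomes an environment variable passed to the child process. The `*` glob that the `!` shell expanded must be expanded explicitly, because a list-form command runs without a shell. This is a sketch. `prediction_command` and `determinism_env` are hypothetical helpers, and the default directory is the one used in this notebook.

```python
import glob
import os
import sys


def prediction_command(embedding_dir="embeddings/hearbaseline.wav2vec2"):
    """Build argv for the heareval predictions runner.

    The notebook's `!` shell expands `*`; without a shell we expand the
    task directories here with glob.
    """
    task_dirs = sorted(glob.glob(os.path.join(embedding_dir, "*")))
    return [sys.executable, "-m", "heareval.predictions.runner", *task_dirs]


def determinism_env(workspace_config=":4096:8"):
    """Copy of the current environment with the cuBLAS workspace setting applied."""
    env = dict(os.environ)
    env["CUBLAS_WORKSPACE_CONFIG"] = workspace_config
    return env
```

Usage, assuming `heareval` is installed and the embeddings from step 3 exist: `subprocess.run(prediction_command(), env=determinism_env(), check=True)`.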

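## 5. Results

The predictions runner writes test scores to `test.predicted-scores.json` in the task's embedding directory. Its `aggregated_scores` entry reports the mean and standard deviation of each metric across evaluation runs.
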
In [5]:
import json

# Aggregated test scores written by the predictions runner
with open("embeddings/hearbaseline.wav2vec2/mridangam_tonic-v1.5-full/test.predicted-scores.json") as f:
    results = json.load(f)
results["aggregated_scores"]

{'epoch_mean': 271.2,
 'epoch_std': 151.79822133345306,
 'test_aucroc_mean': 0.9673808788134807,
 'test_aucroc_std': 0.006372859218694436,
 'test_d_prime_mean': 3.246692553780611,
 'test_d_prime_std': 0.19468979532149563,
 'test_loss_mean': 0.532979679107666,
 'test_loss_std': 0.07979386450979473,
 'test_mAP_mean': 0.8763554420794701,
 'test_mAP_std': 0.025752593878212687,
 'test_score_mean': 0.8304423451423645,
 'test_score_std': 0.021899563266157787,
 'test_top1_acc_mean': 0.8304423451423645,
 'test_top1_acc_std': 0.021899563266157787,
 'time_in_min_mean': 0.4922747866312663,
 'time_in_min_std': 0.19352015766160782,
 'validation_score_mean': 0.8358903646469116,
 'validation_score_std': 0.015872782760605525}
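The `d_prime` metric is commonly derived from ROC AUC as `d' = sqrt(2) * inv_cdf(AUC)`, where `inv_cdf` is the inverse standard normal CDF. The sketch below assumes `heareval` applies this per class and averages the results. That assumption is consistent with `test_d_prime_mean` above being larger than the formula applied to `test_aucroc_mean`, since `d'` is convex for AUC above 0.5. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def d_prime(auc):
    """Convert a ROC AUC strictly between 0 and 1 to d' = sqrt(2) * inv_cdf(AUC)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


def macro_d_prime(per_class_auc):
    """Average d' over classes, given one AUC per class."""
    return fmean(d_prime(a) for a in per_class_auc)


print(d_prime(0.5))  # chance-level AUC maps to d' = 0
```

Note that `inv_cdf` raises an error at exactly 0 or 1, so a class with perfect AUC would need separate handling.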

In [6]:
results["aggregated_scores"]["test_score_mean"]

0.8304423451423645
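Here `test_score_mean` equals `test_top1_acc_mean`, which suggests top-1 accuracy is this task's primary score. Each `*_mean`/`*_std` pair summarizes one metric over the evaluation runs. For this k-fold task that is presumably one run per test fold. The sketch below shows this kind of aggregation over a list of per-run score dicts. `aggregate_scores` is a hypothetical helper, the input values are made up for illustration, and using the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def aggregate_scores(per_run):
    """Collapse a list of {metric: value} dicts into {metric_mean, metric_std}.

    Uses the population standard deviation (ddof=0); this choice is an
    assumption, not taken from heareval.
    """
    out = {}
    for name in sorted(set().union(*per_run)):
        values = [run[name] for run in per_run if name in run]
        out[f"{name}_mean"] = fmean(values)
        out[f"{name}_std"] = pstdev(values)
    return out


# Hypothetical per-fold scores for illustration only (not from this notebook)
runs = [{"test_top1_acc": 0.80}, {"test_top1_acc": 0.84}, {"test_top1_acc": 0.82}]
print(aggregate_scores(runs))
```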