# wav2vec-u CV-sv - GAN
> "GAN training for wav2vec-u on Common Voice Swedish"

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [kaggle, colab, wav2vec-u]

The original attempt on [Kaggle](https://www.kaggle.com/jimregan/wav2vec-u-cv-swedish-gan) fails because of a cuDNN issue, but this notebook runs fine on Colab.

## Preparation

In [None]:
!pip install condacolab

Collecting condacolab
  Downloading https://files.pythonhosted.org/packages/ee/47/6f9fe13087c31aba889c4b09f9beaa558bf216bf9108c9ccef44e6c9dcfe/condacolab-0.1.2-py3-none-any.whl
Installing collected packages: condacolab
Successfully installed condacolab-0.1.2


In [None]:
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:34
🔁 Restarting kernel...


In [None]:
!conda install -c pykaldi pykaldi -y

In [None]:
!git clone https://github.com/jimregan/fairseq/ --branch issue3581

Cloning into 'fairseq'...
remote: Enumerating objects: 28296, done.
remote: Total 28296 (delta 0), reused 0 (delta 0), pack-reused 28296
Receiving objects: 100% (28296/28296), 11.77 MiB | 16.71 MiB/s, done.
Resolving deltas: 100% (21291/21291), done.


In [None]:
!git clone https://github.com/kpu/kenlm

Cloning into 'kenlm'...
remote: Enumerating objects: 13824, done.
remote: Counting objects: 100% (137/137), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 13824 (delta 76), reused 92 (delta 45), pack-reused 13687
Receiving objects: 100% (13824/13824), 5.49 MiB | 11.12 MiB/s, done.
Resolving deltas: 100% (7956/7956), done.


In [None]:
%%capture
!apt-get -y install libeigen3-dev liblzma-dev zlib1g-dev libbz2-dev

In [None]:
%cd kenlm
!mkdir build
%cd build
!cmake ..
!make -j 4
%cd /tmp

In [None]:
%cd /content/kenlm
!python setup.py install
%cd /tmp

In [None]:
import os
os.environ['PATH'] = f"{os.environ['PATH']}:/content/kenlm/build/bin/"
os.environ['FAIRSEQ_ROOT'] = '/content/fairseq'
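
Since later steps depend on the KenLM binaries being reachable through `PATH`, a quick sanity check can save a failed run. The following is a minimal sketch, not part of the original notebook: `missing_tools` is a hypothetical helper, and it assumes the build produced the usual KenLM command-line tools `lmplz` and `build_binary`.

```python
import os
import shutil


def missing_tools(names, search_path=None):
    """Return the names that cannot be found as executables on search_path.

    search_path=None makes shutil.which fall back to the current PATH.
    """
    return [name for name in names if shutil.which(name, path=search_path) is None]


# Assumption: these are the KenLM tools the build placed in /content/kenlm/build/bin/
kenlm_tools = ["lmplz", "build_binary"]
print("Missing KenLM tools:", missing_tools(kenlm_tools, os.environ.get("PATH")))
```

An empty list means both tools resolve; anything else suggests the build failed or the `PATH` update did not take effect.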

In [None]:
%cd /content/fairseq/

/content/fairseq


In [None]:
!python setup.py install

In [None]:
os.environ['HYDRA_FULL_ERROR'] = '1'

In [11]:
%%capture
!pip install editdistance

The Kaggle API setup below is based on [this notebook](https://colab.research.google.com/github/corrieann/kaggle/blob/master/kaggle_api_in_colab.ipynb).

In [None]:
!pip install kaggle

In [13]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
# Then move kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle.json
User uploaded file "kaggle.json" with length 64 bytes
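
If the `kaggle` command later complains about credentials, it can help to inspect the file that was just moved into place. This is a rough sketch under stated assumptions: `check_kaggle_credentials` is a hypothetical helper (not part of the Kaggle package), and it only checks for the `username` and `key` fields and for permissions that exclude other users.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    problems = []
    with open(path, encoding="utf-8") as handle:
        try:
            data = json.load(handle)
        except json.JSONDecodeError:
            return [f"{path} is not valid JSON"]

    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    # The API token file is expected to carry these two fields
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing or empty field: {field}")

    # Flag any permission bits granted to group or others (chmod 600 clears them)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions {oct(mode)} are wider than 600")

    return problems


print(check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json")))
```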


In [15]:
%cd /content

/content


In [17]:
!kaggle datasets download "jimregan/w2vu-cvsv-prepared-text"

Downloading w2vu-cvsv-prepared-text.zip to /content
 29% 5.00M/17.4M [00:00<00:00, 31.7MB/s]
100% 17.4M/17.4M [00:00<00:00, 75.8MB/s]


In [None]:
!unzip /content/w2vu-cvsv-prepared-text.zip

In [24]:
!kaggle datasets download -d jimregan/w2vu-cvsv-precompute-pca512-cls128-mean-pooled

Downloading w2vu-cvsv-precompute-pca512-cls128-mean-pooled.zip to /content
 94% 369M/394M [00:03<00:00, 122MB/s]
100% 394M/394M [00:03<00:00, 120MB/s]


In [None]:
!unzip w2vu-cvsv-precompute-pca512-cls128-mean-pooled.zip

In [26]:
!rm *.zip

## GAN

In [29]:
import torch
torch.version.cuda

'10.1'

In [30]:
torch.backends.cudnn.version()

7603
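
The integer reported for cuDNN packs the version into one number. As a small sketch, assuming the encoding used by cuDNN releases before version 9 (major × 1000 + minor × 100 + patch), the hypothetical helper `decode_cudnn_version` below splits it back into its parts; newer releases use a different layout, so treat this as illustrative only.

```python
def decode_cudnn_version(version):
    """Split a pre-9 cuDNN version integer into (major, minor, patch).

    Assumes version == major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


major, minor, patch = decode_cudnn_version(7603)
print(f"cuDNN {major}.{minor}.{patch}")
```

Under that assumption, the value shown above corresponds to cuDNN 7.6.3, alongside CUDA 10.1.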

In [31]:
%cd /content/fairseq

/content/fairseq


In [32]:
%%writefile rungan.sh
PREFIX=w2v_unsup_gan_xp
#TASK_DATA=/path/to/features/unfiltered/precompute_unfiltered_pca512_cls128_mean_pooled
TASK_DATA=/content/precompute_pca512_cls128_mean_pooled
#TEXT_DATA=/path/to/data  # path to fairseq-preprocessed GAN data
TEXT_DATA=/content/preppedtext/phones/
#KENLM_PATH=/path/to/data/kenlm.phn.o4.bin  # KenLM 4-gram phoneme language model (LM data = GAN data here)
KENLM_PATH=/content/preppedtext/phones/lm.phones.filtered.04.bin

PREFIX=$PREFIX CUDA_LAUNCH_BLOCKING=1 fairseq-hydra-train \
	-m --config-dir fairseq/config/model/wav2vecu/gan \
	--config-name w2vu \
	task.data=${TASK_DATA} \
	task.text_data=${TEXT_DATA} \
	task.kenlm_path=${KENLM_PATH} \
	checkpoint.no_epoch_checkpoints=false \
	'common.seed=range(0,5)'

Writing rungan.sh
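
The `-m` flag puts Hydra into multirun mode, and `common.seed=range(0,5)` asks it to sweep over seeds. The sketch below only illustrates what that sweep expands to, assuming Hydra's `range` sweep excludes the stop value as Python's `range` does; `expand_seed_sweep` is a hypothetical helper, not a Hydra or fairseq function.

```python
def expand_seed_sweep(start, stop):
    """List the per-run overrides implied by common.seed=range(start, stop)."""
    return [f"common.seed={seed}" for seed in range(start, stop)]


# One training job is launched per override
for job_number, override in enumerate(expand_seed_sweep(0, 5)):
    print(f"job {job_number}: {override}")
```

Under that assumption the script trains five models, with seeds 0 through 4, so expect the run to take several times as long as a single training job.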


In [None]:
!bash rungan.sh

[2021-06-03 21:49:40,282][fairseq.tasks.unpaired_audio_text][INFO] - REF: ɛ n f œ ʂ ə n a d ɵ ʂ ə k t f øː r d eː t s ɔ m h ɛ n d ə p oː ɕ œ r k ɔ n s ɛ t ə n
[2021-06-03 21:49:40,286][fairseq.tasks.unpaired_audio_text][INFO] - HYP: s oː ɵ s ɵ yː f øː r m yː ʃ ɕ ɵ ʃ yː s ʃ yː v a k ɵ l a k ɔ m ə a s ɡ v uː l v a r d ə ɕ uː tː f øː d ə m ə oː
[2021-06-03 21:49:40,295][fairseq.tasks.unpaired_audio_text][INFO] - LM [REF]: -53.44462585449219, 0.05339602260269112
[2021-06-03 21:49:40,295][fairseq.tasks.unpaired_audio_text][INFO] - LM [HYP]: -97.8524169921875, 0.012059384906545269
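
The two numbers on each `LM` line appear to be the KenLM log10 score of the sequence and a per-token probability derived from it. The sketch below reproduces the second number from the first for the `REF` line, assuming the formula 10^(score / (tokens + 1)), where the extra 1 is presumably the end-of-sentence token. This is inferred from the logged values, not stated by the log itself, and `per_token_probability` is a hypothetical helper.

```python
def per_token_probability(log10_score, n_tokens):
    """Geometric-mean probability per token, counting one extra end-of-sentence token."""
    return 10 ** (log10_score / (n_tokens + 1))


# REF transcript and score copied from the log lines above
ref = "ɛ n f œ ʂ ə n a d ɵ ʂ ə k t f øː r d eː t s ɔ m h ɛ n d ə p oː ɕ œ r k ɔ n s ɛ t ə n"
ref_score = -53.44462585449219

ref_value = per_token_probability(ref_score, len(ref.split()))
print(ref_value)
```

The hypothesis line follows the same pattern: its much lower per-token value reflects how far the generator's output still is from plausible phoneme sequences at this point in training.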
[2021-06-03 21:49:40,917][valid][INFO] - {"epoch": 601, "valid_loss": "1.003", "valid_ntokens": "3039.79", "valid_nsentences": "144.214", "valid_lm_score_sum": "-90380.3", "valid_num_pred_chars": "48062", "valid_vocab_seen_pct": "0.881533", "valid_uer": "100.282", "valid_weighted_lm_ppl": "82.0741", "valid_lm_ppl": "63.7798", "valid_wps": "16767.1", "valid_wpb": "3039.8", "valid_bsz": "144.2", "valid_num_updates": 