<a href="https://colab.research.google.com/github/harveenchadha/bol/blob/main/demos/robus/evaluate_model_hf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Evaluation Code 1: HuggingFace Original Eval Script

In [1]:
!pip -q install datasets transformers jiwer
!huggingface-cli login
!wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/research_projects/robust-speech-event/eval.py
!sed '89s/.*/        batch["prediction"] = prediction["text"].replace("<s>","")/' eval.py > eval_test.py


        To login, `huggingface_hub` now requires a token generated from https://huggingface.co/settings/token.
        (Deprecated, will be removed in v0.3.0) To login with username and password instead, interrupt with Ctrl+C.
        
Token: 
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on you

In [2]:

!python eval_test.py --model_id Harveenchadha/vakyansh-wav2vec2-hindi-him-4200 --dataset mozilla-foundation/common_voice_7_0 --config hi --split test --log_outputs

Reusing dataset common_voice (/root/.cache/huggingface/datasets/mozilla-foundation___common_voice/hi/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba)
100% 10/10 [00:09<00:00,  1.05ex/s]
WER: 0.3
CER: 0.09818181818181818
100% 10/10 [00:00<00:00, 10005.50ex/s]


In [3]:
!python eval_test.py --model_id Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 --dataset mozilla-foundation/common_voice_7_0 --config ur --split test --log_outputs

Reusing dataset common_voice (/root/.cache/huggingface/datasets/mozilla-foundation___common_voice/ur/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba)
100% 10/10 [00:09<00:00,  1.04ex/s]
WER: 0.2948717948717949
CER: 0.10119047619047619
100% 10/10 [00:00<00:00, 10094.59ex/s]
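
The WER and CER values printed above are edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the reference length in words (WER) or characters (CER). The block below is a minimal pure-Python sketch of that idea, not the `jiwer` implementation the script relies on; the function names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical, and the corpus-level aggregation (total errors over total reference length) is an assumption about how the scores are pooled.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or a free match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, predictions):
    """Total word-level edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), p.split())
                 for r, p in zip(references, predictions))
    total = sum(len(r.split()) for r in references)
    return errors / total


def char_error_rate(references, predictions):
    """Total character-level edits divided by total reference characters."""
    errors = sum(edit_distance(list(r), list(p))
                 for r, p in zip(references, predictions))
    total = sum(len(r) for r in references)
    return errors / total


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    preds = ["the cat sit on mat"]
    print("WER:", word_error_rate(refs, preds))
    print("CER:", char_error_rate(refs, preds))
```

Because the denominator is the reference length, a single wrong word in a short test set moves WER a lot, which is worth keeping in mind when only a handful of examples are scored.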


## Evaluation Code 2: Manually Calculating without Pipeline

In [4]:
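import re

import torch
import torchaudio
from datasets import load_dataset, load_metric, Audio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Metrics used to score predictions against the reference transcripts.
wer = load_metric("wer")
cer = load_metric("cer")
# Punctuation stripped from the references before scoring.
# Raw string so the backslash escapes reach `re` unchanged.
chars_to_ignore_regex = r'[,?.!\-\;\:"“%‘”�—’…–]'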


def load_data(dataset_id, language, split='test'):
    test_dataset = load_dataset(dataset_id, language, split=split, use_auth_token=True)
    test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16_000))
    return test_dataset

def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    batch["speech"] = batch["audio"]["array"]
    return batch

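from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(model_id):
    # Cached so a checkpoint is loaded and moved to the GPU once,
    # rather than once per batch; maxsize=1 keeps only the latest model.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).to('cuda')
    return processor, model


def evaluate(batch):
    # `model_id` is a global set in the cells below.
    processor, model = load_model(model_id)
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to('cuda')).logits

    # Greedy decoding: most likely token per frame, then decode to text.
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids, skip_special_tokens=True)
    return batch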
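
The last two steps of `evaluate` (an `argmax` over the logits followed by `batch_decode`) amount to greedy CTC decoding. The block below is a toy sketch of that idea in plain Python, with no model involved: pick the best token per frame, collapse consecutive repeats, then drop the blank token. The function name `ctc_greedy_decode`, the three-token vocabulary, and `blank_id=0` are all hypothetical; in the real tokenizer the blank is typically the padding token, and the tokenizer also handles word delimiters, which this sketch omits.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, id_to_token, blank_id=0):
    """Greedy CTC decoding over a list of per-frame score vectors."""
    # 1. Most likely token id in each frame (the argmax step).
    best_ids = [max(range(len(frame)), key=frame.__getitem__)
                for frame in frame_scores]
    # 2. Collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove blanks; a blank between two equal ids keeps them distinct.
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


if __name__ == "__main__":
    vocab = {0: "<pad>", 1: "a", 2: "b"}  # hypothetical toy vocabulary
    # Seven frames whose argmax ids are 1, 1, 0, 1, 2, 2, 0.
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.6, 0.3],
        [0.1, 0.2, 0.7],
        [0.0, 0.1, 0.9],
        [0.8, 0.1, 0.1],
    ]
    print(ctc_greedy_decode(scores, vocab))
```

The blank in the third frame is what lets the two `a` tokens survive as separate characters instead of being merged by the collapse step.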



In [5]:
dataset_id = "mozilla-foundation/common_voice_7_0"
language="hi"
split="test"
model_id = 'Harveenchadha/vakyansh-wav2vec2-hindi-him-4200'

test_dataset = load_data(dataset_id, language, split)
test_dataset = test_dataset.map(speech_file_to_array_fn)

result = test_dataset.map(evaluate, batched=True, batch_size=64)

print("WER: {:.6f}".format(wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
print("CER: {:.6f}".format(cer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Reusing dataset common_voice (/root/.cache/huggingface/datasets/mozilla-foundation___common_voice/hi/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba)
Loading cached processed dataset at /root/.cache/huggingface/datasets/mozilla-foundation___common_voice/hi/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba/cache-a6eeff86882f324d.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/mozilla-foundation___common_voice/hi/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba/cache-a6b421e9280caf7d.arrow


WER: 0.590176
CER: 0.328442
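
These scores depend on how the reference sentences are normalized before comparison. The block below is a small sketch of the cleanup done in `speech_file_to_array_fn`, reusing the same character set as `chars_to_ignore_regex`; the helper name `normalize` and the sample sentences are hypothetical and are not taken from the dataset.

```python
import re

# Same character class as chars_to_ignore_regex in the notebook.
CHARS_TO_IGNORE = r'[,?.!\-\;\:"“%‘”�—’…–]'


def normalize(sentence):
    """Strip the listed punctuation and lowercase the text."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower()


if __name__ == "__main__":
    print(normalize("Hello, World!"))
    print(normalize("क्या यह सही है?"))
    # The danda is not in the character class, so it is left in place.
    print(normalize("यह ठीक है।"))
```

Any character that stays in the reference but is never produced by the model counts as an error, so characters outside this set (the danda in the last example) can raise both WER and CER.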


In [6]:
del test_dataset
del result

torch.cuda.empty_cache()
import gc
gc.collect()

1584

In [7]:
dataset_id = "mozilla-foundation/common_voice_7_0"
language="ur"
split="test"
model_id = 'Harveenchadha/vakyansh-wav2vec2-urdu-urm-60'

test_dataset = load_data(dataset_id, language, split)
test_dataset = test_dataset.map(speech_file_to_array_fn)
result = test_dataset.map(evaluate, batched=True, batch_size=64)

print("WER: {:.6f}".format(wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
print("CER: {:.6f}".format(cer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Reusing dataset common_voice (/root/.cache/huggingface/datasets/mozilla-foundation___common_voice/ur/7.0.0/33e08856cfa0d0665e837bcad73ffd920a0bc713ce8c5fffb55dbdf1c084d5ba)


  0%|          | 0/142 [00:00<?, ?ex/s]

  0%|          | 0/3 [00:00<?, ?ba/s]

WER: 0.480384
CER: 0.260955
