## Inferencing and Performance Analysis

In this notebook:

1. Run inference on the test split of the dataset.
2. Analyse the performance metrics of the best-performing model.
    
The audio classification performance metrics are:

1. Accuracy
2. Precision
3. Recall
4. F1 score (harmonic mean of Precision and Recall)
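For a single class, precision, recall and F1 all follow from three counts: true positives (TP), false positives (FP) and false negatives (FN). The sketch below is a minimal illustration; `precision_recall_f1` is a hypothetical helper, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Per-class precision, recall and F1 from TP/FP/FN counts (hypothetical helper)."""
    # precision: of everything predicted as this class, the fraction that was right
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: of everything that truly belongs to this class, the fraction that was found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# e.g. a class with 13 correct hits, 2 false alarms and no misses
p, r, f1 = precision_recall_f1(tp=13, fp=2, fn=0)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.87 1.0 0.93
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two scores rather than sitting halfway between them.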

### Inferencing

In [1]:
from transformers import pipeline
from pathlib import Path
import numpy as np
import librosa
import IPython
from IPython.display import Audio
from typing import Union

SAMPLE_RATE = 16000  # the model expects audio sampled at 16 kHz

The best-performing model is: https://huggingface.co/MuhammadIqbalBazmi/wav2vec2-base-intent-classification-ori

##### Loading the model

In [2]:
hub_model_id = "MuhammadIqbalBazmi/wav2vec2-base-intent-classification-ori"
model = pipeline("audio-classification", model=hub_model_id)

##### Writing the inference function

In [3]:
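def intent_inference(audio: Union[str, Path, np.ndarray]):
    """Play the audio and return the model's intent predictions for it.

    `audio` is either a path to an audio file or a 1-D array sampled at SAMPLE_RATE.
    """
    print("Listen the audio file")
    if isinstance(audio, (str, Path)):
        IPython.display.display(Audio(str(audio)))
        # librosa resamples the file to the 16 kHz rate the model expects
        audio, _ = librosa.load(audio, sr=SAMPLE_RATE)
    elif isinstance(audio, np.ndarray):
        # arrays are assumed to be sampled at SAMPLE_RATE already
        IPython.display.display(Audio(audio, rate=SAMPLE_RATE))
    else:
        raise TypeError("audio should be either a file path (str or Path) or an np.ndarray")
    return model(audio)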
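The pipeline returns a list of `{'score': ..., 'label': ...}` dictionaries, as seen in the inference log. If only the label names are needed, a small helper can pull them out. This is a sketch: `top_k_labels` is a hypothetical name, and sorting by score is a defensive choice rather than something the pipeline requires. The pipeline call itself also accepts a `top_k` argument if you would rather limit the output at the source.

```python
from typing import Dict, List, Union

# Shape of one audio-classification pipeline result
Prediction = List[Dict[str, Union[str, float]]]


def top_k_labels(prediction: Prediction, k: int = 1) -> List[str]:
    """Return the k highest-scoring label names (hypothetical helper)."""
    # sort by score instead of relying on the pipeline's output order
    ranked = sorted(prediction, key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in ranked[:k]]


# made-up scores, for illustration only
example = [
    {"score": 0.06, "label": "battery"},
    {"score": 0.90, "label": "bike_modes"},
    {"score": 0.04, "label": "Top_speed"},
]
print(top_k_labels(example, k=2))  # ['bike_modes', 'battery']
```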

##### Loading the dataset

In [4]:
from datasets import load_dataset
audio_dataset = load_dataset("MuhammadIqbalBazmi/intent-dataset")
audio_dataset

Using custom data configuration MuhammadIqbalBazmi--intent-dataset-3a947b6d9cc17a8c
Found cached dataset parquet (C:/Users/modassir/.cache/huggingface/datasets/MuhammadIqbalBazmi___parquet/MuhammadIqbalBazmi--intent-dataset-3a947b6d9cc17a8c/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)



DatasetDict({
    train: Dataset({
        features: ['audio', 'label'],
        num_rows: 112
    })
    test: Dataset({
        features: ['audio', 'label'],
        num_rows: 48
    })
})

In [5]:
# audio sample
audio_dataset["test"][0]

{'audio': {'path': None,
  'array': array([ 0.00000000e+00,  6.10351562e-05, -2.74658203e-04, ...,
         -1.94091797e-02, -1.82495117e-02, -1.73645020e-02]),
  'sampling_rate': 16000},
 'label': 5}
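The inference loop passes these raw arrays straight to the pipeline, which, as far as we know, treats a bare NumPy array as already being at the model's sampling rate. That holds for this sample (`'sampling_rate': 16000`), and a lightweight guard could confirm it for every clip first. `check_sampling_rate` is a hypothetical helper. If a clip did differ, `datasets` can resample on the fly with `cast_column("audio", Audio(sampling_rate=16000))`, where `Audio` is the `datasets` feature class, not the IPython widget imported above.

```python
SAMPLE_RATE = 16000  # rate the model expects, as defined earlier in the notebook


def check_sampling_rate(audio_field: dict, expected: int = SAMPLE_RATE) -> None:
    """Raise if a dataset audio field was not recorded at the expected rate (hypothetical helper)."""
    actual = audio_field["sampling_rate"]
    if actual != expected:
        raise ValueError(f"expected {expected} Hz audio, got {actual} Hz; resample before inference")


# mirrors the structure of audio_dataset["test"][0]["audio"] shown above (array shortened)
sample = {"path": None, "array": [0.0, 6.10351562e-05, -2.74658203e-04], "sampling_rate": 16000}
check_sampling_rate(sample)  # passes silently
```

In the notebook this would run as `for item in audio_dataset["test"]: check_sampling_rate(item["audio"])`.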

##### Creating the `id2label` and `label2id` dictionary mappings

In [6]:
label2id, id2label = dict(), dict()
labels = audio_dataset["train"].features["label"].names
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

id2label["5"]

'casual_talk_goodbye'
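The same two mappings can be built with dict comprehensions; deriving one by inverting the other keeps them consistent by construction. A minimal sketch, where the label list is a hypothetical stand-in for `audio_dataset["train"].features["label"].names` and the `example_` prefixes avoid overwriting the real mappings. For one-off lookups, the `ClassLabel` feature also offers `int2str` and `str2int`.

```python
# hypothetical stand-in for audio_dataset["train"].features["label"].names
example_labels = ["intent_a", "intent_b", "intent_c"]

# string ids, matching the convention used in the cell above
example_label2id = {label: str(i) for i, label in enumerate(example_labels)}
# invert the first mapping instead of building the second one separately
example_id2label = {i: label for label, i in example_label2id.items()}

print(example_id2label["1"])  # intent_b
```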

##### Running inference on the test dataset

In [15]:
# Inference on the test dataset
k = 1  # number of top predictions to show; k <= 5, the number of labels the pipeline returns
for item in audio_dataset["test"]:
    audio_array = item["audio"]["array"]
    label = id2label[str(item["label"])]  # labels are stored as integer ids in the dataset

    print("-" * 40)
    prediction = intent_inference(audio_array)
    print(f"original label: {label}")
    print(f"prediction (top-{k}): {prediction[:k]}")

----------------------------------------
Listen the audio file


original label: casual_talk_goodbye
prediction (top-1): [{'score': 0.6152533888816833, 'label': 'casual_talk_goodbye'}]
----------------------------------------
Listen the audio file


original label: Locate_Dealer
prediction (top-1): [{'score': 0.9812334775924683, 'label': 'Locate_Dealer'}]
----------------------------------------
Listen the audio file


original label: casual_talk_greeting
prediction (top-1): [{'score': 0.9480897188186646, 'label': 'casual_talk_greeting'}]
----------------------------------------
Listen the audio file


original label: Running_operating_cost
prediction (top-1): [{'score': 0.9856768846511841, 'label': 'Running_operating_cost'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962104558944702, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962143301963806, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962189793586731, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962069988250732, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Top_speed
prediction (top-1): [{'score': 0.9746355414390564, 'label': 'Top_speed'}]
----------------------------------------
Listen the audio file


original label: About_iQube
prediction (top-1): [{'score': 0.9760268330574036, 'label': 'About_iQube'}]
----------------------------------------
Listen the audio file


original label: casual_talk_greeting
prediction (top-1): [{'score': 0.7842295169830322, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: About_iQube
prediction (top-1): [{'score': 0.9760189056396484, 'label': 'About_iQube'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962154030799866, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Running_operating_cost
prediction (top-1): [{'score': 0.975145161151886, 'label': 'Running_operating_cost'}]
----------------------------------------
Listen the audio file


original label: battery
prediction (top-1): [{'score': 0.9761717319488525, 'label': 'battery'}]
----------------------------------------
Listen the audio file


original label: casual_talk_greeting
prediction (top-1): [{'score': 0.9727259874343872, 'label': 'casual_talk_greeting'}]
----------------------------------------
Listen the audio file


original label: Running_operating_cost
prediction (top-1): [{'score': 0.9857215881347656, 'label': 'Running_operating_cost'}]
----------------------------------------
Listen the audio file


original label: casual_talk_goodbye
prediction (top-1): [{'score': 0.8253825306892395, 'label': 'casual_talk_goodbye'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962040781974792, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962033629417419, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Locate_Dealer
prediction (top-1): [{'score': 0.9629424810409546, 'label': 'Locate_Dealer'}]
----------------------------------------
Listen the audio file


original label: casual_talk_goodbye
prediction (top-1): [{'score': 0.9635090231895447, 'label': 'casual_talk_goodbye'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962097406387329, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: book_now
prediction (top-1): [{'score': 0.9822916984558105, 'label': 'book_now'}]
----------------------------------------
Listen the audio file


original label: casual_talk_greeting
prediction (top-1): [{'score': 0.9727118015289307, 'label': 'casual_talk_greeting'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9961887001991272, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: book_now
prediction (top-1): [{'score': 0.9831183552742004, 'label': 'book_now'}]
----------------------------------------
Listen the audio file


original label: battery
prediction (top-1): [{'score': 0.9766289591789246, 'label': 'battery'}]
----------------------------------------
Listen the audio file


original label: casual_talk_greeting
prediction (top-1): [{'score': 0.9722917079925537, 'label': 'casual_talk_greeting'}]
----------------------------------------
Listen the audio file


original label: Locate_Dealer
prediction (top-1): [{'score': 0.825822651386261, 'label': 'Locate_Dealer'}]
----------------------------------------
Listen the audio file


original label: battery
prediction (top-1): [{'score': 0.9730747938156128, 'label': 'battery'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.996216356754303, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Top_speed
prediction (top-1): [{'score': 0.9851453304290771, 'label': 'Top_speed'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962186217308044, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Top_speed
prediction (top-1): [{'score': 0.9850050210952759, 'label': 'Top_speed'}]
----------------------------------------
Listen the audio file


original label: casual_talk_goodbye
prediction (top-1): [{'score': 0.9410508275032043, 'label': 'casual_talk_goodbye'}]
----------------------------------------
Listen the audio file


original label: book_now
prediction (top-1): [{'score': 0.9396755695343018, 'label': 'book_now'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962210059165955, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: book_now
prediction (top-1): [{'score': 0.9644947052001953, 'label': 'About_iQube'}]
----------------------------------------
Listen the audio file


original label: bike_modes
prediction (top-1): [{'score': 0.9962226152420044, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Running_operating_cost
prediction (top-1): [{'score': 0.9297416806221008, 'label': 'Running_operating_cost'}]
----------------------------------------
Listen the audio file


original label: About_iQube
prediction (top-1): [{'score': 0.9760263562202454, 'label': 'About_iQube'}]
----------------------------------------
Listen the audio file


original label: Running_operating_cost
prediction (top-1): [{'score': 0.995050847530365, 'label': 'bike_modes'}]
----------------------------------------
Listen the audio file


original label: Locate_Dealer
prediction (top-1): [{'score': 0.9830295443534851, 'label': 'book_now'}]
----------------------------------------
Listen the audio file


original label: battery
prediction (top-1): [{'score': 0.973327100276947, 'label': 'battery'}]
----------------------------------------
Listen the audio file


original label: book_now
prediction (top-1): [{'score': 0.9818295240402222, 'label': 'book_now'}]
----------------------------------------
Listen the audio file


original label: Top_speed
prediction (top-1): [{'score': 0.9549031257629395, 'label': 'Top_speed'}]
----------------------------------------
Listen the audio file


original label: Top_speed
prediction (top-1): [{'score': 0.963047981262207, 'label': 'Top_speed'}]


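The model predicts the correct top-1 intent for 44 of the 48 test clips. The four errors are a `casual_talk_greeting` clip and a `Running_operating_cost` clip predicted as `bike_modes`, a `book_now` clip predicted as `About_iQube`, and a `Locate_Dealer` clip predicted as `book_now`. To quantify performance beyond reading the log, we also compute precision, recall, and F1 on the test set.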
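Rather than counting matches by eye, the loop could collect `(original label, predicted label)` pairs, for example by appending `(label, prediction[0]["label"])` on each iteration, and summarize them afterwards. A minimal sketch with a hypothetical `summarize` helper; applied to all 48 pairs in the log it gives 44/48, about 0.917.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(pairs: Iterable[Tuple[str, str]]):
    """Return top-1 accuracy and a Counter of (true, predicted) confusions."""
    pairs = list(pairs)
    errors = Counter((true, pred) for true, pred in pairs if true != pred)
    accuracy = 1 - sum(errors.values()) / len(pairs)
    return accuracy, errors


# a few (original label, top-1 prediction) pairs copied from the log above
pairs = [
    ("casual_talk_goodbye", "casual_talk_goodbye"),
    ("Locate_Dealer", "Locate_Dealer"),
    ("casual_talk_greeting", "bike_modes"),
    ("book_now", "About_iQube"),
]
accuracy, errors = summarize(pairs)
print(accuracy)  # 0.5
```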

In [7]:
audio = audio_dataset["test"][0]["audio"]["array"]

### Performance

Next, we evaluate the best model on the entire test split.

In [8]:
from sklearn.metrics import classification_report, confusion_matrix

predictions = []  # predicted class ids (top-1)
references = []   # ground-truth class ids
for item in audio_dataset["test"]:
    audio, label = item["audio"]["array"], item["label"]
    references.append(label)
    pred = model(audio)  # list of {"score": ..., "label": ...} dicts, highest score first
    predictions.append(int(label2id[pred[0]["label"]]))  # map the top-1 label name back to its id
print("------- Classification Report -------")
print(classification_report(references, predictions))
print("------- Confusion Matrix ------------")
print(confusion_matrix(references, predictions))

------- Classification Report -------
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         4
           1       1.00      0.80      0.89         5
           2       1.00      0.75      0.86         4
           3       1.00      0.80      0.89         5
           4       1.00      1.00      1.00         5
           5       1.00      1.00      1.00         4
           6       0.75      1.00      0.86         3
           7       0.87      1.00      0.93        13
           8       0.80      0.80      0.80         5

    accuracy                           0.92        48
   macro avg       0.94      0.91      0.91        48
weighted avg       0.93      0.92      0.92        48

------- Confusion Matrix ------------
[[ 4  0  0  0  0  0  0  0  0]
 [ 0  4  0  0  0  0  0  1  0]
 [ 0  0  3  0  0  0  0  0  1]
 [ 0  0  0  4  0  0  0  1  0]
 [ 0  0  0  0  5  0  0  0  0]
 [ 0  0  0  0  0  4  0  0  0]
 [ 0  0  0  0  0  0  3  0  0]
 [ 0  0  0  0  0  0  0 13  0]
 [ 0  0  0  0  0  0  1  0  4]]
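The per-class scores in the report can be read straight off the confusion matrix. scikit-learn's convention is that entry `[i][j]` counts clips whose true class is `i` and whose predicted class is `j`, so precision for class `c` is the diagonal entry over its column sum and recall is the diagonal entry over its row sum (the support). A pure-Python sketch with a hypothetical `per_class_metrics` helper; rows 7 and 8 of the matrix used here are the ones implied by the supports and per-class precision in the classification report.

```python
def per_class_metrics(cm):
    """Per-class precision and recall from a confusion matrix (rows = true, columns = predicted)."""
    n = len(cm)
    precision, recall = [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum: everything predicted as c
        actual = sum(cm[c])                          # row sum: support of class c
        precision.append(tp / predicted if predicted else 0.0)
        recall.append(tp / actual if actual else 0.0)
    return precision, recall


cm = [
    [4, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 3, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 4, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 13, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 4],
]
precision, recall = per_class_metrics(cm)
# macro average = unweighted mean over classes
print(round(sum(precision) / len(cm), 2), round(sum(recall) / len(cm), 2))  # 0.94 0.91
```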

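The code below computes the same scores with the Hugging Face `evaluate` library. Note that the `precision`, `recall`, and `f1` metrics in `evaluate` are themselves implemented on top of scikit-learn, so this route does not actually remove the sklearn dependency.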

In [15]:
import evaluate

metrics_list = ["precision", "recall", "f1"]
average_list = ["micro", "macro", "weighted"]

# Run inference once and reuse the predictions for every metric/average pair
predictions = []
references = []
for item in audio_dataset["test"]:
    audio, label = item["audio"]["array"], item["label"]
    references.append(label)
    pred = model(audio)
    predictions.append(int(label2id[pred[0]["label"]]))

result = dict()
for metric in metrics_list:
    loaded_metric = evaluate.load(metric)  # load each metric once, not once per average
    for average in average_list:
        print("=" * 30)
        print(f"metric = {metric}, average= {average}")
        score = loaded_metric.compute(predictions=predictions, references=references, average=average)
        result[f"{average}_{metric}"] = score
        print(f"Score: {score}")

metric = precision, average= micro
Score: {'precision': 0.9166666666666666}
metric = precision, average= macro
Score: {'precision': 0.9351851851851853}
metric = precision, average= weighted
Score: {'precision': 0.9274305555555555}
metric = recall, average= micro
Score: {'recall': 0.9166666666666666}
metric = recall, average= macro
Score: {'recall': 0.9055555555555556}
metric = recall, average= weighted
Score: {'recall': 0.9166666666666666}
metric = f1, average= micro
Score: {'f1': 0.9166666666666666}
metric = f1, average= macro
Score: {'f1': 0.9134038800705468}
metric = f1, average= weighted
Score: {'f1': 0.915839947089947}
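Micro precision, recall and F1 are all 0.9167, which is exactly the test accuracy, 44/48. In single-label classification every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled FP and FN totals are equal and all three micro scores collapse to accuracy. Weighted recall matches it too, because weighting each class's recall by its support simply sums the true positives. A small check on toy labels; `micro_scores` is a hypothetical helper, and the `toy_` names avoid overwriting the notebook's `predictions` and `references`.

```python
from collections import Counter


def micro_scores(refs, preds):
    """Micro-averaged precision, recall and F1 from pooled per-class counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(refs, preds):
        if ref == pred:
            tp[ref] += 1
        else:
            fp[pred] += 1  # a false alarm for the predicted class
            fn[ref] += 1   # a miss for the true class
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# toy labels: 4 of 6 predictions are correct
toy_refs = [0, 1, 1, 2, 2, 2]
toy_preds = [0, 1, 2, 2, 2, 0]
toy_accuracy = sum(r == p for r, p in zip(toy_refs, toy_preds)) / len(toy_refs)
print(micro_scores(toy_refs, toy_preds), toy_accuracy)
```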
