# Data Analysis of Intent Classification Audio Files

<pre>In this notebook, we explore the audio files used for intent classification. We also build a Hugging Face dataset from them for use in later steps.</pre>

### Problem Statement

Classify the intent directly from speech audio, without using speech-to-text models.

### Mapping to Machine Learning Problem

This can be framed as a __multi-class audio classification__ problem.

### Metric

We use the __micro-F1__ score, which accounts for both __Precision__ and __Recall__ by pooling true positives, false positives, and false negatives across all classes.
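
To make the metric concrete, here is a minimal pure-Python sketch of micro-averaged F1. The function name `micro_f1` and the example labels are hypothetical and are not part of the dataset. Note that when every sample has exactly one label, as in this problem, micro-F1 works out to the same value as accuracy.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # predicted this class and it was correct
            elif p == label and t != label:
                fp += 1  # predicted this class but it was another one
            elif p != label and t == label:
                fn += 1  # missed a sample of this class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only
y_true = ["greeting", "greeting", "goodbye", "goodbye", "thanks"]
y_pred = ["greeting", "goodbye", "goodbye", "goodbye", "greeting"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = micro_f1(y_true, y_pred)
print(f"micro-F1: {score:.2f}, accuracy: {accuracy:.2f}")
```

If per-class performance matters more than the overall hit rate, macro-F1 would be an alternative worth considering, since it weights every class equally.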

### Proposed Solution

We will fine-tune the __Hugging Face Wav2Vec2 speech models__ (both English and multilingual variants).




In [None]:
%%capture
!pip install datasets
!pip install transformers
!pip install librosa

In [20]:
# Import necessary libraries
from pathlib import Path

import numpy as np
import pandas as pd

import datasets
from datasets import Dataset

import warnings
warnings.filterwarnings("ignore")

In [21]:
# getting path for audios and csv
dataset_dir = Path.cwd().parent/"dataset"
meta_csv_filepath = dataset_dir/"csv"/"speech_to_intent.csv"
audio_dir = dataset_dir/"audio"

In [22]:
# load meta csv file containing intent and audio_filename
meta_csv = pd.read_csv(meta_csv_filepath)
# header of the meta csv file
meta_csv.head()

Unnamed: 0,intent,audio_file
0,casual_talk_greeting,audio_0
1,casual_talk_greeting,audio_1
2,casual_talk_greeting,audio_2
3,casual_talk_greeting,audio_3
4,casual_talk_greeting,audio_4


In [23]:
meta_csv.shape[0]

329

##### Distribution of the intent classes

In [None]:
meta_csv.intent.value_counts()
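
Raw counts are easier to interpret next to each class's share of the data and a single imbalance figure. The sketch below shows one way to compute both with the standard library. The helper `summarize_distribution` and the example labels are hypothetical; in this notebook one would pass `meta_csv["intent"]` instead.

```python
from collections import Counter


def summarize_distribution(labels):
    """Return (label, count, share) rows sorted by count, plus the imbalance ratio.

    Assumes `labels` is a non-empty iterable of class names.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    rows = [(label, n, n / total) for label, n in counts.most_common()]
    imbalance_ratio = rows[0][1] / rows[-1][1]  # largest class / smallest class
    return rows, imbalance_ratio


# Hypothetical labels, for illustration only
example_labels = ["greeting"] * 6 + ["goodbye"] * 3 + ["thanks"] * 1
rows, imbalance_ratio = summarize_distribution(example_labels)

for label, count, share in rows:
    print(f"{label:<10} {count:>3} {share:6.1%}")
print(f"imbalance ratio (largest/smallest): {imbalance_ratio:.1f}")
```

A ratio close to 1 suggests balanced classes, while a large ratio is a hint that the split should be stratified.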

##### Listening to audio files

In [None]:
# Collect file paths and labels only for audio files that exist on disk.
# The metadata CSV lists 329 records, but some of them have no matching audio file.
# The filtered paths and labels are used later to build the dataset
# that is pushed to the Hugging Face Hub.
audio_filepaths = []
audio_labels = []

# Access the columns by name so the loop does not depend on column order
for intent, audio_file in zip(meta_csv["intent"], meta_csv["audio_file"]):
    filepath = audio_dir / f"{audio_file}.wav"
    if filepath.exists():
        audio_filepaths.append(str(filepath))
        audio_labels.append(intent)
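
Since records without an audio file are dropped silently, it can help to report how many were skipped. The following is a minimal sketch under that assumption. The helper `split_existing` is hypothetical, and the temporary directory with empty `.wav` files only stands in for the real audio folder.

```python
import tempfile
from pathlib import Path


def split_existing(audio_dir, file_stems, suffix=".wav"):
    """Partition file stems into those that exist in audio_dir and those that do not."""
    existing, missing = [], []
    for stem in file_stems:
        path = Path(audio_dir) / f"{stem}{suffix}"
        (existing if path.is_file() else missing).append(stem)
    return existing, missing


# Stand-in for the real audio directory: two of the three files are present
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "audio_0.wav").touch()
    (Path(tmp) / "audio_2.wav").touch()
    existing, missing = split_existing(tmp, ["audio_0", "audio_1", "audio_2"])

print(f"{len(existing)} found, {len(missing)} missing: {missing}")
```

In this notebook, the same check could be run with `audio_dir` and `meta_csv["audio_file"]` to see which records are dropped.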

In [11]:
from IPython.display import Audio, display

LIMIT = 20  # highest index to play; indices 0..LIMIT are shown
for count, (audio, label) in enumerate(zip(audio_filepaths, audio_labels)):
    if count > LIMIT:
        break
    print(f"index: {count}, Label: {label}")
    print(audio)
    display(Audio(audio))
    print("-" * 20)

index: 0, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_0.wav


--------------------
index: 1, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_1.wav


--------------------
index: 2, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_2.wav


--------------------
index: 3, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_3.wav


--------------------
index: 4, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_4.wav


--------------------
index: 5, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_5.wav


--------------------
index: 6, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_6.wav


--------------------
index: 7, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_7.wav


--------------------
index: 8, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_8.wav


--------------------
index: 9, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_11.wav


--------------------
index: 10, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_13.wav


--------------------
index: 11, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_21.wav


--------------------
index: 12, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_24.wav


--------------------
index: 13, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_30.wav


--------------------
index: 14, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_32.wav


--------------------
index: 15, Label: casual_talk_greeting
g:\Interviews\Ori\dataset\audio\audio_35.wav


--------------------
index: 16, Label: casual_talk_goodbye
g:\Interviews\Ori\dataset\audio\audio_41.wav


--------------------
index: 17, Label: casual_talk_goodbye
g:\Interviews\Ori\dataset\audio\audio_42.wav


--------------------
index: 18, Label: casual_talk_goodbye
g:\Interviews\Ori\dataset\audio\audio_43.wav


--------------------
index: 19, Label: casual_talk_goodbye
g:\Interviews\Ori\dataset\audio\audio_44.wav


--------------------
index: 20, Label: casual_talk_goodbye
g:\Interviews\Ori\dataset\audio\audio_45.wav


--------------------


__Observation__: 

1. The utterances are multilingual, so a multilingual speech model is the better choice.
2. However, more samples would be needed to get good results on multilingual intent classification.

__Creating the intent classification dataset and pushing it to the Hugging Face Hub__

In [13]:
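# Sort the class names so that label ids are assigned in a reproducible order
# (iterating over a set of strings can give a different order in each Python session)
label_names = sorted(set(audio_labels))
audio_dataset = datasets.Dataset.from_dict(
    {"audio": audio_filepaths, "label": audio_labels}
).cast_column("audio", datasets.Audio(sampling_rate=16000))
audio_dataset = audio_dataset.cast_column("label", datasets.ClassLabel(names=label_names))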

Casting the dataset: 100%|██████████| 1/1 [00:00<00:00, 165.55ba/s]
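
The integer id of each class follows the position of its name in the `names` list given to `ClassLabel`, so the order of that list matters for anything that later decodes predictions. Below is a minimal sketch of building a reproducible name-to-id mapping by sorting the class names. The helper `build_label_maps` and the example labels are hypothetical.

```python
def build_label_maps(labels):
    """Build sorted class names and the matching name<->id lookup tables."""
    names = sorted(set(labels))  # sorting gives the same order on every run
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return names, label2id, id2label


# Hypothetical labels, for illustration only
example_labels = ["greeting", "goodbye", "greeting", "thanks"]
names, label2id, id2label = build_label_maps(example_labels)
print(names)
print(label2id)
```

Mappings like `label2id` and `id2label` are also what a classification model configuration typically expects, so keeping them stable avoids mismatched labels between training runs.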


In [14]:
audio_dataset

Dataset({
    features: ['audio', 'label'],
    num_rows: 160
})

In [15]:
# Split the dataset into train (70%) and test (30%) sets, stratified by label
audio_dataset = audio_dataset.train_test_split(test_size=0.3, stratify_by_column="label", seed=42)

In [19]:
# see the updated audio_dataset
audio_dataset

DatasetDict({
    train: Dataset({
        features: ['audio', 'label'],
        num_rows: 112
    })
    test: Dataset({
        features: ['audio', 'label'],
        num_rows: 48
    })
})
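
With 160 rows and `test_size=0.3`, the split gives 48 test rows and 112 train rows, as shown above. To confirm that stratification kept the class proportions similar in both splits, something like the sketch below could be used. The helper `class_proportions` and the label lists are hypothetical; in this notebook the inputs would be `audio_dataset["train"]["label"]` and `audio_dataset["test"]["label"]`.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Hypothetical label ids standing in for the train and test label columns
train_labels = [0] * 7 + [1] * 7
test_labels = [0] * 3 + [1] * 3

train_props = class_proportions(train_labels)
test_props = class_proportions(test_labels)

# Largest absolute difference in class share between the two splits
max_gap = max(abs(train_props[c] - test_props.get(c, 0.0)) for c in train_props)
print(train_props, test_props, f"max gap: {max_gap:.3f}")
```

A small `max_gap` indicates that both splits follow the same class distribution.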

In [16]:
# login to hugging face
from huggingface_hub import notebook_login
notebook_login()


In [18]:
token = "<token>"  # Hugging Face access token; needed only if you have not logged in via the cell above
audio_dataset.push_to_hub(repo_id="MuhammadIqbalBazmi/intent-dataset",token=token)

Pushing split train to the Hub.
100%|██████████| 1/1 [00:00<00:00,  3.61ba/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:16<00:00, 16.01s/it]
Deleting unused files from dataset repository: 100%|██████████| 1/1 [00:04<00:00,  4.61s/it]
Pushing split test to the Hub.
100%|██████████| 1/1 [00:00<00:00,  5.20ba/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:12<00:00, 12.08s/it]
Deleting unused files from dataset repository: 100%|██████████| 1/1 [00:04<00:00,  4.41s/it]
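
Pasting a token into a notebook cell makes it easy to leak when the notebook is shared. One alternative, sketched below, is to read it from an environment variable. The helper `get_hf_token` is hypothetical, and the variable name `HF_TOKEN` is an assumption about how the environment is set up.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset."""
    token = os.environ.get(env_var, "").strip()
    return token or None


token = get_hf_token()
if token is None:
    print("No token in the environment; relying on a previous login instead.")
else:
    print("Token loaded from the environment.")
```

The resulting value can be passed as `token=token` to `push_to_hub`. When it is `None`, the credentials saved by `notebook_login()` should be used instead.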
