<a href="https://www.kaggle.com/code/ishandutta/audio-ml-tutorial-part-1-time-domain-features-w-b?scriptVersionId=121597097" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<h1><center>Audio ML Tutorial Part-1: Time Domain Features</center></h1>
                                                      
<center><img src = "https://i.natgeofe.com/n/a189dd67-bc78-4716-aa29-cdbaceb5e4d0/photo-ark-parrots-endangered-bird-world-intelligence-3_3x2.jpg" width = "750" height = "500"/></center>                                                                          

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='background:maroon; border:0; color:white' role="tab" aria-controls="home"><center>If you find this notebook useful, please give it an upvote; it helps keep up my motivation. This notebook will be updated frequently, so keep checking for further developments.</center></h3>

<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Contents</center></h2>

<ul style="list-style-type:square">
    <li><a href="#1">Preliminaries</a></li>
    <li><a href="#2">Global Config</a></li>
    <li><a href="#3">Load Datasets</a></li>
    <li><a href="#4">Weights and Biases</a></li>
    <li><a href="#5">Basic Analysis</a></li>
    <li><a href="#6">Time Domain Features</a></li>
</ul>



<a id="1"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Preliminaries</center></h2>

In [None]:
#### Basics

import os
import sklearn
import numpy as np 
import pandas as pd 
from tqdm import tqdm

#### PyTorch
import torch
import torchaudio

#### Data Visualization
import seaborn as sns
import plotly.express as px 
import IPython.display as ipd 
import matplotlib.pyplot as plt
%matplotlib inline 

#### Librosa
import librosa 
import librosa.display 

#### Aesthetics
import warnings 
warnings.filterwarnings("ignore")

#### Logging
import wandb

<a id="2"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Global Config</center></h2>

In [None]:
config = {"competition": "BirdCLEF2023",
          "_wandb_kernel": "ishandutta",
          "sample_rate": 32000
          }

<a id="3"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Load Datasets</center></h2>

In [None]:
df = pd.read_csv("/kaggle/input/birdclef-2023/train_metadata.csv")
df.head()

<a id="4"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Weights and Biases</center></h2>

<center><img src = "https://i.imgur.com/1sm6x8P.png" width = "750" height = "500"/></center>        

**Weights & Biases** is the machine learning platform for developers to build better models faster.

You can use W&B's lightweight, interoperable tools to

- quickly track experiments,
- version and iterate on datasets,
- evaluate model performance,
- reproduce models,
- visualize results and spot regressions,
- and share findings with colleagues.
  
Set up W&B in 5 minutes, then quickly iterate on your machine learning pipeline with the confidence that your datasets and models are tracked and versioned in a reliable system of record.

In this notebook I use Weights & Biases to log audio samples to an interactive table, so they can be played back and inspected next to their labels.

In [None]:
# Initialise the Run

run = wandb.init(project=config['competition'], job_type='Visualization', name='BirdCLEF 2023 Audio')

In [None]:
# Here is a minimal example on how you can log the audio to your wandb dashboard 

# Initialise a table with the column names
table = wandb.Table(columns=['Audio Sample', 'Primary Label'])

# For simplicity I have selected 100 rows only, you can have the entire dataframe
minimal_df = df.sample(100).reset_index(drop=True)

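# Log the data to table
# Note: the `full_path` column is only created later by get_audio_sample,
# so build each file path from `filename` here
for i in tqdm(range(len(minimal_df))):
    row = minimal_df.loc[i]
    audio_path = os.path.join("/kaggle/input/birdclef-2023/train_audio", row['filename'])
    audio = wandb.Audio(audio_path, sample_rate=config['sample_rate'])
    table.add_data(audio, row.primary_label)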

wandb.log({'BirdCLEF 2023 Audio': table})

# Finish the run
run.finish()

### [Interactive W&B Table for Audio $\rightarrow$](https://wandb.ai/ishandutta/BirdCLEF2023/runs/4rvhdvlv?workspace=user-ishandutta)

<center><img src = "https://i.ibb.co/PzCngK6/Screenshot-2.png" width = "1000" height = "750"/></center>       

<a id="5"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Basic Analysis</center></h2>

## **<span style="color:orange;">Get Audio Samples</span>** 

In [None]:
df.primary_label.nunique()

In [None]:
# df.primary_label.unique()

In [None]:
def get_audio_sample(df: pd.DataFrame, bird_label: str):
    """
    Get the path of one audio sample for a bird, using its label.
    Also adds a `full_path` column to df as a side effect.
    
    Args:
        df (pandas.DataFrame): DataFrame with the metadata
        bird_label (str): Label of the bird for which an audio sample is required

    Returns:
        str: Full path of one sampled audio file for that bird
    """
    AUDIO_DIR_PATH = "/kaggle/input/birdclef-2023/train_audio"

    df['full_path'] = AUDIO_DIR_PATH  + '/' + df['filename']
    
    return df[df['primary_label'] == bird_label].sample(1, random_state = 42)['full_path'].values[0]

In [None]:
abethr1_file = get_audio_sample(df, 'abethr1')
blcapa2_file = get_audio_sample(df, 'blcapa2')
chibat1_file = get_audio_sample(df, 'chibat1')
dotbar1_file = get_audio_sample(df, 'dotbar1')

In [None]:
ipd.Audio(abethr1_file)

In [None]:
ipd.Audio(blcapa2_file)

In [None]:
ipd.Audio(chibat1_file)

In [None]:
ipd.Audio(dotbar1_file)

---

> ### **<span style="color:orange;">This notebook is inspired by the [Audio Signal Processing For Machine Learning Series](https://www.youtube.com/playlist?list=PL-wATfeyAMNqIee7cH3q1bh4QJFAaeNv0) by Valerio Velardo.  </span>** 

## **<span style="color:orange;">Load Audio Signal</span>** 

In [None]:
# Load the audio files with librosa
# These files have been downsampled to 32 kHz. We can pass that rate to the
# sr argument of librosa.load, or pass sr=None to keep each file's native rate

abethr1, sr = librosa.load(abethr1_file, sr=None)
blcapa2, sr = librosa.load(blcapa2_file, sr=None)
chibat1, sr = librosa.load(chibat1_file, sr=None)
dotbar1, sr = librosa.load(dotbar1_file, sr=None)

In [None]:
abethr1_file

In [None]:
abethr1.size

In [None]:
# Tip: to load only 5 seconds of a file, starting 10 seconds in,
# use the offset and duration parameters (both in seconds)

abethr1_5, sr = librosa.load(abethr1_file, sr=None, offset=10.0, duration=5.0)
abethr1_5.size

In [None]:
# For now let us focus on abethr1
print(abethr1)

In [None]:
# Every element of this array is one sample of the audio
# The value of each element is the amplitude of the signal at that sample

print(f"abethr1 has a total of {len(abethr1)} samples")

## **<span style="color:orange;">Calculate Audio Duration</span>** 

In [None]:
# To find the duration of 1 sample, we take inverse of sampling rate
sample_duration = 1/sr
print(f"Duration of 1 sample of audio: {sample_duration:.6f} seconds")

In [None]:
# Now via ipd.Audio we already saw that the duration of abethr1 is 19 secs
# We can also calculate it as

duration = sample_duration * len(abethr1)
print(f"Duration of Audio Signal: {duration} seconds")
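
The same arithmetic works in both directions: dividing a sample count by the sampling rate gives seconds, and multiplying seconds by the sampling rate gives a sample index. Below is a minimal sketch on a synthetic signal. The 440 Hz tone and the helper name `seconds_to_samples` are illustrative assumptions, not part of the competition data.

```python
import numpy as np

sr = 32000  # sampling rate in Hz, same value as config["sample_rate"]

# Synthetic stand-in for a loaded recording: 19 seconds of a 440 Hz tone
t = np.arange(19 * sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

def seconds_to_samples(seconds: float, sr: int) -> int:
    """Convert a time in seconds to a number of samples (hypothetical helper)."""
    return int(round(seconds * sr))

# Duration in seconds from the sample count
duration = len(signal) / sr

# Slice out 5 seconds starting at the 10-second mark
start = seconds_to_samples(10.0, sr)
stop = start + seconds_to_samples(5.0, sr)
clip = signal[start:stop]

print(f"Duration: {duration:.1f} s, clip length: {len(clip)} samples")
```

Slicing the array this way is the in-memory counterpart of the `offset` and `duration` arguments of `librosa.load`, which avoid reading the rest of the file in the first place.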

## **<span style="color:orange;">Waveforms</span>** 

In [None]:
plt.figure(figsize=(17, 15))

birds = {
    'abethr1': abethr1, 
    'blcapa2': blcapa2, 
    'chibat1': chibat1, 
    'dotbar1': dotbar1}

for i, (bird_name, bird_arr) in enumerate(birds.items()):
    plt.subplot(2, 2, i+1)
    # Librosa has a built-in function to plot the waveform directly
    # Pass sr so the time axis is correct (waveshow assumes 22050 Hz by default)
    librosa.display.waveshow(bird_arr, sr=sr)
    plt.title(str(bird_name))
    plt.ylim((-1,1))

As an exercise, listen to the audio signals again and focus on the portions where the waveforms show spikes.  
Can you hear the spikes in the audio?
  
Another interesting thing to note is that the `blcapa2` audio is a lot more soothing and has more observable patterns than the `dotbar1` audio. Can we say that the audio for `blcapa2` is a song and that for `dotbar1` is not?

Let us find out!

In [None]:
def get_audio_type(df: pd.DataFrame, full_audio_path: str):
    """
    Function to get the audio type from the audio path
    
    Args:
        df (pd.DataFrame): Metadata for the audio signals
        full_audio_path (str): File path for the audio signal
        
    Returns:
        type of audio signal
    """
    
    return df[df['full_path'] == full_audio_path]['type'].values[0]

In [None]:
print(f"Type of Audio for blcapa2: {get_audio_type(df, blcapa2_file)}")
print(f"Type of Audio for dotbar1: {get_audio_type(df, dotbar1_file)}")

Thus our thought process was correct!

---

<a id="6"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Time Domain Features</center></h2>

At a high level, the time domain gives us three main features:
1. Amplitude Envelope (AE)
2. Root Mean Square Energy (RMS)
3. Zero Crossing Rate (ZCR)

## **<span style="color:orange;">Amplitude Envelope (AE)</span>** 

Amplitude Envelope gives us the envelope (boundaries) of the sound. What is the boundary for a single frame? It is its maximum amplitude.
  
Hence, the aim of the amplitude envelope is to get the maximum amplitude for each frame. But why is this helpful?
  
Intuitively, the amplitude of an audio signal indicates how loud the audio is, or simply what its volume is. 
  
To obtain the AE, we split the audio signal into multiple frames, each having the same size. Then for each frame we find the maximum amplitude among the samples it contains. 

An interesting application of AE is for onset detection, or the detection of the beginning of a sound. 
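
The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the only way to compute it: the function name `amplitude_envelope`, the parameters `frame_size` and `hop_length`, the values 1024 and 512, and the synthetic test tone are all assumptions made for the example. The sketch also assumes overlapping frames (a hop shorter than the frame) and takes the absolute value so that large negative excursions count too.

```python
import numpy as np

def amplitude_envelope(signal: np.ndarray, frame_size: int, hop_length: int) -> np.ndarray:
    """Maximum absolute amplitude of each frame (window) of the signal."""
    return np.array([
        np.max(np.abs(signal[start:start + frame_size]))
        for start in range(0, len(signal), hop_length)
    ])

sr = 32000  # sampling rate in Hz
t = np.arange(sr) / sr

# Synthetic signal: one second of a 440 Hz tone that gets steadily louder
signal = t * np.sin(2 * np.pi * 440 * t)

frame_size, hop_length = 1024, 512
ae = amplitude_envelope(signal, frame_size, hop_length)

# Start time (in seconds) of each frame, useful for plotting AE over the waveform
frame_times = np.arange(len(ae)) * hop_length / sr

print(f"{len(ae)} frames, first AE value {ae[0]:.3f}, last AE value {ae[-1]:.3f}")
```

Because the tone ramps up in loudness, the envelope rises from frame to frame. Note that the last frames may contain fewer than `frame_size` samples, since the sketch does not pad the signal.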

## **<span style="color:orange;">Root Mean Square Energy (RMS)</span>** 

One key problem with the Amplitude Envelope is that it is very sensitive to outliers: a single unusually large sample sets the value for its whole frame. To tackle this we have another time domain feature called the Root Mean Square Energy. It is simply the square root of the mean of the squared samples in a frame. 
  
This is an indicator of loudness as well, but it is much less sensitive to outliers than the Amplitude Envelope.
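
For a frame of $K$ samples $s_1, \dots, s_K$ (symbols chosen here for illustration), this is $\mathrm{RMS} = \sqrt{\frac{1}{K}\sum_{k=1}^{K} s_k^2}$. The sketch below compares both features on a quiet synthetic tone with and without a single-sample click, to illustrate the difference in outlier sensitivity. The function names, the frame parameters, and the test signal are assumptions made for the example.

```python
import numpy as np

def amplitude_envelope(signal: np.ndarray, frame_size: int, hop_length: int) -> np.ndarray:
    """Maximum absolute amplitude of each frame."""
    return np.array([
        np.max(np.abs(signal[start:start + frame_size]))
        for start in range(0, len(signal), hop_length)
    ])

def rms_energy(signal: np.ndarray, frame_size: int, hop_length: int) -> np.ndarray:
    """Square root of the mean of the squared samples in each frame."""
    return np.array([
        np.sqrt(np.mean(signal[start:start + frame_size] ** 2))
        for start in range(0, len(signal), hop_length)
    ])

sr = 32000  # sampling rate in Hz
t = np.arange(sr) / sr

# Quiet 440 Hz tone, plus a copy with one outlier sample (a "click")
clean = 0.1 * np.sin(2 * np.pi * 440 * t)
clicked = clean.copy()
clicked[1000] = 1.0

frame_size, hop_length = 1024, 512

# Compare the first frame (samples 0 to 1023), which contains the click
ae_ratio = (amplitude_envelope(clicked, frame_size, hop_length)[0]
            / amplitude_envelope(clean, frame_size, hop_length)[0])
rms_ratio = (rms_energy(clicked, frame_size, hop_length)[0]
             / rms_energy(clean, frame_size, hop_length)[0])

print(f"AE grows by a factor of {ae_ratio:.2f}, RMS by a factor of {rms_ratio:.2f}")
```

In this example the click multiplies the AE of the frame roughly tenfold while changing the RMS only slightly. In practice, `librosa.feature.rms` computes framewise RMS directly from a signal; its default framing and padding differ from this sketch, so the values will not match exactly.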

<h1><center>More Plots coming soon!</center></h1>
                                                      
<center><img src = "https://static.wixstatic.com/media/5f8fae_7581e21a24a1483085024f88b0949a9d~mv2.jpg/v1/fill/w_934,h_379,al_c,q_90/5f8fae_7581e21a24a1483085024f88b0949a9d~mv2.jpg" width = "750" height = "500"/></center> 
