# BirdCLEF 2022 simple starter code error analysis

- This notebook demonstrates a few error-analysis methods and the insights gained from them.

- The analysis is based on validation-set inference results from ["PyTorch Simple Starter Using only 21 classes"](https://www.kaggle.com/code/myso1987/pytorch-simple-starter-using-only-21-classes).
- The model was also trained with the code from that notebook.



In [None]:
import torch
import torchaudio
import IPython.display as ipd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import pandas as pd
import ipywidgets as widgets
import json
from sklearn.preprocessing import MultiLabelBinarizer
from tqdm import tqdm
from pathlib import Path

In [None]:
bird_dir = Path("../input/birdclef-2022/")
csv_file = Path("../input/birdclef2022-val-analysis/val_infer.csv")
df = pd.read_csv(csv_file)

`csv_file` holds the inference output of the model from the notebook above on its validation set.

We will reshape the `pd.DataFrame` into a multi-hot encoded format to make the results easier to analyze.
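
As a minimal sketch of what "multi-hot" means here: each clip becomes one row of 0/1 flags, one flag per species in the vocabulary. The three-species vocabulary and the `to_multi_hot` helper below are hypothetical and only serve the illustration; the notebook itself uses `MultiLabelBinarizer` for this step.

```python
# Hypothetical three-species vocabulary, used only for this illustration.
toy_vocab = ["apapan", "iiwi", "skylar"]

def to_multi_hot(labels, vocab):
    """Return a 0/1 list with a 1 at the position of every label in `labels`."""
    present = set(labels)
    return [1 if name in present else 0 for name in vocab]

print(to_multi_hot(["skylar", "apapan"], toy_vocab))  # [1, 0, 1]
print(to_multi_hot([], toy_vocab))                    # [0, 0, 0]
```

A clip with no predicted species maps to an all-zero row, which corresponds to a nocall prediction.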

In [None]:
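# Rename the columns: `filename` is the clip id with the class name appended,
# `TF` is the True/False prediction for that class.
df.columns = ['filename', 'TF']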

The vocabulary of the 21 scored bird species to classify:

In [None]:
# The 21 scored species
vocab = np.array(['akiapo', 'aniani', 'apapan', 'barpet', 'crehon', 'elepai', 'ercfra', 'hawama',
                   'hawcre', 'hawgoo', 'hawhaw', 'hawpet1', 'houfin', 'iiwi', 'jabwar', 'maupar',
                   'omao', 'puaioh', 'skylar', 'warwhe1', 'yefcan'])

In [None]:
# Split "<clip id>_<class>" into the class name (last field) and the clip id (the rest).
df["class"] = df.filename.apply(lambda x: x.split("_")[-1])
df["filename"] = df.filename.apply(lambda x: "_".join(x.split("_")[:-1]))

In [None]:
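# Collect, for every clip, the list of classes predicted as present.
tmp = {}

for idx, x in df.iterrows():
    # Create the entry on first sight, then fall through so that the
    # first row of each clip is checked as well.
    entry = tmp.setdefault(x.filename, {"classes": []})
    if x.TF == True:
        entry["classes"].append(x["class"])

multiclass = pd.DataFrame.from_dict(tmp, orient="index")

del tmp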

In [None]:
multiclass.index.name="filename"

In [None]:
# Fix the column order to `vocab` so every column maps to a known species.
mlb = MultiLabelBinarizer(classes=vocab)
y_data = mlb.fit_transform(multiclass['classes'])

## Multi-hot encoded result

In [None]:
multi_hot = pd.DataFrame(y_data,columns=mlb.classes_)
multi_hot.head(10)

In [None]:
multiclass = multiclass.drop("classes", axis=1)
multiclass = multiclass.reset_index(level=0)
multiclass = multiclass.join(multi_hot)

In [None]:
multiclass.to_csv("multi_hot.csv")

In [None]:
labels = multiclass.iloc[:,1:]

In [None]:
multiclass[multiclass.filename=="skylar/XC636223_140"].iloc[:,1:].values

## Converters from `torchaudio.transforms`

`n_fft`, `hop_length`, and `n_mels` differ from the base code.

These are simply my preferred parameters.
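
To get a feel for what these parameters imply, the arithmetic below works out the time and frequency resolution for the values used in this notebook. It is a rough sketch: the frame count assumes centered framing (`center=True`), which I believe is the torchaudio default, and `fft_bin_hz` describes the linear FFT bins before the mel filterbank is applied.

```python
SR = 32000        # sample rate used in this notebook
N_FFT = 1024      # FFT window length in samples
HOP_LENGTH = 256  # step between successive frames in samples
SEC = 5           # clip length in seconds

# Time covered by one hop, in milliseconds.
hop_ms = 1000 * HOP_LENGTH / SR
# Width of one linear FFT bin in Hz (before the mel filterbank is applied).
fft_bin_hz = SR / N_FFT
# Frame count for a 5-second clip, assuming centered framing (center=True).
n_frames = 1 + (SEC * SR) // HOP_LENGTH

print(hop_ms, fft_bin_hz, n_frames)  # 8.0 31.25 626
```

So a 5-second clip becomes a mel-spectrogram of roughly 128 x 626 values, with one column every 8 ms.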

In [None]:
PATH = bird_dir/Path("train_audio")
SR = 32000
SEC = 5
mel_converter = torchaudio.transforms.MelSpectrogram(sample_rate=SR,n_fft=1024,hop_length=256,n_mels=128) 
spec_converter = torchaudio.transforms.Spectrogram(n_fft=1024)
db_converter = torchaudio.transforms.AmplitudeToDB()

In [None]:
def predict_classes(x):
    # Names of the species predicted for clip `x`, read from its multi-hot row.
    row = multiclass[multiclass.filename == x].iloc[:, 1:].values.squeeze()
    return vocab[np.flatnonzero(row)]

def len_predict(x):
    # Number of species predicted for clip `x`.
    return len(predict_classes(x))


# Dropdown entries for `interact`: "<clip id>_<predicted classes>".
options = [f"{x}_{predict_classes(x)}" for x in multiclass.filename]

In [None]:
def frequency_bin_to_hz(bin_index, sr, n_fft):
    return bin_index*(sr/n_fft)

def time_bin_to_second(bin_index, sr, n_fft, hop_size):  
    return  bin_index*hop_size/sr

def change_ytick_to_frequency(sr, n_fft):
    # For a linear spectrogram only (not mel): relabel the y ticks from FFT bin indices to Hz.
    prev_yticks = plt.yticks()[0][1:-1] 
    ytick_labels= [frequency_bin_to_hz(bin_index, sr=sr, n_fft=n_fft) for bin_index in prev_yticks]
    plt.yticks(ticks=prev_yticks, labels=ytick_labels)

def change_xtick_to_seconds(sr, n_fft, hop_size):
    prev_xticks = plt.xticks()[0][1:-1]
    xtick_labels= [time_bin_to_second(bin_index, sr=sr, n_fft=n_fft, hop_size=hop_size) for bin_index in prev_xticks]
    plt.xticks(ticks=prev_xticks, labels=xtick_labels)
    
def change_wav_xticks_to_seconde(sr):
    prev_xticks = plt.xticks()[0][1:-1]
    xtick_labels= [frame/sr for frame in prev_xticks]
    plt.xticks(ticks=prev_xticks, labels=xtick_labels)

# Mel-spectrogram plotting with prediction

I prefer interactive analysis with ipywidgets.

Selecting an item from the dropdown menu displays the corresponding outputs:
- The full mel-spectrogram of the ".ogg" file
    - The red box marks the selected 5-second segment.
- The audio of the selected 5-second segment (i.e. the area inside the red box)
- The waveform of the same 5-second segment
- The mel-spectrogram of the same 5-second segment
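
The position of the red box follows from converting the clip's start time into spectrogram frames. The sketch below shows that conversion with the notebook's `SR` and `HOP_LENGTH`; the helper name `clip_to_frame_box` is hypothetical, and the clip id's number is assumed to be the start time in seconds.

```python
SR = 32000
HOP_LENGTH = 256
SEC = 5

def clip_to_frame_box(start_sec, sr=SR, hop_length=HOP_LENGTH, clip_sec=SEC):
    """Return (first_frame, width_in_frames) of a clip on the full spectrogram."""
    first_frame = (start_sec * sr) // hop_length
    width = (clip_sec * sr) // hop_length
    return first_frame, width

print(clip_to_frame_box(140))  # (17500, 625)
```

The plotting code shifts the box by one frame and uses a height slightly below `n_mels` so that the outline stays inside the image.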


In [None]:
HOP_LENGTH = 256
N_FFT = 1024
SR = 32000

def file_select(filename: str):
    # `filename` is a dropdown option: "<species>/<recording>_<start>_<prediction>".
    file_path = Path(PATH) / Path(filename.split("_")[0]).with_suffix(".ogg")
    start = int(filename.split("_")[1])  # treated as the clip start in seconds
    filename = "_".join(filename.split("_")[:2])
    y, sr = torchaudio.load(file_path)
    # Convert seconds to samples before slicing out the 5-second clip.
    clip = y[:, start*SR:(start+5)*SR]
    ipd.display(ipd.Audio(clip, rate=sr))
    prediction = predict_classes(filename)
    print(f"prediction : {prediction} filename : {filename}")
    mono_clip = clip.mean(dim=0)
    mel = db_converter(mel_converter(mono_clip))
    #mel = db_converter(spec_converter(mono_clip))
    plt.figure(figsize=(16,6), dpi=120)
    # Top: full recording, with a red box around the selected clip.
    plt.subplot(311)
    plt.imshow(db_converter(mel_converter(y.mean(dim=0))), interpolation='nearest', aspect='auto', origin="lower")
    change_xtick_to_seconds(SR, N_FFT, HOP_LENGTH)
    #change_ytick_to_frequency(SR,N_FFT)
    # Box width: 5 s * SR / HOP_LENGTH = 625 frames.
    plt.gca().add_patch(Rectangle(((start*SR)//HOP_LENGTH+1, 1), 625, 125, linewidth=1, edgecolor='r', facecolor='none'))
    plt.title("full mel-spectrogram")
    # Middle: waveform of the selected clip.
    plt.subplot(312)
    plt.plot(mono_clip)
    change_wav_xticks_to_seconde(SR)
    plt.margins(0)
    plt.title(f"5-sec waveform {start}-{start+5}sec")
    # Bottom: mel-spectrogram of the selected clip.
    plt.subplot(313)
    plt.imshow(mel, interpolation='nearest', aspect='auto', origin="lower")
    change_xtick_to_seconds(SR, N_FFT, HOP_LENGTH)
    #change_ytick_to_frequency(SR,N_FFT)
    plt.title(f"5-sec mel-spectrogram {start}-{start+5}sec")
    plt.suptitle(f"{filename}      [prediction : {prediction}]")
    plt.tight_layout()


widgets.interact(file_select, filename=options);

# Birds

Information on the birds that appear in this notebook.

### skylar
https://ebird.org/species/skylar#

<img src="https://cdn.download.ams.birds.cornell.edu/api/v1/asset/311380331/1800" width="400">

### warwhe1
https://ebird.org/species/warwhe1#

<img src="https://cdn.download.ams.birds.cornell.edu/api/v1/asset/188765311/1800" width="400">

### yefcan
https://ebird.org/species/yefcan#

<img src="https://cdn.download.ams.birds.cornell.edu/api/v1/asset/97651761/1800" width="400">

### houfin
https://ebird.org/species/houfin#

<img src="https://cdn.download.ams.birds.cornell.edu/api/v1/asset/306327341/1800" width="400">

# Results and insights
## Results
For some files, the model predicts that a bird is present even though the clip is nocall.

## Insights
In these examples, mel bins with an index below 40 contain little useful information (this holds for these examples only).

Background noise is generally concentrated in the low-frequency band, so cropping the low-frequency part of the mel-spectrogram is also an option.
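
To relate a mel bin index to an actual frequency, the sketch below estimates the center frequency of a mel filter. It assumes the HTK mel formula and a filterbank spanning 0 Hz to `SR / 2`, which I believe match the torchaudio defaults; the helper names are hypothetical, and the result is an approximation rather than the exact filter used by `mel_converter`.

```python
import math

SR = 32000
N_MELS = 128

def hz_to_mel(hz):
    # HTK mel formula (assumed to match the converter's mel scale).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of the HTK mel formula.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_bin_center_hz(bin_index, sr=SR, n_mels=N_MELS, f_min=0.0, f_max=None):
    """Approximate center frequency of the triangular mel filter `bin_index`."""
    f_max = sr / 2 if f_max is None else f_max
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels filters are defined by n_mels + 2 equally spaced mel points;
    # filter i peaks at point i + 1.
    step = (m_max - m_min) / (n_mels + 1)
    return mel_to_hz(m_min + (bin_index + 1) * step)

print(round(mel_bin_center_hz(40)))
```

Under these assumptions, bin 40 sits at roughly 1.2 kHz, so cropping everything below it with something like `mel[40:, :]` discards the band up to about that frequency.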

If you want to use SpecAugment, restrict frequency masking to mel bins 40 and below. Otherwise, the masks will cover the bird calls.
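
A restricted frequency mask could look like the following NumPy sketch. It is not the SpecAugment implementation of any particular library: the function name `mask_low_mel_bins`, the `max_width` of 8 bins, and the fill value of 0 are all assumptions made for illustration.

```python
import numpy as np

def mask_low_mel_bins(mel, max_bin=40, max_width=8, rng=None):
    """Zero a random band of mel bins that lies entirely below `max_bin`."""
    rng = np.random.default_rng() if rng is None else rng
    width = int(rng.integers(0, max_width + 1))         # band height in bins
    start = int(rng.integers(0, max_bin - width + 1))   # lowest masked bin
    masked = mel.copy()
    masked[start:start + width, :] = 0.0
    return masked

# Dummy (n_mels, n_frames) array standing in for a 5-second mel-spectrogram.
mel = np.ones((128, 626), dtype=np.float32)
out = mask_low_mel_bins(mel, rng=np.random.default_rng(0))
print(out[40:].min())  # bins 40 and above are untouched
```

Because `start + width` never exceeds `max_bin`, the region where the bird calls appear in these examples is never masked.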


