<img src="https://i.pinimg.com/originals/a8/97/b2/a897b2f00156fdebec75f7478a330c7d.jpg" alt="drawing" height="100%" width="100%" class="center"/>

## Understand Business Problem <a id="1"></a>

In this challenge, you will identify which birds are calling in long recordings, given training data generated in meaningfully different contexts. This is the exact problem facing scientists trying to automate the remote monitoring of bird populations.


## Data Overview <a id="2"></a>


`train_audio` : The train data consists of short recordings of individual bird calls generously uploaded by users of xenocanto.org.

`test_audio` : The hidden test_audio directory contains approximately 150 recordings in mp3 format, each roughly 10 minutes long. They will not all fit in a notebook's memory at the same time. The recordings were taken at three separate remote locations. Sites 1 and 2 were labeled in 5 second increments and need matching predictions, but due to the time consuming nature of the labeling process the site 3 files are only labeled at the file level. Accordingly, site 3 has relatively few rows in the test set and needs lower time resolution predictions.

Two example soundscapes from another data source are also provided to illustrate how the soundscapes are labeled and the hidden dataset folder structure. The two example audio files are `BLKFR-10-CPL_20190611_093000.pt540.mp3` and `ORANGE-7-CAP_20190606_093000.pt623.mp3`. These soundscapes were kindly provided by Jack Dumbacher of the California Academy of Science's Department of Ornithology and Mammalogy.

`test.csv` Only the first three rows are available for download; the full `test.csv` is in the hidden test set.

- `site`: Site ID.

- `row_id`: ID code for the row.

- `seconds`: the second ending the time window, if any. Site 3 time windows cover the entire audio file and have null entries for seconds.

- `audio_id`: ID code for the audio file.

`example_test_audio_metadata.csv` Complete metadata for the example test audio. These labels have higher time precision than is used for the hidden test set.

`example_test_audio_summary.csv` Metadata for the example test audio, converted to the same format as used in the hidden test set.

- `filename_seconds`: a row identifier.

- `birds`: all ebird codes present in the time window.

- `filename`: audio file names

- `seconds`: the second ending the time window.
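
The `seconds` convention described above can be turned into an audio window per test row. The following is a minimal sketch, not part of the original notebook: the helper name `prediction_window` and the site label `"site_3"` are assumptions made for illustration, and the 5-second window length comes from the labeling increment stated for sites 1 and 2.

```python
import math
from typing import Optional, Tuple

CLIP_SECONDS = 5  # labeling increment for sites 1 and 2, per the data description


def prediction_window(site: str, seconds: Optional[float]) -> Tuple[float, Optional[float]]:
    """Return (start, end) in seconds; end is None when the whole file is used.

    Hypothetical helper: the site label "site_3" is an assumed spelling.
    """
    missing = seconds is None or (isinstance(seconds, float) and math.isnan(seconds))
    if site == "site_3" or missing:
        # Site 3 rows are labeled at the file level, so use the entire recording.
        return 0.0, None
    # `seconds` marks the END of the window, so the start is 5 seconds earlier.
    return float(seconds) - CLIP_SECONDS, float(seconds)


rows = [
    {"site": "site_1", "seconds": 5},
    {"site": "site_2", "seconds": 15},
    {"site": "site_3", "seconds": None},
]
windows = [prediction_window(r["site"], r["seconds"]) for r in rows]
print(windows)
```

When the CSV is read with pandas, null `seconds` entries usually arrive as `NaN` rather than `None`, which is why the helper checks for both.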

`train.csv` A wide range of metadata is provided for the training data. The most directly relevant fields are:

- `ebird_code`: a code for the bird species. You can review detailed information about the bird codes by appending the code to https://ebird.org/species/, such as https://ebird.org/species/amecro for the American Crow.

- `recordist`: the user who provided the recording.

- `location`: where the recording was taken. Some bird species may have local call 'dialects', so you may want to seek geographic diversity in your training data.

- `date`: while some bird calls can be made year round, such as an alarm call, some are restricted to a specific season. You may want to seek temporal diversity in your training data.

- `filename`: the name of the associated audio file.

In [None]:
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version nightly --apt-packages libomp5 libopenblas-dev
!pip install pretrainedmodels
!pip install pydub

In [None]:
import numpy as np
import pandas as pd
import librosa
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import random

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold, StratifiedKFold
import math
from collections import OrderedDict

from PIL import Image
import albumentations
from pydub import AudioSegment

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset, DataLoader

import pretrainedmodels

import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

import warnings
warnings.filterwarnings('ignore')

## Preprocessing <a id="3"></a>

In [None]:
train = pd.read_csv("../input/birdsong-recognition/train.csv")
test = pd.read_csv("../input/birdsong-recognition/test.csv")
submission = pd.read_csv("../input/birdsong-recognition/sample_submission.csv")

### e-bird code

`ebird_code` is a code for the bird species. It is the target: we need to predict `ebird_code` from the metadata and audio data.


In [None]:
print("Number of Unique birds : ", train.ebird_code.nunique())

In [None]:
# label encoding for target values
train["ebird_label"] = LabelEncoder().fit_transform(train.ebird_code.values)

### K-Fold

In [None]:
train.loc[:, "kfold"] = -1

# Shuffle with a fixed seed so the fold assignment is reproducible
train = train.sample(frac=1, random_state=42).reset_index(drop=True)

X = train.filename.values
y = train.ebird_code.values

kfold = StratifiedKFold(n_splits=5)

for fold, (t_idx, v_idx) in enumerate(kfold.split(X, y)):
    train.loc[v_idx, "kfold"] = fold

print(train.kfold.value_counts())
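
To make the idea of stratification concrete, here is a small pure-Python sketch. It is an illustration only and not how `StratifiedKFold` is implemented: the function `assign_stratified_folds` is hypothetical and simply deals the samples of each class into folds round-robin, so every fold ends up with roughly the same class mix.

```python
from collections import Counter, defaultdict


def assign_stratified_folds(labels, n_splits=5):
    """Round-robin fold assignment within each class (illustrative only)."""
    folds = [None] * len(labels)
    seen = defaultdict(int)  # how many samples of each class were placed so far
    for i, label in enumerate(labels):
        folds[i] = seen[label] % n_splits
        seen[label] += 1
    return folds


# Toy labels: 10 recordings of one species and 5 of another
labels = ["amecro"] * 10 + ["ameavo"] * 5
folds = assign_stratified_folds(labels, n_splits=5)

# Count how many samples of each class landed in each fold
per_fold = Counter(zip(folds, labels))
for fold in range(5):
    print(fold, per_fold[(fold, "amecro")], per_fold[(fold, "ameavo")])
```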

### Arguments

In [None]:
class args:
    
    ROOT_PATH = "../input/birdsong-recognition/train_audio"
    
    num_classes = 264
    max_duration= 5 # seconds
    
    sample_rate = 32000
    
    img_height = 128
    img_width = 313
    std = (0.229, 0.224, 0.225)
    mean = (0.485, 0.456, 0.406)
    
    batch_size = 16
    num_workers = 4
    epochs = 2
    
    lr = 0.001
    wd = 1e-5
    momentum = 0.9
    eps = 1e-8
    betas = (0.9, 0.999)
    
    melspectrogram_parameters = {
        "n_mels": 128,
        "fmin": 20,
        "fmax": 16000
    }
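
The value `img_width = 313` appears to match the number of spectrogram frames produced by a 5-second clip at 32 kHz. The arithmetic below is a sketch under one assumption: that the mel spectrogram uses a hop length of 512 samples with centered framing, which is librosa's default when no hop length is passed.

```python
sample_rate = 32000   # args.sample_rate
max_duration = 5      # args.max_duration, in seconds
hop_length = 512      # assumed: librosa's default hop length

n_samples = sample_rate * max_duration
# With centered framing there is one frame per hop, plus one
n_frames = 1 + n_samples // hop_length
print(n_samples, n_frames)  # 160000 313
```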
    

### Loading Audio Files

In [None]:
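def load_audio(path):
    """Load up to args.max_duration seconds of an mp3 as a mono sample array."""
    try:
        sound = AudioSegment.from_mp3(path)
        # Downmix to mono so stereo channels are not interleaved in the array
        sound = sound.set_frame_rate(args.sample_rate).set_channels(1)
        sound = sound[:args.max_duration * 1000]  # pydub slices in milliseconds
        sound_array = np.array(sound.get_array_of_samples())
    except Exception:
        # Fall back to random noise if the file cannot be decoded
        sound_array = np.random.rand(args.sample_rate * args.max_duration)

    return sound_array, args.sample_rate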
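
Recordings shorter than `max_duration` come back from `load_audio` with fewer samples, and their spectrograms are later stretched by the resize step. One possible alternative, sketched here as an assumption rather than something the notebook does, is to crop or zero-pad every clip to a fixed number of samples first. The helper name `to_fixed_length` is hypothetical.

```python
import numpy as np


def to_fixed_length(samples, target_len):
    """Crop or zero-pad a 1-D sample array to exactly target_len samples."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target_len:
        return samples[:target_len]  # keep only the first target_len samples
    padded = np.zeros(target_len, dtype=np.float32)
    padded[:len(samples)] = samples  # original audio first, silence after
    return padded


short_clip = np.arange(3, dtype=np.int16)
long_clip = np.arange(10, dtype=np.int16)

fixed_short = to_fixed_length(short_clip, 5)
fixed_long = to_fixed_length(long_clip, 5)
print(fixed_short, fixed_long)
```

In the notebook's setting, `target_len` would be `args.sample_rate * args.max_duration`.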

In [None]:
# Example
sound, sample_rate = load_audio("../input/birdsong-recognition/train_audio/ameavo/XC139921.mp3")
plt.plot(sound)
plt.show()

### Melspectrogram

- Extract mel spectrogram features from the raw audio using librosa.

In [None]:
def Melspectrogram(audio_path):
        
    y, sr = load_audio(audio_path)
    y = y.astype(float)

    # Pass the signal as a keyword argument; newer librosa versions require y=...
    melspec = librosa.feature.melspectrogram(y=y, sr=args.sample_rate, **args.melspectrogram_parameters)
    melspec = librosa.power_to_db(melspec).astype(np.float32)

    return melspec


# Example
spect = Melspectrogram("../input/birdsong-recognition/train_audio/ameavo/XC139921.mp3")
print("shape ", spect.shape)

plt.figure(figsize=(5 ,30))
plt.imshow(spect)
plt.show()
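
The `power_to_db` step converts the power spectrogram to decibels. The NumPy sketch below mirrors that conversion as I understand it from librosa's documented defaults (`ref=1.0`, `amin=1e-10`, `top_db=80.0`); the function name `power_to_db_sketch` is hypothetical and the code is meant for intuition, not as a replacement for the library call.

```python
import numpy as np


def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert power values to decibels relative to `ref`."""
    S = np.asarray(S, dtype=np.float64)
    # Floor tiny values at amin so the logarithm stays finite
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Limit the dynamic range to top_db below the loudest value
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


power = np.array([1.0, 10.0, 100.0, 1e-12])
db = power_to_db_sketch(power)
print(db)
```

The last value shows the effect of `top_db`: it would be -100 dB after the `amin` floor, but it is raised to 80 dB below the maximum of 20 dB.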

### Pytorch DataLoader

In [None]:
class BirdDataset(Dataset):
    def __init__(self, df):
        
        self.filename = df.filename.values
        self.ebird_label = df.ebird_label.values
        self.ebird_code = df.ebird_code.values
        
        self.aug = albumentations.Compose([
                albumentations.Resize(args.img_height, args.img_width, always_apply=True),
                albumentations.Normalize(args.mean, args.std, always_apply=True)
            ])
    
    def __len__(self):
        return len(self.filename)
    
    def __getitem__(self, item):
        
        filename = self.filename[item]
        ebird_code = self.ebird_code[item]
        ebird_label = self.ebird_label[item]

        spect = Melspectrogram(f"{args.ROOT_PATH}/{ebird_code}/{filename}")
        spect = Image.fromarray(spect).convert("RGB") # converted to RGB
        spect = self.aug(image=np.array(spect))["image"] # apply augmentation
        spect = np.transpose(spect, (2, 0, 1)).astype(np.float32)
        
        target = ebird_label
        
        return {
            "spect" : torch.tensor(spect, dtype=torch.float), 
            "target" : torch.tensor(target, dtype=torch.long)
        }
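
Decibel values are not confined to the 0-255 range that an 8-bit RGB image can represent. A common preparation step, shown here as an optional sketch and not as what the dataset above does, is to min-max scale the spectrogram to 0-255 before the image conversion. The helper name `scale_to_uint8` is hypothetical.

```python
import numpy as np


def scale_to_uint8(spect, eps=1e-6):
    """Min-max scale a spectrogram to the 0-255 range of an 8-bit image."""
    spect = np.asarray(spect, dtype=np.float32)
    lo, hi = spect.min(), spect.max()
    if hi - lo < eps:
        # A constant spectrogram carries no contrast; return all zeros
        return np.zeros(spect.shape, dtype=np.uint8)
    scaled = 255.0 * (spect - lo) / (hi - lo)
    return scaled.astype(np.uint8)


example = np.array([[-80.0, -40.0, 0.0]])
image = scale_to_uint8(example)
flat = scale_to_uint8(np.full((2, 2), -20.0))
print(image, flat)
```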
    
    

In [None]:
# Example 
dataset = BirdDataset(train)
d = dataset[10]

d["spect"].shape, d["target"]

### ResNet50 Model

In [None]:
class ResNet50(nn.Module):
    def __init__(self, pretrained):
        super(ResNet50, self).__init__()
        if pretrained is True:
            self.model = pretrainedmodels.__dict__["resnet50"](pretrained="imagenet")
        else:
            self.model = pretrainedmodels.__dict__["resnet50"](pretrained=None)
        
        self.l0 = nn.Linear(2048, args.num_classes)
        
    def forward(self, x):
        bs, _, _, _ = x.shape
        x = self.model.features(x)
        x = F.adaptive_avg_pool2d(x, 1).reshape(bs, -1)
        x = self.l0(x)
        
        return x
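
In `forward`, `F.adaptive_avg_pool2d(x, 1)` followed by `reshape(bs, -1)` amounts to averaging each channel over its spatial positions. The NumPy sketch below illustrates that on a tiny array; the shapes are made up for readability and are much smaller than the 2048-channel feature map the model produces.

```python
import numpy as np

# Toy feature map with shape (batch, channels, height, width)
features = np.arange(2 * 3 * 2 * 2, dtype=np.float32).reshape(2, 3, 2, 2)

# Global average pooling: mean over the height and width axes
pooled = features.mean(axis=(2, 3))

print(features.shape, pooled.shape)
print(pooled[0])
```

Because the pooled vector has one value per channel regardless of the input size, the final linear layer can stay at `nn.Linear(2048, args.num_classes)` for any spectrogram width.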
    

### Utility functions

In [None]:
def to_list(tensor):
    return tensor.detach().cpu().tolist()

def reduce_fn(vals):
    return sum(vals) / len(vals)

class AverageMeter(object):
    """Computes and stores the average and current values"""
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

def get_position_accuracy(logits, labels):
    predictions = np.argmax(F.softmax(logits, dim=1).cpu().data.numpy(), axis=1)
    labels = labels.cpu().data.numpy()
    total_num = 0
    sum_correct = 0
    for i in range(len(labels)):
        if labels[i] >= 0:
            total_num += 1
            if predictions[i] == labels[i]:
                sum_correct += 1
    if total_num == 0:
        total_num = 1e-7
    return np.float32(sum_correct) / total_num, total_num
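
As a quick illustration of how the running average is weighted by batch size, here is a self-contained copy of the meter with a short usage example. The numbers are made up for the demonstration.

```python
class AverageMeter:
    """Tracks the latest value and a sample-weighted running average."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        # `val` is a per-batch mean and `n` is the number of samples behind it
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = AverageMeter()
meter.update(0.5, n=16)  # a full batch with accuracy 0.5
meter.update(1.0, n=4)   # a smaller final batch with accuracy 1.0
print(meter.val, meter.count, meter.avg)
```

The average is (0.5 * 16 + 1.0 * 4) / 20 = 0.6, not the unweighted mean 0.75 of the two batch values.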

### Loss function

In [None]:
def loss_fn(preds, labels):
    loss = nn.CrossEntropyLoss(ignore_index=-1)(preds, labels)
    return loss
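
For intuition, the cross-entropy with `ignore_index=-1` can be written out in NumPy: compute log-softmax, pick the log-probability of the true class, and average over the rows whose label is not the ignored value. This is a sketch for understanding only; the function name `cross_entropy_ignore` is hypothetical, and training should keep using `nn.CrossEntropyLoss`.

```python
import numpy as np


def cross_entropy_ignore(logits, labels, ignore_index=-1):
    """Mean cross-entropy over rows whose label differs from ignore_index."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax along the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Keep only the rows that carry a real label
    rows = np.nonzero(labels != ignore_index)[0]
    picked = log_probs[rows, labels[rows]]
    return -picked.mean()


logits = [[0.0, 0.0], [5.0, -5.0]]
labels = [0, -1]  # the second row is ignored
loss = cross_entropy_ignore(logits, labels)
print(loss)
```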

### train & validation functions

In [None]:
def train_fn(train_loader, model, optimizer, epoch):
    total_loss = AverageMeter()
    accuracies = AverageMeter()
    
    model.train()

    t = tqdm(train_loader, disable=not xm.is_master_ordinal())
    for step, d in enumerate(t):
        
        spect = d["spect"].to(args.device)
        targets = d["target"].to(args.device)
        
        outputs = model(spect)

        loss = loss_fn(outputs, targets)

        acc, n_position = get_position_accuracy(outputs, targets)
        

        total_loss.update(loss.item(), n_position)
        accuracies.update(acc, n_position)

        optimizer.zero_grad()
        
        loss.backward()
        xm.optimizer_step(optimizer)
        
        t.set_description(f"Train E:{epoch+1} - Loss:{total_loss.avg:0.4f} - Acc:{accuracies.avg:0.4f}")
    
    return total_loss.avg

def valid_fn(valid_loader, model, epoch):
    total_loss = AverageMeter()
    accuracies = AverageMeter()
    
    model.eval()

    t = tqdm(valid_loader, disable=not xm.is_master_ordinal())
    for step, d in enumerate(t):
        
        with torch.no_grad():
        
            spect = d["spect"].to(args.device)
            targets = d["target"].to(args.device)

            outputs = model(spect)

            loss = loss_fn(outputs, targets)

            acc, n_position = get_position_accuracy(outputs, targets)


            total_loss.update(loss.item(), n_position)
            accuracies.update(acc, n_position)
            
            t.set_description(f"Eval E:{epoch+1} - Loss:{total_loss.avg:0.4f} - Acc:{accuracies.avg:0.4f}")

    return total_loss.avg, accuracies.avg

In [None]:
def main(fold_index):
    
    MX = ResNet50(pretrained=False)
    
    #args.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    
    # Setting seed
    seed = 42
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    
    train_df = train[~train.kfold.isin([fold_index])]
    train_dataset = BirdDataset(df=train_df)
    
    
    valid_df = train[train.kfold.isin([fold_index])]
    valid_dataset = BirdDataset(df=valid_df)
        
    def run():
        
        args.device = xm.xla_device()
        model = MX.to(args.device)

        
        
        
        train_sampler = torch.utils.data.distributed.DistributedSampler(
            train_dataset,
            num_replicas=xm.xrt_world_size(),
            rank=xm.get_ordinal(),
            shuffle=True
        )

        train_loader = DataLoader(
            dataset = train_dataset,
            batch_size = args.batch_size,
            sampler = train_sampler,
            num_workers = args.num_workers,
            pin_memory = True,
            drop_last = False
        )


        
        valid_sampler = torch.utils.data.distributed.DistributedSampler(
            valid_dataset,
            num_replicas=xm.xrt_world_size(),
            rank=xm.get_ordinal(),
            shuffle=False  # validation order does not need to be shuffled
        )

        valid_loader = DataLoader(
            dataset = valid_dataset,
            batch_size = args.batch_size,
            sampler = valid_sampler,
            num_workers = args.num_workers,
            pin_memory = True,
            drop_last = False
        )
        
        optimizer = torch.optim.AdamW(model.parameters(),
                                          lr=args.lr * xm.xrt_world_size(),
                                          betas=args.betas,
                                          eps=args.eps,
                                          weight_decay=args.wd
                                     )
        
        xm.master_print("Training is Starting.........")

        best_acc = 0

        for epoch in range(args.epochs):
            para_loader = pl.ParallelLoader(train_loader, [args.device])
            train_loss = train_fn(para_loader.per_device_loader(args.device), model, optimizer, epoch)
            
            para_loader = pl.ParallelLoader(valid_loader, [args.device])
            valid_loss, valid_acc = valid_fn(para_loader.per_device_loader(args.device), model, epoch)

            # Average the per-core accuracy across all TPU cores before reporting it
            acc = xm.mesh_reduce("acc_reduce", valid_acc, reduce_fn)

            xm.master_print(f"**** Epoch {epoch+1} **==>** Accuracy = {acc}")

            if acc > best_acc:
                xm.master_print("**** Model Improved !!!! Saving Model")
                xm.save(model.state_dict(), f"fold_{fold_index}.bin")
                best_acc = acc
    
    def _mp_fn(rank, flags):
        torch.set_default_tensor_type('torch.FloatTensor')
        run()
    
    FLAGS={}
    xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=8, start_method='fork')
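
Two details of the multi-core setup are easier to see in isolation: how a distributed sampler splits the dataset between replicas, and how the learning rate is scaled by the number of cores. The sketch below reproduces the sharding logic in plain Python without shuffling. The function name `shard_indices` is hypothetical, and the padding rule (repeat indices from the start until the total divides evenly) reflects my understanding of `DistributedSampler` when `drop_last` is false.

```python
import math


def shard_indices(n_samples, num_replicas, rank):
    """Indices one replica would see, without shuffling (illustrative)."""
    indices = list(range(n_samples))
    per_replica = math.ceil(n_samples / num_replicas)
    total = per_replica * num_replicas
    # Pad by repeating samples from the start so every replica gets the same count
    indices += indices[: total - len(indices)]
    # Each replica takes every num_replicas-th index, starting at its rank
    return indices[rank:total:num_replicas]


num_cores = 8  # matches nprocs=8 in xmp.spawn
shards = [shard_indices(10, num_cores, rank) for rank in range(num_cores)]
for rank, shard in enumerate(shards):
    print(rank, shard)

# Learning rate scaling used when building the optimizer: args.lr * world size
base_lr = 0.001
scaled_lr = base_lr * num_cores
print(scaled_lr)
```

Because of the padding, a few samples can be seen twice in one epoch, so per-core validation accuracy is a close approximation rather than an exact figure.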

### 5 Folds

In [None]:
# fold0
main(0)

In [None]:
# fold1
#main(1)

In [None]:
# fold2
#main(2)

In [None]:
# fold3
#main(3)

In [None]:
# fold4
#main(4)
