# Test the Pipeline
This notebook runs the implemented modules end to end to verify that they work together.

## Notes
- Mixup has to be applied to the batch *before* it is passed to the model. In the original implementation, the model receives both x and y in training mode and computes Mixup inside the forward call. Here, the model expects *only* x, so Mixup cannot happen inside the model.
- The modules are still very simple; they can be improved and extended with more functionality.
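
As a rough illustration of the first note, Mixup on a batch could look like the sketch below. `mixup_batch` and `alpha=0.4` are hypothetical and chosen only for this example; this is not the `Mixup` class from `model_utils`, whose interface may differ. The batch shapes are also assumptions (4 waveforms, 21 labels).

```python
import torch


def mixup_batch(x, y, alpha=0.4):
    """Hypothetical helper: blend each sample with a random partner from the same batch."""
    # Mixing coefficient drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Random pairing of samples within the batch.
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed


# Dummy batch: 4 waveforms with multi-hot labels over 21 classes (assumed shapes).
x = torch.randn(4, 16000)
y = torch.randint(0, 2, (4, 21)).float()

x_mixed, y_mixed = mixup_batch(x, y)
print(x_mixed.shape, y_mixed.shape)
```

Only `x_mixed` would then be passed to the model, while `y_mixed` would be used as the target in the loss.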

In [9]:
%load_ext autoreload
%autoreload 2 

from TransformApplier import TransformApplier 
from Wav2Spec import Wav2Spec
from SimpleDataset import SimpleDataset
import pandas as pd 
from PretrainedModel import *
from OnlyXTransform import OnlyXTransform
import torch.nn as nn
import torch
import timm
import json



In [10]:
DATA_PATH = '../birdclef-2022/'
metadata = pd.read_csv(f'{DATA_PATH}train_metadata.csv')

with open(f'{DATA_PATH}scored_birds.json') as f:
    birds = json.load(f)

In [11]:
dataset = SimpleDataset(metadata, DATA_PATH, labels=birds)

In [12]:
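import numpy as np

# Find the first sample with more than two positive labels and inspect it.
for i in range(1000):
    # Call __getitem__ directly so that the debug flag can be passed.
    d = dataset.__getitem__(i, debug=False)
    if np.sum(d[-1]) > 2:  # d[-1] is the multi-hot label vector
        dataset.__getitem__(i, debug=True)  # debug output, see the file path below
        print(i, d)
        break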

train_audio/akepa1/XC122473.ogg
22 (tensor([ 4.8033e-06,  2.8010e-06,  5.4884e-06,  ..., -2.1241e-02,
        -1.3902e-02, -4.4505e-03]), array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]))


In [22]:
from model_utils import Mixup  # must be applied to the batch before it reaches the model

class SimpleAttention(nn.Module):
    """
    Example post-processing step: attention-weighted pooling over dim 1.
    """
    def __init__(self, n_in, width=512, n_out=1):
        super().__init__()
        self.att = nn.Sequential(nn.Linear(n_in, width), nn.ReLU(), nn.Linear(width, n_out))
    
    def forward(self, x):
        weight = torch.softmax(self.att(x), dim=1)
        return (x * weight).sum(1)

transforms1 = TransformApplier([nn.Identity()])

wav2spec = Wav2Spec()

transforms2 = TransformApplier([OnlyXTransform()])

cnn = PretrainedModel(
    model_name='efficientnet_b2', 
    in_chans=1, # normally 3 for RGB-images
)

transforms3 = TransformApplier([SimpleAttention(cnn.get_out_dim())])

output_head = OutputHead(n_in=cnn.get_out_dim(), n_out=21)
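
To make the pooling in `SimpleAttention` easier to follow, here is a small shape check on dummy data. It assumes the input has shape (batch, steps, features); the actual output shape of `PretrainedModel` is not shown in this notebook, and all sizes below are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, steps, n_in = 2, 5, 8
x = torch.randn(batch, steps, n_in)

# Same layer layout as SimpleAttention.att, with a smaller width.
att = nn.Sequential(nn.Linear(n_in, 16), nn.ReLU(), nn.Linear(16, 1))

# One score per step, normalised over the step dimension.
weight = torch.softmax(att(x), dim=1)  # (batch, steps, 1)

# Weighted sum over the steps collapses dim 1.
pooled = (x * weight).sum(1)  # (batch, n_in)

print(weight.shape, pooled.shape)
```

The weights sum to 1 along dim 1, so the result is a weighted average of the per-step feature vectors.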

In [23]:
pipeline = nn.Sequential(
    transforms1, 
    wav2spec,
    transforms2, 
    cnn,
    transforms3, 
    output_head,
)

In [24]:
with torch.no_grad():
    print(pipeline(dataset[2]).shape)

torch.Size([1, 21])
