# Audio Classification

In this notebook, we demonstrate how to use a custom SageMaker PyTorch container to train an acoustic classification model in SageMaker script mode.

The model in this example follows the paper *Very Deep Convolutional Neural Networks for Raw Waveforms* by Wei Dai et al.; see the paper for more details.


### Dataset
We will use the UrbanSound8K dataset to train our network. It is freely available at <https://urbansounddataset.weebly.com/> and contains 10 audio classes with over 8000 audio samples. Once you have downloaded the compressed dataset, extract it so that it matches the directory structure shown further down. A CSV file, `UrbanSound8K.csv`, contains the metadata for all the audio clips.

Alternatively, the dataset is also available on Kaggle <https://www.kaggle.com/chrisfilo/urbansound8k/download>.
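To get a quick feel for the metadata file, the sketch below counts clips per fold using only the standard library. The column names (`slice_file_name`, `fold`, `classID`, and so on) are those of the published metadata file; the helper name `clips_per_fold` and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def clips_per_fold(csv_file) -> Counter:
    """Count clips per fold from an open UrbanSound8K-style metadata file."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["fold"]) for row in reader)


# Tiny made-up sample in the same column layout, for illustration only.
sample = io.StringIO(
    "slice_file_name,fsID,start,end,salience,fold,classID,class\n"
    "a.wav,1,0.0,4.0,1,1,3,dog_bark\n"
    "b.wav,2,0.0,4.0,1,1,8,siren\n"
    "c.wav,3,0.0,4.0,2,10,3,dog_bark\n"
)
counts = clips_per_fold(sample)
print(counts)
```

On the real dataset, the same function could be called with `open("../data/UrbanSound8K/UrbanSound8K.csv", newline="")` in place of the in-memory sample.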

The following are the class labels:
```
0 = air_conditioner
1 = car_horn
2 = children_playing
3 = dog_bark
4 = drilling
5 = engine_idling
6 = gun_shot
7 = jackhammer
8 = siren
9 = street_music
```
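When reading model predictions later on, it can help to keep this mapping as a small lookup table. The following is only a sketch: `CLASS_LABELS` and `label_for` are names introduced here and are not part of the notebook's source code.

```python
# Class IDs and names as listed above (underscored spellings as in the dataset metadata).
CLASS_LABELS = {
    0: "air_conditioner",
    1: "car_horn",
    2: "children_playing",
    3: "dog_bark",
    4: "drilling",
    5: "engine_idling",
    6: "gun_shot",
    7: "jackhammer",
    8: "siren",
    9: "street_music",
}


def label_for(class_id: int) -> str:
    """Return the class name for a numeric class ID (hypothetical helper)."""
    return CLASS_LABELS[int(class_id)]


print(label_for(3))  # dog_bark
```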


The expected directory structure, relative to this notebook, is as follows:

```
../data/UrbanSound8K/
|-- fold1
|   |-- 1.wav
|-- fold2
|   |-- 2.wav
...
|   
`-- UrbanSound8K.csv
```
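The layout above can also be checked programmatically before any training starts. This is a minimal sketch, assuming ten fold directories (`fold1` to `fold10`) next to the metadata CSV; `missing_dataset_paths` is a hypothetical helper, not something the notebook ships.

```python
from pathlib import Path
from typing import List


def missing_dataset_paths(root: Path, num_folds: int = 10) -> List[Path]:
    """Return the expected paths that are absent under `root` (an empty list means OK)."""
    expected = [root / f"fold{i}" for i in range(1, num_folds + 1)]
    expected.append(root / "UrbanSound8K.csv")
    return [p for p in expected if not p.exists()]


missing = missing_dataset_paths(Path("../data/UrbanSound8K"))
print("Missing:", [str(p) for p in missing] or "nothing")
```

An empty result suggests the archive was extracted to the right place; any listed path points at what still needs to be moved or downloaded.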

Let's play a sample file to confirm that the dataset has been downloaded to the correct location.

In [None]:
from IPython.display import Audio

filename = '../data/UrbanSound8K/fold1/101415-3-0-2.wav'
Audio(filename, autoplay=True)

## Step 1: Create custom container based on SageMaker PyTorch Deep Learning Framework

Set `role` to the ARN of your SageMaker execution role.

In [None]:
role = "arn:aws:iam::342474125894:role/service-role/xxx"

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
import warnings
warnings.filterwarnings('ignore')

ecr_repository_name = 'pytorch-audio-classification'
account_id = role.split(':')[4]
region = boto3.Session().region_name
sess = sagemaker.session.Session()
bucket = sess.default_bucket()

print('Account: {}'.format(account_id))
print('Region: {}'.format(region))
print('Role: {}'.format(role))
print('S3 Bucket: {}'.format(bucket))

### Create Dockerfile

We will build a custom container on top of an existing SageMaker deep learning container by installing the additional Linux package `libsndfile1`, which is required by the Python package `librosa`.

In [None]:
%%writefile Dockerfile

FROM 763104351884.dkr.ecr.ap-southeast-1.amazonaws.com/pytorch-training:1.5.1-gpu-py3

RUN apt-get update \
    && apt-get install -y  --allow-downgrades --allow-change-held-packages --no-install-recommends \
    libsndfile1


### Build training container

Next, we create a script that builds the custom container image and pushes it to ECR. The image has to be in the same region where the training job runs.

In [None]:
%%writefile build_and_push.sh

ACCOUNT_ID=$1
REGION=$2
REPO_NAME=$3
DOCKERFILE=$4
SERVER="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

echo "ACCOUNT_ID: ${ACCOUNT_ID}"
echo "REPO_NAME: ${REPO_NAME}"
echo "REGION: ${REGION}"
echo "DOCKERFILE: ${DOCKERFILE}"

# Login to retrieve base container
aws ecr get-login-password | docker login --username AWS --password-stdin 763104351884.dkr.ecr.${REGION}.amazonaws.com

docker build -q -f ${DOCKERFILE} -t ${REPO_NAME} .

docker tag ${REPO_NAME} ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:latest

aws ecr get-login-password | docker login --username AWS --password-stdin ${SERVER}
aws ecr describe-repositories --repository-names ${REPO_NAME} || aws ecr create-repository --repository-name ${REPO_NAME}

docker push ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:latest


In [None]:
!bash build_and_push.sh $account_id $region $ecr_repository_name Dockerfile

In [None]:
train_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print('ECR training container URI: {}'.format(train_image_uri))

The Docker image is now pushed to ECR. In the next section, we will show how to train an acoustic classification model using the custom container.

## Step 2: Training on custom container

### Upload Training Data

Upload the local training dataset to Amazon S3. The training job reads its data from S3, so the S3 URL of the dataset is passed into the `fit()` call. Because the dataset is large, the upload will take a while to complete.

In [None]:
data = "../data/UrbanSound8K"

train_data = sess.upload_data(
    data,
    bucket=bucket,
    key_prefix="UrbanSound8K",
)

train_data = sagemaker.session.s3_input(train_data,
                                    distribution='FullyReplicated',
                                    content_type='csv',
                                    s3_data_type='S3Prefix')


### Start Training

Define the configuration of the training job. `image_name` is where you provide the URI of your custom container. Hyperparameters are passed to the training script together with the data directory (the directory where the training dataset is stored).

`epochs` and `cv` are set low so that training completes quickly. You can reach 50%+ accuracy by setting `epochs` to 60.
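In script mode, SageMaker passes each hyperparameter to the entry point as a `--name value` command-line argument and exposes the `training` channel directory through the `SM_CHANNEL_TRAINING` environment variable. The sketch below shows how an entry point might read them. It is not the actual `train.py` from `../src`; the argument names mirror the hyperparameter keys used in this notebook, and `--data-dir` is an assumed name.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters the way a script-mode entry point might (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="M5")
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--cv", type=int, default=0)
    parser.add_argument("--stepsize", type=int, default=20)
    parser.add_argument("--num-workers", type=int, default=30)
    # SageMaker exposes the 'training' channel directory through this environment variable.
    parser.add_argument(
        "--data-dir", type=str, default=os.environ.get("SM_CHANNEL_TRAINING", ".")
    )
    return parser.parse_args(argv)


# Simulate the arguments SageMaker would pass for {'epochs': 60, 'batch-size': 64}.
args = parse_args(["--epochs", "60", "--batch-size", "64"])
print(args.epochs, args.batch_size, args.model)
```

Note that `argparse` converts the dash in `--batch-size` to an underscore, so the value is read as `args.batch_size`.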

In [None]:

hyperparams = {'model': 'M5', # Default model. You can implement additional models in train.py
               'epochs': 2, # Set to 2 for demo purposes
               'batch-size': 128,
               'cv': 0, # Set to 1 to perform 10-fold cross-validation over the whole dataset
               'stepsize': 20, # Optimizer step size
               'num-workers': 30,
              }

pytorch_estimator = PyTorch(image_name=train_image_uri,
                            entry_point='train.py',
                            source_dir='../src',
                            role=role,
                            train_instance_type='ml.p3.8xlarge',
                            train_instance_count=1,
                            py_version='py3',
                            framework_version='1.5.1',
                            hyperparameters = hyperparams,
                           )


pytorch_estimator.fit({'training': train_data}, wait=True)

### Retrieve model location

In [None]:
model_location = pytorch_estimator.model_data
print(model_location)

## Step 3: Inference

For inference, we use the default PyTorch inference image. The mandatory `model_fn` is implemented in `inference.py`. `PyTorchModel` is used to deploy the model that we trained previously.

### Deploy model

In [None]:
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data=model_location, 
                             role=role, 
                             entry_point='inference.py',
                             source_dir='../src',
                             py_version='py3',
                             framework_version='1.5.1',
                            )
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.p2.8xlarge', wait=True)


In [None]:
pytorch_model.endpoint_name

### Install python package

Install the Python packages needed to load the sample test data.

In [None]:
!pip install -q librosa==0.7.2 numba==0.48

### Perform inference on sample test data

Create a data loader to perform inference in batches.

In [None]:
from torch.utils.data import Dataset
import numpy as np
import librosa
from pathlib import Path
from typing import Iterable
import pandas as pd
import torch

class UrbanSoundDataset(Dataset):
    def __init__(
        self, csv_path: Path, file_path: Path, folderList: Iterable[int], new_sr=8000, audio_len=20, sampling_ratio=5
    ):
        """Load UrbanSound8K clips from the selected folds as fixed-length waveforms.

        Args:
            csv_path (Path): Path to dataset metadata csv
            file_path (Path): Path to data folders
            folderList (Iterable[int]): Data folders to be included in dataset
            new_sr (int, optional): New sampling rate. Defaults to 8000.
            audio_len (int, optional): Clip length in seconds after padding or truncation. Defaults to 20.
            sampling_ratio (int, optional): Additional downsampling ratio. Defaults to 5.
        """

        df = pd.read_csv(csv_path)
        self.file_names = []
        self.labels = []
        self.folders = []
        for i in range(0, len(df)):
            if df.iloc[i, 5] in list(folderList):
                self.labels.append(df.iloc[i, 6])
                self.folders.append(df.iloc[i, 5])
                temp = "fold" + str(df.iloc[i, 5]) + "/" + str(df.iloc[i, 0])
                temp = Path(file_path) / temp  # also works when file_path is passed as a str
                self.file_names.append(temp)

        self.file_path = Path(file_path)
        self.folderList = folderList
        self.new_sr = new_sr
        self.audio_len = audio_len
        self.sampling_ratio = sampling_ratio

    def __getitem__(self, index):
        # format the file path and load the file
        path = self.file_names[index]
        sound, sr = librosa.core.load(str(path), mono=False, sr=None)
        if sound.ndim < 2:
            sound = np.expand_dims(sound, axis=0)
        # Convert into single channel format
        sound = sound.mean(axis=0, keepdims=True)
        # Downsampling
        sound = librosa.core.resample(sound, orig_sr=sr, target_sr=self.new_sr)

        # Zero padding to keep desired audio length in seconds
        const_len = self.new_sr * self.audio_len
        tempData = np.zeros([1, const_len])
        if sound.shape[1] < const_len:
            tempData[0, : sound.shape[1]] = sound[:]
        else:
            tempData[0, :] = sound[0, :const_len]
        sound = tempData
        # Additional downsampling: keep every `sampling_ratio`-th sample
        new_const_len = const_len // self.sampling_ratio
        soundFormatted = torch.zeros([1, new_const_len])
        soundFormatted[0, :] = torch.tensor(sound[0, :: self.sampling_ratio][:new_const_len], dtype=torch.float32)

        return soundFormatted, self.labels[index]

    def __len__(self):
        return len(self.file_names)


In [None]:
test_folder = [10]
datapath = Path("../data/UrbanSound8K")
csvpath = datapath / "UrbanSound8K.csv"

test_set = UrbanSoundDataset(csvpath, datapath, test_folder)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=5, shuffle=True)

In [None]:
X, y = next(iter(test_loader))
print(X.shape, y)
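The printed shape follows directly from the dataset defaults: each clip is padded or truncated to `new_sr * audio_len` samples and then thinned by `sampling_ratio`. The arithmetic can be sketched as below; `expected_batch_shape` is a hypothetical helper used only to make the calculation explicit.

```python
def expected_batch_shape(batch_size=5, new_sr=8000, audio_len=20, sampling_ratio=5):
    """Shape of one batch from the loader above: (batch, channels, samples)."""
    const_len = new_sr * audio_len           # samples after padding/truncating to audio_len seconds
    n_samples = const_len // sampling_ratio  # samples kept after the extra downsampling step
    return (batch_size, 1, n_samples)


print(expected_batch_shape())  # (5, 1, 32000)
```

With the defaults, 8000 × 20 = 160000 samples are reduced to 32000, so the model effectively sees 20 seconds of audio at 1600 samples per second.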

The next cell sends the batch to the endpoint and prints the class IDs predicted by the model.

In [None]:
response = predictor.predict(X.numpy())
response = np.transpose(response, (1, 0, 2))
prediction = response[0].argmax(axis=1)
print(prediction)
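To compare the predicted class IDs against the true labels of the batch, a simple accuracy check can be used. This is a sketch with made-up IDs; in the notebook, the two inputs would be `prediction` and `y.tolist()`. The helper name `accuracy` is introduced here for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of positions where predicted and actual class IDs agree."""
    predicted, actual = list(predicted), list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    matches = sum(int(p) == int(a) for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Made-up class IDs for illustration only.
print(accuracy([3, 8, 3, 0, 9], [3, 8, 1, 0, 2]))  # 0.6
```

With only two training epochs, a low score on a single batch of five clips is expected and says little about the model's potential.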

## Step 4: Optional Cleanup

When you're done with the endpoint, you should clean it up.

All of the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account.

In [None]:
predictor.delete_endpoint()