# **Homework 2 Phoneme Classification**

* Slides: https://docs.google.com/presentation/d/1v6HkBWiJb8WNDcJ9_-2kwVstxUWml87b9CnA16Gdoio/edit?usp=sharing
* Kaggle: https://www.kaggle.com/c/ml2022spring-hw2
* Video: TBA


In [1]:
!nvidia-smi

Sun Mar 13 09:24:58 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 511.79       Driver Version: 511.79       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:07:00.0  On |                  N/A |
|  0%   43C    P5     9W / 100W |   1083MiB /  4096MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Download Data
Download the data from one of the links below (GitHub release, Google Drive, or Dropbox), then unzip it.

After running the following block, you should have:
- `libriphone/train_split.txt`
- `libriphone/train_labels.txt`
- `libriphone/test_split.txt`
- `libriphone/feat/train/*.pt`: training features
- `libriphone/feat/test/*.pt`: testing features

> **Note: if the links are dead, you can download the data directly from [Kaggle](https://www.kaggle.com/c/ml2022spring-hw2/data) and upload it to the workspace, or use [the Kaggle API](https://www.kaggle.com/general/74235) to download the data directly into Colab.**


### Download train/test metadata

In [2]:
# Main link
!wget -O libriphone.zip "https://github.com/xraychen/shiny-robot/releases/download/v1.0/libriphone.zip"

# Backup Link 0
# !pip install --upgrade gdown
# !gdown --id '1o6Ag-G3qItSmYhTheX6DYiuyNzWyHyTc' --output libriphone.zip

# Backup link 1
# !pip install --upgrade gdown
# !gdown --id '1R1uQYi4QpX0tBfUWt2mbZcncdBsJkxeW' --output libriphone.zip

# Backup link 2
# !wget -O libriphone.zip "https://www.dropbox.com/s/wqww8c5dbrl2ka9/libriphone.zip?dl=1"

# Backup link 3
# !wget -O libriphone.zip "https://www.dropbox.com/s/p2ljbtb2bam13in/libriphone.zip?dl=1"

!unzip -q libriphone.zip
!ls libriphone

'wget' is not recognized as an internal or external command,
operable program or batch file.
'unzip' is not recognized as an internal or external command,
operable program or batch file.
'ls' is not recognized as an internal or external command,
operable program or batch file.
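
The output above shows that `wget`, `unzip`, and `ls` are not available in a Windows shell. As a minimal sketch of a portable alternative, the same download and extraction can be done with the Python standard library alone. The helper names `download_file` and `extract_zip` are hypothetical and not part of the original notebook; the URL is the main link from the cell above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Main link copied from the download cell.
LIBRIPHONE_URL = "https://github.com/xraychen/shiny-robot/releases/download/v1.0/libriphone.zip"

def download_file(url, dest):
    """Download url to dest unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest

def extract_zip(zip_path, out_dir="."):
    """Extract every member of zip_path into out_dir; return the sorted top-level names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return sorted({name.split("/")[0] for name in zf.namelist()})

# Usage (performs a network download, so it is left commented out):
# zip_path = download_file(LIBRIPHONE_URL, "libriphone.zip")
# print(extract_zip(zip_path))
```

Skipping the download when the archive already exists avoids fetching the file again each time the cell is re-run.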


### Preparing Data

**Helper functions to pre-process the training data from raw MFCC features of each utterance.**

A phoneme may span several frames and depends on past and future frames. \
Hence we concatenate neighboring frames for training to achieve higher accuracy. The **concat_feat** function concatenates the k past and k future frames around each frame (2k+1 = n frames in total), and we predict the label of the center frame.

Feel free to modify the data preprocessing functions, but **do not drop any frame** (if you modify the functions, remember to check that the number of frames is the same as mentioned in the slides).
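
To make the windowing concrete, here is a minimal plain-Python sketch of the same idea on a toy utterance. It is not the notebook's tensor implementation: the function name `concat_context` is hypothetical, and it uses clamped indexing so that frames near the start or end reuse the first or last frame as padding, which keeps the number of output frames equal to the number of input frames.

```python
def concat_context(frames, n):
    """For each frame, concatenate the n frames centered on it (n must be odd)."""
    assert n % 2 == 1
    k = n // 2
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        window = []
        for offset in range(-k, k + 1):
            src = min(max(t + offset, 0), last)  # clamp to repeat the edge frames
            window.extend(frames[src])
        out.append(window)
    return out

toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 frames, 2 features each
context = concat_context(toy, 3)
for row in context:
    print(row)
# [1.0, 10.0, 1.0, 10.0, 2.0, 20.0]
# [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
# [2.0, 20.0, 3.0, 30.0, 3.0, 30.0]
```

Each output row has `n * feature_dim` values and the row count is unchanged, so no frame is dropped.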

In [3]:
import os
import random
import pandas as pd
import torch
from tqdm import tqdm

def load_feat(path):
    feat = torch.load(path)
    return feat

def shift(x, n):
    if n < 0:
        left = x[0].repeat(-n, 1)
        right = x[:n]

    elif n > 0:
        right = x[-1].repeat(n, 1)
        left = x[n:]
    else:
        return x

    return torch.cat((left, right), dim=0)

def concat_feat(x, concat_n):
    assert concat_n % 2 == 1 # n must be odd
    if concat_n < 2:
        return x
    seq_len, feature_dim = x.size(0), x.size(1)
    x = x.repeat(1, concat_n) 
    x = x.view(seq_len, concat_n, feature_dim).permute(1, 0, 2) # concat_n, seq_len, feature_dim
    mid = (concat_n // 2)
    for r_idx in range(1, mid+1):
        x[mid + r_idx, :] = shift(x[mid + r_idx], r_idx)
        x[mid - r_idx, :] = shift(x[mid - r_idx], -r_idx)

    return x.permute(1, 0, 2).view(seq_len, concat_n * feature_dim)

def preprocess_data(split, feat_dir, phone_path, concat_nframes, train_ratio=0.8, train_val_seed=1337):
    class_num = 41 # NOTE: pre-computed, should not need change
    mode = 'train' if (split == 'train' or split == 'val') else 'test'

    label_dict = {}
    if mode != 'test':
        # read the frame-level labels of every utterance
        with open(os.path.join(phone_path, f'{mode}_labels.txt')) as f:
            phone_file = f.readlines()

        for line in phone_file:
            line = line.strip('\n').split(' ')
            label_dict[line[0]] = [int(p) for p in line[1:]]

    if split == 'train' or split == 'val':
        # split training and validation data
        usage_list = open(os.path.join(phone_path, 'train_split.txt')).readlines()
        random.seed(train_val_seed)
        random.shuffle(usage_list)
        percent = int(len(usage_list) * train_ratio)
        usage_list = usage_list[:percent] if split == 'train' else usage_list[percent:]
    elif split == 'test':
        usage_list = open(os.path.join(phone_path, 'test_split.txt')).readlines()
    else:
        raise ValueError('Invalid \'split\' argument for dataset: PhoneDataset!')

    usage_list = [line.strip('\n') for line in usage_list]
    print('[Dataset] - # phone classes: ' + str(class_num) + ', number of utterances for ' + split + ': ' + str(len(usage_list)))

    max_len = 3000000 # pre-allocated capacity in frames; trimmed to idx below
    X = torch.empty(max_len, 39 * concat_nframes)
    if mode != 'test':
        y = torch.empty(max_len, dtype=torch.long)

    idx = 0
    for i, fname in tqdm(enumerate(usage_list)):
        feat = load_feat(os.path.join(feat_dir, mode, f'{fname}.pt'))
        cur_len = len(feat)
        feat = concat_feat(feat, concat_nframes)
        if mode != 'test':
            label = torch.LongTensor(label_dict[fname])

        X[idx: idx + cur_len, :] = feat
        if mode != 'test':
            y[idx: idx + cur_len] = label

        idx += cur_len

    X = X[:idx, :]
    if mode != 'test':
        y = y[:idx]

    print(f'[INFO] {split} set')
    print(X.shape)
    if mode != 'test':
        print(y.shape)
        return X, y
    else:
        return X
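
Note that `preprocess_data` splits training and validation data by utterance, not by frame, so all frames of one utterance end up on the same side. A minimal sketch of that split follows. The function name `split_utterances` and the toy IDs are hypothetical, and it uses a local `random.Random` instance instead of seeding the global generator as the notebook does.

```python
import random

def split_utterances(utt_ids, train_ratio=0.8, seed=1337):
    """Shuffle utterance IDs with a fixed seed and cut them into train and validation lists."""
    ids = list(utt_ids)
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    rng.shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

utts = [f"utt{i:02d}" for i in range(10)]  # toy utterance IDs
train_ids, val_ids = split_utterances(utts)
print(len(train_ids), len(val_ids))  # 8 2
```

Because the seed is fixed, calling the function once for the training split and once for the validation split yields complementary, non-overlapping lists.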


## Define Dataset

In [4]:
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class LibriDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = X
        if y is not None:
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)
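
As a minimal sketch of how a map-style dataset like `LibriDataset` is consumed, the example below batches toy data with `DataLoader`. To stay self-contained it swaps in PyTorch's built-in `TensorDataset`, which returns `(feature, label)` pairs in the same way `LibriDataset` does when labels are given. The toy sizes (10 frames, 3 concatenated frames) are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: 10 frames of 39 * 3 features, labels in [0, 41).
features = torch.randn(10, 39 * 3)
labels = torch.randint(0, 41, (10,))

toy_set = TensorDataset(features, labels)
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=False)

# Collect the shape of every batch: two full batches of 4 and a final batch of 2.
batch_shapes = [(x.shape, y.shape) for x, y in toy_loader]
print(batch_shapes)
```

The last batch is smaller because `drop_last` defaults to `False`, so every frame is still visited once per epoch.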


## Define Model

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BasicBlock, self).__init__()

        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.Dropout(p=0.3),
            nn.BatchNorm1d(output_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.block(x)
        return x


class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()

        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        x = self.fc(x)
        return x

## Hyper-parameters

In [6]:
# data parameters
concat_nframes = 25             # the number of frames to concatenate, n must be odd (total 2k+1 = n frames)
train_ratio = 0.8               # the ratio of data used for training, the rest will be used for validation

# training parameters
seed = 0                        # random seed
batch_size = 512                # batch size
num_epoch = 20                  # the number of training epochs
learning_rate = 0.0001          # learning rate
model_path = './model.ckpt'     # the path where the checkpoint will be saved

# model parameters
input_dim = 39 * concat_nframes # the input dim of the model, you should not change the value
hidden_layers = 8               # the number of hidden blocks stacked after the input block
hidden_dim = 512                # the hidden dim
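
To get a feel for how these settings drive model size, the following sketch counts learnable parameters by hand. It assumes the `Classifier` layout defined earlier: one input `BasicBlock`, then `hidden_layers` further blocks, then an output `nn.Linear`, with `BatchNorm1d` contributing a learnable scale and shift per feature. The helper names are hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def block_params(n_in, n_out):
    """Linear layer plus BatchNorm1d scale and shift; Dropout and ReLU have no parameters."""
    return linear_params(n_in, n_out) + 2 * n_out

def classifier_params(input_dim, hidden_layers, hidden_dim, output_dim=41):
    """Learnable parameters for one input block, hidden_layers hidden blocks, and an output layer."""
    total = block_params(input_dim, hidden_dim)
    total += hidden_layers * block_params(hidden_dim, hidden_dim)
    total += linear_params(hidden_dim, output_dim)
    return total

input_dim = 39 * 25                          # 39 MFCC features x 25 concatenated frames
print(input_dim)                             # 975
print(classifier_params(input_dim, 8, 512))  # 2631209
```

With these values the hidden blocks account for most of the parameters, so `hidden_dim` has a roughly quadratic effect on model size while `hidden_layers` has a linear one.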

## Prepare dataset and model

In [7]:
import gc

# preprocess data
train_X, train_y = preprocess_data(split='train', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes, train_ratio=train_ratio)
val_X, val_y = preprocess_data(split='val', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes, train_ratio=train_ratio)

# get dataset
train_set = LibriDataset(train_X, train_y)
val_set = LibriDataset(val_X, val_y)

# remove raw feature to save memory
del train_X, train_y, val_X, val_y
gc.collect()

# get dataloader
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)

FileNotFoundError: [Errno 2] No such file or directory: './libriphone\\train_labels.txt'
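
The error above appears when the dataset was never downloaded or extracted. A minimal sketch of a pre-flight check that reports what is missing before preprocessing starts is shown below. The function name `missing_dataset_paths` is hypothetical; the expected paths come from the "Download Data" section, with the labels file name taken from `preprocess_data`.

```python
from pathlib import Path

# Files and directories the notebook expects under the dataset root.
EXPECTED = ["train_split.txt", "train_labels.txt", "test_split.txt", "feat/train", "feat/test"]

def missing_dataset_paths(root="./libriphone"):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

missing = missing_dataset_paths()
if missing:
    print("Missing:", ", ".join(missing))
```

An empty list means every expected path exists; it does not verify that the feature directories actually contain `.pt` files.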

In [None]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f'DEVICE: {device}')

DEVICE: cuda:0


In [None]:
import numpy as np

# fix random seeds for reproducibility
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

In [None]:
# fix random seed
same_seeds(seed)

# create model, define a loss function, and optimizer
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

## Training

In [None]:
best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0
    
    # training
    model.train() # set the model to training mode
    for i, batch in enumerate(tqdm(train_loader)):
        features, labels = batch
        features = features.to(device)
        labels = labels.to(device)
        
        optimizer.zero_grad() 
        outputs = model(features) 
        
        loss = criterion(outputs, labels)
        loss.backward() 
        optimizer.step() 
        
        _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
        train_acc += (train_pred.detach() == labels.detach()).sum().item()
        train_loss += loss.item()
    
    # validation
    if len(val_set) > 0:
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, batch in enumerate(tqdm(val_loader)):
                features, labels = batch
                features = features.to(device)
                labels = labels.to(device)
                outputs = model(features)
                
                loss = criterion(outputs, labels) 
                
                _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += loss.item()

            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
            ))

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
    else:
        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))

# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')


100%|██████████| 4134/4134 [01:34<00:00, 43.54it/s]
100%|██████████| 1031/1031 [00:08<00:00, 118.35it/s]


[001/020] Train Acc: 0.462228 Loss: 1.851013 | Val Acc: 0.574333 loss: 1.477411
saving model with acc 0.574


100%|██████████| 4134/4134 [01:33<00:00, 44.07it/s]
100%|██████████| 1031/1031 [00:08<00:00, 118.17it/s]


[002/020] Train Acc: 0.563764 Loss: 1.453504 | Val Acc: 0.613231 loss: 1.408430
saving model with acc 0.613


100%|██████████| 4134/4134 [01:33<00:00, 44.04it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.32it/s]


[003/020] Train Acc: 0.593325 Loss: 1.347626 | Val Acc: 0.631202 loss: 1.372119
saving model with acc 0.631


100%|██████████| 4134/4134 [01:33<00:00, 44.09it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.88it/s]


[004/020] Train Acc: 0.610782 Loss: 1.284787 | Val Acc: 0.643510 loss: 1.345541
saving model with acc 0.644


100%|██████████| 4134/4134 [01:34<00:00, 43.89it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.81it/s]


[005/020] Train Acc: 0.622318 Loss: 1.242518 | Val Acc: 0.652805 loss: 1.316949
saving model with acc 0.653


100%|██████████| 4134/4134 [01:34<00:00, 43.78it/s]
100%|██████████| 1031/1031 [00:08<00:00, 118.74it/s]


[006/020] Train Acc: 0.631780 Loss: 1.209234 | Val Acc: 0.659520 loss: 1.301219
saving model with acc 0.660


100%|██████████| 4134/4134 [01:34<00:00, 43.96it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.21it/s]


[007/020] Train Acc: 0.639255 Loss: 1.183495 | Val Acc: 0.666837 loss: 1.267334
saving model with acc 0.667


100%|██████████| 4134/4134 [01:34<00:00, 43.94it/s]
100%|██████████| 1031/1031 [00:08<00:00, 118.25it/s]


[008/020] Train Acc: 0.645165 Loss: 1.161588 | Val Acc: 0.671614 loss: 1.259854
saving model with acc 0.672


100%|██████████| 4134/4134 [01:34<00:00, 43.84it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.67it/s]


[009/020] Train Acc: 0.650158 Loss: 1.142857 | Val Acc: 0.675646 loss: 1.243545
saving model with acc 0.676


100%|██████████| 4134/4134 [01:33<00:00, 44.00it/s]
100%|██████████| 1031/1031 [00:08<00:00, 115.47it/s]


[010/020] Train Acc: 0.654583 Loss: 1.126405 | Val Acc: 0.680739 loss: 1.233663
saving model with acc 0.681


100%|██████████| 4134/4134 [01:34<00:00, 43.84it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.56it/s]


[011/020] Train Acc: 0.658529 Loss: 1.113129 | Val Acc: 0.684376 loss: 1.214264
saving model with acc 0.684


100%|██████████| 4134/4134 [01:33<00:00, 44.18it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.00it/s]


[012/020] Train Acc: 0.661587 Loss: 1.101015 | Val Acc: 0.686788 loss: 1.209513
saving model with acc 0.687


100%|██████████| 4134/4134 [01:34<00:00, 43.95it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.80it/s]


[013/020] Train Acc: 0.665138 Loss: 1.089414 | Val Acc: 0.691330 loss: 1.180618
saving model with acc 0.691


100%|██████████| 4134/4134 [01:33<00:00, 44.15it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.63it/s]


[014/020] Train Acc: 0.668151 Loss: 1.080774 | Val Acc: 0.693619 loss: 1.181007
saving model with acc 0.694


100%|██████████| 4134/4134 [01:34<00:00, 43.52it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.48it/s]


[015/020] Train Acc: 0.670355 Loss: 1.070948 | Val Acc: 0.695699 loss: 1.177359
saving model with acc 0.696


100%|██████████| 4134/4134 [01:33<00:00, 44.03it/s]
100%|██████████| 1031/1031 [00:08<00:00, 117.02it/s]


[016/020] Train Acc: 0.672850 Loss: 1.062759 | Val Acc: 0.697698 loss: 1.169121
saving model with acc 0.698


100%|██████████| 4134/4134 [01:33<00:00, 44.02it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.74it/s]


[017/020] Train Acc: 0.674888 Loss: 1.055188 | Val Acc: 0.700949 loss: 1.157141
saving model with acc 0.701


100%|██████████| 4134/4134 [01:34<00:00, 43.94it/s]
100%|██████████| 1031/1031 [00:08<00:00, 115.48it/s]


[018/020] Train Acc: 0.677167 Loss: 1.047875 | Val Acc: 0.702492 loss: 1.147821
saving model with acc 0.702


100%|██████████| 4134/4134 [01:34<00:00, 43.80it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.85it/s]


[019/020] Train Acc: 0.678507 Loss: 1.041999 | Val Acc: 0.702618 loss: 1.153802
saving model with acc 0.703


100%|██████████| 4134/4134 [01:34<00:00, 43.90it/s]
100%|██████████| 1031/1031 [00:08<00:00, 116.02it/s]

[020/020] Train Acc: 0.680306 Loss: 1.036098 | Val Acc: 0.705737 loss: 1.124737
saving model with acc 0.706
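
The numbers in the log are aggregated in two different ways: accuracy is the count of correct frames divided by the dataset size, while loss is the sum of per-batch losses divided by the number of batches. A minimal plain-Python sketch of that bookkeeping, using made-up logits, labels, and losses, is shown below. The helper names `argmax` and `epoch_metrics` are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)

def epoch_metrics(batches):
    """batches: iterable of (logits, labels, batch_loss). Returns (accuracy, mean batch loss)."""
    correct = 0
    seen = 0
    loss_sum = 0.0
    n_batches = 0
    for logits, labels, batch_loss in batches:
        correct += sum(argmax(row) == y for row, y in zip(logits, labels))
        seen += len(labels)
        loss_sum += batch_loss
        n_batches += 1
    return correct / seen, loss_sum / n_batches

# Two toy batches with three classes; the values are invented for illustration.
toy_batches = [
    ([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]], [1, 2], 0.9),
    ([[0.0, 0.1, 3.0]], [2], 0.3),
]
acc, loss = epoch_metrics(toy_batches)
print(acc, loss)
```

Averaging per-batch losses gives a smaller final batch the same weight as a full one, which is negligible with thousands of batches but worth knowing when comparing runs with different batch sizes.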





In [None]:
del train_loader, val_loader
gc.collect()

50

## Testing
Create a testing dataset and load the model from the saved checkpoint.

In [None]:
# load data
test_X = preprocess_data(split='test', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes)
test_set = LibriDataset(test_X, None)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

[Dataset] - # phone classes: 41, number of utterances for test: 1078


301it [00:02, 122.40it/s]

In [None]:
# load model
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
model.load_state_dict(torch.load(model_path))

Make predictions.

In [None]:
# frame-level predictions, accumulated batch by batch
pred = np.array([], dtype=np.int32)

model.eval()
with torch.no_grad():
    for i, batch in enumerate(tqdm(test_loader)):
        features = batch
        features = features.to(device)

        outputs = model(features)

        _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
        pred = np.concatenate((pred, test_pred.cpu().numpy()), axis=0)


Write the predictions to a CSV file.

After this block finishes running, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle.

In [None]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(pred):
        f.write('{},{}\n'.format(i, y))
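
As a minimal sketch, the same `Id,Class` layout can also be produced with the standard `csv` module, which makes the format easy to check on a few toy predictions before writing the real file. The function name `write_predictions` and the sample values are hypothetical.

```python
import csv
import io

def write_predictions(pred, fileobj):
    """Write one Id,Class row per frame, with Id counting from 0."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Id", "Class"])
    for i, y in enumerate(pred):
        writer.writerow([i, int(y)])

# Write to an in-memory buffer to inspect the output.
buffer = io.StringIO()
write_predictions([3, 0, 40], buffer)
print(buffer.getvalue())
```

To write the actual submission, pass a file opened with `open('prediction.csv', 'w', newline='')` in place of the buffer.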