Auto-Encoder
---------------------------------

경미의 [논문](https://drive.google.com/file/d/1RArk7z4NqY5HkwkUWx4cR2ApZNnAQxdF/view?usp=sharing)에 따르면 AE가 좀더 generalize한 feature를 뽑아준다고한다. 물론 image에 대해서 실험했고 (28x28, 32x32 의 작은...) task 간의 generalization에 대해 언급해서 조금 context가 다르다.

그래서 일단 xvector의 feature를 가지고 간단한게 AE를 구현해보려고한다.

Center-Loss
----------------------

Auto-Encoder의 MSELoss에 Center loss를 넣어서 더 모이게 한다면 어떻게 될까?

Unsupervise에서 Supervise로 된거다.

Faster
---------------

현재 centroid가 너무 많아서 연산이 오래걸린다.
하지만 실제로 사용하는 centroid는 batch에 존재하는 class들이다.

그래프를 깨지 않으면서 효율적으로 연산하는 방법을 생각해보자.

Scratch
---------------

xvector feature의 한계일 수 도있어서.

아예 fbank에서부터 트레이닝을 시켜보려고한다.

### Environment

In [1]:
%load_ext autoreload
%autoreload 2
%pylab
%matplotlib inline

import pandas as pd
import pickle
import numpy as np
import sys
import os

Using matplotlib backend: TkAgg
Populating the interactive namespace from numpy and matplotlib


In [2]:
sys.path.append('../')
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"

### Configuration

In [3]:
from sv_system.utils.parser import set_train_config
import easydict

# datasets
# voxc1_fbank_xvector
# gcommand_fbank_xvector

config = dict(dataset="voxc1_fbank_xvector",
              input_frames=300, input_clip=True,
              splice_frames=[300],
              stride_frames=1,
              input_format='fbank',
              no_eer=False,
              cache_size=10000)


### AE Model

In [4]:
import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        
        self.encoder = nn.Sequential(
            # set padding following the 'same'.
            # ============ tdnn layers ============
            nn.Conv1d(64, 128, stride=1, dilation=1, kernel_size=5, padding=2),
            nn.ReLU(True),
            nn.BatchNorm1d(128),
            nn.Conv1d(128, 128, stride=1, dilation=3, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.BatchNorm1d(128),
        )
        
        self.fc = nn.Sequential(
            # ============ fc layers ============
            nn.Linear(128, 128),
            nn.ReLU(True),
            nn.Linear(128, 128)
        )
        
        self.decoder = nn.Sequential(
            # ============ fc layers ============
            nn.Linear(128, 128),
            nn.ReLU(True),
            nn.Linear(128, 128),
            nn.ReLU(True),
            # ============ tdnn layers ============
            nn.Conv1d(128, 128, stride=1, dilation=3, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.BatchNorm1d(128),
            nn.Conv1d(128, 64, stride=1, dilation=1, kernel_size=5, padding=2),
            nn.Tanh()
        )
        
        self.latent_dim = 128

    def forward(self, x):
        x = x.squeeze().permute(0,2,1)
        x = self.encoder(x)
        print(x.shape)
        latent = self.fc(x)
        output = self.decoder(latent)
        return latent, output
    
    def embed(self, x):
        latent = self.encoder(x)
        return latent

### Center-Loss

In [5]:
import torch                                                                                                                                                                                                                                                                 
import torch.nn as nn                                                                                                                                                                                                                                                        

class CenterLoss(nn.Module):                                                                                                                                                                                                                                                 
    """Center loss.                                                                                                                                                                                                                                                          

    Reference:                                                                                                                                                                                                                                                               
    Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.                                                                                                                                                                              

    Args:                                                                                                                                                                                                                                                                    
     num_classes (int): number of classes.                                                                                                                                                                                                                                
     feat_dim (int): feature dimension.                                                                                                                                                                                                                                   
    """                                                                                                                                                                                                                                                                      
    def __init__(self, num_classes, feat_dim, use_gpu=True):                                                                                                                                                                                                            
        super(CenterLoss, self).__init__()                                                                                                                                                                                                                                   
        self.num_classes = num_classes                                                                                                                                                                                                                                       
        self.feat_dim = feat_dim                                                                                                                                                                                                                                             
        self.use_gpu = use_gpu                                                                                                                                                                                                                                               

        if self.use_gpu:                                                                                                                                                                                                                                                     
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim).cuda())                                                                                                                                                                                 
        else:                                                                                                                                                                                                                                                                
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim))                                                                                                                                                                                        

    def forward(self, x, labels):                                                                                                                                                                                                                                            
        """                                                                                                                                                                                                                                                                  
        Args:                                                                                                                                                                                                                                                                
         x: feature matrix with shape (batch_size, feat_dim).  |                                                                                                                                                                                                           
         labels: ground truth labels with shape (num_classes).                                                                                                                                                                                                            
        """                                                                                                                                                                                                                                                                  
        batch_size = x.size(0)                                                                                                                                                                                                                                               

        centers_for_batch = self.centers[labels]
        distvec = torch.pow(x, 2).sum(dim=1) + torch.pow(centers_for_batch, 2).sum(dim=1)
#         print(distvec.shape, x.shape, centers_for_batch.shape)
        distvec += torch.sum(-2*x*centers_for_batch, dim=1)
        distvec.clamp(min=1e-12, max=1e+12) 
                                         
        loss = distvec.mean()                                                                                                                                                                                                                                                   

        return loss

### Dataset

In [6]:
from sv_system.data.data_utils import find_dataset, find_trial

_, (si_dataset, sv_dataset) = find_dataset(config, basedir='../', split=False)
trial = find_trial(config, basedir='../')

### Training

In [7]:
import torch.nn.functional as F
from sklearn.metrics import roc_curve

def embeds_utterance(val_dataloader, model):
    embeddings = []
    labels = []
    if torch.cuda.is_available():
            model = model.cuda()
    model.eval()

    with torch.no_grad():
        for batch in val_dataloader:
            X, y = batch
            if not no_cuda:
                X = X.cuda()
                model = model.cuda()
                
            model_output = model.embed(X).cpu().detach()
            embeddings.append(model_output)
            labels.append(y.numpy())
        embeddings = torch.cat(embeddings)
        labels = np.hstack(labels)
    return embeddings, labels 

def sv_test(sv_loader, model, trial):
    embeddings, _ = embeds_utterance(sv_loader, model)
    trial_enroll = embeddings[trial.enrolment_id.tolist()]
    trial_test = embeddings[trial.test_id.tolist()]

    score_vector = F.cosine_similarity(trial_enroll, trial_test, dim=1)
    label_vector = np.array(trial.label)
    fpr, tpr, thres = roc_curve(
            label_vector, score_vector, pos_label=1)
    eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]

    return eer

In [8]:
num_epochs = 100
batch_size = 128
learning_rate = 1e-3
no_cuda = False

In [9]:
model = autoencoder().cuda()

In [10]:
import torch

weight_cent = 0.01 
criterion_ae = nn.MSELoss()
criterion_cent = CenterLoss(num_classes=7324, feat_dim=model.latent_dim, use_gpu=(not no_cuda))
optimizer_ae = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-05, momentum=0.9)
optimizer_cent = torch.optim.SGD(criterion_cent.parameters(), lr=0.5)

In [11]:
from torch.utils.data.dataloader import DataLoader

si_loader = DataLoader(si_dataset, num_workers=0, batch_size=batch_size, 
                           drop_last=True, pin_memory=True)

sv_loader = DataLoader(sv_dataset, batch_size=batch_size, num_workers=0, shuffle=False)

In [12]:
if not no_cuda:
    model = model.cuda()
    
for epoch in range(num_epochs):
    loss_sum = 0
    total = 0
    for batch_idx, (X, y) in enumerate(si_loader):
        model.train()
        if not no_cuda:
            X = X.cuda()
            y = y.cuda()
        # ===================forward=====================
        latent, output = model(X)
        loss_ae = criterion_ae(output, X)
        loss_cent = criterion_cent(latent, y)
        loss_cent *= weight_cent
        loss = loss_ae + loss_cent
        # ===================backward====================
        optimizer_ae.zero_grad()
        optimizer_cent.zero_grad()
        loss.backward()
        
        optimizer_ae.step()
        for param in criterion_cent.parameters():                                                                                 
            param.grad.data *= (1. / weight_cent)
        optimizer_cent.step()
        
#         loss_sum += loss.item()
#         total += X.size(0)
#         if batch_idx % 100 == 0:
#             print(f"train loss: {loss_sum/total}")
   
    # ===================log========================
    print('epoch [{}/{}], loss:{:.4f}, ae_loss:{:.4f}, cent_loss:{:.4f}'
          .format(epoch + 1, num_epochs, loss.item(), loss_ae.item(), loss_cent.item()))
    
    # =================sv_loss======================
    for batch_idx, (X, y)  in enumerate(sv_loader):
        model.eval()
        if not no_cuda:
                X = X.cuda()
                y.cuda()
        latent, output = model(X)
        loss_ae = criterion_ae(output, X)
        loss_cent = criterion_cent(latent, y)
        loss_cent *= weight_cent
        loss = loss_ae + loss_cent
    eer = sv_test(sv_loader, model, trial)
    print("sv loss: {:.4f}, sv eer: {:.4f}".format(loss.item(), eer))    

torch.Size([128, 128, 296])


RuntimeError: size mismatch, m1: [16384 x 296], m2: [128 x 128] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

In [None]:
torch.save(model.state_dict(), open("saved_models/center_ae_test.pt", "wb"))

In [None]:
X.shape