<a href="https://colab.research.google.com/github/kwanglo/mge51101-20195171/blob/master/final_project/04_Multi_modal_CNN_fastText.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Multi-modal Analysis using CNN+FastText Embedding**

In this section, we will build multi-modal classifier for both sentiment and utterance using CNN+FastText Embedding <br>
<br>
**Applied embedding :** <br>
fastText Korean ver. using wikipedia<br>
**Applied deep learning model :** <br>
CNN
<br>

**Reference** <br>
Code used in current page refered to below link.

1. https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/3%20-%20Faster%20Sentiment%20Analysis.ipynb
2. https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/4%20-%20Convolutional%20Sentiment%20Analysis.ipynb
3. https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/5%20-%20Multi-class%20Sentiment%20Analysis.ipynb

In [None]:
from google.colab import drive
drive.mount('/gdrive', force_remount=True)

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /gdrive


In [None]:
!nvidia-smi

Fri Jun 19 18:28:53 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip3 install konlpy
!pip3 install soynlp

Collecting konlpy
[?25l  Downloading https://files.pythonhosted.org/packages/85/0e/f385566fec837c0b83f216b2da65db9997b35dd675e107752005b7d392b1/konlpy-0.5.2-py2.py3-none-any.whl (19.4MB)
[K     |████████████████████████████████| 19.4MB 51.7MB/s 
[?25hCollecting JPype1>=0.7.0
[?25l  Downloading https://files.pythonhosted.org/packages/2d/9b/e115101a833605b3c0e6f3a2bc1f285c95aaa1d93ab808314ca1bde63eed/JPype1-0.7.5-cp36-cp36m-manylinux2010_x86_64.whl (3.6MB)
[K     |████████████████████████████████| 3.6MB 39.3MB/s 
Collecting colorama
  Downloading https://files.pythonhosted.org/packages/c9/dc/45cdef1b4d119eb96316b3117e6d5708a08029992b2fee2c143c7a0a5cc5/colorama-0.4.3-py2.py3-none-any.whl
Collecting tweepy>=3.7.0
  Downloading https://files.pythonhosted.org/packages/36/1b/2bd38043d22ade352fc3d3902cf30ce0e2f4bf285be3b304a2782a767aec/tweepy-3.8.0-py2.py3-none-any.whl
Collecting beautifulsoup4==4.6.0
[?25l  Downloading https://files.pythonhosted.org/packages/9e/d4/10f46e5cfac773e2270723

In [None]:
import os
import re

from sklearn import datasets, model_selection

import pandas as pd
import numpy as np

In [None]:
path='/gdrive/My Drive/Colab Notebooks/Final Project/dataset/'

# Data preprocessing - Emotion

Since we already done separating dataset, we can jump to data preprocessing session. <br>
We will implement KoNLPy Okt tokenizer and stopwords to refine dataset.

In [None]:
from soynlp.tokenizer import MaxScoreTokenizer
from soynlp.normalizer import *
import re
from konlpy.tag import Okt

def tokenizer(text): # create a tokenizer function
    okt = Okt()
    text = only_hangle(text)
    text = repeat_normalize(text, num_repeats = 2)
    x = okt.morphs(text , stem= True)
    return x

In [None]:
stop_words_set = pd.read_csv(path+'stopwords100.txt',header = 0, delimiter = '\t', quoting = 3)
stop_words= (list(stop_words_set['aa']))
stop_words2 = ['은', '는', '이', '가', '하', '아', '것', '들','의', '있', '되', '수', '보', '주', '등', '한']
stop_words.extend(stop_words)

Now we will build input value TEXT and LABEL for torch.text 

In [None]:
import torch
from torchtext import data
from torchtext import datasets
from soynlp.tokenizer import MaxScoreTokenizer
SEED = 3432

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True


TEXT_emo = data.Field(tokenize=tokenizer, stop_words = stop_words)
LABEL_emo = data.LabelField()

In [None]:
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

In [None]:
from torchtext.data import TabularDataset
fields_emo = [("Sentence", TEXT_emo),("Emotion", LABEL_emo)]

In [None]:
train_emo,valid_emo, test_emo = data.TabularDataset.splits(
                                        path = path,
                                        train = 'sentiment_train.csv',
                                        validation = 'sentiment_valid.csv',
                                        test = 'sentiment_test.csv',
                                        format = 'csv',
                                        fields = fields_emo,
                                        skip_header = True
)

In [None]:
vars(train_emo[3])

{'Emotion': '5', 'Sentence': ['어제', '런닝맨', '완전', '재밌다']}

This will import FastText word vectors.

In [None]:
import torchtext
vec = torchtext.vocab.Vectors('wiki.ko.vec', cache=path)

In [None]:
MAX_VOCAB_SIZE = 25000

TEXT_emo.build_vocab(train_emo, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = vec, 
                 unk_init = torch.Tensor.normal_)

LABEL_emo.build_vocab(train_emo)

In [None]:
from torchtext.data import Iterator, BucketIterator
BATCH_SIZE = 32

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator_emo, valid_iterator_emo, test_iterator_emo = data.BucketIterator.splits(
    (train_emo, valid_emo, test_emo), 
    batch_size = BATCH_SIZE, 
    device = device, sort = False)


# **Data preprocessing - Utterance**

Utterance follows same preprocessing procedure as sentiment.

In [None]:
SEED = 3432

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT_utt = data.Field(tokenize=tokenizer, stop_words = stop_words)
LABEL_utt = data.LabelField()

In [None]:
fields_utt = [("text", TEXT_utt),("label", LABEL_utt)]

In [None]:
train_utt,valid_utt, test_utt = data.TabularDataset.splits(
                                        path = path,
                                        train = 'utterance_train.csv',
                                        validation = 'utterance_valid.csv',
                                        test = 'utterance_test.csv',
                                        format = 'csv',
                                        fields = fields_utt,
                                        skip_header = True
)

In [None]:
MAX_VOCAB_SIZE = 25000

TEXT_utt.build_vocab(train_utt, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = vec, 
                 unk_init = torch.Tensor.normal_)

LABEL_utt.build_vocab(train_utt)

In [None]:
train_iterator_utt, valid_iterator_utt, test_iterator_utt = data.BucketIterator.splits(
    (train_utt, valid_utt, test_utt), 
    batch_size = BATCH_SIZE, 
    device = device, sort = False)

#**Model building**

Identical CNN model was applied for training. 


In [None]:
import torch.nn as nn
import torch.nn.functional as F

class CNN_emo(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        
        super().__init__()        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)        
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        
        text = text.permute(1, 0)        
        embedded = self.embedding(text)

        embedded = embedded.unsqueeze(1)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        cat = self.dropout(torch.cat(pooled, dim = 1))
            
        return self.fc(cat)

In [None]:
class CNN_utt(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        
        super().__init__()        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)        
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        
        text = text.permute(1, 0)        
        embedded = self.embedding(text)

        embedded = embedded.unsqueeze(1)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        cat = self.dropout(torch.cat(pooled, dim = 1))
            
        return self.fc(cat)

In [None]:
INPUT_DIM_emo = len(TEXT_emo.vocab)
OUTPUT_DIM_emo = len(LABEL_emo.vocab)
PAD_IDX_emo = TEXT_emo.vocab.stoi[TEXT_emo.pad_token]

INPUT_DIM_utt = len(TEXT_utt.vocab)
OUTPUT_DIM_utt = len(LABEL_utt.vocab)
PAD_IDX_utt = TEXT_utt.vocab.stoi[TEXT_utt.pad_token]

EMBEDDING_DIM = 300
N_FILTERS = 100
FILTER_SIZES = [2,3,4]
DROPOUT = 0.5

model_emo = CNN_emo(INPUT_DIM_emo, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM_emo, DROPOUT, PAD_IDX_emo)
model_utt = CNN_utt(INPUT_DIM_utt, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM_utt, DROPOUT, PAD_IDX_utt)

Before you move further, you need to check the count parameters fit to your expectations. If they are too large or small, there might need some adjustment in preprocessing or TEXT,LABEL field.

In [None]:
def count_parameters_emo(model_emo):
    return sum(p.numel() for p in model_emo.parameters() if p.requires_grad)

print(f'The model has {count_parameters_emo(model_emo):,} trainable parameters')

The model has 5,058,307 trainable parameters


In [None]:
def count_parameters_utt(model_utt):
    return sum(p.numel() for p in model_utt.parameters() if p.requires_grad)

print(f'The model has {count_parameters_utt(model_utt):,} trainable parameters')

The model has 3,218,707 trainable parameters


In [None]:
pretrained_embeddings_emo = TEXT_emo.vocab.vectors
pretrained_embeddings_utt = TEXT_utt.vocab.vectors

In [None]:
UNK_IDX_emo = TEXT_emo.vocab.stoi[TEXT_emo.unk_token]
UNK_IDX_utt = TEXT_utt.vocab.stoi[TEXT_utt.unk_token]

model_emo.embedding.weight.data[UNK_IDX_emo] = torch.zeros(EMBEDDING_DIM)
model_emo.embedding.weight.data[PAD_IDX_emo] = torch.zeros(EMBEDDING_DIM)

model_utt.embedding.weight.data[UNK_IDX_utt] = torch.zeros(EMBEDDING_DIM)
model_utt.embedding.weight.data[PAD_IDX_utt] = torch.zeros(EMBEDDING_DIM)

In [None]:
import torch.optim as optim

optimizer_emo = optim.Adam(model_emo.parameters())
optimizer_utt = optim.Adam(model_utt.parameters())
criterion = nn.CrossEntropyLoss()

model_emo = model_emo.to(device)
model_utt = model_utt.to(device)
criterion = criterion.to(device)

In [None]:
def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    max_preds = preds.argmax(dim = 1, keepdim = True) # get the index of the max probability
    correct = max_preds.squeeze(1).eq(y)
    return correct.sum() / torch.FloatTensor([y.shape[0]])

In [None]:
import sklearn.metrics as sk

Below are train & validation and test models for each task.

In [None]:
def train_model_emo(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.Sentence)
        
        loss = criterion(predictions, batch.Emotion)
        
        acc = categorical_accuracy(predictions, batch.Emotion)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
def evaluate_model_emo(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()

    y_pred = []
    y_actual = []
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.Sentence)
            pred = torch.max(predictions, 1).indices

            pred = pred.tolist()
            pred
            actual = batch.Emotion.tolist()
            actual

            loss = criterion(predictions, batch.Emotion)
            
            acc = categorical_accuracy(predictions, batch.Emotion)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
            y_pred = y_pred + pred
            y_actual = y_actual + actual
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator), y_pred, y_actual

In [None]:
def train_model_utt(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.text)        
        loss = criterion(predictions, batch.label)        
        acc = categorical_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
def evaluate_model_utt(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()

    y_pred = []
    y_actual = []
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text)
            pred = torch.max(predictions, 1).indices

            pred = pred.tolist()
            pred
            actual = batch.label.tolist()
            actual

            loss = criterion(predictions, batch.label)
            
            acc = categorical_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
            y_pred = y_pred + pred
            y_actual = y_actual + actual
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator), y_pred, y_actual

In [None]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

# **Model training**

torch.save(model_emo.state_dict(), 'emo-model.pt') will save best parameters for validation and recalled afterward. This prevents overfitting issue when evaluating the model.

In [None]:
#Emotion
N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss_emo, train_acc_emo = train_model_emo(model_emo, train_iterator_emo, optimizer_emo, criterion)
    valid_loss_emo, valid_acc_emo, y_predict, y_real = evaluate_model_emo(model_emo, valid_iterator_emo, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss_emo < best_valid_loss:
        best_valid_loss = valid_loss_emo
        torch.save(model_emo.state_dict(), 'emo-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss_emo:.3f} | Train Acc: {train_acc_emo*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss_emo:.3f} |  Val. Acc: {valid_acc_emo*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 23s
	Train Loss: 1.801 | Train Acc: 29.77%
	 Val. Loss: 1.595 |  Val. Acc: 37.67%
Epoch: 02 | Epoch Time: 0m 23s
	Train Loss: 1.491 | Train Acc: 43.44%
	 Val. Loss: 1.546 |  Val. Acc: 39.53%
Epoch: 03 | Epoch Time: 0m 23s
	Train Loss: 1.296 | Train Acc: 51.38%
	 Val. Loss: 1.532 |  Val. Acc: 41.52%
Epoch: 04 | Epoch Time: 0m 23s
	Train Loss: 1.106 | Train Acc: 59.22%
	 Val. Loss: 1.596 |  Val. Acc: 41.85%
Epoch: 05 | Epoch Time: 0m 23s
	Train Loss: 0.919 | Train Acc: 66.92%
	 Val. Loss: 1.714 |  Val. Acc: 42.22%
Epoch: 06 | Epoch Time: 0m 23s
	Train Loss: 0.752 | Train Acc: 72.82%
	 Val. Loss: 1.821 |  Val. Acc: 42.14%
Epoch: 07 | Epoch Time: 0m 23s
	Train Loss: 0.621 | Train Acc: 77.42%
	 Val. Loss: 1.945 |  Val. Acc: 42.48%
Epoch: 08 | Epoch Time: 0m 23s
	Train Loss: 0.508 | Train Acc: 81.98%
	 Val. Loss: 2.121 |  Val. Acc: 42.28%
Epoch: 09 | Epoch Time: 0m 23s
	Train Loss: 0.443 | Train Acc: 84.59%
	 Val. Loss: 2.315 |  Val. Acc: 42.18%
Epoch: 10 | Epoch T

In [None]:
model_emo.load_state_dict(torch.load('emo-model.pt'))

test_loss_emo, test_acc_emo, pred, actual = evaluate_model_emo(model_emo, test_iterator_emo, criterion)

f1_score = sk.f1_score(pred,pred, average = 'weighted')
print(f'Test Loss: {test_loss_emo:.3f} | Test Acc: {test_acc_emo*100:.2f}% | F1 Score: {f1_score:.2f}')
confusion_matrix(actual, pred)

Test Loss: 1.533 | Test Acc: 41.14% | F1 Score: 1.00


array([[ 897,  114,   71,  205,  262,  142,   78],
       [ 296, 1088,   29,   79,  157,  107,   55],
       [ 279,   34,  357,  106,  778,   87,   59],
       [ 346,   49,   42,  679,  159,  309,   56],
       [ 334,   52,  196,  102,  785,   86,   74],
       [ 207,   64,   36,  247,  153,  837,   36],
       [ 423,  122,  100,  175,  406,  105,  118]])

In [None]:
#Utterance
N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss_utt, train_acc_utt = train_model_utt(model_utt, train_iterator_utt, optimizer_utt, criterion)
    valid_loss_utt, valid_acc_utt, pred, actual = evaluate_model_utt(model_utt, valid_iterator_utt, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss_utt < best_valid_loss:
        best_valid_loss = valid_loss_utt
        torch.save(model_utt.state_dict(), 'utt-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss_utt:.3f} | Train Acc: {train_acc_utt*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss_utt:.3f} |  Val. Acc: {valid_acc_utt*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 20s
	Train Loss: 1.240 | Train Acc: 53.87%
	 Val. Loss: 1.060 |  Val. Acc: 61.45%
Epoch: 02 | Epoch Time: 0m 20s
	Train Loss: 0.980 | Train Acc: 64.58%
	 Val. Loss: 0.997 |  Val. Acc: 64.56%
Epoch: 03 | Epoch Time: 0m 20s
	Train Loss: 0.839 | Train Acc: 69.44%
	 Val. Loss: 1.022 |  Val. Acc: 63.84%
Epoch: 04 | Epoch Time: 0m 20s
	Train Loss: 0.729 | Train Acc: 73.33%
	 Val. Loss: 1.041 |  Val. Acc: 65.57%
Epoch: 05 | Epoch Time: 0m 20s
	Train Loss: 0.638 | Train Acc: 76.19%
	 Val. Loss: 1.081 |  Val. Acc: 65.24%
Epoch: 06 | Epoch Time: 0m 20s
	Train Loss: 0.567 | Train Acc: 79.07%
	 Val. Loss: 1.146 |  Val. Acc: 65.25%
Epoch: 07 | Epoch Time: 0m 20s
	Train Loss: 0.514 | Train Acc: 80.61%
	 Val. Loss: 1.213 |  Val. Acc: 64.47%
Epoch: 08 | Epoch Time: 0m 20s
	Train Loss: 0.464 | Train Acc: 82.71%
	 Val. Loss: 1.278 |  Val. Acc: 64.13%
Epoch: 09 | Epoch Time: 0m 20s
	Train Loss: 0.417 | Train Acc: 84.33%
	 Val. Loss: 1.354 |  Val. Acc: 64.57%
Epoch: 10 | Epoch T

In [None]:
model_utt.load_state_dict(torch.load('utt-model.pt'))

test_loss_utt, test_acc_utt, pred, actual = evaluate_model_utt(model_utt, test_iterator_utt, criterion)

f1_score = sk.f1_score(pred,pred, average = 'weighted')
print(f'Test Loss: {test_loss_emo:.3f} | Test Acc: {test_acc_emo*100:.2f}% | F1 Score: {f1_score:.2f}')
confusion_matrix(actual, pred)

Test Loss: 1.533 | Test Acc: 41.14% | F1 Score: 1.00


array([[4330,  487,  394,  293,   37,   12,   27],
       [ 903, 3590,  699,   52,   34,   31,   13],
       [1036,  890, 1851,   52,    9,    7,   14],
       [ 194,   26,   29, 1521,    4,    1,    5],
       [ 391,   97,   71,   41,  356,    2,   10],
       [ 314,   59,   27,   24,   16,   87,    3],
       [ 187,   13,   29,   18,    2,    1,   88]])

# **Testing new input**
Both models were tested and below is testing new input sentences.
Different to reference, we need to tokenize it first before implementing evaluated model above.

In [None]:
def predict_emo(model, sentence, min_len = 4):
    model.eval()
    # 이 부분에서 그냥 바로 tokenizing
    tokenized = tokenizer(sentence)
    if len(tokenized) < min_len:
        tokenized += ['<pad>'] * (min_len - len(tokenized))
    indexed = [TEXT_emo.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)
    preds = model(tensor)
    max_preds = preds.argmax(dim = 1)
    return max_preds.item()

Emotion / Utterance classifier will convert digits into char.

In [None]:
def emotion_classifier(logits):
  global sentiment
  if logits == "0":
    sentiment = '중립'
  elif logits == "1":
    sentiment = '공포'
  elif logits == "2":
    sentiment = '놀람'
  elif logits == "3":
    sentiment = '분노'
  elif logits == "4":
    sentiment = '슬픔'
  elif logits == "5":
    sentiment = '행복'
  elif logits == "6":
    sentiment = '혐오'

  return sentiment

In [None]:
def predict_utt(model, sentence, min_len = 4):
    model.eval()
    # 이 부분에서 그냥 바로 tokenizing
    tokenized = tokenizer(sentence)
    if len(tokenized) < min_len:
        tokenized += ['<pad>'] * (min_len - len(tokenized))
    indexed = [TEXT_utt.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)
    preds = model(tensor)
    max_preds = preds.argmax(dim = 1)
    return max_preds.item()

In [None]:
def utterance_classifier(logits):
  global utterance
  if logits == "0":
    utterance = '미완'
  elif logits == "1":
    utterance = '서술'
  elif logits == "2":
    utterance = '질문'
  elif logits == "3":
    utterance = '요구'
  elif logits == "4":
    utterance = '수사의문'
  elif logits == "5":
    utterance = '수사명령'
  elif logits == "6":
    utterance = '억양'

  return utterance

Below is tesitng section for new inputs. <br>
Enjoy !

In [None]:
pred_sentence = input()
pred_emo = predict_emo(model_emo, pred_sentence)
pred_utt = predict_utt(model_utt, pred_sentence)
logit_1 = LABEL_emo.vocab.itos[pred_emo]
logit_2 = LABEL_utt.vocab.itos[pred_utt]

print(f'이 문장의 감정은 {emotion_classifier(logit_1)}이고, 발화 의도는 {utterance_classifier(logit_2)}입니다')

프로젝트가 끝나서 너무 기쁘지 않니?
이 문장의 감정은 행복이고, 발화 의도는 서술입니다


In [None]:
pred_sentence = input()
pred_emo = predict_emo(model_emo, pred_sentence)
pred_utt = predict_utt(model_utt, pred_sentence)
logit_1 = LABEL_emo.vocab.itos[pred_emo]
logit_2 = LABEL_utt.vocab.itos[pred_utt]

print(f'이 문장의 감정은 {emotion_classifier(logit_1)}이고, 발화 의도는 {utterance_classifier(logit_2)}입니다')

그렇지만 학점이 좋지 않을 텐데.
이 문장의 감정은 공포이고, 발화 의도는 서술입니다


In [None]:
pred_sentence = input()
pred_emo = predict_emo(model_emo, pred_sentence)
pred_utt = predict_utt(model_utt, pred_sentence)
logit_1 = LABEL_emo.vocab.itos[pred_emo]
logit_2 = LABEL_utt.vocab.itos[pred_utt]

print(f'이 문장의 감정은 {emotion_classifier(logit_1)}이고, 발화 의도는 {utterance_classifier(logit_2)}입니다')

서술 말고 다른 걸 말해.
이 문장의 감정은 중립이고, 발화 의도는 요구입니다
