# Text Classification

## Environment
You will use Python 3.10 and PyTorch 2.0, which are already available on Colab, along with a few other libraries.


## Lab Part 0. Checking GPU
In this section, you will make sure that you are using the Google Colab GPU and load the required libraries.

In [1]:
from platform import python_version
import torch
import numpy as np
import pandas as pd
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import nltk

print("python", python_version())
print("torch", torch.__version__)

python 3.8.10
torch 1.14.0a0+44dac51


In [2]:
#check GPU
is_cuda = torch.cuda.is_available()

# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
    device = torch.device("cuda")
    print("GPU is available")
else:
    device = torch.device("cpu")
    print("GPU not available, CPU used")

GPU is available


## Lab Part 1. Downloading Dataset
In this section, you will download the Stanford Sentiment Treebank (SST), a popular dataset for sentiment classification.

In [3]:
!pip install datasets



Download SST and print the first example:

In [4]:
from datasets import load_dataset
from pprint import pprint

sst_dataset = load_dataset('sst') #download sst dataset
pprint(sst_dataset['train'][0]) #printing the first (sentence,label) example in the dataset

{'label': 0.6944400072097778,
 'sentence': "The Rock is destined to be the 21st Century 's new `` Conan '' "
             "and that he 's going to make a splash even greater than Arnold "
             'Schwarzenegger , Jean-Claud Van Damme or Steven Segal .',
 'tokens': "The|Rock|is|destined|to|be|the|21st|Century|'s|new|``|Conan|''|and|that|he|'s|going|to|make|a|splash|even|greater|than|Arnold|Schwarzenegger|,|Jean-Claud|Van|Damme|or|Steven|Segal|.",
 'tree': '70|70|68|67|63|62|61|60|58|58|57|56|56|64|65|55|54|53|52|51|49|47|47|46|46|45|40|40|41|39|38|38|43|37|37|69|44|39|42|41|42|43|44|45|50|48|48|49|50|51|52|53|54|55|66|57|59|59|60|61|62|63|64|65|66|67|68|69|71|71|0'}


You will use only the **'sentence'** and **'label'** fields of each example; please ignore the others. Note that the label is a real number between 0 and 1. You will round it to either 0 or 1 for binary classification (1 means a positive review and 0 means a negative review).
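
As a rough illustration of the rounding step, the sketch below binarizes a few label values with `torch.round`, which is the call the utility code later in this notebook uses. The sample values are made up for the example. One detail worth knowing: `torch.round` rounds halves to the nearest even integer, so a label of exactly 0.5 becomes 0.

```python
import torch

# Hypothetical sentiment scores in [0, 1], chosen only for illustration
scores = torch.tensor([0.69444, 0.5, 0.16667, 0.83333])

# Round to the nearest integer and cast to long, since class indices must be integers
binary_labels = torch.round(scores).to(torch.long)

print(binary_labels.tolist())
```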


## Lab Part 2. Word Embedding
In this section, you will download a pretrained word embedding called GloVe and use it to convert words into vector representations.


In [5]:
from torchtext.vocab import GloVe
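# A type of word embedding: it captures the similarity between neighboring words and the
# whole sentence well, i.e. the similarity between words, and effectively represents the
# meaning of a word as a low-dimensional vector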

glove = GloVe(name='6B',dim = 300)


.vector_cache/glove.6B.zip: 862MB [02:41, 5.33MB/s]                                                           
100%|█████████████████████████████████████████████████████████████▉| 399999/400000 [00:31<00:00, 12553.71it/s]


In [6]:
print(glove['apple'])

tensor([-0.2084, -0.0197,  0.0640, -0.7140, -0.2118, -0.5928, -0.1532,  0.0442,
         0.6329, -0.8482, -0.2113, -0.1976,  0.1903, -0.5623,  0.2713,  0.2378,
        -0.5189, -0.2452,  0.0352,  0.0968,  0.2490,  0.7128,  0.0383, -0.1051,
        -0.4779, -0.3952, -0.2719, -0.4443,  0.0611, -0.2318, -0.3590, -0.1824,
         0.0355, -0.0877, -1.0816, -0.4252,  0.0032, -0.4599, -0.0435, -0.3903,
         0.5190,  0.2114, -0.2553,  1.1805, -0.1904, -0.1216,  0.0342, -0.0623,
         0.1442, -0.5337,  0.4742, -0.4471,  0.5805,  0.4358,  0.1321, -0.0957,
        -0.3718, -0.0138,  0.2060, -0.1010,  0.1068, -0.3372,  0.1099,  0.3480,
        -0.0998,  0.3694, -0.5292,  0.1241, -0.4613, -0.3848, -0.1011, -0.1763,
         0.3757,  0.1638, -0.2198, -0.2684,  0.8471, -0.3562, -0.0840, -0.2028,
        -0.5654,  0.1911, -0.1413, -0.7812,  0.6919, -0.0836, -0.5429,  0.1644,
         0.0376, -0.6890, -0.6871, -0.1337, -0.4779,  0.2013,  0.0851, -0.0639,
        -0.1710, -0.3243, -0.1762, -0.51

In [7]:
print(glove['notaword'])

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 

In [8]:
print(glove['Apple'])

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 
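
The three outputs above suggest two things: a token that is missing from the vocabulary comes back as an all-zero vector, and `'Apple'` is treated as missing because the `6B` GloVe vocabulary is lowercased. The `lower_case_backup=True` argument used in the next cell works around this by retrying the lowercased token. The sketch below mimics that behavior with a tiny hand-made vocabulary; `toy_vectors` and `lookup` are hypothetical names for illustration and are not part of torchtext.

```python
import torch

# Toy 3-dimensional "embeddings"; the real GloVe vectors used here have 300 dimensions
toy_vectors = {
    "apple": torch.tensor([0.1, 0.2, 0.3]),
    "movie": torch.tensor([0.4, 0.5, 0.6]),
}
DIM = 3

def lookup(token, lower_case_backup=False):
    """Return the vector for a token, or zeros if it is out of vocabulary."""
    if token in toy_vectors:
        return toy_vectors[token]
    if lower_case_backup and token.lower() in toy_vectors:
        return toy_vectors[token.lower()]
    return torch.zeros(DIM)

print(lookup("Apple"))                             # exact match fails
print(lookup("Apple", lower_case_backup=True))     # falls back to "apple"
print(lookup("notaword", lower_case_backup=True))  # still out of vocabulary
```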

Now you will convert every word in every training sentence to its corresponding word embedding vector.

In [9]:
nltk.download('punkt')
training_data = []
length = 16

for idx_, sentence in enumerate(sst_dataset['train']['sentence']):
  #tokenize word
  words = nltk.word_tokenize(sentence)

  #pad or truncate so that every sentence has exactly `length` tokens
  if len(words) > length:
    words = words[:length]
  else:
    for i in range(0,length-len(words)):
      words.append('PAD')

  #convert words to their embeddings
  ret = glove.get_vecs_by_tokens(words, lower_case_backup = True)
  #print(ret.size())
  training_data.append(ret)

training_data = torch.stack(training_data) #convert list of tensors to tensors
print(training_data.size()) #the training data now has shape (number of training sentences, sentence length, word vector dimension) = (8544, 16, 300)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


torch.Size([8544, 16, 300])
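
The padding and truncation logic in the cell above can also be written as a small helper. This is a minimal sketch, not the notebook's own code: `pad_or_truncate` is a hypothetical name, and the `'PAD'` token simply mirrors the value used above.

```python
def pad_or_truncate(words, length=16, pad_token="PAD"):
    """Return a new list with exactly `length` tokens."""
    if len(words) > length:
        return words[:length]  # drop tokens beyond the limit
    return words + [pad_token] * (length - len(words))  # fill the remainder

short = pad_or_truncate(["a", "great", "movie"], length=5)
long = pad_or_truncate(["one", "two", "three", "four", "five", "six"], length=5)

print(short)
print(long)
```

Returning a new list instead of appending in place keeps the original token list untouched, which can be convenient if the unpadded tokens are needed later.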


Do the same for the test data, but skip padding and truncation: test sentences are fed to the model one at a time during inference, so they do not need to have equal length.

In [10]:
testing_data = []
for idx_, sentence in enumerate(sst_dataset['test']['sentence']):
  #tokenize word
  words = nltk.word_tokenize(sentence)

  #convert words to their embeddings
  ret = glove.get_vecs_by_tokens(words, lower_case_backup = True)
  testing_data.append(ret)
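
Because each test sentence keeps its own length, it has to be given a batch dimension of size 1 before it can be passed to a model that expects input of shape `[BatchSize, Length, Word Vector Length]`. The sketch below shows two equivalent ways to do this on a random stand-in tensor; the 7-token length is an arbitrary choice for illustration.

```python
import torch

# Stand-in for one embedded test sentence: 7 tokens, 300-dimensional vectors
sentence = torch.randn(7, 300)

# Option 1: insert a leading batch dimension
batched = sentence.unsqueeze(0)

# Option 2: reshape explicitly, as the test() utility in this notebook does
reshaped = sentence.view(-1, sentence.size(0), sentence.size(1))

print(batched.shape)
print(torch.equal(batched, reshaped))
```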

## Utility Functions
<br> <font color='red' > **Utility functions and code for Lab Parts 3 and 4. Please run this cell before starting Lab Part 3 and Lab Part 4. You do not need to change anything here.**</font>

In [11]:
#utilities
from torch.utils.data import DataLoader,Dataset
from torch import nn

#function for creating dataloaders
#train == 0 builds the training loader; any other value builds a test loader
def create_dataloader(data,label,train):
    #round the real-valued labels to 0/1 class indices
    targets = torch.round(torch.Tensor(label)).to(torch.long)
    if (train == 0):
        print(data.size())
        print(label)
        print(targets)
        train_set = torch.utils.data.TensorDataset(data, targets)
        #train_data, val_data = torch.utils.data.random_split(train_set,[int(0.80*len(train_set)),len(train_set)-int(0.80*len(train_set))], generator= torch.Generator().manual_seed(42) )
        return DataLoader(train_set, batch_size = 16, shuffle=True, drop_last = True)
    else:
        test_set = torch.utils.data.TensorDataset(data, targets)
        return DataLoader(test_set, batch_size = 16, shuffle=True)

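#trains `model` using the global `optimizer`, loss function `cel`, and `device`
def train(num_epoch, model, train_loader):
  for epoch in range(0,num_epoch):
      train_loss = 0
      model.train()
      for batch_id, (data,label) in enumerate(train_loader):
          data = data.to(device)
          label = label.to(device)
          optimizer.zero_grad()
          logits = model(data)
          loss = cel(logits,label)
          loss.backward()
          optimizer.step()
          train_loss += loss.item()
      #note: train_loss sums per-batch mean losses but is divided by the number of samples,
      #so the printed value is roughly (mean loss per sample) / (batch size)
      average_loss = train_loss / len(train_loader.dataset)
      print('====> Epoch: {} Average training loss: {:.4f}'.format(
            epoch, average_loss))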


def accuracy(pred,target):
    correct = 0
    for i in range(0,pred.size()[0]):
        if pred[i] == target[i]:
            correct += 1
    return correct

#evaluates `model` one sentence at a time; uses the global softmax `m` and `device`
def test(model,test_data, test_label):
  test_label = torch.round(torch.Tensor(test_label)).to(torch.long).to(device)
  total_correct = 0
  total = 0
  model.eval()
  with torch.no_grad(): #gradients are not needed for evaluation
    for idx in range(0,len(test_data)):
      #add a batch dimension of size 1: [Length, Dim] -> [1, Length, Dim]
      data = test_data[idx].to(device).view(-1, test_data[idx].size()[0], test_data[idx].size()[1])
      label = test_label[idx].to(device).view(1)
      logits = model(data)
      pred = m(logits)
      pred = torch.argmax(pred,dim=1)
      total_correct += accuracy(pred,label)
      total += label.size()[0]

  print('Accuracy on the test data is: ' + str(total_correct/total))

## Lab Part 3. Vanilla RNN
In this section, you will implement a vanilla RNN and perform text classification with it.

In [12]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, d):
        super(Model, self).__init__()

        self.input_dim = d
        self.hidden_dim = d
        self.output_dim = 2

        # x -> h layer
        self.U = nn.Linear(self.input_dim, self.hidden_dim)

        # h -> h layer
        self.W = nn.Linear(self.hidden_dim, self.hidden_dim)

        # h -> output layer
        self.V = nn.Linear(self.hidden_dim, self.output_dim)

    def forward(self, x):  # x: size [BatchSize, Length, Word Vector Length]
        # Initializing hidden state for first input using method defined below
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)

        # length of sequence
        self.length = x.size()[1]

        # Iterate through sentence and input to RNN sequentially
        for t in range(0, self.length):
            xt = x[:, t, :]  # shape [BatchSize, Length of word embedding vector]
            hidden = torch.tanh(self.U(xt) + self.W(hidden))  # Update hidden state

        # output logit using last sequence
        return self.V(hidden)

    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass
        # We'll send the tensor holding the hidden state to the device we specified earlier as well
        hidden = torch.zeros(batch_size, self.hidden_dim).to(device)
        return hidden
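
Written as equations, the forward pass above computes the recurrence below, where `x_t` is the embedding of the t-th token, `h_0` is the zero vector, and `T` is the sentence length. The bias symbols `b_U`, `b_W`, and `b_V` are notation introduced here for the bias terms that `nn.Linear` includes by default.

```latex
\begin{aligned}
h_t &= \tanh\left(U x_t + b_U + W h_{t-1} + b_W\right), \quad t = 1, \dots, T \\
\text{logits} &= V h_T + b_V
\end{aligned}
```

Only the last hidden state `h_T` reaches the output layer, so the whole sentence has to be summarized in that single vector.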


Now perform text classification with your vanilla RNN. You will see your model's accuracy.

In [13]:
#dataloader
train_loader = create_dataloader(training_data, sst_dataset['train']['label'], 0)

#create model object
d = 300
rnn_model = Model(d).to(device)
m = nn.Softmax(1)
cel = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.SGD(rnn_model.parameters(), lr = 0.01)

#train
num_epoch = 15
train(num_epoch,rnn_model,train_loader)

#test
test(rnn_model,testing_data, sst_dataset['test']['label'])

torch.Size([8544, 16, 300])
[0.6944400072097778, 0.833329975605011, 0.625, 0.5, 0.7222200036048889, 0.833329975605011, 0.875, 0.7222200036048889, 0.833329975605011, 0.7361099720001221, 0.9027799963951111, 0.44444000720977783, 0.8055599927902222, 0.44444000720977783, 0.8194400072097778, 0.75, 0.6111099720001221, 0.44444000720977783, 0.8194400072097778, 0.7777799963951111, 0.8194400072097778, 0.6388900279998779, 0.5555599927902222, 0.875, 0.5555599927902222, 0.5138900279998779, 0.9444400072097778, 0.7222200036048889, 0.9305599927902222, 0.3333300054073334, 0.8194400072097778, 0.7777799963951111, 0.5694400072097778, 0.7361099720001221, 0.8611099720001221, 0.6805599927902222, 0.7361099720001221, 0.7222200036048889, 0.541670024394989, 0.6805599927902222, 0.7638900279998779, 0.833329975605011, 0.4027799963951111, 0.6527799963951111, 0.5277799963951111, 0.16666999459266663, 0.7777799963951111, 0.6111099720001221, 0.375, 0.9444400072097778, 0.75, 0.8611099720001221, 0.6388900279998779, 0.88889

====> Epoch: 0 Average training loss: 0.0432
====> Epoch: 1 Average training loss: 0.0426
====> Epoch: 2 Average training loss: 0.0399
====> Epoch: 3 Average training loss: 0.0365
====> Epoch: 4 Average training loss: 0.0352
====> Epoch: 5 Average training loss: 0.0345
====> Epoch: 6 Average training loss: 0.0341
====> Epoch: 7 Average training loss: 0.0336
====> Epoch: 8 Average training loss: 0.0332
====> Epoch: 9 Average training loss: 0.0328
====> Epoch: 10 Average training loss: 0.0324
====> Epoch: 11 Average training loss: 0.0321
====> Epoch: 12 Average training loss: 0.0317
====> Epoch: 13 Average training loss: 0.0312
====> Epoch: 14 Average training loss: 0.0307
Accuracy on the test data is: 0.7384615384615385
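
A note on reading the numbers above: the `train` utility adds up the mean loss of each batch and then divides by the number of training samples rather than the number of batches, so the printed value is roughly the per-sample cross-entropy divided by the batch size of 16. The short calculation below is a sanity check under that reading: a binary classifier that assigns probability 0.5 to each class has cross-entropy ln 2, which lands close to the loss printed for the first epoch.

```python
import math

batch_size = 16              # batch size used by create_dataloader
chance_loss = math.log(2)    # cross-entropy of a 50/50 guess on two classes
printed_scale = chance_loss / batch_size

print(f"chance-level cross-entropy: {chance_loss:.4f}")
print(f"same value on the printed scale: {printed_scale:.4f}")
```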


## Lab Part 4. LSTM
In this section, you will implement an LSTM and perform text classification with it.

In [14]:
class LSTM(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.input_dim = d
        self.hidden_dim = d
        self.output_dim = 2

        self.Wf = nn.Linear(self.input_dim, self.hidden_dim)
        self.Uf = nn.Linear(self.hidden_dim, self.hidden_dim)

        self.Wi = nn.Linear(self.input_dim, self.hidden_dim)
        self.Ui = nn.Linear(self.hidden_dim, self.hidden_dim)

        self.Wo = nn.Linear(self.input_dim, self.hidden_dim)
        self.Uo = nn.Linear(self.hidden_dim, self.hidden_dim)

        self.Wc = nn.Linear(self.input_dim, self.hidden_dim)
        self.Uc = nn.Linear(self.hidden_dim, self.hidden_dim)

        self.W = nn.Linear(self.hidden_dim, self.output_dim)

    def init_hidden(self, batch_size):
        ht = torch.zeros(batch_size, self.hidden_dim).to(device)
        ct = torch.zeros(batch_size, self.hidden_dim).to(device)
        return ht, ct

    def forward(self, x): 
        batch_size = x.size(0)
        seq_length = x.size(1)
        ht, ct = self.init_hidden(batch_size) 

        for t in range(0, seq_length):
            xt = x[:, t, :]

            ft = torch.sigmoid(self.Wf(xt) + self.Uf(ht))
            it = torch.sigmoid(self.Wi(xt) + self.Ui(ht))
            ot = torch.sigmoid(self.Wo(xt) + self.Uo(ht))
            ct_tilda = torch.tanh(self.Wc(xt) + self.Uc(ht))

            ct = ft * ct + it * ct_tilda
            ht = ot * torch.tanh(ct)

        return self.W(ht)
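
For reference, the loop in `forward` above corresponds to the standard LSTM update equations below, with the layer names from the code kept as symbols. Here σ is the logistic sigmoid and ⊙ is element-wise multiplication; the bias terms that `nn.Linear` adds by default are omitted for brevity, and `h_0` and `c_0` are zero vectors.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1}) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1}) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1}) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\text{logits} &= W h_T
\end{aligned}
```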

Now perform text classification using your LSTM model. You will see your model's accuracy.

In [15]:
#create model object
d = 300
lstm_model = LSTM(d).to(device)
m = nn.Softmax(1)
cel = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.SGD(lstm_model.parameters(), lr = 0.01)


#train
num_epoch = 15
train(num_epoch,lstm_model,train_loader)

#test
test(lstm_model,testing_data, sst_dataset['test']['label'])

====> Epoch: 0 Average training loss: 0.0433
====> Epoch: 1 Average training loss: 0.0432
====> Epoch: 2 Average training loss: 0.0431
====> Epoch: 3 Average training loss: 0.0430
====> Epoch: 4 Average training loss: 0.0428
====> Epoch: 5 Average training loss: 0.0426
====> Epoch: 6 Average training loss: 0.0424
====> Epoch: 7 Average training loss: 0.0421
====> Epoch: 8 Average training loss: 0.0417
====> Epoch: 9 Average training loss: 0.0406
====> Epoch: 10 Average training loss: 0.0377
====> Epoch: 11 Average training loss: 0.0356
====> Epoch: 12 Average training loss: 0.0348
====> Epoch: 13 Average training loss: 0.0340
====> Epoch: 14 Average training loss: 0.0335
Accuracy on the test data is: 0.744343891402715


## Lab Part 5. Transformer
In this section, you will fine-tune BERT (a transformer model) for text classification.

Download the Naver Sentiment Movie Corpus (NSMC) for this transformer experiment.

In [16]:
# naver sentiment corpus
!git clone https://github.com/e9t/nsmc.git

Cloning into 'nsmc'...
remote: Enumerating objects: 14763, done.
remote: Counting objects: 100% (14762/14762), done.
remote: Compressing objects: 100% (13012/13012), done.
remote: Total 14763 (delta 1748), reused 14762 (delta 1748), pack-reused 1
Receiving objects: 100% (14763/14763), 56.19 MiB | 19.65 MiB/s, done.
Resolving deltas: 100% (1748/1748), done.


Once the download is finished, you can look at a few examples from the dataset.

In [17]:
train_df = pd.read_csv('./nsmc/ratings_train.txt', sep='\t')
test_df = pd.read_csv('./nsmc/ratings_test.txt', sep='\t')
train_df.head(15)

| | id | document | label |
|---|---|---|---|
| 0 | 9976970 | 아 더빙.. 진짜 짜증나네요 목소리 | 0 |
| 1 | 3819312 | 흠...포스터보고 초딩영화줄....오버연기조차 가볍지 않구나 | 1 |
| 2 | 10265843 | 너무재밓었다그래서보는것을추천한다 | 0 |
| 3 | 9045019 | 교도소 이야기구먼 ..솔직히 재미는 없다..평점 조정 | 0 |
| 4 | 6483659 | 사이몬페그의 익살스런 연기가 돋보였던 영화!스파이더맨에서 늙어보이기만 했던 커스틴 ... | 1 |
| 5 | 5403919 | 막 걸음마 뗀 3세부터 초등학교 1학년생인 8살용영화.ㅋㅋㅋ...별반개도 아까움. | 0 |
| 6 | 7797314 | 원작의 긴장감을 제대로 살려내지못했다. | 0 |
| 7 | 9443947 | 별 반개도 아깝다 욕나온다 이응경 길용우 연기생활이몇년인지..정말 발로해도 그것보단... | 0 |
| 8 | 7156791 | 액션이 없는데도 재미 있는 몇안되는 영화 | 1 |
| 9 | 5912145 | 왜케 평점이 낮은건데? 꽤 볼만한데.. 헐리우드식 화려함에만 너무 길들여져 있나? | 1 |


To keep the experiment fast, we will use only a small fraction of the data (20% of each split).

In [18]:
train_df.dropna(inplace=True)
test_df.dropna(inplace=True)

train_df = train_df.sample(frac=0.2, random_state=999)
test_df = test_df.sample(frac=0.2, random_state=999)

Build the Dataset and DataLoader to train the model


In [19]:
class NsmcDataset(Dataset):
    ''' Naver Sentiment Movie Corpus Dataset '''
    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        text = self.df.iloc[idx, 1]
        label = self.df.iloc[idx, 2]
        return text, label
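
To see what this dataset yields once it is wrapped in a `DataLoader`, the sketch below repeats the class on a tiny hand-made DataFrame with the same column order (`id`, `document`, `label`). The rows are invented for illustration. With the default collate function, each batch should come out as a sequence of strings plus a tensor of labels, which is the form the tokenizer call in the training loop works with.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class NsmcDataset(Dataset):
    """Same structure as the class above: column 1 is the text, column 2 the label."""
    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        return self.df.iloc[idx, 1], self.df.iloc[idx, 2]

# Invented rows, used only to show the batch structure
toy_df = pd.DataFrame({
    "id": [1, 2, 3],
    "document": ["great movie", "boring plot", "would watch again"],
    "label": [1, 0, 1],
})

loader = DataLoader(NsmcDataset(toy_df), batch_size=2, shuffle=False)
texts, labels = next(iter(loader))

print(list(texts))  # the raw strings of the first two rows
print(labels)       # a tensor holding the matching labels
```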

Create DataLoader objects for training and evaluation.



In [20]:
#dataloader
nsmc_train_dataset = NsmcDataset(train_df)
train_loader = DataLoader(nsmc_train_dataset, batch_size=8, shuffle=True)

nsmc_eval_dataset = NsmcDataset(test_df)
eval_loader = DataLoader(nsmc_eval_dataset, batch_size=8, shuffle=False)

Create the model object and tokenizer. In this experiment, we will use the AdamW optimizer with a learning rate of 1e-5.

In [21]:
#create model object
tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')
transformer_model = AutoModelForSequenceClassification.from_pretrained('bert-base-multilingual-cased')
transformer_model.to(device)

Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(119547, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elemen

In [22]:
#optimizer
optimizer = torch.optim.AdamW(transformer_model.parameters(), lr = 1e-5)

You can now train the model and check the result with the test code.

In [23]:
#train
itr = 1
p_itr = 200
epochs = 1
total_loss = 0
total_len = 0
total_correct = 0


transformer_model.train()
for epoch in range(epochs):

    for text, label in train_loader:
        optimizer.zero_grad()

        encoded = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
        encoded, label = encoded.to(device), label.to(device)
        outputs = transformer_model(**encoded, labels=label)

        loss = outputs.loss
        logits = outputs.logits

        pred = torch.argmax(F.softmax(logits, dim=1), dim=1)  # softmax over the class dimension
        correct = pred.eq(label)
        total_correct += correct.sum().item()
        total_len += len(label)
        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        if itr % p_itr == 0:
            print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(epoch+1, epochs, itr, total_loss/p_itr, total_correct/total_len))
            total_loss = 0
            total_len = 0
            total_correct = 0

        itr+=1

#test
transformer_model.eval()

nsmc_eval_dataset = NsmcDataset(test_df)
eval_loader = DataLoader(nsmc_eval_dataset, batch_size=8, shuffle=False)

total_loss = 0
total_len = 0
total_correct = 0

with torch.no_grad():  # gradients are not needed for evaluation
    for text, label in eval_loader:

        encoded = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
        encoded, label = encoded.to(device), label.to(device)
        outputs = transformer_model(**encoded, labels=label)

        logits = outputs.logits

        pred = torch.argmax(F.softmax(logits, dim=1), dim=1)  # softmax over the class dimension
        correct = pred.eq(label)
        total_correct += correct.sum().item()
        total_len += len(label)

print('Test accuracy: ', total_correct / total_len)


[Epoch 1/1] Iteration 200 -> Train Loss: 0.6658, Accuracy: 0.596
[Epoch 1/1] Iteration 400 -> Train Loss: 0.5645, Accuracy: 0.709
[Epoch 1/1] Iteration 600 -> Train Loss: 0.5240, Accuracy: 0.738
[Epoch 1/1] Iteration 800 -> Train Loss: 0.4934, Accuracy: 0.773
[Epoch 1/1] Iteration 1000 -> Train Loss: 0.4973, Accuracy: 0.768
[Epoch 1/1] Iteration 1200 -> Train Loss: 0.4693, Accuracy: 0.776
[Epoch 1/1] Iteration 1400 -> Train Loss: 0.4652, Accuracy: 0.784
[Epoch 1/1] Iteration 1600 -> Train Loss: 0.4325, Accuracy: 0.799
[Epoch 1/1] Iteration 1800 -> Train Loss: 0.4642, Accuracy: 0.767
[Epoch 1/1] Iteration 2000 -> Train Loss: 0.4324, Accuracy: 0.796
[Epoch 1/1] Iteration 2200 -> Train Loss: 0.4467, Accuracy: 0.786
[Epoch 1/1] Iteration 2400 -> Train Loss: 0.4226, Accuracy: 0.795
[Epoch 1/1] Iteration 2600 -> Train Loss: 0.4039, Accuracy: 0.816
[Epoch 1/1] Iteration 2800 -> Train Loss: 0.4267, Accuracy: 0.809
[Epoch 1/1] Iteration 3000 -> Train Loss: 0.3999, Accuracy: 0.819
[Epoch 1/1] It


Test accuracy:  0.8157815781578158
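
Since softmax preserves the ordering of values within each row, taking `argmax` of the softmax output gives the same predictions as taking `argmax` of the raw logits. The sketch below checks this on random stand-in logits (the shape `[8, 2]` mirrors a batch of 8 and two classes) and passes `dim=1` explicitly, which avoids the implicit-dimension warning that `F.softmax` emits when `dim` is omitted.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 2)  # stand-in for model outputs: 8 examples, 2 classes

probs = F.softmax(logits, dim=1)  # normalize across the class dimension
pred_from_probs = torch.argmax(probs, dim=1)
pred_from_logits = torch.argmax(logits, dim=1)

print(probs.sum(dim=1))  # each row of probabilities sums to 1
print(torch.equal(pred_from_probs, pred_from_logits))
```

The softmax is therefore only needed when the probabilities themselves are of interest, as in the final cell of this notebook.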


You can try the trained model on your own input.

In [24]:
# You can try the trained model on your own input
text = 'ㅋㅋ'

encoded = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
outputs = transformer_model(**encoded)

logits = outputs.logits
soft = F.softmax(logits, dim=1).cpu().detach().numpy()[0]

print("Input:", text)
print("Probability that the input is positive:", soft[1])
print("Probability that the input is negative:", soft[0])

Input: ㅋㅋ
Probability that the input is positive: 0.6901508
Probability that the input is negative: 0.30984926
