# SEP532 Artificial Intelligence: Theory and Practice
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

## Advanced Models
### BERT

BERT and other Transformer encoder architectures have been wildly successful on a variety of tasks in NLP (natural language processing). They compute vector-space representations of natural language that are suitable for use in deep learning models. The BERT family of models uses the Transformer encoder architecture to process each token of input text in the full context of all tokens before and after, hence the name: Bidirectional Encoder Representations from Transformers.
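
To make "bidirectional" concrete, here is a minimal sketch (not part of BERT's actual implementation) of which positions each token may attend to. The helper name `visibility` is hypothetical, and the left-to-right case is shown only for contrast with the encoder case described above. Padding is ignored.

```python
def visibility(n_tokens, bidirectional):
    """Row i lists which positions token i may attend to (1 = visible)."""
    return [
        [1 if bidirectional or j <= i else 0 for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


# Encoder-style (BERT): every token sees the tokens before and after it.
for row in visibility(4, bidirectional=True):
    print(row)
print()
# Left-to-right, for contrast: token i only sees positions 0..i.
for row in visibility(4, bidirectional=False):
    print(row)
```

In the bidirectional case every row is all ones, which is what lets BERT use context on both sides of a token.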

BERT models are usually pre-trained on a large corpus of text, then fine-tuned for specific tasks.

![BERT model](images/bert.png)

#### Masked Language Modeling
Masked Language Modeling is a fill-in-the-blank task, where a model uses the context words surrounding a mask token to try to predict what the masked word should be. Masked language modeling is a great way to train a language model in a self-supervised setting (without human-annotated labels). 

![Masked language model](images/masked-language-model.png)
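
The fill-in-the-blank setup can be sketched in a few lines of plain Python. This is a simplified illustration, not the exact BERT recipe: it always substitutes `[MASK]`, whereas the original BERT procedure also sometimes keeps the token or swaps in a random one. The function name `mask_tokens`, the example sentence, and the `mask_prob` value are assumptions made for the demo.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT's mask token


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a toy masked-language-modeling example.

    labels holds the original token at masked positions and None elsewhere,
    so a loss would only be computed where a token was hidden.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(token)       # ...and remember it as the target
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


tokens = "the movie was surprisingly good and the ending was great".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

Because the targets come from the text itself, no human-annotated labels are needed.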

### Setup
#### Hugging Face Transformers
In this notebook, we will use 🤗 Transformers, which provides many Transformer architectures along with their pre-trained weights.

> 🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. 
> Using pretrained models can reduce your compute costs, carbon footprint, and save you time from training a model from scratch. 
> The models can be used across different modalities such as:
> - 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
> - 🖼️ Images: image classification, object detection, and segmentation.
> - 🗣️ Audio: speech recognition and audio classification.
> - 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

All models currently supported by Hugging Face Transformers can be found at [this link](https://huggingface.co/docs/transformers/en/index#supported-models).

In [None]:
!pip install \
    transformers \
    datasets \
    sentencepiece \
    "git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf"

### Sentiment analysis
This notebook trains a sentiment analysis model to classify movie reviews as positive or negative, based on the text of the review.

We will use the [Naver sentiment movie corpus](https://github.com/e9t/nsmc) that contains the text of 200,000 movie reviews.

### Download the NSMC dataset
Let's download and extract the dataset. Thanks to 🤗 Datasets, we can access the NSMC dataset just by calling the `load_dataset` function.

In [None]:
from datasets import load_dataset

raw_datasets = 

Each item in the NSMC dataset consists of:
- `id`: The review id, provided by Naver
- `document`: The actual review
- `label`: The sentiment class of the review. (`0`: negative, `1`: positive)

In [None]:
raw_datasets

In [None]:
raw_datasets['train'][0]

### Loading pre-trained models
BERT is typically used by fine-tuning a pre-trained model on the downstream task of interest. In this notebook, we use KoBERT, a BERT model pre-trained on a Korean corpus by SKT.

In [None]:
from kobert_tokenizer import KoBERTTokenizer
from transformers import BertForSequenceClassification

tokenizer = 
model = 

### Preprocessing dataset
Text inputs need to be transformed to numeric token ids and arranged in several Tensors before being input to BERT. To do that, we will use the `tokenizer` that comes with the BERT model. To process our dataset in one step, use 🤗 Datasets `map` method to apply a preprocessing function over the entire dataset:
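
As a rough sketch of what such a preprocessing function can look like, the block below passes the `document` column to a tokenizer with padding and truncation. Several things here are assumptions made for illustration: the extra `tokenizer` and `max_length` parameters, the value `64`, and the `ToyTokenizer` class, a hypothetical whitespace tokenizer with made-up ids that only mimics the `input_ids` / `attention_mask` output of a real BERT tokenizer so the example runs without downloading anything.

```python
def tokenize(examples, tokenizer, max_length=64):
    """Tokenize a batch of reviews; examples["document"] is a list of strings."""
    return tokenizer(
        examples["document"],
        padding="max_length",  # pad every review to the same length
        truncation=True,       # cut reviews longer than max_length
        max_length=max_length,
    )


class ToyTokenizer:
    """Hypothetical stand-in for a BERT tokenizer (whitespace split, made-up ids)."""

    def __init__(self):
        self.vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}

    def __call__(self, texts, padding="max_length", truncation=True, max_length=8):
        batch = {"input_ids": [], "attention_mask": []}
        for text in texts:
            words = text.split()
            if truncation:
                words = words[: max_length - 2]  # leave room for [CLS] and [SEP]
            ids = [self.vocab["[CLS]"]]
            ids += [self.vocab.setdefault(w, len(self.vocab)) for w in words]
            ids.append(self.vocab["[SEP]"])
            mask = [1] * len(ids)  # 1 = real token
            if padding == "max_length":
                pad = max_length - len(ids)
                ids += [self.vocab["[PAD]"]] * pad
                mask += [0] * pad  # 0 = padding, ignored by attention
            batch["input_ids"].append(ids)
            batch["attention_mask"].append(mask)
        return batch


batch = {"document": ["great movie", "too long and far too boring to finish watching"]}
encoded = tokenize(batch, ToyTokenizer(), max_length=8)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

With the real tokenizer, a call along the lines of `raw_datasets.map(tokenize, batched=True, fn_kwargs={"tokenizer": tokenizer})` would apply the function to every split. Treat the argument values as a starting point rather than the intended solution.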

In [None]:
def tokenize(examples):
    pass

datasets = 

In [None]:
datasets['train'][0]['input_ids'][:64]

### Train the model
Similar to `compile()` and `fit()` in TensorFlow's Keras API, 🤗 Transformers provides a [`Trainer`](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Trainer) class to train the model. The behavior of the `Trainer` class is configured through `TrainingArguments`.

In [None]:
from transformers import TrainingArguments

training_arguments = TrainingArguments()

#### Metrics
`Trainer` does not automatically evaluate model performance during training. We need to pass `Trainer` a function that computes and reports metrics. The 🤗 Datasets library provides a simple accuracy metric that you can load with the `load_metric` function (recent releases of 🤗 Datasets have removed `load_metric` in favor of `evaluate.load` from the separate 🤗 Evaluate library):
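
To show what such a metrics function has to do, here is a dependency-free sketch that computes accuracy by hand instead of using a loaded metric. It assumes the argument unpacks into `(logits, labels)` with one row of class scores per example; the sample logits and labels are made up for the demo.

```python
def compute_metrics(logits_and_labels):
    """Accuracy from (logits, labels); logits has one row of class scores per example."""
    logits, labels = logits_and_labels
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    # Trainer expects a dict mapping metric names to values.
    return {"accuracy": correct / len(labels)}


# Hypothetical logits for four reviews (columns: negative, positive) and their labels.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]]
labels = [0, 1, 0, 0]
print(compute_metrics((logits, labels)))  # {'accuracy': 0.75}
```

With NumPy arrays, the prediction step is usually written as `np.argmax(logits, axis=-1)`. The list version above is used only so the example runs anywhere.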

In [None]:
import numpy as np
from datasets import load_metric

metric_accuracy = 

def compute_metrics(logits_and_labels):
    pass

#### Trainer
Create a `Trainer` object with your model, training arguments, training and test datasets, and evaluation function:

In [None]:
from transformers import Trainer

trainer = Trainer()

Then fine-tune your model by calling `train()`:

In [None]:
trainer.train()

### Evaluate the model

In [None]:
trainer.evaluate()

In [None]:
import torch

model.eval()  # disable dropout for inference
for example in datasets['test'].shuffle().select(range(8)):
    # Add a batch dimension and move the inputs to the model's device.
    input_ids = torch.as_tensor([example['input_ids']]).to(model.device)
    attention_mask = torch.as_tensor([example['attention_mask']]).to(model.device)

    with torch.no_grad():  # gradients are not needed for prediction
        output = model(input_ids=input_ids, attention_mask=attention_mask)
    print('Text:', example['document'])
    print('Predicted:', torch.argmax(output.logits).item())
    print('Actual:', example['label'])
    print()
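
The loop above reports only the index of the largest logit. To also see how confident the model is, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python for a single, made-up pair of logits; the `LABELS` list simply restates the NSMC label convention (`0`: negative, `1`: positive).

```python
import math


def softmax(logits):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["negative", "positive"]  # NSMC: 0 = negative, 1 = positive

logits = [-1.2, 2.3]  # hypothetical model output for one review
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(LABELS[predicted], round(probs[predicted], 3))
```

On real model outputs, `torch.softmax(output.logits, dim=-1)` performs the same computation on tensors.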