# RoBERTa

Fine-tuning `roberta-base` for 4-class text classification with [huggingface/transformers](https://github.com/huggingface/transformers) (PyTorch version).

In [1]:
from google.colab import drive
drive.mount('/content/drive')

PROJ_DIR = "drive/MyDrive/CS4248 Project/"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%%capture
!pip install transformers evaluate

In [3]:
import pandas as pd
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split

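# The CSV is read without a header row, so the column names are given explicitly.
fulltrain = pd.read_csv(PROJ_DIR + 'raw_data/fulltrain.csv', names=['label', 'text'])
# Shift labels down by one so that class indices start at 0.
fulltrain['label'] = fulltrain['label'] - 1
# fulltrain = fulltrain.iloc[:1000,:]  # TODO

# 80/20 train/validation split (no random_state is set, so the split differs between runs).
train, valid = train_test_split(fulltrain, test_size=0.2, shuffle=True)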

data = DatasetDict()
data['train'] = Dataset.from_pandas(train)
data['valid'] = Dataset.from_pandas(valid)
data['train'][0]

{'label': 3,
 'text': "International: AFGHAN-FLYNN -- KABUL -- Maj. Gen. Michael T. Flynn is trying to change the way Western forces operate in Afghanistan. 1,370 words, by Julian E. Barnes (Times). One photo. With AFGHAN-DECIDE, AFGHAN-POLICY National: MINNEAPOLIS-SOMALIS -- MINNEAPOLIS -- Little Mogadishu is the home of the largest concentration of Somali refugees in the U.S. and is also the center of the largest militant operation uncovered by federal authorities since the terrorist attacks of Sept. 11, 2001. 1,090 words, by Bob Drogin (Times). POLLIN-OBIT -- WASHINGTON -- Abe Pollin, who brought professional basketball and hockey franchises to Washington and spent $220 million of his own money to build a massive sports and entertainment arena that dramatically changed the city's downtown, dies at the age of 85. 1,880 words, by Peter Perl (Post). With POLLIN-OBIT-TIMELINE. TURKEY-PARDON -- WASHINGTON -- The turkeys who will get a presidential pardon enjoy their 15 minutes of fame at
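
The `- 1` shift in cell 3 matters because the classification head treats labels as zero-based class indices. A minimal sketch of the idea, assuming the raw CSV labels run from 1 to 4 (the notebook shows only the shift and `num_labels=4`); the sample values are made up for illustration:

```python
# Hypothetical raw labels, ASSUMED to be 1-based like those in fulltrain.csv.
raw_labels = [1, 4, 2, 3, 4]
num_labels = 4

# Shift to zero-based class indices, mirroring fulltrain['label'] - 1.
labels = [y - 1 for y in raw_labels]

# Every label must be a valid index into the model's num_labels outputs.
assert all(0 <= y < num_labels for y in labels)
print(labels)
```

Without the shift, a label equal to `num_labels` would fall outside the valid index range for the loss computation.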

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('roberta-base')
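# Tokenize in batches of 128; every example is padded to the tokenizer's
# maximum length, and longer texts are truncated to it.
data_tok = data.map(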
    lambda x: tokenizer(x['text'], padding="max_length", truncation=True),
    batched=True,
    batch_size=128
)
# data_tok['train'][0]

train_data = data_tok["train"].shuffle(seed=123)
valid_data = data_tok["valid"].shuffle(seed=123)

Map:   0%|          | 0/39083 [00:00<?, ? examples/s]

Map:   0%|          | 0/9771 [00:00<?, ? examples/s]
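
What `padding="max_length"` with `truncation=True` does to each example can be sketched in plain Python. This is an illustration only, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, the pad id of 1 is assumed to match `roberta-base`, the toy maximum length of 6 stands in for the real limit, and the real tokenizer also keeps the closing special token when it truncates, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Hypothetical helper: fixed-length input ids plus an attention mask."""
    ids = token_ids[:max_length]            # truncation: drop tokens past the limit
    n_pad = max_length - len(ids)           # padding needed to reach the limit
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }

PAD_ID = 1        # ASSUMPTION: pad token id of roberta-base
MAX_LENGTH = 6    # toy value for illustration

short = pad_or_truncate([0, 100, 200, 2], MAX_LENGTH, PAD_ID)
long = pad_or_truncate(list(range(10)), MAX_LENGTH, PAD_ID)
print(short)
print(long)
```

Padding everything to the maximum length keeps every batch the same shape, at the cost of extra computation on short texts.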

In [5]:
from transformers import (AutoModelForSequenceClassification,
                          TrainingArguments,
                          Trainer)
import evaluate
import numpy as np

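# Pretrained encoder with a new 4-class classification head (labels 0-3).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=4)
# Evaluate on the validation set after every epoch; all other settings keep their defaults.
training_args = TrainingArguments(output_dir="checkpoints", evaluation_strategy="epoch")
metric = evaluate.load("accuracy")

def compute_metrics(pred):
    """Turn the Trainer's (logits, labels) pair into an accuracy score."""
    logits, labels = pred
    # The predicted class is the index of the largest logit.
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)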

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=valid_data,
    compute_metrics=compute_metrics,
)

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.
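
The `compute_metrics` function in cell 5 takes the arg-max over the logits and compares the result with the reference labels. A dependency-free sketch of the same computation, with made-up logits for three examples and four classes (the helper names `argmax` and `accuracy` are illustrative, not part of `evaluate`):

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logits, labels):
    """Fraction of examples whose arg-max prediction equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits: one row per example, one column per class.
logits = [
    [2.0, 0.1, -1.0, 0.3],   # predicted class 0
    [0.2, 0.1, 3.5, 0.0],    # predicted class 2
    [0.0, 1.2, 0.4, 0.9],    # predicted class 1
]
labels = [0, 2, 3]           # the third example is misclassified

print(accuracy(logits, labels))
```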

In [6]:
trainer.train()



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.0508 | 0.062261 | 0.990073 |
| 2 | 0.0334 | 0.037777 | 0.995292 |
| 3 | 0.0123 | 0.017572 | 0.997851 |


TrainOutput(global_step=14658, training_loss=0.05024330057720959, metrics={'train_runtime': 12193.7056, 'train_samples_per_second': 9.616, 'train_steps_per_second': 1.202, 'total_flos': 3.0850062100475904e+16, 'train_loss': 0.05024330057720959, 'epoch': 3.0})
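
The numbers in `TrainOutput` can be cross-checked against the dataset sizes printed earlier. The sketch below assumes the `Trainer` defaults that `TrainingArguments` was left with: a per-device batch size of 8, 3 epochs, and a single device. Under those assumptions the arithmetic reproduces the reported step count and throughput.

```python
import math

train_examples = 39083       # training split size, from the Map output
valid_examples = 9771        # validation split size, from the Map output
batch_size = 8               # ASSUMPTION: default per-device batch size, one device
epochs = 3                   # matches 'epoch': 3.0 in TrainOutput
train_runtime = 12193.7056   # seconds, from TrainOutput

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

# Final-epoch accuracy expressed as a count of validation errors.
valid_errors = round((1 - 0.997851) * valid_examples)

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
print(valid_errors)
```

At roughly 12,194 seconds, the run took about 3.4 hours, and the final accuracy corresponds to about 21 misclassified validation examples.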

In [7]:
trainer.save_model(PROJ_DIR+"model/")