## Pre Step

In [1]:
!nvidia-smi

Fri Feb  3 21:29:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    54W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

In [3]:
import torch

print(torch.cuda.is_available())

True


In [4]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint used throughout (the log below confirms bert-base-cased is loaded)
MODEL_NAME = "bert-base-cased"

# Tokenizer plus a BERT encoder with a freshly initialized 2-class head
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Move the model onto the GPU confirmed above
model = model.to("cuda")


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b
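The warning means the pretraining heads (`cls.*`) are discarded and a new classification layer is created. The training log further down reports 108,311,810 trainable parameters; the arithmetic sketch below reproduces that figure, assuming the standard bert-base sizes (768 hidden units, 12 layers, 3,072-unit feed-forward, 512 positions, 2 token types) and the 28,996-token cased vocabulary.

```python
# Parameter count of BERT-base-cased plus a 2-way classification head.
# Assumed config values: the standard bert-base-cased sizes listed in the text.
hidden = 768
intermediate = 3072
layers = 12
vocab_size = 28996
max_positions = 512
type_vocab = 2
num_labels = 2

def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # gamma and beta

# Word, position and token-type embeddings, followed by one LayerNorm
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
per_layer = (
    3 * linear(hidden, hidden)       # query, key, value projections
    + linear(hidden, hidden)         # attention output projection
    + layer_norm
    + linear(hidden, intermediate)   # feed-forward up-projection
    + linear(intermediate, hidden)   # feed-forward down-projection
    + layer_norm
)
pooler = linear(hidden, hidden)
encoder_total = embeddings + layers * per_layer + pooler
classifier = linear(hidden, num_labels)  # the newly initialized head

print(encoder_total)               # 108310272
print(encoder_total + classifier)  # 108311810
```

Only 1,538 of those parameters (the 768 × 2 classifier weights and 2 biases) start from random values; everything else comes from the checkpoint.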

## Load Data

In [9]:
import datasets

dataset = datasets.load_from_disk("./data/fnn_s")
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label', '__index_level_0__'],
        num_rows: 51200
    })
    validation: Dataset({
        features: ['text', 'label', '__index_level_0__'],
        num_rows: 12800
    })
    test: Dataset({
        features: ['text', 'label', '__index_level_0__'],
        num_rows: 16000
    })
})
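The three splits total 80,000 rows. The counts fit a 20% test holdout followed by a 20% validation holdout from the remaining 64,000 rows; that is an inference from the numbers, not something the notebook states. A quick check:

```python
# Split sizes reported by the DatasetDict above
splits = {"train": 51200, "validation": 12800, "test": 16000}
total = sum(splits.values())

# Share of the full dataset held by each split
shares = {name: n / total for name, n in splits.items()}
print(total)   # 80000
print(shares)  # {'train': 0.64, 'validation': 0.16, 'test': 0.2}

# Validation as a share of the non-test rows
print(splits["validation"] / (splits["train"] + splits["validation"]))  # 0.2
```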

In [10]:
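from transformers import DataCollatorWithPadding

def tokenize_function(examples):
    # Truncate or pad every example to exactly 64 tokens
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)

# batched=True passes 1,000 rows per call (the default batch_size of Dataset.map)
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Every sequence is already padded to max_length, so this collator is not strictly
# needed (and it is not passed to the Trainer below); it matters only with dynamic padding.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)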


In [11]:
tokenized_datasets['train']

Dataset({
    features: ['text', 'label', '__index_level_0__', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 51200
})
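Tokenization adds `input_ids`, `token_type_ids` and `attention_mask`. With `padding="max_length"` and `max_length=64`, every row ends up with exactly 64 ids, and the attention mask marks which positions are real. The sketch below is a simplified model of that behaviour, assuming the ids already contain any special tokens and that the pad id is 0 (BERT's `[PAD]`); the real tokenizer truncates before adding `[CLS]`/`[SEP]` so those markers survive, which this sketch does not model. The function name and the example ids are hypothetical.

```python
def pad_or_truncate(ids, max_length, pad_id=0):
    """Simplified stand-in for padding="max_length" with truncation=True."""
    ids = ids[:max_length]                          # truncation
    attention_mask = [1] * len(ids)                 # 1 = real token
    n_pad = max_length - len(ids)
    ids = ids + [pad_id] * n_pad                    # right padding
    attention_mask = attention_mask + [0] * n_pad   # 0 = padding, ignored by attention
    return {"input_ids": ids, "attention_mask": attention_mask}

# Hypothetical ids, not real vocabulary lookups
print(pad_or_truncate([101, 7592, 102], max_length=6))
# {'input_ids': [101, 7592, 102, 0, 0, 0], 'attention_mask': [1, 1, 1, 0, 0, 0]}
```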

In [12]:
tokenized_datasets = tokenized_datasets.remove_columns(["text", "__index_level_0__"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
tokenized_datasets["train"].column_names

['labels', 'input_ids', 'token_type_ids', 'attention_mask']
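The column is renamed to `labels` because that is the argument name the model's `forward` accepts; when it is present, `BertForSequenceClassification` returns a loss, which for `num_labels=2` and integer labels is softmax cross-entropy. The "Training Loss" and "Validation Loss" logged later are averages of this quantity. A NumPy sketch of the computation (the function name is hypothetical):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch, computed stably."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row max before exponentiating to avoid overflow
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for each example
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Uninformative logits give log(2), about 0.693, for two classes
print(cross_entropy([[0.0, 0.0]], [0]))
# A confident correct prediction gives a small loss (about 0.127)
print(cross_entropy([[2.0, 0.0]], [0]))
```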

## Training

In [23]:
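import time

from transformers import TrainingArguments, Trainer

# Timestamp (HHMMSS) that gives each run its own output and log directories
run_id = time.strftime('%H%M%S')

training_args = TrainingArguments(
    output_dir=f"trainer_{run_id}",
    evaluation_strategy="steps",      # evaluate every eval_steps (initialized from logging_steps = 500)
    prediction_loss_only=False,       # return logits so compute_metrics can score accuracy
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=1,
    num_train_epochs=10,
    logging_dir=f"log_{run_id}",
    load_best_model_at_end=True,      # restore the best checkpoint (by eval loss unless configured)
    seed=2023,
)

print(run_id)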

using `logging_steps` to initialize `eval_steps` to 500
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
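These arguments fix the training schedule, and the log further down reports a total train batch size of 128 and 4,000 optimization steps. The sketch below is a simplified model of that arithmetic, assuming a single device (the one A100 shown by `nvidia-smi`) and the default `save_steps` of 500, which is why checkpoints appear at the same steps as evaluations.

```python
import math

# Values from the TrainingArguments above; n_devices=1 is an assumption based on nvidia-smi
num_examples = 51200
per_device_batch = 128
n_devices = 1
grad_accum = 1
epochs = 10
eval_steps = 500  # the log says eval_steps is initialized from logging_steps (500)

effective_batch = per_device_batch * n_devices * grad_accum
# Dataloader batches per epoch (a last partial batch is kept), then grouped by accumulation
batches_per_epoch = math.ceil(num_examples / (per_device_batch * n_devices))
steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = steps_per_epoch * epochs
# Steps at which evaluation (and, with default save_steps=500, checkpointing) happens
eval_points = list(range(eval_steps, total_steps + 1, eval_steps))

print(effective_batch, steps_per_epoch, total_steps)  # 128 400 4000
print(eval_points)  # [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
```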


In [24]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
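`eval_pred` unpacks into the validation logits (one row of 2 scores per example) and the integer labels, and the `accuracy` metric returns a dict with a single `"accuracy"` key; the Trainer then logs it as `eval_accuracy`. The NumPy-only stand-in below mirrors that computation without loading the metric; the function name and sample values are hypothetical.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax class matches the label."""
    predictions = np.argmax(logits, axis=-1)  # same step as compute_metrics
    return {"accuracy": float((predictions == np.asarray(labels)).mean())}

logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.4, 0.9]])
labels = np.array([0, 1, 1, 1])
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.75}
```

Only the third row is misclassified (its argmax is class 0 while the label is 1), giving 3 correct out of 4.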

In [25]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    compute_metrics=compute_metrics,
)

In [None]:
trainer.train()


***** Running training *****
  Num examples = 51200
  Num Epochs = 10
  Instantaneous batch size per device = 128
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 1
  Total optimization steps = 4000
  Number of trainable parameters = 108311810


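| Step | Training Loss | Validation Loss | Accuracy |
|-----:|--------------:|----------------:|---------:|
| 500  | 0.1324        | 0.161074        | 0.946797 |
| 1000 | 0.0617        | 0.167385        | 0.949453 |
| 1500 | 0.0310        | 0.191030        | 0.955781 |
| 2000 | 0.0197        | 0.256322        | 0.954141 |
| 2500 | 0.0133        | 0.271678        | 0.955625 |
| 3000 | 0.0081        | 0.291502        | 0.955625 |
| 3500 | 0.0057        | 0.305356        | 0.957266 |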
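Training loss keeps falling while validation loss rises after step 500, a typical sign of overfitting. With `load_best_model_at_end=True` and no `metric_for_best_model`, the Trainer ranks checkpoints by validation loss (lower is better), so among the evaluations logged here it would restore checkpoint-500 even though accuracy peaks at step 3500. Setting `metric_for_best_model="accuracy"` (for which `greater_is_better` defaults to `True`) would change that choice. The sketch below applies both rules to the rows in the table; it only models the selection, not the Trainer's internals.

```python
# Validation metrics copied from the training table above
history = [
    {"step": 500, "eval_loss": 0.161074, "eval_accuracy": 0.946797},
    {"step": 1000, "eval_loss": 0.167385, "eval_accuracy": 0.949453},
    {"step": 1500, "eval_loss": 0.191030, "eval_accuracy": 0.955781},
    {"step": 2000, "eval_loss": 0.256322, "eval_accuracy": 0.954141},
    {"step": 2500, "eval_loss": 0.271678, "eval_accuracy": 0.955625},
    {"step": 3000, "eval_loss": 0.291502, "eval_accuracy": 0.955625},
    {"step": 3500, "eval_loss": 0.305356, "eval_accuracy": 0.957266},
]

# Default rule: lowest validation loss wins
best_by_loss = min(history, key=lambda row: row["eval_loss"])
# Alternative rule: highest validation accuracy wins
best_by_acc = max(history, key=lambda row: row["eval_accuracy"])

print(best_by_loss["step"])  # 500
print(best_by_acc["step"])   # 3500
```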


***** Running Evaluation *****
  Num examples = 12800
  Batch size = 128
Saving model checkpoint to trainer_213251/checkpoint-500
Configuration saved in trainer_213251/checkpoint-500/config.json
Model weights saved in trainer_213251/checkpoint-500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 12800
  Batch size = 128
Saving model checkpoint to trainer_213251/checkpoint-1000
Configuration saved in trainer_213251/checkpoint-1000/config.json
Model weights saved in trainer_213251/checkpoint-1000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 12800
  Batch size = 128
Saving model checkpoint to trainer_213251/checkpoint-1500
Configuration saved in trainer_213251/checkpoint-1500/config.json
Model weights saved in trainer_213251/checkpoint-1500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 12800
  Batch size = 128
Saving model checkpoint to trainer_213251/checkpoint-2000
Configuration saved in trainer_213251/checkpoint-2000/config.json