For reference:
- https://huggingface.co/docs/transformers/training
- https://huggingface.co/dbmdz/bert-base-german-cased
- https://huggingface.co/bert-base-german-cased

## Train Base BERT (no preprocessing of data)

In [1]:
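import os
import warnings

import torch

from lib.bert_pytorch.train import train_model_on_full_train_data, train_model_on_train_data

# Silence all Python warnings to keep the training log readable.
# Note that this also hides deprecation notices.
warnings.filterwarnings("ignore")

MODEL_NAME = "bert-base-german-cased"
# MODEL_NAME = "bert-base-german-dbmdz-cased"
DATA_PATH = "data/selected_data.csv"
BATCH_SIZE = 32
NUM_EPOCHS = 12
SEED = 42
TRAIN_MODEL_ON_FULL_TRAINING_DATA = True
SAVE_NEW_MODEL = True
DOWNLOAD_WEIGHTS = False


# Train either on the full training data or on the training split only.
if TRAIN_MODEL_ON_FULL_TRAINING_DATA:
    model, training_stats = train_model_on_full_train_data(DATA_PATH, MODEL_NAME, BATCH_SIZE, NUM_EPOCHS, SEED)
else:
    model, training_stats = train_model_on_train_data(DATA_PATH, MODEL_NAME, BATCH_SIZE, NUM_EPOCHS, SEED)
# print("\nTraining results: ", training_stats)

if SAVE_NEW_MODEL:
    # Make sure the target directory exists before writing the checkpoints.
    os.makedirs("pretrained_models/bert_pytorch", exist_ok=True)
    # Parameters only (portable across code changes) ...
    torch.save(model.state_dict(), "pretrained_models/bert_pytorch/model_state_dict.pt")
    # ... and the whole pickled model object (tied to the current class definitions).
    torch.save(model, "pretrained_models/bert_pytorch/entire_model.pt")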

Some weights of the model checkpoint at bert-base-german-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoi

There are 1 GPU(s) available.
We will use the GPU: NVIDIA GeForce RTX 2070 with Max-Q Design
EPOCH 1/12


Avg training loss:    2.5934352493948407
EPOCH 2/12


Avg training loss:    2.101401318278578
EPOCH 3/12


Avg training loss:    1.8058103372653325
EPOCH 4/12


Avg training loss:    1.491440539351768
EPOCH 5/12


Avg training loss:    1.2814586069434881
EPOCH 6/12


Avg training loss:    1.0629782736715343
EPOCH 7/12


Avg training loss:    0.8888123526444865
EPOCH 8/12


Avg training loss:    0.7744647389174335
EPOCH 9/12


Avg training loss:    0.6937694469363325
EPOCH 10/12


Avg training loss:    0.6170416643015213
EPOCH 11/12


Avg training loss:    0.564439774180452
EPOCH 12/12


Avg training loss:    0.5379270181422018
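
The training cell above writes two files: a `state_dict` with the parameters only, and the whole pickled model. Restoring from a `state_dict` means rebuilding the same architecture first and then loading the parameters into it. The sketch below shows that round trip under simplifying assumptions: a small `nn.Linear` stands in for the fine-tuned classifier, and an in-memory buffer stands in for `model_state_dict.pt`. Neither is part of this project.

```python
import io

import torch
from torch import nn

# Stand-in for the fine-tuned classifier (assumption: any nn.Module behaves the same way).
trained = nn.Linear(4, 2)

# Save only the parameters; the buffer plays the role of the .pt file on disk.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Restore: rebuild the same architecture, then load the saved parameters into it.
restored = nn.Linear(4, 2)
state_dict = torch.load(buffer, map_location="cpu")
restored.load_state_dict(state_dict)
restored.eval()  # switch to inference mode (disables dropout in real models)

# Check that every tensor survived the round trip unchanged.
weights_match = all(
    torch.equal(trained.state_dict()[name], restored.state_dict()[name])
    for name in trained.state_dict()
)
print(weights_match)
```

For the BERT model, the rebuilt architecture would have to be the same classification model with the same number of labels that was used during training.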


In [2]:
import pandas as pd

# Append this run's validation scores to the shared results file.
results = pd.read_csv("results/results.csv")
new_row = pd.DataFrame([{
    "model_name": "BERT base baseline",
    "parameters": "default, 12 epochs",
    "dataset": "raw without duplicates",
    "macro_f1": training_stats[-1]["validation_macro_f1_score"]["f1"],
    "weighted_f1": training_stats[-1]["validation_weighted_f1_score"]["f1"],
}])
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
results = pd.concat([results, new_row], ignore_index=True)

results.to_csv("results/results.csv", index=False)
results.head(10)

|   | model_name | parameters | dataset | macro_f1 | weighted_f1 |
|---|---|---|---|---|---|
| 0 | BERT base baseline | default, 12 epochs | raw without duplicates | 0.402406 | 0.714564 |
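
The two scores in the table average per-class F1 differently: macro F1 gives every class equal weight, while weighted F1 weights each class by its number of true examples. A macro score well below the weighted score is what one would expect when rare classes are predicted poorly, though the class distribution of this dataset is not shown here. The following is a minimal pure-Python sketch on made-up toy labels (not the project's data, and not the project's scoring code) that illustrates the difference.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label present in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1: plain mean over classes. Weighted F1: mean weighted by class support."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(f1)
    weighted = sum(f1[label] * support[label] for label in f1) / len(y_true)
    return macro, weighted


# Toy example: 8 "common" and 2 "rare" items; the classifier always predicts "common".
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1:    {macro:.3f}")     # macro F1:    0.444
print(f"weighted F1: {weighted:.3f}")  # weighted F1: 0.711
```

The rare class contributes an F1 of 0, which halves the macro average but only slightly lowers the support-weighted one.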


## Check system hardware

In [1]:
from lib.bert_pytorch.helper_functions import get_cuda_info, get_torch_info

get_cuda_info()

NVIDIA Graphics Card Driver:  Wed Feb  2 03:59:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 496.76       Driver Version: 496.76       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   51C    P8     6W /  N/A |    911MiB /  8192MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                    
CUDA version:  nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 20

In [2]:
get_torch_info()

Name: torch
Version: 1.7.1+cu110
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\lisandro\desktop\projects\case-lisandro\venv\lib\site-packages
Requires: typing-extensions, numpy
Required-by: torchvision, torchaudio 



In [3]:
from lib.bert_pytorch.helper_functions import get_device

device = get_device()

There are 1 GPU(s) available.
We will use the GPU: NVIDIA GeForce RTX 2070 with Max-Q Design
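
The implementation of `get_device` lives in `lib.bert_pytorch.helper_functions` and is not shown in this notebook. As a rough sketch of what a helper of this kind typically does, the hypothetical `select_device` function below picks the first CUDA device when PyTorch can see one and falls back to the CPU otherwise. The name and the CPU message are assumptions made for illustration.

```python
import torch


def select_device():
    """Hypothetical helper: return a CUDA device if PyTorch sees one, else the CPU."""
    if torch.cuda.is_available():
        print(f"There are {torch.cuda.device_count()} GPU(s) available.")
        print(f"We will use the GPU: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    print("No GPU available, using the CPU instead.")
    return torch.device("cpu")


device = select_device()

# Tensors (and models) have to be moved to the selected device explicitly.
batch = torch.zeros(2, 3).to(device)
print(batch.device.type)
```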
