For reference:
- https://huggingface.co/docs/transformers/training
- https://huggingface.co/dbmdz/bert-base-german-cased
- https://huggingface.co/bert-base-german-cased

## Train Base BERT (no preprocessing of data)

In [3]:
import warnings

import torch
from transformers import AutoModelForSequenceClassification

from lib.bert_pytorch.train import train_model_on_full_train_data, train_model_on_train_data

# Silence all warnings to keep the training log readable.
warnings.filterwarnings("ignore")

MODEL_NAME = "bert-base-german-cased"
# MODEL_NAME = "bert-base-german-dbmdz-cased"
DATA_PATH = "data/preprocessed_data.csv"
BATCH_SIZE = 32
NUM_EPOCHS = 20
SEED = 42
TRAIN_MODEL_ON_FULL_TRAINING_DATA = False
SAVE_NEW_MODEL = True
USE_PRETRAINED_MODEL = False
DOWNLOAD_WEIGHTS = False

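if USE_PRETRAINED_MODEL:
    if DOWNLOAD_WEIGHTS:
        import gdown

        print("Downloading weights")
        # Google Drive file ID (left empty here); renamed from `id` to avoid shadowing the builtin.
        file_id = ""
        output = "pretrained_model/model_state_dict.pt"
        gdown.download(id=file_id, output=output, quiet=False)

    # Rebuild the classification head with 24 labels, then load the fine-tuned weights.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=24,
        output_attentions=False,
        output_hidden_states=False,
    )
    # Note: this path differs from the save location used further down
    # ("pretrained_models/bert_pytorch/"); adjust it to load a locally trained model.
    model.load_state_dict(torch.load("pretrained_model/model_state_dict.pt"))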

else:
    if TRAIN_MODEL_ON_FULL_TRAINING_DATA:
        model, training_stats = train_model_on_full_train_data(DATA_PATH, MODEL_NAME, BATCH_SIZE, NUM_EPOCHS, SEED)
    else:
        model, training_stats = train_model_on_train_data(DATA_PATH, MODEL_NAME, BATCH_SIZE, NUM_EPOCHS, SEED)
    print("\nTraining results: ", training_stats)

    if SAVE_NEW_MODEL:
        # Save the weights only (portable), plus a pickle of the whole model object.
        torch.save(model.state_dict(), "pretrained_models/bert_pytorch/model_state_dict.pt")
        torch.save(model, "pretrained_models/bert_pytorch/entire_model.pt")

Some weights of the model checkpoint at bert-base-german-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoi

There are 1 GPU(s) available.
We will use the GPU: NVIDIA GeForce RTX 2070 with Max-Q Design
EPOCH 1/20


Avg training loss:    2.3768566325306892
Avg validation loss:  1.8856872618198395
F1 validation score:  {'f1': 0.3846322078800518}

EPOCH 2/20


Avg training loss:    1.6215808428823948
Avg validation loss:  1.510185420513153
F1 validation score:  {'f1': 0.5208472769638829}

EPOCH 3/20


Avg training loss:    1.1745678875595331
Avg validation loss:  1.299536556005478
F1 validation score:  {'f1': 0.5949453043891153}

EPOCH 4/20


Avg training loss:    0.8900032313540578
Avg validation loss:  1.2293509244918823
F1 validation score:  {'f1': 0.6178768983628741}

EPOCH 5/20


Avg training loss:    0.7025119243189692
Avg validation loss:  1.1901220679283142
F1 validation score:  {'f1': 0.6605654031023939}

EPOCH 6/20


Avg training loss:    0.5519289677031338
Avg validation loss:  1.1641127169132233
F1 validation score:  {'f1': 0.6586609754751348}

EPOCH 7/20


Avg training loss:    0.44
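
The per-epoch log above can be used to decide which epoch to keep and whether further training is still paying off. The following is a minimal sketch, not part of `lib.bert_pytorch.train`: the helper names `best_epoch` and `should_stop` and the `patience` parameter are hypothetical, and `history` simply copies the six complete epochs printed above, rounded to four decimals.

```python
from typing import Dict, List

History = List[Dict[str, float]]


def best_epoch(history: History, metric: str = "f1") -> int:
    """Return the 1-based epoch with the highest value of `metric`."""
    best_index = max(range(len(history)), key=lambda i: history[i][metric])
    return best_index + 1


def should_stop(history: History, metric: str = "val_loss", patience: int = 3) -> bool:
    """Return True if `metric` did not decrease during the last `patience` epochs."""
    if len(history) <= patience:
        return False  # not enough epochs to judge yet
    best_before = min(h[metric] for h in history[:-patience])
    recent_best = min(h[metric] for h in history[-patience:])
    return recent_best >= best_before


# Values copied (rounded) from the training log above; epoch 7 is cut off there.
history = [
    {"train_loss": 2.3769, "val_loss": 1.8857, "f1": 0.3846},
    {"train_loss": 1.6216, "val_loss": 1.5102, "f1": 0.5208},
    {"train_loss": 1.1746, "val_loss": 1.2995, "f1": 0.5949},
    {"train_loss": 0.8900, "val_loss": 1.2294, "f1": 0.6179},
    {"train_loss": 0.7025, "val_loss": 1.1901, "f1": 0.6606},
    {"train_loss": 0.5519, "val_loss": 1.1641, "f1": 0.6587},
]

print("Best epoch by validation F1:", best_epoch(history))
print("Stop early after epoch 6?", should_stop(history))
```

Over these six epochs the validation loss is still falling while the training loss drops much faster, so a patience-based rule like this one would keep training. Whether that holds for epochs 7 to 20 cannot be read from the truncated log.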

In [7]:
import pandas as pd

tmp = pd.read_csv("results/results.csv")
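# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
new_row = pd.DataFrame([{
    "model_name": "BERT base baseline",
    "parameters": "default",
    "dataset": "raw without duplicates",
    "accuracy": None,
    "f1": 0.7372518337120106,
}])
tmp = pd.concat([tmp, new_row], ignore_index=True)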
tmp.to_csv("results/results.csv", index=False)
tmp.head(10)

| | model_name | parameters | dataset | accuracy | f1 |
|---|---|---|---|---|---|
| 0 | BERT base baseline | default | raw without duplicates | | 0.737252 |
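
Re-running the cell above adds the same row to `results/results.csv` again. One way to guard against that is sketched below using only the standard library. This is an assumption-laden sketch, not the notebook's method: `append_result` is a hypothetical helper, it assumes the file has exactly the five columns shown above, and it treats a run as a duplicate when `model_name`, `parameters`, and `dataset` all match.

```python
import csv
import os

FIELDS = ["model_name", "parameters", "dataset", "accuracy", "f1"]
KEY_FIELDS = ("model_name", "parameters", "dataset")


def append_result(path, row):
    """Append `row` to the CSV at `path` unless the same run is already logged.

    Returns True if the row was written and False if it was skipped.
    """
    key = tuple(str(row[name]) for name in KEY_FIELDS)
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0

    if not needs_header:
        with open(path, newline="", encoding="utf-8") as f:
            for existing in csv.DictReader(f):
                if tuple(existing.get(name) for name in KEY_FIELDS) == key:
                    return False  # this run is already in the file

    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)  # a None value is written as an empty field
    return True
```

Calling `append_result("results/results.csv", {...})` with the baseline row would write it once and return `False` on every later call, so the cell becomes safe to re-run.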


## Check system hardware

In [1]:
from lib.bert_pytorch.helper_functions import get_cuda_info, get_torch_info

get_cuda_info()

NVIDIA Graphics Card Driver:  Wed Feb  2 03:59:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 496.76       Driver Version: 496.76       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   51C    P8     6W /  N/A |    911MiB /  8192MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                    
CUDA version:  nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 20

In [2]:
get_torch_info()

Name: torch
Version: 1.7.1+cu110
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\lisandro\desktop\projects\case-lisandro\venv\lib\site-packages
Requires: typing-extensions, numpy
Required-by: torchvision, torchaudio 



In [3]:
from lib.bert_pytorch.helper_functions import get_device

device = get_device()

There are 1 GPU(s) available.
We will use the GPU: NVIDIA GeForce RTX 2070 with Max-Q Design
