# Deep Learning - Exercise 9

This lecture focuses on transformer models, using the Hugging Face `transformers` wrapper for TensorFlow 2.

The lecture is based on the [official Hugging Face tutorials](https://huggingface.co/transformers/v4.2.2/notebooks.html).

[Open in Google colab](https://colab.research.google.com/github/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/dl_09.ipynb)
[Download from Github](https://github.com/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/dl_09.ipynb)

##### Remember to set **GPU** runtime in Colab!

In [None]:
! pip install transformers datasets huggingface_hub evaluate

In [1]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np 
import pandas as pd
import seaborn as sns
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow import string as tf_string
from tensorflow.keras.layers import TextVectorization
from tensorflow.keras.layers import LSTM, GRU, Bidirectional

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
from sklearn.preprocessing import normalize
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_distances
import scipy
import itertools
import string
import re
import tqdm
import io

tf.version.VERSION



'2.15.0'

In [2]:
import transformers
from datasets import load_dataset, Dataset
from evaluate import load
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, create_optimizer, AutoModelForSequenceClassification
from transformers.keras_callbacks import KerasMetricCallback

print(transformers.__version__)

4.40.1


In [3]:
SEED = 13

# 📒 What is the main idea behind transformer models? 

## The good news is that you already know most of the things from the Attention-focused lecture 🙂

* 💡 The main idea behind the transformer architecture is to use **self-attention mechanisms** to capture the relationships between different words in a sentence
* Self-attention allows the model to focus on different parts of the input sequence when processing each word in the sequence
    * This allows the model to take into account the context and dependencies between different words in the sequence, which is important for many NLP tasks
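The self-attention idea described above can be sketched in a few lines of NumPy. This is a minimal single-head sketch under stated assumptions: the toy sizes, the random projection matrices `w_q`, `w_k`, `w_v`, and the helper names are made up for illustration and are not taken from any pretrained model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Row i holds the similarity of token i to every token in the sequence
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted mix of all value vectors
    return weights @ v, weights

rng = np.random.default_rng(13)
seq_len, d_model, d_head = 5, 8, 4   # toy sizes, chosen only for illustration
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

context, weights = self_attention(x, w_q, w_k, w_v)
print(context.shape)   # (5, 4)
```

Every row of `weights` sums to 1, so each token's new representation is a convex combination of the value vectors of all tokens in the sentence.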

![att](https://github.com/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/images/dl_008_meme_02.png?raw=true)


## 🔎 Is there any difference when you compare it to the RNN model? 🔎
* The main difference between the transformer architecture and recurrent neural networks (RNNs) is the way they handle sequential data
* RNNs process sequential data one element at a time, using hidden states to capture information about the previous elements in the sequence
    * In contrast, the transformer architecture processes the entire sequence at once, using self-attention mechanisms to capture dependencies between different elements in the sequence
* 💡 The transformer architecture is **easier to parallelize**.
    * 📌 Because the transformer architecture processes the entire sequence at once, it can be trained more efficiently on parallel hardware like GPUs
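The contrast can be illustrated with a toy NumPy sketch. The weights `w_x` and `w_h` and all sizes are hypothetical, and the attention part omits the learned projections for brevity; the point is only that the RNN needs a loop over time while the attention-style mixing is a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4                      # toy sizes, for illustration only
x = rng.normal(size=(seq_len, d))      # one embedding per token
w_x = rng.normal(size=(d, d)) * 0.5    # hypothetical input weights
w_h = rng.normal(size=(d, d)) * 0.5    # hypothetical recurrent weights

# RNN: step t needs the hidden state of step t-1, so the loop is inherently sequential
h = np.zeros(d)
rnn_states = []
for t in range(seq_len):
    h = np.tanh(x[t] @ w_x + h @ w_h)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)

# Attention-style mixing: all token pairs are scored in a single matrix product
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
mixed = weights @ x

print(rnn_states.shape, mixed.shape)   # (6, 4) (6, 4)
```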


# ⚡ We will use a BERT-family model for a sample classification task from the GLUE Benchmark
* We will test the model on the CoLA dataset, a classification task in which we label every sentence as grammatically correct or not

### 💡 You can use any of these datasets in this notebook for your experiments

* [CoLA](https://nyu-mll.github.io/CoLA/) (Corpus of Linguistic Acceptability) Determine if a sentence is grammatically correct or not.
* [MNLI](https://arxiv.org/abs/1704.05426) (Multi-Genre Natural Language Inference) Determine if a sentence entails, contradicts or is unrelated to a given hypothesis. (This dataset has two versions, one with the validation and test set coming from the same distribution, another called mismatched where the validation and test use out-of-domain data.)
* [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Microsoft Research Paraphrase Corpus) Determine if two sentences are paraphrases of one another or not.
* [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference) Determine if the answer to a question is in the second sentence or not. (This dataset is built from the SQuAD dataset.)
* [QQP](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) (Quora Question Pairs) Determine if two questions are semantically equivalent or not.
* [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment) Determine if a sentence entails a given hypothesis or not.
* [SST-2](https://nlp.stanford.edu/sentiment/index.html) (Stanford Sentiment Treebank) Determine if the sentence has a positive or negative sentiment.
* [STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) (Semantic Textual Similarity Benchmark) Determine the similarity of two sentences with a score from 0 to 5.
* [WNLI](https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference) Determine if a sentence with an ambiguous pronoun entails the same sentence with this pronoun replaced or not. (This dataset is built from the Winograd Schema Challenge dataset.)

## Select any task from the list below
* 💡 The **batch_size** should be set according to your GPU memory

###  We will use `distilbert-base-uncased` model
* 💡 The model is primarily aimed at being fine-tuned on tasks that use the whole sentence to make decisions, such as sequence classification
    * This model is uncased: it does not make a difference between english and English
* DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher
    * This means it was pretrained on raw texts only, with no humans labelling them in any way; an automatic process generated the inputs and labels from those texts using the BERT base model
    * 💡 It "mimics" the original BERT outputs using a smaller, less demanding, model
* 📌 You can check https://huggingface.co/distilbert/distilbert-base-uncased for more details
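The "mimicking" can be sketched as a soft-target loss: the student is penalized when its output distribution differs from the teacher's. This is a minimal sketch, not the actual DistilBERT training code; the `temperature` value and the toy logits are assumptions, and the real pretraining objective combines this kind of term with other losses.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

# Made-up logits over three hypothetical vocabulary entries
teacher = [[4.0, 1.0, 0.5]]
student_close = [[3.8, 1.1, 0.4]]   # almost the same ranking and confidence
student_far = [[0.5, 1.0, 4.0]]     # prefers a completely different token

loss_close = distillation_loss(student_close, teacher)
loss_far = distillation_loss(student_far, teacher)
print(loss_close < loss_far)   # True
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is what pushes the smaller model toward the behaviour of the larger one.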


In [4]:
GLUE_TASKS = [
    "cola",
    "mnli",
    "mrpc",
    "qnli",
    "qqp",
    "rte",
    "sst2",
    "stsb",
    "wnli",
]

task = "cola"
model_checkpoint = "distilbert-base-uncased"
batch_size = 16

### We will use the `datasets` library to download the data and the `evaluate` library to get the metric we need to use for evaluation (to compare our model to the benchmark)
* This can be easily done with the `load_dataset` function from `datasets` and the `load` function from `evaluate`

In [5]:
dataset = load_dataset("glue", task)
metric = load("glue", task)

Downloading readme:   0%|          | 0.00/35.3k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/251k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/37.6k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/37.7k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/8551 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1043 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1063 [00:00<?, ? examples/s]

Downloading builder script:   0%|          | 0.00/5.75k [00:00<?, ?B/s]

* The `dataset` object itself is a [DatasetDict](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key each for the training, validation and test sets

In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

## We should always take a look at the example data

In [7]:
dataset["train"][0]

{'sentence': "Our friends won't buy this analysis, let alone the next one we propose.",
 'label': 1,
 'idx': 0}

In [8]:
dataset["train"][:5]

{'sentence': ["Our friends won't buy this analysis, let alone the next one we propose.",
  "One more pseudo generalization and I'm giving up.",
  "One more pseudo generalization or I'm giving up.",
  'The more we study verbs, the crazier they get.',
  'Day by day the facts are getting murkier.'],
 'label': [1, 1, 1, 1, 1],
 'idx': [0, 1, 2, 3, 4]}

In [9]:
dataset["test"][0]

{'sentence': 'Bill whistled past the house.', 'label': -1, 'idx': 0}

In [10]:
dataset["validation"][0]

{'sentence': 'The sailors rode the breeze clear of the rocks.',
 'label': 1,
 'idx': 0}

### The `metric` is an `EvaluationModule` from the `evaluate` library, the successor of [datasets.Metric](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric)
* 💡 It simplifies the process of model evaluation so we don't have to use raw scikit-learn functions

In [11]:
metric

EvaluationModule(name: "glue", module_type: "metric", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = evaluate.load('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=ref

## You can call its `compute` method directly with your predictions and labels, and it will return a dictionary with the metric value(s)
* 💡 The metric is chosen by the task name we specified, so we use the right metric for the benchmark
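For CoLA the returned metric is the Matthews correlation coefficient, as the output below shows. A minimal pure-Python sketch of how this coefficient is computed for binary labels follows; the helper name `matthews_corrcoef_binary` and the toy labels are made up for illustration.

```python
import math

def matthews_corrcoef_binary(predictions, references):
    """Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when one class is never predicted or present
    return (tp * tn - fp * fn) / denominator

toy_references = [1, 1, 0, 0, 1, 0]
toy_predictions = [1, 0, 0, 0, 1, 1]

print(matthews_corrcoef_binary(toy_references, toy_references))            # 1.0
print(round(matthews_corrcoef_binary(toy_predictions, toy_references), 3)) # 0.333
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect), which is why random predictions score close to zero.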

In [12]:
fake_preds = np.random.randint(0, 2, size=(64,))
fake_labels = np.random.randint(0, 2, size=(64,))
metric.compute(predictions=fake_preds, references=fake_labels)

{'matthews_correlation': -0.1252448582170299}

## Preprocessing the data
* Before we can feed those texts to our model, we need to preprocess them. This is done by a Transformers `Tokenizer`, which (as the name indicates) tokenizes the inputs, converts the tokens to their corresponding IDs in the pretrained vocabulary, puts them in the format the model expects, and generates the other inputs that the model requires

* 💡 To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:
    * We get a tokenizer that corresponds to the model architecture we want to use
    * We download the vocabulary used when pretraining this specific checkpoint

* That vocabulary will be cached, so it's not downloaded again the next time we run the cell

In [13]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

## Nothing new here - just a regular word2id mapping 🤗

In [14]:
tokenizer("Hello, this is a sentence!", "And this sentence goes with it.")

{'input_ids': [101, 7592, 1010, 2023, 2003, 1037, 6251, 999, 102, 1998, 2023, 6251, 3632, 2007, 2009, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
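In the output above, the IDs 101 and 102 are the special `[CLS]` and `[SEP]` tokens that frame and separate the two sentences. A toy sketch of the same idea follows; the vocabulary `toy_vocab` and the function `toy_encode` are hypothetical, and splitting on whitespace is a simplification, since the real tokenizer also splits punctuation and breaks rare words into subword pieces.

```python
# Hypothetical toy vocabulary; the real checkpoint ships a far larger one
toy_vocab = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "hello": 4, "this": 5, "is": 6, "a": 7, "sentence": 8,
}

def toy_encode(first, second=None):
    """Lowercase, split on whitespace, add special tokens, map tokens to IDs."""
    tokens = ["[CLS]"] + first.lower().split() + ["[SEP]"]
    if second is not None:
        tokens += second.lower().split() + ["[SEP]"]
    input_ids = [toy_vocab.get(token, toy_vocab["[UNK]"]) for token in tokens]
    # Nothing is padded here, so every position is a real token
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}

encoded = toy_encode("Hello this is a sentence", "this is unknown")
print(encoded["input_ids"])   # [2, 4, 5, 6, 7, 8, 3, 5, 6, 1, 3]
```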

### To preprocess our dataset, we will need the names of the columns containing the sentence(s)
* The following dictionary maps each task to its column names
    * 💡 Do you remember that sentence, label, idx dict?

In [15]:
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mnli-mm": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}

sentence1_key, sentence2_key = task_to_keys[task]
if sentence2_key is None:
    print(f"Sentence: {dataset['train'][0][sentence1_key]}")
else:
    print(f"Sentence 1: {dataset['train'][0][sentence1_key]}")
    print(f"Sentence 2: {dataset['train'][0][sentence2_key]}")

Sentence: Our friends won't buy this analysis, let alone the next one we propose.


### We just feed them to the tokenizer with the argument `truncation=True`
* 💡 This ensures that any input longer than the selected model can handle is truncated to the maximum length accepted by the model
* Padding is not applied at this point; each batch is padded to its longest sequence later, when the TF datasets are created with the tokenizer
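Truncation and padding to the longest sequence can be sketched on plain Python lists. This is a simplified illustration with a hypothetical helper `truncate_and_pad`; among other things, the real tokenizer keeps the closing special token when it truncates, which this sketch does not.

```python
def truncate_and_pad(batch_ids, max_length, pad_id=0):
    """Cut every sequence to `max_length`, then pad all to the longest one."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding that the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

toy_batch = [
    [101, 7592, 102],
    [101, 2023, 2003, 1037, 6251, 102],
    [101, 1, 2, 3, 4, 5, 6, 7, 102],
]
padded = truncate_and_pad(toy_batch, max_length=6)
print(padded["input_ids"][0])        # [101, 7592, 102, 0, 0, 0]
print(padded["attention_mask"][0])   # [1, 1, 1, 0, 0, 0]
```

After this step all rows have the same length, so the batch can be stored as a single rectangular array.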

In [16]:
def preprocess_function(examples):
    if sentence2_key is None:
        return tokenizer(examples[sentence1_key], truncation=True)
    return tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)

## Using this code we can tokenize the sentences in our dataset
* To apply this function on all the sentences in our dataset, we just use the map method of our dataset object we created earlier
* 💡 This will apply the function on all the elements of all the splits in dataset, so our training, validation and testing data will be preprocessed in one single command
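Conceptually, a batched map hands the function a dictionary of column slices and appends whatever columns the function returns. The sketch below illustrates that idea on plain dictionaries of lists; `batched_map` and `count_words` are hypothetical helpers and do not reproduce how the `datasets` library is implemented.

```python
def batched_map(columns, fn, batch_size=2):
    """Apply `fn` to slices of a dict of equally long lists and merge the results."""
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, num_rows, batch_size):
        # Each batch has the same shape as `dataset["train"][:5]`: column name -> list
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    return {**columns, **new_columns}

def count_words(batch):
    """Stand-in for the tokenizer call: returns one new column per batch."""
    return {"num_words": [len(sentence.split()) for sentence in batch["sentence"]]}

toy_split = {
    "sentence": [
        "Day by day the facts are getting murkier.",
        "Bill whistled past the house.",
        "The more we study verbs, the crazier they get.",
    ],
    "label": [1, 1, 1],
}
mapped = batched_map(toy_split, count_words)
print(mapped["num_words"])   # [8, 5, 9]
```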

In [18]:
preprocess_function(dataset["train"][:5])

{'input_ids': [[101, 2256, 2814, 2180, 1005, 1056, 4965, 2023, 4106, 1010, 2292, 2894, 1996, 2279, 2028, 2057, 16599, 1012, 102], [101, 2028, 2062, 18404, 2236, 3989, 1998, 1045, 1005, 1049, 3228, 2039, 1012, 102], [101, 2028, 2062, 18404, 2236, 3989, 2030, 1045, 1005, 1049, 3228, 2039, 1012, 102], [101, 1996, 2062, 2057, 2817, 16025, 1010, 1996, 13675, 16103, 2121, 2027, 2131, 1012, 102], [101, 2154, 2011, 2154, 1996, 8866, 2024, 2893, 14163, 8024, 3771, 1012, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}

In [19]:
pre_tokenizer_columns = set(dataset["train"].features)
encoded_dataset = dataset.map(preprocess_function, batched=True)
tokenizer_columns = list(set(encoded_dataset["train"].features) - pre_tokenizer_columns)
print("Columns added by tokenizer:", tokenizer_columns)

Map:   0%|          | 0/8551 [00:00<?, ? examples/s]

Map:   0%|          | 0/1043 [00:00<?, ? examples/s]

Map:   0%|          | 0/1063 [00:00<?, ? examples/s]

Columns added by tokenizer: ['input_ids', 'attention_mask']


## 🚀 Fine-tuning the model
* Now that our data is ready, we can download the pretrained model and fine-tune it
    * Since all our tasks are about sentence classification, we use the `TFAutoModelForSequenceClassification` class
* 💡 The only thing we have to specify is the number of labels for our dataset

In [20]:
# STS-B is a regression task, MNLI has three classes, all other tasks are binary
if task == "stsb":
    num_labels = 1
elif task.startswith("mnli"):
    num_labels = 3
else:
    num_labels = 2
    
# This next little bit is optional, but will give us cleaner label outputs later
# If you're using a task other than CoLA, you will probably need to change these
# to match the label names for your task!
id2label = {0: "Invalid", 1: "Valid"}
label2id = {val: key for key, val in id2label.items()}

model = TFAutoModelForSequenceClassification.from_pretrained(
    model_checkpoint, num_labels=num_labels, id2label=id2label, label2id=label2id
)

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]


## ⚡ One of the last steps is to create the TF datasets which will feed the data into the model

In [21]:
validation_key = (
    "validation_mismatched"
    if task == "mnli-mm"
    else "validation_matched"
    if task == "mnli"
    else "validation"
)

# Use the batch_size defined earlier instead of a hard-coded value
tf_train_dataset = model.prepare_tf_dataset(
    encoded_dataset["train"],
    shuffle=True,
    batch_size=batch_size,
    tokenizer=tokenizer
)

tf_validation_dataset = model.prepare_tf_dataset(
    encoded_dataset[validation_key],
    shuffle=False,
    batch_size=batch_size,
    tokenizer=tokenizer,
)

## Compile the model and specify the optimizer

In [23]:
num_epochs = 3
batches_per_epoch = len(encoded_dataset["train"]) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)

optimizer, schedule = create_optimizer(
    init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps
)
model.compile(optimizer=optimizer)
model.summary()

Model: "tf_distil_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 distilbert (TFDistilBertMa  multiple                  66362880  
 inLayer)                                                        
                                                                 
 pre_classifier (Dense)      multiple                  590592    
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
 dropout_19 (Dropout)        multiple                  0 (unused)
                                                                 
Total params: 66955010 (255.41 MB)
Trainable params: 66955010 (255.41 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
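The parameter counts of the two dense layers in the summary above can be checked by hand. The sketch assumes a hidden size of 768 for this checkpoint; the backbone count is copied from the summary rather than derived.

```python
hidden_size = 768   # assumption: hidden width of the distilbert-base checkpoints
num_labels = 2      # CoLA is a binary task

def dense_params(n_inputs, n_outputs):
    """A dense layer has one weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

backbone = 66_362_880                                    # copied from the summary above
pre_classifier = dense_params(hidden_size, hidden_size)
classifier = dense_params(hidden_size, num_labels)
total = backbone + pre_classifier + classifier

print(pre_classifier, classifier, total)   # 590592 1538 66955010
```

Only the last 1538 parameters depend on the number of labels, so switching to a three-class task such as MNLI changes the classifier head and nothing else.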


## The last thing we need to define is how to compute the metrics from the predictions 
* We need to define a function for this, which will just use the metric we loaded earlier
    * 💡 The only preprocessing we have to do is to take the argmax of our predicted logits

* In addition, let's wrap this metric computation function in a `KerasMetricCallback`. 
    * 💡 This callback will compute the metric on the validation set each epoch, including printing it and logging it for other callbacks like `EarlyStopping`.

In [24]:
def compute_metrics(eval_predictions):
    predictions, labels = eval_predictions
    if task != "stsb":
        predictions = np.argmax(predictions, axis=1)
    else:
        predictions = predictions[:, 0]
    return metric.compute(predictions=predictions, references=labels)


metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_dataset)

## 🚀 We can finally fit the model!
* 💡 Make sure that you pass the TF datasets, and not the original ones! 

In [25]:
callbacks = [metric_callback]

model.fit(
    tf_train_dataset,
    validation_data=tf_validation_dataset,
    epochs=num_epochs,
    callbacks=callbacks,
)

Epoch 1/3



Epoch 2/3
Epoch 3/3


<keras.src.callbacks.History at 0x7ff83646bf50>

# Now we can do inference using our own inputs
* Now, let's make up some sentences and see if the model can classify them properly!
* The first sentence is valid English, but the second one contains a grammatical mistake.

In [None]:
sentences = [
    "The judge told the jurors to think carefully.",
    "The judge told that the jurors to think carefully."
]

## To feed them into our model, we'll need to tokenize them and then get our model's predictions

In [None]:
tokenized = tokenizer(sentences, return_tensors="np", padding="longest")

outputs = model(tokenized).logits

classifications = np.argmax(outputs, axis=1)
print(classifications)

In [None]:
classifications = [model.config.id2label[output] for output in classifications]
print(classifications)
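Besides the predicted class, it is often useful to see how confident the model is. A minimal sketch that turns logits into softmax probabilities and label names follows; `example_logits` are made-up numbers standing in for the model output, and `describe` is a hypothetical helper.

```python
import numpy as np

id2label = {0: "Invalid", 1: "Valid"}   # same mapping as passed to the model earlier

def describe(logits):
    """Return (label name, probability of that label) for every row of logits."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    class_ids = probs.argmax(axis=1)
    return [(id2label[int(i)], float(p[i])) for i, p in zip(class_ids, probs)]

# Made-up logits, one row per sentence
example_logits = [[-1.2, 2.3], [1.5, -0.4]]
described = describe(example_logits)
for label, confidence in described:
    print(f"{label}: {confidence:.2f}")
```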

## 💡 But how can we use such models in a more standard setup, where the data sits in a Pandas DataFrame?
* Let's go through such a use case together

In [None]:
dataset = load_dataset("imdb")

In [None]:
dataset

In [None]:
df_train = pd.DataFrame({'text': dataset['train']['text'], 'labels': dataset['train']['label']})
df_test = pd.DataFrame({'text': dataset['test']['text'], 'labels': dataset['test']['label']})

In [None]:
df_train.head()

In [None]:
df_train.shape, df_test.shape

In [None]:
df_train.labels.value_counts()

## Ok, Pandas seems ready 🙂
* The easiest way is to wrap the Pandas DataFrame in an HF `Dataset` object and proceed with the HF API

In [None]:
hf_df_train = Dataset.from_pandas(df_train)
hf_df_test = Dataset.from_pandas(df_test)

## We can split the data into train and valid subsets

In [None]:
# Pass the seed defined at the top of the notebook so the split is reproducible
ds = hf_df_train.train_test_split(test_size=0.2, shuffle=True, seed=SEED)

In [None]:
ds

## Rename the held-out split to `valid` and restore the original test set

In [None]:
ds_tst = ds['test']
ds['valid'] = ds_tst
ds['test'] = hf_df_test

In [None]:
ds

In [None]:
ds['train'][0]

In [None]:
ds['valid'][0]

In [None]:
ds['test'][0]

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = ds.map(tokenize_function, batched=True)

# ⚠ BEWARE: The label column must be named **labels** because the model expects this name!

In [None]:
tokenized_datasets

In [None]:
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-cased", num_labels=2)

In [None]:
batch_size = 16
tf_train_dataset = model.prepare_tf_dataset(
    tokenized_datasets["train"],
    shuffle=True,
    batch_size=batch_size,
    tokenizer=tokenizer
)

tf_test_dataset = model.prepare_tf_dataset(
    tokenized_datasets['test'],
    shuffle=False,
    batch_size=batch_size,
    tokenizer=tokenizer
)

tf_valid_dataset = model.prepare_tf_dataset(
    tokenized_datasets['valid'],
    shuffle=False,
    batch_size=batch_size,
    tokenizer=tokenizer
)

In [None]:
num_epochs = 3
batches_per_epoch = len(tokenized_datasets["train"]) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)

optimizer, schedule = create_optimizer(
    init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps
)
model.compile(optimizer=optimizer)
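The step count feeds a learning-rate schedule. The sketch below assumes that, with `num_warmup_steps=0` and default settings, the schedule decays linearly from `init_lr` to zero over `total_train_steps`; the function `linear_decay_lr` is a hypothetical stand-in for that behaviour, and the example count assumes 80 % of the 25,000 IMDb training reviews.

```python
def linear_decay_lr(step, init_lr, total_steps, end_lr=0.0):
    """Learning rate after `step` updates under a linear decay (assumed schedule)."""
    progress = min(step, total_steps) / total_steps
    return (init_lr - end_lr) * (1.0 - progress) + end_lr

num_train_examples = 20_000   # assumption: 80 % of the 25,000 IMDb training reviews
batch_size = 16
num_epochs = 3

batches_per_epoch = num_train_examples // batch_size
total_train_steps = batches_per_epoch * num_epochs
print(batches_per_epoch, total_train_steps)   # 1250 3750

for step in (0, total_train_steps // 2, total_train_steps):
    print(step, linear_decay_lr(step, init_lr=2e-5, total_steps=total_train_steps))
```

If `total_train_steps` is too small, the learning rate reaches zero before training ends; if it is too large, the rate never decays fully, which is why it is derived from the dataset size.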

In [None]:
model.fit(
    tf_train_dataset,
    validation_data=tf_valid_dataset,
    epochs=num_epochs,
)

In [None]:
y_test = np.array(ds['test']['labels'])

In [None]:
y_test

In [None]:
y_pred = model.predict(tf_test_dataset)

In [None]:
y_pred.logits.shape

In [None]:
y_pred_f = np.argmax(y_pred.logits, axis=1)

In [None]:
y_pred_f

## Now we can compute the accuracy score as usual

In [None]:
accuracy_score(y_true=y_test, y_pred=y_pred_f)
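What the accuracy score measures can be sketched by hand, together with the confusion counts behind it. The helpers `accuracy` and `confusion_counts` and the toy labels are made up for illustration; in the notebook itself the scikit-learn functions imported at the top do the same job.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_counts(y_true, y_pred):
    """2x2 matrix [[TN, FP], [FN, TP]] for labels in {0, 1}."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1   # row = true label, column = predicted label
    return matrix

toy_true = [0, 0, 1, 1, 1, 0, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(accuracy(toy_true, toy_pred))           # 0.75
print(confusion_counts(toy_true, toy_pred))   # [[3, 1], [1, 3]]
```

The IMDb labels are balanced, so accuracy is a reasonable headline number here; for imbalanced data the confusion counts show which class the errors come from.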

![dude](https://github.com/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/images/dl_008_meme_01.png?raw=true)