# Genomic sequence classification with deep learning

## Outline:
1. Objectives
2. What is *deep learning*?
3. Why the sequence classification problem is important?
4. Practice
    * Dataset
    * Models
    * Training
    * Evaluation
5. Conclusions

# Objectives

1. Learn the basic theory and practice of deep learning
2. Understand the basic deep learning workflow
3. Deploy a model for genomic sequence classification
    * With helpers
    * Manually

# What is *Deep Learning*?
Deep Learning is a subset of machine learning techniques that uses **artificial neural network**-based models (ANN). What makes it **deep** is the presence of many transformation *layers* within the models. Figure 1 shows a Venn diagram of the organization between artificial intelligence, machine and deep learning. Figure 2 shows a representation of a Multilayer Perceptron (MLP), the most basic architecture in deep learning, which mimics the biological neural conections.

Deep learning models are able to learn from raw data. This is one of the main differences against traditional learning pipelines. With these kind of models you can assemble a learning system that tunes itself automatically rather than fixing and updating each individial component one by one. It replaces some of the labor-intensive processes needed for other methods, like field-specific data preprocessing and/or manual feature extraction. Deep learning models can learn and process these features in an automated fashion, generate accurate predictions, and be fine-tuned for specific applications when an available model exists.

One of the most common and persistent disadvantages of applying deep learning methods in your work is the large amount of data needed to train the model. To capture the features and generalize the phenomena in your studies, a substantial amount of data (sometimes labeled) must be available so the model can fit them and produce accurate predictions. However, with the ever-increasing availability of graphical processing units (GPUs), the massive amounts of data generated in clinical and biological scenarios, and the possibility of fine-tuning existing models, implementing a deep learning architecture for your specific applications is becoming increasingly simple.

<fig>
<img    src="images/ENG_IA_ML_DL.png"
        width=600
        height=600>
<figcaption>Fig. 1: Venn diagram displaying the organization between AI, machine, and deep learning.
</figure>

<fig>
<img    src="images/MLP.png"
        width=600
        height=600>
<figcaption>Fig. 2: Grapphical representation of an MLP.
</figure>

In [28]:
# This cell of code is used to import the necessary libraries for the notebook
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torchtext.data.utils import get_tokenizer
import torch.optim as optim

from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from tqdm.auto import tqdm

from genomic_benchmarks.data_check import list_datasets, info, is_downloaded
from genomic_benchmarks.loc2seq import download_dataset
from genomic_benchmarks.dataset_getters.pytorch_datasets import HumanEnhancersCohn

We will be working with the **Genomic Benchmarks** datasets, a set of benchmarks for classification of genomic sequences to test models' capabilities.

In the next code cell we list the available datasets in the **Genomic Benchmarks** module:

In [2]:
list_datasets()

['demo_coding_vs_intergenomic_seqs',
 'human_enhancers_ensembl',
 'human_ocr_ensembl',
 'human_enhancers_cohn',
 'human_nontata_promoters',
 'dummy_mouse_enhancers_ensembl',
 'drosophila_enhancers_stark',
 'demo_human_or_worm',
 'human_ensembl_regulatory']

For illustrative pourpuses you will work on the `human_enhancers_cohn` which contains multiple genomic sequences that are or are not enhancers for Cohn disease. In machine learning verbose, you will work on a **binary classification** problem. We can display some information of this dataset as follows:

In [3]:
info("human_enhancers_cohn", version=0)

Dataset `human_enhancers_cohn` has 2 classes: negative, positive.

All lengths of genomic intervals equals 500.

Totally 27791 sequences have been found, 20843 for training and 6948 for testing.


Unnamed: 0,train,test
negative,10422,3474
positive,10421,3474


The `genomic_benchmarks` module offers multiple data handlers and helpers to load, show and give you an idea of how each of its datasets are composed.
In the next code cell we use the `HumanEnhancersCohn` function to download and assign the dataset into two variables, `train_dataset` and `test_dataset`, respectively.

In [4]:
# Load the dataset and split it into training and test sets
train_dataset = HumanEnhancersCohn(split="train", version=0)
test_dataset = HumanEnhancersCohn(split="test", version=0)

Just to make sure we imported the correct dataset we can print the lengths of each set and check if the numbers match the ones shown above.

In [5]:
# Print the lengths of each set check if they match the info we saw previously
print(f"Length of the training dataset: {len(train_dataset)}. Length of the test dataset: {len(test_dataset)}") 

Length of the training dataset: 20843. Length of the test dataset: 6948


But how this data actually look like? You have only downloaded some data from a library for binary classification up to this point. How can you actually see some samples? Turns out to be very easy to do so! Let's see two examples, one for the `positive` and one for the `negative` dianogses, correspondingly.

In [6]:
# Get a sample from train_dataset with a 1 on the second element of the tuple
positive_sample = next(filter(lambda x: x[1] == 1, train_dataset)) # A 1 indicates that the sample is a positive sample
negative_sample = next(filter(lambda x: x[1] == 0, train_dataset)) # A 0 indicates that the sample is a negative sample
print(f"{positive_sample}\n{negative_sample}")

('AGCAGCAGGTCAACATTTTTGCACTCACAAAATAATTTGGAAAAACTATATACCTCTTTCACATTTTTTTTTTTTTGAGATGGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGTGCAATCTCGGCTCACTGCAAGCTCTGACTCCTGGATTCATGCCATTCGCCTGCCTCAGCCTCCCGAGTAGCTGGGACTATAGGCGCCCGCCACCATGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCCCCGTGTTAGCCAGGACGGTCTCTAGCTCCTGACCTTGCGATCCACCTGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACTGCACCAGGCCCTCTTTCACATTTTTAAGTTTTCTGTTATCTATTTCAAAAGGTGTAGTTAACATATTTTAAATATTAACAATTCAAAAATAAAACTATTATAGAATTTTTAAACAGTATCCAGATAAATTTTTATTATTAATTTCATACTCAA', 1)
('CTGATGAAACCCGGCGAGGTGTGGTCTGCCCTGGAGGACAGCAGCCAGTGTGGGGGGCAGTCCCTTCTCTCTGTCCCAAGGGAGGATACAGCTCCACTGTGGTCACTGGCTCTATGTGAGGGGGTGCATGCATCAGAGACAACAGATGAGAGGGCCCTTCAGTTGGCTTTTCTGCCTCCAGTTCTTTCTGTTCATGAGAGGAAAAGCTACTGGTAGACAGAACAATGTTAAATGTAATAAAAATAAGCAAGTTCCCTGGGTTTATGCAGTGCCAAATGTCAAGATGGTTGTATACAGGAGAAGACGTCCAAGACACGTCTTTTCCGAGTGTCCCAGAGCTCAGAACTCTGTGAGCACTTTGAGCTTCCCCAGACCTCTTTCTTCCCTGGGTGTGAGCCCTGCACAGTGCTCCGAAAAGAGCTGGGGTCCGTAAATACGGATGGCAAACAGCTCACCTGGGTTTCTCACATGGATTTGTTTTCTTGGGGGT

Can you tell, without looking at the corresponding labels, which of these sequences is positive for an enhancer of Cohn's disease and which is not?
Well, you can determine this by some other studies like genome-wide chromatin immunoprecipitation or RNA sequencing. But if you saw these sequences alone, would you be able to tell whether it is an enhacer or not? A deep learning model can do it! At the cost of massive amounts of already-labeled data that were obtained with techniques like the previously mentioned ones (that is, what we just downloaded).

You already have downloaded the dataset, but it isn't ready to be used with a neural network yet. PyTorch neural networks expect their inputs to be arranged in a special data structure known as **tensor**s. If you have experience with Numpy's `ndarray`s getting to know tensors will be pretty easy. These are n-dimensional number arrays optimized for gradient calculus and other operations that run in the background when training a neural network. More information about tensors and specifications on [Pytorch's website](https://pytorch.org).

Now, how does a tensor look like? In the following code cells we initialize random tensors and display them just for illustrative purposes.

In [7]:
my_tensor = torch.tensor([1, 2, 3, 4, 5]) # As you can see, it takes a list an input
print(my_tensor)

tensor([1, 2, 3, 4, 5])


Tensor's data type is an important consideration always. Many errors arise when tensor's data types aren't the same bewteen the inputs and the labels. Specifically, this error arises when calculating the **loss**, a value that measures the difference between the model predictions and the ground-truth (your labels).

In [8]:
my_tensor.dtype # Using the dtype attribute, we can see the data type of the tensor

torch.int64

You can manipulate tensors by adding or removing elements, changing their data types, do any mathematical or arithmetical operation with them, etc.

In [9]:
my_tensor = my_tensor.to(torch.float64) # We can change the data type of the tensor using the .float() method
print(my_tensor.dtype)

torch.float64


In [10]:
tensor_mul = my_tensor * my_tensor # We can perform element-wise multiplication on tensors
print(tensor_mul)

tensor([ 1.,  4.,  9., 16., 25.], dtype=torch.float64)


Now that you know you need these sequences in a specific format, the next question is: how do you transform these sequences into representable tensors? The answer lies in the **Natural Language Processing** (NLP) field. NLP is a sub-field in computer science and AI that uses different kinds of algorithms to enable computers to understand human spoken language. Its applications range from text encoding and generation, voice recognition, and speaking systems. Some examples in daily life are chatbots, comand execution through voice activation (Amazon's Alexa), digital assistants (Bixby, Siri, Google Assistant), etc.

In this case we are working on sequence classification, this means each of these sequences have an inherent "grammar" or structure. As in NLP they analyze sentence grammatics and decompose them by encoding each word you're going to do the same. The process of encoding words and turning them into meaningful numerical representations is called **tokenization**.

Nowadays it is really easy to implement and use a tokenizer to convert our raw genomic sequences into representable numbers. Thanks to HuggingFace's `transformers` and `tokenizers` library, you can download and use pretrained neural networks and their corresponding tokenizers.

In [11]:
# instantiate tokenizer
checkpoint = 'LongSafari/hyenadna-tiny-1k-seqlen-hf' # This is the model's name we are going to use
max_length = 1024 # This variable will represent the maximum length of the input sequences

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

Let's apply this tokenizer to the first sequence to look how the data comes out of it.

In [14]:
enconded_sequence = tokenizer(train_dataset[0][0])
print(enconded_sequence)

{'input_ids': [8, 10, 9, 7, 10, 9, 7, 7, 7, 8, 8, 8, 9, 9, 8, 9, 7, 9, 9, 10, 9, 10, 9, 9, 10, 8, 10, 9, 8, 8, 8, 10, 9, 9, 7, 9, 9, 7, 8, 7, 9, 8, 7, 9, 8, 8, 7, 9, 10, 9, 10, 9, 9, 9, 9, 9, 9, 8, 7, 9, 10, 8, 8, 8, 10, 10, 8, 10, 8, 10, 8, 10, 9, 10, 8, 8, 8, 7, 7, 9, 9, 9, 7, 9, 9, 7, 10, 7, 8, 7, 9, 8, 10, 8, 8, 7, 8, 10, 9, 10, 9, 9, 10, 8, 7, 8, 10, 9, 9, 8, 10, 8, 10, 7, 10, 9, 10, 9, 7, 9, 9, 9, 9, 9, 10, 9, 8, 7, 10, 9, 8, 7, 10, 8, 7, 9, 7, 9, 7, 8, 7, 7, 8, 7, 9, 7, 10, 9, 7, 9, 7, 9, 9, 9, 8, 8, 8, 10, 10, 8, 7, 9, 10, 10, 9, 9, 8, 10, 10, 10, 10, 8, 10, 9, 8, 8, 10, 8, 8, 7, 9, 10, 10, 8, 10, 10, 10, 8, 10, 9, 10, 10, 8, 7, 10, 9, 7, 9, 7, 9, 9, 7, 7, 7, 7, 9, 8, 10, 7, 8, 10, 9, 9, 10, 7, 9, 7, 8, 7, 9, 7, 7, 8, 7, 7, 10, 9, 10, 10, 7, 7, 7, 10, 9, 10, 7, 7, 10, 7, 7, 7, 7, 7, 10, 7, 7, 9, 8, 7, 7, 9, 10, 10, 8, 8, 8, 10, 9, 9, 9, 10, 10, 10, 7, 10, 9, 8, 7, 9, 10, 9, 8, 8, 7, 7, 7, 10, 9, 10, 8, 7, 7, 9, 7, 10, 9, 9, 10, 10, 9, 10, 7, 10, 7, 8, 7, 9, 9, 7, 9, 7, 7, 9, 7,

As you can see, now your sequences is represented as a bunch of numbers. Each of them represents each of the nucleotides that conform the sequences we used as input and the tokenizer adds some special tokens to differentiate from sequence to sequence when passed into the training phase. Let's see how the network will read your sequences in meaningful way for you.

In [15]:
decoded = tokenizer.decode(enconded_sequence['input_ids'])
print(decoded)

CTGATGAAACCCGGCGAGGTGTGGTCTGCCCTGGAGGACAGCAGCCAGTGTGGGGGGCAGTCCCTTCTCTCTGTCCCAAGGGAGGATACAGCTCCACTGTGGTCACTGGCTCTATGTGAGGGGGTGCATGCATCAGAGACAACAGATGAGAGGGCCCTTCAGTTGGCTTTTCTGCCTCCAGTTCTTTCTGTTCATGAGAGGAAAAGCTACTGGTAGACAGAACAATGTTAAATGTAATAAAAATAAGCAAGTTCCCTGGGTTTATGCAGTGCCAAATGTCAAGATGGTTGTATACAGGAGAAGACGTCCAAGACACGTCTTTTCCGAGTGTCCCAGAGCTCAGAACTCTGTGAGCACTTTGAGCTTCCCCAGACCTCTTTCTTCCCTGGGTGTGAGCCCTGCACAGTGCTCCGAAAAGAGCTGGGGTCCGTAAATACGGATGGCAAACAGCTCACCTGGGTTTCTCACATGGATTTGTTTTCTTGGGGGTCTCTGTATGG[SEP]


If you want to obtain the PyTorch tensors directly just add the `return_tensors="pt"` parameter.

In [16]:
encoded_tensors = tokenizer(train_dataset[0][0], return_tensors='pt')
print(encoded_tensors)

{'input_ids': tensor([[ 8, 10,  9,  7, 10,  9,  7,  7,  7,  8,  8,  8,  9,  9,  8,  9,  7,  9,
          9, 10,  9, 10,  9,  9, 10,  8, 10,  9,  8,  8,  8, 10,  9,  9,  7,  9,
          9,  7,  8,  7,  9,  8,  7,  9,  8,  8,  7,  9, 10,  9, 10,  9,  9,  9,
          9,  9,  9,  8,  7,  9, 10,  8,  8,  8, 10, 10,  8, 10,  8, 10,  8, 10,
          9, 10,  8,  8,  8,  7,  7,  9,  9,  9,  7,  9,  9,  7, 10,  7,  8,  7,
          9,  8, 10,  8,  8,  7,  8, 10,  9, 10,  9,  9, 10,  8,  7,  8, 10,  9,
          9,  8, 10,  8, 10,  7, 10,  9, 10,  9,  7,  9,  9,  9,  9,  9, 10,  9,
          8,  7, 10,  9,  8,  7, 10,  8,  7,  9,  7,  9,  7,  8,  7,  7,  8,  7,
          9,  7, 10,  9,  7,  9,  7,  9,  9,  9,  8,  8,  8, 10, 10,  8,  7,  9,
         10, 10,  9,  9,  8, 10, 10, 10, 10,  8, 10,  9,  8,  8, 10,  8,  8,  7,
          9, 10, 10,  8, 10, 10, 10,  8, 10,  9, 10, 10,  8,  7, 10,  9,  7,  9,
          7,  9,  9,  7,  7,  7,  7,  9,  8, 10,  7,  8, 10,  9,  9, 10,  7,  9,
          7,  

The output's data structure is a dictionary containing the `input_ids` key which contains the actual tensors you're going to use to train your model.

The `genomic_benchmarks` datasets contain their labels already in a tensor format. If we load these dataset in another pytorch-specific data structure called `DataLoader` we can display mini-batches of samples from the whole dataset. This looks as follows:

In [50]:
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [51]:
for batch, (x, y) in enumerate(train_loader):
    print(f"Batch number: {batch}\nInputs: {x}\nLabels: {y}")
    break

Batch number: 0
Inputs: ('TGAAGTAGAGGGACCTATGAAATCATTGGTTGCAAAAAATGGAGGTAAGCTAGTAGCCAAAGTATCTTATAGAATTAATACTATAGGCACTTTGTAAAATATTTATTTACAGAGTGCACTGAGAAATTAACTGTGAGAATGATTAAGCAATCTGTTTAAGATTATAGAGAGAGACAAAGAGAAAAACAAAAAGTTCTTTGAATTTTTTTTTTTTTTTTTTTTTTTGAGACAGAGTCTCACTGTGTTGCCCAGGCTGGAGTGCAGTGGCACGACCTCGGCTCACTGCAAGCTCTGCCTCCCGGGTTCACACCATTTTCCTGCCTCAGCCTCCCGAGTAGCTGGGACCACAGGCGCCCAGCACCAGGCCAGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCGTGTTAGCCAGGATGGTCTCCATCTCCTGAACTTGTGATCCGCCCTCCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACTCCGCC', 'GCTGGCATTCTAGGAGGTAGATCCACTCGAAAGCAATCAAAGCCACGGAGTGTGTGATTCTGACAGAGGGGGTGGTGGGATGTCAGCGGTGGAGGGCTTCTTATGGCCTAGAAAAGTGAGGGGACTTCTTGCAGAAGTGGGCTTGACCTGGGCTTTTCAGGATGGGAGCAAATTTGAAGGAGGAGAGAGGATACCTCCATGTGCTCACACATTTGCACACACATTTACTCTCGTGCACACACCCAGTCCCTTAAGCCCCCCATCTATAGGTGCTCACAGCCAGCAGTATGCACACACACCTCACACACAGCCAAAGTAATGCACACAAACTCAGAAACACACACACTGGGGCATTCACACTCTTATGAGGCATGCCGATGCTACAACGGTCGCCCACATACACTCTGGGCACCAGAAACATACAGGCCCCTTAGAAATGGGCTTGGGAGAAGATGAAGATGTGTCTGCAA

Let's apply the transformation we just learned to convert the sequences into meaningful tensors:

In [52]:
for batch, (x, y) in enumerate(train_loader):
    x = tokenizer(x, return_tensors='pt', padding=True, truncation=True, max_length=max_length)["input_ids"]
    print(f"Batch number: {batch}\nInputs: {x}\nLabels: {y}")
    break

Batch number: 0
Inputs: tensor([[ 9, 10,  9,  ...,  9,  9,  1],
        [ 8,  7,  9,  ...,  8, 10,  1],
        [ 7,  7, 10,  ...,  8,  7,  1],
        ...,
        [ 7,  9, 10,  ..., 10,  8,  1],
        [ 8,  8,  7,  ..., 10,  9,  1],
        [ 8, 10,  9,  ...,  7,  7,  1]])
Labels: tensor([1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0,
        0, 1, 0, 1, 0, 0, 0, 0])


Excellent! Now you can see the relationship between the sequences and labels contained in the dataset. Let's now load the model you're going to train! It's really important to keep in mind that if you're going to use a pretrained model (as you are doing right now), you have to use the same `checkpoint` for both the tokenizer and the model. Doing so you will give the inputs to the model as it expects and avoid lots of error messages.

In [41]:
# Load the model using the same checkpoint as the tokenizer
# The num_labels parameter is set to 2 because we have two classes in the dataset (positive and negative)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, trust_remote_code=True, num_labels=2)


Some weights of HyenaDNAForSequenceClassification were not initialized from the model checkpoint at LongSafari/hyenadna-tiny-1k-seqlen-hf and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


You are almost there to train your model! You're a few parameters and functions away of training the model. Let's finish the set up to start the training process!

To start its training a deep learning model needs:
* a `device`: where the math operations will occur. It's advised to use a GPU for accelerated training
* `loss function`: this tells you how much difference is there between the model's precictions and the ground-truth labels in the dataset
* a number of `epochs`: the amount of times the model will "see" the samples within the dataset
* an `optimizer`: a way of calculating and update the model's parameters to fit better the data after each operation
* a `training` and `test` loops: to declare how the information will pass through the network and in which moment to update the parameters

Let's declare them in the following code cell:

In [54]:
# Define the optimizer and the loss function
loss_fn = nn.CrossEntropyLoss()

# create optimizer and define its parameters
learning_rate = 1e-5
weight_decay = 0.1
optimizer = optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

# Define a device
device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

Using device: mps


In [46]:
# Define the training loop
def train(model, device, train_loader, max_length):
    model.to(device)
    model.train()
    size = len(train_loader.dataset)
    for batch, (x, y) in enumerate(train_loader):
        x = tokenizer(x, return_tensors='pt', padding=True, truncation=True, max_length=max_length)["input_ids"] # Tokenize the input sequences
        x, y = x.to(device), y.to(device) # Move the data to the device
        # Forward pass
        outputs = model(x, labels=y) # Get the outputs of the model
        # Backward pass
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        # Zero the gradients
        optimizer.zero_grad()
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(x)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [56]:
# Define the test loop
def test(model, device, test_loader, max_length, losses, accuracies):
    model.to(device)
    model.eval()
    size = len(test_loader.dataset)
    test_loss, correct = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            x = tokenizer(x, return_tensors='pt', padding=True, truncation=True, max_length=max_length)["input_ids"]
            x, y = x.to(device), y.to(device)
            outputs = model(x, labels=y)
            test_loss += outputs.loss.item()
            correct += (outputs.logits.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    losses.append(test_loss)
    correct /= size
    accuracies.append(correct)
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Now you have all the parameters, objects, and functions needed to train a deep learning model! The last thing you have to do is to declare how many **epochs** the training phase will have. This is how many times will the model "see" or process all the samples contained in the training dataset. This is done to try to fit the model as most as possible to the data. That's why your dataset has to include a wide variety of examples, covering the most cases as possible from the phenomenon you're studying.

In [57]:
# Define the number of epochs
epochs = 3

# Define the lists to store the losses and accuracies
losses = []
accuracies = []

for epoch in range(epochs):
    print(f"Epoch {epoch + 1}\n-------------------------------")
    train(model, device, train_loader, max_length)
    test(model, device, test_loader, max_length, losses, accuracies)

Epoch 1
-------------------------------
loss: 0.725337  [   32/20843]
loss: 0.680516  [ 3232/20843]
loss: 0.692339  [ 6432/20843]
loss: 0.662142  [ 9632/20843]
loss: 0.685585  [12832/20843]
loss: 0.684949  [16032/20843]
loss: 0.669596  [19232/20843]
Test Error: 
 Accuracy: 61.9%, Avg loss: 0.020826 

Epoch 2
-------------------------------
loss: 0.687336  [   32/20843]
loss: 0.661461  [ 3232/20843]
loss: 0.691168  [ 6432/20843]
loss: 0.648936  [ 9632/20843]
loss: 0.639548  [12832/20843]
loss: 0.634887  [16032/20843]
loss: 0.622727  [19232/20843]
Test Error: 
 Accuracy: 64.6%, Avg loss: 0.020077 

Epoch 3
-------------------------------
loss: 0.565129  [   32/20843]
loss: 0.710092  [ 3232/20843]
loss: 0.632840  [ 6432/20843]
loss: 0.614384  [ 9632/20843]
loss: 0.573174  [12832/20843]
loss: 0.611180  [16032/20843]
loss: 0.615772  [19232/20843]
Test Error: 
 Accuracy: 66.1%, Avg loss: 0.019404 



# References

1. Dive into Deep Learning
2. Genomic Benchmarks paper and repository
3. HuggingFace