<a href="https://colab.research.google.com/github/mssongit/bert-sentence-classification/blob/main/bert_sentence_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BERT Sentence Classification


**BERT**

BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding

https://arxiv.org/pdf/1810.04805.pdf

In [1]:
!pip install transformers==4.1.1
!pip install wandb



# The CoLA dataset
We’ll use the Corpus of Linguistic Acceptability (CoLA) dataset for single-sentence classification. It is a set of sentences labeled as grammatically acceptable (1) or unacceptable (0). It was first published in May 2018 and is one of the tasks in the GLUE benchmark, on which models like BERT are evaluated.


In [2]:
!pip install wget
import wget
import os

print('Downloading dataset...')

# The URL for the dataset zip file.
url = 'https://nyu-mll.github.io/CoLA/cola_public_1.1.zip'

# Download the file (if we haven't already)
if not os.path.exists('./cola_public_1.1.zip'):
    wget.download(url, './cola_public_1.1.zip')

Downloading dataset...


In [3]:
if not os.path.exists('./cola_public/'):
    !unzip cola_public_1.1.zip

Archive:  cola_public_1.1.zip
   creating: cola_public/
  inflating: cola_public/README      
   creating: cola_public/tokenized/
  inflating: cola_public/tokenized/in_domain_dev.tsv  
  inflating: cola_public/tokenized/in_domain_train.tsv  
  inflating: cola_public/tokenized/out_of_domain_dev.tsv  
   creating: cola_public/raw/
  inflating: cola_public/raw/in_domain_dev.tsv  
  inflating: cola_public/raw/in_domain_train.tsv  
  inflating: cola_public/raw/out_of_domain_dev.tsv  


In [4]:
import pandas as pd

# Load the dataset into a pandas dataframe.
df = pd.read_csv("./cola_public/raw/in_domain_train.tsv", delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])

# Report the number of sentences.
print('Number of training sentences: {:,}\n'.format(df.shape[0]))

# Display 10 random rows from the data.
df.sample(10)

Number of training sentences: 8,551



| | sentence_source | label | label_notes | sentence |
|---|---|---|---|---|
| 8415 | ad03 | 0 | * | Him loves him |
| 4818 | ks08 | 1 | | I don't know how to do it. |
| 1440 | r-67 | 0 | * | Did that he played the piano surprise you? |
| 2153 | l-93 | 0 | * | Janet broke at the vase. |
| 3961 | ks08 | 1 | | I wonder if you will come back tomorrow. |
| 1287 | r-67 | 0 | * | I went to the store to have bought some whisky. |
| 5097 | ks08 | 0 | * | That you have done it really well is what I me... |
| 5640 | c_13 | 1 | | The puppy loved peanut butter cookies. |
| 1580 | r-67 | 1 | | Seven more soldiers came in after ten had left. |
| 6689 | m_02 | 1 | | Sarah devoured the cakes in the kitchen last n... |


In [5]:
# Get the lists of sentences and their labels.
sentences = df.sentence.values
labels = df.label.values

# Tokenization
Sentences must be tokenized with the BERT tokenizer before they can be fed into the BERT model. Let’s take a look at an example.

In [6]:
from transformers import BertTokenizer

# Load the BERT tokenizer.
print('Loading BERT tokenizer...')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
'''
Original:  Our friends won't buy this analysis, let alone the next one we propose.
Tokenized:  ['our', 'friends', 'won', "'", 't', 'buy', 'this', 'analysis', ',', 'let', 'alone', 'the', 'next', 'one', 'we', 'propose', '.']
Token IDs:  [2256, 2814, 2180, 1005, 1056, 4965, 2023, 4106, 1010, 2292, 2894, 1996, 2279, 2028, 2057, 16599, 1012]
'''

Loading BERT tokenizer...



Before we process the entire dataset with this tokenizer, a few conditions must be met to set up the training data for BERT:

- Add special tokens to the start and end of each sentence. We append the special `[SEP]` token to the end of every sentence and, for classification tasks, prepend the special `[CLS]` token to the beginning.
- Pad and truncate all sentences to a single constant length.
- Explicitly differentiate real tokens from padding tokens with the “attention mask”, an array of 1s and 0s in which 1 marks a real token and 0 marks padding.
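
The three conditions above can be sketched in plain Python, without the tokenizer itself. This is only an illustration: `prepare_input` is a hypothetical helper, and the special-token IDs (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are the ones that appear in this notebook's own output for `bert-base-uncased`.

```python
# Special-token IDs for bert-base-uncased, as seen in this notebook's output
# (101 = [CLS], 102 = [SEP], 0 = [PAD]).
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def prepare_input(token_ids, max_length):
    """Add special tokens, then pad or truncate to exactly `max_length`.

    Hypothetical helper for illustration; assumes max_length >= 2.
    Returns (input_ids, attention_mask).
    """
    # Reserve two positions for [CLS] and [SEP], truncating the body if needed.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]

    # 1 marks a real token, 0 marks padding.
    attention_mask = [1] * len(input_ids)

    pad_count = max_length - len(input_ids)
    input_ids += [PAD_ID] * pad_count
    attention_mask += [0] * pad_count
    return input_ids, attention_mask


# Token IDs of the example sentence shown in the tokenizer cell above.
example = [2256, 2814, 2180, 1005, 1056, 4965, 2023, 4106, 1010,
           2292, 2894, 1996, 2279, 2028, 2057, 16599, 1012]
ids, mask = prepare_input(example, max_length=24)
print(ids)
print(mask)
```

The 17 word-piece IDs become 19 real tokens once `[CLS]` and `[SEP]` are added, so with a target length of 24 the last five positions are padding and carry a mask value of 0.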


In [7]:
max_len = 0

# For every sentence...
for sent in sentences:

    # Tokenize the text and add `[CLS]` and `[SEP]` tokens.
    input_ids = tokenizer.encode(sent, add_special_tokens=True)

    # Update the maximum sentence length.
    max_len = max(max_len, len(input_ids))

print('Max sentence length: ', max_len)

Max sentence length:  47


# Wandb Config

In [8]:
import wandb
sweep_config = {
    'method': 'random', #grid, random
    'metric': {
      'name': 'val_accuracy',
      'goal': 'maximize'   
    },
    'parameters': {

        'learning_rate': {
            'values': [ 5e-5, 3e-5, 2e-5]
        },
        'batch_size': {
            'values': [16, 32]
        },
        'epochs':{
            'values':[2, 3, 4]
        }
    }
}
sweep_defaults = {
    'learning_rate': 5e-5,
    'batch_size': 32,
    'epochs':2
}

sweep_id = wandb.sweep(sweep_config)


ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Create sweep with ID: m4q957fd
Sweep URL: https://wandb.ai/mssong/uncategorized/sweeps/m4q957fd
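
To get a feel for the size of the search space in `sweep_config`, the sketch below enumerates every combination a grid search would visit and then draws one configuration at random. It uses only the standard library and merely mimics the idea of random search; it is not how W&B samples configurations internally.

```python
import itertools
import random

# Same search space as `sweep_config['parameters']` above.
search_space = {
    'learning_rate': [5e-5, 3e-5, 2e-5],
    'batch_size': [16, 32],
    'epochs': [2, 3, 4],
}

# Every combination a grid search would visit.
names = list(search_space)
grid = [dict(zip(names, combo))
        for combo in itertools.product(*search_space.values())]
print(len(grid))

# Illustrative stand-in for random search: draw each value independently.
rng = random.Random(0)
sample = {name: rng.choice(values) for name, values in search_space.items()}
print(sample)
```

With 3 learning rates, 2 batch sizes and 3 epoch counts there are 18 combinations, so a random sweep that runs only a handful of trials covers a fraction of the grid.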


# Tokenize
To follow the general convention that sizes should be powers of 2, we’ll pad to the smallest power of 2 that is at least the maximum sentence length of 47 tokens, i.e., 64.
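
The padded length has to be at least as long as the longest sequence (47 tokens here), so the power of 2 to pick is the next one up, not the numerically nearest one (32 is nearer to 47 but would truncate sentences). A minimal sketch, with a hypothetical helper name:

```python
def next_power_of_two(n):
    """Return the smallest power of 2 that is >= n (for n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


longest = 47  # maximum tokenized length reported above
print(next_power_of_two(longest))
```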

Now we’re ready to perform the real tokenization. Because we’re using the transformers library, we can call the built-in `tokenizer.encode_plus` method, which automates all of the following tasks:

1. Split the sentence into tokens.
2. Add the special `[CLS]` and `[SEP]` tokens.
3. Map the tokens to their IDs.
4. Pad or truncate all sentences to the same length.
5. Create the attention masks which explicitly differentiate real tokens from `[PAD]` tokens.

In [9]:
import torch
# Tokenize all of the sentences and map the tokens to their word IDs.
input_ids = []
attention_masks = []

# For every sentence...
for sent in sentences:
    # `encode_plus` will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the `[CLS]` token to the start.
    #   (3) Append the `[SEP]` token to the end.
    #   (4) Map tokens to their IDs.
    #   (5) Pad or truncate the sentence to `max_length`
    #   (6) Create attention masks for [PAD] tokens.
    encoded_dict = tokenizer.encode_plus(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                        max_length = 64,           # Pad & truncate all sentences.
                        padding = 'max_length',    # Pad every sentence up to `max_length`.
                        truncation = True,         # Explicitly truncate to `max_length`.
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',     # Return pytorch tensors.
                   )
    
    # Add the encoded sentence to the list.    
    input_ids.append(encoded_dict['input_ids'])
    
    # And its attention mask (simply differentiates padding from non-padding).
    attention_masks.append(encoded_dict['attention_mask'])

# Convert the lists into tensors.
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
labels = torch.tensor(labels)

# Print sentence 0, now as a list of IDs.
print('Original: ', sentences[0])
print('Token IDs:', input_ids[0])



Original:  Our friends won't buy this analysis, let alone the next one we propose.
Token IDs: tensor([  101,  2256,  2814,  2180,  1005,  1056,  4965,  2023,  4106,  1010,
         2292,  2894,  1996,  2279,  2028,  2057, 16599,  1012,   102,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0])


# Dataset

In [10]:
from torch.utils.data import TensorDataset, random_split

# Combine the training inputs into a TensorDataset.
dataset = TensorDataset(input_ids, attention_masks, labels)

# Create a 90-10 train-validation split.

# Calculate the number of samples to include in each set.
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size

# Divide the dataset by randomly selecting samples.
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

print('{:>5,} training samples'.format(train_size))
print('{:>5,} validation samples'.format(val_size))

7,695 training samples
  856 validation samples
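
Note that the `random_split` call above runs before any random seed is set, so the train/validation membership can differ between sessions. In PyTorch, `random_split` accepts a seeded `generator` argument for this purpose. The sketch below shows the same idea with the standard library only; `split_indices` is a hypothetical helper, not part of the notebook.

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=42):
    """Shuffle indices with a fixed seed and cut them into train/validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    train_size = int(train_fraction * n_samples)
    return indices[:train_size], indices[train_size:]


# 8,551 is the number of training sentences loaded earlier.
train_idx, val_idx = split_indices(8551)
print(len(train_idx), len(val_idx))
```

Calling `split_indices` twice with the same seed yields the same partition, which makes validation scores comparable across runs of a sweep.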


In [11]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
import wandb
# WANDB PARAMETER
def ret_dataloader():
    batch_size = wandb.config.batch_size
    print('batch_size = ', batch_size)
    train_dataloader = DataLoader(
                train_dataset,  # The training samples.
                sampler = RandomSampler(train_dataset), # Select batches randomly
                batch_size = batch_size # Trains with this batch size.
            )

    validation_dataloader = DataLoader(
                val_dataset, # The validation samples.
                sampler = SequentialSampler(val_dataset), # Pull out batches sequentially.
                batch_size = batch_size # Evaluate with this batch size.
            )
    return train_dataloader,validation_dataloader

# Load Pre-trained BERT model

In [12]:
from transformers import BertForSequenceClassification, AdamW, BertConfig

def ret_model():

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", 
        num_labels = 2, 
        output_attentions = False, # Whether the model returns attentions weights.
        output_hidden_states = False, # Whether the model returns all hidden-states.
    )

    return model

In [13]:
def ret_optim(model):
    print('Learning_rate = ',wandb.config.learning_rate )
    optimizer = AdamW(model.parameters(),
                      lr = wandb.config.learning_rate, 
                      eps = 1e-8 
                    )
    return optimizer

In [14]:
from transformers import get_linear_schedule_with_warmup

def ret_scheduler(train_dataloader,optimizer):
    epochs = wandb.config.epochs
    print('epochs =>', epochs)
    # Total number of training steps is [number of batches] x [number of epochs]. 
    # (Note that this is not the same as the number of training samples).
    total_steps = len(train_dataloader) * epochs

    # Create the learning rate scheduler.
    scheduler = get_linear_schedule_with_warmup(optimizer, 
                                                num_warmup_steps = 0, # Default value in run_glue.py
                                                num_training_steps = total_steps)
    return scheduler
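
To make the schedule concrete, here is a small pure-Python sketch of the multiplier that a linear schedule with warmup applies to the base learning rate. The function name is hypothetical; the numbers (7,695 training samples, batch size 32, 4 epochs, learning rate 2e-5) are taken from values that appear in this notebook.

```python
import math


def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# One batch per optimizer step; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(7695 / 32)
total_steps = steps_per_epoch * 4
base_lr = 2e-5

for step in (0, total_steps // 2, total_steps):
    lr = base_lr * linear_schedule_factor(step, 0, total_steps)
    print(step, lr)
```

With zero warmup steps the learning rate starts at its full value and falls linearly to zero at the final step, which is why `total_steps` has to reflect both the number of batches and the number of epochs.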

In [15]:
import numpy as np

# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
import time
import datetime

def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))
    
    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
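
One subtlety when `flat_accuracy` is computed per batch and the results are averaged over batches, as the validation loop in this notebook does: if the last batch is smaller than the others (856 validation samples with a batch size of 32 leave a final batch of 24), the mean of batch accuracies is not exactly the accuracy over all samples. The toy example below, with made-up predictions, shows the difference.

```python
def batch_accuracy(preds, labels):
    """Fraction of predictions that match the labels in one batch."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


# Two hypothetical batches of unequal size: (predicted classes, true labels).
batches = [
    ([1, 1, 0, 1], [1, 1, 0, 0]),  # 3 of 4 correct
    ([0, 1],       [1, 0]),        # 0 of 2 correct
]

# Mean of per-batch accuracies, as accumulated in a per-batch loop.
mean_of_batches = sum(batch_accuracy(p, y) for p, y in batches) / len(batches)

# Accuracy over all samples pooled together.
total_correct = sum(sum(p == y for p, y in zip(preds, labels))
                    for preds, labels in batches)
total_samples = sum(len(labels) for _, labels in batches)
pooled = total_correct / total_samples

print(mean_of_batches, pooled)
```

For equally sized batches the two numbers coincide; with one short final batch the gap is usually small, but counting correct predictions and dividing once at the end avoids it entirely.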

# The Train Function

In [16]:
import random
import numpy as np

# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128

def train():
    # Set the seed value all over the place to make this reproducible.
    # Seeding happens before the model is built so that the randomly
    # initialized classification head is also the same on every run.
    seed_val = 42
    random.seed(seed_val)
    np.random.seed(seed_val)
    torch.manual_seed(seed_val)
    torch.cuda.manual_seed_all(seed_val)

    wandb.init(config=sweep_defaults)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)
    model = ret_model()
    model.to(device)
    train_dataloader,validation_dataloader = ret_dataloader()
    optimizer = ret_optim(model)
    scheduler = ret_scheduler(train_dataloader,optimizer)

    # We'll store a number of quantities such as training and validation loss, 
    # validation accuracy, and timings.
    training_stats = []

    # Measure the total training time for the whole run.
    total_t0 = time.time()
    epochs = wandb.config.epochs
    # For each epoch...
    for epoch_i in range(0, epochs):
        
        # ========================================
        #               Training
        # ========================================
        
        # Perform one full pass over the training set.

        print("")
        print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
        print('Training...')

        # Measure how long the training epoch takes.
        t0 = time.time()

        # Reset the total loss for this epoch.
        total_train_loss = 0

        # Put the model into training mode. Don't be misled--the call to 
        # `train` just changes the *mode*, it doesn't *perform* the training.
        # `dropout` and `batchnorm` layers behave differently during training
        # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
        model.train()

        # For each batch of training data...
        for step, batch in enumerate(train_dataloader):

            # Progress update every 40 batches.
            if step % 40 == 0 and not step == 0:
                # Calculate elapsed time in minutes.
                elapsed = format_time(time.time() - t0)
                
                # Report progress.
                print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

            # Unpack this training batch from our dataloader. 
            #
            # As we unpack the batch, we'll also copy each tensor to the GPU using the 
            # `to` method.
            #
            # `batch` contains three pytorch tensors:
            #   [0]: input ids 
            #   [1]: attention masks
            #   [2]: labels 
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_labels = batch[2].to(device)

            # Always clear any previously calculated gradients before performing a
            # backward pass. PyTorch doesn't do this automatically because 
            # accumulating the gradients is "convenient while training RNNs". 
            # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
            model.zero_grad()        

            # Perform a forward pass (evaluate the model on this training batch).
            # The documentation for this `model` function is here: 
            # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
            # It returns different numbers of values depending on what arguments
            # are given and what flags are set. For our usage here, it returns
            # the loss (because we provided labels) and the "logits"--the model
            # outputs prior to activation.
            outputs = model(b_input_ids, 
                                token_type_ids=None, 
                                attention_mask=b_input_mask, 
                                labels=b_labels)
            loss, logits = outputs['loss'], outputs['logits']
            wandb.log({'train_batch_loss':loss.item()})
            # Accumulate the training loss over all of the batches so that we can
            # calculate the average loss at the end. `loss` is a Tensor containing a
            # single value; the `.item()` function just returns the Python value 
            # from the tensor.
            total_train_loss += loss.item()

            # Perform a backward pass to calculate the gradients.
            loss.backward()

            # Clip the norm of the gradients to 1.0.
            # This is to help prevent the "exploding gradients" problem.
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

            # Update parameters and take a step using the computed gradient.
            # The optimizer dictates the "update rule"--how the parameters are
            # modified based on their gradients, the learning rate, etc.
            optimizer.step()

            # Update the learning rate.
            scheduler.step()

        # Calculate the average loss over all of the batches.
        avg_train_loss = total_train_loss / len(train_dataloader)            
        
        # Measure how long this epoch took.
        training_time = format_time(time.time() - t0)

        wandb.log({'avg_train_loss':avg_train_loss})

        print("")
        print("  Average training loss: {0:.2f}".format(avg_train_loss))
        print("  Training epoch took: {:}".format(training_time))
            
        # ========================================
        #               Validation
        # ========================================
        # After the completion of each training epoch, measure our performance on
        # our validation set.

        print("")
        print("Running Validation...")

        t0 = time.time()

        # Put the model in evaluation mode--the dropout layers behave differently
        # during evaluation.
        model.eval()

        # Tracking variables 
        total_eval_accuracy = 0
        total_eval_loss = 0
        nb_eval_steps = 0

        # Evaluate data for one epoch
        for batch in validation_dataloader:
            
            # Unpack this validation batch from our dataloader. 
            #
            # As we unpack the batch, we'll also copy each tensor to the selected
            # device (GPU if available, otherwise CPU) using the `to` method.
            #
            # `batch` contains three pytorch tensors:
            #   [0]: input ids 
            #   [1]: attention masks
            #   [2]: labels 
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_labels = batch[2].to(device)
            
            # Tell pytorch not to bother with constructing the compute graph during
            # the forward pass, since this is only needed for backprop (training).
            with torch.no_grad():        

                # Forward pass, calculate logit predictions.
                # token_type_ids is the same as the "segment ids", which 
                # differentiates sentence 1 and 2 in 2-sentence tasks.
                # The documentation for this `model` function is here: 
                # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
                # Get the "logits" output by the model. The "logits" are the output
                # values prior to applying an activation function like the softmax.
                outputs = model(b_input_ids, 
                                      token_type_ids=None, 
                                      attention_mask=b_input_mask,
                                      labels=b_labels)
                loss, logits = outputs['loss'], outputs['logits']
                
            # Accumulate the validation loss.
            total_eval_loss += loss.item()

            # Move logits and labels to CPU
            logits = logits.detach().cpu().numpy()
            label_ids = b_labels.to('cpu').numpy()

            # Calculate the accuracy for this batch of test sentences, and
            # accumulate it over all batches.
            total_eval_accuracy += flat_accuracy(logits, label_ids)
            

        # Report the final accuracy for this validation run.
        avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
        print("  Accuracy: {0:.2f}".format(avg_val_accuracy))

        # Calculate the average loss over all of the batches.
        avg_val_loss = total_eval_loss / len(validation_dataloader)
        
        # Measure how long the validation run took.
        validation_time = format_time(time.time() - t0)
        wandb.log({'val_accuracy':avg_val_accuracy,'avg_val_loss':avg_val_loss})
        print("  Validation Loss: {0:.2f}".format(avg_val_loss))
        print("  Validation took: {:}".format(validation_time))

        # Record all statistics from this epoch.
        training_stats.append(
            {
                'epoch': epoch_i + 1,
                'Training Loss': avg_train_loss,
                'Valid. Loss': avg_val_loss,
                'Valid. Accur.': avg_val_accuracy,
                'Training Time': training_time,
                'Validation Time': validation_time
            }
        )

    print("")
    print("Training complete!")

    print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))
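
The training loop above clips the gradient norm to 1.0 before each optimizer step. The following pure-Python sketch shows the idea on a flat list of numbers: compute the global L2 norm and, if it exceeds the limit, scale every gradient by the same factor. `clip_by_global_norm` is a hypothetical helper written for illustration; PyTorch's `clip_grad_norm_` applies the same kind of scaling across all parameter tensors in place.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale a flat list of gradient values so their L2 norm is at most `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale factor is capped at 1 so small gradients are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Because every component is multiplied by the same factor, clipping shrinks the size of the update but keeps its direction.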

In [None]:
wandb.agent(sweep_id, function=train)

wandb: Agent Starting Run: 07qdpdvb with config:
wandb: 	batch_size: 32
wandb: 	epochs: 4
wandb: 	learning_rate: 2e-05


cuda



Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:27.
  Batch   120  of    241.    Elapsed: 0:00:39.
  Batch   160  of    241.    Elapsed: 0:00:52.
  Batch   200  of    241.    Elapsed: 0:01:04.
  Batch   240  of    241.    Elapsed: 0:01:17.

  Average training loss: 0.51
  Training epcoh took: 0:01:18

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:13.
  Batch    80  of    241.    Elapsed: 0:00:27.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.33
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

| Metric | History (sparkline) |
|---|---|
| avg_train_loss | █▄▂▁ |
| avg_val_loss | ▁▁▅█ |
| train_batch_loss | ▇█▆▇▅▇▅▇▄█▇▅▆▅▃▃▄▄▄▅▃▄▂▃▃▂▁▂▄▃▂▂▁▃▁▁▂▃▁▅ |
| val_accuracy | ▁▆▇█ |

| Metric | Final value |
|---|---|
| avg_train_loss | 0.16241 |
| avg_val_loss | 0.46218 |
| train_batch_loss | 0.04409 |
| val_accuracy | 0.85455 |


wandb: Agent Starting Run: 9pxdp7rk with config:
wandb: 	batch_size: 16
wandb: 	epochs: 4
wandb: 	learning_rate: 3e-05


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

| Metric | History (sparkline) |
|---|---|
| avg_train_loss | █▄▂▁ |
| avg_val_loss | ▁▂▇█ |
| train_batch_loss | ▆▆▅▆▃▇▅▄▃▇▂▅▃▃▃▃▇▄▃▃▂▁▁▁▃▂▂▁█▄▁▁▁▃▁▁▃▃▂▁ |
| val_accuracy | ▁▇▆█ |

| Metric | Final value |
|---|---|
| avg_train_loss | 0.10183 |
| avg_val_loss | 0.64548 |
| train_batch_loss | 0.01171 |
| val_accuracy | 0.85301 |


wandb: Agent Starting Run: nwdntkdj with config:
wandb: 	batch_size: 32
wandb: 	epochs: 4
wandb: 	learning_rate: 2e-05


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.50
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

| Metric | History (sparkline) |
|---|---|
| avg_train_loss | █▄▂▁ |
| avg_val_loss | ▁▁▅█ |
| train_batch_loss | ▇█▆▇▆▇▅█▄█▆▅▆▆▃▃▄▅▄▅▃▅▂▂▃▂▂▂▄▂▃▂▁▁▂▁▃▂▁▃ |
| val_accuracy | ▁▇▇█ |

| Metric | Final value |
|---|---|
| avg_train_loss | 0.14666 |
| avg_val_loss | 0.48499 |
| train_batch_loss | 0.03927 |
| val_accuracy | 0.85262 |


wandb: Agent Starting Run: o4921rda with config:
wandb: 	batch_size: 32
wandb: 	epochs: 2
wandb: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.28
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

Run history:
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▆▅█▅▅▆▅▅▆▅▆▇▃▆▃▅▅▇▅▃▅▂▅▄▄▃▂▇▁▂▃▅▂▃▃▃▃▄
val_accuracy,▁█

Run summary:
avg_train_loss,0.28226
avg_val_loss,0.39011
train_batch_loss,0.21012
val_accuracy,0.84336


wandb: Agent Starting Run: d7mkn29t with config:
wandb: 	batch_size: 32
wandb: 	epochs: 4
wandb: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.50
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

Run history:
avg_train_loss,█▄▂▁
avg_val_loss,▂▁▄█
train_batch_loss,██▆▇▅█▅▇▄█▅▅▆▆▃▃▄▅▄▅▃▄▂▃▂▃▁▂▃▂▂▁▁▂▁▂▂▂▁▅
val_accuracy,▁█▆█

Run summary:
avg_train_loss,0.13615
avg_val_loss,0.4838
train_batch_loss,0.01934
val_accuracy,0.85571


wandb: Agent Starting Run: 6337j0rk with config:
wandb: 	batch_size: 32
wandb: 	epochs: 4
wandb: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.41
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.27
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.42
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

Run history:
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▅█
train_batch_loss,▇▇▆▇▅█▄▅▄█▄▄▄▆▂▂▃▅▃▄▂▅▃▃▂▂▁▂▃▂▁▁▁▂▁▁▁▄▁▃
val_accuracy,▁▆▇█

Run summary:
avg_train_loss,0.07979
avg_val_loss,0.61347
train_batch_loss,0.00402
val_accuracy,0.85185


wandb: Agent Starting Run: suip1o1n with config:
wandb: 	batch_size: 16
wandb: 	epochs: 2
wandb: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▅▆▆▅▆▄▆▄▅▆▅▅▇▃▃▂▆▄█▂▂▅▄▂▁▂▅▂▃▅▃▅▆▄▁▃▄▂▃
val_accuracy,▁█

Run summary:
avg_train_loss,0.25152
avg_val_loss,0.43946
train_batch_loss,0.06903
val_accuracy,0.84838


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: dbnk1bpn with config:
wandb: 	batch_size: 16
wandb: 	epochs: 4
wandb: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.42
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▆█
train_batch_loss,▆▆▇▇▄▇▄▄▂█▄▅▆▃▃▄▇▅▃▃▁▁▁▁▁▄▁▁▆▃▁▁▁▁▁▁▁▁▁▁
val_accuracy,▃▁▆█

Run summary:
avg_train_loss,0.09422
avg_val_loss,0.72915
train_batch_loss,0.00364
val_accuracy,0.84606


wandb: Agent Starting Run: sp3b36ne with config:
wandb: 	batch_size: 16
wandb: 	epochs: 3
wandb: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▃▁
avg_val_loss,▁▄█
train_batch_loss,▇▅▇▇▇▆▆▇▄▄▃▆█▆▄▄▃▃▃▄▆▆▅▄▄▂▃▁▁▁▃▁▄▂▁▁▁▃▁▂
val_accuracy,▁▇█

Run summary:
avg_train_loss,0.17129
avg_val_loss,0.54039
train_batch_loss,0.01923
val_accuracy,0.85648


wandb: Agent Starting Run: bzmm3lgl with config:
wandb: 	batch_size: 16
wandb: 	epochs: 2
wandb: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▆▇▆▆▆▅▅▅▅▅▅▆▇▃▃▂▆▅█▃▂▅▃▃▂▃▆▂▃▇▃▆▅▅▁▅▄▃▁
val_accuracy,▁█

Run summary:
avg_train_loss,0.25804
avg_val_loss,0.43314
train_batch_loss,0.04877
val_accuracy,0.84838


wandb: Agent Starting Run: 67ppx4af with config:
wandb: 	batch_size: 16
wandb: 	epochs: 2
wandb: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▆▇▅▆▆▄▆▅▅▆▄▆▆▃▃▂▇▅▇▃▆█▄▃▃▁▇▃▅▂▂▅▄▅▃▄▃▃▂
val_accuracy,█▁

Run summary:
avg_train_loss,0.25894
avg_val_loss,0.44929
train_batch_loss,0.09801
val_accuracy,0.83449


wandb: Agent Starting Run: 6unp23m1 with config:
wandb: 	batch_size: 16
wandb: 	epochs: 3
wandb: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▃▁
avg_val_loss,▁▄█
train_batch_loss,█▆▆▆▇▆▆█▅▄▃▆█▆▄▄▃▄▃▄▆▇▆▄▄▂▃▁▁▁▂▁▄▃▁▁▁▁▁▅
val_accuracy,▁▇█

Run summary:
avg_train_loss,0.17092
avg_val_loss,0.56861
train_batch_loss,0.01621
val_accuracy,0.85069


wandb: Agent Starting Run: berzdrat with config:
wandb: 	batch_size: 16
wandb: 	epochs: 2
wandb: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▆▇▆▆▆▅▅▅▅▅▅▆▇▃▃▂▆▅█▃▂▅▃▃▂▃▆▂▃▇▃▆▅▅▁▅▄▃▁
val_accuracy,▁█

Run summary:
avg_train_loss,0.25804
avg_val_loss,0.43314
train_batch_loss,0.04877
val_accuracy,0.84838


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: mrgzhkr5 with config:
wandb: 	batch_size: 16
wandb: 	epochs: 4
wandb: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:59.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.42
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▆█
train_batch_loss,▆▆▇▇▄▇▄▄▂█▄▅▆▃▃▄▇▅▃▃▁▁▁▁▁▄▁▁▆▃▁▁▁▁▁▁▁▁▁▁
val_accuracy,▃▁▆█

Run summary:
avg_train_loss,0.09422
avg_val_loss,0.72915
train_batch_loss,0.00364
val_accuracy,0.84606


wandb: Agent Starting Run: kdavsttp with config:
wandb: 	batch_size: 16
wandb: 	epochs: 4
wandb: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.41
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

Run history:
avg_train_loss,█▄▂▁
avg_val_loss,▁▃▆█
train_batch_loss,█▆▇▇▅█▄▅▄▆▄▄▃▄▃▅▇▆▄▂▃▁▁▁▃▆▁▁█▂▁▁▁▁▁▁▁▁▃▁
val_accuracy,▁▃▆█

Run summary:
avg_train_loss,0.08115
avg_val_loss,0.73479
train_batch_loss,0.0024
val_accuracy,0.85301


wandb: Agent Starting Run: 8jlvo9wm with config:
wandb: 	batch_size: 16
wandb: 	epochs: 4
wandb: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▃▆█
train_batch_loss,▅▄▅▆▃▆▄▅▃▅▃▅▃▂▂▅▆▃▃▂▁▂▁▁▃▄▁▁█▂▂▁▁▃▁▁▃▃▂▂
val_accuracy,▁▆▇█

metric,value
avg_train_loss,0.12506
avg_val_loss,0.60299
train_batch_loss,0.01217
val_accuracy,0.8588


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: rg9v26zp with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▃▁
avg_val_loss,▁▃█
train_batch_loss,▇▆▇▆▇▅▆█▅▄▄▆▇▇▄▆▃▄▃▄▄▅▄▄▄▂▃▃▂▁▁▁▃▁▁▁▁▁▁▁
val_accuracy,▁▅█

metric,value
avg_train_loss,0.15375
avg_val_loss,0.56
train_batch_loss,0.01707
val_accuracy,0.86574


[34m[1mwandb[0m: Agent Starting Run: ca7f9z7w with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▃▁
avg_val_loss,▁▂█
train_batch_loss,▆▅▆▅▆▅▅▇▄▄▃▅█▇▂▅▄▃▃▄▅▅▄▃▄▃▂▁▁▁▁▁▄▂▁▁▁▁▁▂
val_accuracy,▁▇█

metric,value
avg_train_loss,0.16463
avg_val_loss,0.55841
train_batch_loss,0.03498
val_accuracy,0.85648


[34m[1mwandb[0m: Agent Starting Run: 24etjxrh with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.



cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▄▆▆▆▅▆▃▆▃▅▄▄█▆▃▃▃▆▅▇▂▂▅▃▃▁▁█▃▃▅▂▄▂▄▃▂▃▃▁
val_accuracy,▁█

metric,value
avg_train_loss,0.24895
avg_val_loss,0.46483
train_batch_loss,0.08793
val_accuracy,0.84838


[34m[1mwandb[0m: Agent Starting Run: snl6qjty with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.42
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▆█
train_batch_loss,▆▆▇▇▄▇▄▄▂█▄▅▆▃▃▄▇▅▃▃▁▁▁▁▁▄▁▁▆▃▁▁▁▁▁▁▁▁▁▁
val_accuracy,▃▁▆█

metric,value
avg_train_loss,0.09422
avg_val_loss,0.72915
train_batch_loss,0.00364
val_accuracy,0.84606


[34m[1mwandb[0m: Agent Starting Run: bc7fdaff with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.50
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.33
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:54 (h:m

metric,history
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▇▅█▅▅▇▅▅█▅▅▇▃▅▂▅▅▇▅▂▄▃▆▄▄▃▂▇▁▂▃▃▂▃▃▃▄▄
val_accuracy,▁█

metric,value
avg_train_loss,0.32849
avg_val_loss,0.38701
train_batch_loss,0.3725
val_accuracy,0.84221


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: fh8qcstq with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▃▆█
train_batch_loss,▆▅▆▇▃▇▄▅▃▆▃▄▄▂▂▃▆▅▃▁▁▂▁▁▃▂▁▁█▄▁▁▁▄▁▁▁▁▄▃
val_accuracy,▁▅▆█

metric,value
avg_train_loss,0.09121
avg_val_loss,0.67923
train_batch_loss,0.00919
val_accuracy,0.86227


[34m[1mwandb[0m: Agent Starting Run: nw59fcd5 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▂▇█
train_batch_loss,▆▆▅▆▃▇▅▄▃▇▂▅▃▃▃▃▇▄▃▃▂▁▁▁▃▂▂▁█▄▁▁▁▃▁▁▃▃▂▁
val_accuracy,▁▇▆█

metric,value
avg_train_loss,0.10183
avg_val_loss,0.64548
train_batch_loss,0.01171
val_accuracy,0.85301


[34m[1mwandb[0m: Agent Starting Run: 5fw9mrnz with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▂▇█
train_batch_loss,▆▆▅▆▃▇▅▄▃▇▂▅▃▃▃▃▇▄▃▃▂▁▁▁▃▂▂▁█▄▁▁▁▃▁▁▃▃▂▁
val_accuracy,▁▇▆█

metric,value
avg_train_loss,0.10183
avg_val_loss,0.64548
train_batch_loss,0.01171
val_accuracy,0.85301


[34m[1mwandb[0m: Agent Starting Run: v7xvlt2j with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  3e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.29
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

metric,history
avg_train_loss,█▄▁
avg_val_loss,▁▃█
train_batch_loss,▇▆▆▆▇▅▆▇▅▅▆▆█▆▃▄▅▆▄▃▂▅▃▄▄▅▅▃▃▂▂▂▁▃▁▁▁▄▂▁
val_accuracy,▁█▇

metric,value
avg_train_loss,0.16084
avg_val_loss,0.45728
train_batch_loss,0.03638
val_accuracy,0.85069


[34m[1mwandb[0m: Agent Starting Run: 8p18eh0p with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▄▂▁
avg_val_loss,▁▂▆█
train_batch_loss,▆▅▆▆▃▆▄▅▃▇▂▄▄▂▃▆▇▄▄▃▁▂▁▂▂▅▁▁█▃▁▁▁▃▁▁▃▄▃▃
val_accuracy,▁▅█▇

metric,value
avg_train_loss,0.12253
avg_val_loss,0.61211
train_batch_loss,0.01292
val_accuracy,0.8588


[34m[1mwandb[0m: Agent Starting Run: puni3o0q with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  5e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.26
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.42
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

metric,history
avg_train_loss,█▄▁
avg_val_loss,▁▂█
train_batch_loss,▇▇▅▆▆▆▆█▄▅▆▅█▆▃▃▄▅▄▂▃▅▂▅▄▃▆▁▂▂▃▃▂▃▁▁▁▄▃▃
val_accuracy,▁▇█

metric,value
avg_train_loss,0.11736
avg_val_loss,0.54011
train_batch_loss,0.01742
val_accuracy,0.85841


[34m[1mwandb[0m: Agent Starting Run: ks69m8hk with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:27.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.31
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▆▆▇▆▅▇▄▅▆▆█▆▄▄▆▅▄▄▂▃▃▄▄▄▄▃▃▃▂▃▂▄▁▁▂▅▂▁
val_accuracy,▁█▇

0,1
avg_train_loss,0.20282
avg_val_loss,0.44122
train_batch_loss,0.10633
val_accuracy,0.83873


[34m[1mwandb[0m: Agent Starting Run: q1j9jt5s with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▆▆▇▆▅▆▅▆▅▇▇█▃▃▂▅▄█▃▂▅▄▃▂▃▇▃▂▆▄▇▆▅▁▄▅▃▂
val_accuracy,▁█

0,1
avg_train_loss,0.28718
avg_val_loss,0.42802
train_batch_loss,0.08328
val_accuracy,0.83796


[34m[1mwandb[0m: Agent Starting Run: 7t9scuv6 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:40.

  Average training loss: 0.48
  Training epoch took: 0:01:40

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.36
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▅▇▆▅▆▄▆▆▆▅▆▆█▃▃▃▆▄▇▃▃▅▄▃▁▂█▂▂▅▂▇▅▅▁▃▄▂▃
val_accuracy,▁█

0,1
avg_train_loss,0.24122
avg_val_loss,0.43609
train_batch_loss,0.07097
val_accuracy,0.85301


[34m[1mwandb[0m: Agent Starting Run: hhabxjpp with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▇▅▅▅▄▆▅▅▅▅▆▇▃▃▂▅▄▇▃▂▄▄▂▂▃█▂▃▅▄█▇▅▁▄▄▂▂
val_accuracy,▁█

0,1
avg_train_loss,0.28587
avg_val_loss,0.42674
train_batch_loss,0.10137
val_accuracy,0.84954


[34m[1mwandb[0m: Agent Starting Run: ek31pjyp with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▃▁
avg_val_loss,▁▄█
train_batch_loss,█▆▆▆▇▆▆█▅▄▃▆█▆▄▄▃▄▃▄▆▇▆▄▄▂▃▁▁▁▂▁▄▃▁▁▁▁▁▅
val_accuracy,▁▇█

0,1
avg_train_loss,0.17092
avg_val_loss,0.56861
train_batch_loss,0.01621
val_accuracy,0.85069


[34m[1mwandb[0m: Agent Starting Run: 502rvmt2 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.40
  Validation took: 0:00:03

Training complete!
Total training took 0:02:54 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆█▆▇▄▅▆▄▅▇▅▅█▂▅▂▅▄▇▅▃▄▃▅▄▄▃▁█▁▁▂▃▃▃▂▃▃▃
val_accuracy,▁█

0,1
avg_train_loss,0.31612
avg_val_loss,0.39635
train_batch_loss,0.28104
val_accuracy,0.83873


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: zv81wrsy with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:24.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▇▅▅▅▄▆▅▅▅▅▆▇▃▃▂▅▄▇▃▂▄▄▂▂▃█▂▃▅▄█▇▅▁▄▄▂▂
val_accuracy,▁█

0,1
avg_train_loss,0.28587
avg_val_loss,0.42674
train_batch_loss,0.10137
val_accuracy,0.84954


[34m[1mwandb[0m: Agent Starting Run: id3xoa5n with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.28
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,█▅▇▆█▅▄▇▄▅▆▅▅▇▃▅▃▅▅▆▃▂▄▂▅▃▅▂▁▇▁▂▂▄▂▃▂▃▃▄
val_accuracy,▁█

0,1
avg_train_loss,0.27828
avg_val_loss,0.38883
train_batch_loss,0.18283
val_accuracy,0.84452


[34m[1mwandb[0m: Agent Starting Run: as03ps4m with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.50
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,█▇▇▆▇▆▆█▅▆▅▆█▇▃▄▆▆▄▄▂▅▃▅▄▅▆▃▃▅▂▃▂▆▃▁▂▃▄▁
val_accuracy,▁▇█

0,1
avg_train_loss,0.21191
avg_val_loss,0.43598
train_batch_loss,0.07389
val_accuracy,0.84992


[34m[1mwandb[0m: Agent Starting Run: 5i99dv3m with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.53
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.80
  Validation Loss: 0.44
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▅▂▁
avg_val_loss,▁▁▆█
train_batch_loss,▇▆█▆▄▇▅▆▄▇▅▅█▄▅▆▆▇▄▂▁▁▁▁▃▃▂▂█▆▁▁▃▄▁▁▁▄▄▄
val_accuracy,▁▅▇█

0,1
avg_train_loss,0.12617
avg_val_loss,0.64476
train_batch_loss,0.09967
val_accuracy,0.84606


[34m[1mwandb[0m: Agent Starting Run: 7u5ypcap with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.41
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.27
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.42
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▅█
train_batch_loss,▇▇▆▇▅█▄▅▄█▄▄▄▆▂▂▃▅▃▄▂▅▃▃▂▂▁▂▃▂▁▁▁▂▁▁▁▄▁▃
val_accuracy,▁▆▇█

0,1
avg_train_loss,0.07979
avg_val_loss,0.61347
train_batch_loss,0.00402
val_accuracy,0.85185


[34m[1mwandb[0m: Agent Starting Run: ffgpy47u with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.50
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▅▆▇▅▅▇▄▅▆▆█▆▄▄▆▅▃▄▂▅▂▄▄▄▅▃▄▅▂▂▂▄▁▁▂▃▃▁
val_accuracy,▁██

0,1
avg_train_loss,0.21344
avg_val_loss,0.43438
train_batch_loss,0.08799
val_accuracy,0.85031


[34m[1mwandb[0m: Agent Starting Run: gh04i2zk with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  32
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.40
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆█▆▇▄▅▆▄▅▇▅▅█▂▅▂▅▄▇▅▃▄▃▅▄▄▃▁█▁▁▂▃▃▃▂▃▃▃
val_accuracy,▁█

0,1
avg_train_loss,0.31612
avg_val_loss,0.39635
train_batch_loss,0.28104
val_accuracy,0.83873


[34m[1mwandb[0m: Agent Starting Run: kiv6gtpe with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

batch_size =  16
Learning_rate =  5e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:24.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:09.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of


0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▆▇▅▆▆▄▆▅▅▆▄▆▆▃▃▂▇▅▇▃▆█▄▃▃▁▇▃▅▂▂▅▄▅▃▄▃▃▂
val_accuracy,█▁

0,1
avg_train_loss,0.25894
avg_val_loss,0.44929
train_batch_loss,0.09801
val_accuracy,0.83449
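
The batch counts printed above follow from the batch size: 241 batches per epoch at batch size 32 and 481 at batch size 16. A minimal sketch of that arithmetic, assuming a training split of 7,695 sentences (90% of CoLA's 8,551 in-domain training sentences; the split size is an assumption, not printed in this output) and a data loader that keeps the final partial batch. The helper name `batches_per_epoch` is hypothetical.

```python
import math

def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)

# Assumed training-set size: 90% of 8,551 CoLA training sentences.
NUM_TRAIN = int(0.9 * 8551)

for bs in (16, 32):
    print(f"batch_size={bs}: {batches_per_epoch(NUM_TRAIN, bs)} batches per epoch")
```

Halving the batch size doubles the number of optimizer steps per epoch, which is consistent with the batch-size-16 epochs taking about 1:39 against about 1:24 for batch size 32.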


[34m[1mwandb[0m: Agent Starting Run: rxqka8oz with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.26
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.42
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0


0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▅█
train_batch_loss,▇▇▆▇▆█▅▆▄▇▆▄▆▆▃▃▃▃▄▄▂▆▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▃▁▃
val_accuracy,▁▅▇█

0,1
avg_train_loss,0.07968
avg_val_loss,0.61346
train_batch_loss,0.00507
val_accuracy,0.85494
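
In the four-epoch run above, the average training loss keeps falling (to 0.07968) while the average validation loss climbs (to 0.61346), the usual sign of overfitting. One common response, which this output does not show the notebook doing, is to keep the checkpoint from the epoch with the lowest validation loss. A minimal sketch with a hypothetical helper `best_epoch`; the loss values are rounded from the log, except the epoch-3 value, which is truncated in the output and is a placeholder here.

```python
def best_epoch(val_losses):
    """Return the 1-based index of the epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Epochs 1, 2 and 4 are rounded from the log; epoch 3 (0.50) is a placeholder
# because that part of the output is truncated.
val_losses = [0.40, 0.42, 0.50, 0.61]

print("Epoch with lowest validation loss:", best_epoch(val_losses))
```

Validation accuracy in the same run still rises across epochs, so which metric to select on is a design choice rather than something the log settles.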


[34m[1mwandb[0m: Agent Starting Run: 15wxoiht with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:40

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▃▁
avg_val_loss,▁▄█
train_batch_loss,▇▅▇▇▇▆▆▇▄▄▃▆█▆▄▄▃▃▃▄▆▆▅▄▄▂▃▁▁▁▃▁▄▂▁▁▁▃▁▂
val_accuracy,▁▇█

0,1
avg_train_loss,0.17129
avg_val_loss,0.54039
train_batch_loss,0.01923
val_accuracy,0.85648


[34m[1mwandb[0m: Agent Starting Run: yc8k1e8i with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.40
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▄▆▆▆▅▆▃▆▃▅▄▄█▆▃▃▃▆▅▇▂▂▅▃▃▁▁█▃▃▅▂▄▂▄▃▂▃▃▁
val_accuracy,▁█

0,1
avg_train_loss,0.24895
avg_val_loss,0.46483
train_batch_loss,0.08793
val_accuracy,0.84838


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: c6yb91kb with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:27.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.50
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0


0,1
avg_train_loss,█▄▂▁
avg_val_loss,▂▁▄█
train_batch_loss,██▆▇▅█▅▇▄█▅▅▆▆▃▃▄▅▄▅▃▄▂▃▂▃▁▂▃▂▂▁▁▂▁▂▂▂▁▅
val_accuracy,▁█▆█

0,1
avg_train_loss,0.13615
avg_val_loss,0.4838
train_batch_loss,0.01934
val_accuracy,0.85571


[34m[1mwandb[0m: Agent Starting Run: j9nise8n with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of


0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▅▆▆▅▆▄▆▄▅▆▅▅▇▃▃▂▆▄█▂▂▅▄▂▁▂▅▂▃▅▃▅▆▄▁▃▄▂▃
val_accuracy,▁█

0,1
avg_train_loss,0.25152
avg_val_loss,0.43946
train_batch_loss,0.06903
val_accuracy,0.84838


[34m[1mwandb[0m: Agent Starting Run: pvidnej3 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.28
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,█▅▇▆█▅▄▇▄▅▆▅▅▇▃▅▃▅▅▆▃▂▄▂▅▃▅▂▁▇▁▂▂▄▂▃▂▃▃▄
val_accuracy,▁█

0,1
avg_train_loss,0.27828
avg_val_loss,0.38883
train_batch_loss,0.18283
val_accuracy,0.84452


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: nfwikjh6 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of


0,1
avg_train_loss,█▃▁
avg_val_loss,▁▁█
train_batch_loss,▆▄▅▇▅▄▄▇▄▄▃▆█▆▄▆▃▂▂▅▂▄▃▃▂▃▃▁▁▁▁▁▄▁▁▂▁▁▁▁
val_accuracy,▁▅█

0,1
avg_train_loss,0.14128
avg_val_loss,0.64564
train_batch_loss,0.24841
val_accuracy,0.83796


[34m[1mwandb[0m: Agent Starting Run: 7020rqqv with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.40
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆█▆▇▄▅▆▄▅▇▅▅█▂▅▂▅▄▇▅▃▄▃▅▄▄▃▁█▁▁▂▃▃▃▂▃▃▃
val_accuracy,▁█

0,1
avg_train_loss,0.31612
avg_val_loss,0.39635
train_batch_loss,0.28104
val_accuracy,0.83873


[34m[1mwandb[0m: Agent Starting Run: hsbzw65a with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epoch took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▃▆█
train_batch_loss,▆▅▆▇▃▇▄▅▃▆▃▄▄▂▂▃▆▅▃▁▁▂▁▁▃▂▁▁█▄▁▁▁▄▁▁▁▁▄▃
val_accuracy,▁▅▆█

0,1
avg_train_loss,0.09121
avg_val_loss,0.67923
train_batch_loss,0.00919
val_accuracy,0.86227


[34m[1mwandb[0m: Agent Starting Run: ivmo4j6h with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.28
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▆▅█▅▅▆▅▅▆▅▆▇▃▆▃▅▅▇▅▃▅▂▅▄▄▃▂▇▁▂▃▅▂▃▃▃▃▄
val_accuracy,▁█

0,1
avg_train_loss,0.28226
avg_val_loss,0.39011
train_batch_loss,0.21012
val_accuracy,0.84336


[34m[1mwandb[0m: Agent Starting Run: nnh276k3 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.50
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.33
  Training epoch took: 0:01:23

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:53 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,█▁
train_batch_loss,█▅▇▅▆▄▄▆▃▄▆▅▄▇▂▄▃▄▄▅▃▂▃▂▅▄▃▃▁▆▁▁▂▂▂▃▂▃▃▄
val_accuracy,▁█

0,1
avg_train_loss,0.33324
avg_val_loss,0.39238
train_batch_loss,0.35099
val_accuracy,0.82948


[34m[1mwandb[0m: Agent Starting Run: y8s6y8y6 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.40
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.26
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.42
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▁▅█
train_batch_loss,▇▇▆▇▆█▅▆▄▇▆▄▆▆▃▃▃▃▄▄▂▆▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▃▁▃
val_accuracy,▁▅▇█

0,1
avg_train_loss,0.07968
avg_val_loss,0.61346
train_batch_loss,0.00507
val_accuracy,0.85494


[34m[1mwandb[0m: Agent Starting Run: vemdy4bu with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.50
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.32
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▅▆▇▅▅▇▄▅▆▆█▆▄▄▆▅▃▄▂▅▂▄▄▄▅▃▄▅▂▂▂▄▁▁▂▃▃▁
val_accuracy,▁██

0,1
avg_train_loss,0.21344
avg_val_loss,0.43438
train_batch_loss,0.08799
val_accuracy,0.85031


[34m[1mwandb[0m: Agent Starting Run: qs3vdv3t with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:27.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.31
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▆▆▇▆▅▇▄▅▆▆█▆▄▄▆▅▄▄▂▃▃▄▄▄▄▃▃▃▂▃▂▄▁▁▂▅▂▁
val_accuracy,▁█▇

0,1
avg_train_loss,0.20282
avg_val_loss,0.44122
train_batch_loss,0.10633
val_accuracy,0.83873


[34m[1mwandb[0m: Agent Starting Run: y2yvxs8g with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▂▆█
train_batch_loss,▆▅▆▆▃▆▄▅▃▇▂▄▄▂▃▆▇▄▄▃▁▂▁▂▂▅▁▁█▃▁▁▁▃▁▁▃▄▃▃
val_accuracy,▁▅█▇

0,1
avg_train_loss,0.12253
avg_val_loss,0.61211
train_batch_loss,0.01292
val_accuracy,0.8588
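
The batch counts in these logs (241 batches per epoch at `batch_size` 32, 481 at `batch_size` 16) pin down the training-set size fairly tightly. The sketch below is only a sanity check, not part of the original notebook: it assumes the training `DataLoader` keeps the final partial batch (PyTorch's default, `drop_last=False`), and `batches_per_epoch` is a hypothetical helper name.

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when the last, smaller batch is kept (assumed drop_last=False)."""
    return math.ceil(num_examples / batch_size)


# Find every training-set size that reproduces both batch counts seen in the logs.
consistent_sizes = [
    n
    for n in range(1, 20001)
    if batches_per_epoch(n, 32) == 241 and batches_per_epoch(n, 16) == 481
]

print("smallest consistent size:", consistent_sizes[0])
print("largest consistent size:", consistent_sizes[-1])
```

Under that assumption the training set must hold between 7,681 and 7,696 sentences. A 90% split of CoLA's 8,551 training sentences would give 7,695, which falls inside this range, although the split ratio itself is an assumption here.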


[34m[1mwandb[0m: Agent Starting Run: lbccxvyo with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.28
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training complete!
Total training took 0:02:54 (h:m

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▆▆▅█▅▅▆▅▅▆▅▆▇▃▆▃▅▅▇▅▃▅▂▅▄▄▃▂▇▁▂▃▅▂▃▃▃▃▄
val_accuracy,▁█

0,1
avg_train_loss,0.28226
avg_val_loss,0.39011
train_batch_loss,0.21012
val_accuracy,0.84336


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: mcs3l0f2 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:41.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:06.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.37
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▃▁
avg_val_loss,▁▃█
train_batch_loss,█▇▇▆█▅▅█▄▅▃▇█▆▄▄▄▂▃▃▅▆▄▃▃▂▄▁▁▁▃▁▂▁▁▄▂▁▁▄
val_accuracy,▁▄█

0,1
avg_train_loss,0.12854
avg_val_loss,0.61668
train_batch_loss,0.01585
val_accuracy,0.85301


[34m[1mwandb[0m: Agent Starting Run: 4qkqab9u with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05


cuda



batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:41.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.31
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.84
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▆▆▇▆▅▇▄▅▆▆█▆▄▄▆▅▄▄▂▃▃▄▄▄▄▃▃▃▂▃▂▄▁▁▂▅▂▁
val_accuracy,▁█▇

0,1
avg_train_loss,0.20282
avg_val_loss,0.44122
train_batch_loss,0.10633
val_accuracy,0.83873


[34m[1mwandb[0m: Agent Starting Run: jx7mqe0e with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▃▁
avg_val_loss,▁▃█
train_batch_loss,▆▆▆▆▇▆▅▇▄▄▃▅█▆▅▄▄▃▃▃▆▅▅▄▅▃▂▂▁▁▂▁▃▁▁▁▁▁▁▄
val_accuracy,▁▆█

0,1
avg_train_loss,0.1768
avg_val_loss,0.54523
train_batch_loss,0.06812
val_accuracy,0.8588


[34m[1mwandb[0m: Agent Starting Run: cjrrftk9 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  32
Learning_rate =  3e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.48
  Training epcoh took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
  Batch   240  of    241.    Elapsed: 0:01:23.

  Average training loss: 0.27
  Training epcoh took: 0:01:23

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,▇▇▆▅▇▆▅█▄▅▆▆█▆▃▄▅▅▃▃▂▄▂▅▄▄▅▂▂▂▂▃▁▃▂▁▁▄▂▂
val_accuracy,▁▇█

0,1
avg_train_loss,0.14639
avg_val_loss,0.48281
train_batch_loss,0.09459
val_accuracy,0.85069


[34m[1mwandb[0m: Agent Starting Run: kt3p1vlq with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 3

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.50
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.41
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▄▁
avg_val_loss,▁▄█
train_batch_loss,▅▅▅▅▆▄▄▆▃▄▃▆█▄▃▄▃▃▂▄▄▄▃▃▃▂▄▁▁▁▁▁▄▁▁▁▁▁▁▁
val_accuracy,▁▅█

0,1
avg_train_loss,0.14244
avg_val_loss,0.59522
train_batch_loss,0.01906
val_accuracy,0.85185


[34m[1mwandb[0m: Agent Starting Run: ddodmwsn with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▂▆█
train_batch_loss,▆▅▆▆▃▆▄▅▃▇▂▄▄▂▃▆▇▄▄▃▁▂▁▂▂▅▁▁█▃▁▁▁▃▁▁▃▄▃▃
val_accuracy,▁▅█▇

0,1
avg_train_loss,0.12253
avg_val_loss,0.61211
train_batch_loss,0.01292
val_accuracy,0.8588


[34m[1mwandb[0m: Agent Starting Run: 1sq851d0 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  3e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.39
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▅▅▆▆▅▆▄▆▄▅▆▅▅▇▃▃▂▆▄█▂▂▅▄▂▁▂▅▂▃▅▃▅▆▄▁▃▄▂▃
val_accuracy,▁█

0,1
avg_train_loss,0.25152
avg_val_loss,0.43946
train_batch_loss,0.06903
val_accuracy,0.84838


[34m[1mwandb[0m: Agent Starting Run: ceg18548 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 2e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  2e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.48
  Training epcoh took: 0:01:39

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

0,1
avg_train_loss,█▄▂▁
avg_val_loss,▁▄▆█
train_batch_loss,▇▅▆▆▃▇▄▆▃█▃▅▃▃▃▆█▅▃▂▁▁▁▁▂▅▁▁█▄▁▁▁▃▂▁▂▃▁▄
val_accuracy,▁▆▆█

0,1
avg_train_loss,0.11994
avg_val_loss,0.62618
train_batch_loss,0.01291
val_accuracy,0.86806


[34m[1mwandb[0m: Agent Starting Run: 21lfkcv2 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 5e-05
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


cuda



batch_size =  16
Learning_rate =  5e-05
epochs => 2

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.49
  Training epoch took: 0:01:40

Running Validation...
  Accuracy: 0.83
  Validation Loss: 0.38
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▁
avg_val_loss,▁█
train_batch_loss,▆▇█▅▆▇▄█▄▆▆▆█▇▃▃▂▇▆█▃▂▃▄▂▂▁▇▁▄▄▁▆▄▆▂▃▁▁▂
val_accuracy,▁█

metric,final value
avg_train_loss,0.23675
avg_val_loss,0.44762
train_batch_loss,0.06008
val_accuracy,0.84259


[34m[1mwandb[0m: Agent Starting Run: 0l35ohpw with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 2e-05


cuda


batch_size =  32
Learning_rate =  2e-05
epochs => 3

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.50
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.32
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.85
  Validation Loss: 0.39
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0

metric,history
avg_train_loss,█▄▁
avg_val_loss,▁▁█
train_batch_loss,█▇▇▆▇▆▆█▅▆▅▆█▇▃▄▆▆▄▄▂▅▃▅▄▅▆▃▃▅▂▃▂▆▃▁▂▃▄▁
val_accuracy,▁▇█

metric,final value
avg_train_loss,0.21191
avg_val_loss,0.43598
train_batch_loss,0.07389
val_accuracy,0.84992


[34m[1mwandb[0m: Agent Starting Run: 7w7uj6u9 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 5e-05


cuda


batch_size =  16
Learning_rate =  5e-05
epochs => 4

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:16.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of    481.    Elapsed: 0:00:42.
  Batch   240  of    481.    Elapsed: 0:00:50.
  Batch   280  of    481.    Elapsed: 0:00:58.
  Batch   320  of    481.    Elapsed: 0:01:07.
  Batch   360  of    481.    Elapsed: 0:01:15.
  Batch   400  of    481.    Elapsed: 0:01:23.
  Batch   440  of    481.    Elapsed: 0:01:31.
  Batch   480  of    481.    Elapsed: 0:01:39.

  Average training loss: 0.53
  Training epoch took: 0:01:40

Running Validation...
  Accuracy: 0.80
  Validation Loss: 0.44
  Validation took: 0:00:04

Training...
  Batch    40  of    481.    Elapsed: 0:00:08.
  Batch    80  of    481.    Elapsed: 0:00:17.
  Batch   120  of    481.    Elapsed: 0:00:25.
  Batch   160  of    481.    Elapsed: 0:00:33.
  Batch   200  of

metric,history
avg_train_loss,█▅▂▁
avg_val_loss,▁▁▆█
train_batch_loss,▇▆█▆▄▇▅▆▄▇▅▅█▄▅▆▆▇▄▂▁▁▁▁▃▃▂▂█▆▁▁▃▄▁▁▁▄▄▄
val_accuracy,▁▅▇█

metric,final value
avg_train_loss,0.12617
avg_val_loss,0.64476
train_batch_loss,0.09967
val_accuracy,0.84606


[34m[1mwandb[0m: Agent Starting Run: kc0faras with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 4
[34m[1mwandb[0m: 	learning_rate: 3e-05


cuda


batch_size =  32
Learning_rate =  3e-05
epochs => 4

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:56.
  Batch   200  of    241.    Elapsed: 0:01:10.
  Batch   240  of    241.    Elapsed: 0:01:24.

  Average training loss: 0.49
  Training epoch took: 0:01:24

Running Validation...
  Accuracy: 0.82
  Validation Loss: 0.38
  Validation took: 0:00:03

Training...
  Batch    40  of    241.    Elapsed: 0:00:14.
  Batch    80  of    241.    Elapsed: 0:00:28.
  Batch   120  of    241.    Elapsed: 0:00:42.
  Batch   160  of    241.    Elapsed: 0:00:55.
  Batch   200  of    241.    Elapsed: 0:01:09.
