<a id="1"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Introduction</div>

**Table of Contents:**
* [Introduction](#1)
* [Refactor and Define utils](#2)
* [Refactor Train](#3)
* [Sweeps](#4)


<a id="2"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Refactor and Define utils</div>
* [return top](#1)

In [1]:
%%writefile utils.py
import io
import json
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # used by conf_mat
import torch
import torch.nn.functional as F
import wandb # used by conf_mat
from datasets import Dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
from sklearn.metrics import ConfusionMatrixDisplay # used by conf_mat
from sklearn.model_selection import StratifiedKFold

def read_data():
    """
    Load the train and test tables from the downloaded W&B artifact.

    return train , test
    """
    # The `with` block closes each file on exit, so no explicit close() is needed.
    with open("/kaggle/working/artifacts/detect_llm_raw_data:v1/train_df.table.json") as json_data:
        data = json.load(json_data)
    train = pd.DataFrame(data=data["data"], columns=data["columns"])

    with open("/kaggle/working/artifacts/detect_llm_raw_data:v1/test_df.table.json") as json_data:
        data = json.load(json_data)
    test = pd.DataFrame(data=data["data"], columns=data["columns"])
    return train , test

def preprocess(train=None,test=None):
    """
    return dataset_train, dataset_test
    """
    train.fillna(" ",inplace=True)
    test.fillna(" ",inplace=True)
    train["text"] = train["Question"] + " " + train["Response"]
    test["text"] = test["Question"] + " " + test["Response"]
    df_train = train[["target","text"]]
    df_test = test[["text"]]
    dataset_train = Dataset.from_pandas(df_train)
    dataset_test = Dataset.from_pandas(df_test)
    
    return dataset_train, dataset_test


def dataset_tokenize_n_split(train, dataset_train, dataset_test,model_name):
    """
    return split_train_dataset,split_eval_dataset , tokenized_test , tokenizer
    """
    tokenizer       = AutoTokenizer.from_pretrained(model_name )
    def tokenize_function(examples):
    
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_train = dataset_train.map(tokenize_function, batched=True)
    tokenized_test  = dataset_test.map(tokenize_function, batched=True)
    tokenized_train = tokenized_train.remove_columns(['text'])
    tokenized_train = tokenized_train.rename_column("target", "labels")
    tokenized_test = tokenized_test.remove_columns(['text'])

    # Only the first of the 10 stratified folds is used as the hold-out split.
    kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    tr_idx, val_idx = next(kf.split(train, train.target))
    print("Fold : 0")
    print(f"shape train : {tr_idx.shape}")
    print(f"shape val : {val_idx.shape}")

    
    split_train_dataset = tokenized_train.select(tr_idx)
    split_eval_dataset = tokenized_train.select(val_idx)

    return split_train_dataset,split_eval_dataset , tokenized_test , tokenizer

def predict_fn(dataset_=None, model=None, batch_size=8):
    """
    Run batched inference with `model` and return the softmax
    probabilities as an array of shape (m, num_labels).
    """
    # `model` must be passed in: utils.py has no global model to fall back on.
    device = next(model.parameters()).device
    model.eval()

    # Move the input tensors to the same device as the model
    input_ids = torch.tensor(dataset_['input_ids']).to(device)
    attention_mask = torch.tensor(dataset_['attention_mask']).to(device)

    num_samples = len(input_ids)

    # Collect one row of class probabilities per sample
    all_probabilities = []

    # Make predictions in batches, without tracking gradients
    with torch.no_grad():
        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)

            outputs = model(input_ids=input_ids[start_idx:end_idx],
                            attention_mask=attention_mask[start_idx:end_idx])

            # Apply softmax over the class dimension to get probabilities
            probabilities = F.softmax(outputs.logits, dim=1)
            all_probabilities.extend(probabilities.tolist())
    return np.array(all_probabilities)


def conf_mat(df_val=None, preds_val=None):
    """
    Plot the validation confusion matrix, save it as a PNG and log it to W&B.

    no return
    """
    # Pass the axes explicitly so the 8x8 figure is the one that gets drawn on.
    fig, ax = plt.subplots(figsize=(8, 8))
    ConfusionMatrixDisplay.from_predictions(df_val.target, np.argmax(preds_val, axis=1), ax=ax)
    plt.savefig("val_conf_matrix.png", format="png")
    plt.show()
    conf = wandb.Image(data_or_path="val_conf_matrix.png")
    wandb.log({"val_conf_matrix": conf})


def create_model(model_name="distilroberta-base", num_labels=7):
    """
    return model
    """
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)

    return model
    


Writing utils.py
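
`predict_fn` walks over the tokenized inputs in fixed-size batches, with a shorter final batch when the sample count is not a multiple of the batch size. The index arithmetic can be checked in isolation with a minimal pure-Python sketch. The helper names `batch_slices` and `num_batches` are hypothetical and are not part of `utils.py`.

```python
def batch_slices(num_samples, batch_size):
    """Yield (start, end) index pairs that cover range(num_samples) in order."""
    for start in range(0, num_samples, batch_size):
        # The last slice is clipped so it never runs past the data.
        yield start, min(start + batch_size, num_samples)


def num_batches(num_samples, batch_size):
    """Ceiling division: number of batches needed to cover every sample."""
    return (num_samples + batch_size - 1) // batch_size


slices = list(batch_slices(20, 8))
print(slices)              # [(0, 8), (8, 16), (16, 20)]
print(num_batches(20, 8))  # 3
```

Every sample lands in exactly one slice, so concatenating the per-batch outputs in order gives one row per input sample.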


<a id="3"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Refactor Train</div>
* [return top](#1)

In [2]:
# %%writefile train.py
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import torch
from datasets import Dataset
import argparse
import json
from IPython.display import display
import wandb
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
from transformers import AutoModelForSequenceClassification,TrainerCallback
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
import torch.nn.functional as F
from utils import *
import io

class WandbMetricsLogger(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # Log metrics to Wandb
        wandb.log(metrics)
        
default_config = {
    'method': 'random',
    'metric': {
        'goal': 'minimize',
        'name': 'eval_loss'
    },
}


# hyperparameters
parameters_dict = {
        'epochs': {
            'value': 2
            },
        'seed': {
            'value': 42
            },
        'batch_size': {
            'values': [4, 8, 16]
            },
        'learning_rate': {
            'distribution': 'log_uniform_values',
            'min': 1e-4,
            'max': 2e-3
        },
        'weight_decay': {
            'values': [0.0, 0.2]
        },
        'learning_sch': {
            'values': ['linear','polynomial','cosine']
        },
        'architecture': {
            'values': ["distilroberta-base","bert-base-uncased","distilbert-base-uncased"]
        },
    }


default_config['parameters'] = parameters_dict

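def compute_metrics_fn(eval_preds):
    metrics = dict()

    # EvalPrediction carries logits and labels but no loss attribute,
    # so the validation loss is recomputed from them here.
    logits = torch.tensor(eval_preds.predictions)
    labels = torch.tensor(eval_preds.label_ids)
    metrics['validation_loss'] = F.cross_entropy(logits, labels).item()

    return metrics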

def parse_args():
    "Override default hyper-parameters from the command line"
    params = default_config["parameters"]
    argparser = argparse.ArgumentParser(description='Process hyper-parameters')
    argparser.add_argument('--batch_size', type=int, default=params["batch_size"]["values"][-1],
                           help='batch size')
    argparser.add_argument('--epochs', type=int, default=params["epochs"]["value"],
                           help='number of training epochs')
    argparser.add_argument('--lr', type=float, default=params["learning_rate"]["min"],
                           help='learning rate')
    argparser.add_argument('--seed', type=int, default=params["seed"]["value"],
                           help='random seed')
    argparser.add_argument('--weight_decay', type=float, default=params["weight_decay"]["values"][-1],
                           help='weight decay')

    args = argparser.parse_args()
    # default_config is a plain dict, so it is updated directly
    # (vars() only works on objects that have a __dict__).
    default_config.update(vars(args))
    return



def train(config=None):
    
    torch.manual_seed(default_config.get("parameters").get("seed").get("value"))
    
    run = wandb.init(
                project="h2o-ai-predict-the-llm-kaggle-competition", 
                entity=None, 
                   job_type="hyperparameter-tuning"
    )
    # Download the raw data artifact only if it is not already on disk
    if "artifacts" not in os.listdir():
        raw_data_at = run.use_artifact('mustafakeser/h2o-ai-predict-the-llm-kaggle-competition/detect_llm_raw_data:v1', 
                                                       type='raw_data')
        artifact_di = raw_data_at.download()
    train , test = read_data()
    dataset_train, dataset_test = preprocess(train=train,test=test)
    config = wandb.config
    split_train_dataset,split_eval_dataset , tokenized_test , tokenizer = dataset_tokenize_n_split(train,dataset_train, dataset_test,config.architecture)

    
    
    
    model = create_model(model_name=config.architecture, num_labels=7)
    training_args = TrainingArguments(                                

                                output_dir='h2o-ai-sweeps',
                                report_to='wandb',  # Turn on Weights & Biases logging
                                num_train_epochs=config.epochs,
                                learning_rate=config.learning_rate,
                                lr_scheduler_type = config.learning_sch,
                                per_device_train_batch_size=config.batch_size,
                                per_device_eval_batch_size=16,
                                save_strategy='epoch',
                                evaluation_strategy='epoch',
                                logging_strategy='epoch',
                                metric_for_best_model="eval_loss", 
                                load_best_model_at_end=True,
                                remove_unused_columns=False,
                                greater_is_better=False,
                                weight_decay = config.weight_decay
                                

                                 )
    early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
    trainer = Trainer(
                        model=model,
                        args=training_args,
                        train_dataset=split_train_dataset,
                        eval_dataset=split_eval_dataset,
                        callbacks = [early_stopping],
                        tokenizer=tokenizer,
        )
    trainer.train()

    
# if __name__=="__main__":
# #     wandb.agent(sweep_id, train, count=20)
#     parse_args()
#     train(default_config)



<a id="4"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Sweeps</div>
* [return top](#1)

In [3]:
wandb.login(relogin=True)

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [4]:
sweep_id = wandb.sweep(default_config, project='h2o-ai-predict-the-llm-kaggle-competition')

Create sweep with ID: q9q9767w
Sweep URL: https://wandb.ai/mustafakeser/h2o-ai-predict-the-llm-kaggle-competition/sweeps/q9q9767w
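
The sweep draws `learning_rate` from the `log_uniform_values` distribution between `1e-4` and `2e-3`, which means the logarithm of the value is uniformly distributed between the logarithms of the bounds. The following is a minimal sketch of that sampling rule for intuition only. It is not W&B's implementation, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw x between low and high such that log(x) is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 2e-3, rng) for _ in range(1000)]

# Under a log-uniform law, about half of the draws fall below the
# geometric mean of the bounds (roughly 4.47e-4 here).
geometric_mean = math.sqrt(1e-4 * 2e-3)
below = sum(s < geometric_mean for s in samples)
print(f"min={min(samples):.2e} max={max(samples):.2e} below_geometric_mean={below}")
```

Compared with a plain uniform draw, this puts as many trials between `1e-4` and about `4.5e-4` as between `4.5e-4` and `2e-3`, so the small learning rates are not under-sampled.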


In [5]:
wandb.agent(sweep_id, train, count=20)

[34m[1mwandb[0m: Agent Starting Run: vdaoah93 with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0004682687272526259
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: Currently logged in as: [33mmustafakeser[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m:   4 of 4 files downloaded.  


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9966,1.960535
2,1.9569,1.946229




Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94623
eval/runtime,7.4461
eval/samples_per_second,53.451
eval/steps_per_second,3.357
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.9569
train/total_flos,1882907237683200.0
train/train_loss,1.97678
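
The `train/global_step` value of 1790 for this run follows from the split size and the sweep parameters: 3578 training samples at batch size 4 give 895 optimizer steps per epoch, over 2 epochs. Below is a short sketch of that arithmetic, assuming a single device, no gradient accumulation, and a final partial batch that is kept. `expected_global_steps` is a hypothetical helper, not a Trainer API.

```python
import math


def expected_global_steps(num_train_samples, batch_size, epochs):
    """Optimizer steps when each epoch ends with one shorter final batch."""
    return math.ceil(num_train_samples / batch_size) * epochs


# 3578 is the training split size printed by the fold output above.
for batch_size in (4, 8, 16):
    print(batch_size, expected_global_steps(3578, batch_size, epochs=2))
# 4 1790
# 8 896
# 16 448
```

The same rule accounts for the 896 and 448 steps reported by the runs with batch sizes 8 and 16.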


[34m[1mwandb[0m: Agent Starting Run: 7ssb6ua3 with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00012148211314055668
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.7876,1.671198
2,1.5707,1.599512





Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.59951
eval/runtime,7.4491
eval/samples_per_second,53.429
eval/steps_per_second,3.356
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.5707
train/total_flos,1882907237683200.0
train/train_loss,1.67915


[34m[1mwandb[0m: Agent Starting Run: ptvhysk3 with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0015244923145798347
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9748,1.947357
2,1.946,1.94591





Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94591
eval/runtime,3.8752
eval/samples_per_second,102.706
eval/steps_per_second,6.451
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.946
train/total_flos,948021230309376.0
train/train_loss,1.9604
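
The final `eval/loss` of 1.94591 in run `ptvhysk3` matches the cross-entropy of a classifier that assigns equal probability to each of the 7 labels, which is ln 7. This suggests, without proving it, that the run did not learn beyond a chance-level baseline. A small sketch of the reference value, where `cross_entropy` is a hypothetical helper written for this illustration:

```python
import math


def cross_entropy(probabilities, label):
    """Negative log-likelihood of the true label under the predicted distribution."""
    return -math.log(probabilities[label])


num_labels = 7
uniform = [1.0 / num_labels] * num_labels

# The loss is the same whichever label is correct, because every class gets 1/7.
uniform_loss = cross_entropy(uniform, label=0)
print(round(uniform_loss, 5))  # 1.94591
```

Validation losses that sit at about 1.946 can be read against this baseline, while values well below it indicate that the model has picked up some signal.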


[34m[1mwandb[0m: Agent Starting Run: 8y04mgww with config:
[34m[1mwandb[0m: 	architecture: distilroberta-base
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0001684716496566905
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


Downloading (…)lve/main/config.json:   0%|          | 0.00/480 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)



Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9754,1.954528
2,1.9531,1.946631




Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94663
eval/runtime,3.8906
eval/samples_per_second,102.297
eval/steps_per_second,6.426
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.9531
train/total_flos,948021230309376.0
train/train_loss,1.96425


[34m[1mwandb[0m: Agent Starting Run: 20bol7cf with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00015942974320807068
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.865,2.049907
2,1.7937,1.81356




Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.81356
eval/runtime,3.8669
eval/samples_per_second,102.925
eval/steps_per_second,6.465
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.7937
train/total_flos,948021230309376.0
train/train_loss,1.82936


[34m[1mwandb[0m: Agent Starting Run: be9e2x1i with config:
[34m[1mwandb[0m: 	architecture: distilroberta-base
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00014046399469880863
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.8474,1.76505
2,1.6519,1.639143




Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.63914
eval/runtime,3.8658
eval/samples_per_second,102.955
eval/steps_per_second,6.467
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.6519
train/total_flos,948021230309376.0
train/train_loss,1.74961


[34m[1mwandb[0m: Agent Starting Run: rzsnukwr with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0013474318112704808
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9712,1.94603
2,1.9463,1.94604




Run history:
eval/loss,▁█
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94604
eval/runtime,3.8718
eval/samples_per_second,102.793
eval/steps_per_second,6.457
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.9463
train/total_flos,948021230309376.0
train/train_loss,1.95871


[34m[1mwandb[0m: Agent Starting Run: r3zdsacy with config:
[34m[1mwandb[0m: 	architecture: distilroberta-base
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00033538507484628526
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0014,1.953425
2,1.954,1.946548




Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94655
eval/runtime,3.866
eval/samples_per_second,102.949
eval/steps_per_second,6.467
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.954
train/total_flos,948021230309376.0
train/train_loss,1.9777


[34m[1mwandb[0m: Agent Starting Run: b1wekpie with config:
[34m[1mwandb[0m: 	architecture: distilroberta-base
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0013634126139094823
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0526,1.949695
2,1.9559,1.9462




Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.9462
eval/runtime,3.8673
eval/samples_per_second,102.915
eval/steps_per_second,6.465
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.9559
train/total_flos,948021230309376.0
train/train_loss,2.00425


[34m[1mwandb[0m: Agent Starting Run: sd6nabyf with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0007781436315843396
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9925,1.966883
2,1.952,1.94777





Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94777
eval/runtime,7.4461
eval/samples_per_second,53.451
eval/steps_per_second,3.357
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.952
train/total_flos,1882907237683200.0
train/train_loss,1.97222


[34m[1mwandb[0m: Agent Starting Run: mmfulxav with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00020249578309784896
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9625,1.950685
2,1.9171,1.854917





Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.85492
eval/runtime,3.8822
eval/samples_per_second,102.52
eval/steps_per_second,6.44
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.9171
train/total_flos,948021230309376.0
train/train_loss,1.93981


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: zddff35t with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00025427081951376226
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9668,1.947442
2,1.9436,1.937133





Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.93713
eval/runtime,7.464
eval/samples_per_second,53.323
eval/steps_per_second,3.349
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.9436
train/total_flos,1882907237683200.0
train/train_loss,1.95519


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: dn04bojm with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0012658039459375705
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9718,1.946639
2,1.9485,1.946151




Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94615
eval/runtime,3.8812
eval/samples_per_second,102.545
eval/steps_per_second,6.441
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.9485
train/total_flos,948021230309376.0
train/train_loss,1.96013
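
The validation loss in the run above barely changes between epochs (1.946639 to 1.946151). One way to read such a plateau is to compare it with the cross-entropy of a uniform guess over K classes, which is ln(K). The sketch below assumes K = 7; the number of classes is not stated in this section, so treat that value as an assumption, and `chance_level_loss` as a hypothetical helper.

```python
import math


def chance_level_loss(num_classes):
    """Cross-entropy of predicting every class with equal probability."""
    return math.log(num_classes)


NUM_CLASSES = 7  # assumption: not confirmed by the log output
baseline = chance_level_loss(NUM_CLASSES)
final_eval_loss = 1.94615  # eval/loss of run dn04bojm, copied from its summary

print(f"chance-level loss: {baseline:.5f}")
print(f"gap to baseline:   {final_eval_loss - baseline:.5f}")
```

If the assumption holds, a gap close to zero suggests the model is still predicting near-uniform probabilities, which may happen when the learning rate is too high for fine-tuning.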


[34m[1mwandb[0m: Agent Starting Run: tz4efiup with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0004165608711615979
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0367,1.96129
2,1.971,1.947189





Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94719
eval/runtime,7.4394
eval/samples_per_second,53.499
eval/steps_per_second,3.36
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.971
train/total_flos,1882907237683200.0
train/train_loss,2.00383


[34m[1mwandb[0m: Agent Starting Run: zgqh2338 with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 4
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0001034355665955621
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9736,1.95951
2,1.9508,1.946805




Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94681
eval/runtime,7.4509
eval/samples_per_second,53.416
eval/steps_per_second,3.355
train/epoch,2.0
train/global_step,1790.0
train/learning_rate,0.0
train/loss,1.9508
train/total_flos,1882907237683200.0
train/train_loss,1.96222


[34m[1mwandb[0m: Agent Starting Run: 47kg0thg with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0003932086111349965
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.012,1.961284
2,1.9755,1.946179





Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94618
eval/runtime,7.4378
eval/samples_per_second,53.511
eval/steps_per_second,3.361
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.9755
train/total_flos,1882907237683200.0
train/train_loss,1.99374


[34m[1mwandb[0m: Agent Starting Run: ipkwym4v with config:
[34m[1mwandb[0m: 	architecture: distilroberta-base
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0010918810064765956
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0137,1.948326
2,1.9534,1.946038




Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94604
eval/runtime,3.856
eval/samples_per_second,103.215
eval/steps_per_second,6.483
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.9534
train/total_flos,948021230309376.0
train/train_loss,1.98356


[34m[1mwandb[0m: Agent Starting Run: hiiz0zca with config:
[34m[1mwandb[0m: 	architecture: distilbert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0003781242878558797
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.9682,1.952204
2,1.8792,1.827658




Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.82766
eval/runtime,3.9148
eval/samples_per_second,101.667
eval/steps_per_second,6.386
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.8792
train/total_flos,948021230309376.0
train/train_loss,1.92372


[34m[1mwandb[0m: Agent Starting Run: vonyio99 with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0005873766334117263
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0244,1.980799
2,1.9713,1.947941




Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94794
eval/runtime,7.4255
eval/samples_per_second,53.599
eval/steps_per_second,3.367
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.9713
train/total_flos,1882907237683200.0
train/train_loss,1.99784


[34m[1mwandb[0m: Agent Starting Run: z01uv9h4 with config:
[34m[1mwandb[0m: 	architecture: bert-base-uncased
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0006328288020660093
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.0335,1.964422
2,1.9812,1.946555




Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Final value
eval/loss,1.94656
eval/runtime,7.4523
eval/samples_per_second,53.406
eval/steps_per_second,3.355
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.9812
train/total_flos,1882907237683200.0
train/train_loss,2.00731
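
With ten runs logged in this part of the sweep, it can help to line up the final `eval/loss` values side by side. The snippet below is a minimal sketch that hard-codes the values copied from the run summaries above (only the runs visible in this part of the output, not the whole sweep) and sorts them. The tuple layout and the `rank_runs` helper are illustrative, not part of the notebook or the wandb API.

```python
# (run id, architecture, batch_size, final eval/loss), copied from the logs.
RUNS = [
    ("mmfulxav", "distilbert-base-uncased", 8, 1.85492),
    ("zddff35t", "bert-base-uncased", 16, 1.93713),
    ("dn04bojm", "distilbert-base-uncased", 16, 1.94615),
    ("tz4efiup", "bert-base-uncased", 4, 1.94719),
    ("zgqh2338", "bert-base-uncased", 4, 1.94681),
    ("47kg0thg", "bert-base-uncased", 8, 1.94618),
    ("ipkwym4v", "distilroberta-base", 8, 1.94604),
    ("hiiz0zca", "distilbert-base-uncased", 16, 1.82766),
    ("vonyio99", "bert-base-uncased", 16, 1.94794),
    ("z01uv9h4", "bert-base-uncased", 8, 1.94656),
]


def rank_runs(runs):
    """Return the runs sorted by final eval loss, lowest (best) first."""
    return sorted(runs, key=lambda run: run[3])


ranked = rank_runs(RUNS)
for run_id, architecture, batch_size, eval_loss in ranked:
    print(f"{run_id}  {architecture:<24} bs={batch_size:<2}  eval/loss={eval_loss:.5f}")
```

Among these runs, `hiiz0zca` and `mmfulxav` (both distilbert-base-uncased with the linear schedule) reach the lowest validation loss.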


In [6]:
wandb.finish()

In [None]:
# Runtime: about 1.5 h on a P100 GPU