<a id="1"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Introduction</div>

**Table of Contents:**
* [Introduction](#1)
* [Refactor and Define utils](#2)
* [Refactor Train](#3)
* [Sweeps](#4)


<a id="2"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Refactor and Define utils</div>
* [return top](#1)

In [3]:
%%writefile utils.py
import io
import json
import os

import matplotlib.pyplot as plt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import torch
import torch.nn.functional as F
import wandb
from datasets import Dataset
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import StratifiedKFold
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

def read_data():
    """
    Load the train and test tables from the downloaded W&B artifact.

    return train , test
    """
    # The `with` block closes each file automatically
    with open("/kaggle/working/artifacts/detect_llm_raw_data:v1/train_df.table.json") as json_data:
        data = json.load(json_data)
        train = pd.DataFrame(data = data["data"],columns=data["columns"])

    with open("/kaggle/working/artifacts/detect_llm_raw_data:v1/test_df.table.json") as json_data:
        data = json.load(json_data)
        test = pd.DataFrame(data = data["data"],columns=data["columns"])
    return train , test

def preprocess(train=None,test=None):
    """
    return dataset_train, dataset_test
    """
    train.fillna(" ",inplace=True)
    test.fillna(" ",inplace=True)
    train["text"] = train["Question"] + " " + train["Response"]
    test["text"] = test["Question"] + " " + test["Response"]
    df_train = train[["target","text"]]
    df_test = test[["text"]]
    dataset_train = Dataset.from_pandas(df_train)
    dataset_test = Dataset.from_pandas(df_test)
    
    return dataset_train, dataset_test


def dataset_tokenize_n_split(train, dataset_train, dataset_test):
    """
    return split_train_dataset,split_eval_dataset , tokenized_test , tokenizer
    """
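    # NOTE: the tokenizer checkpoint ("distilbert-base-uncased") is not the one that
    # matches the model built in create_model ("distilroberta-base")
    tokenizer       = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    def tokenize_function(examples):
        # Pad/truncate every example to the tokenizer's maximum length
        return tokenizer(examples["text"], padding="max_length", truncation=True)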

    tokenized_train = dataset_train.map(tokenize_function, batched=True)
    tokenized_test  = dataset_test.map(tokenize_function, batched=True)
    tokenized_train = tokenized_train.remove_columns(['text'])
    tokenized_train = tokenized_train.rename_column("target", "labels")
    tokenized_test = tokenized_test.remove_columns(['text'])

    # Hold out the first of 10 stratified folds as the validation set
    kf= StratifiedKFold(n_splits=10,shuffle=True,random_state=42)
    for i , (tr_idx,val_idx) in enumerate(kf.split(train,train.target)):
        print(f"Fold : {i}")
        print(f"shape train : {tr_idx.shape}")
        print(f"shape val : {val_idx.shape}")
        break

    split_train_dataset = tokenized_train.select(tr_idx)
    split_eval_dataset = tokenized_train.select(val_idx)

    return split_train_dataset,split_eval_dataset , tokenized_test , tokenizer

def predict_fn(dataset_ = None, model = None, batch_size = 8):
    """
    return softmax probabilities for every row of dataset_, shape (m, num_labels)
    """
    # The model is passed in explicitly so this module does not depend on a notebook global
    device = next(model.parameters()).device

    # Move the input tensors to the same device as the model
    input_ids = torch.tensor(dataset_['input_ids']).to(device)
    attention_mask = torch.tensor(dataset_['attention_mask']).to(device)

    num_samples = len(input_ids)

    # Initialize a list to store the softmax probabilities
    all_probabilities = []

    model.eval()
    # Make predictions in batches
    with torch.no_grad():
        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)

            outputs = model(input_ids=input_ids[start_idx:end_idx],
                            attention_mask=attention_mask[start_idx:end_idx])

            # Apply softmax to get probabilities
            probabilities = F.softmax(outputs.logits, dim=1)

            all_probabilities.extend(probabilities.tolist())
    return np.array(all_probabilities)


def conf_mat(df_val = None,preds_val = None):
    """
    Plot the validation confusion matrix, save it as a PNG and log it to W&B.

    no return
    """
    fig, ax = plt.subplots(figsize=(8,8))
    ConfusionMatrixDisplay.from_predictions(df_val.target,np.argmax(preds_val,axis=1),ax=ax)
    plt.savefig("val_conf_matrix.png", format="png")
    plt.show()
    conf = wandb.Image(data_or_path="val_conf_matrix.png")
    wandb.log({"val_conf_matrix": conf})


def create_model(model_name = "distilroberta-base",num_labels = 7):
    """
    return model (moved to the GPU when one is available)
    """
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    # Use the GPU when available, otherwise fall back to the CPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)

    return model


Writing utils.py
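
The batching arithmetic and the softmax step used in `predict_fn` can be tried without a model or a GPU. The following is a minimal pure-Python sketch and is not part of `utils.py`; `softmax`, `batched_softmax` and the toy logits are hypothetical names and values introduced only for illustration.

```python
import math

def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def batched_softmax(all_logits, batch_size=8):
    """Process rows in slices of batch_size, the same batching idea as predict_fn."""
    num_samples = len(all_logits)
    # Ceiling division: the last batch may be smaller than batch_size
    num_batches = (num_samples + batch_size - 1) // batch_size
    probabilities = []
    for batch in range(num_batches):
        start_idx = batch * batch_size
        end_idx = min((batch + 1) * batch_size, num_samples)
        probabilities.extend(softmax(row) for row in all_logits[start_idx:end_idx])
    return probabilities

# Toy input: 10 rows of 7 logits (arbitrary values, for illustration only)
toy_logits = [[float((i + j) % 7) for j in range(7)] for i in range(10)]
probs = batched_softmax(toy_logits, batch_size=8)
print(len(probs), len(probs[0]))
```

With 10 rows and a batch size of 8, the loop runs twice: one full batch and one partial batch of 2 rows.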


<a id="3"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Refactor Train</div>
* [return top](#1)

In [4]:
# %%writefile train.py
import argparse
import io
import json
import os

import matplotlib.pyplot as plt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import torch
import torch.nn.functional as F
import wandb
from datasets import Dataset
from IPython.display import display
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from transformers import AutoModelForSequenceClassification,TrainerCallback
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
from utils import *

class WandbMetricsLogger(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # Log evaluation metrics to W&B; **kwargs absorbs the other objects the Trainer passes to callbacks
        wandb.log(metrics)
        
# Sweep definition: random search that minimizes the validation loss
default_config = {
    'method': 'random',
    'metric': {
        'goal': 'minimize',
        'name': 'eval_loss'
    },
}


# hyperparameters
parameters_dict = {
    'epochs': {
        'value': 2
    },
    'seed': {
        'value': 42
    },
    'batch_size': {
        'values': [8, 16, 32]
    },
    'learning_rate': {
        'distribution': 'log_uniform_values',
        'min': 1e-5,
        'max': 1e-3
    },
    'weight_decay': {
        'values': [0.0, 0.2]
    },
    'learning_sch': {
        'values': ['linear','polynomial','cosine']
    },
}


default_config['parameters'] = parameters_dict

def compute_metrics_fn(eval_preds):
    metrics = dict()

    # Extract the validation loss from eval_preds
    validation_loss = eval_preds.loss
    metrics['validation_loss'] = validation_loss

    return metrics

def parse_args():
    "Overriding default arguments"
    argparser = argparse.ArgumentParser(description='Process hyper-parameters')
    argparser.add_argument('--batch_size', type=int, default=default_config.get("parameters").get("batch_size").get("values")[-1],
                           help='batch size')
    argparser.add_argument('--epochs', type=int, default=default_config.get("parameters").get("epochs").get("value"),
                           help='number of training epochs')
    argparser.add_argument('--lr', type=float, default=default_config.get("parameters").get("learning_rate").get("min"),
                           help='learning rate')
    argparser.add_argument('--seed', type=int, default=default_config.get("parameters").get("seed").get("value"),
                           help='random seed')
    argparser.add_argument('--weight_decay', type=float, default=default_config.get("parameters").get("weight_decay").get("values")[-1],
                           help='weight decay')

    args = argparser.parse_args()
    # default_config is a plain dict, so update it directly with the parsed values
    default_config.update(vars(args))
    return



def train(config=None):

    torch.manual_seed(default_config.get("parameters").get("seed").get("value"))

    run = wandb.init(
        project="h2o-ai-predict-the-llm-kaggle-competition",
        entity=None,
        job_type="hyperparameter-tuning"
    )
    # Download the raw-data artifact only when it is not already on disk
    if "artifacts" not in os.listdir():
        raw_data_at = run.use_artifact('mustafakeser/h2o-ai-predict-the-llm-kaggle-competition/detect_llm_raw_data:v1',
                                       type='raw_data')
        artifact_di = raw_data_at.download()
    train , test = read_data()
    dataset_train, dataset_test = preprocess(train=train,test=test)
    split_train_dataset,split_eval_dataset , tokenized_test , tokenizer = dataset_tokenize_n_split(train,dataset_train, dataset_test)

    # Inside a sweep, wandb.config holds the hyperparameters chosen for this run
    config = wandb.config

    model = create_model(model_name = "distilroberta-base",num_labels = 7)

    # NOTE: config.weight_decay and config.seed are swept but not forwarded to TrainingArguments
    training_args = TrainingArguments(
        output_dir='h2o-ai-sweeps',
        report_to='wandb',  # Turn on Weights & Biases logging
        num_train_epochs=config.epochs,
        learning_rate=config.learning_rate,
        lr_scheduler_type = config.learning_sch,
        per_device_train_batch_size=config.batch_size,
        per_device_eval_batch_size=16,
        save_strategy='epoch',
        evaluation_strategy='epoch',
        logging_strategy='epoch',
        metric_for_best_model="eval_loss",
        load_best_model_at_end=True,
        remove_unused_columns=False,
        greater_is_better=False
    )
    early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=split_train_dataset,
        eval_dataset=split_eval_dataset,
        callbacks = [early_stopping],
        tokenizer=tokenizer,
    )
    trainer.train()

    
# if __name__=="__main__":
# #     wandb.agent(sweep_id, train, count=20)
#     parse_args()
#     train(default_config)
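
The `log_uniform_values` distribution in `parameters_dict` draws learning rates whose logarithm is uniform between `log(min)` and `log(max)`, so each decade between 1e-5 and 1e-3 is sampled about equally often. A minimal sketch of that sampling rule follows; `sample_log_uniform` is a hypothetical helper written for illustration and makes no claim about how the W&B backend implements it.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw x in [low, high] such that log(x) is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1e-3, rng) for _ in range(10_000)]

# About half of the draws should fall below the geometric midpoint 1e-4
below_mid = sum(s < 1e-4 for s in samples) / len(samples)
print(f"min={min(samples):.2e} max={max(samples):.2e} share below 1e-4={below_mid:.2f}")
```

A plain uniform distribution over the same interval would place only about 9% of its draws below 1e-4, which is why the log-uniform form is the usual choice for learning rates.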



<a id="4"></a>
# <div style="border: 2px solid #555; color:black; border-radius: 10px; background-color: #0074D9; padding: 10px; font-size: 20px; text-align: center;">Sweeps</div>
* [return top](#1)

In [5]:
wandb.login(relogin=True)

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [7]:
sweep_id = wandb.sweep(default_config, project='h2o-ai-predict-the-llm-kaggle-competition')

Create sweep with ID: dp0ecg1w
Sweep URL: https://wandb.ai/mustafakeser/h2o-ai-predict-the-llm-kaggle-competition/sweeps/dp0ecg1w


In [8]:
wandb.agent(sweep_id, train, count=20)

[34m[1mwandb[0m: Agent Starting Run: 3j17rpcc with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 5.5765535179451664e-05
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: Currently logged in as: [33mmustafakeser[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m:   4 of 4 files downloaded.  


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,1.8108,1.733616
2,1.6942,1.71288


Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.71288
eval/runtime,3.9109
eval/samples_per_second,101.766
eval/steps_per_second,6.392
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.6942
train/total_flos,948021230309376.0
train/train_loss,1.75249
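
The `train/global_step` value follows from the fold size and the batch size: each epoch takes ceil(3578 / batch_size) optimizer steps, and the sweep fixes 2 epochs. A small sketch of that arithmetic, assuming a single device and no gradient accumulation (an assumption about the runtime, since the device count is not shown in this output); `total_steps` is a hypothetical helper:

```python
import math

def total_steps(num_rows, batch_size, epochs):
    """Optimizer steps for a run: ceil(rows / batch) per epoch, times epochs."""
    return math.ceil(num_rows / batch_size) * epochs

train_rows = 3578  # size of the training fold printed in the run output
for batch_size in (8, 16, 32):
    print(batch_size, total_steps(train_rows, batch_size, epochs=2))
```

For batch sizes 8, 16 and 32 this gives 896, 448 and 224 steps, which are the `train/global_step` values reported by the runs in this sweep.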


[34m[1mwandb[0m: Agent Starting Run: mcahupjl with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 4.818258016086e-05
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8295,1.779587
2,1.7459,1.728652


Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.72865
eval/runtime,3.9221
eval/samples_per_second,101.476
eval/steps_per_second,6.374
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.7459
train/total_flos,948021230309376.0
train/train_loss,1.7877


[34m[1mwandb[0m: Agent Starting Run: yhprbtzy with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0007180182112163355
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.9881,1.953655
2,1.9553,1.947281




Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94728
eval/runtime,3.9169
eval/samples_per_second,101.611
eval/steps_per_second,6.383
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.9553
train/total_flos,948021230309376.0
train/train_loss,1.97168
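
An `eval/loss` this close to 1.946 is worth flagging. For a 7-class problem, a classifier that assigns probability 1/7 to every class has a cross-entropy of ln(7) ≈ 1.9459, so a run that ends at that value is probably predicting near-uniform probabilities. The check below is a small sketch; `uniform_cross_entropy` is a hypothetical helper, and the loss value is copied from the final `eval/loss` printed above.

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy of predicting probability 1/num_classes for every class."""
    return math.log(num_classes)

baseline = uniform_cross_entropy(7)
run_eval_loss = 1.94728  # final eval/loss reported for run yhprbtzy
print(f"baseline={baseline:.5f} gap={run_eval_loss - baseline:.5f}")
```

Runs whose loss sits well below this baseline (around 1.70 to 1.73 in this sweep) have at least moved away from uniform guessing.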


[34m[1mwandb[0m: Agent Starting Run: 5xe7l8y2 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3.613467348240049e-05
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8086,1.738096
2,1.6993,1.718404


Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.7184
eval/runtime,3.9166
eval/samples_per_second,101.618
eval/steps_per_second,6.383
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.6993
train/total_flos,948021230309376.0
train/train_loss,1.75394


[34m[1mwandb[0m: Agent Starting Run: 1gz2izaq with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.000883799881461311
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.9992,1.960397
2,1.9574,1.946956


Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.94696
eval/runtime,3.9296
eval/samples_per_second,101.284
eval/steps_per_second,6.362
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.9574
train/total_flos,948021230309376.0
train/train_loss,1.97833


[34m[1mwandb[0m: Agent Starting Run: 9ecft6ia with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 6.354784195992922e-05
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8093,1.731341
2,1.6952,1.706076


Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.70608
eval/runtime,3.9143
eval/samples_per_second,101.679
eval/steps_per_second,6.387
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.6952
train/total_flos,948021230309376.0
train/train_loss,1.75223


[34m[1mwandb[0m: Agent Starting Run: uxn4s3op with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3.392354976544033e-05
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8156,1.719137
2,1.6893,1.712819


Run history:
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.71282
eval/runtime,3.9174
eval/samples_per_second,101.598
eval/steps_per_second,6.382
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.6893
train/total_flos,948021230309376.0
train/train_loss,1.75247


[34m[1mwandb[0m: Agent Starting Run: xch9a82b with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 3.658860795433211e-05
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.812,1.745455
2,1.6987,1.686643


Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.68664
eval/runtime,3.9305
eval/samples_per_second,101.258
eval/steps_per_second,6.36
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.6987
train/total_flos,948021230309376.0
train/train_loss,1.75536


[34m[1mwandb[0m: Agent Starting Run: bzrxi4c1 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 1.73594255725719e-05
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8043,1.747561
2,1.7116,1.71118


Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.71118
eval/runtime,3.927
eval/samples_per_second,101.349
eval/steps_per_second,6.366
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.7116
train/total_flos,948021230309376.0
train/train_loss,1.75797


[34m[1mwandb[0m: Agent Starting Run: 8n2vlswm with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0003113680093197998
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.8749,1.778311
2,1.7727,1.804582




Run history:
eval/loss,▁█
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.80458
eval/runtime,3.9084
eval/samples_per_second,101.833
eval/steps_per_second,6.397
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.7727
train/total_flos,948021230309376.0
train/train_loss,1.82384


[34m[1mwandb[0m: Agent Starting Run: j1o5u48j with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0001268305358453921
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.866,1.814876
2,1.7472,1.724897


Run history:
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Run summary:
eval/loss,1.7249
eval/runtime,3.9117
eval/samples_per_second,101.747
eval/steps_per_second,6.391
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.7472
train/total_flos,948021230309376.0
train/train_loss,1.80661


[34m[1mwandb[0m: Agent Starting Run: 8pto8bup with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0002858764766373609
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2


  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Fold : 0
shape train : (3578,)
shape val : (398,)


Epoch,Training Loss,Validation Loss
1,1.9891,1.951397
2,1.9561,1.946564





Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.94656
eval/runtime,3.9491
eval/samples_per_second,100.782
eval/steps_per_second,6.331
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.9561
train/total_flos,948021230309376.0
train/train_loss,1.97262


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: npkw9qvq with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0001373681713207237
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8371,1.757713
2,1.7095,1.703283





Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.70328
eval/runtime,3.9132
eval/samples_per_second,101.708
eval/steps_per_second,6.389
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.7095
train/total_flos,948021230309376.0
train/train_loss,1.77333


[34m[1mwandb[0m: Agent Starting Run: 3izh6jy3 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2.3141515315680035e-05
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8056,1.752494
2,1.6871,1.690626




Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.69063
eval/runtime,3.9045
eval/samples_per_second,101.934
eval/steps_per_second,6.403
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.6871
train/total_flos,948021230309376.0
train/train_loss,1.74637


[34m[1mwandb[0m: Agent Starting Run: 38udvcbv with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0008958315669213104
[34m[1mwandb[0m: 	learning_sch: cosine
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.9929,1.955834
2,1.9549,1.947726





Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.94773
eval/runtime,3.9042
eval/samples_per_second,101.941
eval/steps_per_second,6.403
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.9549
train/total_flos,948021230309376.0
train/train_loss,1.97391


[34m[1mwandb[0m: Agent Starting Run: kdokv1t8 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00011703718540656912
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8169,1.722703
2,1.694,1.705139




Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.70514
eval/runtime,3.9101
eval/samples_per_second,101.789
eval/steps_per_second,6.394
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.694
train/total_flos,948021230309376.0
train/train_loss,1.75547


[34m[1mwandb[0m: Agent Starting Run: knxqbk1g with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 1.0118363981607169e-05
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8082,1.752365
2,1.7203,1.721905





Metric,History
eval/loss,█▁
eval/runtime,█▁
eval/samples_per_second,▁█
eval/steps_per_second,▁█
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.7219
eval/runtime,3.9029
eval/samples_per_second,101.975
eval/steps_per_second,6.405
train/epoch,2.0
train/global_step,896.0
train/learning_rate,0.0
train/loss,1.7203
train/total_flos,948021230309376.0
train/train_loss,1.76422


[34m[1mwandb[0m: Agent Starting Run: 8s7z07t1 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.00015259665805831906
[34m[1mwandb[0m: 	learning_sch: polynomial
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8068,1.768535
2,1.6836,1.698527




Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.69853
eval/runtime,3.9591
eval/samples_per_second,100.529
eval/steps_per_second,6.315
train/epoch,2.0
train/global_step,224.0
train/learning_rate,0.0
train/loss,1.6836
train/total_flos,948021230309376.0
train/train_loss,1.74519


[34m[1mwandb[0m: Agent Starting Run: 4dqlaevr with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 2.216965480387029e-05
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.7999,1.752092
2,1.7112,1.712424





Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.71242
eval/runtime,3.925
eval/samples_per_second,101.4
eval/steps_per_second,6.369
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.7112
train/total_flos,948021230309376.0
train/train_loss,1.75556


[34m[1mwandb[0m: Agent Starting Run: xu3d50ex with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 2
[34m[1mwandb[0m: 	learning_rate: 0.0002243360387687802
[34m[1mwandb[0m: 	learning_sch: linear
[34m[1mwandb[0m: 	seed: 42
[34m[1mwandb[0m: 	weight_decay: 0.2



Fold : 0
shape train : (3578,)
shape val : (398,)




Epoch,Training Loss,Validation Loss
1,1.8748,1.813246
2,1.8211,1.790207





Metric,History
eval/loss,█▁
eval/runtime,▁█
eval/samples_per_second,█▁
eval/steps_per_second,█▁
train/epoch,▁▁███
train/global_step,▁▁███
train/learning_rate,█▁
train/loss,█▁
train/total_flos,▁
train/train_loss,▁

Metric,Value
eval/loss,1.79021
eval/runtime,3.911
eval/samples_per_second,101.765
eval/steps_per_second,6.392
train/epoch,2.0
train/global_step,448.0
train/learning_rate,0.0
train/loss,1.8211
train/total_flos,948021230309376.0
train/train_loss,1.84793


In [9]:
wandb.finish()
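
The sweep output above is easier to read once the runs are ranked by their final `eval/loss`. The following is a minimal sketch, not part of the original notebook: the numbers are copied by hand from the run summaries logged above (only runs whose ID appears in that output are included), and the names `runs` and `rank_runs` are hypothetical. Each run covers fold 0 only and trains for 2 epochs, so the ranking is a rough guide rather than a final verdict.

```python
# Final eval/loss per sweep run, copied by hand from the logged run summaries.
# Tuple layout (an assumption made for this sketch):
#   (eval_loss, learning_rate, learning_sch, batch_size, weight_decay)
runs = {
    "8pto8bup": (1.94656, 0.0002858764766373609, "polynomial", 8, 0.2),
    "npkw9qvq": (1.70328, 0.0001373681713207237, "linear", 8, 0.2),
    "3izh6jy3": (1.69063, 2.3141515315680035e-05, "polynomial", 8, 0.2),
    "38udvcbv": (1.94773, 0.0008958315669213104, "cosine", 32, 0),
    "kdokv1t8": (1.70514, 0.00011703718540656912, "linear", 32, 0.2),
    "knxqbk1g": (1.7219, 1.0118363981607169e-05, "polynomial", 8, 0.2),
    "8s7z07t1": (1.69853, 0.00015259665805831906, "polynomial", 32, 0.2),
    "4dqlaevr": (1.71242, 2.216965480387029e-05, "linear", 16, 0),
    "xu3d50ex": (1.79021, 0.0002243360387687802, "linear", 16, 0.2),
}


def rank_runs(run_table):
    """Return (run_id, values) pairs sorted by ascending eval loss."""
    return sorted(run_table.items(), key=lambda item: item[1][0])


ranked = rank_runs(runs)
best_id, best_values = ranked[0]

# Print one line per run, best first.
for run_id, (loss, lr, sched, batch_size, weight_decay) in ranked:
    print(
        f"{run_id}  eval/loss={loss:.5f}  lr={lr:.2e}  "
        f"sched={sched:<10}  batch_size={batch_size:<2}  weight_decay={weight_decay}"
    )

print(f"best run: {best_id} (eval/loss={best_values[0]:.5f})")
```

In this listing the two runs with the highest validation loss are also the two with the largest learning rates (about 2.9e-4 and 9.0e-4), which suggests narrowing the upper end of the learning-rate range in a follow-up sweep.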

In [None]:
#2. h p100