# Classifying pro-Russian comments from Le Monde, using SetFit

In [1]:
!pip install setfit

Collecting setfit
  Obtaining dependency information for setfit from https://files.pythonhosted.org/packages/a4/b0/0afe7c5e0901fece8677746a70f9658c8c7c55dc46c9c947e473c7ed9d77/setfit-1.0.1-py3-none-any.whl.metadata
  Downloading setfit-1.0.1-py3-none-any.whl.metadata (11 kB)
Collecting datasets>=2.3.0 (from setfit)
  Obtaining dependency information for datasets>=2.3.0 from https://files.pythonhosted.org/packages/e2/cf/db41e572d7ed958e8679018f8190438ef700aeb501b62da9e1eed9e4d69a/datasets-2.15.0-py3-none-any.whl.metadata
  Downloading datasets-2.15.0-py3-none-any.whl.metadata (20 kB)
Collecting sentence-transformers>=2.2.1 (from setfit)
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting evaluate>=0.3.0 (from setfit)
  Obtaining dependency information for evaluate>=0.3.0 from 

In [2]:
# wandb login, logging enabled by default in SetFit
import wandb
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
my_secret = user_secrets.get_secret("wandb_key") 
wandb.login(key=my_secret)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split

from datasets import Dataset, DatasetDict, load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

from sentence_transformers.losses import CosineSimilarityLoss

import torch
import gc

from optuna import Trial



In [4]:
import warnings
warnings.filterwarnings('ignore')

## Load data

#### Load from disk

In [5]:
# filepath = "data/lmd_ukraine_annotated.parquet"
filepath = "/kaggle/input/lmd-annotated/lmd_ukraine_annotated.parquet"

In [6]:
data = pd.read_parquet(filepath)
display(data.head(3))
print(data.dtypes)

Unnamed: 0,article_id,url,title,desc,content,date,keywords,article_type,allow_comments,premium,author,comment,comment_id,classe
0,3259703,https://www.lemonde.fr/actualite-medias/articl...,"Le conflit russo-ukrainien, qui mobilise les m...",Au Festival de journalisme de Couthures : la g...,Parce qu’elle est revenue frapper à nos porte...,2022-07-16,"[international, europe, ukraine, crise-ukraini...",Factuel,True,False,Ricardo Uztarroz,La question qui vaille et qui n'est pas posée...,e7206b56918f694f,pro_russia
1,3259703,https://www.lemonde.fr/actualite-medias/articl...,"Le conflit russo-ukrainien, qui mobilise les m...",Au Festival de journalisme de Couthures : la g...,Parce qu’elle est revenue frapper à nos porte...,2022-07-16,"[international, europe, ukraine, crise-ukraini...",Factuel,True,False,Ricardo Uztarroz,Salandre : les documents dont vous faîtes ét...,d904e44906dfb957,other
2,3259703,https://www.lemonde.fr/actualite-medias/articl...,"Le conflit russo-ukrainien, qui mobilise les m...",Au Festival de journalisme de Couthures : la g...,Parce qu’elle est revenue frapper à nos porte...,2022-07-16,"[international, europe, ukraine, crise-ukraini...",Factuel,True,False,Correcteur,« C’est l’affaire des russes »? C’est donc vot...,1c03f54daeffd1ca,pro_ukraine


article_id           int64
url                 object
title               object
desc                object
content             object
date                object
keywords            object
article_type      category
allow_comments        bool
premium               bool
author              object
comment             object
comment_id          object
classe              object
dtype: object


In [7]:
# For later stages, and to comply with the Hugging Face Dataset format, convert article_type to string
data['article_type'] = data['article_type'].astype(str)

#### Classes overview / % annotated labels

The original custom dataset (see my other projects) contained 236k comments.  
After custom hashing, cleaning, and deduplication, 175k records remain, of which 574 were manually labeled using Label Studio.  
As a whole, the dataset is imbalanced "by nature"; the labeled examples are reasonably balanced.  
"Truly" pro-Russian comments were quite hard to find: 1. the comment section is subscribers-only and moderated, so there are almost no trolls; 2. most commenters support Ukraine; 3. I had to broaden what "pro-Russian" means a little, while trying not to be too harsh on "balanced" comments either. The labeling is highly subjective.  

In [8]:
print(len(data))
print(data.classe.value_counts())
print(sum(data.classe.notnull()))
print(sum(data.classe.isnull()))

175353
classe
other          256
pro_ukraine    196
pro_russia     122
Name: count, dtype: int64
574
174779
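To make the "% annotated labels" part of this overview explicit, here is a small sketch that turns the counts printed above into percentages. The numbers are copied by hand from the output; the variable names (`label_counts`, `class_shares`, ...) are only illustrative.

```python
# Counts copied from the value_counts() output above.
label_counts = {"other": 256, "pro_ukraine": 196, "pro_russia": 122}
total_rows = 175353

n_labeled = sum(label_counts.values())
labeled_share = 100 * n_labeled / total_rows

# Share of each class among the labeled rows, in percent.
class_shares = {
    name: round(100 * count / n_labeled, 1)
    for name, count in label_counts.items()
}

print(f"labeled: {n_labeled} of {total_rows} rows ({labeled_share:.2f}%)")
for name, share in class_shares.items():
    print(f"{name:12s} {share:5.1f}% of labeled rows")
```

Only about a third of a percent of the rows carry a label, and pro_russia is the smallest labeled class.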


## Prepare Dataset (labels, optional sample, split)

#### Split, convert to Huggingface DatasetDict

Keep the label proportions representative in both the train and eval data, using sklearn's `stratify` option (optional). Overall, the unannotated dataset is far more imbalanced.  
We want each class to have roughly 50 examples at most (resources often show SetFit working with only 8 rows per class; we could go up to 100). Let's make the most of our painful manual labeling work.  
We have 574 labels. The train dataset is then sampled down to a fixed number of labels per class (the `sample_dataset` call further down uses 72). Eval is kept at around 200 to 300 samples. Test data is the remaining, unlabeled data.  
The unlabeled test data could be useful later for optimization through distillation (teacher <-> student): SetFit has a dedicated technique to leverage unlabeled data.   



In [9]:
# select labeled data only to split between train and eval, test set is the unlabeled data.
with_labels = data.query("classe.notnull()")
test_df = data.query("classe.isnull()")
print(len(with_labels), len(test_df))

574 174779


In [10]:
# labeled data is split between train and eval sets
# stratify= is optional, but we still want to make sure classes are "balanced" in both datasets

train_df, eval_df = train_test_split(with_labels, test_size=0.4, stratify=with_labels['classe'], random_state=40)

In [11]:
# Make sure the smallest class has enough labels (e.g. 8, 20, 50, or at "max" 100).
# This dataset is sampled again later with SetFit's sample_dataset, so that all classes have the same number of rows (8, 10, 60...)
print(len(train_df))
print(train_df.classe.value_counts())
print(len(eval_df))
print(eval_df.classe.value_counts())

344
classe
other          153
pro_ukraine    118
pro_russia      73
Name: count, dtype: int64
230
classe
other          103
pro_ukraine     78
pro_russia      49
Name: count, dtype: int64
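As a quick sanity check on the stratified split, the sketch below compares the class shares of the two splits, using the counts printed above (copied by hand). The helper name `shares` is made up for this illustration.

```python
# Counts copied from the two value_counts() outputs above.
train_counts = {"other": 153, "pro_ukraine": 118, "pro_russia": 73}
eval_counts = {"other": 103, "pro_ukraine": 78, "pro_russia": 49}


def shares(counts):
    """Return each class's share of the split, in percent."""
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


train_shares = shares(train_counts)
eval_shares = shares(eval_counts)

# Largest difference between the two splits, in percentage points.
max_gap = max(abs(train_shares[name] - eval_shares[name]) for name in train_counts)

for name in train_counts:
    print(f"{name:12s} train {train_shares[name]:5.1f}%  eval {eval_shares[name]:5.1f}%")
print(f"largest gap: {max_gap:.2f} percentage points")
```

The class shares differ by less than half a percentage point between train and eval, which is what stratification is meant to achieve.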


In [12]:
# For labeled data, add a 'label' column mapping the 'classe' label strings to ints
# We do it now because SetFit wants integers, not floats, for training
label_mapping = {'pro_ukraine': 0, 'pro_russia': 1, 'other': 2}
for df in [train_df, eval_df]:
    df['label'] = df['classe'].map(label_mapping)
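Since the model will predict integers, it is handy to keep the inverse mapping around to turn predictions back into class names. A minimal sketch; `id2label` and `decode` are hypothetical names, not part of SetFit.

```python
# Same mapping as in the cell above.
label_mapping = {'pro_ukraine': 0, 'pro_russia': 1, 'other': 2}

# Inverse mapping: integer id -> class name.
id2label = {idx: name for name, idx in label_mapping.items()}


def decode(predictions):
    """Convert an iterable of integer predictions to class names."""
    return [id2label[int(p)] for p in predictions]


print(decode([1, 0, 2]))
```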

Convert to the Hugging Face `Dataset` format to streamline operations and to push to the Hub later

In [13]:
train_dataset = Dataset.from_pandas(train_df)
eval_dataset = Dataset.from_pandas(eval_df)
test_dataset = Dataset.from_pandas(test_df)

# convert to the commonly used Hugging Face DatasetDict format
dataset = DatasetDict({
    'train': train_dataset,
    'validation': eval_dataset,
    'test': test_dataset
})

In [14]:
# save the number of classes, to be used later when loading the model
num_classes = len(train_dataset.unique("label"))
num_classes

3

## Modeling

Candidate models (something larger could also be used):  
- https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2 (900MB)  
- https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (470MB)
- https://huggingface.co/dangvantuan/sentence-camembert-base
- https://huggingface.co/dangvantuan/sentence-camembert-large (1GB)

Behind the scenes, training with SetFit consists of two phases: 1. fine-tuning the embeddings and 2. training a classification head.  
Depending on the SetFit version, you might need to import the (old) `SetFitTrainer` instead of `Trainer`.   
Refer to the hf/setfit [documentation](https://huggingface.co/docs/setfit/how_to/overview) rather than the GitHub repository for up-to-date resources.
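To give an intuition for phase 1, the sketch below builds the kind of sentence pairs a contrastive objective such as `CosineSimilarityLoss` is trained on: pairs with the same label get a target of 1.0, pairs with different labels get 0.0. This is a simplified illustration on toy placeholder strings, not SetFit's actual pair sampler (which, among other things, balances positive and negative pairs according to `sampling_strategy`); `make_pairs` is a hypothetical helper.

```python
from itertools import combinations


def make_pairs(texts, labels):
    """Build (text_a, text_b, target) triples: 1.0 for same label, 0.0 otherwise."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy placeholders, two examples per class (not real comments).
texts = ["a1", "a2", "b1", "b2", "c1", "c2"]
labels = [0, 0, 1, 1, 2, 2]

pairs = make_pairs(texts, labels)
n_positive = sum(1 for _, _, target in pairs if target == 1.0)
n_negative = len(pairs) - n_positive

print(f"{len(pairs)} pairs: {n_positive} positive, {n_negative} negative")
```

Even with a handful of labeled examples, the number of pairs grows quadratically, which is one reason few-shot training on pairs is workable.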

In [15]:
# "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
# "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"

# dangvantuan/sentence-camembert-base
# dangvantuan/sentence-camembert-large

# Lajavaness/sentence-camembert-base
# Lajavaness/sentence-camembert-large

## C. Hyperparameter Optimization, using LogisticRegression & Optuna

In [16]:
# Sample dataset
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=72, seed=40)
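Conceptually, this step draws the same number of rows from each class. The pure-Python sketch below mimics that idea on placeholder rows that reuse the train split's class sizes; it is not SetFit's implementation, and `sample_per_class` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict


def sample_per_class(rows, num_samples, seed):
    """Draw up to num_samples rows per label, then shuffle the result."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        # Never ask for more rows than the class actually has.
        sampled.extend(rng.sample(group, min(num_samples, len(group))))
    rng.shuffle(sampled)
    return sampled


# Placeholder rows with the train split's class sizes
# (0: pro_ukraine, 1: pro_russia, 2: other).
class_sizes = {0: 118, 1: 73, 2: 153}
rows = [
    {"label": label, "comment": f"comment {label}-{i}"}
    for label, size in class_sizes.items()
    for i in range(size)
]

balanced = sample_per_class(rows, num_samples=72, seed=40)
per_class = Counter(row["label"] for row in balanced)
print(len(balanced), dict(per_class))
```

With 72 rows per class and three classes, the balanced training set has 216 rows. The smallest class (pro_russia, 73 rows) is what caps the usable `num_samples`.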

In [17]:
import gc
import torch
from optuna import Trial
from setfit import Trainer, SetFitModel, sample_dataset
import time

# Model initialization function
def model_init(params):
    params = params or {}
    max_iter = params.get("max_iter", 100)
    solver = params.get("solver", "liblinear")
    params = {
        "head_params": {
            "max_iter": max_iter,
            "solver": solver,
        }
    }

    return SetFitModel.from_pretrained("sentence-transformers/paraphrase-multilingual-mpnet-base-v2", **params)

In [18]:
# Hyperparameter space definition
def hp_space(trial):
    """ Define hyperparams search space (Optuna) """
    
    return {
        # Embeddings fine-tuning phase params :
        
        "body_learning_rate": trial.suggest_float("body_learning_rate", 1e-07 , 3e-06, log=True),
        "max_steps": trial.suggest_int("max_steps", 150, 380), # 200, 900
        "batch_size": trial.suggest_categorical("batch_size", [32]),
        "seed": trial.suggest_int("seed", 1, 40),
        
        # LogisticRegression head params :
        
        "max_iter": trial.suggest_int("max_iter", 120, 140), # 100, 200
        "solver": trial.suggest_categorical("solver", ["liblinear"]), # "newton-cg",'lbfgs'
    }
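The `log=True` flag on `body_learning_rate` means values are drawn uniformly on a logarithmic scale, so each order of magnitude gets similar attention. The sketch below illustrates the idea with the standard library only; it is not Optuna's implementation, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 1e-07, 3e-06  # same bounds as body_learning_rate above
samples = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

# On a log scale, the "middle" of the range is the geometric mean.
geometric_mean = math.sqrt(low * high)
arithmetic_mid = (low + high) / 2

share_below_geo = sum(s < geometric_mean for s in samples) / len(samples)
share_below_mid = sum(s < arithmetic_mid for s in samples) / len(samples)

print(f"below geometric mean ({geometric_mean:.2e}): {share_below_geo:.1%}")
print(f"below arithmetic midpoint ({arithmetic_mid:.2e}): {share_below_mid:.1%}")
```

About half of the draws fall below the geometric mean, while roughly 80% fall below the arithmetic midpoint: small learning rates are sampled much more often than they would be with a linear range.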

In [19]:
args = TrainingArguments(
    sampling_strategy='oversampling',
    evaluation_strategy='steps',
    eval_steps=20, # print eval every eval_steps
    save_strategy='steps',
) 
    

In [20]:
# Initialize Trainer
trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
    column_mapping={"comment": "text", "label": "label"},
)

Applying column mapping to the training dataset
Applying column mapping to the evaluation dataset


model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.


Map:   0%|          | 0/216 [00:00<?, ? examples/s]

In [21]:
# Run hyperparameter search
best_run = trainer.hyperparameter_search(direction="maximize", hp_space=hp_space, n_trials=6)

[I 2023-12-22 11:17:38,137] A new study created in memory with name: no-name-ddcb266e-ed56-492e-96ad-6cf7da197c2b
Trial: {'body_learning_rate': 1.1910393832984982e-07, 'max_steps': 289, 'batch_size': 32, 'seed': 9, 'max_iter': 139, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 972
  Num epochs = 1
  Total optimization steps = 289
  Total train batch size = 32
wandb: Currently logged in as: vionmatthieu. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/wandb/run-20231222_111741-50g9lfcg
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run denim-violet-6

Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.2806,0.0
40,No log,No log,0.2798,0.0
60,No log,No log,0.2791,0.0
80,No log,No log,0.2782,0.0
100,No log,No log,0.2775,0.0
120,No log,No log,0.2771,0.0
140,No log,No log,0.2766,0.0
160,No log,No log,0.2761,0.0
180,No log,No log,0.2758,0.0
200,No log,No log,0.2757,0.0


***** Running evaluation *****


[I 2023-12-22 12:21:04,756] Trial 0 finished with value: 0.6782608695652174 and parameters: {'body_learning_rate': 1.1910393832984982e-07, 'max_steps': 289, 'batch_size': 32, 'seed': 9, 'max_iter': 139, 'solver': 'liblinear'}. Best is trial 0 with value: 0.6782608695652174.
Trial: {'body_learning_rate': 9.513853403949528e-07, 'max_steps': 232, 'batch_size': 32, 'seed': 35, 'max_iter': 133, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 914
  Num epochs = 1
  Total optimization steps = 232
  Total train batch size = 32


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.2782,1e-06
40,No log,No log,0.2727,1e-06
60,No log,No log,0.269,1e-06
80,No log,No log,0.2672,1e-06
100,No log,No log,0.2655,1e-06
120,No log,No log,0.2647,1e-06
140,No log,No log,0.2642,0.0
160,No log,No log,0.2639,0.0
180,No log,No log,0.2635,0.0
200,No log,No log,0.2633,0.0


***** Running evaluation *****


[I 2023-12-22 13:10:42,122] Trial 1 finished with value: 0.6869565217391305 and parameters: {'body_learning_rate': 9.513853403949528e-07, 'max_steps': 232, 'batch_size': 32, 'seed': 35, 'max_iter': 133, 'solver': 'liblinear'}. Best is trial 1 with value: 0.6869565217391305.
Trial: {'body_learning_rate': 1.3798641017689414e-06, 'max_steps': 304, 'batch_size': 32, 'seed': 33, 'max_iter': 135, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 972
  Num epochs = 1
  Total optimization steps = 304
  Total train batch size = 32


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.278,1e-06
40,No log,No log,0.2709,1e-06
60,No log,No log,0.2664,1e-06
80,No log,No log,0.2647,1e-06
100,No log,No log,0.2628,1e-06
120,No log,No log,0.262,1e-06
140,No log,No log,0.2613,1e-06
160,No log,No log,0.2611,1e-06
180,No log,No log,0.2604,1e-06
200,No log,No log,0.2594,1e-06


***** Running evaluation *****


[I 2023-12-22 14:18:14,972] Trial 2 finished with value: 0.6782608695652174 and parameters: {'body_learning_rate': 1.3798641017689414e-06, 'max_steps': 304, 'batch_size': 32, 'seed': 33, 'max_iter': 135, 'solver': 'liblinear'}. Best is trial 1 with value: 0.6869565217391305.
Trial: {'body_learning_rate': 2.744456982586778e-06, 'max_steps': 344, 'batch_size': 32, 'seed': 1, 'max_iter': 127, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 972
  Num epochs = 1
  Total optimization steps = 344
  Total train batch size = 32


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.276,2e-06
40,No log,No log,0.2664,3e-06
60,No log,No log,0.2614,3e-06
80,No log,No log,0.2604,2e-06
100,No log,No log,0.2574,2e-06
120,No log,No log,0.2561,2e-06
140,No log,No log,0.2551,2e-06
160,No log,No log,0.2553,2e-06
180,No log,No log,0.2527,1e-06
200,No log,No log,0.2496,1e-06


***** Running evaluation *****


[I 2023-12-22 15:34:48,698] Trial 3 finished with value: 0.6652173913043479 and parameters: {'body_learning_rate': 2.744456982586778e-06, 'max_steps': 344, 'batch_size': 32, 'seed': 1, 'max_iter': 127, 'solver': 'liblinear'}. Best is trial 1 with value: 0.6869565217391305.
Trial: {'body_learning_rate': 1.8439453204632744e-07, 'max_steps': 164, 'batch_size': 32, 'seed': 21, 'max_iter': 140, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 640
  Num epochs = 1
  Total optimization steps = 164
  Total train batch size = 32


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.2801,0.0
40,No log,No log,0.2789,0.0
60,No log,No log,0.2779,0.0
80,No log,No log,0.277,0.0
100,No log,No log,0.2763,0.0
120,No log,No log,0.276,0.0
140,No log,No log,0.2758,0.0
160,No log,No log,0.2756,0.0


***** Running evaluation *****


[I 2023-12-22 16:10:54,974] Trial 4 finished with value: 0.6782608695652174 and parameters: {'body_learning_rate': 1.8439453204632744e-07, 'max_steps': 164, 'batch_size': 32, 'seed': 21, 'max_iter': 140, 'solver': 'liblinear'}. Best is trial 1 with value: 0.6869565217391305.
Trial: {'body_learning_rate': 1.1411952497523292e-07, 'max_steps': 158, 'batch_size': 32, 'seed': 10, 'max_iter': 133, 'solver': 'liblinear'}
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
  Num examples = 617
  Num epochs = 1
  Total optimization steps = 158
  Total train batch size = 32


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
20,No log,No log,0.2804,0.0
40,No log,No log,0.2797,0.0
60,No log,No log,0.2791,0.0
80,No log,No log,0.2784,0.0
100,No log,No log,0.2779,0.0
120,No log,No log,0.2778,0.0
140,No log,No log,0.2777,0.0


***** Running evaluation *****


[I 2023-12-22 16:42:41,929] Trial 5 finished with value: 0.6826086956521739 and parameters: {'body_learning_rate': 1.1411952497523292e-07, 'max_steps': 158, 'batch_size': 32, 'seed': 10, 'max_iter': 133, 'solver': 'liblinear'}. Best is trial 1 with value: 0.6869565217391305.


In [22]:
print(best_run)

BestRun(run_id='1', objective=0.6869565217391305, hyperparameters={'body_learning_rate': 9.513853403949528e-07, 'max_steps': 232, 'batch_size': 32, 'seed': 35, 'max_iter': 133, 'solver': 'liblinear'}, backend=<optuna.study.study.Study object at 0x78af62442b60>)
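To put these accuracies in perspective, the sketch below converts each trial's score back into a number of correctly classified eval comments. The accuracies are copied from the log above and the eval set has 230 rows; variable names are illustrative only.

```python
eval_size = 230

# Accuracy reported for each trial in the search log above.
trial_accuracy = {
    0: 0.6782608695652174,
    1: 0.6869565217391305,
    2: 0.6782608695652174,
    3: 0.6652173913043479,
    4: 0.6782608695652174,
    5: 0.6826086956521739,
}

# Accuracy = correct / eval_size, so correct = accuracy * eval_size.
correct = {trial: round(acc * eval_size) for trial, acc in trial_accuracy.items()}

best_trial = max(trial_accuracy, key=trial_accuracy.get)
spread = max(correct.values()) - min(correct.values())

for trial, n_correct in correct.items():
    print(f"trial {trial}: {n_correct}/{eval_size} correct")
print(f"best trial: {best_trial}, best-to-worst spread: {spread} comments")
```

The best and worst trials differ by only five comments out of 230, so the gaps between these hyperparameter settings are small and may be within the noise of a single eval split.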


Log line from a separate search run (it does not match the trials above), which reached a slightly higher accuracy:  
Trial 3 finished with value: 0.691304347826087 and parameters: {'body_learning_rate': 4.378056750692589e-07, 'max_steps': 379, 'batch_size': 32, 'seed': 39, 'max_iter': 136, 'solver': 'liblinear'}. Best is trial 3 with value: 0.691304347826087.