# Fine-Tuning Procedure
- https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb

# Parameter Settings

In [1]:
GLUE_TASKS = ["cola", "mnli", "mnli-mm", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb", "wnli"]

In [2]:
# Set the task to cola
task = "cola"
# Pretrained model checkpoint
model_checkpoint = "distilbert-base-uncased"
# Batch size
batch_size = 16

In [3]:
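import datasets

# mnli-mm uses the MNLI data; only the validation split differs
actual_task = "mnli" if task == "mnli-mm" else task
# Load the dataset
dataset = datasets.load_dataset("glue", actual_task)
# Load the evaluation metric
# (load_metric is deprecated in newer `datasets` releases and removed in 3.0; evaluate.load("glue", actual_task) replaces it)
metric = datasets.load_metric('glue', actual_task)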



In [4]:
# CoLA (The Corpus of Linguistic Acceptability): decide whether a sentence is grammatically acceptable
actual_task

'cola'

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

In [6]:
type(dataset)

datasets.dataset_dict.DatasetDict

In [7]:
?datasets.dataset_dict.DatasetDict

Init signature: datasets.dataset_dict.DatasetDict(self, /, *args, **kwargs)
Docstring:      A dictionary (dict of str: datasets.Dataset) with dataset transforms methods (map, filter, etc.)
File:           c:\users\a4022\anaconda3\envs\gpu\lib\site-packages\datasets\dataset_dict.py
Type:           type
Subclasses:     

### The `dataset` object is of type `DatasetDict`

In [8]:
# Show the first training record
dataset["train"][0]

{'sentence': "Our friends won't buy this analysis, let alone the next one we propose.",
 'label': 1,
 'idx': 0}

# Define a function that displays random samples

In [9]:
import random
import pandas as pd
from IPython.display import display, HTML

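# Display num_examples randomly chosen rows of a split as an HTML table
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        # Redraw until we get an index that has not been picked yet
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    # Replace integer class ids with their label names (e.g. 1 -> "acceptable")
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))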
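The retry loop above keeps drawing indices until it has `num_examples` distinct ones. A minimal sketch of the same selection, assuming only the standard library, uses `random.sample`, which samples without replacement; `pick_random_indices` is a hypothetical helper, not part of the original notebook.

```python
import random

def pick_random_indices(n_rows, num_examples=10, seed=None):
    """Return num_examples distinct row indices in the range [0, n_rows)."""
    if num_examples > n_rows:
        # Same guard as show_random_elements: cannot pick more rows than exist
        raise ValueError("Can't pick more elements than there are in the dataset.")
    rng = random.Random(seed)  # seeded generator so the picks are reproducible
    # random.sample draws without replacement, so no retry loop is needed
    return rng.sample(range(n_rows), num_examples)

# 8551 is the size of the CoLA training split shown earlier
picks = pick_random_indices(8551, num_examples=10, seed=0)
print(picks)
```

The resulting list can be passed to `dataset[picks]` the same way `show_random_elements` does.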

In [10]:
df = pd.DataFrame(dataset["train"][:30])
df

| | sentence | label | idx |
|---|---|---|---|
| 0 | Our friends won't buy this analysis, let alone... | 1 | 0 |
| 1 | One more pseudo generalization and I'm giving up. | 1 | 1 |
| 2 | One more pseudo generalization or I'm giving up. | 1 | 2 |
| 3 | The more we study verbs, the crazier they get. | 1 | 3 |
| 4 | Day by day the facts are getting murkier. | 1 | 4 |
| 5 | I'll fix you a drink. | 1 | 5 |
| 6 | Fred watered the plants flat. | 1 | 6 |
| 7 | Bill coughed his way out of the restaurant. | 1 | 7 |
| 8 | We're dancing the night away. | 1 | 8 |
| 9 | Herman hammered the metal flat. | 1 | 9 |


In [11]:
show_random_elements(dataset["train"])

| | sentence | label | idx |
|---|---|---|---|
| 0 | Be very clever. | acceptable | 4741 |
| 1 | There are three Davids in my class. | acceptable | 4135 |
| 2 | The tree lost some branches. | acceptable | 596 |
| 3 | The cat had haven eaten. | unacceptable | 6023 |
| 4 | Pat was neither recommended for promotion nor under any illusions about what that meant. | acceptable | 6995 |
| 5 | I've never known as strong a person as Louise. | acceptable | 5471 |
| 6 | Alison poked the needle into the cloth. | acceptable | 2835 |
| 7 | Carmen obtained Mary a spare part. | unacceptable | 2748 |
| 8 | John offered, and Harry gave, Sally a Cadillac. | unacceptable | 6597 |
| 9 | Stephen is believed to be easy to annoy Ben. | unacceptable | 4628 |


## Display the evaluation metric

In [12]:
# Accuracy, F1 score, Pearson correlation, Spearman correlation, Matthews correlation
metric

Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(res

## Test the metric on randomly generated predictions and labels

In [13]:
import numpy as np

fake_preds = np.random.randint(0, 2, size=(64,))
fake_labels = np.random.randint(0, 2, size=(64,))
metric.compute(predictions=fake_preds, references=fake_labels)

{'matthews_correlation': -0.03126526997403612}

Note that `load_metric` has loaded the proper metric associated to your task, which is:

- for CoLA: [Matthews Correlation Coefficient](https://en.wikipedia.org/wiki/Matthews_correlation_coefficient)
- for MNLI (matched or mismatched): Accuracy
- for MRPC: Accuracy and [F1 score](https://en.wikipedia.org/wiki/F1_score)
- for QNLI: Accuracy
- for QQP: Accuracy and [F1 score](https://en.wikipedia.org/wiki/F1_score)
- for RTE: Accuracy
- for SST-2: Accuracy
- for STS-B: [Pearson Correlation Coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) and [Spearman's Rank Correlation Coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient)
- for WNLI: Accuracy

so the metric object only computes the one(s) needed for your task.
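For a binary task such as CoLA, the Matthews correlation coefficient is computed from the confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 to 1, and random guessing scores near 0, which matches the value of about −0.03 for the random predictions above. The sketch below is a plain-Python illustration of the formula, not the GLUE metric's own code; when a row or column of the confusion matrix is empty (for example, a model that predicts a single class), the denominator is 0 and the sketch returns 0.0, the convention scikit-learn also uses. That convention is consistent with the 0.0 reported for epoch 1 of trial 4 in the hyperparameter search log, although the notebook does not show those predictions.

```python
import math

def matthews_corrcoef(predictions, references):
    """Binary Matthews correlation coefficient computed from the confusion matrix."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # true negatives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # An empty marginal (e.g. only one class ever predicted) makes denom 0: return 0.0
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(matthews_corrcoef([0, 1, 1, 0], [0, 1, 1, 0]))  # 1.0  (perfect agreement)
print(matthews_corrcoef([1, 0, 0, 1], [0, 1, 1, 0]))  # -1.0 (perfect disagreement)
print(matthews_corrcoef([1, 1, 1, 1], [0, 1, 1, 0]))  # 0.0  (constant prediction)
```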

# step3: Tokenization

In [14]:
from transformers import AutoTokenizer

# Load the tokenizer that matches the pretrained checkpoint
# model_checkpoint: "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

## Tokenize two test sentences as a pair

In [15]:
tokenizer("Hello, this one sentence!", "And this sentence goes with it.")

{'input_ids': [101, 7592, 1010, 2023, 2028, 6251, 999, 102, 1998, 2023, 6251, 3632, 2007, 2009, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
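In the output above, 101 and 102 are the ids of the `[CLS]` and `[SEP]` special tokens in the uncased BERT vocabulary that DistilBERT reuses, so a sentence pair is laid out as `[CLS] A [SEP] B [SEP]`. As the output shows, this tokenizer returns no `token_type_ids`. The sketch below rebuilds that layout by hand from the word-piece ids copied from the output; `build_pair_encoding` is a hypothetical helper for illustration, not a tokenizer method.

```python
CLS_ID, SEP_ID = 101, 102  # [CLS] and [SEP] ids, as seen in the tokenizer output above

def build_pair_encoding(ids_a, ids_b=None):
    """Assemble [CLS] A [SEP] (B [SEP]) plus an all-ones attention mask."""
    input_ids = [CLS_ID] + list(ids_a) + [SEP_ID]
    if ids_b is not None:
        input_ids += list(ids_b) + [SEP_ID]
    # Every position is a real token here; padding positions (mask 0) come later
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Word-piece ids of the two sentences, copied from the output above
ids_a = [7592, 1010, 2023, 2028, 6251, 999]
ids_b = [1998, 2023, 6251, 3632, 2007, 2009, 1012]
encoding = build_pair_encoding(ids_a, ids_b)
print(encoding["input_ids"])
```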

## Define the dataset columns used by each task

In [16]:
task_to_keys = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mnli-mm": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}

## Inspect the first training example

In [17]:
sentence1_key, sentence2_key = task_to_keys[task]
if sentence2_key is None:
    print(f"Sentence: {dataset['train'][0][sentence1_key]}")
else:
    print(f"Sentence 1: {dataset['train'][0][sentence1_key]}")
    print(f"Sentence 2: {dataset['train'][0][sentence2_key]}")

Sentence: Our friends won't buy this analysis, let alone the next one we propose.


## Tokenize the first 5 examples as a test

In [18]:
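def preprocess_function(examples):
    # Single-sentence tasks (e.g. CoLA) tokenize one column; pair tasks tokenize two.
    # truncation=True clips inputs to the model's maximum length; no padding is applied here.
    if sentence2_key is None:
        return tokenizer(examples[sentence1_key], truncation=True)
    return tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)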

preprocess_function(dataset['train'][:5])

{'input_ids': [[101, 2256, 2814, 2180, 1005, 1056, 4965, 2023, 4106, 1010, 2292, 2894, 1996, 2279, 2028, 2057, 16599, 1012, 102], [101, 2028, 2062, 18404, 2236, 3989, 1998, 1045, 1005, 1049, 3228, 2039, 1012, 102], [101, 2028, 2062, 18404, 2236, 3989, 2030, 1045, 1005, 1049, 3228, 2039, 1012, 102], [101, 1996, 2062, 2057, 2817, 16025, 1010, 1996, 13675, 16103, 2121, 2027, 2131, 1012, 102], [101, 2154, 2011, 2154, 1996, 8866, 2024, 2893, 14163, 8024, 3771, 1012, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
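`preprocess_function` truncates but does not pad, so the five encodings above have different lengths (19, 14, 14, 15 and 13 tokens). Because a tokenizer is passed to `Trainer` later on, batches are padded on the fly by `DataCollatorWithPadding`, which pads each batch only up to its longest sequence. The sketch below imitates that behaviour in plain Python, assuming right padding with pad id 0 (the `[PAD]` id, consistent with `padding_idx=0` in the model printout); `pad_batch` is a hypothetical helper, not the collator's implementation.

```python
PAD_ID = 0  # assumed [PAD] token id for the uncased BERT vocabulary

def pad_batch(input_ids_batch, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one in the batch and build attention masks."""
    max_len = max(len(ids) for ids in input_ids_batch)
    padded, masks = [], []
    for ids in input_ids_batch:
        n_pad = max_len - len(ids)
        padded.append(list(ids) + [pad_id] * n_pad)
        masks.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding the model should ignore
    return {"input_ids": padded, "attention_mask": masks}

# Dummy token ids with the same lengths as the five encodings above
lengths = [19, 14, 14, 15, 13]
batch = pad_batch([[1000] * n for n in lengths])
print([len(ids) for ids in batch["input_ids"]])       # [19, 19, 19, 19, 19]
print([sum(mask) for mask in batch["attention_mask"]])  # [19, 14, 14, 15, 13]
```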

In [19]:
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

To apply this function to every sentence (or sentence pair) in the dataset, use the `map` method of the `dataset` object created earlier. It applies the function to every element of every split, so the training, validation, and test data are all preprocessed with a single command.

In [20]:
# Tokenize every split
encoded_dataset = dataset.map(preprocess_function, batched=True)
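With `batched=True`, `map` calls `preprocess_function` on slices of a split rather than on single rows: each call receives a dict mapping column names to lists (up to 1,000 rows per call by default), which is why `preprocess_function` can pass `examples[sentence1_key]` to the tokenizer as a list of sentences. The sketch below mimics that slicing in plain Python; `iter_column_batches` is a hypothetical helper, not part of the `datasets` API.

```python
def iter_column_batches(columns, batch_size=1000):
    """Yield dict-of-lists slices of at most batch_size rows each."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, batch_size):
        # Each slice keeps every column name, with a shorter list of values
        yield {name: values[start:start + batch_size] for name, values in columns.items()}

# A stand-in for the CoLA training split: 8551 rows, as reported by `dataset`
columns = {"sentence": [f"sentence {i}" for i in range(8551)], "idx": list(range(8551))}
sizes = [len(batch["sentence"]) for batch in iter_column_batches(columns)]
print(len(sizes), sizes[-1])  # 9 551
```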

# step4: Fine-tuning, starting by loading the pretrained model

In [21]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

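# Load the pretrained model; a new classification head with num_labels outputs is added on top
# (3 labels for MNLI, 1 regression output for STS-B, 2 for the other tasks)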
num_labels = 3 if task.startswith("mnli") else 1 if task=="stsb" else 2
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Define the training arguments; see [`TrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments)

In [22]:
metric_name = "pearson" if task == "stsb" else "matthews_correlation" \
                        if task == "cola" else "accuracy"

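args = TrainingArguments(
    "test-glue",                      # output directory for checkpoints
    evaluation_strategy="epoch",      # evaluate after every epoch (renamed eval_strategy in newer releases)
    save_strategy="epoch",            # save a checkpoint after every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,      # reload the best checkpoint when training ends
    metric_for_best_model=metric_name,
)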
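These arguments leave the learning-rate scheduler at its default, a linear schedule with no warmup (`lr_scheduler_type="linear"`, `warmup_steps=0`), so the learning rate decays from `learning_rate` to zero over all optimization steps. The sketch below computes the step count and the schedule, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; the helper names are hypothetical.

```python
import math

def total_train_steps(num_rows, batch_size, num_epochs):
    """Optimizer steps on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_rows / batch_size)  # the last partial batch still counts
    return steps_per_epoch * num_epochs

def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))

total = total_train_steps(num_rows=8551, batch_size=16, num_epochs=5)
print(total)  # 2675
for step in (0, total // 2, total):
    print(step, linear_lr(step, 2e-5, total))
```

The 2675 computed here agrees with the `global_step=2675` that `trainer.train()` reports in its `TrainOutput`.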

## Define the function that computes the evaluation metric

In [23]:
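def compute_metrics(eval_pred):
    # eval_pred holds the model outputs (logits) and the reference labels
    predictions, labels = eval_pred
    if task != "stsb":
        # Classification: the predicted class is the index of the largest logit
        predictions = np.argmax(predictions, axis=1)
    else:
        # STS-B is regression: the single output column is the predicted score
        predictions = predictions[:, 0]
    return metric.compute(predictions=predictions, references=labels)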
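During evaluation, `Trainer` hands `compute_metrics` the logits as a NumPy array of shape `(num_examples, num_labels)`, with `num_labels=1` for STS-B. The sketch below isolates the conversion step on small made-up arrays so the two branches can be checked without a model; `logits_to_predictions` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def logits_to_predictions(logits, is_regression=False):
    """Mirror the branching in compute_metrics: argmax for classification, column 0 for STS-B."""
    logits = np.asarray(logits)
    if is_regression:
        # num_labels=1, so each row holds a single predicted score
        return logits[:, 0]
    # One row per example, one column per class
    return np.argmax(logits, axis=1)

class_logits = np.array([[-2.08, 2.29], [1.50, -0.70], [0.10, 0.20]])
print(logits_to_predictions(class_logits))  # [1 0 1]

score_logits = np.array([[3.7], [1.2]])
print(logits_to_predictions(score_logits, is_regression=True))  # [3.7 1.2]
```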

# step5: Define the Trainer object

In [24]:
model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [25]:
# MNLI has two validation splits (matched and mismatched); the other tasks have one
validation_key = "validation_mismatched" if task == "mnli-mm" else \
                 "validation_matched" if task == "mnli" else "validation"

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

## Train the model

In [26]:
trainer.train()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5217 | 0.462327 | 0.460087 |
| 2 | 0.3527 | 0.519836 | 0.478542 |
| 3 | 0.2325 | 0.585863 | 0.512349 |
| 4 | 0.1748 | 0.758358 | 0.53997 |
| 5 | 0.1251 | 0.819873 | 0.532412 |


TrainOutput(global_step=2675, training_loss=0.2703213001857294, metrics={'train_runtime': 136.0028, 'train_samples_per_second': 314.369, 'train_steps_per_second': 19.669, 'total_flos': 229437415353012.0, 'train_loss': 0.2703213001857294, 'epoch': 5.0})

# step8: Evaluate the model

In [27]:
trainer.evaluate()

{'eval_loss': 0.7583578824996948,
 'eval_matthews_correlation': 0.5399695537530284,
 'eval_runtime': 0.642,
 'eval_samples_per_second': 1624.658,
 'eval_steps_per_second': 102.807,
 'epoch': 5.0}
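`trainer.evaluate()` reports an `eval_loss` of about 0.7584 and a Matthews correlation of about 0.5400, which match epoch 4 of the training log rather than epoch 5. That is consistent with `load_best_model_at_end=True` and `metric_for_best_model="matthews_correlation"`: after training, `Trainer` reloads the checkpoint with the best validation metric (higher is better for any metric whose name does not end in "loss"), while `'epoch': 5.0` only records how far training ran. The sketch below repeats that selection on the logged values; `best_checkpoint` is a hypothetical helper, not the `Trainer` internals.

```python
# Per-epoch validation results copied from the training log above
history = [
    {"epoch": 1, "eval_loss": 0.462327, "matthews_correlation": 0.460087},
    {"epoch": 2, "eval_loss": 0.519836, "matthews_correlation": 0.478542},
    {"epoch": 3, "eval_loss": 0.585863, "matthews_correlation": 0.512349},
    {"epoch": 4, "eval_loss": 0.758358, "matthews_correlation": 0.53997},
    {"epoch": 5, "eval_loss": 0.819873, "matthews_correlation": 0.532412},
]

def best_checkpoint(history, metric_name, greater_is_better=True):
    """Return the log entry with the best value of metric_name."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[metric_name])

best = best_checkpoint(history, "matthews_correlation")
print(best["epoch"], best["eval_loss"])  # 4 0.758358
```

Selecting on `eval_loss` instead (with `greater_is_better=False`) would have kept epoch 1, which shows how much the choice of `metric_for_best_model` matters here.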

# step9: Save the model

In [28]:
trainer.save_model('./cola')

In [29]:
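# Minimal map-style dataset that wraps a tokenizer output so it can be passed to trainer.predict
class SimpleDataset:
    def __init__(self, tokenized_texts):
        self.tokenized_texts = tokenized_texts
    
    def __len__(self):
        # Number of encoded texts
        return len(self.tokenized_texts["input_ids"])
    
    def __getitem__(self, idx):
        # One example: {"input_ids": [...], "attention_mask": [...]}
        return {k: v[idx] for k, v in self.tokenized_texts.items()}

texts = ["Hello, this one sentence!", "And this sentence goes with it."]
# padding=True pads both texts to the length of the longer one
tokenized_texts = tokenizer(texts, padding=True, truncation=True)
new_dataset = SimpleDataset(tokenized_texts)
trainer.predict(new_dataset)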

PredictionOutput(predictions=array([[-2.0816252,  2.294584 ],
       [-2.1686707,  2.3378236]], dtype=float32), label_ids=None, metrics={'test_runtime': 0.017, 'test_samples_per_second': 117.455, 'test_steps_per_second': 58.727})

In [30]:
tokenized_texts = tokenizer(["They drank the pub.", "The professor talked us into a stupor."]
                            , padding=True, truncation=True)
new_dataset = SimpleDataset(tokenized_texts)
trainer.predict(new_dataset)

PredictionOutput(predictions=array([[-2.0677142,  2.164078 ],
       [-2.6961114,  2.9182317]], dtype=float32), label_ids=None, metrics={'test_runtime': 0.016, 'test_samples_per_second': 125.261, 'test_steps_per_second': 62.631})

In [31]:
tokenized_texts = tokenizer(["Hello there!", "This is another text"]
                            , padding=True, truncation=True)
new_dataset = SimpleDataset(tokenized_texts)
trainer.predict(new_dataset)

PredictionOutput(predictions=array([[-2.6207142,  2.7085028],
       [-2.2014954,  2.3939314]], dtype=float32), label_ids=None, metrics={'test_runtime': 0.0156, 'test_samples_per_second': 128.439, 'test_steps_per_second': 64.22})
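`trainer.predict` returns raw logits, not labels. A row-wise softmax turns each row into class probabilities; for CoLA, index 0 is `unacceptable` and index 1 is `acceptable`, the label names that `show_random_elements` displays. The sketch below applies this to the last pair of logits printed above; the label list is copied from the dataset's `ClassLabel`, not read from the model config.

```python
import numpy as np

LABEL_NAMES = ["unacceptable", "acceptable"]  # CoLA ClassLabel names, index = class id

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Logits for "Hello there!" and "This is another text", copied from the output above
logits = np.array([[-2.6207142, 2.7085028], [-2.2014954, 2.3939314]])
probs = softmax(logits)
for row in probs:
    label_id = int(row.argmax())
    print(LABEL_NAMES[label_id], round(float(row[label_id]), 4))
```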

# step7: Tuning

## Hyperparameter search
- The `Trainer` supports hyperparameter search using [optuna](https://optuna.org/) or [Ray Tune](https://docs.ray.io/en/latest/tune/). This last section requires one of those libraries: uncomment the matching line in the next cell and run it.

In [32]:
model_checkpoint

'distilbert-base-uncased'

In [33]:
#! pip install optuna
#! pip install ray[tune]

In [34]:
# Trainer calls model_init to build a fresh model for every trial
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

In [35]:
trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [36]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2023-11-05 01:02:27,713] A new study created in memory with name: no-name-bc6df64a-97b3-409d-a0b3-61bf08ec5e40

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | No log | 0.533316 | 0.355453 |
| 2 | No log | 0.500787 | 0.43145 |
| 3 | No log | 0.504701 | 0.435431 |

[I 2023-11-05 01:03:19,568] Trial 0 finished with value: 0.4354313415465737 and parameters: {'learning_rate': 1.0667576859345529e-05, 'num_train_epochs': 3, 'seed': 6, 'per_device_train_batch_size': 64}. Best is trial 0 with value: 0.4354313415465737.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5399 | 0.533439 | 0.366596 |
| 2 | 0.382 | 0.668247 | 0.430677 |
| 3 | 0.2462 | 0.959647 | 0.430553 |
| 4 | 0.1325 | 1.073776 | 0.464223 |

[I 2023-11-05 01:06:30,159] Trial 1 finished with value: 0.46422323628946977 and parameters: {'learning_rate': 6.599744206500645e-05, 'num_train_epochs': 4, 'seed': 9, 'per_device_train_batch_size': 8}. Best is trial 1 with value: 0.46422323628946977.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | No log | 0.512989 | 0.394343 |
| 2 | 0.495100 | 0.516761 | 0.432928 |
| 3 | 0.495100 | 0.527688 | 0.429462 |
| 4 | 0.344800 | 0.531088 | 0.447995 |

[I 2023-11-05 01:08:26,920] Trial 2 finished with value: 0.44799494293545944 and parameters: {'learning_rate': 1.0926819029622011e-05, 'num_train_epochs': 4, 'seed': 11, 'per_device_train_batch_size': 32}. Best is trial 1 with value: 0.46422323628946977.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5135 | 0.52926 | 0.40346 |
| 2 | 0.4092 | 0.507549 | 0.44253 |
| 3 | 0.3328 | 0.612596 | 0.473412 |
| 4 | 0.2937 | 0.730784 | 0.498608 |
| 5 | 0.2809 | 0.764165 | 0.484539 |

[I 2023-11-05 01:12:55,617] Trial 3 finished with value: 0.48453921534304883 and parameters: {'learning_rate': 6.288787932757869e-06, 'num_train_epochs': 5, 'seed': 40, 'per_device_train_batch_size': 8}. Best is trial 3 with value: 0.48453921534304883.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | No log | 0.572184 | 0.0 |
| 2 | No log | 0.525554 | 0.380985 |
| 3 | No log | 0.514579 | 0.412227 |
| 4 | 0.512100 | 0.513608 | 0.432808 |
| 5 | 0.512100 | 0.508384 | 0.433597 |

[I 2023-11-05 01:15:11,489] Trial 4 finished with value: 0.4335972500415117 and parameters: {'learning_rate': 5.326254733649814e-06, 'num_train_epochs': 5, 'seed': 21, 'per_device_train_batch_size': 64}. Best is trial 3 with value: 0.48453921534304883.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5541 | 0.610807 | 0.418298 |

[I 2023-11-05 01:16:44,082] Trial 5 pruned. 

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | No log | 0.471578 | 0.429998 |
| 2 | 0.448900 | 0.464675 | 0.515768 |
| 3 | 0.448900 | 0.53747 | 0.506716 |
| 4 | 0.221000 | 0.614644 | 0.511795 |
| 5 | 0.221000 | 0.66016 | 0.512324 |

[I 2023-11-05 01:18:45,860] Trial 6 finished with value: 0.5123240976315012 and parameters: {'learning_rate': 2.237532596962579e-05, 'num_train_epochs': 5, 'seed': 36, 'per_device_train_batch_size': 32}. Best is trial 6 with value: 0.5123240976315012.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5039 | 0.476089 | 0.469169 |
| 2 | 0.3105 | 0.642732 | 0.529477 |

[I 2023-11-05 01:21:19,261] Trial 7 finished with value: 0.5294768861655004 and parameters: {'learning_rate': 4.05349397012981e-05, 'num_train_epochs': 2, 'seed': 3, 'per_device_train_batch_size': 8}. Best is trial 7 with value: 0.5294768861655004.

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5295 | 0.544174 | 0.322108 |

[I 2023-11-05 01:22:10,525] Trial 8 pruned. 

| Epoch | Training Loss | Validation Loss | Matthews Correlation |
|---|---|---|---|
| 1 | 0.5279 | 0.564417 | 0.388571 |

[I 2023-11-05 01:23:54,620] Trial 9 pruned. 
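`hyperparameter_search` returns a `BestRun` whose `hyperparameters` attribute holds the winning configuration; in this log that is trial 7 (value about 0.5295, `learning_rate` about 4.05e-05, 2 epochs, seed 3, batch size 8). Trials 5, 8 and 9 were pruned, i.e. stopped early after their first epoch. Each finished trial's value equals the Matthews correlation of its last epoch, not its best one: trial 6 reports 0.5123 although its epoch 2 reached 0.5158. The sketch below extracts the same information from the Optuna log lines with the standard library; `parse_trials` is a hypothetical helper that assumes the exact log format shown above.

```python
import ast
import re

# Matches lines such as "Trial 7 finished with value: 0.52... and parameters: {...}."
TRIAL_RE = re.compile(
    r"Trial (\d+) finished with value: ([-\d.eE+]+) and parameters: (\{.*?\})\."
)

def parse_trials(log_text):
    """Return (trial_id, value, params) for every finished (non-pruned) trial."""
    trials = []
    for match in TRIAL_RE.finditer(log_text):
        params = ast.literal_eval(match.group(3))  # the dict is printed as a Python literal
        trials.append((int(match.group(1)), float(match.group(2)), params))
    return trials

log = """
Trial 6 finished with value: 0.5123240976315012 and parameters: {'learning_rate': 2.237532596962579e-05, 'num_train_epochs': 5, 'seed': 36, 'per_device_train_batch_size': 32}. Best is trial 6 with value: 0.5123240976315012.
Trial 7 finished with value: 0.5294768861655004 and parameters: {'learning_rate': 4.05349397012981e-05, 'num_train_epochs': 2, 'seed': 3, 'per_device_train_batch_size': 8}. Best is trial 7 with value: 0.5294768861655004.
Trial 8 pruned.
"""

best_id, best_value, best_params = max(parse_trials(log), key=lambda t: t[1])
print(best_id, best_params["num_train_epochs"], best_params["per_device_train_batch_size"])  # 7 2 8
```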
