# Running Pika for CARE

To run Pika for CARE and retrain the model you first need to create an environment for Pika.

This requires creating a new environment and then running this notebook from within that environmemnt.

Pika can be found here: https://github.com/EMCarrami/Pika

### Installation/setup of Pika environment
```
conda create --name pika python=3.10
```

```
conda activate pika
```

```
pip install git+https://github.com/EMCarrami/Pika.git
```

In [2]:
import sys
from pika.main import Pika
from pika.utils.helpers import load_config
import warnings
import logging

warnings.filterwarnings("ignore")
logging.getLogger("transformers").setLevel(logging.ERROR)

  from .autonotebook import tqdm as notebook_tqdm


### Running Pika and retraining

In order to run and re-train Pika you need to create the datasets in the same format. 

This requires creating a `metrics`, `sequences`, `split` and `annotations` file. Additionally the config needs to be updated to refelect this:

```
{
  "seed": 7,
  "datamodule": {
    "sequence_data_path": "sequences.csv",
    "annotations_path": "annotations.csv",
    "metrics_data_path": "metrics.csv",
    "split_path": "split.csv",
    "max_protein_length": 1500,
    "max_text_length": 250,
    "data_types_to_use": ["qa"],
    "sequence_placeholder": "<protein sequence placeholder> ",
    "train_batch_size": 10,
    "eval_batch_size": 2,
    "num_workers": 0
  },
  "model": {
    "language_model": "gpt2",
    "protein_model": "esm2_t6_8M_UR50D",
    "multimodal_strategy": "self-pika",
    "protein_layer_to_use": -1,
    "perceiver_latent_size": 10,
    "num_perceiver_layers": 4,
    "multimodal_layers": [0],
    "enable_gradient_checkpointing": false,
    "lr": 1e-4,
    "weight_decay": 1e-4
  },
  "checkpoint_callback": {
    "checkpoint_path": "test_checkpoint",
    "save_partial_checkpoints": true,
    "checkpoint_monitors": ["loss/val_loss"],
    "checkpoint_modes": ["min"]
  },
  "trainer": {
    "max_epochs": 2,
    "limit_train_batches": 100,
    "limit_val_batches": 1,
    "limit_test_batches": 100
  }
}

```

In [3]:
import sys
from pika.main import Pika
from pika.utils.helpers import load_config
import warnings
import logging
import pandas as pd
import numpy as np


ec_column = 'EC All'

df_train = pd.read_csv('../../splits/task1/protein_train.csv')
rows = []
for entry, seq, ec in df_train[['Entry', 'Sequence', ec_column]].values:
    rows.append([entry, 'qa', f"What is the EC number of this protein? {ec}"])
    
sample_annotations = pd.DataFrame(rows, columns=['uniprot_id', 'type', 'annotation'])
sample_annotations.to_csv('annotations.csv', index=False)

# Also split into a train test and validation set for the model training
from sklearn.model_selection import train_test_split

train, test = train_test_split(df_train, test_size=0.3)
rows = []
for entry, seq, ec in df_train[['Entry', 'Sequence', ec_column]].values:
        rows.append([entry, len(seq), 'train'])
    
for entry, seq, ec in test[['Entry', 'Sequence', ec_column]].values[:int(0.5*(len(test)))]:
    rows.append([entry, len(seq), 'test'])

for entry, seq, ec in test[['Entry', 'Sequence', ec_column]].values[int(0.5*(len(test))):]:
    rows.append([entry, len(seq), 'val'])
    
sample_split = pd.DataFrame(rows, columns=['uniprot_id' , 'protein_length', 'split'])

sample_split.to_csv('split.csv', index=False)

# Next we need to make the metrics
# uniprot_id,metric,value
# A0A068BGA5,is_enzyme,True


## Pika requires knowing other info about the enzyme

Even though this likely doesn't affect the re-training, we update this information as well.

Download the dataset from Pika (they used ChatGPT3.5 to extract the metrics for each protein).

So we wget the metrics and use this to fill in the metrics for the training dataset: 

https://huggingface.co/datasets/EMCarrami/Pika-DS/tree/main/dataset

```
wget https://huggingface.co/datasets/EMCarrami/Pika-DS/resolve/main/dataset/pika_metrics.csv
```



In [6]:
metrics_df = pd.read_csv('pika_metrics.csv')
metrics_df

Unnamed: 0,uniprot_id,metric,value
0,A0A009IHW8,in_membrane,False
1,A0A009IHW8,in_nucleus,False
2,A0A009IHW8,in_mitochondria,False
3,A0A009IHW8,is_enzyme,True
4,A0A009IHW8,mw,30922
...,...,...,...
1432678,W6Q4Q9,in_nucleus,False
1432679,W6Q4Q9,in_mitochondria,False
1432680,W6Q4Q9,is_enzyme,True
1432681,W6Q4Q9,cofactor,mg(2+)


In [7]:
metrics_df = metrics_df[metrics_df['uniprot_id'].isin(list(set(df_train['Entry'].values)))]
metrics_df

Unnamed: 0,uniprot_id,metric,value
0,A0A009IHW8,in_membrane,False
1,A0A009IHW8,in_nucleus,False
2,A0A009IHW8,in_mitochondria,False
3,A0A009IHW8,is_enzyme,True
4,A0A009IHW8,mw,30922
...,...,...,...
1432660,S3DQP8,in_nucleus,False
1432661,S3DQP8,in_mitochondria,False
1432662,S3DQP8,is_enzyme,True
1432663,S3DQP8,cofactor,pyridoxal 5'-phosphate (plp)


In [8]:

from sklearn.model_selection import train_test_split
from tqdm import tqdm

# A0A084R1H6,in_membrane,False
# A0A084R1H6,in_nucleus,False
# A0A084R1H6,in_mitochondria,False
# A0A084R1H6,is_enzyme,True
# A0A084R1H6,mw,263256
rows = []
for entry, seq, ec in tqdm(df_train[['Entry', 'Sequence', ec_column]].values):
    metrics = metrics_df[metrics_df['uniprot_id'] == entry]
    # now we can assign the map for each one
    for metric_name, value in metrics[['metric', 'value']].values:
        rows.append([entry, metric_name, value])

sample_metrics = pd.DataFrame(rows, columns=['uniprot_id' , 'metric', 'value'])
sample_metrics.to_csv('metrics.csv', index=False)


  0%|          | 0/184529 [00:00<?, ?it/s]

100%|██████████| 184529/184529 [59:15<00:00, 51.90it/s] 


In [2]:
import pandas as pd
pd.read_csv('metrics.csv')

Unnamed: 0,uniprot_id,metric,value
0,A0A009IHW8,in_membrane,False
1,A0A009IHW8,in_nucleus,False
2,A0A009IHW8,in_mitochondria,False
3,A0A009IHW8,is_enzyme,True
4,A0A009IHW8,mw,30922
...,...,...,...
451840,Q9J5H2,in_membrane,False
451841,Q9J5H2,in_nucleus,False
451842,Q9J5H2,in_mitochondria,False
451843,Q9J5H2,is_enzyme,True


## Formatting the sequence dataset

We do the same thing with formatting the sequence dataset.

```
wget https://huggingface.co/datasets/EMCarrami/Pika-DS/resolve/main/dataset/pika_sequences.csv
```

In [6]:
seq_df = pd.read_csv('pika_sequences.csv')
# seq_df = seq_df[seq_df['uniprot_id'].isin(list(set(df_train['Entry'].values)))]
# seq_df

In [15]:
seq_df

Unnamed: 0,uniprot_id,uniref_cluster,taxonomy,sequence,length,mw,num_fields,num_summary,num_qa
0,A0A009IHW8,UniRef50_A0A009IHW8,"Bacteria, Pseudomonadota, Gammaproteobacteria",MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENA...,269,30922,5,6,11
1,A0A067XGX8,UniRef50_A0A067XH53,"Eukaryota, Viridiplantae, Streptophyta",MALTATATTRGGSALPNSCLQTPKFQSLQKPTFISSFPTNKKTKPR...,512,57062,7,5,8
2,A0A067XH53,UniRef50_A0A067XH53,"Eukaryota, Viridiplantae, Streptophyta",MALSTNSTTSSLLPKTPLVQQPLLKNASLPTTTKAIRFIQPISAIH...,533,58894,7,7,8
3,A0A068BGA5,UniRef50_A0A068BGA5,"Eukaryota, Viridiplantae, Streptophyta",MASFPPSLVFTVRRKEPILVLPSKPTPRELKQLSDIDDQEGLRFQV...,456,50972,4,4,6
4,A0A072VHJ1,UniRef50_A0A072VHJ1,"Eukaryota, Viridiplantae, Streptophyta",MSGVPFPSNLLPSPSSPEWLSKADNAWQLMAATLVGMQSVPGLIIL...,481,52906,4,4,7
...,...,...,...,...,...,...,...,...,...
257162,Q9ZWB3,UniRef50_Q9ZWB3,"Eukaryota, Viridiplantae, Streptophyta",MDLVIGGKFKLGRKIGSGSFGELYLGINVQTGEEVAVKLESVKTKH...,471,53002,6,7,8
257163,S3DQP8,UniRef50_S3DQP8,"Eukaryota, Fungi, Dikarya",MTENFPLPPLLGVDWDHLGFEPLEVNGHVECTFSTTTSCWTEPVFV...,358,38895,5,6,7
257164,V6F510,UniRef50_V6F510,"Bacteria, Pseudomonadota, Alphaproteobacteria",MKFENCRDCREEVVWWAFTADICMTLFKGILGLMSGSVALVADSLH...,297,31942,5,5,6
257165,W6KHH6,UniRef50_W6KHH6,"Bacteria, Pseudomonadota, Alphaproteobacteria",MTTAACRKCRDEVIWWAFFINIGQTTYKGVLGVLSGSAALVADAMH...,293,31446,5,6,8


In [10]:
id_to_cluster = dict(zip(seq_df['uniprot_id'], seq_df['uniref_cluster']))
id_to_tax = dict(zip(seq_df['uniprot_id'], seq_df['taxonomy']))
id_to_mw = dict(zip(seq_df['uniprot_id'], seq_df['mw']))
id_to_num_fields = dict(zip(seq_df['uniprot_id'], seq_df['num_fields']))
id_to_summary = dict(zip(seq_df['uniprot_id'], seq_df['num_summary']))



In [13]:
uniprot = pd.read_csv('../../pretrained/raw_data/uniprotkb_AND_reviewed_true_2024_10_21_annot_pika.tsv', sep='\t')
id_to_tax = dict(zip(uniprot['Entry'], uniprot['Taxonomic lineage']))
id_to_mw = dict(zip(uniprot['Entry'], uniprot['Mass']))
id_to_len = dict(zip(uniprot['Entry'], uniprot['Length']))

In [16]:
df_train['Entry'].value_counts()

Entry
Q04828    9
Q95JH6    9
Q2NM15    9
Q5REQ0    9
G8H5N0    8
         ..
B5E3K8    1
B5E3L5    1
B5E3S7    1
B5E3Z6    1
B5E327    1
Name: count, Length: 173550, dtype: int64

In [18]:
# This time we'll make a map for each to make this more efficient

# Save the training sequences
rows = []
for entry, seq, ec, clust in df_train[['Entry', 'Sequence', ec_column, 'clusterRes50']].values:
    rows.append([entry, f'UniRef50_{clust}', id_to_tax.get(entry), seq, len(seq), id_to_mw.get(entry), 1, 1, 1])
    
sample_seqs = pd.DataFrame(rows, columns=['uniprot_id', 'uniref_cluster', 'taxonomy', 'sequence', 'length', 'mw', 'num_fields', 'num_summary', 'num_qa'])

sample_seqs.to_csv('sequences.csv', index=False)

In [19]:
sample_seqs

Unnamed: 0,uniprot_id,uniref_cluster,taxonomy,sequence,length,mw,num_fields,num_summary,num_qa
0,A0A009IHW8,UniRef50_A0A009IHW8,"cellular organisms (no rank), Bacteria (superk...",MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENA...,269,30922.0,1,1,1
1,A0A023I7E1,UniRef50_A0A023I7E1,"cellular organisms (no rank), Eukaryota (super...",MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIF...,796,89495.0,1,1,1
2,A0A024SC78,UniRef50_A8QPD8,"cellular organisms (no rank), Eukaryota (super...",MRSLAILTTLLAGHAFAYPKPAPQSVNRRDWPSINEFLSELAKVMP...,248,25924.0,1,1,1
3,A0A024SH76,UniRef50_A1CCN4,"cellular organisms (no rank), Eukaryota (super...",MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCA...,471,49653.0,1,1,1
4,A0A044RE18,UniRef50_A0A044RE18,"cellular organisms (no rank), Eukaryota (super...",MYWQLVRILVLFDCLQKILAIEHDSICIADVDDACPEPSHTVMRLR...,693,76800.0,1,1,1
...,...,...,...,...,...,...,...,...,...
184524,Q05115,UniRef50_Q05115,"cellular organisms (no rank), Bacteria (superk...",MQQASTPTIGMIVPPAAGLVPADGARLYPDLPFIASGLGLGSVTPE...,240,24735.0,1,1,1
184525,Q6HX62,UniRef50_Q6HX62,"cellular organisms (no rank), Bacteria (superk...",MGQNQFRWSNEQLREHVEIIDGTRSPHKLLKNATYLNSYIREWMQA...,584,66760.0,1,1,1
184526,Q6L032,UniRef50_Q6L032,"cellular organisms (no rank), Archaea (superki...",MLLKNIKISNDYNIFMIIASRKPSLKDIYKIIKVSKFDEPADLIIE...,573,65635.0,1,1,1
184527,Q94MV8,UniRef50_P39262,"Viruses (superkingdom), Duplodnaviria (clade),...",MAHFNECAHLIEGVDKANRAYAENIMHNIDPLQVMLDMQRHLQIRL...,172,20191.0,1,1,1


## Re-train Pika

In [20]:
# Make the model 
# prep config
assets_path = "../assets/"
config = load_config("pika_config.json")
config["datamodule"]["split_path"] = "split.csv"
model = Pika(config)
model.train()

# For each of the test sets we want to 
splits = ['30', '30-50', 'price', 'promiscuous']
rows = []
for split in splits: 
    df_test = pd.read_csv(f'../.../splits/task1/{split}_protein_test.csv')
    
    for entry, seq in df_test[['Entry', 'Sequence']].values:
        ec = model.enquire(
            proteins=seq,
            question="What is the EC number of this protein?"
        )
        rows.append([split, seq, entry, '|'.join(ec)])
saving_df = pd.DataFrame(rows, columns=['Split', 'seq', 'Entry', 'EC'])
saving_df.to_csv(f'all_test_datasets_output.csv', index=False)


### Save the results now individually 

df = saving_df.copy()

# The datasets we want to go through
splits = ['30', '30-50', 'price', 'promiscuous']

for split in splits:
    
    # Entry,EC number,
    sub_df = df[df['Split'] == split]
    # Make the enrty to the EC 
    test_df = pd.read_csv(f'../../splits/task1/{split}_protein_test.csv')

    # Make sure the EC is clean
    sub_df['EC number'] = [e.strip() for e in sub_df['EC'].values]
    
    # Make the EC format the same as the other datasets
    entry_to_ec = dict(zip(sub_df['Entry'], sub_df['EC number']))
    test_df['0'] = [entry_to_ec.get(e) for e in test_df['Entry'].values]
    
    test_df.to_csv(f'../results_summary/Pika/{split}_protein_test_results_df.csv', index=False)

Seed set to 7
Using cache found in /disk1/ariane/.cache/torch/hub/facebookresearch_esm_main
[32m2024-10-23 16:42:40.646[0m | [1mINFO    [0m | [36mpika.datamodule.pika_datamodule[0m:[36m__init__[0m:[36m76[0m - [1mloading data from sequences.csv, annotations.csv & metrics.csv[0m
[32m2024-10-23 16:42:42.450[0m | [1mINFO    [0m | [36mpika.datamodule.pika_datamodule[0m:[36m__init__[0m:[36m130[0m - [1mpreparing examples[0m
[32m2024-10-23 16:42:42.473[0m | [1mINFO    [0m | [36mpika.datamodule.pika_torch_datasets[0m:[36m__init__[0m:[36m25[0m - [1mpreparing train dataset[0m
[32m2024-10-23 16:42:42.549[0m | [1mINFO    [0m | [36mpika.datamodule.pika_torch_datasets[0m:[36m__init__[0m:[36m25[0m - [1mpreparing val dataset[0m
[32m2024-10-23 16:42:42.622[0m | [1mINFO    [0m | [36mpika.datamodule.pika_torch_datasets[0m:[36m__init__[0m:[36m69[0m - [1mpreparing val metrics dataset[0m
GPU available: True (cuda), used: True
TPU available: False, u

Epoch 0: 100%|██████████| 100/100 [00:06<00:00, 15.26it/s, v_num=0, loss/train_loss=1.200, loss/val_loss=0.860]

Epoch 0, global step 100: 'loss/val_loss' reached 0.86033 (best 0.86033), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=0-step=100.ckpt' as top 1


Epoch 1: 100%|██████████| 100/100 [00:05<00:00, 16.78it/s, v_num=0, loss/train_loss=0.647, loss/val_loss=0.629]

Epoch 1, global step 200: 'loss/val_loss' reached 0.62949 (best 0.62949), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=1-step=200.ckpt' as top 1


Epoch 2: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.550, loss/val_loss=0.651]

Epoch 2, global step 300: 'loss/val_loss' was not in top 1


Epoch 3: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.702, loss/val_loss=0.589]

Epoch 3, global step 400: 'loss/val_loss' reached 0.58864 (best 0.58864), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=3-step=400.ckpt' as top 1


Epoch 4: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.433, loss/val_loss=0.633]

Epoch 4, global step 500: 'loss/val_loss' was not in top 1


Epoch 5: 100%|██████████| 100/100 [00:06<00:00, 15.16it/s, v_num=0, loss/train_loss=0.454, loss/val_loss=0.648]

Epoch 5, global step 600: 'loss/val_loss' was not in top 1


Epoch 6: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.559, loss/val_loss=0.599]

Epoch 6, global step 700: 'loss/val_loss' was not in top 1


Epoch 7: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.417, loss/val_loss=0.614]

Epoch 7, global step 800: 'loss/val_loss' was not in top 1


Epoch 8: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.529, loss/val_loss=0.591]

Epoch 8, global step 900: 'loss/val_loss' was not in top 1


Epoch 9: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.464, loss/val_loss=0.613]

Epoch 9, global step 1000: 'loss/val_loss' was not in top 1


Epoch 10: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.544, loss/val_loss=0.609]

Epoch 10, global step 1100: 'loss/val_loss' was not in top 1


Epoch 11: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.522, loss/val_loss=0.618]

Epoch 11, global step 1200: 'loss/val_loss' was not in top 1


Epoch 12: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.471, loss/val_loss=0.604]

Epoch 12, global step 1300: 'loss/val_loss' was not in top 1


Epoch 13: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.518, loss/val_loss=0.604]

Epoch 13, global step 1400: 'loss/val_loss' was not in top 1


Epoch 14: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.417, loss/val_loss=0.602]

Epoch 14, global step 1500: 'loss/val_loss' was not in top 1


Epoch 15: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.346, loss/val_loss=0.629]

Epoch 15, global step 1600: 'loss/val_loss' was not in top 1


Epoch 16: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.477, loss/val_loss=0.671]

Epoch 16, global step 1700: 'loss/val_loss' was not in top 1


Epoch 17: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.484, loss/val_loss=0.664]

Epoch 17, global step 1800: 'loss/val_loss' was not in top 1


Epoch 18: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.309, loss/val_loss=0.567]

Epoch 18, global step 1900: 'loss/val_loss' reached 0.56679 (best 0.56679), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=18-step=1900.ckpt' as top 1


Epoch 19: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.400, loss/val_loss=0.607]

Epoch 19, global step 2000: 'loss/val_loss' was not in top 1


Epoch 20: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.354, loss/val_loss=0.641]

Epoch 20, global step 2100: 'loss/val_loss' was not in top 1


Epoch 21: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.326, loss/val_loss=0.588]

Epoch 21, global step 2200: 'loss/val_loss' was not in top 1


Epoch 22: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.354, loss/val_loss=0.662]

Epoch 22, global step 2300: 'loss/val_loss' was not in top 1


Epoch 23: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.403, loss/val_loss=0.597]

Epoch 23, global step 2400: 'loss/val_loss' was not in top 1


Epoch 24: 100%|██████████| 100/100 [00:06<00:00, 15.10it/s, v_num=0, loss/train_loss=0.420, loss/val_loss=0.637]

Epoch 24, global step 2500: 'loss/val_loss' was not in top 1


Epoch 25: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.429, loss/val_loss=0.634]

Epoch 25, global step 2600: 'loss/val_loss' was not in top 1


Epoch 26: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.422, loss/val_loss=0.673]

Epoch 26, global step 2700: 'loss/val_loss' was not in top 1


Epoch 27: 100%|██████████| 100/100 [00:06<00:00, 15.21it/s, v_num=0, loss/train_loss=0.435, loss/val_loss=0.673]

Epoch 27, global step 2800: 'loss/val_loss' was not in top 1


Epoch 28: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.385, loss/val_loss=0.657]

Epoch 28, global step 2900: 'loss/val_loss' was not in top 1


Epoch 29: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.382, loss/val_loss=0.607]

Epoch 29, global step 3000: 'loss/val_loss' was not in top 1


Epoch 30: 100%|██████████| 100/100 [00:06<00:00, 15.14it/s, v_num=0, loss/train_loss=0.359, loss/val_loss=0.714]

Epoch 30, global step 3100: 'loss/val_loss' was not in top 1


Epoch 31: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.436, loss/val_loss=0.641]

Epoch 31, global step 3200: 'loss/val_loss' was not in top 1


Epoch 32: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.406, loss/val_loss=0.658]

Epoch 32, global step 3300: 'loss/val_loss' was not in top 1


Epoch 33: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.315, loss/val_loss=0.649]

Epoch 33, global step 3400: 'loss/val_loss' was not in top 1


Epoch 34: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.382, loss/val_loss=0.721]

Epoch 34, global step 3500: 'loss/val_loss' was not in top 1


Epoch 35: 100%|██████████| 100/100 [00:06<00:00, 15.96it/s, v_num=0, loss/train_loss=0.428, loss/val_loss=0.742]

Epoch 35, global step 3600: 'loss/val_loss' was not in top 1


Epoch 36: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.303, loss/val_loss=0.635]

Epoch 36, global step 3700: 'loss/val_loss' was not in top 1


Epoch 37: 100%|██████████| 100/100 [00:06<00:00, 15.98it/s, v_num=0, loss/train_loss=0.343, loss/val_loss=0.706]

Epoch 37, global step 3800: 'loss/val_loss' was not in top 1


Epoch 38: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.276, loss/val_loss=0.590]

Epoch 38, global step 3900: 'loss/val_loss' was not in top 1


Epoch 39: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.412, loss/val_loss=0.578]

Epoch 39, global step 4000: 'loss/val_loss' was not in top 1


Epoch 40: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.380, loss/val_loss=0.705]

Epoch 40, global step 4100: 'loss/val_loss' was not in top 1


Epoch 41: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.374, loss/val_loss=0.653]

Epoch 41, global step 4200: 'loss/val_loss' was not in top 1


Epoch 42: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.381, loss/val_loss=0.687]

Epoch 42, global step 4300: 'loss/val_loss' was not in top 1


Epoch 43: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.467, loss/val_loss=0.713]

Epoch 43, global step 4400: 'loss/val_loss' was not in top 1


Epoch 44: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.347, loss/val_loss=0.734]

Epoch 44, global step 4500: 'loss/val_loss' was not in top 1


Epoch 45: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.392, loss/val_loss=0.700]

Epoch 45, global step 4600: 'loss/val_loss' was not in top 1


Epoch 46: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.413, loss/val_loss=0.856]

Epoch 46, global step 4700: 'loss/val_loss' was not in top 1


Epoch 47: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.291, loss/val_loss=0.744]

Epoch 47, global step 4800: 'loss/val_loss' was not in top 1


Epoch 48: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.464, loss/val_loss=0.736]

Epoch 48, global step 4900: 'loss/val_loss' was not in top 1


Epoch 49: 100%|██████████| 100/100 [00:06<00:00, 14.99it/s, v_num=0, loss/train_loss=0.243, loss/val_loss=0.786]

Epoch 49, global step 5000: 'loss/val_loss' was not in top 1


Epoch 50: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.294, loss/val_loss=0.758]

Epoch 50, global step 5100: 'loss/val_loss' was not in top 1


Epoch 51: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.353, loss/val_loss=0.720]

Epoch 51, global step 5200: 'loss/val_loss' was not in top 1


Epoch 52: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.331, loss/val_loss=0.733]

Epoch 52, global step 5300: 'loss/val_loss' was not in top 1


Epoch 53: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.305, loss/val_loss=0.737]

Epoch 53, global step 5400: 'loss/val_loss' was not in top 1


Epoch 54: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.343, loss/val_loss=0.753]

Epoch 54, global step 5500: 'loss/val_loss' was not in top 1


Epoch 55: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.334, loss/val_loss=0.741]

Epoch 55, global step 5600: 'loss/val_loss' was not in top 1


Epoch 56: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.293, loss/val_loss=0.889]

Epoch 56, global step 5700: 'loss/val_loss' was not in top 1


Epoch 57: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.314, loss/val_loss=0.830]

Epoch 57, global step 5800: 'loss/val_loss' was not in top 1


Epoch 58: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.276, loss/val_loss=0.810]

Epoch 58, global step 5900: 'loss/val_loss' was not in top 1


Epoch 59: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.390, loss/val_loss=0.720]

Epoch 59, global step 6000: 'loss/val_loss' was not in top 1


Epoch 60: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.391, loss/val_loss=0.851]

Epoch 60, global step 6100: 'loss/val_loss' was not in top 1


Epoch 61: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.403, loss/val_loss=0.808]

Epoch 61, global step 6200: 'loss/val_loss' was not in top 1


Epoch 62: 100%|██████████| 100/100 [00:06<00:00, 15.13it/s, v_num=0, loss/train_loss=0.296, loss/val_loss=0.822]

Epoch 62, global step 6300: 'loss/val_loss' was not in top 1


Epoch 63: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.312, loss/val_loss=0.782]

Epoch 63, global step 6400: 'loss/val_loss' was not in top 1


Epoch 64: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.324, loss/val_loss=0.798]

Epoch 64, global step 6500: 'loss/val_loss' was not in top 1


Epoch 65: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.264, loss/val_loss=0.822]

Epoch 65, global step 6600: 'loss/val_loss' was not in top 1


Epoch 66: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.238, loss/val_loss=0.772]

Epoch 66, global step 6700: 'loss/val_loss' was not in top 1


Epoch 67: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.279, loss/val_loss=0.741]

Epoch 67, global step 6800: 'loss/val_loss' was not in top 1


Epoch 68: 100%|██████████| 100/100 [00:06<00:00, 15.09it/s, v_num=0, loss/train_loss=0.356, loss/val_loss=0.721]

Epoch 68, global step 6900: 'loss/val_loss' was not in top 1


Epoch 69: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.728]

Epoch 69, global step 7000: 'loss/val_loss' was not in top 1


Epoch 70: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.752]

Epoch 70, global step 7100: 'loss/val_loss' was not in top 1


Epoch 71: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.310, loss/val_loss=0.726]

Epoch 71, global step 7200: 'loss/val_loss' was not in top 1


Epoch 72: 100%|██████████| 100/100 [00:06<00:00, 15.04it/s, v_num=0, loss/train_loss=0.244, loss/val_loss=0.664]

Epoch 72, global step 7300: 'loss/val_loss' was not in top 1


Epoch 73: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.273, loss/val_loss=0.722]

Epoch 73, global step 7400: 'loss/val_loss' was not in top 1


Epoch 74: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.335, loss/val_loss=0.664]

Epoch 74, global step 7500: 'loss/val_loss' was not in top 1


Epoch 75: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.706]

Epoch 75, global step 7600: 'loss/val_loss' was not in top 1


Epoch 76: 100%|██████████| 100/100 [00:06<00:00, 15.04it/s, v_num=0, loss/train_loss=0.282, loss/val_loss=0.679]

Epoch 76, global step 7700: 'loss/val_loss' was not in top 1


Epoch 77: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.315, loss/val_loss=0.732]

Epoch 77, global step 7800: 'loss/val_loss' was not in top 1


Epoch 78: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.308, loss/val_loss=0.722]

Epoch 78, global step 7900: 'loss/val_loss' was not in top 1


Epoch 79: 100%|██████████| 100/100 [00:06<00:00, 15.26it/s, v_num=0, loss/train_loss=0.326, loss/val_loss=0.743]

Epoch 79, global step 8000: 'loss/val_loss' was not in top 1


Epoch 80: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.320, loss/val_loss=0.751]

Epoch 80, global step 8100: 'loss/val_loss' was not in top 1


Epoch 81: 100%|██████████| 100/100 [00:06<00:00, 15.03it/s, v_num=0, loss/train_loss=0.300, loss/val_loss=0.739]

Epoch 81, global step 8200: 'loss/val_loss' was not in top 1


Epoch 82: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.319, loss/val_loss=0.697]

Epoch 82, global step 8300: 'loss/val_loss' was not in top 1


Epoch 83: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.261, loss/val_loss=0.770]

Epoch 83, global step 8400: 'loss/val_loss' was not in top 1


Epoch 84: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.305, loss/val_loss=0.740]

Epoch 84, global step 8500: 'loss/val_loss' was not in top 1


Epoch 85: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.358, loss/val_loss=0.709]

Epoch 85, global step 8600: 'loss/val_loss' was not in top 1


Epoch 86: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.137, loss/val_loss=0.699]

Epoch 86, global step 8700: 'loss/val_loss' was not in top 1


Epoch 87: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.392, loss/val_loss=0.715]

Epoch 87, global step 8800: 'loss/val_loss' was not in top 1


Epoch 88: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.303, loss/val_loss=0.713]

Epoch 88, global step 8900: 'loss/val_loss' was not in top 1


Epoch 89: 100%|██████████| 100/100 [00:05<00:00, 17.20it/s, v_num=0, loss/train_loss=0.259, loss/val_loss=0.729]

Epoch 89, global step 9000: 'loss/val_loss' was not in top 1


Epoch 90: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.359, loss/val_loss=0.741]

Epoch 90, global step 9100: 'loss/val_loss' was not in top 1


Epoch 91: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.283, loss/val_loss=0.712]

Epoch 91, global step 9200: 'loss/val_loss' was not in top 1


Epoch 92: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.250, loss/val_loss=0.651]

Epoch 92, global step 9300: 'loss/val_loss' was not in top 1


Epoch 93: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.262, loss/val_loss=0.609]

Epoch 93, global step 9400: 'loss/val_loss' was not in top 1


Epoch 94: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.339, loss/val_loss=0.668]

Epoch 94, global step 9500: 'loss/val_loss' was not in top 1


Epoch 95: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.280, loss/val_loss=0.665]

Epoch 95, global step 9600: 'loss/val_loss' was not in top 1


Epoch 96: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.312, loss/val_loss=0.682]

Epoch 96, global step 9700: 'loss/val_loss' was not in top 1


Epoch 97: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.448, loss/val_loss=0.584]

Epoch 97, global step 9800: 'loss/val_loss' was not in top 1


Epoch 98: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.178, loss/val_loss=0.600]

Epoch 98, global step 9900: 'loss/val_loss' was not in top 1


Epoch 99: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.332, loss/val_loss=0.559]

Epoch 99, global step 10000: 'loss/val_loss' reached 0.55868 (best 0.55868), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=99-step=10000.ckpt' as top 1


Epoch 100: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.285, loss/val_loss=0.567]

Epoch 100, global step 10100: 'loss/val_loss' was not in top 1


Epoch 101: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.212, loss/val_loss=0.548]

Epoch 101, global step 10200: 'loss/val_loss' reached 0.54766 (best 0.54766), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=101-step=10200.ckpt' as top 1


Epoch 102: 100%|██████████| 100/100 [00:06<00:00, 16.02it/s, v_num=0, loss/train_loss=0.291, loss/val_loss=0.626]

Epoch 102, global step 10300: 'loss/val_loss' was not in top 1


Epoch 103: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.245, loss/val_loss=0.680]

Epoch 103, global step 10400: 'loss/val_loss' was not in top 1


Epoch 104: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.286, loss/val_loss=0.720]

Epoch 104, global step 10500: 'loss/val_loss' was not in top 1


Epoch 105: 100%|██████████| 100/100 [00:06<00:00, 16.18it/s, v_num=0, loss/train_loss=0.294, loss/val_loss=0.753]

Epoch 105, global step 10600: 'loss/val_loss' was not in top 1


Epoch 106: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.317, loss/val_loss=0.806]

Epoch 106, global step 10700: 'loss/val_loss' was not in top 1


Epoch 107: 100%|██████████| 100/100 [00:06<00:00, 15.96it/s, v_num=0, loss/train_loss=0.164, loss/val_loss=0.674]

Epoch 107, global step 10800: 'loss/val_loss' was not in top 1


Epoch 108: 100%|██████████| 100/100 [00:06<00:00, 16.27it/s, v_num=0, loss/train_loss=0.257, loss/val_loss=0.605]

Epoch 108, global step 10900: 'loss/val_loss' was not in top 1


Epoch 109: 100%|██████████| 100/100 [00:06<00:00, 16.05it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.692]

Epoch 109, global step 11000: 'loss/val_loss' was not in top 1


Epoch 110: 100%|██████████| 100/100 [00:06<00:00, 16.05it/s, v_num=0, loss/train_loss=0.247, loss/val_loss=0.584]

Epoch 110, global step 11100: 'loss/val_loss' was not in top 1


Epoch 111: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.291, loss/val_loss=0.647]

Epoch 111, global step 11200: 'loss/val_loss' was not in top 1


Epoch 112: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.644]

Epoch 112, global step 11300: 'loss/val_loss' was not in top 1


Epoch 113: 100%|██████████| 100/100 [00:06<00:00, 16.12it/s, v_num=0, loss/train_loss=0.370, loss/val_loss=0.580]

Epoch 113, global step 11400: 'loss/val_loss' was not in top 1


Epoch 114: 100%|██████████| 100/100 [00:06<00:00, 16.18it/s, v_num=0, loss/train_loss=0.310, loss/val_loss=0.589]

Epoch 114, global step 11500: 'loss/val_loss' was not in top 1


Epoch 115: 100%|██████████| 100/100 [00:06<00:00, 16.35it/s, v_num=0, loss/train_loss=0.222, loss/val_loss=0.583]

Epoch 115, global step 11600: 'loss/val_loss' was not in top 1


Epoch 116: 100%|██████████| 100/100 [00:06<00:00, 16.33it/s, v_num=0, loss/train_loss=0.214, loss/val_loss=0.596]

Epoch 116, global step 11700: 'loss/val_loss' was not in top 1


Epoch 117: 100%|██████████| 100/100 [00:06<00:00, 16.01it/s, v_num=0, loss/train_loss=0.219, loss/val_loss=0.537]

Epoch 117, global step 11800: 'loss/val_loss' reached 0.53735 (best 0.53735), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=117-step=11800.ckpt' as top 1


Epoch 118: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.223, loss/val_loss=0.655]

Epoch 118, global step 11900: 'loss/val_loss' was not in top 1


Epoch 119: 100%|██████████| 100/100 [00:06<00:00, 16.17it/s, v_num=0, loss/train_loss=0.348, loss/val_loss=0.581]

Epoch 119, global step 12000: 'loss/val_loss' was not in top 1


Epoch 120: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.631]

Epoch 120, global step 12100: 'loss/val_loss' was not in top 1


Epoch 121: 100%|██████████| 100/100 [00:06<00:00, 16.12it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.695]

Epoch 121, global step 12200: 'loss/val_loss' was not in top 1


Epoch 122: 100%|██████████| 100/100 [00:06<00:00, 16.03it/s, v_num=0, loss/train_loss=0.272, loss/val_loss=0.710]

Epoch 122, global step 12300: 'loss/val_loss' was not in top 1


Epoch 123: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.237, loss/val_loss=0.790]

Epoch 123, global step 12400: 'loss/val_loss' was not in top 1


Epoch 124: 100%|██████████| 100/100 [00:06<00:00, 15.97it/s, v_num=0, loss/train_loss=0.327, loss/val_loss=0.639]

Epoch 124, global step 12500: 'loss/val_loss' was not in top 1


Epoch 125: 100%|██████████| 100/100 [00:06<00:00, 16.35it/s, v_num=0, loss/train_loss=0.307, loss/val_loss=0.701]

Epoch 125, global step 12600: 'loss/val_loss' was not in top 1


Epoch 126: 100%|██████████| 100/100 [00:05<00:00, 16.99it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.686]

Epoch 126, global step 12700: 'loss/val_loss' was not in top 1


Epoch 127: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.681]

Epoch 127, global step 12800: 'loss/val_loss' was not in top 1


Epoch 128: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.244, loss/val_loss=0.766]

Epoch 128, global step 12900: 'loss/val_loss' was not in top 1


Epoch 129: 100%|██████████| 100/100 [00:06<00:00, 15.99it/s, v_num=0, loss/train_loss=0.348, loss/val_loss=0.710]

Epoch 129, global step 13000: 'loss/val_loss' was not in top 1


Epoch 130: 100%|██████████| 100/100 [00:06<00:00, 16.54it/s, v_num=0, loss/train_loss=0.213, loss/val_loss=0.744]

Epoch 130, global step 13100: 'loss/val_loss' was not in top 1


Epoch 131: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.316, loss/val_loss=0.727]

Epoch 131, global step 13200: 'loss/val_loss' was not in top 1


Epoch 132: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.175, loss/val_loss=0.681]

Epoch 132, global step 13300: 'loss/val_loss' was not in top 1


Epoch 133: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.233, loss/val_loss=0.730]

Epoch 133, global step 13400: 'loss/val_loss' was not in top 1


Epoch 134: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.250, loss/val_loss=0.737]

Epoch 134, global step 13500: 'loss/val_loss' was not in top 1


Epoch 135: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.259, loss/val_loss=0.706]

Epoch 135, global step 13600: 'loss/val_loss' was not in top 1


Epoch 136: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.257, loss/val_loss=0.766]

Epoch 136, global step 13700: 'loss/val_loss' was not in top 1


Epoch 137: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.219, loss/val_loss=0.576]

Epoch 137, global step 13800: 'loss/val_loss' was not in top 1


Epoch 138: 100%|██████████| 100/100 [00:05<00:00, 17.20it/s, v_num=0, loss/train_loss=0.372, loss/val_loss=0.618]

Epoch 138, global step 13900: 'loss/val_loss' was not in top 1


Epoch 139: 100%|██████████| 100/100 [00:05<00:00, 17.35it/s, v_num=0, loss/train_loss=0.241, loss/val_loss=0.640]

Epoch 139, global step 14000: 'loss/val_loss' was not in top 1


Epoch 140: 100%|██████████| 100/100 [00:05<00:00, 17.99it/s, v_num=0, loss/train_loss=0.235, loss/val_loss=0.554]

Epoch 140, global step 14100: 'loss/val_loss' was not in top 1


Epoch 141: 100%|██████████| 100/100 [00:05<00:00, 17.87it/s, v_num=0, loss/train_loss=0.263, loss/val_loss=0.644]

Epoch 141, global step 14200: 'loss/val_loss' was not in top 1


Epoch 142: 100%|██████████| 100/100 [00:05<00:00, 17.24it/s, v_num=0, loss/train_loss=0.212, loss/val_loss=0.598]

Epoch 142, global step 14300: 'loss/val_loss' was not in top 1


Epoch 143: 100%|██████████| 100/100 [00:06<00:00, 16.35it/s, v_num=0, loss/train_loss=0.273, loss/val_loss=0.618]

Epoch 143, global step 14400: 'loss/val_loss' was not in top 1


Epoch 144: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.623]

Epoch 144, global step 14500: 'loss/val_loss' was not in top 1


Epoch 145: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.654]

Epoch 145, global step 14600: 'loss/val_loss' was not in top 1


Epoch 146: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.252, loss/val_loss=0.546]

Epoch 146, global step 14700: 'loss/val_loss' was not in top 1


Epoch 147: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.270, loss/val_loss=0.503]

Epoch 147, global step 14800: 'loss/val_loss' reached 0.50275 (best 0.50275), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=147-step=14800.ckpt' as top 1


Epoch 148: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.323, loss/val_loss=0.457]

Epoch 148, global step 14900: 'loss/val_loss' reached 0.45687 (best 0.45687), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=148-step=14900.ckpt' as top 1


Epoch 149: 100%|██████████| 100/100 [00:06<00:00, 15.89it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.562]

Epoch 149, global step 15000: 'loss/val_loss' was not in top 1


Epoch 150: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.204, loss/val_loss=0.578]

Epoch 150, global step 15100: 'loss/val_loss' was not in top 1


Epoch 151: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.242, loss/val_loss=0.588]

Epoch 151, global step 15200: 'loss/val_loss' was not in top 1


Epoch 152: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.365, loss/val_loss=0.598]

Epoch 152, global step 15300: 'loss/val_loss' was not in top 1


Epoch 153: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.259, loss/val_loss=0.590]

Epoch 153, global step 15400: 'loss/val_loss' was not in top 1


Epoch 154: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.292, loss/val_loss=0.551]

Epoch 154, global step 15500: 'loss/val_loss' was not in top 1


Epoch 155: 100%|██████████| 100/100 [00:06<00:00, 15.03it/s, v_num=0, loss/train_loss=0.235, loss/val_loss=0.556]

Epoch 155, global step 15600: 'loss/val_loss' was not in top 1


Epoch 156: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.228, loss/val_loss=0.545]

Epoch 156, global step 15700: 'loss/val_loss' was not in top 1


Epoch 157: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.301, loss/val_loss=0.533]

Epoch 157, global step 15800: 'loss/val_loss' was not in top 1


Epoch 158: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.261, loss/val_loss=0.539]

Epoch 158, global step 15900: 'loss/val_loss' was not in top 1


Epoch 159: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.253, loss/val_loss=0.593]

Epoch 159, global step 16000: 'loss/val_loss' was not in top 1


Epoch 160: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.238, loss/val_loss=0.569]

Epoch 160, global step 16100: 'loss/val_loss' was not in top 1


Epoch 161: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.324, loss/val_loss=0.559]

Epoch 161, global step 16200: 'loss/val_loss' was not in top 1


Epoch 162: 100%|██████████| 100/100 [00:06<00:00, 16.21it/s, v_num=0, loss/train_loss=0.275, loss/val_loss=0.525]

Epoch 162, global step 16300: 'loss/val_loss' was not in top 1


Epoch 163: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.494]

Epoch 163, global step 16400: 'loss/val_loss' was not in top 1


Epoch 164: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.288, loss/val_loss=0.619]

Epoch 164, global step 16500: 'loss/val_loss' was not in top 1


Epoch 165: 100%|██████████| 100/100 [00:06<00:00, 15.17it/s, v_num=0, loss/train_loss=0.213, loss/val_loss=0.529]

Epoch 165, global step 16600: 'loss/val_loss' was not in top 1


Epoch 166: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.483]

Epoch 166, global step 16700: 'loss/val_loss' was not in top 1


Epoch 167: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.136, loss/val_loss=0.446]

Epoch 167, global step 16800: 'loss/val_loss' reached 0.44636 (best 0.44636), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=167-step=16800.ckpt' as top 1


Epoch 168: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.161, loss/val_loss=0.582]

Epoch 168, global step 16900: 'loss/val_loss' was not in top 1


Epoch 169: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.612]

Epoch 169, global step 17000: 'loss/val_loss' was not in top 1


Epoch 170: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.210, loss/val_loss=0.541]

Epoch 170, global step 17100: 'loss/val_loss' was not in top 1


Epoch 171: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.252, loss/val_loss=0.549]

Epoch 171, global step 17200: 'loss/val_loss' was not in top 1


Epoch 172: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.352, loss/val_loss=0.614]

Epoch 172, global step 17300: 'loss/val_loss' was not in top 1


Epoch 173: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.366, loss/val_loss=0.602]

Epoch 173, global step 17400: 'loss/val_loss' was not in top 1


Epoch 174: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.168, loss/val_loss=0.543]

Epoch 174, global step 17500: 'loss/val_loss' was not in top 1


Epoch 175: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.383, loss/val_loss=0.516]

Epoch 175, global step 17600: 'loss/val_loss' was not in top 1


Epoch 176: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.253, loss/val_loss=0.572]

Epoch 176, global step 17700: 'loss/val_loss' was not in top 1


Epoch 177: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.248, loss/val_loss=0.536]

Epoch 177, global step 17800: 'loss/val_loss' was not in top 1


Epoch 178: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.300, loss/val_loss=0.582]

Epoch 178, global step 17900: 'loss/val_loss' was not in top 1


Epoch 179: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.207, loss/val_loss=0.532]

Epoch 179, global step 18000: 'loss/val_loss' was not in top 1


Epoch 180: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.255, loss/val_loss=0.562]

Epoch 180, global step 18100: 'loss/val_loss' was not in top 1


Epoch 181: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.219, loss/val_loss=0.520]

Epoch 181, global step 18200: 'loss/val_loss' was not in top 1


Epoch 182: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.428]

Epoch 182, global step 18300: 'loss/val_loss' reached 0.42838 (best 0.42838), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=182-step=18300.ckpt' as top 1


Epoch 183: 100%|██████████| 100/100 [00:06<00:00, 16.26it/s, v_num=0, loss/train_loss=0.237, loss/val_loss=0.472]

Epoch 183, global step 18400: 'loss/val_loss' was not in top 1


Epoch 184: 100%|██████████| 100/100 [00:06<00:00, 16.04it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.505]

Epoch 184, global step 18500: 'loss/val_loss' was not in top 1


Epoch 185: 100%|██████████| 100/100 [00:06<00:00, 16.26it/s, v_num=0, loss/train_loss=0.145, loss/val_loss=0.507]

Epoch 185, global step 18600: 'loss/val_loss' was not in top 1


Epoch 186: 100%|██████████| 100/100 [00:06<00:00, 15.96it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.497]

Epoch 186, global step 18700: 'loss/val_loss' was not in top 1


Epoch 187: 100%|██████████| 100/100 [00:06<00:00, 15.98it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.473]

Epoch 187, global step 18800: 'loss/val_loss' was not in top 1


Epoch 188: 100%|██████████| 100/100 [00:06<00:00, 16.18it/s, v_num=0, loss/train_loss=0.245, loss/val_loss=0.503]

Epoch 188, global step 18900: 'loss/val_loss' was not in top 1


Epoch 189: 100%|██████████| 100/100 [00:06<00:00, 16.22it/s, v_num=0, loss/train_loss=0.218, loss/val_loss=0.488]

Epoch 189, global step 19000: 'loss/val_loss' was not in top 1


Epoch 190: 100%|██████████| 100/100 [00:06<00:00, 16.41it/s, v_num=0, loss/train_loss=0.275, loss/val_loss=0.389]

Epoch 190, global step 19100: 'loss/val_loss' reached 0.38900 (best 0.38900), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=190-step=19100.ckpt' as top 1


Epoch 191: 100%|██████████| 100/100 [00:06<00:00, 16.19it/s, v_num=0, loss/train_loss=0.234, loss/val_loss=0.438]

Epoch 191, global step 19200: 'loss/val_loss' was not in top 1


Epoch 192: 100%|██████████| 100/100 [00:06<00:00, 16.34it/s, v_num=0, loss/train_loss=0.313, loss/val_loss=0.403]

Epoch 192, global step 19300: 'loss/val_loss' was not in top 1


Epoch 193: 100%|██████████| 100/100 [00:06<00:00, 16.45it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.496]

Epoch 193, global step 19400: 'loss/val_loss' was not in top 1


Epoch 194: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.549]

Epoch 194, global step 19500: 'loss/val_loss' was not in top 1


Epoch 195: 100%|██████████| 100/100 [00:06<00:00, 16.47it/s, v_num=0, loss/train_loss=0.194, loss/val_loss=0.506]

Epoch 195, global step 19600: 'loss/val_loss' was not in top 1


Epoch 196: 100%|██████████| 100/100 [00:05<00:00, 18.42it/s, v_num=0, loss/train_loss=0.201, loss/val_loss=0.469]

Epoch 196, global step 19700: 'loss/val_loss' was not in top 1


Epoch 197: 100%|██████████| 100/100 [00:05<00:00, 18.00it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.442]

Epoch 197, global step 19800: 'loss/val_loss' was not in top 1


Epoch 198: 100%|██████████| 100/100 [00:05<00:00, 17.74it/s, v_num=0, loss/train_loss=0.214, loss/val_loss=0.421]

Epoch 198, global step 19900: 'loss/val_loss' was not in top 1


Epoch 199: 100%|██████████| 100/100 [00:05<00:00, 18.61it/s, v_num=0, loss/train_loss=0.206, loss/val_loss=0.428]

Epoch 199, global step 20000: 'loss/val_loss' was not in top 1


Epoch 200: 100%|██████████| 100/100 [00:05<00:00, 18.36it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.470]

Epoch 200, global step 20100: 'loss/val_loss' was not in top 1


Epoch 201: 100%|██████████| 100/100 [00:05<00:00, 18.77it/s, v_num=0, loss/train_loss=0.256, loss/val_loss=0.506]

Epoch 201, global step 20200: 'loss/val_loss' was not in top 1


Epoch 202: 100%|██████████| 100/100 [00:05<00:00, 18.38it/s, v_num=0, loss/train_loss=0.194, loss/val_loss=0.632]

Epoch 202, global step 20300: 'loss/val_loss' was not in top 1


Epoch 203: 100%|██████████| 100/100 [00:05<00:00, 19.07it/s, v_num=0, loss/train_loss=0.241, loss/val_loss=0.522]

Epoch 203, global step 20400: 'loss/val_loss' was not in top 1


Epoch 204: 100%|██████████| 100/100 [00:05<00:00, 18.90it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.520]

Epoch 204, global step 20500: 'loss/val_loss' was not in top 1


Epoch 205: 100%|██████████| 100/100 [00:05<00:00, 18.07it/s, v_num=0, loss/train_loss=0.189, loss/val_loss=0.436]

Epoch 205, global step 20600: 'loss/val_loss' was not in top 1


Epoch 206: 100%|██████████| 100/100 [00:05<00:00, 18.51it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.431]

Epoch 206, global step 20700: 'loss/val_loss' was not in top 1


Epoch 207: 100%|██████████| 100/100 [00:05<00:00, 18.71it/s, v_num=0, loss/train_loss=0.280, loss/val_loss=0.535]

Epoch 207, global step 20800: 'loss/val_loss' was not in top 1


Epoch 208: 100%|██████████| 100/100 [00:05<00:00, 17.94it/s, v_num=0, loss/train_loss=0.353, loss/val_loss=0.446]

Epoch 208, global step 20900: 'loss/val_loss' was not in top 1


Epoch 209: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.333, loss/val_loss=0.390]

Epoch 209, global step 21000: 'loss/val_loss' was not in top 1


Epoch 210: 100%|██████████| 100/100 [00:06<00:00, 15.89it/s, v_num=0, loss/train_loss=0.221, loss/val_loss=0.438]

Epoch 210, global step 21100: 'loss/val_loss' was not in top 1


Epoch 211: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.304, loss/val_loss=0.476]

Epoch 211, global step 21200: 'loss/val_loss' was not in top 1


Epoch 212: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.475]

Epoch 212, global step 21300: 'loss/val_loss' was not in top 1


Epoch 213: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.265, loss/val_loss=0.492]

Epoch 213, global step 21400: 'loss/val_loss' was not in top 1


Epoch 214: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.192, loss/val_loss=0.496]

Epoch 214, global step 21500: 'loss/val_loss' was not in top 1


Epoch 215: 100%|██████████| 100/100 [00:06<00:00, 15.90it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.500]

Epoch 215, global step 21600: 'loss/val_loss' was not in top 1


Epoch 216: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.178, loss/val_loss=0.498]

Epoch 216, global step 21700: 'loss/val_loss' was not in top 1


Epoch 217: 100%|██████████| 100/100 [00:06<00:00, 15.87it/s, v_num=0, loss/train_loss=0.263, loss/val_loss=0.563]

Epoch 217, global step 21800: 'loss/val_loss' was not in top 1


Epoch 218: 100%|██████████| 100/100 [00:06<00:00, 16.12it/s, v_num=0, loss/train_loss=0.234, loss/val_loss=0.518]

Epoch 218, global step 21900: 'loss/val_loss' was not in top 1


Epoch 219: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.409]

Epoch 219, global step 22000: 'loss/val_loss' was not in top 1


Epoch 220: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.204, loss/val_loss=0.494]

Epoch 220, global step 22100: 'loss/val_loss' was not in top 1


Epoch 221: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.262, loss/val_loss=0.359]

Epoch 221, global step 22200: 'loss/val_loss' reached 0.35932 (best 0.35932), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=221-step=22200.ckpt' as top 1


Epoch 222: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.296, loss/val_loss=0.411]

Epoch 222, global step 22300: 'loss/val_loss' was not in top 1


Epoch 223: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.270, loss/val_loss=0.378]

Epoch 223, global step 22400: 'loss/val_loss' was not in top 1


Epoch 224: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.458]

Epoch 224, global step 22500: 'loss/val_loss' was not in top 1


Epoch 225: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.247, loss/val_loss=0.364]

Epoch 225, global step 22600: 'loss/val_loss' was not in top 1


Epoch 226: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.428]

Epoch 226, global step 22700: 'loss/val_loss' was not in top 1


Epoch 227: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.294, loss/val_loss=0.444]

Epoch 227, global step 22800: 'loss/val_loss' was not in top 1


Epoch 228: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.410]

Epoch 228, global step 22900: 'loss/val_loss' was not in top 1


Epoch 229: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.418]

Epoch 229, global step 23000: 'loss/val_loss' was not in top 1


Epoch 230: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.205, loss/val_loss=0.474]

Epoch 230, global step 23100: 'loss/val_loss' was not in top 1


Epoch 231: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.215, loss/val_loss=0.403]

Epoch 231, global step 23200: 'loss/val_loss' was not in top 1


Epoch 232: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.299, loss/val_loss=0.413]

Epoch 232, global step 23300: 'loss/val_loss' was not in top 1


Epoch 233: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.412]

Epoch 233, global step 23400: 'loss/val_loss' was not in top 1


Epoch 234: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.366]

Epoch 234, global step 23500: 'loss/val_loss' was not in top 1


Epoch 235: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.204, loss/val_loss=0.408]

Epoch 235, global step 23600: 'loss/val_loss' was not in top 1


Epoch 236: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.422]

Epoch 236, global step 23700: 'loss/val_loss' was not in top 1


Epoch 237: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.231, loss/val_loss=0.440]

Epoch 237, global step 23800: 'loss/val_loss' was not in top 1


Epoch 238: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.241, loss/val_loss=0.411]

Epoch 238, global step 23900: 'loss/val_loss' was not in top 1


Epoch 239: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.437]

Epoch 239, global step 24000: 'loss/val_loss' was not in top 1


Epoch 240: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.365, loss/val_loss=0.405]

Epoch 240, global step 24100: 'loss/val_loss' was not in top 1


Epoch 241: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.195, loss/val_loss=0.419]

Epoch 241, global step 24200: 'loss/val_loss' was not in top 1


Epoch 242: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.309, loss/val_loss=0.432]

Epoch 242, global step 24300: 'loss/val_loss' was not in top 1


Epoch 243: 100%|██████████| 100/100 [00:06<00:00, 15.19it/s, v_num=0, loss/train_loss=0.226, loss/val_loss=0.418]

Epoch 243, global step 24400: 'loss/val_loss' was not in top 1


Epoch 244: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.289, loss/val_loss=0.437]

Epoch 244, global step 24500: 'loss/val_loss' was not in top 1


Epoch 245: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.421]

Epoch 245, global step 24600: 'loss/val_loss' was not in top 1


Epoch 246: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.491]

Epoch 246, global step 24700: 'loss/val_loss' was not in top 1


Epoch 247: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.188, loss/val_loss=0.431]

Epoch 247, global step 24800: 'loss/val_loss' was not in top 1


Epoch 248: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.255, loss/val_loss=0.500]

Epoch 248, global step 24900: 'loss/val_loss' was not in top 1


Epoch 249: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.442]

Epoch 249, global step 25000: 'loss/val_loss' was not in top 1


Epoch 250: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.399]

Epoch 250, global step 25100: 'loss/val_loss' was not in top 1


Epoch 251: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.430]

Epoch 251, global step 25200: 'loss/val_loss' was not in top 1


Epoch 252: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.472]

Epoch 252, global step 25300: 'loss/val_loss' was not in top 1


Epoch 253: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.439]

Epoch 253, global step 25400: 'loss/val_loss' was not in top 1


Epoch 254: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.445]

Epoch 254, global step 25500: 'loss/val_loss' was not in top 1


Epoch 255: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.246, loss/val_loss=0.437]

Epoch 255, global step 25600: 'loss/val_loss' was not in top 1


Epoch 256: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.117, loss/val_loss=0.472]

Epoch 256, global step 25700: 'loss/val_loss' was not in top 1


Epoch 257: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.251, loss/val_loss=0.462]

Epoch 257, global step 25800: 'loss/val_loss' was not in top 1


Epoch 258: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.224, loss/val_loss=0.479]

Epoch 258, global step 25900: 'loss/val_loss' was not in top 1


Epoch 259: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.164, loss/val_loss=0.424]

Epoch 259, global step 26000: 'loss/val_loss' was not in top 1


Epoch 260: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.327, loss/val_loss=0.399]

Epoch 260, global step 26100: 'loss/val_loss' was not in top 1


Epoch 261: 100%|██████████| 100/100 [00:06<00:00, 15.13it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.427]

Epoch 261, global step 26200: 'loss/val_loss' was not in top 1


Epoch 262: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.184, loss/val_loss=0.453]

Epoch 262, global step 26300: 'loss/val_loss' was not in top 1


Epoch 263: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.255, loss/val_loss=0.424]

Epoch 263, global step 26400: 'loss/val_loss' was not in top 1


Epoch 264: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.371]

Epoch 264, global step 26500: 'loss/val_loss' was not in top 1


Epoch 265: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.413]

Epoch 265, global step 26600: 'loss/val_loss' was not in top 1


Epoch 266: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.355]

Epoch 266, global step 26700: 'loss/val_loss' reached 0.35500 (best 0.35500), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=266-step=26700.ckpt' as top 1


Epoch 267: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.237, loss/val_loss=0.346]

Epoch 267, global step 26800: 'loss/val_loss' reached 0.34633 (best 0.34633), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=267-step=26800.ckpt' as top 1


Epoch 268: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.243, loss/val_loss=0.374]

Epoch 268, global step 26900: 'loss/val_loss' was not in top 1


Epoch 269: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.193, loss/val_loss=0.418]

Epoch 269, global step 27000: 'loss/val_loss' was not in top 1


Epoch 270: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.247, loss/val_loss=0.395]

Epoch 270, global step 27100: 'loss/val_loss' was not in top 1


Epoch 271: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.372]

Epoch 271, global step 27200: 'loss/val_loss' was not in top 1


Epoch 272: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.348]

Epoch 272, global step 27300: 'loss/val_loss' was not in top 1


Epoch 273: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.417]

Epoch 273, global step 27400: 'loss/val_loss' was not in top 1


Epoch 274: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.182, loss/val_loss=0.529]

Epoch 274, global step 27500: 'loss/val_loss' was not in top 1


Epoch 275: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.235, loss/val_loss=0.453]

Epoch 275, global step 27600: 'loss/val_loss' was not in top 1


Epoch 276: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.0811, loss/val_loss=0.457]

Epoch 276, global step 27700: 'loss/val_loss' was not in top 1


Epoch 277: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.252, loss/val_loss=0.435] 

Epoch 277, global step 27800: 'loss/val_loss' was not in top 1


Epoch 278: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.354]

Epoch 278, global step 27900: 'loss/val_loss' was not in top 1


Epoch 279: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.413]

Epoch 279, global step 28000: 'loss/val_loss' was not in top 1


Epoch 280: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.198, loss/val_loss=0.384]

Epoch 280, global step 28100: 'loss/val_loss' was not in top 1


Epoch 281: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.192, loss/val_loss=0.431]

Epoch 281, global step 28200: 'loss/val_loss' was not in top 1


Epoch 282: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.429]

Epoch 282, global step 28300: 'loss/val_loss' was not in top 1


Epoch 283: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.469]

Epoch 283, global step 28400: 'loss/val_loss' was not in top 1


Epoch 284: 100%|██████████| 100/100 [00:06<00:00, 15.10it/s, v_num=0, loss/train_loss=0.234, loss/val_loss=0.424]

Epoch 284, global step 28500: 'loss/val_loss' was not in top 1


Epoch 285: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.530]

Epoch 285, global step 28600: 'loss/val_loss' was not in top 1


Epoch 286: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.106, loss/val_loss=0.372]

Epoch 286, global step 28700: 'loss/val_loss' was not in top 1


Epoch 287: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.477]

Epoch 287, global step 28800: 'loss/val_loss' was not in top 1


Epoch 288: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.351]

Epoch 288, global step 28900: 'loss/val_loss' was not in top 1


Epoch 289: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.396]

Epoch 289, global step 29000: 'loss/val_loss' was not in top 1


Epoch 290: 100%|██████████| 100/100 [00:06<00:00, 16.06it/s, v_num=0, loss/train_loss=0.270, loss/val_loss=0.375]

Epoch 290, global step 29100: 'loss/val_loss' was not in top 1


Epoch 291: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.0945, loss/val_loss=0.371]

Epoch 291, global step 29200: 'loss/val_loss' was not in top 1


Epoch 292: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.384] 

Epoch 292, global step 29300: 'loss/val_loss' was not in top 1


Epoch 293: 100%|██████████| 100/100 [00:06<00:00, 16.11it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.421]

Epoch 293, global step 29400: 'loss/val_loss' was not in top 1


Epoch 294: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.413]

Epoch 294, global step 29500: 'loss/val_loss' was not in top 1


Epoch 295: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.186, loss/val_loss=0.373]

Epoch 295, global step 29600: 'loss/val_loss' was not in top 1


Epoch 296: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.381]

Epoch 296, global step 29700: 'loss/val_loss' was not in top 1


Epoch 297: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.387]

Epoch 297, global step 29800: 'loss/val_loss' was not in top 1


Epoch 298: 100%|██████████| 100/100 [00:06<00:00, 16.04it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.443]

Epoch 298, global step 29900: 'loss/val_loss' was not in top 1


Epoch 299: 100%|██████████| 100/100 [00:06<00:00, 16.05it/s, v_num=0, loss/train_loss=0.303, loss/val_loss=0.514]

Epoch 299, global step 30000: 'loss/val_loss' was not in top 1


Epoch 300: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.385]

Epoch 300, global step 30100: 'loss/val_loss' was not in top 1


Epoch 301: 100%|██████████| 100/100 [00:06<00:00, 16.02it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.458]

Epoch 301, global step 30200: 'loss/val_loss' was not in top 1


Epoch 302: 100%|██████████| 100/100 [00:06<00:00, 16.01it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.391]

Epoch 302, global step 30300: 'loss/val_loss' was not in top 1


Epoch 303: 100%|██████████| 100/100 [00:06<00:00, 16.01it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.486]

Epoch 303, global step 30400: 'loss/val_loss' was not in top 1


Epoch 304: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.420]

Epoch 304, global step 30500: 'loss/val_loss' was not in top 1


Epoch 305: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.204, loss/val_loss=0.359]

Epoch 305, global step 30600: 'loss/val_loss' was not in top 1


Epoch 306: 100%|██████████| 100/100 [00:06<00:00, 15.87it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.446]

Epoch 306, global step 30700: 'loss/val_loss' was not in top 1


Epoch 307: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.289, loss/val_loss=0.403]

Epoch 307, global step 30800: 'loss/val_loss' was not in top 1


Epoch 308: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.400]

Epoch 308, global step 30900: 'loss/val_loss' was not in top 1


Epoch 309: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.293, loss/val_loss=0.457]

Epoch 309, global step 31000: 'loss/val_loss' was not in top 1


Epoch 310: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.188, loss/val_loss=0.501]

Epoch 310, global step 31100: 'loss/val_loss' was not in top 1


Epoch 311: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.271, loss/val_loss=0.459]

Epoch 311, global step 31200: 'loss/val_loss' was not in top 1


Epoch 312: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.283, loss/val_loss=0.463]

Epoch 312, global step 31300: 'loss/val_loss' was not in top 1


Epoch 313: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.246, loss/val_loss=0.410]

Epoch 313, global step 31400: 'loss/val_loss' was not in top 1


Epoch 314: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.412]

Epoch 314, global step 31500: 'loss/val_loss' was not in top 1


Epoch 315: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.411]

Epoch 315, global step 31600: 'loss/val_loss' was not in top 1


Epoch 316: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.204, loss/val_loss=0.410]

Epoch 316, global step 31700: 'loss/val_loss' was not in top 1


Epoch 317: 100%|██████████| 100/100 [00:06<00:00, 14.88it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.360]

Epoch 317, global step 31800: 'loss/val_loss' was not in top 1


Epoch 318: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.353]

Epoch 318, global step 31900: 'loss/val_loss' was not in top 1


Epoch 319: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.168, loss/val_loss=0.314]

Epoch 319, global step 32000: 'loss/val_loss' reached 0.31437 (best 0.31437), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=319-step=32000.ckpt' as top 1


Epoch 320: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.335] 

Epoch 320, global step 32100: 'loss/val_loss' was not in top 1


Epoch 321: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.189, loss/val_loss=0.400]

Epoch 321, global step 32200: 'loss/val_loss' was not in top 1


Epoch 322: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.364]

Epoch 322, global step 32300: 'loss/val_loss' was not in top 1


Epoch 323: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.443]

Epoch 323, global step 32400: 'loss/val_loss' was not in top 1


Epoch 324: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.436]

Epoch 324, global step 32500: 'loss/val_loss' was not in top 1


Epoch 325: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.308, loss/val_loss=0.436] 

Epoch 325, global step 32600: 'loss/val_loss' was not in top 1


Epoch 326: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.404]

Epoch 326, global step 32700: 'loss/val_loss' was not in top 1


Epoch 327: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.416]

Epoch 327, global step 32800: 'loss/val_loss' was not in top 1


Epoch 328: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.415]

Epoch 328, global step 32900: 'loss/val_loss' was not in top 1


Epoch 329: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.407]

Epoch 329, global step 33000: 'loss/val_loss' was not in top 1


Epoch 330: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.384]

Epoch 330, global step 33100: 'loss/val_loss' was not in top 1


Epoch 331: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.400]

Epoch 331, global step 33200: 'loss/val_loss' was not in top 1


Epoch 332: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.437]

Epoch 332, global step 33300: 'loss/val_loss' was not in top 1


Epoch 333: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.390]

Epoch 333, global step 33400: 'loss/val_loss' was not in top 1


Epoch 334: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.377]

Epoch 334, global step 33500: 'loss/val_loss' was not in top 1


Epoch 335: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.349]

Epoch 335, global step 33600: 'loss/val_loss' was not in top 1


Epoch 336: 100%|██████████| 100/100 [00:06<00:00, 15.21it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.383]

Epoch 336, global step 33700: 'loss/val_loss' was not in top 1


Epoch 337: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.178, loss/val_loss=0.404]

Epoch 337, global step 33800: 'loss/val_loss' was not in top 1


Epoch 338: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.375]

Epoch 338, global step 33900: 'loss/val_loss' was not in top 1


Epoch 339: 100%|██████████| 100/100 [00:06<00:00, 15.16it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.357]

Epoch 339, global step 34000: 'loss/val_loss' was not in top 1


Epoch 340: 100%|██████████| 100/100 [00:06<00:00, 15.14it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.390]

Epoch 340, global step 34100: 'loss/val_loss' was not in top 1


Epoch 341: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.239, loss/val_loss=0.355]

Epoch 341, global step 34200: 'loss/val_loss' was not in top 1


Epoch 342: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.248, loss/val_loss=0.349]

Epoch 342, global step 34300: 'loss/val_loss' was not in top 1


Epoch 343: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.402]

Epoch 343, global step 34400: 'loss/val_loss' was not in top 1


Epoch 344: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.325]

Epoch 344, global step 34500: 'loss/val_loss' was not in top 1


Epoch 345: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.249, loss/val_loss=0.330]

Epoch 345, global step 34600: 'loss/val_loss' was not in top 1


Epoch 346: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.0927, loss/val_loss=0.314]

Epoch 346, global step 34700: 'loss/val_loss' reached 0.31365 (best 0.31365), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=346-step=34700.ckpt' as top 1


Epoch 347: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.170, loss/val_loss=0.346] 

Epoch 347, global step 34800: 'loss/val_loss' was not in top 1


Epoch 348: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.272, loss/val_loss=0.448]

Epoch 348, global step 34900: 'loss/val_loss' was not in top 1


Epoch 349: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.443]

Epoch 349, global step 35000: 'loss/val_loss' was not in top 1


Epoch 350: 100%|██████████| 100/100 [00:06<00:00, 15.75it/s, v_num=0, loss/train_loss=0.170, loss/val_loss=0.376]

Epoch 350, global step 35100: 'loss/val_loss' was not in top 1


Epoch 351: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.241, loss/val_loss=0.377]

Epoch 351, global step 35200: 'loss/val_loss' was not in top 1


Epoch 352: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.156, loss/val_loss=0.375]

Epoch 352, global step 35300: 'loss/val_loss' was not in top 1


Epoch 353: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.397]

Epoch 353, global step 35400: 'loss/val_loss' was not in top 1


Epoch 354: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.248, loss/val_loss=0.398]

Epoch 354, global step 35500: 'loss/val_loss' was not in top 1


Epoch 355: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.404]

Epoch 355, global step 35600: 'loss/val_loss' was not in top 1


Epoch 356: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.359]

Epoch 356, global step 35700: 'loss/val_loss' was not in top 1


Epoch 357: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.194, loss/val_loss=0.337]

Epoch 357, global step 35800: 'loss/val_loss' was not in top 1


Epoch 358: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.334]

Epoch 358, global step 35900: 'loss/val_loss' was not in top 1


Epoch 359: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.344]

Epoch 359, global step 36000: 'loss/val_loss' was not in top 1


Epoch 360: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.371]

Epoch 360, global step 36100: 'loss/val_loss' was not in top 1


Epoch 361: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.433]

Epoch 361, global step 36200: 'loss/val_loss' was not in top 1


Epoch 362: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.248, loss/val_loss=0.352]

Epoch 362, global step 36300: 'loss/val_loss' was not in top 1


Epoch 363: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.385]

Epoch 363, global step 36400: 'loss/val_loss' was not in top 1


Epoch 364: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.218, loss/val_loss=0.314]

Epoch 364, global step 36500: 'loss/val_loss' was not in top 1


Epoch 365: 100%|██████████| 100/100 [00:06<00:00, 16.06it/s, v_num=0, loss/train_loss=0.0752, loss/val_loss=0.295]

Epoch 365, global step 36600: 'loss/val_loss' reached 0.29503 (best 0.29503), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=365-step=36600.ckpt' as top 1


Epoch 366: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.218, loss/val_loss=0.294] 

Epoch 366, global step 36700: 'loss/val_loss' reached 0.29370 (best 0.29370), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=366-step=36700.ckpt' as top 1


Epoch 367: 100%|██████████| 100/100 [00:06<00:00, 15.75it/s, v_num=0, loss/train_loss=0.249, loss/val_loss=0.275]

Epoch 367, global step 36800: 'loss/val_loss' reached 0.27523 (best 0.27523), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=367-step=36800.ckpt' as top 1


Epoch 368: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.146, loss/val_loss=0.308]

Epoch 368, global step 36900: 'loss/val_loss' was not in top 1


Epoch 369: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.186, loss/val_loss=0.348]

Epoch 369, global step 37000: 'loss/val_loss' was not in top 1


Epoch 370: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.178, loss/val_loss=0.303]

Epoch 370, global step 37100: 'loss/val_loss' was not in top 1


Epoch 371: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.200, loss/val_loss=0.312]

Epoch 371, global step 37200: 'loss/val_loss' was not in top 1


Epoch 372: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.200, loss/val_loss=0.324]

Epoch 372, global step 37300: 'loss/val_loss' was not in top 1


Epoch 373: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.218, loss/val_loss=0.317]

Epoch 373, global step 37400: 'loss/val_loss' was not in top 1


Epoch 374: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.195, loss/val_loss=0.332]

Epoch 374, global step 37500: 'loss/val_loss' was not in top 1


Epoch 375: 100%|██████████| 100/100 [00:06<00:00, 15.15it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.357]

Epoch 375, global step 37600: 'loss/val_loss' was not in top 1


Epoch 376: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.356]

Epoch 376, global step 37700: 'loss/val_loss' was not in top 1


Epoch 377: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.168, loss/val_loss=0.356] 

Epoch 377, global step 37800: 'loss/val_loss' was not in top 1


Epoch 378: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.329]

Epoch 378, global step 37900: 'loss/val_loss' was not in top 1


Epoch 379: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.324]

Epoch 379, global step 38000: 'loss/val_loss' was not in top 1


Epoch 380: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.216, loss/val_loss=0.377]

Epoch 380, global step 38100: 'loss/val_loss' was not in top 1


Epoch 381: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.289, loss/val_loss=0.322]

Epoch 381, global step 38200: 'loss/val_loss' was not in top 1


Epoch 382: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.264, loss/val_loss=0.337]

Epoch 382, global step 38300: 'loss/val_loss' was not in top 1


Epoch 383: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.110, loss/val_loss=0.332] 

Epoch 383, global step 38400: 'loss/val_loss' was not in top 1


Epoch 384: 100%|██████████| 100/100 [00:06<00:00, 15.08it/s, v_num=0, loss/train_loss=0.151, loss/val_loss=0.392] 

Epoch 384, global step 38500: 'loss/val_loss' was not in top 1


Epoch 385: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.0885, loss/val_loss=0.400]

Epoch 385, global step 38600: 'loss/val_loss' was not in top 1


Epoch 386: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.223, loss/val_loss=0.303] 

Epoch 386, global step 38700: 'loss/val_loss' was not in top 1


Epoch 387: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.327]

Epoch 387, global step 38800: 'loss/val_loss' was not in top 1


Epoch 388: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.337]

Epoch 388, global step 38900: 'loss/val_loss' was not in top 1


Epoch 389: 100%|██████████| 100/100 [00:06<00:00, 15.01it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.347]

Epoch 389, global step 39000: 'loss/val_loss' was not in top 1


Epoch 390: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.345]

Epoch 390, global step 39100: 'loss/val_loss' was not in top 1


Epoch 391: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.303]

Epoch 391, global step 39200: 'loss/val_loss' was not in top 1


Epoch 392: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.175, loss/val_loss=0.363]

Epoch 392, global step 39300: 'loss/val_loss' was not in top 1


Epoch 393: 100%|██████████| 100/100 [00:06<00:00, 15.96it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.406]

Epoch 393, global step 39400: 'loss/val_loss' was not in top 1


Epoch 394: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.137, loss/val_loss=0.387]

Epoch 394, global step 39500: 'loss/val_loss' was not in top 1


Epoch 395: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.424]

Epoch 395, global step 39600: 'loss/val_loss' was not in top 1


Epoch 396: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.259, loss/val_loss=0.377]

Epoch 396, global step 39700: 'loss/val_loss' was not in top 1


Epoch 397: 100%|██████████| 100/100 [00:06<00:00, 16.03it/s, v_num=0, loss/train_loss=0.117, loss/val_loss=0.377]

Epoch 397, global step 39800: 'loss/val_loss' was not in top 1


Epoch 398: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.306, loss/val_loss=0.394]

Epoch 398, global step 39900: 'loss/val_loss' was not in top 1


Epoch 399: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.371]

Epoch 399, global step 40000: 'loss/val_loss' was not in top 1


Epoch 400: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.401]

Epoch 400, global step 40100: 'loss/val_loss' was not in top 1


Epoch 401: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.384]

Epoch 401, global step 40200: 'loss/val_loss' was not in top 1


Epoch 402: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.345]

Epoch 402, global step 40300: 'loss/val_loss' was not in top 1


Epoch 403: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.101, loss/val_loss=0.374]

Epoch 403, global step 40400: 'loss/val_loss' was not in top 1


Epoch 404: 100%|██████████| 100/100 [00:06<00:00, 15.96it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.335]

Epoch 404, global step 40500: 'loss/val_loss' was not in top 1


Epoch 405: 100%|██████████| 100/100 [00:06<00:00, 15.26it/s, v_num=0, loss/train_loss=0.304, loss/val_loss=0.314]

Epoch 405, global step 40600: 'loss/val_loss' was not in top 1


Epoch 406: 100%|██████████| 100/100 [00:06<00:00, 16.27it/s, v_num=0, loss/train_loss=0.246, loss/val_loss=0.401]

Epoch 406, global step 40700: 'loss/val_loss' was not in top 1


Epoch 407: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.0807, loss/val_loss=0.404]

Epoch 407, global step 40800: 'loss/val_loss' was not in top 1


Epoch 408: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.261, loss/val_loss=0.401] 

Epoch 408, global step 40900: 'loss/val_loss' was not in top 1


Epoch 409: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.351]

Epoch 409, global step 41000: 'loss/val_loss' was not in top 1


Epoch 410: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.270, loss/val_loss=0.450]

Epoch 410, global step 41100: 'loss/val_loss' was not in top 1


Epoch 411: 100%|██████████| 100/100 [00:05<00:00, 16.69it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.357]

Epoch 411, global step 41200: 'loss/val_loss' was not in top 1


Epoch 412: 100%|██████████| 100/100 [00:05<00:00, 17.55it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.389]

Epoch 412, global step 41300: 'loss/val_loss' was not in top 1


Epoch 413: 100%|██████████| 100/100 [00:05<00:00, 17.68it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.393]

Epoch 413, global step 41400: 'loss/val_loss' was not in top 1


Epoch 414: 100%|██████████| 100/100 [00:06<00:00, 16.23it/s, v_num=0, loss/train_loss=0.210, loss/val_loss=0.354]

Epoch 414, global step 41500: 'loss/val_loss' was not in top 1


Epoch 415: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.385]

Epoch 415, global step 41600: 'loss/val_loss' was not in top 1


Epoch 416: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.169, loss/val_loss=0.383]

Epoch 416, global step 41700: 'loss/val_loss' was not in top 1


Epoch 417: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.451]

Epoch 417, global step 41800: 'loss/val_loss' was not in top 1


Epoch 418: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.422]

Epoch 418, global step 41900: 'loss/val_loss' was not in top 1


Epoch 419: 100%|██████████| 100/100 [00:06<00:00, 16.10it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.348]

Epoch 419, global step 42000: 'loss/val_loss' was not in top 1


Epoch 420: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.231, loss/val_loss=0.373]

Epoch 420, global step 42100: 'loss/val_loss' was not in top 1


Epoch 421: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.219, loss/val_loss=0.339] 

Epoch 421, global step 42200: 'loss/val_loss' was not in top 1


Epoch 422: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.240, loss/val_loss=0.295]

Epoch 422, global step 42300: 'loss/val_loss' was not in top 1


Epoch 423: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.327]

Epoch 423, global step 42400: 'loss/val_loss' was not in top 1


Epoch 424: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.334]

Epoch 424, global step 42500: 'loss/val_loss' was not in top 1


Epoch 425: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.439]

Epoch 425, global step 42600: 'loss/val_loss' was not in top 1


Epoch 426: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.120, loss/val_loss=0.467]

Epoch 426, global step 42700: 'loss/val_loss' was not in top 1


Epoch 427: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.275, loss/val_loss=0.365]

Epoch 427, global step 42800: 'loss/val_loss' was not in top 1


Epoch 428: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.062, loss/val_loss=0.402]

Epoch 428, global step 42900: 'loss/val_loss' was not in top 1


Epoch 429: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.455]

Epoch 429, global step 43000: 'loss/val_loss' was not in top 1


Epoch 430: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.342, loss/val_loss=0.532]

Epoch 430, global step 43100: 'loss/val_loss' was not in top 1


Epoch 431: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.145, loss/val_loss=0.434]

Epoch 431, global step 43200: 'loss/val_loss' was not in top 1


Epoch 432: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.252, loss/val_loss=0.297]

Epoch 432, global step 43300: 'loss/val_loss' was not in top 1


Epoch 433: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.168, loss/val_loss=0.314]

Epoch 433, global step 43400: 'loss/val_loss' was not in top 1


Epoch 434: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.318]

Epoch 434, global step 43500: 'loss/val_loss' was not in top 1


Epoch 435: 100%|██████████| 100/100 [00:06<00:00, 15.16it/s, v_num=0, loss/train_loss=0.150, loss/val_loss=0.281]

Epoch 435, global step 43600: 'loss/val_loss' was not in top 1


Epoch 436: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.336]

Epoch 436, global step 43700: 'loss/val_loss' was not in top 1


Epoch 437: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.291]

Epoch 437, global step 43800: 'loss/val_loss' was not in top 1


Epoch 438: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.291]

Epoch 438, global step 43900: 'loss/val_loss' was not in top 1


Epoch 439: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.305] 

Epoch 439, global step 44000: 'loss/val_loss' was not in top 1


Epoch 440: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.276]

Epoch 440, global step 44100: 'loss/val_loss' was not in top 1


Epoch 441: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.271]

Epoch 441, global step 44200: 'loss/val_loss' reached 0.27131 (best 0.27131), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=441-step=44200.ckpt' as top 1


Epoch 442: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.316]

Epoch 442, global step 44300: 'loss/val_loss' was not in top 1


Epoch 443: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.234, loss/val_loss=0.343]

Epoch 443, global step 44400: 'loss/val_loss' was not in top 1


Epoch 444: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.323]

Epoch 444, global step 44500: 'loss/val_loss' was not in top 1


Epoch 445: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.0668, loss/val_loss=0.307]

Epoch 445, global step 44600: 'loss/val_loss' was not in top 1


Epoch 446: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.184, loss/val_loss=0.328] 

Epoch 446, global step 44700: 'loss/val_loss' was not in top 1


Epoch 447: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.372]

Epoch 447, global step 44800: 'loss/val_loss' was not in top 1


Epoch 448: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.361]

Epoch 448, global step 44900: 'loss/val_loss' was not in top 1


Epoch 449: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.322]

Epoch 449, global step 45000: 'loss/val_loss' was not in top 1


Epoch 450: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.304]

Epoch 450, global step 45100: 'loss/val_loss' was not in top 1


Epoch 451: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.353]

Epoch 451, global step 45200: 'loss/val_loss' was not in top 1


Epoch 452: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.319]

Epoch 452, global step 45300: 'loss/val_loss' was not in top 1


Epoch 453: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.332]

Epoch 453, global step 45400: 'loss/val_loss' was not in top 1


Epoch 454: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.175, loss/val_loss=0.325]

Epoch 454, global step 45500: 'loss/val_loss' was not in top 1


Epoch 455: 100%|██████████| 100/100 [00:06<00:00, 15.99it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.385]

Epoch 455, global step 45600: 'loss/val_loss' was not in top 1


Epoch 456: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.113, loss/val_loss=0.382] 

Epoch 456, global step 45700: 'loss/val_loss' was not in top 1


Epoch 457: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.362]

Epoch 457, global step 45800: 'loss/val_loss' was not in top 1


Epoch 458: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.234, loss/val_loss=0.413]

Epoch 458, global step 45900: 'loss/val_loss' was not in top 1


Epoch 459: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.377]

Epoch 459, global step 46000: 'loss/val_loss' was not in top 1


Epoch 460: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.346]

Epoch 460, global step 46100: 'loss/val_loss' was not in top 1


Epoch 461: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.378]

Epoch 461, global step 46200: 'loss/val_loss' was not in top 1


Epoch 462: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.293, loss/val_loss=0.404]

Epoch 462, global step 46300: 'loss/val_loss' was not in top 1


Epoch 463: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.178, loss/val_loss=0.399]

Epoch 463, global step 46400: 'loss/val_loss' was not in top 1


Epoch 464: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.137, loss/val_loss=0.368]

Epoch 464, global step 46500: 'loss/val_loss' was not in top 1


Epoch 465: 100%|██████████| 100/100 [00:06<00:00, 14.93it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.450] 

Epoch 465, global step 46600: 'loss/val_loss' was not in top 1


Epoch 466: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.415]

Epoch 466, global step 46700: 'loss/val_loss' was not in top 1


Epoch 467: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.102, loss/val_loss=0.371]

Epoch 467, global step 46800: 'loss/val_loss' was not in top 1


Epoch 468: 100%|██████████| 100/100 [00:06<00:00, 15.89it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.342]

Epoch 468, global step 46900: 'loss/val_loss' was not in top 1


Epoch 469: 100%|██████████| 100/100 [00:06<00:00, 15.90it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.382]

Epoch 469, global step 47000: 'loss/val_loss' was not in top 1


Epoch 470: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.374]

Epoch 470, global step 47100: 'loss/val_loss' was not in top 1


Epoch 471: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.101, loss/val_loss=0.353]

Epoch 471, global step 47200: 'loss/val_loss' was not in top 1


Epoch 472: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.386]

Epoch 472, global step 47300: 'loss/val_loss' was not in top 1


Epoch 473: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.207, loss/val_loss=0.381]

Epoch 473, global step 47400: 'loss/val_loss' was not in top 1


Epoch 474: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.414]

Epoch 474, global step 47500: 'loss/val_loss' was not in top 1


Epoch 475: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.382]

Epoch 475, global step 47600: 'loss/val_loss' was not in top 1


Epoch 476: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.400]

Epoch 476, global step 47700: 'loss/val_loss' was not in top 1


Epoch 477: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.346] 

Epoch 477, global step 47800: 'loss/val_loss' was not in top 1


Epoch 478: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.396]

Epoch 478, global step 47900: 'loss/val_loss' was not in top 1


Epoch 479: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.117, loss/val_loss=0.456]

Epoch 479, global step 48000: 'loss/val_loss' was not in top 1


Epoch 480: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.370]

Epoch 480, global step 48100: 'loss/val_loss' was not in top 1


Epoch 481: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.324]

Epoch 481, global step 48200: 'loss/val_loss' was not in top 1


Epoch 482: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.255, loss/val_loss=0.347]

Epoch 482, global step 48300: 'loss/val_loss' was not in top 1


Epoch 483: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.265, loss/val_loss=0.426]

Epoch 483, global step 48400: 'loss/val_loss' was not in top 1


Epoch 484: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.189, loss/val_loss=0.339]

Epoch 484, global step 48500: 'loss/val_loss' was not in top 1


Epoch 485: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.347]

Epoch 485, global step 48600: 'loss/val_loss' was not in top 1


Epoch 486: 100%|██████████| 100/100 [00:06<00:00, 16.02it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.367]

Epoch 486, global step 48700: 'loss/val_loss' was not in top 1


Epoch 487: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.282]

Epoch 487, global step 48800: 'loss/val_loss' was not in top 1


Epoch 488: 100%|██████████| 100/100 [00:06<00:00, 16.30it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.282]

Epoch 488, global step 48900: 'loss/val_loss' was not in top 1


Epoch 489: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.367]

Epoch 489, global step 49000: 'loss/val_loss' was not in top 1


Epoch 490: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.294]

Epoch 490, global step 49100: 'loss/val_loss' was not in top 1


Epoch 491: 100%|██████████| 100/100 [00:06<00:00, 16.13it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.311]

Epoch 491, global step 49200: 'loss/val_loss' was not in top 1


Epoch 492: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.372]

Epoch 492, global step 49300: 'loss/val_loss' was not in top 1


Epoch 493: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.328]

Epoch 493, global step 49400: 'loss/val_loss' was not in top 1


Epoch 494: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.240, loss/val_loss=0.397]

Epoch 494, global step 49500: 'loss/val_loss' was not in top 1


Epoch 495: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.240, loss/val_loss=0.370]

Epoch 495, global step 49600: 'loss/val_loss' was not in top 1


Epoch 496: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.0986, loss/val_loss=0.368]

Epoch 496, global step 49700: 'loss/val_loss' was not in top 1


Epoch 497: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.188, loss/val_loss=0.362] 

Epoch 497, global step 49800: 'loss/val_loss' was not in top 1


Epoch 498: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.278, loss/val_loss=0.368] 

Epoch 498, global step 49900: 'loss/val_loss' was not in top 1


Epoch 499: 100%|██████████| 100/100 [00:06<00:00, 15.87it/s, v_num=0, loss/train_loss=0.173, loss/val_loss=0.394]

Epoch 499, global step 50000: 'loss/val_loss' was not in top 1


Epoch 500: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.323]

Epoch 500, global step 50100: 'loss/val_loss' was not in top 1


Epoch 501: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.380]

Epoch 501, global step 50200: 'loss/val_loss' was not in top 1


Epoch 502: 100%|██████████| 100/100 [00:06<00:00, 16.00it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.320]

Epoch 502, global step 50300: 'loss/val_loss' was not in top 1


Epoch 503: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.295]

Epoch 503, global step 50400: 'loss/val_loss' was not in top 1


Epoch 504: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.156, loss/val_loss=0.278]

Epoch 504, global step 50500: 'loss/val_loss' was not in top 1


Epoch 505: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.380]

Epoch 505, global step 50600: 'loss/val_loss' was not in top 1


Epoch 506: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.363]

Epoch 506, global step 50700: 'loss/val_loss' was not in top 1


Epoch 507: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.287]

Epoch 507, global step 50800: 'loss/val_loss' was not in top 1


Epoch 508: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.284]

Epoch 508, global step 50900: 'loss/val_loss' was not in top 1


Epoch 509: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.190, loss/val_loss=0.320]

Epoch 509, global step 51000: 'loss/val_loss' was not in top 1


Epoch 510: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.193, loss/val_loss=0.373]

Epoch 510, global step 51100: 'loss/val_loss' was not in top 1


Epoch 511: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.316]

Epoch 511, global step 51200: 'loss/val_loss' was not in top 1


Epoch 512: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.312] 

Epoch 512, global step 51300: 'loss/val_loss' was not in top 1


Epoch 513: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.193, loss/val_loss=0.380]

Epoch 513, global step 51400: 'loss/val_loss' was not in top 1


Epoch 514: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.362]

Epoch 514, global step 51500: 'loss/val_loss' was not in top 1


Epoch 515: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.287]

Epoch 515, global step 51600: 'loss/val_loss' was not in top 1


Epoch 516: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.121, loss/val_loss=0.339]

Epoch 516, global step 51700: 'loss/val_loss' was not in top 1


Epoch 517: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.307]

Epoch 517, global step 51800: 'loss/val_loss' was not in top 1


Epoch 518: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.305]

Epoch 518, global step 51900: 'loss/val_loss' was not in top 1


Epoch 519: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.281]

Epoch 519, global step 52000: 'loss/val_loss' was not in top 1


Epoch 520: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.260, loss/val_loss=0.242]

Epoch 520, global step 52100: 'loss/val_loss' reached 0.24175 (best 0.24175), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=520-step=52100.ckpt' as top 1


Epoch 521: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.298]

Epoch 521, global step 52200: 'loss/val_loss' was not in top 1


Epoch 522: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.267] 

Epoch 522, global step 52300: 'loss/val_loss' was not in top 1


Epoch 523: 100%|██████████| 100/100 [00:06<00:00, 15.13it/s, v_num=0, loss/train_loss=0.114, loss/val_loss=0.321]

Epoch 523, global step 52400: 'loss/val_loss' was not in top 1


Epoch 524: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.212, loss/val_loss=0.315]

Epoch 524, global step 52500: 'loss/val_loss' was not in top 1


Epoch 525: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.277]

Epoch 525, global step 52600: 'loss/val_loss' was not in top 1


Epoch 526: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.319]

Epoch 526, global step 52700: 'loss/val_loss' was not in top 1


Epoch 527: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.401]

Epoch 527, global step 52800: 'loss/val_loss' was not in top 1


Epoch 528: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.335]

Epoch 528, global step 52900: 'loss/val_loss' was not in top 1


Epoch 529: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.198, loss/val_loss=0.373]

Epoch 529, global step 53000: 'loss/val_loss' was not in top 1


Epoch 530: 100%|██████████| 100/100 [00:06<00:00, 16.02it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.303]

Epoch 530, global step 53100: 'loss/val_loss' was not in top 1


Epoch 531: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.334]

Epoch 531, global step 53200: 'loss/val_loss' was not in top 1


Epoch 532: 100%|██████████| 100/100 [00:06<00:00, 15.19it/s, v_num=0, loss/train_loss=0.156, loss/val_loss=0.255]

Epoch 532, global step 53300: 'loss/val_loss' was not in top 1


Epoch 533: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.251]

Epoch 533, global step 53400: 'loss/val_loss' was not in top 1


Epoch 534: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.173, loss/val_loss=0.276]

Epoch 534, global step 53500: 'loss/val_loss' was not in top 1


Epoch 535: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.268]

Epoch 535, global step 53600: 'loss/val_loss' was not in top 1


Epoch 536: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.220, loss/val_loss=0.284]

Epoch 536, global step 53700: 'loss/val_loss' was not in top 1


Epoch 537: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.314] 

Epoch 537, global step 53800: 'loss/val_loss' was not in top 1


Epoch 538: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.161, loss/val_loss=0.327]

Epoch 538, global step 53900: 'loss/val_loss' was not in top 1


Epoch 539: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.326]

Epoch 539, global step 54000: 'loss/val_loss' was not in top 1


Epoch 540: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.175, loss/val_loss=0.306]

Epoch 540, global step 54100: 'loss/val_loss' was not in top 1


Epoch 541: 100%|██████████| 100/100 [00:06<00:00, 16.53it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.284]

Epoch 541, global step 54200: 'loss/val_loss' was not in top 1


Epoch 542: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.298]

Epoch 542, global step 54300: 'loss/val_loss' was not in top 1


Epoch 543: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.406]

Epoch 543, global step 54400: 'loss/val_loss' was not in top 1


Epoch 544: 100%|██████████| 100/100 [00:06<00:00, 15.02it/s, v_num=0, loss/train_loss=0.169, loss/val_loss=0.298]

Epoch 544, global step 54500: 'loss/val_loss' was not in top 1


Epoch 545: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.287]

Epoch 545, global step 54600: 'loss/val_loss' was not in top 1


Epoch 546: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.138, loss/val_loss=0.337]

Epoch 546, global step 54700: 'loss/val_loss' was not in top 1


Epoch 547: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.272]

Epoch 547, global step 54800: 'loss/val_loss' was not in top 1


Epoch 548: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.114, loss/val_loss=0.231]

Epoch 548, global step 54900: 'loss/val_loss' reached 0.23128 (best 0.23128), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=548-step=54900.ckpt' as top 1


Epoch 549: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.136, loss/val_loss=0.275]

Epoch 549, global step 55000: 'loss/val_loss' was not in top 1


Epoch 550: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.201, loss/val_loss=0.313]

Epoch 550, global step 55100: 'loss/val_loss' was not in top 1


Epoch 551: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.327]

Epoch 551, global step 55200: 'loss/val_loss' was not in top 1


Epoch 552: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.122, loss/val_loss=0.346]

Epoch 552, global step 55300: 'loss/val_loss' was not in top 1


Epoch 553: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.285]

Epoch 553, global step 55400: 'loss/val_loss' was not in top 1


Epoch 554: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.123, loss/val_loss=0.373]

Epoch 554, global step 55500: 'loss/val_loss' was not in top 1


Epoch 555: 100%|██████████| 100/100 [00:06<00:00, 16.05it/s, v_num=0, loss/train_loss=0.205, loss/val_loss=0.384]

Epoch 555, global step 55600: 'loss/val_loss' was not in top 1


Epoch 556: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.0917, loss/val_loss=0.296]

Epoch 556, global step 55700: 'loss/val_loss' was not in top 1


Epoch 557: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.303] 

Epoch 557, global step 55800: 'loss/val_loss' was not in top 1


Epoch 558: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.334]

Epoch 558, global step 55900: 'loss/val_loss' was not in top 1


Epoch 559: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.0808, loss/val_loss=0.337]

Epoch 559, global step 56000: 'loss/val_loss' was not in top 1


Epoch 560: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.385] 

Epoch 560, global step 56100: 'loss/val_loss' was not in top 1


Epoch 561: 100%|██████████| 100/100 [00:06<00:00, 16.27it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.397]

Epoch 561, global step 56200: 'loss/val_loss' was not in top 1


Epoch 562: 100%|██████████| 100/100 [00:06<00:00, 16.49it/s, v_num=0, loss/train_loss=0.106, loss/val_loss=0.439]

Epoch 562, global step 56300: 'loss/val_loss' was not in top 1


Epoch 563: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.327] 

Epoch 563, global step 56400: 'loss/val_loss' was not in top 1


Epoch 564: 100%|██████████| 100/100 [00:06<00:00, 16.54it/s, v_num=0, loss/train_loss=0.193, loss/val_loss=0.427]

Epoch 564, global step 56500: 'loss/val_loss' was not in top 1


Epoch 565: 100%|██████████| 100/100 [00:06<00:00, 16.07it/s, v_num=0, loss/train_loss=0.194, loss/val_loss=0.371]

Epoch 565, global step 56600: 'loss/val_loss' was not in top 1


Epoch 566: 100%|██████████| 100/100 [00:06<00:00, 16.18it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.374]

Epoch 566, global step 56700: 'loss/val_loss' was not in top 1


Epoch 567: 100%|██████████| 100/100 [00:06<00:00, 16.39it/s, v_num=0, loss/train_loss=0.186, loss/val_loss=0.308]

Epoch 567, global step 56800: 'loss/val_loss' was not in top 1


Epoch 568: 100%|██████████| 100/100 [00:06<00:00, 16.17it/s, v_num=0, loss/train_loss=0.170, loss/val_loss=0.381]

Epoch 568, global step 56900: 'loss/val_loss' was not in top 1


Epoch 569: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.338]

Epoch 569, global step 57000: 'loss/val_loss' was not in top 1


Epoch 570: 100%|██████████| 100/100 [00:06<00:00, 16.37it/s, v_num=0, loss/train_loss=0.134, loss/val_loss=0.456]

Epoch 570, global step 57100: 'loss/val_loss' was not in top 1


Epoch 571: 100%|██████████| 100/100 [00:06<00:00, 16.32it/s, v_num=0, loss/train_loss=0.0802, loss/val_loss=0.426]

Epoch 571, global step 57200: 'loss/val_loss' was not in top 1


Epoch 572: 100%|██████████| 100/100 [00:06<00:00, 16.48it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.405] 

Epoch 572, global step 57300: 'loss/val_loss' was not in top 1


Epoch 573: 100%|██████████| 100/100 [00:06<00:00, 16.08it/s, v_num=0, loss/train_loss=0.105, loss/val_loss=0.375]

Epoch 573, global step 57400: 'loss/val_loss' was not in top 1


Epoch 574: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.423]

Epoch 574, global step 57500: 'loss/val_loss' was not in top 1


Epoch 575: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.432]

Epoch 575, global step 57600: 'loss/val_loss' was not in top 1


Epoch 576: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.417]

Epoch 576, global step 57700: 'loss/val_loss' was not in top 1


Epoch 577: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.186, loss/val_loss=0.312]

Epoch 577, global step 57800: 'loss/val_loss' was not in top 1


Epoch 578: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.176, loss/val_loss=0.365]

Epoch 578, global step 57900: 'loss/val_loss' was not in top 1


Epoch 579: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.184, loss/val_loss=0.354]

Epoch 579, global step 58000: 'loss/val_loss' was not in top 1


Epoch 580: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.358]

Epoch 580, global step 58100: 'loss/val_loss' was not in top 1


Epoch 581: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.348]

Epoch 581, global step 58200: 'loss/val_loss' was not in top 1


Epoch 582: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.334]

Epoch 582, global step 58300: 'loss/val_loss' was not in top 1


Epoch 583: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.097, loss/val_loss=0.408]

Epoch 583, global step 58400: 'loss/val_loss' was not in top 1


Epoch 584: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.325, loss/val_loss=0.301]

Epoch 584, global step 58500: 'loss/val_loss' was not in top 1


Epoch 585: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.421]

Epoch 585, global step 58600: 'loss/val_loss' was not in top 1


Epoch 586: 100%|██████████| 100/100 [00:06<00:00, 15.99it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.452]

Epoch 586, global step 58700: 'loss/val_loss' was not in top 1


Epoch 587: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.410]

Epoch 587, global step 58800: 'loss/val_loss' was not in top 1


Epoch 588: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.128, loss/val_loss=0.423]

Epoch 588, global step 58900: 'loss/val_loss' was not in top 1


Epoch 589: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.354]

Epoch 589, global step 59000: 'loss/val_loss' was not in top 1


Epoch 590: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.381]

Epoch 590, global step 59100: 'loss/val_loss' was not in top 1


Epoch 591: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.189, loss/val_loss=0.359]

Epoch 591, global step 59200: 'loss/val_loss' was not in top 1


Epoch 592: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.287]

Epoch 592, global step 59300: 'loss/val_loss' was not in top 1


Epoch 593: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.345]

Epoch 593, global step 59400: 'loss/val_loss' was not in top 1


Epoch 594: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.344]

Epoch 594, global step 59500: 'loss/val_loss' was not in top 1


Epoch 595: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.0711, loss/val_loss=0.223]

Epoch 595, global step 59600: 'loss/val_loss' reached 0.22299 (best 0.22299), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=595-step=59600.ckpt' as top 1


Epoch 596: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.157, loss/val_loss=0.247] 

Epoch 596, global step 59700: 'loss/val_loss' was not in top 1


Epoch 597: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.128, loss/val_loss=0.309]

Epoch 597, global step 59800: 'loss/val_loss' was not in top 1


Epoch 598: 100%|██████████| 100/100 [00:06<00:00, 15.75it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.304]

Epoch 598, global step 59900: 'loss/val_loss' was not in top 1


Epoch 599: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.192, loss/val_loss=0.308]

Epoch 599, global step 60000: 'loss/val_loss' was not in top 1


Epoch 600: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.118, loss/val_loss=0.308]

Epoch 600, global step 60100: 'loss/val_loss' was not in top 1


Epoch 601: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.311] 

Epoch 601, global step 60200: 'loss/val_loss' was not in top 1


Epoch 602: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.327]

Epoch 602, global step 60300: 'loss/val_loss' was not in top 1


Epoch 603: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.322] 

Epoch 603, global step 60400: 'loss/val_loss' was not in top 1


Epoch 604: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.200, loss/val_loss=0.345]

Epoch 604, global step 60500: 'loss/val_loss' was not in top 1


Epoch 605: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.297, loss/val_loss=0.317]

Epoch 605, global step 60600: 'loss/val_loss' was not in top 1


Epoch 606: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.331]

Epoch 606, global step 60700: 'loss/val_loss' was not in top 1


Epoch 607: 100%|██████████| 100/100 [00:06<00:00, 15.89it/s, v_num=0, loss/train_loss=0.0961, loss/val_loss=0.359]

Epoch 607, global step 60800: 'loss/val_loss' was not in top 1


Epoch 608: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.312] 

Epoch 608, global step 60900: 'loss/val_loss' was not in top 1


Epoch 609: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.289]

Epoch 609, global step 61000: 'loss/val_loss' was not in top 1


Epoch 610: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.114, loss/val_loss=0.259]

Epoch 610, global step 61100: 'loss/val_loss' was not in top 1


Epoch 611: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.0848, loss/val_loss=0.315]

Epoch 611, global step 61200: 'loss/val_loss' was not in top 1


Epoch 612: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.334] 

Epoch 612, global step 61300: 'loss/val_loss' was not in top 1


Epoch 613: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.225, loss/val_loss=0.241]

Epoch 613, global step 61400: 'loss/val_loss' was not in top 1


Epoch 614: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.134, loss/val_loss=0.280]

Epoch 614, global step 61500: 'loss/val_loss' was not in top 1


Epoch 615: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.273]

Epoch 615, global step 61600: 'loss/val_loss' was not in top 1


Epoch 616: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.308]

Epoch 616, global step 61700: 'loss/val_loss' was not in top 1


Epoch 617: 100%|██████████| 100/100 [00:06<00:00, 15.75it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.341]

Epoch 617, global step 61800: 'loss/val_loss' was not in top 1


Epoch 618: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.196, loss/val_loss=0.303]

Epoch 618, global step 61900: 'loss/val_loss' was not in top 1


Epoch 619: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.337]

Epoch 619, global step 62000: 'loss/val_loss' was not in top 1


Epoch 620: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.121, loss/val_loss=0.313] 

Epoch 620, global step 62100: 'loss/val_loss' was not in top 1


Epoch 621: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.377]

Epoch 621, global step 62200: 'loss/val_loss' was not in top 1


Epoch 622: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.138, loss/val_loss=0.450]

Epoch 622, global step 62300: 'loss/val_loss' was not in top 1


Epoch 623: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.336]

Epoch 623, global step 62400: 'loss/val_loss' was not in top 1


Epoch 624: 100%|██████████| 100/100 [00:06<00:00, 15.07it/s, v_num=0, loss/train_loss=0.285, loss/val_loss=0.321]

Epoch 624, global step 62500: 'loss/val_loss' was not in top 1


Epoch 625: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.093, loss/val_loss=0.364]

Epoch 625, global step 62600: 'loss/val_loss' was not in top 1


Epoch 626: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.161, loss/val_loss=0.467]

Epoch 626, global step 62700: 'loss/val_loss' was not in top 1


Epoch 627: 100%|██████████| 100/100 [00:06<00:00, 16.00it/s, v_num=0, loss/train_loss=0.173, loss/val_loss=0.387]

Epoch 627, global step 62800: 'loss/val_loss' was not in top 1


Epoch 628: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.375]

Epoch 628, global step 62900: 'loss/val_loss' was not in top 1


Epoch 629: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.182, loss/val_loss=0.410]

Epoch 629, global step 63000: 'loss/val_loss' was not in top 1


Epoch 630: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.182, loss/val_loss=0.388] 

Epoch 630, global step 63100: 'loss/val_loss' was not in top 1


Epoch 631: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.463]

Epoch 631, global step 63200: 'loss/val_loss' was not in top 1


Epoch 632: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.120, loss/val_loss=0.358]

Epoch 632, global step 63300: 'loss/val_loss' was not in top 1


Epoch 633: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.461]

Epoch 633, global step 63400: 'loss/val_loss' was not in top 1


Epoch 634: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.393] 

Epoch 634, global step 63500: 'loss/val_loss' was not in top 1


Epoch 635: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.426]

Epoch 635, global step 63600: 'loss/val_loss' was not in top 1


Epoch 636: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.397] 

Epoch 636, global step 63700: 'loss/val_loss' was not in top 1


Epoch 637: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.370] 

Epoch 637, global step 63800: 'loss/val_loss' was not in top 1


Epoch 638: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.444]

Epoch 638, global step 63900: 'loss/val_loss' was not in top 1


Epoch 639: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.357]

Epoch 639, global step 64000: 'loss/val_loss' was not in top 1


Epoch 640: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.331]

Epoch 640, global step 64100: 'loss/val_loss' was not in top 1


Epoch 641: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.0935, loss/val_loss=0.274]

Epoch 641, global step 64200: 'loss/val_loss' was not in top 1


Epoch 642: 100%|██████████| 100/100 [00:06<00:00, 15.90it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.309] 

Epoch 642, global step 64300: 'loss/val_loss' was not in top 1


Epoch 643: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.347]

Epoch 643, global step 64400: 'loss/val_loss' was not in top 1


Epoch 644: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.209, loss/val_loss=0.382]

Epoch 644, global step 64500: 'loss/val_loss' was not in top 1


Epoch 645: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.316]

Epoch 645, global step 64600: 'loss/val_loss' was not in top 1


Epoch 646: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.222, loss/val_loss=0.324]

Epoch 646, global step 64700: 'loss/val_loss' was not in top 1


Epoch 647: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.349]

Epoch 647, global step 64800: 'loss/val_loss' was not in top 1


Epoch 648: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.406]

Epoch 648, global step 64900: 'loss/val_loss' was not in top 1


Epoch 649: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.390]

Epoch 649, global step 65000: 'loss/val_loss' was not in top 1


Epoch 650: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.419] 

Epoch 650, global step 65100: 'loss/val_loss' was not in top 1


Epoch 651: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.408]

Epoch 651, global step 65200: 'loss/val_loss' was not in top 1


Epoch 652: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.0813, loss/val_loss=0.393]

Epoch 652, global step 65300: 'loss/val_loss' was not in top 1


Epoch 653: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.439] 

Epoch 653, global step 65400: 'loss/val_loss' was not in top 1


Epoch 654: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.437]

Epoch 654, global step 65500: 'loss/val_loss' was not in top 1


Epoch 655: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.0913, loss/val_loss=0.411]

Epoch 655, global step 65600: 'loss/val_loss' was not in top 1


Epoch 656: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.0925, loss/val_loss=0.406]

Epoch 656, global step 65700: 'loss/val_loss' was not in top 1


Epoch 657: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.220, loss/val_loss=0.430] 

Epoch 657, global step 65800: 'loss/val_loss' was not in top 1


Epoch 658: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.553]

Epoch 658, global step 65900: 'loss/val_loss' was not in top 1


Epoch 659: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.246, loss/val_loss=0.488]

Epoch 659, global step 66000: 'loss/val_loss' was not in top 1


Epoch 660: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.356]

Epoch 660, global step 66100: 'loss/val_loss' was not in top 1


Epoch 661: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.307]

Epoch 661, global step 66200: 'loss/val_loss' was not in top 1


Epoch 662: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.350]

Epoch 662, global step 66300: 'loss/val_loss' was not in top 1


Epoch 663: 100%|██████████| 100/100 [00:06<00:00, 15.88it/s, v_num=0, loss/train_loss=0.105, loss/val_loss=0.309]

Epoch 663, global step 66400: 'loss/val_loss' was not in top 1


Epoch 664: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.298]

Epoch 664, global step 66500: 'loss/val_loss' was not in top 1


Epoch 665: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.103, loss/val_loss=0.346] 

Epoch 665, global step 66600: 'loss/val_loss' was not in top 1


Epoch 666: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.145, loss/val_loss=0.334]

Epoch 666, global step 66700: 'loss/val_loss' was not in top 1


Epoch 667: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.121, loss/val_loss=0.277]

Epoch 667, global step 66800: 'loss/val_loss' was not in top 1


Epoch 668: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.0827, loss/val_loss=0.304]

Epoch 668, global step 66900: 'loss/val_loss' was not in top 1


Epoch 669: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.0968, loss/val_loss=0.267]

Epoch 669, global step 67000: 'loss/val_loss' was not in top 1


Epoch 670: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.235] 

Epoch 670, global step 67100: 'loss/val_loss' was not in top 1


Epoch 671: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.257]

Epoch 671, global step 67200: 'loss/val_loss' was not in top 1


Epoch 672: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.176, loss/val_loss=0.265]

Epoch 672, global step 67300: 'loss/val_loss' was not in top 1


Epoch 673: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.258]

Epoch 673, global step 67400: 'loss/val_loss' was not in top 1


Epoch 674: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.159, loss/val_loss=0.245] 

Epoch 674, global step 67500: 'loss/val_loss' was not in top 1


Epoch 675: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.106, loss/val_loss=0.236] 

Epoch 675, global step 67600: 'loss/val_loss' was not in top 1


Epoch 676: 100%|██████████| 100/100 [00:06<00:00, 15.30it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.203]

Epoch 676, global step 67700: 'loss/val_loss' reached 0.20330 (best 0.20330), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=676-step=67700.ckpt' as top 1


Epoch 677: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.233]

Epoch 677, global step 67800: 'loss/val_loss' was not in top 1


Epoch 678: 100%|██████████| 100/100 [00:05<00:00, 17.48it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.217] 

Epoch 678, global step 67900: 'loss/val_loss' was not in top 1


Epoch 679: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.237]

Epoch 679, global step 68000: 'loss/val_loss' was not in top 1


Epoch 680: 100%|██████████| 100/100 [00:06<00:00, 16.44it/s, v_num=0, loss/train_loss=0.0647, loss/val_loss=0.233]

Epoch 680, global step 68100: 'loss/val_loss' was not in top 1


Epoch 681: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.0968, loss/val_loss=0.251]

Epoch 681, global step 68200: 'loss/val_loss' was not in top 1


Epoch 682: 100%|██████████| 100/100 [00:06<00:00, 16.16it/s, v_num=0, loss/train_loss=0.118, loss/val_loss=0.242] 

Epoch 682, global step 68300: 'loss/val_loss' was not in top 1


Epoch 683: 100%|██████████| 100/100 [00:06<00:00, 16.03it/s, v_num=0, loss/train_loss=0.145, loss/val_loss=0.264]

Epoch 683, global step 68400: 'loss/val_loss' was not in top 1


Epoch 684: 100%|██████████| 100/100 [00:06<00:00, 16.18it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.232]

Epoch 684, global step 68500: 'loss/val_loss' was not in top 1


Epoch 685: 100%|██████████| 100/100 [00:06<00:00, 16.07it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.236]

Epoch 685, global step 68600: 'loss/val_loss' was not in top 1


Epoch 686: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.232]

Epoch 686, global step 68700: 'loss/val_loss' was not in top 1


Epoch 687: 100%|██████████| 100/100 [00:05<00:00, 16.97it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.270]

Epoch 687, global step 68800: 'loss/val_loss' was not in top 1


Epoch 688: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.0918, loss/val_loss=0.249]

Epoch 688, global step 68900: 'loss/val_loss' was not in top 1


Epoch 689: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.291] 

Epoch 689, global step 69000: 'loss/val_loss' was not in top 1


Epoch 690: 100%|██████████| 100/100 [00:06<00:00, 15.87it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.187]

Epoch 690, global step 69100: 'loss/val_loss' reached 0.18677 (best 0.18677), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=690-step=69100.ckpt' as top 1


Epoch 691: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.218]

Epoch 691, global step 69200: 'loss/val_loss' was not in top 1


Epoch 692: 100%|██████████| 100/100 [00:06<00:00, 16.28it/s, v_num=0, loss/train_loss=0.0935, loss/val_loss=0.212]

Epoch 692, global step 69300: 'loss/val_loss' was not in top 1


Epoch 693: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.169, loss/val_loss=0.234] 

Epoch 693, global step 69400: 'loss/val_loss' was not in top 1


Epoch 694: 100%|██████████| 100/100 [00:06<00:00, 15.97it/s, v_num=0, loss/train_loss=0.251, loss/val_loss=0.215]

Epoch 694, global step 69500: 'loss/val_loss' was not in top 1


Epoch 695: 100%|██████████| 100/100 [00:06<00:00, 16.49it/s, v_num=0, loss/train_loss=0.0883, loss/val_loss=0.237]

Epoch 695, global step 69600: 'loss/val_loss' was not in top 1


Epoch 696: 100%|██████████| 100/100 [00:06<00:00, 16.35it/s, v_num=0, loss/train_loss=0.0872, loss/val_loss=0.199]

Epoch 696, global step 69700: 'loss/val_loss' was not in top 1


Epoch 697: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.175] 

Epoch 697, global step 69800: 'loss/val_loss' reached 0.17451 (best 0.17451), saving model to '/disk1/ariane/vscode/CARE/task1_baselines/Pika/test_checkpoint/241023164243_self-pika_esm2_t6_8M_UR50D_gpt2_7_loss/epoch=697-step=69800.ckpt' as top 1


Epoch 698: 100%|██████████| 100/100 [00:06<00:00, 16.16it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.188]

Epoch 698, global step 69900: 'loss/val_loss' was not in top 1


Epoch 699: 100%|██████████| 100/100 [00:06<00:00, 15.88it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.214]

Epoch 699, global step 70000: 'loss/val_loss' was not in top 1


Epoch 700: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.0953, loss/val_loss=0.213]

Epoch 700, global step 70100: 'loss/val_loss' was not in top 1


Epoch 701: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.182, loss/val_loss=0.230] 

Epoch 701, global step 70200: 'loss/val_loss' was not in top 1


Epoch 702: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.113, loss/val_loss=0.241]

Epoch 702, global step 70300: 'loss/val_loss' was not in top 1


Epoch 703: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.169, loss/val_loss=0.210]

Epoch 703, global step 70400: 'loss/val_loss' was not in top 1


Epoch 704: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.118, loss/val_loss=0.238]

Epoch 704, global step 70500: 'loss/val_loss' was not in top 1


Epoch 705: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.219, loss/val_loss=0.239] 

Epoch 705, global step 70600: 'loss/val_loss' was not in top 1


Epoch 706: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.181, loss/val_loss=0.270]

Epoch 706, global step 70700: 'loss/val_loss' was not in top 1


Epoch 707: 100%|██████████| 100/100 [00:06<00:00, 16.07it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.263]

Epoch 707, global step 70800: 'loss/val_loss' was not in top 1


Epoch 708: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.210, loss/val_loss=0.236]

Epoch 708, global step 70900: 'loss/val_loss' was not in top 1


Epoch 709: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.102, loss/val_loss=0.272]

Epoch 709, global step 71000: 'loss/val_loss' was not in top 1


Epoch 710: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.192, loss/val_loss=0.270]

Epoch 710, global step 71100: 'loss/val_loss' was not in top 1


Epoch 711: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.248]

Epoch 711, global step 71200: 'loss/val_loss' was not in top 1


Epoch 712: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.231]

Epoch 712, global step 71300: 'loss/val_loss' was not in top 1


Epoch 713: 100%|██████████| 100/100 [00:06<00:00, 15.54it/s, v_num=0, loss/train_loss=0.229, loss/val_loss=0.281]

Epoch 713, global step 71400: 'loss/val_loss' was not in top 1


Epoch 714: 100%|██████████| 100/100 [00:06<00:00, 16.07it/s, v_num=0, loss/train_loss=0.173, loss/val_loss=0.246]

Epoch 714, global step 71500: 'loss/val_loss' was not in top 1


Epoch 715: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.210]

Epoch 715, global step 71600: 'loss/val_loss' was not in top 1


Epoch 716: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.145, loss/val_loss=0.210]

Epoch 716, global step 71700: 'loss/val_loss' was not in top 1


Epoch 717: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.241]

Epoch 717, global step 71800: 'loss/val_loss' was not in top 1


Epoch 718: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.105, loss/val_loss=0.269]

Epoch 718, global step 71900: 'loss/val_loss' was not in top 1


Epoch 719: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.232] 

Epoch 719, global step 72000: 'loss/val_loss' was not in top 1


Epoch 720: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.260]

Epoch 720, global step 72100: 'loss/val_loss' was not in top 1


Epoch 721: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.161, loss/val_loss=0.224]

Epoch 721, global step 72200: 'loss/val_loss' was not in top 1


Epoch 722: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.0791, loss/val_loss=0.223]

Epoch 722, global step 72300: 'loss/val_loss' was not in top 1


Epoch 723: 100%|██████████| 100/100 [00:06<00:00, 15.68it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.251] 

Epoch 723, global step 72400: 'loss/val_loss' was not in top 1


Epoch 724: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.210]

Epoch 724, global step 72500: 'loss/val_loss' was not in top 1


Epoch 725: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.261]

Epoch 725, global step 72600: 'loss/val_loss' was not in top 1


Epoch 726: 100%|██████████| 100/100 [00:06<00:00, 16.04it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.248]

Epoch 726, global step 72700: 'loss/val_loss' was not in top 1


Epoch 727: 100%|██████████| 100/100 [00:06<00:00, 16.23it/s, v_num=0, loss/train_loss=0.118, loss/val_loss=0.260] 

Epoch 727, global step 72800: 'loss/val_loss' was not in top 1


Epoch 728: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.281]

Epoch 728, global step 72900: 'loss/val_loss' was not in top 1


Epoch 729: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.268] 

Epoch 729, global step 73000: 'loss/val_loss' was not in top 1


Epoch 730: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.113, loss/val_loss=0.205]

Epoch 730, global step 73100: 'loss/val_loss' was not in top 1


Epoch 731: 100%|██████████| 100/100 [00:06<00:00, 16.17it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.220]

Epoch 731, global step 73200: 'loss/val_loss' was not in top 1


Epoch 732: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.218]

Epoch 732, global step 73300: 'loss/val_loss' was not in top 1


Epoch 733: 100%|██████████| 100/100 [00:06<00:00, 15.86it/s, v_num=0, loss/train_loss=0.235, loss/val_loss=0.242]

Epoch 733, global step 73400: 'loss/val_loss' was not in top 1


Epoch 734: 100%|██████████| 100/100 [00:06<00:00, 16.40it/s, v_num=0, loss/train_loss=0.129, loss/val_loss=0.229]

Epoch 734, global step 73500: 'loss/val_loss' was not in top 1


Epoch 735: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.263, loss/val_loss=0.274] 

Epoch 735, global step 73600: 'loss/val_loss' was not in top 1


Epoch 736: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.265]

Epoch 736, global step 73700: 'loss/val_loss' was not in top 1


Epoch 737: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.0786, loss/val_loss=0.228]

Epoch 737, global step 73800: 'loss/val_loss' was not in top 1


Epoch 738: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.0906, loss/val_loss=0.195]

Epoch 738, global step 73900: 'loss/val_loss' was not in top 1


Epoch 739: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.234] 

Epoch 739, global step 74000: 'loss/val_loss' was not in top 1


Epoch 740: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.228] 

Epoch 740, global step 74100: 'loss/val_loss' was not in top 1


Epoch 741: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.238]

Epoch 741, global step 74200: 'loss/val_loss' was not in top 1


Epoch 742: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.275, loss/val_loss=0.206]

Epoch 742, global step 74300: 'loss/val_loss' was not in top 1


Epoch 743: 100%|██████████| 100/100 [00:06<00:00, 15.21it/s, v_num=0, loss/train_loss=0.0938, loss/val_loss=0.227]

Epoch 743, global step 74400: 'loss/val_loss' was not in top 1


Epoch 744: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.238] 

Epoch 744, global step 74500: 'loss/val_loss' was not in top 1


Epoch 745: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.0998, loss/val_loss=0.235]

Epoch 745, global step 74600: 'loss/val_loss' was not in top 1


Epoch 746: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.261] 

Epoch 746, global step 74700: 'loss/val_loss' was not in top 1


Epoch 747: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.258]

Epoch 747, global step 74800: 'loss/val_loss' was not in top 1


Epoch 748: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.216, loss/val_loss=0.300]

Epoch 748, global step 74900: 'loss/val_loss' was not in top 1


Epoch 749: 100%|██████████| 100/100 [00:06<00:00, 16.48it/s, v_num=0, loss/train_loss=0.200, loss/val_loss=0.273] 

Epoch 749, global step 75000: 'loss/val_loss' was not in top 1


Epoch 750: 100%|██████████| 100/100 [00:06<00:00, 16.09it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.197]

Epoch 750, global step 75100: 'loss/val_loss' was not in top 1


Epoch 751: 100%|██████████| 100/100 [00:05<00:00, 16.77it/s, v_num=0, loss/train_loss=0.128, loss/val_loss=0.247]

Epoch 751, global step 75200: 'loss/val_loss' was not in top 1


Epoch 752: 100%|██████████| 100/100 [00:06<00:00, 16.31it/s, v_num=0, loss/train_loss=0.209, loss/val_loss=0.285]

Epoch 752, global step 75300: 'loss/val_loss' was not in top 1


Epoch 753: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.228] 

Epoch 753, global step 75400: 'loss/val_loss' was not in top 1


Epoch 754: 100%|██████████| 100/100 [00:06<00:00, 15.97it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.246] 

Epoch 754, global step 75500: 'loss/val_loss' was not in top 1


Epoch 755: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.212, loss/val_loss=0.219]

Epoch 755, global step 75600: 'loss/val_loss' was not in top 1


Epoch 756: 100%|██████████| 100/100 [00:06<00:00, 16.17it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.288]

Epoch 756, global step 75700: 'loss/val_loss' was not in top 1


Epoch 757: 100%|██████████| 100/100 [00:06<00:00, 16.23it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.269]

Epoch 757, global step 75800: 'loss/val_loss' was not in top 1


Epoch 758: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.274]

Epoch 758, global step 75900: 'loss/val_loss' was not in top 1


Epoch 759: 100%|██████████| 100/100 [00:06<00:00, 16.56it/s, v_num=0, loss/train_loss=0.0934, loss/val_loss=0.297]

Epoch 759, global step 76000: 'loss/val_loss' was not in top 1


Epoch 760: 100%|██████████| 100/100 [00:06<00:00, 15.97it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.329] 

Epoch 760, global step 76100: 'loss/val_loss' was not in top 1


Epoch 761: 100%|██████████| 100/100 [00:06<00:00, 16.08it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.310]

Epoch 761, global step 76200: 'loss/val_loss' was not in top 1


Epoch 762: 100%|██████████| 100/100 [00:06<00:00, 16.49it/s, v_num=0, loss/train_loss=0.201, loss/val_loss=0.323]

Epoch 762, global step 76300: 'loss/val_loss' was not in top 1


Epoch 763: 100%|██████████| 100/100 [00:06<00:00, 16.34it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.297]

Epoch 763, global step 76400: 'loss/val_loss' was not in top 1


Epoch 764: 100%|██████████| 100/100 [00:06<00:00, 16.30it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.268]

Epoch 764, global step 76500: 'loss/val_loss' was not in top 1


Epoch 765: 100%|██████████| 100/100 [00:06<00:00, 16.33it/s, v_num=0, loss/train_loss=0.146, loss/val_loss=0.332]

Epoch 765, global step 76600: 'loss/val_loss' was not in top 1


Epoch 766: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.250, loss/val_loss=0.311]

Epoch 766, global step 76700: 'loss/val_loss' was not in top 1


Epoch 767: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.0937, loss/val_loss=0.257]

Epoch 767, global step 76800: 'loss/val_loss' was not in top 1


Epoch 768: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.262] 

Epoch 768, global step 76900: 'loss/val_loss' was not in top 1


Epoch 769: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.130, loss/val_loss=0.358]

Epoch 769, global step 77000: 'loss/val_loss' was not in top 1


Epoch 770: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.229]

Epoch 770, global step 77100: 'loss/val_loss' was not in top 1


Epoch 771: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.279]

Epoch 771, global step 77200: 'loss/val_loss' was not in top 1


Epoch 772: 100%|██████████| 100/100 [00:06<00:00, 16.39it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.310]

Epoch 772, global step 77300: 'loss/val_loss' was not in top 1


Epoch 773: 100%|██████████| 100/100 [00:06<00:00, 16.21it/s, v_num=0, loss/train_loss=0.136, loss/val_loss=0.239]

Epoch 773, global step 77400: 'loss/val_loss' was not in top 1


Epoch 774: 100%|██████████| 100/100 [00:06<00:00, 16.06it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.253]

Epoch 774, global step 77500: 'loss/val_loss' was not in top 1


Epoch 775: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.110, loss/val_loss=0.253]

Epoch 775, global step 77600: 'loss/val_loss' was not in top 1


Epoch 776: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.165, loss/val_loss=0.250]

Epoch 776, global step 77700: 'loss/val_loss' was not in top 1


Epoch 777: 100%|██████████| 100/100 [00:06<00:00, 16.15it/s, v_num=0, loss/train_loss=0.188, loss/val_loss=0.267]

Epoch 777, global step 77800: 'loss/val_loss' was not in top 1


Epoch 778: 100%|██████████| 100/100 [00:06<00:00, 15.90it/s, v_num=0, loss/train_loss=0.0711, loss/val_loss=0.274]

Epoch 778, global step 77900: 'loss/val_loss' was not in top 1


Epoch 779: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.253] 

Epoch 779, global step 78000: 'loss/val_loss' was not in top 1


Epoch 780: 100%|██████████| 100/100 [00:06<00:00, 15.44it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.327]

Epoch 780, global step 78100: 'loss/val_loss' was not in top 1


Epoch 781: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.113, loss/val_loss=0.310]

Epoch 781, global step 78200: 'loss/val_loss' was not in top 1


Epoch 782: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.0795, loss/val_loss=0.321]

Epoch 782, global step 78300: 'loss/val_loss' was not in top 1


Epoch 783: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.197, loss/val_loss=0.295] 

Epoch 783, global step 78400: 'loss/val_loss' was not in top 1


Epoch 784: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.297]

Epoch 784, global step 78500: 'loss/val_loss' was not in top 1


Epoch 785: 100%|██████████| 100/100 [00:06<00:00, 15.59it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.251]

Epoch 785, global step 78600: 'loss/val_loss' was not in top 1


Epoch 786: 100%|██████████| 100/100 [00:06<00:00, 14.99it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.293]

Epoch 786, global step 78700: 'loss/val_loss' was not in top 1


Epoch 787: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.313]

Epoch 787, global step 78800: 'loss/val_loss' was not in top 1


Epoch 788: 100%|██████████| 100/100 [00:06<00:00, 15.15it/s, v_num=0, loss/train_loss=0.0955, loss/val_loss=0.278]

Epoch 788, global step 78900: 'loss/val_loss' was not in top 1


Epoch 789: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.356] 

Epoch 789, global step 79000: 'loss/val_loss' was not in top 1


Epoch 790: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.123, loss/val_loss=0.392]

Epoch 790, global step 79100: 'loss/val_loss' was not in top 1


Epoch 791: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.186, loss/val_loss=0.425]

Epoch 791, global step 79200: 'loss/val_loss' was not in top 1


Epoch 792: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.194, loss/val_loss=0.331]

Epoch 792, global step 79300: 'loss/val_loss' was not in top 1


Epoch 793: 100%|██████████| 100/100 [00:06<00:00, 15.99it/s, v_num=0, loss/train_loss=0.0868, loss/val_loss=0.401]

Epoch 793, global step 79400: 'loss/val_loss' was not in top 1


Epoch 794: 100%|██████████| 100/100 [00:06<00:00, 15.91it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.377] 

Epoch 794, global step 79500: 'loss/val_loss' was not in top 1


Epoch 795: 100%|██████████| 100/100 [00:06<00:00, 16.24it/s, v_num=0, loss/train_loss=0.226, loss/val_loss=0.261] 

Epoch 795, global step 79600: 'loss/val_loss' was not in top 1


Epoch 796: 100%|██████████| 100/100 [00:06<00:00, 16.39it/s, v_num=0, loss/train_loss=0.195, loss/val_loss=0.216]

Epoch 796, global step 79700: 'loss/val_loss' was not in top 1


Epoch 797: 100%|██████████| 100/100 [00:06<00:00, 15.95it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.232]

Epoch 797, global step 79800: 'loss/val_loss' was not in top 1


Epoch 798: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.256]

Epoch 798, global step 79900: 'loss/val_loss' was not in top 1


Epoch 799: 100%|██████████| 100/100 [00:06<00:00, 15.15it/s, v_num=0, loss/train_loss=0.191, loss/val_loss=0.259] 

Epoch 799, global step 80000: 'loss/val_loss' was not in top 1


Epoch 800: 100%|██████████| 100/100 [00:06<00:00, 15.75it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.266]

Epoch 800, global step 80100: 'loss/val_loss' was not in top 1


Epoch 801: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.323]

Epoch 801, global step 80200: 'loss/val_loss' was not in top 1


Epoch 802: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.264]

Epoch 802, global step 80300: 'loss/val_loss' was not in top 1


Epoch 803: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.264] 

Epoch 803, global step 80400: 'loss/val_loss' was not in top 1


Epoch 804: 100%|██████████| 100/100 [00:06<00:00, 15.93it/s, v_num=0, loss/train_loss=0.177, loss/val_loss=0.294]

Epoch 804, global step 80500: 'loss/val_loss' was not in top 1


Epoch 805: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.103, loss/val_loss=0.257]

Epoch 805, global step 80600: 'loss/val_loss' was not in top 1


Epoch 806: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.0841, loss/val_loss=0.277]

Epoch 806, global step 80700: 'loss/val_loss' was not in top 1


Epoch 807: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.260] 

Epoch 807, global step 80800: 'loss/val_loss' was not in top 1


Epoch 808: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.236]

Epoch 808, global step 80900: 'loss/val_loss' was not in top 1


Epoch 809: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.205]

Epoch 809, global step 81000: 'loss/val_loss' was not in top 1


Epoch 810: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.230]

Epoch 810, global step 81100: 'loss/val_loss' was not in top 1


Epoch 811: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.249] 

Epoch 811, global step 81200: 'loss/val_loss' was not in top 1


Epoch 812: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.223]

Epoch 812, global step 81300: 'loss/val_loss' was not in top 1


Epoch 813: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.187, loss/val_loss=0.237]

Epoch 813, global step 81400: 'loss/val_loss' was not in top 1


Epoch 814: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.264]

Epoch 814, global step 81500: 'loss/val_loss' was not in top 1


Epoch 815: 100%|██████████| 100/100 [00:06<00:00, 15.64it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.269]

Epoch 815, global step 81600: 'loss/val_loss' was not in top 1


Epoch 816: 100%|██████████| 100/100 [00:06<00:00, 15.83it/s, v_num=0, loss/train_loss=0.0956, loss/val_loss=0.267]

Epoch 816, global step 81700: 'loss/val_loss' was not in top 1


Epoch 817: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.134, loss/val_loss=0.239] 

Epoch 817, global step 81800: 'loss/val_loss' was not in top 1


Epoch 818: 100%|██████████| 100/100 [00:06<00:00, 15.73it/s, v_num=0, loss/train_loss=0.202, loss/val_loss=0.258]

Epoch 818, global step 81900: 'loss/val_loss' was not in top 1


Epoch 819: 100%|██████████| 100/100 [00:06<00:00, 15.92it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.244]

Epoch 819, global step 82000: 'loss/val_loss' was not in top 1


Epoch 820: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.228]

Epoch 820, global step 82100: 'loss/val_loss' was not in top 1


Epoch 821: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.240, loss/val_loss=0.246]

Epoch 821, global step 82200: 'loss/val_loss' was not in top 1


Epoch 822: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.0849, loss/val_loss=0.206]

Epoch 822, global step 82300: 'loss/val_loss' was not in top 1


Epoch 823: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.253] 

Epoch 823, global step 82400: 'loss/val_loss' was not in top 1


Epoch 824: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.156, loss/val_loss=0.284]

Epoch 824, global step 82500: 'loss/val_loss' was not in top 1


Epoch 825: 100%|██████████| 100/100 [00:06<00:00, 15.17it/s, v_num=0, loss/train_loss=0.103, loss/val_loss=0.288]

Epoch 825, global step 82600: 'loss/val_loss' was not in top 1


Epoch 826: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.262]

Epoch 826, global step 82700: 'loss/val_loss' was not in top 1


Epoch 827: 100%|██████████| 100/100 [00:06<00:00, 15.77it/s, v_num=0, loss/train_loss=0.0911, loss/val_loss=0.299]

Epoch 827, global step 82800: 'loss/val_loss' was not in top 1


Epoch 828: 100%|██████████| 100/100 [00:06<00:00, 16.02it/s, v_num=0, loss/train_loss=0.221, loss/val_loss=0.261] 

Epoch 828, global step 82900: 'loss/val_loss' was not in top 1


Epoch 829: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.312]

Epoch 829, global step 83000: 'loss/val_loss' was not in top 1


Epoch 830: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.114, loss/val_loss=0.382]

Epoch 830, global step 83100: 'loss/val_loss' was not in top 1


Epoch 831: 100%|██████████| 100/100 [00:06<00:00, 15.98it/s, v_num=0, loss/train_loss=0.148, loss/val_loss=0.279]

Epoch 831, global step 83200: 'loss/val_loss' was not in top 1


Epoch 832: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.198, loss/val_loss=0.275]

Epoch 832, global step 83300: 'loss/val_loss' was not in top 1


Epoch 833: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.195, loss/val_loss=0.255]

Epoch 833, global step 83400: 'loss/val_loss' was not in top 1


Epoch 834: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.0993, loss/val_loss=0.288]

Epoch 834, global step 83500: 'loss/val_loss' was not in top 1


Epoch 835: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.234] 

Epoch 835, global step 83600: 'loss/val_loss' was not in top 1


Epoch 836: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.230] 

Epoch 836, global step 83700: 'loss/val_loss' was not in top 1


Epoch 837: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.267]

Epoch 837, global step 83800: 'loss/val_loss' was not in top 1


Epoch 838: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.240]

Epoch 838, global step 83900: 'loss/val_loss' was not in top 1


Epoch 839: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.133, loss/val_loss=0.260]

Epoch 839, global step 84000: 'loss/val_loss' was not in top 1


Epoch 840: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.237]

Epoch 840, global step 84100: 'loss/val_loss' was not in top 1


Epoch 841: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.176, loss/val_loss=0.311] 

Epoch 841, global step 84200: 'loss/val_loss' was not in top 1


Epoch 842: 100%|██████████| 100/100 [00:06<00:00, 15.26it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.256]

Epoch 842, global step 84300: 'loss/val_loss' was not in top 1


Epoch 843: 100%|██████████| 100/100 [00:06<00:00, 15.89it/s, v_num=0, loss/train_loss=0.203, loss/val_loss=0.227]

Epoch 843, global step 84400: 'loss/val_loss' was not in top 1


Epoch 844: 100%|██████████| 100/100 [00:06<00:00, 15.05it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.218]

Epoch 844, global step 84500: 'loss/val_loss' was not in top 1


Epoch 845: 100%|██████████| 100/100 [00:06<00:00, 15.15it/s, v_num=0, loss/train_loss=0.0895, loss/val_loss=0.201]

Epoch 845, global step 84600: 'loss/val_loss' was not in top 1


Epoch 846: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.291] 

Epoch 846, global step 84700: 'loss/val_loss' was not in top 1


Epoch 847: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.0945, loss/val_loss=0.306]

Epoch 847, global step 84800: 'loss/val_loss' was not in top 1


Epoch 848: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.150, loss/val_loss=0.259] 

Epoch 848, global step 84900: 'loss/val_loss' was not in top 1


Epoch 849: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.135, loss/val_loss=0.322]

Epoch 849, global step 85000: 'loss/val_loss' was not in top 1


Epoch 850: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.295]

Epoch 850, global step 85100: 'loss/val_loss' was not in top 1


Epoch 851: 100%|██████████| 100/100 [00:06<00:00, 15.51it/s, v_num=0, loss/train_loss=0.0994, loss/val_loss=0.216]

Epoch 851, global step 85200: 'loss/val_loss' was not in top 1


Epoch 852: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.110, loss/val_loss=0.267] 

Epoch 852, global step 85300: 'loss/val_loss' was not in top 1


Epoch 853: 100%|██████████| 100/100 [00:06<00:00, 15.55it/s, v_num=0, loss/train_loss=0.097, loss/val_loss=0.277]

Epoch 853, global step 85400: 'loss/val_loss' was not in top 1


Epoch 854: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.264]

Epoch 854, global step 85500: 'loss/val_loss' was not in top 1


Epoch 855: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.251]

Epoch 855, global step 85600: 'loss/val_loss' was not in top 1


Epoch 856: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.257]

Epoch 856, global step 85700: 'loss/val_loss' was not in top 1


Epoch 857: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.170, loss/val_loss=0.241]

Epoch 857, global step 85800: 'loss/val_loss' was not in top 1


Epoch 858: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.325]

Epoch 858, global step 85900: 'loss/val_loss' was not in top 1


Epoch 859: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.237]

Epoch 859, global step 86000: 'loss/val_loss' was not in top 1


Epoch 860: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.0922, loss/val_loss=0.262]

Epoch 860, global step 86100: 'loss/val_loss' was not in top 1


Epoch 861: 100%|██████████| 100/100 [00:06<00:00, 15.35it/s, v_num=0, loss/train_loss=0.182, loss/val_loss=0.342] 

Epoch 861, global step 86200: 'loss/val_loss' was not in top 1


Epoch 862: 100%|██████████| 100/100 [00:06<00:00, 15.16it/s, v_num=0, loss/train_loss=0.164, loss/val_loss=0.216]

Epoch 862, global step 86300: 'loss/val_loss' was not in top 1


Epoch 863: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.102, loss/val_loss=0.291]

Epoch 863, global step 86400: 'loss/val_loss' was not in top 1


Epoch 864: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.222, loss/val_loss=0.250]

Epoch 864, global step 86500: 'loss/val_loss' was not in top 1


Epoch 865: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.233]

Epoch 865, global step 86600: 'loss/val_loss' was not in top 1


Epoch 866: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.139, loss/val_loss=0.269]

Epoch 866, global step 86700: 'loss/val_loss' was not in top 1


Epoch 867: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.246] 

Epoch 867, global step 86800: 'loss/val_loss' was not in top 1


Epoch 868: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.218]

Epoch 868, global step 86900: 'loss/val_loss' was not in top 1


Epoch 869: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.266]

Epoch 869, global step 87000: 'loss/val_loss' was not in top 1


Epoch 870: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.195, loss/val_loss=0.294] 

Epoch 870, global step 87100: 'loss/val_loss' was not in top 1


Epoch 871: 100%|██████████| 100/100 [00:06<00:00, 15.32it/s, v_num=0, loss/train_loss=0.223, loss/val_loss=0.282]

Epoch 871, global step 87200: 'loss/val_loss' was not in top 1


Epoch 872: 100%|██████████| 100/100 [00:06<00:00, 16.21it/s, v_num=0, loss/train_loss=0.134, loss/val_loss=0.225]

Epoch 872, global step 87300: 'loss/val_loss' was not in top 1


Epoch 873: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.205, loss/val_loss=0.284]

Epoch 873, global step 87400: 'loss/val_loss' was not in top 1


Epoch 874: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.302, loss/val_loss=0.270]

Epoch 874, global step 87500: 'loss/val_loss' was not in top 1


Epoch 875: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.297] 

Epoch 875, global step 87600: 'loss/val_loss' was not in top 1


Epoch 876: 100%|██████████| 100/100 [00:06<00:00, 15.81it/s, v_num=0, loss/train_loss=0.105, loss/val_loss=0.262]

Epoch 876, global step 87700: 'loss/val_loss' was not in top 1


Epoch 877: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.164, loss/val_loss=0.264]

Epoch 877, global step 87800: 'loss/val_loss' was not in top 1


Epoch 878: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.208]

Epoch 878, global step 87900: 'loss/val_loss' was not in top 1


Epoch 879: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.236] 

Epoch 879, global step 88000: 'loss/val_loss' was not in top 1


Epoch 880: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.101, loss/val_loss=0.240]

Epoch 880, global step 88100: 'loss/val_loss' was not in top 1


Epoch 881: 100%|██████████| 100/100 [00:06<00:00, 15.62it/s, v_num=0, loss/train_loss=0.264, loss/val_loss=0.244]

Epoch 881, global step 88200: 'loss/val_loss' was not in top 1


Epoch 882: 100%|██████████| 100/100 [00:06<00:00, 15.79it/s, v_num=0, loss/train_loss=0.0869, loss/val_loss=0.220]

Epoch 882, global step 88300: 'loss/val_loss' was not in top 1


Epoch 883: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.194] 

Epoch 883, global step 88400: 'loss/val_loss' was not in top 1


Epoch 884: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.211]

Epoch 884, global step 88500: 'loss/val_loss' was not in top 1


Epoch 885: 100%|██████████| 100/100 [00:06<00:00, 15.71it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.240] 

Epoch 885, global step 88600: 'loss/val_loss' was not in top 1


Epoch 886: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.0878, loss/val_loss=0.266]

Epoch 886, global step 88700: 'loss/val_loss' was not in top 1


Epoch 887: 100%|██████████| 100/100 [00:06<00:00, 15.80it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.312] 

Epoch 887, global step 88800: 'loss/val_loss' was not in top 1


Epoch 888: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.264]

Epoch 888, global step 88900: 'loss/val_loss' was not in top 1


Epoch 889: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.243]

Epoch 889, global step 89000: 'loss/val_loss' was not in top 1


Epoch 890: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.100, loss/val_loss=0.260] 

Epoch 890, global step 89100: 'loss/val_loss' was not in top 1


Epoch 891: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.0775, loss/val_loss=0.230]

Epoch 891, global step 89200: 'loss/val_loss' was not in top 1


Epoch 892: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.279] 

Epoch 892, global step 89300: 'loss/val_loss' was not in top 1


Epoch 893: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.231]

Epoch 893, global step 89400: 'loss/val_loss' was not in top 1


Epoch 894: 100%|██████████| 100/100 [00:06<00:00, 15.69it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.304]

Epoch 894, global step 89500: 'loss/val_loss' was not in top 1


Epoch 895: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.276]

Epoch 895, global step 89600: 'loss/val_loss' was not in top 1


Epoch 896: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.251, loss/val_loss=0.235]

Epoch 896, global step 89700: 'loss/val_loss' was not in top 1


Epoch 897: 100%|██████████| 100/100 [00:06<00:00, 14.99it/s, v_num=0, loss/train_loss=0.156, loss/val_loss=0.227]

Epoch 897, global step 89800: 'loss/val_loss' was not in top 1


Epoch 898: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.0918, loss/val_loss=0.300]

Epoch 898, global step 89900: 'loss/val_loss' was not in top 1


Epoch 899: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.216, loss/val_loss=0.243] 

Epoch 899, global step 90000: 'loss/val_loss' was not in top 1


Epoch 900: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.248]

Epoch 900, global step 90100: 'loss/val_loss' was not in top 1


Epoch 901: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.185, loss/val_loss=0.275]

Epoch 901, global step 90200: 'loss/val_loss' was not in top 1


Epoch 902: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.138, loss/val_loss=0.231]

Epoch 902, global step 90300: 'loss/val_loss' was not in top 1


Epoch 903: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.218]

Epoch 903, global step 90400: 'loss/val_loss' was not in top 1


Epoch 904: 100%|██████████| 100/100 [00:06<00:00, 15.70it/s, v_num=0, loss/train_loss=0.171, loss/val_loss=0.256]

Epoch 904, global step 90500: 'loss/val_loss' was not in top 1


Epoch 905: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.183, loss/val_loss=0.239]

Epoch 905, global step 90600: 'loss/val_loss' was not in top 1


Epoch 906: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.175, loss/val_loss=0.273]

Epoch 906, global step 90700: 'loss/val_loss' was not in top 1


Epoch 907: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.0808, loss/val_loss=0.325]

Epoch 907, global step 90800: 'loss/val_loss' was not in top 1


Epoch 908: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.149, loss/val_loss=0.230] 

Epoch 908, global step 90900: 'loss/val_loss' was not in top 1


Epoch 909: 100%|██████████| 100/100 [00:06<00:00, 15.66it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.247]

Epoch 909, global step 91000: 'loss/val_loss' was not in top 1


Epoch 910: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.274]

Epoch 910, global step 91100: 'loss/val_loss' was not in top 1


Epoch 911: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.292]

Epoch 911, global step 91200: 'loss/val_loss' was not in top 1


Epoch 912: 100%|██████████| 100/100 [00:06<00:00, 15.63it/s, v_num=0, loss/train_loss=0.115, loss/val_loss=0.218]

Epoch 912, global step 91300: 'loss/val_loss' was not in top 1


Epoch 913: 100%|██████████| 100/100 [00:06<00:00, 15.08it/s, v_num=0, loss/train_loss=0.126, loss/val_loss=0.249]

Epoch 913, global step 91400: 'loss/val_loss' was not in top 1


Epoch 914: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.216, loss/val_loss=0.284]

Epoch 914, global step 91500: 'loss/val_loss' was not in top 1


Epoch 915: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.0714, loss/val_loss=0.259]

Epoch 915, global step 91600: 'loss/val_loss' was not in top 1


Epoch 916: 100%|██████████| 100/100 [00:06<00:00, 15.23it/s, v_num=0, loss/train_loss=0.143, loss/val_loss=0.260] 

Epoch 916, global step 91700: 'loss/val_loss' was not in top 1


Epoch 917: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.0869, loss/val_loss=0.248]

Epoch 917, global step 91800: 'loss/val_loss' was not in top 1


Epoch 918: 100%|██████████| 100/100 [00:06<00:00, 16.04it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.262] 

Epoch 918, global step 91900: 'loss/val_loss' was not in top 1


Epoch 919: 100%|██████████| 100/100 [00:06<00:00, 15.31it/s, v_num=0, loss/train_loss=0.142, loss/val_loss=0.208] 

Epoch 919, global step 92000: 'loss/val_loss' was not in top 1


Epoch 920: 100%|██████████| 100/100 [00:06<00:00, 15.17it/s, v_num=0, loss/train_loss=0.111, loss/val_loss=0.251]

Epoch 920, global step 92100: 'loss/val_loss' was not in top 1


Epoch 921: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.201] 

Epoch 921, global step 92200: 'loss/val_loss' was not in top 1


Epoch 922: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.136, loss/val_loss=0.286]

Epoch 922, global step 92300: 'loss/val_loss' was not in top 1


Epoch 923: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.184, loss/val_loss=0.229]

Epoch 923, global step 92400: 'loss/val_loss' was not in top 1


Epoch 924: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.122, loss/val_loss=0.268]

Epoch 924, global step 92500: 'loss/val_loss' was not in top 1


Epoch 925: 100%|██████████| 100/100 [00:06<00:00, 15.65it/s, v_num=0, loss/train_loss=0.151, loss/val_loss=0.249]

Epoch 925, global step 92600: 'loss/val_loss' was not in top 1


Epoch 926: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.217, loss/val_loss=0.226]

Epoch 926, global step 92700: 'loss/val_loss' was not in top 1


Epoch 927: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s, v_num=0, loss/train_loss=0.0918, loss/val_loss=0.236]

Epoch 927, global step 92800: 'loss/val_loss' was not in top 1


Epoch 928: 100%|██████████| 100/100 [00:06<00:00, 15.28it/s, v_num=0, loss/train_loss=0.127, loss/val_loss=0.237] 

Epoch 928, global step 92900: 'loss/val_loss' was not in top 1


Epoch 929: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.108, loss/val_loss=0.266]

Epoch 929, global step 93000: 'loss/val_loss' was not in top 1


Epoch 930: 100%|██████████| 100/100 [00:06<00:00, 15.90it/s, v_num=0, loss/train_loss=0.128, loss/val_loss=0.302] 

Epoch 930, global step 93100: 'loss/val_loss' was not in top 1


Epoch 931: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.260] 

Epoch 931, global step 93200: 'loss/val_loss' was not in top 1


Epoch 932: 100%|██████████| 100/100 [00:06<00:00, 15.58it/s, v_num=0, loss/train_loss=0.227, loss/val_loss=0.275]

Epoch 932, global step 93300: 'loss/val_loss' was not in top 1


Epoch 933: 100%|██████████| 100/100 [00:06<00:00, 15.14it/s, v_num=0, loss/train_loss=0.144, loss/val_loss=0.258] 

Epoch 933, global step 93400: 'loss/val_loss' was not in top 1


Epoch 934: 100%|██████████| 100/100 [00:06<00:00, 16.00it/s, v_num=0, loss/train_loss=0.137, loss/val_loss=0.238]

Epoch 934, global step 93500: 'loss/val_loss' was not in top 1


Epoch 935: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.112, loss/val_loss=0.261]

Epoch 935, global step 93600: 'loss/val_loss' was not in top 1


Epoch 936: 100%|██████████| 100/100 [00:06<00:00, 15.84it/s, v_num=0, loss/train_loss=0.158, loss/val_loss=0.266]

Epoch 936, global step 93700: 'loss/val_loss' was not in top 1


Epoch 937: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.119, loss/val_loss=0.244]

Epoch 937, global step 93800: 'loss/val_loss' was not in top 1


Epoch 938: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.199, loss/val_loss=0.290]

Epoch 938, global step 93900: 'loss/val_loss' was not in top 1


Epoch 939: 100%|██████████| 100/100 [00:06<00:00, 15.85it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.249]

Epoch 939, global step 94000: 'loss/val_loss' was not in top 1


Epoch 940: 100%|██████████| 100/100 [00:06<00:00, 15.88it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.208]

Epoch 940, global step 94100: 'loss/val_loss' was not in top 1


Epoch 941: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.172, loss/val_loss=0.242]

Epoch 941, global step 94200: 'loss/val_loss' was not in top 1


Epoch 942: 100%|██████████| 100/100 [00:06<00:00, 15.52it/s, v_num=0, loss/train_loss=0.0836, loss/val_loss=0.193]

Epoch 942, global step 94300: 'loss/val_loss' was not in top 1


Epoch 943: 100%|██████████| 100/100 [00:06<00:00, 15.88it/s, v_num=0, loss/train_loss=0.107, loss/val_loss=0.202] 

Epoch 943, global step 94400: 'loss/val_loss' was not in top 1


Epoch 944: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.173, loss/val_loss=0.219]

Epoch 944, global step 94500: 'loss/val_loss' was not in top 1


Epoch 945: 100%|██████████| 100/100 [00:06<00:00, 15.20it/s, v_num=0, loss/train_loss=0.092, loss/val_loss=0.234]

Epoch 945, global step 94600: 'loss/val_loss' was not in top 1


Epoch 946: 100%|██████████| 100/100 [00:06<00:00, 15.45it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.282]

Epoch 946, global step 94700: 'loss/val_loss' was not in top 1


Epoch 947: 100%|██████████| 100/100 [00:06<00:00, 15.46it/s, v_num=0, loss/train_loss=0.0733, loss/val_loss=0.267]

Epoch 947, global step 94800: 'loss/val_loss' was not in top 1


Epoch 948: 100%|██████████| 100/100 [00:06<00:00, 15.74it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.228] 

Epoch 948, global step 94900: 'loss/val_loss' was not in top 1


Epoch 949: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.0927, loss/val_loss=0.240]

Epoch 949, global step 95000: 'loss/val_loss' was not in top 1


Epoch 950: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.207, loss/val_loss=0.260] 

Epoch 950, global step 95100: 'loss/val_loss' was not in top 1


Epoch 951: 100%|██████████| 100/100 [00:06<00:00, 15.78it/s, v_num=0, loss/train_loss=0.0963, loss/val_loss=0.236]

Epoch 951, global step 95200: 'loss/val_loss' was not in top 1


Epoch 952: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.0927, loss/val_loss=0.262]

Epoch 952, global step 95300: 'loss/val_loss' was not in top 1


Epoch 953: 100%|██████████| 100/100 [00:06<00:00, 15.60it/s, v_num=0, loss/train_loss=0.208, loss/val_loss=0.272] 

Epoch 953, global step 95400: 'loss/val_loss' was not in top 1


Epoch 954: 100%|██████████| 100/100 [00:06<00:00, 15.82it/s, v_num=0, loss/train_loss=0.153, loss/val_loss=0.268]

Epoch 954, global step 95500: 'loss/val_loss' was not in top 1


Epoch 955: 100%|██████████| 100/100 [00:06<00:00, 15.53it/s, v_num=0, loss/train_loss=0.152, loss/val_loss=0.231] 

Epoch 955, global step 95600: 'loss/val_loss' was not in top 1


Epoch 956: 100%|██████████| 100/100 [00:06<00:00, 15.40it/s, v_num=0, loss/train_loss=0.134, loss/val_loss=0.272]

Epoch 956, global step 95700: 'loss/val_loss' was not in top 1


Epoch 957: 100%|██████████| 100/100 [00:06<00:00, 15.02it/s, v_num=0, loss/train_loss=0.124, loss/val_loss=0.253]

Epoch 957, global step 95800: 'loss/val_loss' was not in top 1


Epoch 958: 100%|██████████| 100/100 [00:06<00:00, 15.12it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.232]

Epoch 958, global step 95900: 'loss/val_loss' was not in top 1


Epoch 959: 100%|██████████| 100/100 [00:06<00:00, 15.67it/s, v_num=0, loss/train_loss=0.155, loss/val_loss=0.247]

Epoch 959, global step 96000: 'loss/val_loss' was not in top 1


Epoch 960: 100%|██████████| 100/100 [00:06<00:00, 15.04it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.237]

Epoch 960, global step 96100: 'loss/val_loss' was not in top 1


Epoch 961: 100%|██████████| 100/100 [00:06<00:00, 15.18it/s, v_num=0, loss/train_loss=0.106, loss/val_loss=0.308]

Epoch 961, global step 96200: 'loss/val_loss' was not in top 1


Epoch 962: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.041, loss/val_loss=0.231]

Epoch 962, global step 96300: 'loss/val_loss' was not in top 1


Epoch 963: 100%|██████████| 100/100 [00:06<00:00, 14.76it/s, v_num=0, loss/train_loss=0.141, loss/val_loss=0.258]

Epoch 963, global step 96400: 'loss/val_loss' was not in top 1


Epoch 964: 100%|██████████| 100/100 [00:06<00:00, 15.50it/s, v_num=0, loss/train_loss=0.0994, loss/val_loss=0.222]

Epoch 964, global step 96500: 'loss/val_loss' was not in top 1


Epoch 965: 100%|██████████| 100/100 [00:06<00:00, 15.26it/s, v_num=0, loss/train_loss=0.0894, loss/val_loss=0.245]

Epoch 965, global step 96600: 'loss/val_loss' was not in top 1


Epoch 966: 100%|██████████| 100/100 [00:06<00:00, 15.33it/s, v_num=0, loss/train_loss=0.179, loss/val_loss=0.224] 

Epoch 966, global step 96700: 'loss/val_loss' was not in top 1


Epoch 967: 100%|██████████| 100/100 [00:06<00:00, 15.43it/s, v_num=0, loss/train_loss=0.230, loss/val_loss=0.248]

Epoch 967, global step 96800: 'loss/val_loss' was not in top 1


Epoch 968: 100%|██████████| 100/100 [00:06<00:00, 15.14it/s, v_num=0, loss/train_loss=0.104, loss/val_loss=0.307]

Epoch 968, global step 96900: 'loss/val_loss' was not in top 1


Epoch 969: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.0866, loss/val_loss=0.201]

Epoch 969, global step 97000: 'loss/val_loss' was not in top 1


Epoch 970: 100%|██████████| 100/100 [00:06<00:00, 15.22it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.215] 

Epoch 970, global step 97100: 'loss/val_loss' was not in top 1


Epoch 971: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.220]

Epoch 971, global step 97200: 'loss/val_loss' was not in top 1


Epoch 972: 100%|██████████| 100/100 [00:06<00:00, 15.36it/s, v_num=0, loss/train_loss=0.116, loss/val_loss=0.254]

Epoch 972, global step 97300: 'loss/val_loss' was not in top 1


Epoch 973: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.132, loss/val_loss=0.214] 

Epoch 973, global step 97400: 'loss/val_loss' was not in top 1


Epoch 974: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.168, loss/val_loss=0.208]

Epoch 974, global step 97500: 'loss/val_loss' was not in top 1


Epoch 975: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.147, loss/val_loss=0.262]

Epoch 975, global step 97600: 'loss/val_loss' was not in top 1


Epoch 976: 100%|██████████| 100/100 [00:06<00:00, 15.17it/s, v_num=0, loss/train_loss=0.174, loss/val_loss=0.257]

Epoch 976, global step 97700: 'loss/val_loss' was not in top 1


Epoch 977: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.160, loss/val_loss=0.220]

Epoch 977, global step 97800: 'loss/val_loss' was not in top 1


Epoch 978: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.146, loss/val_loss=0.234]

Epoch 978, global step 97900: 'loss/val_loss' was not in top 1


Epoch 979: 100%|██████████| 100/100 [00:06<00:00, 15.47it/s, v_num=0, loss/train_loss=0.106, loss/val_loss=0.221] 

Epoch 979, global step 98000: 'loss/val_loss' was not in top 1


Epoch 980: 100%|██████████| 100/100 [00:06<00:00, 15.38it/s, v_num=0, loss/train_loss=0.223, loss/val_loss=0.267]

Epoch 980, global step 98100: 'loss/val_loss' was not in top 1


Epoch 981: 100%|██████████| 100/100 [00:06<00:00, 15.37it/s, v_num=0, loss/train_loss=0.236, loss/val_loss=0.271]

Epoch 981, global step 98200: 'loss/val_loss' was not in top 1


Epoch 982: 100%|██████████| 100/100 [00:06<00:00, 15.57it/s, v_num=0, loss/train_loss=0.163, loss/val_loss=0.256]

Epoch 982, global step 98300: 'loss/val_loss' was not in top 1


Epoch 983: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.109, loss/val_loss=0.218]

Epoch 983, global step 98400: 'loss/val_loss' was not in top 1


Epoch 984: 100%|██████████| 100/100 [00:06<00:00, 15.08it/s, v_num=0, loss/train_loss=0.105, loss/val_loss=0.242] 

Epoch 984, global step 98500: 'loss/val_loss' was not in top 1


Epoch 985: 100%|██████████| 100/100 [00:06<00:00, 15.48it/s, v_num=0, loss/train_loss=0.180, loss/val_loss=0.243]

Epoch 985, global step 98600: 'loss/val_loss' was not in top 1


Epoch 986: 100%|██████████| 100/100 [00:06<00:00, 15.25it/s, v_num=0, loss/train_loss=0.0944, loss/val_loss=0.217]

Epoch 986, global step 98700: 'loss/val_loss' was not in top 1


Epoch 987: 100%|██████████| 100/100 [00:06<00:00, 15.72it/s, v_num=0, loss/train_loss=0.120, loss/val_loss=0.236] 

Epoch 987, global step 98800: 'loss/val_loss' was not in top 1


Epoch 988: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.166, loss/val_loss=0.280]

Epoch 988, global step 98900: 'loss/val_loss' was not in top 1


Epoch 989: 100%|██████████| 100/100 [00:06<00:00, 15.24it/s, v_num=0, loss/train_loss=0.125, loss/val_loss=0.239]

Epoch 989, global step 99000: 'loss/val_loss' was not in top 1


Epoch 990: 100%|██████████| 100/100 [00:06<00:00, 15.76it/s, v_num=0, loss/train_loss=0.131, loss/val_loss=0.260]

Epoch 990, global step 99100: 'loss/val_loss' was not in top 1


Epoch 991: 100%|██████████| 100/100 [00:06<00:00, 15.29it/s, v_num=0, loss/train_loss=0.154, loss/val_loss=0.265]

Epoch 991, global step 99200: 'loss/val_loss' was not in top 1


Epoch 992: 100%|██████████| 100/100 [00:06<00:00, 15.39it/s, v_num=0, loss/train_loss=0.0528, loss/val_loss=0.213]

Epoch 992, global step 99300: 'loss/val_loss' was not in top 1


Epoch 993: 100%|██████████| 100/100 [00:06<00:00, 15.49it/s, v_num=0, loss/train_loss=0.151, loss/val_loss=0.228] 

Epoch 993, global step 99400: 'loss/val_loss' was not in top 1


Epoch 994: 100%|██████████| 100/100 [00:06<00:00, 15.27it/s, v_num=0, loss/train_loss=0.167, loss/val_loss=0.251]

Epoch 994, global step 99500: 'loss/val_loss' was not in top 1


Epoch 995: 100%|██████████| 100/100 [00:06<00:00, 15.34it/s, v_num=0, loss/train_loss=0.110, loss/val_loss=0.267]

Epoch 995, global step 99600: 'loss/val_loss' was not in top 1


Epoch 996: 100%|██████████| 100/100 [00:06<00:00, 15.56it/s, v_num=0, loss/train_loss=0.211, loss/val_loss=0.256] 

Epoch 996, global step 99700: 'loss/val_loss' was not in top 1


Epoch 997: 100%|██████████| 100/100 [00:06<00:00, 15.61it/s, v_num=0, loss/train_loss=0.0917, loss/val_loss=0.282]

Epoch 997, global step 99800: 'loss/val_loss' was not in top 1


Epoch 998: 100%|██████████| 100/100 [00:06<00:00, 15.15it/s, v_num=0, loss/train_loss=0.162, loss/val_loss=0.282] 

Epoch 998, global step 99900: 'loss/val_loss' was not in top 1


Epoch 999: 100%|██████████| 100/100 [00:06<00:00, 15.42it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.232]

Epoch 999, global step 100000: 'loss/val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 100/100 [00:06<00:00, 15.41it/s, v_num=0, loss/train_loss=0.140, loss/val_loss=0.232]


FileNotFoundError: [Errno 2] No such file or directory: '../.../splits/task1/30_protein_test.csv'

In [21]:
saving_df

NameError: name 'saving_df' is not defined

In [None]:
task1_baselines/Pika/Pika/notebooks/train_pika.ipynb