# YATO Handbook

**YATO** is an open-source Python library for text analysis. In particular, **YATO** focuses on sequence labeling and sequence classification tasks, covering a wide range of fundamental NLP tasks such as part-of-speech tagging, chunking, NER, CCG supertagging, sentiment analysis, and sentence classification. **YATO** supports both designing specific RNN-based and Transformer-based models through user-friendly configuration and integrating SOTA pre-trained language models such as BERT. 


## Data Preparation 
**YATO** supports sequence labeling and sequence classification tasks.   
**YATO** provides an official function for converting sequence labeling datasets between tag schemes. **Location: ncrf.utils.tagSchemeConverter**  
NER dataset tag schemes: BIO, BIOES, BMES...   
Sequence classification format: Text ||| Label  
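
To make the two data formats concrete, here is a minimal sketch of a BIO to BIOES conversion and of splitting a `Text ||| Label` line. Both helpers (`bio_to_bioes`, `parse_classification_line`) are hypothetical names written for illustration; they are not the functions in `ncrf.utils.tagSchemeConverter`, whose interface is not shown here.

```python
from typing import List, Tuple


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert one sentence's BIO tags to BIOES (illustrative helper, not YATO's API)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


def parse_classification_line(line: str, sep: str = "|||") -> Tuple[str, str]:
    """Split a 'Text ||| Label' line into (text, label)."""
    text, label = line.rsplit(sep, 1)
    return text.strip(), label.strip()


print(bio_to_bioes(["B-PER", "I-PER", "O", "B-LOC", "O"]))
print(parse_classification_line("a stirring , funny film ||| 4"))
```

Single-token entities become `S-` tags and the last token of a longer entity becomes `E-`, which is the only difference between the two schemes.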

## Configuration Preparation  
You can specify the model, optimizer, and decoding through the configuration file.   
### Dataloader  
train_dir=path to the training file    
dev_dir=path to the validation file   
test_dir=path to the test file    
model_dir=path where model weights are saved  
dset_dir=path to the encoded configuration (.dset) file    
### Model
use_crf=True/False     
use_char=True/False     
char_seq_feature=GRU/LSTM/CNN/False     
use_word_seq=True/False     
use_word_emb=True/False     
word_emb_dir=The path of word embedding file    
word_seq_feature=GRU/LSTM/CNN/FeedFowrd/False   
low_level_transformer=pretrained language model from Hugging Face  
low_level_transformer_finetune=True/False  
high_level_transformer=pretrained language model from Hugging Face  
high_level_transformer_finetune=True/False      
cnn_layer=layer number     
char_hidden_dim=dimension number      
hidden_dim=dimension number     
lstm_layer=layer number      
bilstm=True/False

### Hyperparameters       
sentence_classification=True/False        
status=train/decode         
dropout=Dropout Rate         
optimizer=SGD/Adagrad/adadelta/rmsprop/adam/adamw    
iteration=epoch number         
batch_size=batch size           
learning_rate=learning rate         
gpu=True/False         
device=cuda:0        
scheduler=get_linear_schedule_with_warmup/get_cosine_schedule_with_warmup            
warmup_step_rate=warmup step rate         
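
The scheduler names match the warmup schedules in Hugging Face `transformers`. As a rough illustration of what a linear warmup schedule does to the learning rate, the sketch below computes the multiplier in plain Python. It assumes that `warmup_step_rate` is the fraction of all training steps spent warming up; how YATO actually turns the rate into a step count is not shown in this handbook, and `linear_warmup_factor` and `total_steps` are hypothetical names.

```python
def linear_warmup_factor(step: int, total_steps: int, warmup_step_rate: float) -> float:
    """Learning-rate multiplier: linear warmup followed by linear decay to zero."""
    # Assumption: the warmup length is this fraction of all training steps.
    warmup_steps = int(total_steps * warmup_step_rate)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total_steps = 1000  # hypothetical: number of epochs times batches per epoch
base_lr = 3e-5
for step in (0, 25, 50, 525, 1000):
    lr = base_lr * linear_warmup_factor(step, total_steps, warmup_step_rate=0.05)
    print(step, f"{lr:.2e}")
```

With a rate of 0.05 the learning rate climbs to its configured value over the first 50 of 1000 steps and then decays linearly.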

## Train

In [1]:
from yato import YATO
model = YATO('sample_data/demo.bert.config')  # configuration file
model.train()

MODEL: train
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start Sentence Classification task...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Train  file directory: /home/hsani/YATO/sample_data/clf.train.txt
     Dev    file directory: /home/hsani/YATO/sample_data/clf.dev.txt
     Test   file directory: /home/hsani/YATO/sample_data/clf.test.txt
     Raw    file directory: None
     Dset   file directory: /home/hsani/YATO/sample_data/bert_base_sample.dset
     Model  file directory: /home/hsani/YATO/sample_data/
     Loadmodel   directory: None
     Decode file directory: None
++++++++++++++++++++++++++++++++++++++++
Data and Settings:
     Tag          scheme: Not sequence labeling task
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 512
     Number   normalized: False
     Word         cutoff: 0
     Train instance number: 4001
     De

Training model...


build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased


We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.


 ++++++++++++++++++++++++++++++++++++++++
BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.39.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

 ++++++++++++++++++++++++++++++++++++++++
Epoch: 0/10


     Instance: 2000; Time: 20.43s; loss: 176.5522;
     Instance: 4000; Time: 19.00s; loss: 149.4632;
100%|██████████| 13/13 [00:00<00:00, 18.90it/s]
Dev: time: 0.70s speed: 291.40st/s; acc: 0.5323; f: 0.5090;
100%|██████████| 13/13 [00:00<00:00, 18.26it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.6716_p0.6817_r0.6311_f0.6411.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.6716_p0.6817_r0.6311_f0.6411.pth
Test: time: 0.72s, speed: 281.85st/s; acc: 0.6716, p: 0.6817, r: 0.6311, f: 0.6411

Epoch: 1/10


     Instance: 2000; Time: 19.30s; loss: 118.8626;
     Instance: 4000; Time: 19.24s; loss: 113.9749;
100%|██████████| 13/13 [00:00<00:00, 14.61it/s]
Dev: time: 0.90s speed: 225.49st/s; acc: 0.4925; f: 0.4846;
100%|██████████| 13/13 [00:00<00:00, 14.22it/s]
Test: time: 0.92s, speed: 219.42st/s; acc: 0.7214, p: 0.7368, r: 0.7404, f: 0.7018


Epoch: 2/10


     Instance: 2000; Time: 18.86s; loss: 80.6135;
     Instance: 4000; Time: 19.17s; loss: 76.2640;
100%|██████████| 13/13 [00:00<00:00, 14.58it/s]
Dev: time: 0.90s speed: 224.99st/s; acc: 0.5323; f: 0.5272;
100%|██████████| 13/13 [00:00<00:00, 14.24it/s]
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.8905_p0.8889_r0.9034_f0.8933.pth
Test: time: 0.92s, speed: 219.86st/s; acc: 0.8905, p: 0.8889, r: 0.9034, f: 0.8933


Epoch: 3/10


     Instance: 2000; Time: 19.65s; loss: 50.5342;
     Instance: 4000; Time: 20.37s; loss: 50.4661;
100%|██████████| 13/13 [00:00<00:00, 19.25it/s]
Dev: time: 0.68s speed: 296.92st/s; acc: 0.5572; f: 0.5547;
100%|██████████| 13/13 [00:00<00:00, 18.77it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.9104_p0.9026_r0.9297_f0.9132.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.9104_p0.9026_r0.9297_f0.9132.pth
Test: time: 0.70s, speed: 289.58st/s; acc: 0.9104, p: 0.9026, r: 0.9297, f: 0.9132

Epoch: 4/10


     Instance: 2000; Time: 18.59s; loss: 30.6586;
     Instance: 4000; Time: 20.09s; loss: 30.8606;
100%|██████████| 13/13 [00:00<00:00, 14.61it/s]
Dev: time: 0.90s speed: 225.43st/s; acc: 0.5622; f: 0.5689;
100%|██████████| 13/13 [00:00<00:00, 14.33it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.9403_p0.9383_r0.9530_f0.9442.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.9403_p0.9383_r0.9530_f0.9442.pth
Test: time: 0.91s, speed: 221.10st/s; acc: 0.9403, p: 0.9383, r: 0.9530, f: 0.9442

Epoch: 5/10


     Instance: 2000; Time: 20.06s; loss: 16.5714;
     Instance: 4000; Time: 20.16s; loss: 19.9145;
100%|██████████| 13/13 [00:00<00:00, 19.26it/s]
Dev: time: 0.68s speed: 297.26st/s; acc: 0.5174; f: 0.5175;
100%|██████████| 13/13 [00:00<00:00, 15.67it/s]
Test: time: 0.84s, speed: 241.94st/s; acc: 0.9652, p: 0.9660, r: 0.9742, f: 0.9689


Epoch: 6/10


     Instance: 2000; Time: 19.69s; loss: 17.1701;
     Instance: 4000; Time: 20.34s; loss: 13.4098;
100%|██████████| 13/13 [00:00<00:00, 18.69it/s]
Dev: time: 0.70s speed: 288.26st/s; acc: 0.5174; f: 0.5181;
100%|██████████| 13/13 [00:00<00:00, 18.73it/s]
Test: time: 0.70s, speed: 289.04st/s; acc: 0.9900, p: 0.9907, r: 0.9926, f: 0.9915


Epoch: 7/10


     Instance: 2000; Time: 19.11s; loss: 9.7385;
     Instance: 4000; Time: 19.89s; loss: 7.2472;
100%|██████████| 13/13 [00:00<00:00, 14.59it/s]
Dev: time: 0.90s speed: 225.24st/s; acc: 0.4826; f: 0.4849;
100%|██████████| 13/13 [00:00<00:00, 14.30it/s]
Test: time: 0.91s, speed: 220.92st/s; acc: 0.9701, p: 0.9779, r: 0.9721, f: 0.9739


Epoch: 8/10


     Instance: 2000; Time: 18.70s; loss: 10.8484;
     Instance: 4000; Time: 18.86s; loss: 6.3073;
100%|██████████| 13/13 [00:00<00:00, 19.32it/s]
Dev: time: 0.68s speed: 298.21st/s; acc: 0.5473; f: 0.5548;
100%|██████████| 13/13 [00:00<00:00, 18.82it/s]
Test: time: 0.70s, speed: 290.57st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 9/10


     Instance: 2000; Time: 19.72s; loss: 4.3336;
     Instance: 4000; Time: 19.37s; loss: 6.8117;
100%|██████████| 13/13 [00:00<00:00, 19.32it/s]
Dev: time: 0.68s speed: 298.24st/s; acc: 0.5274; f: 0.5244;
100%|██████████| 13/13 [00:00<00:00, 18.84it/s]
Test: time: 0.70s, speed: 290.82st/s; acc: 0.9950, p: 0.9962, r: 0.9951, f: 0.9956
Best Test F1 Score: 0.9441933867224707, Best Validation F1 Score: 0.5688652164950947, Best Test F1 Score Epoch: 4 
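
The summary line reports the test score from epoch 4, which is the epoch with the highest validation F1 and not the one with the highest test F1. The sketch below reproduces that selection from the per-epoch scores printed in the log above. That YATO picks the reported epoch by validation F1 is an inference from this log, not something confirmed against the source code.

```python
# Dev and test F1 per epoch (epochs 0-9), copied from the training log above.
dev_f1 = [0.5090, 0.4846, 0.5272, 0.5547, 0.5689, 0.5175, 0.5181, 0.4849, 0.5548, 0.5244]
test_f1 = [0.6411, 0.7018, 0.8933, 0.9132, 0.9442, 0.9689, 0.9915, 0.9739, 1.0000, 0.9956]

# Pick the epoch with the highest validation F1 and report its test F1.
best_epoch = max(range(len(dev_f1)), key=dev_f1.__getitem__)
print(best_epoch, dev_f1[best_epoch], test_f1[best_epoch])
```

Later epochs reach higher test F1, but they are not reported because their validation F1 is lower.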


## Decode    
Decode Configuration 

status=decode  
raw_dir=The path of decode file    
nbest=0 (NER)/1 (sentence classification)   
decode_dir=The path of decode result file  
load_model_dir=The path of model weights          
sentence_classification=True/False  
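
As a small illustration, the keys listed above can be written to a decode configuration file from Python. This is only a sketch: `write_config` and the output file name are hypothetical, every value is a placeholder, and a real decode configuration may need further keys (for example `dset_dir`) that are not listed in this section.

```python
from pathlib import Path

# Keys come from the list above; every value here is a hypothetical placeholder.
decode_settings = {
    "status": "decode",
    "raw_dir": "sample_data/clf.test.txt",
    "nbest": 1,
    "decode_dir": "sample_data/test.clf.decode",
    "load_model_dir": "sample_data/best_model.pth",
    "sentence_classification": True,
}


def write_config(settings: dict, path: str) -> None:
    """Write settings as one key=value pair per line."""
    lines = [f"{key}={value}" for key, value in settings.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


write_config(decode_settings, "decode.example.config")
print(Path("decode.example.config").read_text(encoding="utf-8"))
```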

In [2]:
from yato import YATO
decode_model = YATO('sample_data/decode.config')  # configuration file
result_dict = decode_model.decode()

MODEL: decode
nbest: 0
Load Model from file: /home/hsani/YATO/sample_data/
build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased
Decode predict data, nbest: 

100%|██████████| 13/13 [00:00<00:00, 14.27it/s]


predict: time:0.92s, speed:220.30st/s; acc: 0.9403
len(content_list) 201 201
Predict predict result has been written into file. /home/hsani/YATO/sample_data/test.clf.decode


In [4]:
from yato import YATO
from utils import text_attention

# Load the model described by the decode configuration file
model = YATO('sample_data/decode.config')

# Input follows the "Text ||| Label" sentence classification format
sample = ["a fairly by-the-books blend of action and romance with sprinklings of intentional and unintentional comedy . ||| 1"]
probsutils, weights_ls = model.attention(input_text=sample)
print(probsutils)
sentence = "a fairly by-the-books blend of action and romance with sprinklings of intentional and unintentional comedy . "
atten = weights_ls[0].tolist()

# Write the attention visualization for the sentence to a LaTeX file
text_attention.visualization(sentence, atten[0], tex='sample.tex', color='red')

MODEL: Attention Weight
build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased


100%|██████████| 1/1 [00:00<00:00, 82.15it/s]

[array([[0.0000000e+00, 1.6404273e-03, 7.1239949e-04, 1.4826994e-02,
        9.8268533e-01, 1.3490008e-04]], dtype=float32)]
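
The printed array holds six values that sum to roughly 1, so it can be read as a probability distribution over the model's label entries. A minimal sketch for finding the most likely entry follows. Note that the index is a position in the model's internal label list; how that position maps back to a label string such as `1` or `4` is an assumption left open here.

```python
# Probabilities copied from the printed output above.
probs = [0.0000000e+00, 1.6404273e-03, 7.1239949e-04, 1.4826994e-02,
         9.8268533e-01, 1.3490008e-04]

# Index of the largest probability, found without external dependencies.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
print(predicted_index, probs[predicted_index])
print(round(sum(probs), 4))
```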





In [5]:
from yato import YATO
from utils import text_attention

# Load the model described by the decode configuration file
model = YATO('sample_data/decode.config')

# Input follows the "Text ||| Label" sentence classification format
sample = ["a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films ||| 4"]
probsutils, weights_ls = model.attention(input_text=sample)
print(probsutils)
sentence = "a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films  "
atten = weights_ls[0].tolist()

# Write the attention visualization for the sentence to a LaTeX file
text_attention.visualization(sentence, atten[0], tex='sample1.tex', color='red')

MODEL: Attention Weight
build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased


100%|██████████| 1/1 [00:00<00:00, 75.38it/s]

[array([[0.0000000e+00, 9.9611026e-01, 1.8969811e-04, 2.8529303e-04,
        2.8180010e-03, 5.9664808e-04]], dtype=float32)]





In [6]:
from yato import YATO
from utils import text_attention

# Load the model described by the decode configuration file
model = YATO('sample_data/decode.config')

# Input follows the "Text ||| Label" sentence classification format
sample = ["this is a visually stunning rumination on love , memory , history and the war between art and commerce . ||| 3"]
probsutils, weights_ls = model.attention(input_text=sample)
print(probsutils)
sentence = "this is a visually stunning rumination on love , memory , history and the war between art and commerce ."
atten = weights_ls[0].tolist()

# Write the attention visualization for the sentence to a LaTeX file
text_attention.visualization(sentence, atten[0], tex='sample2.tex', color='red')

MODEL: Attention Weight
build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased


100%|██████████| 1/1 [00:00<00:00, 76.98it/s]

[array([[0.0000000e+00, 6.9443661e-01, 5.6235335e-04, 2.2461477e-03,
        3.0148485e-01, 1.2701013e-03]], dtype=float32)]





## Example

In [18]:
### use # to comment out the configure item

### I/O ###
train_dir=demo.train.bioes
dev_dir=demo.dev.bioes
test_dir=demo.test.bioes
model_dir=test/

dset_dir=bert_base_cased.dset

norm_word_emb=False
norm_char_emb=False
number_normalized=False
seg=True
word_emb_dim=50
char_emb_dim=30

###NetworkConfiguration###
use_crf=False
use_char=False
char_seq_feature=CNN
use_word_seq=False
use_word_emb=True
word_seq_feature=LSTM
low_level_transformer=None
low_level_transformer_finetune=False
high_level_transformer=bert-base-cased
high_level_transformer_finetune=True

###TrainingSetting###
status=train
optimizer=AdamW
iteration=10
batch_size=16
ave_batch_loss=False

###Hyperparameters###
cnn_layer=4
char_hidden_dim=50
hidden_dim=768
dropout=0.3
lstm_layer=2
bilstm=True
learning_rate=3e-5
gpu=True
device=cuda:0
scheduler=get_cosine_schedule_with_warmup
warmup_step_rate=0.05

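
The cell above is the content of a YATO configuration file, not Python code: plain `key=value` lines with `#` marking comments. A minimal sketch of reading such a file is given below. `parse_config` is a hypothetical helper for illustration and is not YATO's own loader, which may treat values differently.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# A few lines taken from the example configuration above.
example = """
### I/O ###
train_dir=demo.train.bioes
status=train
optimizer=AdamW
learning_rate=3e-5
"""
config = parse_config(example)
print(config["optimizer"], float(config["learning_rate"]))
```

All values come back as strings, so numeric settings such as `learning_rate` need an explicit conversion.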

In [15]:
status=decode
raw_dir=demo.test.bioes
nbest=0
decode_dir=test.decode
load_model_dir=test/acc: 0.9777, p: 0.8973, r: 0.8973, f: 0.8973.pth      
sentence_classification=False


In [2]:
from yato import YATO
model = YATO('sample_data/bert_base_gelu_sst2.config')  # configuration file
model.train()

MODEL: train
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start Sentence Classification task...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Train  file directory: /home/hsani/YATO/sample_data/clf.train.txt
     Dev    file directory: /home/hsani/YATO/sample_data/clf.dev.txt
     Test   file directory: /home/hsani/YATO/sample_data/clf.test.txt
     Raw    file directory: None
     Dset   file directory: /home/hsani/YATO/sample_data/bert_base_gelu_sst2.dset
     Model  file directory: /home/hsani/YATO/sample_data/
     Loadmodel   directory: None
     Decode file directory: None
++++++++++++++++++++++++++++++++++++++++
Data and Settings:
     Tag          scheme: Not sequence labeling task
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 512
     Number   normalized: False
     Word         cutoff: 0
     Train instance number: 4001
    

Training model...


build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased


We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.


Epoch: 0/20


     Instance: 2000; Time: 20.08s; loss: 175.6447;
     Instance: 4000; Time: 19.18s; loss: 151.3492;
100%|██████████| 13/13 [00:00<00:00, 19.41it/s]
Dev: time: 0.68s speed: 299.26st/s; acc: 0.5274; f: 0.5206;
100%|██████████| 13/13 [00:00<00:00, 18.91it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.6567_p0.6562_r0.6543_f0.6482.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.6567_p0.6562_r0.6543_f0.6482.pth
Test: time: 0.69s, speed: 291.79st/s; acc: 0.6567, p: 0.6562, r: 0.6543, f: 0.6482

Epoch: 1/20


     Instance: 2000; Time: 19.55s; loss: 124.7805;
     Instance: 4000; Time: 19.24s; loss: 118.6242;
100%|██████████| 13/13 [00:00<00:00, 19.42it/s]
Dev: time: 0.68s speed: 299.62st/s; acc: 0.4677; f: 0.4615;
100%|██████████| 13/13 [00:00<00:00, 18.93it/s]
Test: time: 0.69s, speed: 292.06st/s; acc: 0.6866, p: 0.6912, r: 0.6941, f: 0.6671


Epoch: 2/20


     Instance: 2000; Time: 19.13s; loss: 90.1714;
     Instance: 4000; Time: 19.28s; loss: 85.7735;
100%|██████████| 13/13 [00:00<00:00, 19.46it/s]
Dev: time: 0.67s speed: 300.22st/s; acc: 0.5274; f: 0.5253;
100%|██████████| 13/13 [00:00<00:00, 18.81it/s]
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.8607_p0.8627_r0.8542_f0.8530.pth
Test: time: 0.70s, speed: 290.35st/s; acc: 0.8607, p: 0.8627, r: 0.8542, f: 0.8530


Epoch: 3/20


     Instance: 2000; Time: 18.94s; loss: 61.5113;
     Instance: 4000; Time: 20.34s; loss: 61.9429;
100%|██████████| 13/13 [00:00<00:00, 19.41it/s]
Dev: time: 0.68s speed: 299.37st/s; acc: 0.5672; f: 0.5768;
100%|██████████| 13/13 [00:00<00:00, 18.94it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.9303_p0.9281_r0.9414_f0.9321.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.9303_p0.9281_r0.9414_f0.9321.pth
Test: time: 0.69s, speed: 292.24st/s; acc: 0.9303, p: 0.9281, r: 0.9414, f: 0.9321

Epoch: 4/20


     Instance: 2000; Time: 19.08s; loss: 35.8676;
     Instance: 4000; Time: 19.99s; loss: 40.4438;
100%|██████████| 13/13 [00:00<00:00, 19.43it/s]
Dev: time: 0.68s speed: 299.71st/s; acc: 0.5124; f: 0.5171;
100%|██████████| 13/13 [00:00<00:00, 18.90it/s]
Test: time: 0.69s, speed: 291.75st/s; acc: 0.9005, p: 0.8924, r: 0.9209, f: 0.9005


Epoch: 5/20


     Instance: 2000; Time: 18.98s; loss: 25.7206;
     Instance: 4000; Time: 19.28s; loss: 26.0636;
100%|██████████| 13/13 [00:00<00:00, 19.39it/s]
Dev: time: 0.68s speed: 299.02st/s; acc: 0.5821; f: 0.5838;
100%|██████████| 13/13 [00:00<00:00, 18.94it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc0.9552_p0.9514_r0.9680_f0.9565.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc0.9552_p0.9514_r0.9680_f0.9565.pth
Test: time: 0.69s, speed: 292.33st/s; acc: 0.9552, p: 0.9514, r: 0.9680, f: 0.9565

Epoch: 6/20


     Instance: 2000; Time: 18.87s; loss: 19.5939;
     Instance: 4000; Time: 19.37s; loss: 17.6004;
100%|██████████| 13/13 [00:00<00:00, 19.41it/s]
Dev: time: 0.68s speed: 299.43st/s; acc: 0.5622; f: 0.5693;
100%|██████████| 13/13 [00:00<00:00, 18.91it/s]
Test: time: 0.69s, speed: 291.90st/s; acc: 0.9851, p: 0.9807, r: 0.9877, f: 0.9840


Epoch: 7/20


     Instance: 2000; Time: 18.99s; loss: 11.5177;
     Instance: 4000; Time: 19.24s; loss: 12.6215;
100%|██████████| 13/13 [00:00<00:00, 19.44it/s]
Dev: time: 0.68s speed: 299.76st/s; acc: 0.5522; f: 0.5570;
100%|██████████| 13/13 [00:00<00:00, 18.90it/s]
Test: time: 0.69s, speed: 291.73st/s; acc: 0.9851, p: 0.9811, r: 0.9873, f: 0.9840


Epoch: 8/20


     Instance: 2000; Time: 19.01s; loss: 10.9249;
     Instance: 4000; Time: 19.32s; loss: 10.1166;
100%|██████████| 13/13 [00:00<00:00, 15.49it/s]
Dev: time: 0.85s speed: 239.08st/s; acc: 0.5423; f: 0.5491;
100%|██████████| 13/13 [00:00<00:00, 18.95it/s]
Test: time: 0.69s, speed: 292.46st/s; acc: 0.9950, p: 0.9952, r: 0.9961, f: 0.9956


Epoch: 9/20


     Instance: 2000; Time: 19.29s; loss: 7.5513;
     Instance: 4000; Time: 19.75s; loss: 7.1758;
100%|██████████| 13/13 [00:00<00:00, 14.54it/s]
Dev: time: 0.90s speed: 224.42st/s; acc: 0.5572; f: 0.5665;
100%|██████████| 13/13 [00:00<00:00, 16.83it/s]
Test: time: 0.78s, speed: 259.78st/s; acc: 0.9950, p: 0.9943, r: 0.9965, f: 0.9953


Epoch: 10/20


     Instance: 2000; Time: 19.81s; loss: 3.7946;
     Instance: 4000; Time: 19.85s; loss: 4.6316;
100%|██████████| 13/13 [00:00<00:00, 14.67it/s]
Dev: time: 0.89s speed: 226.38st/s; acc: 0.5373; f: 0.5408;
100%|██████████| 13/13 [00:00<00:00, 15.33it/s]
Test: time: 0.85s, speed: 236.63st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 11/20


     Instance: 2000; Time: 19.68s; loss: 3.1836;
     Instance: 4000; Time: 19.59s; loss: 2.8631;
100%|██████████| 13/13 [00:00<00:00, 15.87it/s]
Dev: time: 0.83s speed: 244.90st/s; acc: 0.5323; f: 0.5248;
100%|██████████| 13/13 [00:00<00:00, 14.68it/s]
Test: time: 0.89s, speed: 226.58st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 12/20


     Instance: 2000; Time: 19.18s; loss: 2.5499;
     Instance: 4000; Time: 19.62s; loss: 1.5558;
100%|██████████| 13/13 [00:00<00:00, 19.41it/s]
Dev: time: 0.68s speed: 299.47st/s; acc: 0.5622; f: 0.5691;
100%|██████████| 13/13 [00:00<00:00, 18.89it/s]
Test: time: 0.70s, speed: 291.38st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 13/20


     Instance: 2000; Time: 19.35s; loss: 0.9698;
INFO:test:     Instance: 2000; Time: 19.35s; loss: 0.9698;
     Instance: 4000; Time: 19.52s; loss: 1.6355;
INFO:test:     Instance: 4000; Time: 19.52s; loss: 1.6355;
100%|██████████| 13/13 [00:00<00:00, 19.47it/s]
Dev: time: 0.67s speed: 300.56st/s; acc: 0.5771; f: 0.5845;
INFO:test:Dev: time: 0.67s speed: 300.56st/s; acc: 0.5771; f: 0.5845;
100%|██████████| 13/13 [00:00<00:00, 18.99it/s]
Save current best f model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
INFO:test:Save current best f model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
Test: time: 0.69s, speed: 293.19st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.69s, speed: 293.19st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 14/20


     Instance: 2000; Time: 19.53s; loss: 0.6991;
INFO:test:     Instance: 2000; Time: 19.53s; loss: 0.6991;
     Instance: 4000; Time: 19.64s; loss: 0.8367;
INFO:test:     Instance: 4000; Time: 19.64s; loss: 0.8367;
100%|██████████| 13/13 [00:00<00:00, 19.42it/s]
Dev: time: 0.68s speed: 299.53st/s; acc: 0.5522; f: 0.5467;
INFO:test:Dev: time: 0.68s speed: 299.53st/s; acc: 0.5522; f: 0.5467;
100%|██████████| 13/13 [00:00<00:00, 15.82it/s]
Test: time: 0.83s, speed: 244.22st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.83s, speed: 244.22st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 15/20


     Instance: 2000; Time: 19.51s; loss: 0.9432;
INFO:test:     Instance: 2000; Time: 19.51s; loss: 0.9432;
     Instance: 4000; Time: 19.46s; loss: 0.3136;
INFO:test:     Instance: 4000; Time: 19.46s; loss: 0.3136;
100%|██████████| 13/13 [00:00<00:00, 14.72it/s]
Dev: time: 0.89s speed: 227.30st/s; acc: 0.5672; f: 0.5795;
INFO:test:Dev: time: 0.89s speed: 227.30st/s; acc: 0.5672; f: 0.5795;
100%|██████████| 13/13 [00:00<00:00, 14.34it/s]
Test: time: 0.91s, speed: 221.43st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.91s, speed: 221.43st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 16/20


     Instance: 2000; Time: 19.20s; loss: 0.5188;
INFO:test:     Instance: 2000; Time: 19.20s; loss: 0.5188;
     Instance: 4000; Time: 19.49s; loss: 1.0113;
INFO:test:     Instance: 4000; Time: 19.49s; loss: 1.0113;
100%|██████████| 13/13 [00:00<00:00, 15.10it/s]
Dev: time: 0.87s speed: 233.06st/s; acc: 0.5721; f: 0.5799;
INFO:test:Dev: time: 0.87s speed: 233.06st/s; acc: 0.5721; f: 0.5799;
100%|██████████| 13/13 [00:00<00:00, 18.98it/s]
Test: time: 0.69s, speed: 292.93st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.69s, speed: 292.93st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 17/20


     Instance: 2000; Time: 19.21s; loss: 0.2074;
INFO:test:     Instance: 2000; Time: 19.21s; loss: 0.2074;
     Instance: 4000; Time: 19.00s; loss: 0.3912;
INFO:test:     Instance: 4000; Time: 19.00s; loss: 0.3912;
100%|██████████| 13/13 [00:00<00:00, 19.45it/s]
Dev: time: 0.67s speed: 300.13st/s; acc: 0.5473; f: 0.5522;
INFO:test:Dev: time: 0.67s speed: 300.13st/s; acc: 0.5473; f: 0.5522;
100%|██████████| 13/13 [00:00<00:00, 18.95it/s]
Test: time: 0.69s, speed: 292.62st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.69s, speed: 292.62st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000


Epoch: 18/20


     Instance: 2000; Time: 19.28s; loss: 0.0763;
INFO:test:     Instance: 2000; Time: 19.28s; loss: 0.0763;
     Instance: 4000; Time: 19.20s; loss: 0.4515;
INFO:test:     Instance: 4000; Time: 19.20s; loss: 0.4515;
100%|██████████| 13/13 [00:00<00:00, 19.45it/s]
Dev: time: 0.67s speed: 300.22st/s; acc: 0.5871; f: 0.5973;
INFO:test:Dev: time: 0.67s speed: 300.22st/s; acc: 0.5871; f: 0.5973;
100%|██████████| 13/13 [00:00<00:00, 19.00it/s]
Save current best acc model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
INFO:test:Save current best acc model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
Save current best f model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
INFO:test:Save current best f model in file:/home/hsani/YATO/sample_data/acc1.0000_p1.0000_r1.0000_f1.0000.pth
Test: time: 0.69s, speed: 293.26st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.69s, speed: 293.26st/s;

Epoch: 19/20


     Instance: 2000; Time: 19.29s; loss: 0.5496;
INFO:test:     Instance: 2000; Time: 19.29s; loss: 0.5496;
     Instance: 4000; Time: 19.13s; loss: 0.4215;
INFO:test:     Instance: 4000; Time: 19.13s; loss: 0.4215;
100%|██████████| 13/13 [00:00<00:00, 14.64it/s]
Dev: time: 0.89s speed: 226.08st/s; acc: 0.5672; f: 0.5713;
INFO:test:Dev: time: 0.89s speed: 226.08st/s; acc: 0.5672; f: 0.5713;
100%|██████████| 13/13 [00:00<00:00, 14.14it/s]
Test: time: 0.93s, speed: 218.38st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
INFO:test:Test: time: 0.93s, speed: 218.38st/s; acc: 1.0000, p: 1.0000, r: 1.0000, f: 1.0000
Best Test F1 Score: 1.0, Best Validation F1 Score: 0.5972556970237612, Best Test F1 Score Epoch: 18 
INFO:test:Best Test F1 Score: 1.0, Best Validation F1 Score: 0.5972556970237612, Best Test F1 Score Epoch: 18 
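
The per-epoch `Dev:` lines in the log above can be scanned to recover which epoch reached the best validation F1. The following is a minimal sketch, not part of YATO: the helper name `best_dev_epoch` and the regular expressions are assumptions based only on the log format shown here. Lines prefixed with `INFO:test:` are skipped so that each epoch is counted once.

```python
import re

# Patterns inferred from the log format shown above (an assumption, not a YATO API).
EPOCH_RE = re.compile(r"^Epoch:\s*(\d+)/(\d+)")
DEV_RE = re.compile(r"^Dev:.*?acc:\s*([\d.]+);\s*f:\s*([\d.]+)")


def best_dev_epoch(log_text):
    """Return (epoch, dev_f) for the highest dev F1 in the log, or None."""
    best = None
    epoch = None
    for line in log_text.splitlines():
        line = line.strip()
        m = EPOCH_RE.match(line)
        if m:
            epoch = int(m.group(1))
            continue
        # Only lines that start with "Dev:" match, so the duplicated
        # "INFO:test:Dev:" lines are ignored.
        m = DEV_RE.match(line)
        if m and epoch is not None:
            f = float(m.group(2))
            if best is None or f > best[1]:
                best = (epoch, f)
    return best


sample = """
Epoch: 17/20
Dev: time: 0.67s speed: 300.13st/s; acc: 0.5473; f: 0.5522;
INFO:test:Dev: time: 0.67s speed: 300.13st/s; acc: 0.5473; f: 0.5522;
Epoch: 18/20
Dev: time: 0.67s speed: 300.22st/s; acc: 0.5871; f: 0.5973;
Epoch: 19/20
Dev: time: 0.89s speed: 226.08st/s; acc: 0.5672; f: 0.5713;
"""
print(best_dev_epoch(sample))  # (18, 0.5973)
```

On the three epochs excerpted in `sample`, this returns epoch 18, which agrees with the summary line printed at the end of training.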


In [1]:
from yato import YATO
model = YATO('sample_data/bert_base_gelu_sst2.config')  # path to the configuration file
model.train()

MODEL: train
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start Sentence Classification task...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Train  file directory: /home/hsani/YATO/dataset/train.txt
     Dev    file directory: /home/hsani/YATO/dataset/valid.txt
     Test   file directory: /home/hsani/YATO/dataset/test.txt
     Raw    file directory: None
     Dset   file directory: /home/hsani/YATO/sample_data/bert_base_gelu_sst2.dset
     Model  file directory: /home/hsani/YATO/dataset/
     Loadmodel   directory: None
     Decode file directory: None
++++++++++++++++++++++++++++++++++++++++
Data and Settings:
     Tag          scheme: Not sequence labeling task
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 512
     Number   normalized: False
     Word         cutoff: 0
     Train instance number: 204567
     Dev   instance number: 

Training model...


build sentence classification network...
use_char:  False
word feature extractor:  LSTM
build word representation...
Loading transformer... model: bert-base-uncased
 ++++++++++++++++++++++++++++++++++++++++
BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.39.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

 ++++++++++++++++++++++++++++++++++++++++


We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.


Epoch: 0/10


     Instance: 2000; Time: 10.92s; loss: 1238.7957;
INFO:test:     Instance: 2000; Time: 10.92s; loss: 1238.7957;
     Instance: 4000; Time: 10.31s; loss: 1094.6781;
INFO:test:     Instance: 4000; Time: 10.31s; loss: 1094.6781;
     Instance: 6000; Time: 10.26s; loss: 982.4624;
INFO:test:     Instance: 6000; Time: 10.26s; loss: 982.4624;
     Instance: 8000; Time: 10.34s; loss: 922.1888;
INFO:test:     Instance: 8000; Time: 10.34s; loss: 922.1888;
     Instance: 10000; Time: 10.34s; loss: 875.4126;
INFO:test:     Instance: 10000; Time: 10.34s; loss: 875.4126;
     Instance: 12000; Time: 10.32s; loss: 829.5411;
INFO:test:     Instance: 12000; Time: 10.32s; loss: 829.5411;
     Instance: 14000; Time: 10.30s; loss: 788.8942;
INFO:test:     Instance: 14000; Time: 10.30s; loss: 788.8942;
     Instance: 16000; Time: 10.33s; loss: 795.1806;
INFO:test:     Instance: 16000; Time: 10.33s; loss: 795.1806;
     Instance: 18000; Time: 10.31s; loss: 766.6774;
INFO:test:     Instance: 18000; Time: 10

Epoch: 1/10


     Instance: 2000; Time: 10.49s; loss: 437.3026;
INFO:test:     Instance: 2000; Time: 10.49s; loss: 437.3026;
     Instance: 4000; Time: 10.26s; loss: 430.0945;
INFO:test:     Instance: 4000; Time: 10.26s; loss: 430.0945;
     Instance: 6000; Time: 10.28s; loss: 413.1413;
INFO:test:     Instance: 6000; Time: 10.28s; loss: 413.1413;
     Instance: 8000; Time: 10.31s; loss: 420.7138;
INFO:test:     Instance: 8000; Time: 10.31s; loss: 420.7138;
     Instance: 10000; Time: 10.29s; loss: 398.2891;
INFO:test:     Instance: 10000; Time: 10.29s; loss: 398.2891;
     Instance: 12000; Time: 10.31s; loss: 414.1611;
INFO:test:     Instance: 12000; Time: 10.31s; loss: 414.1611;
     Instance: 14000; Time: 10.31s; loss: 403.2612;
INFO:test:     Instance: 14000; Time: 10.31s; loss: 403.2612;
     Instance: 16000; Time: 10.29s; loss: 424.5441;
INFO:test:     Instance: 16000; Time: 10.29s; loss: 424.5441;
     Instance: 18000; Time: 10.28s; loss: 412.2492;
INFO:test:     Instance: 18000; Time: 10.28s

Epoch: 2/10


     Instance: 2000; Time: 10.39s; loss: 298.0806;
INFO:test:     Instance: 2000; Time: 10.39s; loss: 298.0806;
     Instance: 4000; Time: 10.31s; loss: 306.5491;
INFO:test:     Instance: 4000; Time: 10.31s; loss: 306.5491;
     Instance: 6000; Time: 10.34s; loss: 306.7805;
INFO:test:     Instance: 6000; Time: 10.34s; loss: 306.7805;
     Instance: 8000; Time: 10.30s; loss: 303.2202;
INFO:test:     Instance: 8000; Time: 10.30s; loss: 303.2202;
     Instance: 10000; Time: 10.29s; loss: 312.3829;
INFO:test:     Instance: 10000; Time: 10.29s; loss: 312.3829;
     Instance: 12000; Time: 10.29s; loss: 296.3042;
INFO:test:     Instance: 12000; Time: 10.29s; loss: 296.3042;
     Instance: 14000; Time: 10.31s; loss: 291.5059;
INFO:test:     Instance: 14000; Time: 10.31s; loss: 291.5059;
     Instance: 16000; Time: 10.31s; loss: 320.8635;
INFO:test:     Instance: 16000; Time: 10.31s; loss: 320.8635;
     Instance: 18000; Time: 10.32s; loss: 312.9636;
INFO:test:     Instance: 18000; Time: 10.32s

KeyboardInterrupt: 
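
The `attention_mask` warning in the log above comes from Hugging Face transformers and appears when padded `input_ids` are passed without a mask. How YATO builds its batches internally is not shown here, so the following is only an illustrative sketch of what such a mask looks like. The helper `pad_batch` is hypothetical, and the token IDs are made-up examples apart from the padding ID `0`, which matches `"pad_token_id": 0` in the printed `BertConfig`.

```python
PAD_TOKEN_ID = 0  # matches "pad_token_id": 0 in the BertConfig printed above


def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Right-pad token ID lists to equal length and build the matching mask.

    Hypothetical helper for illustration; not a YATO or transformers function.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Illustrative token IDs only.
ids, mask = pad_batch([[101, 2023, 102], [101, 2023, 3185, 2001, 102]])
print(ids[0])   # [101, 2023, 102, 0, 0]
print(mask[0])  # [1, 1, 1, 0, 0]
```

A mask of this shape is what the `attention_mask` argument of a transformers model expects (as a tensor), so that padded positions do not contribute to attention.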