# Comparing model performance: Could TARS models be useful for low resource languages?

# TARS model with Zero-Shot Classification

We first apply TARS (Task-Aware Representation of Sentences) models to Named Entity Recognition (NER) in a zero-shot setting. Traditional NER approaches usually require a large annotated corpus for each language, which is time-consuming and expensive to build. A zero-shot TARS tagger instead receives the entity classes as plain-text label names at prediction time, so it can be tried on a new language or a new label set without any additional training data.

With the experiments below we want to find out how far this approach carries across languages with little or no training data.
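Conceptually, TARS makes zero-shot prediction possible by feeding the label name to the model together with the text, so a new label is simply a new input string. The snippet below is a minimal sketch of that idea only. The helper `build_tars_inputs` and the literal `[SEP]` separator are illustrative assumptions, not Flair's internal API.

```python
def build_tars_inputs(text, labels, separator="[SEP]"):
    """Pair every candidate label with the text: one model input per label.

    Hypothetical helper for illustration; the real model uses its own
    tokenizer-specific separator token.
    """
    return [f"{label} {separator} {text}" for label in labels]


inputs = build_tars_inputs(
    "El Quijote es una novela escrita por Miguel de Cervantes",
    ["novela", "escritor"],
)
for line in inputs:
    print(line)
```

Because the label is part of the input, the choice of label names matters: the model can only exploit a name whose meaning it has learned during pre-training.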

In [1]:
!pip install flair


In [None]:
#For Spanish
from flair.models import TARSTagger
from flair.data import Sentence

# 1. Load zero-shot NER tagger
tars = TARSTagger.load('tars-ner')

# 2. Prepare some test sentences
sentences = [
    Sentence("El campus de la Universidad Autónoma de Madrid se encuentra en Cantoblanco, en Madrid, España"),
    Sentence("El Banco Santander anunció una nueva alianza con el Banco de México"),
    Sentence("Viajé a Colombia en un avión de Avianca para participar en una conferencia en Bogotá"),
    Sentence("El Quijote es una novela escrita por Miguel de Cervantes"),
]

# 3. Define some classes of named entities 
labels = ["campus", "Universidad", "México", "Capital", "Hospital", "serie","país","novela","escritor","organización","obra","lugar"]
tars.add_and_switch_to_new_task('ner-spanish', labels, label_type='ner')

# 4. Predict for these classes and print results
for sentence in sentences:
    tars.predict(sentence)
    print(sentence.to_tagged_string("ner"))

2023-04-16 15:28:44,242 SequenceTagger predicts: Dictionary with 5 tags: O, S-entity, B-entity, E-entity, I-entity
Sentence[17]: "El campus de la Universidad Autónoma de Madrid se encuentra en Cantoblanco, en Madrid, España" → ["Universidad Autónoma de Madrid"/organización, "Madrid"/país, "España"/país]
Sentence[12]: "El Banco Santander anunció una nueva alianza con el Banco de México" → ["El Banco Santander"/organización, "Banco de México"/organización]
Sentence[15]: "Viajé a Colombia en un avión de Avianca para participar en una conferencia en Bogotá" → ["Colombia"/país, "Avianca"/organización]
Sentence[10]: "El Quijote es una novela escrita por Miguel de Cervantes" → ["El Quijote"/escritor]


The output shows several problems. 'El Quijote' is tagged as a writer (escritor) rather than as a literary work (novela or obra), while the actual writer, 'Miguel de Cervantes', receives no tag at all. One possible explanation is that the combination of an article ('El') and a capitalized noun ('Quijote') looks like a personal name to the model.

The tagger also never uses the label 'Capital': 'Bogotá' is not tagged at all, and 'Madrid' is labelled as a country (país), which is incorrect.

These errors illustrate the limits of zero-shot classification. Still, considering that no additional training data was used, the overall performance is commendable: most organizations and countries are found correctly.
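To put a number on observations like these, predicted (span, label) pairs can be compared with hand-made gold annotations. The sketch below does this for the last two Spanish sentences. The predictions are copied from the output above, whereas the gold list is our own assumption based on the discussion (it is not part of any dataset), and `precision_recall_f1` is an illustrative helper.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match scores over sets of (span text, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Predictions copied from the notebook output (sentences 3 and 4).
predicted = [
    ("Colombia", "país"),
    ("Avianca", "organización"),
    ("El Quijote", "escritor"),
]

# Gold annotations: an assumption made for this illustration.
gold = [
    ("Colombia", "país"),
    ("Avianca", "organización"),
    ("Bogotá", "Capital"),
    ("El Quijote", "novela"),
    ("Miguel de Cervantes", "escritor"),
]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With only a handful of sentences these scores are anecdotal, but the same function works unchanged on a larger annotated sample.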

In [None]:
#For German:
from flair.models import TARSTagger
from flair.data import Sentence

# 1. Load zero-shot NER tagger
tars = TARSTagger.load('tars-ner')

# 2. Prepare some test sentences
sentences = [
    Sentence("Die Ludwig-Maximilians-Universität München liegt in der Maxvorstadt in München, Deutschland"),
    Sentence("Der FC Bayern München spielt gegen Borussia Dortmund"),
    Sentence("Ich reiste mit einem Lufthansa-Flugzeug nach Berlin, um an einer Konferenz in Potsdam teilzunehmen"),
    Sentence("Der Roman 'Die Blechtrommel' wurde von Günter Grass geschrieben"),
]

# 3. Define some classes of named entities 
labels = ["organization","location","person","author","Land","Hauptstadt","Fußballteam"]
tars.add_and_switch_to_new_task('ner-german', labels, label_type='ner')

# 4. Predict for these classes and print results
for sentence in sentences:
    tars.predict(sentence)
    print(sentence.to_tagged_string("ner"))

2023-04-16 15:39:27,414 SequenceTagger predicts: Dictionary with 5 tags: O, S-entity, B-entity, E-entity, I-entity
Sentence[11]: "Die Ludwig-Maximilians-Universität München liegt in der Maxvorstadt in München, Deutschland" → ["Maxvorstadt"/location, "München"/Land, "Deutschland"/location]
Sentence[8]: "Der FC Bayern München spielt gegen Borussia Dortmund" → ["FC Bayern München"/Fußballteam, "Borussia Dortmund"/Fußballteam]
Sentence[15]: "Ich reiste mit einem Lufthansa-Flugzeug nach Berlin, um an einer Konferenz in Potsdam teilzunehmen" → ["Berlin"/Land, "Potsdam"/Land]
Sentence[11]: "Der Roman 'Die Blechtrommel' wurde von Günter Grass geschrieben" → ["Die Blechtrommel"/author]


A larger, more fine-grained label set can in principle help the system, because it allows finer distinctions to be made. The German output shows, however, that closely related labels are easily confused: 'München', 'Berlin' and 'Potsdam' are all tagged as Land (country), 'Deutschland' only receives the generic label location, no city is recognized as Hauptstadt (capital), and the novel title 'Die Blechtrommel' is tagged as author while 'Günter Grass' is missed.

Telling countries from capitals is hard for the system because it has seen no training data for these labels. Country and city names appear in very similar contexts, so the model has little to go on unless the surrounding text offers additional cues, such as other named entities.
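To compare predictions across sentences and languages, it helps to turn the printed lines back into (span, label) pairs. The following sketch parses one line of the German output with a regular expression and counts the labels. `parse_tagged_line` is a hypothetical helper that assumes the exact print format shown above.

```python
import re
from collections import Counter

# Matches "span text"/label inside the bracketed list after the arrow.
ENTITY_PATTERN = re.compile(r'"([^"]+)"/([^,\]]+)')


def parse_tagged_line(line):
    """Return (span, label) pairs from one printed line, or [] if nothing was tagged."""
    if "→" not in line:
        return []
    tagged_part = line.split("→", 1)[1]
    return ENTITY_PATTERN.findall(tagged_part)


line = (
    'Sentence[15]: "Ich reiste mit einem Lufthansa-Flugzeug nach Berlin, '
    'um an einer Konferenz in Potsdam teilzunehmen" → '
    '["Berlin"/Land, "Potsdam"/Land]'
)
pairs = parse_tagged_line(line)
label_counts = Counter(label for _, label in pairs)
print(pairs)
print(label_counts)
```

Lines without an arrow, such as the Basque and Japanese outputs in this notebook, yield an empty list.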

In [None]:
#For Basque:
from flair.models import TARSTagger
from flair.data import Sentence

# 1. Load zero-shot NER tagger for Basque
tars = TARSTagger.load('tars-ner')

# 2. Prepare some test sentences in Basque
sentences = [
    Sentence("Euskal Herriko Unibertsitatea Donostian dago, Gipuzkoan"),
    Sentence("Euskal Selekzioa Italiarekin jokatuko du"),
    Sentence("Nafarroako Gobernua Pamplonan dago"),
    Sentence("Jon Kortajarena euskal aktore ezaguna da"),
]

# 3. Define some classes of named entities in Basque
labels = ["erakunde","kokapena","pertsona","aktorea"]
tars.add_and_switch_to_new_task('ner-basque', labels, label_type='ner')

# 4. Predict for these classes and print results
for sentence in sentences:
    tars.predict(sentence)
    print(sentence.to_tagged_string("ner"))


2023-04-19 21:29:15,087 SequenceTagger predicts: Dictionary with 5 tags: O, S-entity, B-entity, E-entity, I-entity
Sentence[7]: "Euskal Herriko Unibertsitatea Donostian dago, Gipuzkoan"
Sentence[5]: "Euskal Selekzioa Italiarekin jokatuko du"
Sentence[4]: "Nafarroako Gobernua Pamplonan dago"
Sentence[6]: "Jon Kortajarena euskal aktore ezaguna da"


The model does not return a single entity for the Basque sentences. A likely reason is that the tagger was trained on one language, English: when such a model is applied to a language like Basque, it faces a different language structure (entity names carry case suffixes, as in 'Donostian' or 'Italiarekin'), a different vocabulary, and different cultural references.

In [None]:
#For Japanese:
from flair.models import TARSTagger
from flair.data import Sentence

# 1. Load zero-shot NER tagger
tars = TARSTagger.load('tars-ner')

# 2. Prepare some test sentences
sentences = [
    Sentence('田中さんは東京で会議に出席しました'),
    Sentence('山田太郎は日本の政治家で、自由民主党に所属しています'),
    Sentence('この本は村上春樹の小説で、『ノルウェイの森』というタイトルです'),
    Sentence('大阪市は日本の都市で、大阪城や道頓堀が有名です'),
]

# 3. Define some classes of named entities 
labels = ['人名', '地名', '組織名', 'その他']
tars.add_and_switch_to_new_task('ner-japanese', labels, label_type='ner')

# 4. Predict for these classes and print results
for sentence in sentences:
    tars.predict(sentence)
    print(sentence.to_tagged_string("ner"))

2023-04-16 16:29:30,020 SequenceTagger predicts: Dictionary with 5 tags: O, S-entity, B-entity, E-entity, I-entity
Sentence[1]: "田中さんは東京で会議に出席しました"
Sentence[1]: "山田太郎は日本の政治家で、自由民主党に所属しています"
Sentence[1]: "この本は村上春樹の小説で、『ノルウェイの森』というタイトルです"
Sentence[1]: "大阪市は日本の都市で、大阪城や道頓堀が有名です"


Translation of the sentences:
- "Mr. Tanaka attended a meeting in Tokyo."
- "Taro Yamada is a Japanese politician and belongs to the Liberal Democratic Party."
- "This book is a novel by Haruki Murakami titled 'Norwegian Wood'."
- "Osaka is a Japanese city known for Osaka Castle and Dotonbori."

As with Basque, the model returns no entities. As noted above, this is probably because TARSTagger was trained only on English data, which limits its performance in other languages.

In addition, Japanese is written without spaces between words. The 'Sentence[1]' prefix in the output shows that each sentence was treated as a single token, so the tagger had no individual words to label.
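The tokenization problem can be reproduced without any model. The sketch below uses a plain whitespace split as a stand-in for the tokenizer. This is a simplifying assumption (Flair's default tokenizer does more than split on spaces), but for unsegmented text the effect matches the token counts printed in the outputs above. `character_tokens` is a crude, hypothetical fallback, not a substitute for a proper Japanese word segmenter.

```python
def whitespace_tokens(text):
    """Split on whitespace only; a stand-in for a space-based tokenizer."""
    return text.split()


def character_tokens(text):
    """Crude fallback: one token per non-space character."""
    return [char for char in text if not char.isspace()]


spanish = "El Quijote es una novela escrita por Miguel de Cervantes"
japanese = "田中さんは東京で会議に出席しました"

print(len(whitespace_tokens(spanish)))   # 10 tokens
print(len(whitespace_tokens(japanese)))  # 1 token: the whole sentence
print(character_tokens(japanese)[:5])
```

A fairer test of the model on Japanese would segment the text into words first and then pass the pre-tokenized sentence to the tagger.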

# Training a TARS model

First, we connect with our Google Drive account to save our models there.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Instead of training TARS from scratch, we decided to combine it with Flair Embeddings.

TARS is a powerful base model that is trained on a large corpus of text data, making it capable of capturing general language patterns and features. However, training TARS from scratch requires a significant amount of computational resources, including GPU and RAM, and can be time-consuming.

On the other hand, Flair embeddings provide contextualized word representations that take into account the surrounding words in a sentence. These embeddings capture the semantic meaning and syntactic dependencies of words in a sentence, which can be beneficial for tasks like NER where context is important.

By combining the TARSClassifier model with Flair embeddings, the hope is to enhance the performance of the NER model. Flair embeddings can provide additional contextual information to complement the knowledge and capabilities of the TARS model. This combination can potentially improve the model's ability to accurately recognize named entities and differentiate between different entity types.
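Stacking embeddings means concatenating, for every token, the vectors produced by each component. The sketch below illustrates this with toy vectors and adds up the resulting dimensionality. The sizes 100 and 2048 are read from the model summary in the training log of this notebook. The size of the backward model is cut off there, so the value used here is an assumption, and both helper functions are illustrative.

```python
def stack_embeddings(*vectors):
    """Concatenate the per-token vectors of several embedding types."""
    stacked = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


def stacked_dimension(component_dims):
    """Dimensionality of the concatenated vector."""
    return sum(component_dims.values())


component_dims = {
    "glove": 100,           # Embedding(400001, 100) in the model summary
    "news-forward": 2048,   # LSTM(100, 2048) in the model summary
    "news-backward": 2048,  # assumption: same size as the forward model
}

toy_token_vector = stack_embeddings([0.1, 0.2], [0.3], [0.4])
print(toy_token_vector)
print(stacked_dimension(component_dims))
```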

We follow the same process for all the models, downsampling the Spanish and German datasets to save time and GPU resources.
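Downsampling keeps a random fraction of the sentences, which directly determines how many mini-batches one epoch contains. The sketch below illustrates both steps in plain Python. `downsample` and `batches_per_epoch` are hypothetical helpers, not Flair's implementation, and the rounding rule is an assumption. The figures 666 and 32 are the downsampled Spanish training size reported in the log and the mini-batch size used in the training cell.

```python
import math
import random


def downsample(items, fraction, seed=0):
    """Keep a random subset containing roughly `fraction` of the items."""
    rng = random.Random(seed)
    keep = max(1, round(len(items) * fraction))
    return rng.sample(items, keep)


def batches_per_epoch(n_sentences, mini_batch_size):
    """Number of mini-batches needed to see every sentence once."""
    return math.ceil(n_sentences / mini_batch_size)


sample = downsample(list(range(1000)), 0.08)
print(len(sample))                 # 80 of 1000 toy items
print(batches_per_epoch(666, 32))  # 21, matching "iter x/21" in the log
```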

## For Spanish

In [None]:
#First we check what labels the dataset has
from flair.datasets import CONLL_03_SPANISH

# Load the corpus
corpus = CONLL_03_SPANISH()

label_type='ner'

# Get the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Get the list of labels
labels = label_dict.get_items()

# Print the list of labels
print("Labels:", labels)

In order to improve the learning capability of TARS models, we create new label names that are more descriptive, as suggested by the authors.
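As a small illustration of what such a mapping does, the sketch below rewrites a toy tag sequence with descriptive names. The `relabel` helper and the example tags are illustrative assumptions. In the notebook itself the mapping is handed to the dataset loader instead.

```python
# A subset of the mapping, enough for the toy example.
label_name_map = {
    "B-PER": "Beginning of a person name",
    "I-PER": "Inside a person name",
    "B-LOC": "Beginning of a location name",
}


def relabel(tags, mapping):
    """Replace every tag that has a descriptive name; leave the others unchanged."""
    return [mapping.get(tag, tag) for tag in tags]


tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
descriptive_tags = relabel(tags, label_name_map)
print(descriptive_tags)
```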

In [None]:
from flair.datasets import CONLL_03_SPANISH
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger, TARSClassifier
from flair.trainers import ModelTrainer

# 1. define label names in natural language since some datasets come with a cryptic set of labels
label_name_map = {'B-ORG': 'Beginning of an organization name',
                  'I-ORG': 'Inside an organization name',
                  'B-PER': 'Beginning of a person name',
                  'I-PER': 'Inside a person name',
                  'B-LOC': 'Beginning of a location name',
                  'I-LOC': 'Inside a location name',
                  'B-MISC': 'Beginning of a miscellaneous name',
                  'I-MISC': 'Inside a miscellaneous name'
                  }

# 2. get the corpus and keep 8% of it to save time
corpus = CONLL_03_SPANISH(label_name_map=label_name_map).downsample(0.08)

# 3. what label do you want to predict?
label_type = 'ner'

# 4. make a label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 5. start from the existing TARS base model for English
tars = TARSClassifier.load("tars-base")

# 5a: alternatively, comment out the previous line and uncomment the next line to train a new TARS model from scratch instead
#tars = TARSClassifier(embeddings="bert-base-uncased")

# 6. switch to a new task (TARS can do multiple tasks so you must define one)
tars.add_and_switch_to_new_task(task_name="ner-tagging",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )

# 7. initialize embedding stack with GloVe and Flair embeddings
#    note: 'glove', 'news-forward' and 'news-backward' are English embeddings
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 8. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

# 9. create a ModelTrainer and start training
#    note: the trainer receives `tagger`, so that is the model being trained;
#    the `tars` object loaded in step 5 is not passed to it
trainer = ModelTrainer(tagger, corpus)
trainer.train(base_path='/content/drive/MyDrive/ColabNotebooks/nermodels/spanish',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)


2023-04-18 19:03:36,688 Reading data from /root/.flair/datasets/conll_03_spanish
2023-04-18 19:03:36,691 Train: /root/.flair/datasets/conll_03_spanish/esp.train
2023-04-18 19:03:36,693 Dev: /root/.flair/datasets/conll_03_spanish/esp.testa
2023-04-18 19:03:36,695 Test: /root/.flair/datasets/conll_03_spanish/esp.testb
2023-04-18 19:03:40,780 Computing label dictionary. Progress:


666it [00:00, 28822.67it/s]

2023-04-18 19:03:40,811 Dictionary created for label 'ner' with 9 values: Beginning of an organization name (seen 603 times), Inside an organization name (seen 416 times), Beginning of a person name (seen 355 times), Beginning of a location name (seen 341 times), Inside a person name (seen 330 times), Inside a miscellaneous name (seen 242 times), Beginning of a miscellaneous name (seen 173 times), Inside a location name (seen 140 times)





2023-04-18 19:03:43,671 TARS initialized without a task. You need to call .add_and_switch_to_new_task() before training this model
2023-04-18 19:03:50,838 SequenceTagger predicts: Dictionary with 9 tags: <unk>, Beginning of an organization name, Inside an organization name, Beginning of a person name, Beginning of a location name, Inside a person name, Inside a miscellaneous name, Beginning of a miscellaneous name, Inside a location name
2023-04-18 19:03:51,072 ----------------------------------------------------------------------------------------------------
2023-04-18 19:03:51,077 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (l

100%|██████████| 5/5 [00:05<00:00,  1.00s/it]

2023-04-18 19:04:12,281 Evaluating as a multi-label problem: False
2023-04-18 19:04:12,319 DEV : loss 0.7514978647232056 - f1-score (micro avg)  0.0041
2023-04-18 19:04:12,341 BAD EPOCHS (no improvement): 0
2023-04-18 19:04:12,346 saving best model





2023-04-18 19:04:14,731 ----------------------------------------------------------------------------------------------------
2023-04-18 19:04:15,647 epoch 2 - iter 2/21 - loss 0.65787264 - time (sec): 0.91 - samples/sec: 2174.20 - lr: 0.100000
2023-04-18 19:04:16,560 epoch 2 - iter 4/21 - loss 0.59551988 - time (sec): 1.83 - samples/sec: 2171.76 - lr: 0.100000
2023-04-18 19:04:17,535 epoch 2 - iter 6/21 - loss 0.56470091 - time (sec): 2.80 - samples/sec: 2105.09 - lr: 0.100000
2023-04-18 19:04:18,304 epoch 2 - iter 8/21 - loss 0.55363077 - time (sec): 3.57 - samples/sec: 2201.31 - lr: 0.100000
2023-04-18 19:04:19,024 epoch 2 - iter 10/21 - loss 0.54232830 - time (sec): 4.29 - samples/sec: 2323.60 - lr: 0.100000
2023-04-18 19:04:20,020 epoch 2 - iter 12/21 - loss 0.52886068 - time (sec): 5.29 - samples/sec: 2307.83 - lr: 0.100000
2023-04-18 19:04:20,708 epoch 2 - iter 14/21 - loss 0.52350143 - time (sec): 5.97 - samples/sec: 2420.43 - lr: 0.100000
2023-04-18 19:04:21,341 epoch 2 - iter 

100%|██████████| 5/5 [00:02<00:00,  2.40it/s]

2023-04-18 19:04:24,908 Evaluating as a multi-label problem: False
2023-04-18 19:04:24,965 DEV : loss 0.44864895939826965 - f1-score (micro avg)  0.0469
2023-04-18 19:04:25,020 BAD EPOCHS (no improvement): 0
2023-04-18 19:04:25,030 saving best model





2023-04-18 19:04:27,237 ----------------------------------------------------------------------------------------------------
2023-04-18 19:04:27,796 epoch 3 - iter 2/21 - loss 0.44605675 - time (sec): 0.55 - samples/sec: 3786.04 - lr: 0.100000
2023-04-18 19:04:28,352 epoch 3 - iter 4/21 - loss 0.41087923 - time (sec): 1.11 - samples/sec: 3636.67 - lr: 0.100000
2023-04-18 19:04:28,986 epoch 3 - iter 6/21 - loss 0.43758197 - time (sec): 1.74 - samples/sec: 3465.52 - lr: 0.100000
2023-04-18 19:04:29,545 epoch 3 - iter 8/21 - loss 0.42791426 - time (sec): 2.30 - samples/sec: 3595.32 - lr: 0.100000
2023-04-18 19:04:30,140 epoch 3 - iter 10/21 - loss 0.43459755 - time (sec): 2.90 - samples/sec: 3551.80 - lr: 0.100000
2023-04-18 19:04:30,713 epoch 3 - iter 12/21 - loss 0.43825036 - time (sec): 3.47 - samples/sec: 3516.88 - lr: 0.100000
2023-04-18 19:04:31,469 epoch 3 - iter 14/21 - loss 0.42096521 - time (sec): 4.23 - samples/sec: 3372.18 - lr: 0.100000
2023-04-18 19:04:32,424 epoch 3 - iter 

100%|██████████| 5/5 [00:03<00:00,  1.49it/s]

2023-04-18 19:04:37,802 Evaluating as a multi-label problem: False
2023-04-18 19:04:37,854 DEV : loss 0.3953027129173279 - f1-score (micro avg)  0.0697
2023-04-18 19:04:37,895 BAD EPOCHS (no improvement): 0
2023-04-18 19:04:37,908 saving best model





2023-04-18 19:04:40,430 ----------------------------------------------------------------------------------------------------
2023-04-18 19:04:41,022 epoch 4 - iter 2/21 - loss 0.34699466 - time (sec): 0.58 - samples/sec: 3588.28 - lr: 0.100000
2023-04-18 19:04:41,567 epoch 4 - iter 4/21 - loss 0.29945572 - time (sec): 1.13 - samples/sec: 3496.76 - lr: 0.100000
2023-04-18 19:04:42,089 epoch 4 - iter 6/21 - loss 0.31868945 - time (sec): 1.65 - samples/sec: 3639.25 - lr: 0.100000
2023-04-18 19:04:42,596 epoch 4 - iter 8/21 - loss 0.30746071 - time (sec): 2.16 - samples/sec: 3634.76 - lr: 0.100000
2023-04-18 19:04:43,124 epoch 4 - iter 10/21 - loss 0.30437840 - time (sec): 2.69 - samples/sec: 3696.96 - lr: 0.100000
2023-04-18 19:04:43,755 epoch 4 - iter 12/21 - loss 0.30604518 - time (sec): 3.32 - samples/sec: 3597.20 - lr: 0.100000
2023-04-18 19:04:44,312 epoch 4 - iter 14/21 - loss 0.31456772 - time (sec): 3.87 - samples/sec: 3655.19 - lr: 0.100000
2023-04-18 19:04:44,910 epoch 4 - iter 

100%|██████████| 5/5 [00:01<00:00,  3.65it/s]

2023-04-18 19:04:47,303 Evaluating as a multi-label problem: False
2023-04-18 19:04:47,395 DEV : loss 0.27277010679244995 - f1-score (micro avg)  0.1326
2023-04-18 19:04:47,459 BAD EPOCHS (no improvement): 0
2023-04-18 19:04:47,469 saving best model





2023-04-18 19:04:50,010 ----------------------------------------------------------------------------------------------------
2023-04-18 19:04:50,644 epoch 5 - iter 2/21 - loss 0.29882812 - time (sec): 0.63 - samples/sec: 3146.55 - lr: 0.100000
2023-04-18 19:04:51,218 epoch 5 - iter 4/21 - loss 0.29574135 - time (sec): 1.20 - samples/sec: 3242.15 - lr: 0.100000
2023-04-18 19:04:51,803 epoch 5 - iter 6/21 - loss 0.27519473 - time (sec): 1.78 - samples/sec: 3314.77 - lr: 0.100000
2023-04-18 19:04:52,330 epoch 5 - iter 8/21 - loss 0.27921069 - time (sec): 2.31 - samples/sec: 3409.85 - lr: 0.100000
2023-04-18 19:04:52,892 epoch 5 - iter 10/21 - loss 0.26224861 - time (sec): 2.87 - samples/sec: 3508.72 - lr: 0.100000
2023-04-18 19:04:53,315 epoch 5 - iter 12/21 - loss 0.26028036 - time (sec): 3.30 - samples/sec: 3691.39 - lr: 0.100000
2023-04-18 19:04:53,686 epoch 5 - iter 14/21 - loss 0.25795601 - time (sec): 3.67 - samples/sec: 3874.39 - lr: 0.100000
2023-04-18 19:04:54,060 epoch 5 - iter 

100%|██████████| 5/5 [00:01<00:00,  4.14it/s]


2023-04-18 19:04:56,241 Evaluating as a multi-label problem: False
2023-04-18 19:04:56,276 DEV : loss 0.2265610247850418 - f1-score (micro avg)  0.1569
2023-04-18 19:04:56,299 BAD EPOCHS (no improvement): 0
2023-04-18 19:04:56,304 saving best model
2023-04-18 19:04:58,221 ----------------------------------------------------------------------------------------------------
2023-04-18 19:04:58,675 epoch 6 - iter 2/21 - loss 0.19521553 - time (sec): 0.45 - samples/sec: 4108.73 - lr: 0.100000
2023-04-18 19:04:59,056 epoch 6 - iter 4/21 - loss 0.20670362 - time (sec): 0.83 - samples/sec: 4365.79 - lr: 0.100000
2023-04-18 19:04:59,426 epoch 6 - iter 6/21 - loss 0.22720605 - time (sec): 1.20 - samples/sec: 4751.19 - lr: 0.100000
2023-04-18 19:04:59,772 epoch 6 - iter 8/21 - loss 0.22099176 - time (sec): 1.55 - samples/sec: 4991.49 - lr: 0.100000
2023-04-18 19:05:00,129 epoch 6 - iter 10/21 - loss 0.23213018 - time (sec): 1.90 - samples/sec: 5131.14 - lr: 0.100000
2023-04-18 19:05:00,449 epoch 

100%|██████████| 5/5 [00:01<00:00,  3.69it/s]

2023-04-18 19:05:03,535 Evaluating as a multi-label problem: False
2023-04-18 19:05:03,586 DEV : loss 0.24437367916107178 - f1-score (micro avg)  0.1326
2023-04-18 19:05:03,630 BAD EPOCHS (no improvement): 1
2023-04-18 19:05:03,637 ----------------------------------------------------------------------------------------------------





2023-04-18 19:05:04,172 epoch 7 - iter 2/21 - loss 0.15375161 - time (sec): 0.53 - samples/sec: 3470.06 - lr: 0.100000
2023-04-18 19:05:04,739 epoch 7 - iter 4/21 - loss 0.16492796 - time (sec): 1.10 - samples/sec: 3670.41 - lr: 0.100000
2023-04-18 19:05:05,317 epoch 7 - iter 6/21 - loss 0.17410278 - time (sec): 1.68 - samples/sec: 3569.59 - lr: 0.100000
2023-04-18 19:05:05,806 epoch 7 - iter 8/21 - loss 0.18162000 - time (sec): 2.17 - samples/sec: 3528.05 - lr: 0.100000
2023-04-18 19:05:06,364 epoch 7 - iter 10/21 - loss 0.18149615 - time (sec): 2.73 - samples/sec: 3552.99 - lr: 0.100000
2023-04-18 19:05:06,893 epoch 7 - iter 12/21 - loss 0.18778619 - time (sec): 3.25 - samples/sec: 3582.11 - lr: 0.100000
2023-04-18 19:05:07,571 epoch 7 - iter 14/21 - loss 0.19225442 - time (sec): 3.93 - samples/sec: 3491.31 - lr: 0.100000
2023-04-18 19:05:08,319 epoch 7 - iter 16/21 - loss 0.19136604 - time (sec): 4.68 - samples/sec: 3419.44 - lr: 0.100000
2023-04-18 19:05:08,889 epoch 7 - iter 18/21

100%|██████████| 5/5 [00:01<00:00,  3.47it/s]


2023-04-18 19:05:11,019 Evaluating as a multi-label problem: False
2023-04-18 19:05:11,062 DEV : loss 0.1963139772415161 - f1-score (micro avg)  0.1583
2023-04-18 19:05:11,102 BAD EPOCHS (no improvement): 0
2023-04-18 19:05:11,109 saving best model
2023-04-18 19:05:13,208 ----------------------------------------------------------------------------------------------------
2023-04-18 19:05:13,612 epoch 8 - iter 2/21 - loss 0.18371350 - time (sec): 0.40 - samples/sec: 4739.73 - lr: 0.100000
2023-04-18 19:05:14,021 epoch 8 - iter 4/21 - loss 0.19143115 - time (sec): 0.81 - samples/sec: 5068.95 - lr: 0.100000
2023-04-18 19:05:14,351 epoch 8 - iter 6/21 - loss 0.18392530 - time (sec): 1.14 - samples/sec: 5100.84 - lr: 0.100000
2023-04-18 19:05:14,748 epoch 8 - iter 8/21 - loss 0.17911608 - time (sec): 1.53 - samples/sec: 5123.98 - lr: 0.100000
2023-04-18 19:05:15,119 epoch 8 - iter 10/21 - loss 0.17602732 - time (sec): 1.91 - samples/sec: 5181.49 - lr: 0.100000
2023-04-18 19:05:15,532 epoch 

100%|██████████| 5/5 [00:01<00:00,  4.26it/s]


2023-04-18 19:05:18,337 Evaluating as a multi-label problem: False
2023-04-18 19:05:18,373 DEV : loss 0.17873038351535797 - f1-score (micro avg)  0.1673
2023-04-18 19:05:18,396 BAD EPOCHS (no improvement): 0
2023-04-18 19:05:18,402 saving best model
2023-04-18 19:05:20,721 ----------------------------------------------------------------------------------------------------
2023-04-18 19:05:21,314 epoch 9 - iter 2/21 - loss 0.12138563 - time (sec): 0.59 - samples/sec: 3513.79 - lr: 0.100000
2023-04-18 19:05:21,842 epoch 9 - iter 4/21 - loss 0.13724756 - time (sec): 1.11 - samples/sec: 3614.74 - lr: 0.100000
2023-04-18 19:05:22,442 epoch 9 - iter 6/21 - loss 0.14825493 - time (sec): 1.71 - samples/sec: 3688.98 - lr: 0.100000
2023-04-18 19:05:23,005 epoch 9 - iter 8/21 - loss 0.14579090 - time (sec): 2.28 - samples/sec: 3651.96 - lr: 0.100000
2023-04-18 19:05:23,602 epoch 9 - iter 10/21 - loss 0.15556145 - time (sec): 2.87 - samples/sec: 3644.78 - lr: 0.100000
2023-04-18 19:05:24,179 epoch

100%|██████████| 5/5 [00:01<00:00,  3.92it/s]

2023-04-18 19:05:27,256 Evaluating as a multi-label problem: False
2023-04-18 19:05:27,291 DEV : loss 0.20439347624778748 - f1-score (micro avg)  0.1408
2023-04-18 19:05:27,317 BAD EPOCHS (no improvement): 1
2023-04-18 19:05:27,323 ----------------------------------------------------------------------------------------------------





2023-04-18 19:05:27,749 epoch 10 - iter 2/21 - loss 0.13740062 - time (sec): 0.42 - samples/sec: 4553.79 - lr: 0.100000
2023-04-18 19:05:28,223 epoch 10 - iter 4/21 - loss 0.15634062 - time (sec): 0.90 - samples/sec: 4485.40 - lr: 0.100000
2023-04-18 19:05:28,704 epoch 10 - iter 6/21 - loss 0.15616580 - time (sec): 1.38 - samples/sec: 4394.09 - lr: 0.100000
2023-04-18 19:05:29,103 epoch 10 - iter 8/21 - loss 0.15246046 - time (sec): 1.78 - samples/sec: 4461.49 - lr: 0.100000
2023-04-18 19:05:29,542 epoch 10 - iter 10/21 - loss 0.15775700 - time (sec): 2.22 - samples/sec: 4522.27 - lr: 0.100000
2023-04-18 19:05:30,029 epoch 10 - iter 12/21 - loss 0.15632825 - time (sec): 2.70 - samples/sec: 4497.51 - lr: 0.100000
2023-04-18 19:05:30,428 epoch 10 - iter 14/21 - loss 0.15557768 - time (sec): 3.10 - samples/sec: 4577.88 - lr: 0.100000
2023-04-18 19:05:30,862 epoch 10 - iter 16/21 - loss 0.15529569 - time (sec): 3.54 - samples/sec: 4579.30 - lr: 0.100000
2023-04-18 19:05:31,349 epoch 10 - i

100%|██████████| 5/5 [00:01<00:00,  3.39it/s]

2023-04-18 19:05:33,445 Evaluating as a multi-label problem: False
2023-04-18 19:05:33,479 DEV : loss 0.1688769906759262 - f1-score (micro avg)  0.1777
2023-04-18 19:05:33,507 BAD EPOCHS (no improvement): 0
2023-04-18 19:05:33,513 saving best model





2023-04-18 19:05:35,660 ----------------------------------------------------------------------------------------------------
2023-04-18 19:05:36,253 epoch 11 - iter 2/21 - loss 0.12967764 - time (sec): 0.59 - samples/sec: 3351.71 - lr: 0.100000
2023-04-18 19:05:36,837 epoch 11 - iter 4/21 - loss 0.13328090 - time (sec): 1.18 - samples/sec: 3155.19 - lr: 0.100000
2023-04-18 19:05:37,568 epoch 11 - iter 6/21 - loss 0.13068505 - time (sec): 1.91 - samples/sec: 3159.68 - lr: 0.100000
2023-04-18 19:05:38,389 epoch 11 - iter 8/21 - loss 0.12833250 - time (sec): 2.73 - samples/sec: 2926.98 - lr: 0.100000
2023-04-18 19:05:39,241 epoch 11 - iter 10/21 - loss 0.13705347 - time (sec): 3.58 - samples/sec: 2752.40 - lr: 0.100000
2023-04-18 19:05:40,369 epoch 11 - iter 12/21 - loss 0.13609663 - time (sec): 4.71 - samples/sec: 2587.85 - lr: 0.100000
2023-04-18 19:05:40,971 epoch 11 - iter 14/21 - loss 0.13540063 - time (sec): 5.31 - samples/sec: 2679.19 - lr: 0.100000
2023-04-18 19:05:41,499 epoch 11

100%|██████████| 5/5 [00:02<00:00,  2.23it/s]

2023-04-18 19:05:45,377 Evaluating as a multi-label problem: False
2023-04-18 19:05:45,431 DEV : loss 0.16851715743541718 - f1-score (micro avg)  0.171
2023-04-18 19:05:45,473 BAD EPOCHS (no improvement): 1
2023-04-18 19:05:45,484 ----------------------------------------------------------------------------------------------------





2023-04-18 19:05:46,271 epoch 12 - iter 2/21 - loss 0.13545851 - time (sec): 0.78 - samples/sec: 2651.05 - lr: 0.100000
2023-04-18 19:05:47,249 epoch 12 - iter 4/21 - loss 0.12767625 - time (sec): 1.76 - samples/sec: 2278.94 - lr: 0.100000
2023-04-18 19:05:47,708 epoch 12 - iter 6/21 - loss 0.13033734 - time (sec): 2.22 - samples/sec: 2727.68 - lr: 0.100000
2023-04-18 19:05:48,210 epoch 12 - iter 8/21 - loss 0.13381233 - time (sec): 2.72 - samples/sec: 2980.99 - lr: 0.100000
2023-04-18 19:05:48,623 epoch 12 - iter 10/21 - loss 0.12753223 - time (sec): 3.14 - samples/sec: 3238.77 - lr: 0.100000
2023-04-18 19:05:49,059 epoch 12 - iter 12/21 - loss 0.12660077 - time (sec): 3.57 - samples/sec: 3386.83 - lr: 0.100000
2023-04-18 19:05:49,459 epoch 12 - iter 14/21 - loss 0.12861886 - time (sec): 3.97 - samples/sec: 3552.58 - lr: 0.100000
2023-04-18 19:05:49,899 epoch 12 - iter 16/21 - loss 0.13206583 - time (sec): 4.41 - samples/sec: 3639.28 - lr: 0.100000
2023-04-18 19:05:50,367 epoch 12 - i

100%|██████████| 5/5 [00:02<00:00,  2.40it/s]

2023-04-18 19:05:53,230 Evaluating as a multi-label problem: False
2023-04-18 19:05:53,290 DEV : loss 0.16093729436397552 - f1-score (micro avg)  0.1803
2023-04-18 19:05:53,351 BAD EPOCHS (no improvement): 0
2023-04-18 19:05:53,362 saving best model





2023-04-18 19:05:55,924 ----------------------------------------------------------------------------------------------------
2023-04-18 19:05:56,477 epoch 13 - iter 2/21 - loss 0.13975602 - time (sec): 0.55 - samples/sec: 3518.59 - lr: 0.100000
2023-04-18 19:05:56,925 epoch 13 - iter 4/21 - loss 0.15069150 - time (sec): 1.00 - samples/sec: 4191.03 - lr: 0.100000
2023-04-18 19:05:57,282 epoch 13 - iter 6/21 - loss 0.13750089 - time (sec): 1.35 - samples/sec: 4536.42 - lr: 0.100000
2023-04-18 19:05:57,683 epoch 13 - iter 8/21 - loss 0.13957522 - time (sec): 1.75 - samples/sec: 4631.18 - lr: 0.100000
2023-04-18 19:05:58,080 epoch 13 - iter 10/21 - loss 0.13383925 - time (sec): 2.15 - samples/sec: 4739.66 - lr: 0.100000
2023-04-18 19:05:58,411 epoch 13 - iter 12/21 - loss 0.12754453 - time (sec): 2.48 - samples/sec: 4835.98 - lr: 0.100000
2023-04-18 19:05:58,765 epoch 13 - iter 14/21 - loss 0.12100986 - time (sec): 2.84 - samples/sec: 4912.62 - lr: 0.100000
2023-04-18 19:05:59,143 epoch 13

100%|██████████| 5/5 [00:01<00:00,  4.32it/s]


2023-04-18 19:06:01,233 Evaluating as a multi-label problem: False
2023-04-18 19:06:01,269 DEV : loss 0.16144134104251862 - f1-score (micro avg)  0.1721
2023-04-18 19:06:01,299 BAD EPOCHS (no improvement): 1
2023-04-18 19:06:01,314 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:01,762 epoch 14 - iter 2/21 - loss 0.15564281 - time (sec): 0.44 - samples/sec: 5214.33 - lr: 0.100000
2023-04-18 19:06:02,138 epoch 14 - iter 4/21 - loss 0.14300291 - time (sec): 0.82 - samples/sec: 5165.48 - lr: 0.100000
2023-04-18 19:06:02,549 epoch 14 - iter 6/21 - loss 0.13769091 - time (sec): 1.23 - samples/sec: 5039.89 - lr: 0.100000
2023-04-18 19:06:02,981 epoch 14 - iter 8/21 - loss 0.13147846 - time (sec): 1.66 - samples/sec: 5058.86 - lr: 0.100000
2023-04-18 19:06:03,328 epoch 14 - iter 10/21 - loss 0.13012417 - time (sec): 2.01 - samples/sec: 5186.48 - lr: 0.100000
2023-04-18 19:06:03,643 epoch 14 - iter 12/21 - loss 0.12482981 - 

100%|██████████| 5/5 [00:01<00:00,  2.96it/s]

2023-04-18 19:06:07,247 Evaluating as a multi-label problem: False
2023-04-18 19:06:07,303 DEV : loss 0.14978691935539246 - f1-score (micro avg)  0.1844
2023-04-18 19:06:07,349 BAD EPOCHS (no improvement): 0
2023-04-18 19:06:07,364 saving best model
2023-04-18 19:06:09,928 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:10,542 epoch 15 - iter 2/21 - loss 0.11412308 - time (sec): 0.61 - samples/sec: 3720.12 - lr: 0.100000
2023-04-18 19:06:11,155 epoch 15 - iter 4/21 - loss 0.11023550 - time (sec): 1.23 - samples/sec: 3564.41 - lr: 0.100000
2023-04-18 19:06:11,689 epoch 15 - iter 6/21 - loss 0.10938397 - time (sec): 1.76 - samples/sec: 3560.29 - lr: 0.100000
2023-04-18 19:06:12,284 epoch 15 - iter 8/21 - loss 0.11278916 - time (sec): 2.35 - samples/sec: 3594.99 - lr: 0.100000
2023-04-18 19:06:12,674 epoch 15 - iter 10/21 - loss 0.11358028 - time (sec): 2.74 - samples/sec: 3813.78 - lr: 0.100000
2023-04-18 19:06:13,008 epoch 15 - iter 12/21 - loss 0.11820159 - time (sec): 3.08 - samples/sec: 3973.46 - lr: 0.100000
2023-04-18 19:06:13,388 epoch 15 - iter 14/21 - loss 0.11963134 - time (sec): 3.46 - samples/sec: 4111.77 - lr: 0.100000
2023-04-18 19:06:13,745 epoch 15

100%|██████████| 5/5 [00:01<00:00,  4.36it/s]


2023-04-18 19:06:15,814 Evaluating as a multi-label problem: False
2023-04-18 19:06:15,849 DEV : loss 0.14985893666744232 - f1-score (micro avg)  0.1796
2023-04-18 19:06:15,872 BAD EPOCHS (no improvement): 1
2023-04-18 19:06:15,881 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:16,241 epoch 16 - iter 2/21 - loss 0.11824811 - time (sec): 0.36 - samples/sec: 6231.39 - lr: 0.100000
2023-04-18 19:06:16,585 epoch 16 - iter 4/21 - loss 0.11002604 - time (sec): 0.70 - samples/sec: 5638.74 - lr: 0.100000
2023-04-18 19:06:16,932 epoch 16 - iter 6/21 - loss 0.11151404 - time (sec): 1.05 - samples/sec: 5496.80 - lr: 0.100000
2023-04-18 19:06:17,444 epoch 16 - iter 8/21 - loss 0.11806882 - time (sec): 1.56 - samples/sec: 5090.12 - lr: 0.100000
2023-04-18 19:06:17,862 epoch 16 - iter 10/21 - loss 0.11159995 - time (sec): 1.98 - samples/sec: 5142.70 - lr: 0.100000
2023-04-18 19:06:18,326 epoch 16 - iter 12/21 - loss 0.11658832 - 

100%|██████████| 5/5 [00:01<00:00,  3.52it/s]

2023-04-18 19:06:21,689 Evaluating as a multi-label problem: False
2023-04-18 19:06:21,721 DEV : loss 0.16751442849636078 - f1-score (micro avg)  0.1736
2023-04-18 19:06:21,753 BAD EPOCHS (no improvement): 2
2023-04-18 19:06:21,764 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:22,195 epoch 17 - iter 2/21 - loss 0.09191973 - time (sec): 0.43 - samples/sec: 4024.37 - lr: 0.100000
2023-04-18 19:06:22,869 epoch 17 - iter 4/21 - loss 0.11444239 - time (sec): 1.10 - samples/sec: 3552.94 - lr: 0.100000
2023-04-18 19:06:23,566 epoch 17 - iter 6/21 - loss 0.10630235 - time (sec): 1.80 - samples/sec: 3436.12 - lr: 0.100000
2023-04-18 19:06:24,187 epoch 17 - iter 8/21 - loss 0.10251471 - time (sec): 2.42 - samples/sec: 3549.12 - lr: 0.100000
2023-04-18 19:06:24,857 epoch 17 - iter 10/21 - loss 0.09784403 - time (sec): 3.09 - samples/sec: 3449.34 - lr: 0.100000
2023-04-18 19:06:25,412 epoch 17 - iter 12/21 - loss 0.10509013 - time (sec): 3.64 - samples/sec: 3469.97 - lr: 0.100000
2023-04-18 19:06:25,969 epoch 17 - iter 14/21 - loss 0.10615859 - time (sec): 4.20 - samples/sec: 3455.19 - lr: 0.100000
2023-04-18 19:06:26,490 epoch 17 - iter 16/21 - loss 0.10832169 - time (sec): 4.72 - samples/sec: 3515.89 - lr: 0.100000
2023-04-18 19:06:27,039 epoch 17 - i

100%|██████████| 5/5 [00:01<00:00,  3.71it/s]


2023-04-18 19:06:29,152 Evaluating as a multi-label problem: False
2023-04-18 19:06:29,186 DEV : loss 0.14255468547344208 - f1-score (micro avg)  0.1837
2023-04-18 19:06:29,212 BAD EPOCHS (no improvement): 3
2023-04-18 19:06:29,219 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:29,661 epoch 18 - iter 2/21 - loss 0.07785879 - time (sec): 0.44 - samples/sec: 4870.12 - lr: 0.100000
2023-04-18 19:06:30,122 epoch 18 - iter 4/21 - loss 0.07753815 - time (sec): 0.90 - samples/sec: 4778.57 - lr: 0.100000
2023-04-18 19:06:30,669 epoch 18 - iter 6/21 - loss 0.08994272 - time (sec): 1.45 - samples/sec: 4323.85 - lr: 0.100000
2023-04-18 19:06:31,219 epoch 18 - iter 8/21 - loss 0.09436760 - time (sec): 2.00 - samples/sec: 4220.47 - lr: 0.100000
2023-04-18 19:06:31,732 epoch 18 - iter 10/21 - loss 0.09680326 - time (sec): 2.51 - samples/sec: 4102.92 - lr: 0.100000
2023-04-18 19:06:32,246 epoch 18 - iter 12/21 - loss 0.09887657 - 

100%|██████████| 5/5 [00:01<00:00,  3.45it/s]


2023-04-18 19:06:36,289 Evaluating as a multi-label problem: False
2023-04-18 19:06:36,324 DEV : loss 0.1402125060558319 - f1-score (micro avg)  0.1833
2023-04-18 19:06:36,348 Epoch    18: reducing learning rate of group 0 to 5.0000e-02.
2023-04-18 19:06:36,351 BAD EPOCHS (no improvement): 4
2023-04-18 19:06:36,357 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:36,812 epoch 19 - iter 2/21 - loss 0.10576028 - time (sec): 0.45 - samples/sec: 4777.89 - lr: 0.050000
2023-04-18 19:06:37,222 epoch 19 - iter 4/21 - loss 0.09173054 - time (sec): 0.86 - samples/sec: 5104.58 - lr: 0.050000
2023-04-18 19:06:37,581 epoch 19 - iter 6/21 - loss 0.08921691 - time (sec): 1.22 - samples/sec: 5276.81 - lr: 0.050000
2023-04-18 19:06:38,003 epoch 19 - iter 8/21 - loss 0.09435244 - time (sec): 1.64 - samples/sec: 5096.81 - lr: 0.050000
2023-04-18 19:06:38,560 epoch 19 - iter 10/21 - loss 0.09346472 - time (sec): 2.20 - samples/sec: 4765

100%|██████████| 5/5 [00:01<00:00,  2.62it/s]

2023-04-18 19:06:43,238 Evaluating as a multi-label problem: False
2023-04-18 19:06:43,296 DEV : loss 0.14607398211956024 - f1-score (micro avg)  0.1826
2023-04-18 19:06:43,343 BAD EPOCHS (no improvement): 1
2023-04-18 19:06:43,352 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:43,851 epoch 20 - iter 2/21 - loss 0.07972691 - time (sec): 0.50 - samples/sec: 3807.73 - lr: 0.050000
2023-04-18 19:06:44,284 epoch 20 - iter 4/21 - loss 0.09200583 - time (sec): 0.93 - samples/sec: 4188.95 - lr: 0.050000
2023-04-18 19:06:44,673 epoch 20 - iter 6/21 - loss 0.09328837 - time (sec): 1.32 - samples/sec: 4656.43 - lr: 0.050000
2023-04-18 19:06:45,067 epoch 20 - iter 8/21 - loss 0.09427556 - time (sec): 1.71 - samples/sec: 4789.47 - lr: 0.050000
2023-04-18 19:06:45,421 epoch 20 - iter 10/21 - loss 0.09106388 - time (sec): 2.07 - samples/sec: 4960.78 - lr: 0.050000
2023-04-18 19:06:45,765 epoch 20 - iter 12/21 - loss 0.09393708 - time (sec): 2.41 - samples/sec: 5018.71 - lr: 0.050000
2023-04-18 19:06:46,128 epoch 20 - iter 14/21 - loss 0.09504101 - time (sec): 2.77 - samples/sec: 5109.77 - lr: 0.050000
2023-04-18 19:06:46,499 epoch 20 - iter 16/21 - loss 0.09104479 - time (sec): 3.14 - samples/sec: 5066.89 - lr: 0.050000
2023-04-18 19:06:49,542 epoch 20 - i

100%|██████████| 5/5 [00:01<00:00,  3.87it/s]


2023-04-18 19:06:51,469 Evaluating as a multi-label problem: False
2023-04-18 19:06:51,504 DEV : loss 0.13961374759674072 - f1-score (micro avg)  0.187
2023-04-18 19:06:51,527 BAD EPOCHS (no improvement): 0
2023-04-18 19:06:51,532 saving best model
2023-04-18 19:06:53,407 ----------------------------------------------------------------------------------------------------
2023-04-18 19:06:54,106 epoch 21 - iter 2/21 - loss 0.09760012 - time (sec): 0.69 - samples/sec: 3305.40 - lr: 0.050000
2023-04-18 19:06:54,632 epoch 21 - iter 4/21 - loss 0.09468798 - time (sec): 1.22 - samples/sec: 3412.94 - lr: 0.050000
2023-04-18 19:06:55,189 epoch 21 - iter 6/21 - loss 0.09089201 - time (sec): 1.77 - samples/sec: 3449.13 - lr: 0.050000
2023-04-18 19:06:55,779 epoch 21 - iter 8/21 - loss 0.09145857 - time (sec): 2.36 - samples/sec: 3422.62 - lr: 0.050000
2023-04-18 19:06:56,353 epoch 21 - iter 10/21 - loss 0.08628935 - time (sec): 2.94 - samples/sec: 3430.93 - lr: 0.050000
2023-04-18 19:06:56,954 e

100%|██████████| 5/5 [00:01<00:00,  3.91it/s]


2023-04-18 19:07:00,669 Evaluating as a multi-label problem: False
2023-04-18 19:07:00,703 DEV : loss 0.13934674859046936 - f1-score (micro avg)  0.187
2023-04-18 19:07:00,725 BAD EPOCHS (no improvement): 0
2023-04-18 19:07:00,730 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:01,165 epoch 22 - iter 2/21 - loss 0.08538425 - time (sec): 0.43 - samples/sec: 4917.52 - lr: 0.050000
2023-04-18 19:07:01,605 epoch 22 - iter 4/21 - loss 0.07765370 - time (sec): 0.87 - samples/sec: 4875.34 - lr: 0.050000
2023-04-18 19:07:02,028 epoch 22 - iter 6/21 - loss 0.09153527 - time (sec): 1.30 - samples/sec: 4679.30 - lr: 0.050000
2023-04-18 19:07:02,493 epoch 22 - iter 8/21 - loss 0.08745464 - time (sec): 1.76 - samples/sec: 4508.51 - lr: 0.050000
2023-04-18 19:07:02,928 epoch 22 - iter 10/21 - loss 0.08780707 - time (sec): 2.20 - samples/sec: 4474.05 - lr: 0.050000
2023-04-18 19:07:03,406 epoch 22 - iter 12/21 - loss 0.08673405 - t

100%|██████████| 5/5 [00:01<00:00,  3.25it/s]

2023-04-18 19:07:07,099 Evaluating as a multi-label problem: False
2023-04-18 19:07:07,150 DEV : loss 0.13722354173660278 - f1-score (micro avg)  0.1826
2023-04-18 19:07:07,185 BAD EPOCHS (no improvement): 1
2023-04-18 19:07:07,191 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:07,737 epoch 23 - iter 2/21 - loss 0.08635531 - time (sec): 0.54 - samples/sec: 3549.57 - lr: 0.050000
2023-04-18 19:07:08,134 epoch 23 - iter 4/21 - loss 0.08967713 - time (sec): 0.94 - samples/sec: 3849.38 - lr: 0.050000
2023-04-18 19:07:08,624 epoch 23 - iter 6/21 - loss 0.08044074 - time (sec): 1.43 - samples/sec: 3980.40 - lr: 0.050000
2023-04-18 19:07:09,172 epoch 23 - iter 8/21 - loss 0.08034190 - time (sec): 1.98 - samples/sec: 3894.36 - lr: 0.050000
2023-04-18 19:07:09,846 epoch 23 - iter 10/21 - loss 0.07822779 - time (sec): 2.65 - samples/sec: 3664.20 - lr: 0.050000
2023-04-18 19:07:10,457 epoch 23 - iter 12/21 - loss 0.07943874 - time (sec): 3.26 - samples/sec: 3649.17 - lr: 0.050000
2023-04-18 19:07:11,025 epoch 23 - iter 14/21 - loss 0.07611893 - time (sec): 3.83 - samples/sec: 3655.36 - lr: 0.050000
2023-04-18 19:07:11,684 epoch 23 - iter 16/21 - loss 0.07916444 - time (sec): 4.49 - samples/sec: 3611.96 - lr: 0.050000
2023-04-18 19:07:12,255 epoch 23 - i

100%|██████████| 5/5 [00:01<00:00,  2.86it/s]

2023-04-18 19:07:14,945 Evaluating as a multi-label problem: False
2023-04-18 19:07:14,979 DEV : loss 0.15192443132400513 - f1-score (micro avg)  0.171
2023-04-18 19:07:15,003 BAD EPOCHS (no improvement): 2
2023-04-18 19:07:15,008 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:15,485 epoch 24 - iter 2/21 - loss 0.06564511 - time (sec): 0.47 - samples/sec: 4246.52 - lr: 0.050000
2023-04-18 19:07:15,865 epoch 24 - iter 4/21 - loss 0.06913184 - time (sec): 0.85 - samples/sec: 4551.20 - lr: 0.050000
2023-04-18 19:07:16,295 epoch 24 - iter 6/21 - loss 0.07636549 - time (sec): 1.28 - samples/sec: 4701.78 - lr: 0.050000
2023-04-18 19:07:16,683 epoch 24 - iter 8/21 - loss 0.08276522 - time (sec): 1.67 - samples/sec: 4734.41 - lr: 0.050000
2023-04-18 19:07:17,147 epoch 24 - iter 10/21 - loss 0.07799617 - time (sec): 2.13 - samples/sec: 4712.06 - lr: 0.050000
2023-04-18 19:07:17,497 epoch 24 - iter 12/21 - loss 0.07613645 - time (sec): 2.48 - samples/sec: 4810.72 - lr: 0.050000
2023-04-18 19:07:17,904 epoch 24 - iter 14/21 - loss 0.07757794 - time (sec): 2.89 - samples/sec: 4918.96 - lr: 0.050000
2023-04-18 19:07:18,300 epoch 24 - iter 16/21 - loss 0.07896314 - time (sec): 3.29 - samples/sec: 4897.38 - lr: 0.050000
2023-04-18 19:07:18,699 epoch 24 - i

100%|██████████| 5/5 [00:01<00:00,  3.90it/s]


2023-04-18 19:07:20,557 Evaluating as a multi-label problem: False
2023-04-18 19:07:20,590 DEV : loss 0.1418699324131012 - f1-score (micro avg)  0.1792
2023-04-18 19:07:20,614 BAD EPOCHS (no improvement): 3
2023-04-18 19:07:20,619 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:21,095 epoch 25 - iter 2/21 - loss 0.08516321 - time (sec): 0.47 - samples/sec: 4668.77 - lr: 0.050000
2023-04-18 19:07:21,505 epoch 25 - iter 4/21 - loss 0.08694728 - time (sec): 0.89 - samples/sec: 4532.77 - lr: 0.050000
2023-04-18 19:07:21,928 epoch 25 - iter 6/21 - loss 0.08612861 - time (sec): 1.31 - samples/sec: 4713.22 - lr: 0.050000
2023-04-18 19:07:22,334 epoch 25 - iter 8/21 - loss 0.08082706 - time (sec): 1.71 - samples/sec: 4858.07 - lr: 0.050000
2023-04-18 19:07:22,728 epoch 25 - iter 10/21 - loss 0.07906500 - time (sec): 2.11 - samples/sec: 4934.79 - lr: 0.050000
2023-04-18 19:07:23,140 epoch 25 - iter 12/21 - loss 0.07535149 - t

100%|██████████| 5/5 [00:01<00:00,  2.52it/s]

2023-04-18 19:07:27,049 Evaluating as a multi-label problem: False
2023-04-18 19:07:27,107 DEV : loss 0.13605733215808868 - f1-score (micro avg)  0.1855
2023-04-18 19:07:27,154 Epoch    25: reducing learning rate of group 0 to 2.5000e-02.
2023-04-18 19:07:27,161 BAD EPOCHS (no improvement): 4
2023-04-18 19:07:27,169 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:27,740 epoch 26 - iter 2/21 - loss 0.07083508 - time (sec): 0.57 - samples/sec: 3405.82 - lr: 0.025000
2023-04-18 19:07:28,311 epoch 26 - iter 4/21 - loss 0.06452409 - time (sec): 1.14 - samples/sec: 3333.63 - lr: 0.025000
2023-04-18 19:07:28,918 epoch 26 - iter 6/21 - loss 0.07153895 - time (sec): 1.75 - samples/sec: 3356.86 - lr: 0.025000
2023-04-18 19:07:29,468 epoch 26 - iter 8/21 - loss 0.07614104 - time (sec): 2.30 - samples/sec: 3405.94 - lr: 0.025000
2023-04-18 19:07:30,064 epoch 26 - iter 10/21 - loss 0.07518133 - time (sec): 2.89 - samples/sec: 3380.34 - lr: 0.025000
2023-04-18 19:07:30,507 epoch 26 - iter 12/21 - loss 0.07419702 - time (sec): 3.34 - samples/sec: 3516.85 - lr: 0.025000
2023-04-18 19:07:30,919 epoch 26 - iter 14/21 - loss 0.07449911 - time (sec): 3.75 - samples/sec: 3692.83 - lr: 0.025000
2023-04-18 19:07:31,325 epoch 26 - iter 16/21 - loss 0.07492005 - time (sec): 4.15 - samples/sec: 3843.85 - lr: 0.025000
2023-04-18 19:07:31,948 epoch 26 - i

100%|██████████| 5/5 [00:01<00:00,  3.78it/s]


2023-04-18 19:07:34,086 Evaluating as a multi-label problem: False
2023-04-18 19:07:34,119 DEV : loss 0.13190858066082 - f1-score (micro avg)  0.1893
2023-04-18 19:07:34,142 BAD EPOCHS (no improvement): 0
2023-04-18 19:07:34,147 saving best model
2023-04-18 19:07:36,032 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:36,554 epoch 27 - iter 2/21 - loss 0.08157466 - time (sec): 0.50 - samples/sec: 4202.97 - lr: 0.025000
2023-04-18 19:07:36,975 epoch 27 - iter 4/21 - loss 0.07351114 - time (sec): 0.92 - samples/sec: 4415.07 - lr: 0.025000
2023-04-18 19:07:37,385 epoch 27 - iter 6/21 - loss 0.07469096 - time (sec): 1.33 - samples/sec: 4512.15 - lr: 0.025000
2023-04-18 19:07:37,800 epoch 27 - iter 8/21 - loss 0.06890802 - time (sec): 1.75 - samples/sec: 4735.80 - lr: 0.025000
2023-04-18 19:07:38,201 epoch 27 - iter 10/21 - loss 0.06879211 - time (sec): 2.15 - samples/sec: 4770.41 - lr: 0.025000
2023-04-18 19:07:38,601 epo

100%|██████████| 5/5 [00:02<00:00,  2.41it/s]

2023-04-18 19:07:42,539 Evaluating as a multi-label problem: False
2023-04-18 19:07:42,630 DEV : loss 0.13099077343940735 - f1-score (micro avg)  0.1904
2023-04-18 19:07:42,676 BAD EPOCHS (no improvement): 0
2023-04-18 19:07:42,685 saving best model
2023-04-18 19:07:45,244 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:45,955 epoch 28 - iter 2/21 - loss 0.05470464 - time (sec): 0.71 - samples/sec: 2896.29 - lr: 0.025000
2023-04-18 19:07:46,409 epoch 28 - iter 4/21 - loss 0.06249216 - time (sec): 1.16 - samples/sec: 3323.48 - lr: 0.025000
2023-04-18 19:07:46,811 epoch 28 - iter 6/21 - loss 0.07072904 - time (sec): 1.56 - samples/sec: 3729.60 - lr: 0.025000
2023-04-18 19:07:47,251 epoch 28 - iter 8/21 - loss 0.07143631 - time (sec): 2.00 - samples/sec: 4104.44 - lr: 0.025000
2023-04-18 19:07:47,625 epoch 28 - iter 10/21 - loss 0.06819030 - time (sec): 2.38 - samples/sec: 4290.24 - lr: 0.025000
2023-04-18 19:07:48,030 epoch 28 - iter 12/21 - loss 0.06940704 - time (sec): 2.78 - samples/sec: 4383.39 - lr: 0.025000
2023-04-18 19:07:48,471 epoch 28 - iter 14/21 - loss 0.07109807 - time (sec): 3.22 - samples/sec: 4448.97 - lr: 0.025000
2023-04-18 19:07:48,876 epoch 28

100%|██████████| 5/5 [00:01<00:00,  3.98it/s]

2023-04-18 19:07:51,084 Evaluating as a multi-label problem: False
2023-04-18 19:07:51,121 DEV : loss 0.13237814605236053 - f1-score (micro avg)  0.1874
2023-04-18 19:07:51,147 BAD EPOCHS (no improvement): 1
2023-04-18 19:07:51,153 ----------------------------------------------------------------------------------------------------
2023-04-18 19:07:51,584 epoch 29 - iter 2/21 - loss 0.05732999 - time (sec): 0.43 - samples/sec: 4838.60 - lr: 0.025000
2023-04-18 19:07:51,986 epoch 29 - iter 4/21 - loss 0.05266501 - time (sec): 0.83 - samples/sec: 4910.82 - lr: 0.025000
2023-04-18 19:07:52,532 epoch 29 - iter 6/21 - loss 0.06313931 - time (sec): 1.38 - samples/sec: 4571.24 - lr: 0.025000
2023-04-18 19:07:52,992 epoch 29 - iter 8/21 - loss 0.06983929 - time (sec): 1.84 - samples/sec: 4513.65 - lr: 0.025000
2023-04-18 19:07:53,388 epoch 29 - iter 10/21 - loss 0.07026530 - time (sec): 2.23 - samples/sec: 4529.79 - lr: 0.025000
2023-04-18 19:07:53,869 epoch 29 - iter 12/21 - loss 0.06944674 - time (sec): 2.71 - samples/sec: 4477.35 - lr: 0.025000
2023-04-18 19:07:54,347 epoch 29 - iter 14/21 - loss 0.06886577 - time (sec): 3.19 - samples/sec: 4389.77 - lr: 0.025000
2023-04-18 19:07:54,823 epoch 29 - iter 16/21 - loss 0.06984280 - time (sec): 3.67 - samples/sec: 4412.48 - lr: 0.025000
2023-04-18 19:07:55,346 epoch 29 - i

100%|██████████| 5/5 [00:04<00:00,  1.19it/s]

2023-04-18 19:08:00,383 Evaluating as a multi-label problem: False
2023-04-18 19:08:00,562 DEV : loss 0.13316360116004944 - f1-score (micro avg)  0.1893
2023-04-18 19:08:00,657 BAD EPOCHS (no improvement): 2
2023-04-18 19:08:00,684 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:01,789 epoch 30 - iter 2/21 - loss 0.06304785 - time (sec): 1.10 - samples/sec: 1855.08 - lr: 0.025000
2023-04-18 19:08:02,691 epoch 30 - iter 4/21 - loss 0.06762183 - time (sec): 2.00 - samples/sec: 1928.61 - lr: 0.025000
2023-04-18 19:08:03,296 epoch 30 - iter 6/21 - loss 0.06870079 - time (sec): 2.61 - samples/sec: 2277.76 - lr: 0.025000
2023-04-18 19:08:03,800 epoch 30 - iter 8/21 - loss 0.06619327 - time (sec): 3.11 - samples/sec: 2505.65 - lr: 0.025000
2023-04-18 19:08:04,407 epoch 30 - iter 10/21 - loss 0.06550409 - time (sec): 3.72 - samples/sec: 2640.84 - lr: 0.025000
2023-04-18 19:08:05,041 epoch 30 - iter 12/21 - loss 0.06649343 - time (sec): 4.35 - samples/sec: 2800.50 - lr: 0.025000
2023-04-18 1

100%|██████████| 5/5 [00:01<00:00,  3.73it/s]

2023-04-18 19:08:09,563 Evaluating as a multi-label problem: False
2023-04-18 19:08:09,596 DEV : loss 0.13544374704360962 - f1-score (micro avg)  0.1896
2023-04-18 19:08:09,620 BAD EPOCHS (no improvement): 3
2023-04-18 19:08:09,625 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:10,061 epoch 31 - iter 2/21 - loss 0.05992782 - time (sec): 0.43 - samples/sec: 4787.84 - lr: 0.025000
2023-04-18 19:08:10,501 epoch 31 - iter 4/21 - loss 0.06033467 - time (sec): 0.87 - samples/sec: 4806.98 - lr: 0.025000
2023-04-18 19:08:10,991 epoch 31 - iter 6/21 - loss 0.06123703 - time (sec): 1.36 - samples/sec: 4669.07 - lr: 0.025000
2023-04-18 19:08:11,402 epoch 31 - iter 8/21 - loss 0.06742112 - time (sec): 1.77 - samples/sec: 4697.02 - lr: 0.025000
2023-04-18 19:08:11,802 epoch 31 - iter 10/21 - loss 0.07119233 - time (sec): 2.17 - samples/sec: 4843.57 - lr: 0.025000
2023-04-18 19:08:12,167 epoch 31 - iter 12/21 - loss 0.06886989 - time (sec): 2.54 - samples/sec: 4869.12 - lr: 0.025000
2023-04-18 19:08:12,587 epoch 31 - iter 14/21 - loss 0.06911322 - time (sec): 2.96 - samples/sec: 4852.47 - lr: 0.025000
2023-04-18 19:08:13,076 epoch 31 - iter 16/21 - loss 0.06668856 - time (sec): 3.45 - samples/sec: 4741.06 - lr: 0.025000
2023-04-18 19:08:13,558 epoch 31 - i

100%|██████████| 5/5 [00:01<00:00,  2.50it/s]

2023-04-18 19:08:16,320 Evaluating as a multi-label problem: False
2023-04-18 19:08:16,373 DEV : loss 0.13620418310165405 - f1-score (micro avg)  0.1878
2023-04-18 19:08:16,418 Epoch    31: reducing learning rate of group 0 to 1.2500e-02.
2023-04-18 19:08:16,424 BAD EPOCHS (no improvement): 4
2023-04-18 19:08:16,428 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:17,033 epoch 32 - iter 2/21 - loss 0.05882163 - time (sec): 0.60 - samples/sec: 3263.67 - lr: 0.012500
2023-04-18 19:08:17,643 epoch 32 - iter 4/21 - loss 0.06386118 - time (sec): 1.21 - samples/sec: 3271.92 - lr: 0.012500
2023-04-18 19:08:18,157 epoch 32 - iter 6/21 - loss 0.07145858 - time (sec): 1.73 - samples/sec: 3585.07 - lr: 0.012500
2023-04-18 19:08:18,810 epoch 32 - iter 8/21 - loss 0.06748273 - time (sec): 2.38 - samples/sec: 3400.96 - lr: 0.012500
2023-04-18 19:08:19,396 epoch 32 - iter 10/21 - loss 0.06907194 - time (sec): 2.97 - samples/sec: 3407.76 - lr: 0.012500
2023-04-18 19:08:20,030 epoch 32 - iter 12/21 - loss 0.06988276 - time (sec): 3.60 - samples/sec: 3392.28 - lr: 0.012500
2023-04-18 19:08:20,638 epoch 32 - iter 14/21 - loss 0.07099381 - time (sec): 4.21 - samples/sec: 3384.25 - lr: 0.012500
2023-04-18 19:08:21,163 epoch 32 - iter 16/21 - loss 0.07012097 - time (sec): 4.73 - samples/sec: 3408.04 - lr: 0.012500
2023-04-18 19:08:21,842 epoch 32 - i

100%|██████████| 5/5 [00:01<00:00,  2.51it/s]

2023-04-18 19:08:24,687 Evaluating as a multi-label problem: False
2023-04-18 19:08:24,743 DEV : loss 0.13788273930549622 - f1-score (micro avg)  0.1874
2023-04-18 19:08:24,788 BAD EPOCHS (no improvement): 1
2023-04-18 19:08:24,798 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:25,420 epoch 33 - iter 2/21 - loss 0.05568042 - time (sec): 0.62 - samples/sec: 3391.83 - lr: 0.012500
2023-04-18 19:08:25,974 epoch 33 - iter 4/21 - loss 0.05576589 - time (sec): 1.17 - samples/sec: 3125.42 - lr: 0.012500
2023-04-18 19:08:26,622 epoch 33 - iter 6/21 - loss 0.05946252 - time (sec): 1.82 - samples/sec: 3250.20 - lr: 0.012500
2023-04-18 19:08:27,257 epoch 33 - iter 8/21 - loss 0.05454367 - time (sec): 2.45 - samples/sec: 3314.15 - lr: 0.012500
2023-04-18 19:08:27,797 epoch 33 - iter 10/21 - loss 0.05754731 - time (sec): 2.99 - samples/sec: 3245.40 - lr: 0.012500
2023-04-18 19:08:28,377 epoch 33 - iter 12/21 - loss 0.05923050 - time (sec): 3.57 - samples/sec: 3302.44 - lr: 0.012500
2023-04-18 19:08:29,046 epoch 33 - iter 14/21 - loss 0.05840698 - time (sec): 4.24 - samples/sec: 3330.71 - lr: 0.012500
2023-04-18 19:08:29,579 epoch 33 - iter 16/21 - loss 0.06281644 - time (sec): 4.77 - samples/sec: 3362.71 - lr: 0.012500
2023-04-18 19:08:30,146 epoch 33 - i

100%|██████████| 5/5 [00:03<00:00,  1.65it/s]

2023-04-18 19:08:34,023 Evaluating as a multi-label problem: False
2023-04-18 19:08:34,080 DEV : loss 0.13477382063865662 - f1-score (micro avg)  0.1859
2023-04-18 19:08:34,145 BAD EPOCHS (no improvement): 2
2023-04-18 19:08:34,160 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:34,820 epoch 34 - iter 2/21 - loss 0.05858094 - time (sec): 0.66 - samples/sec: 2904.08 - lr: 0.012500
2023-04-18 19:08:35,507 epoch 34 - iter 4/21 - loss 0.06236962 - time (sec): 1.34 - samples/sec: 2844.11 - lr: 0.012500
2023-04-18 19:08:36,093 epoch 34 - iter 6/21 - loss 0.05978187 - time (sec): 1.93 - samples/sec: 3047.61 - lr: 0.012500
2023-04-18 19:08:36,717 epoch 34 - iter 8/21 - loss 0.06326810 - time (sec): 2.55 - samples/sec: 3175.25 - lr: 0.012500
2023-04-18 19:08:37,326 epoch 34 - iter 10/21 - loss 0.06530290 - time (sec): 3.16 - samples/sec: 3225.06 - lr: 0.012500
2023-04-18 19:08:37,985 epoch 34 - iter 12/21 - loss 0.06753367 - time (sec): 3.82 - samples/sec: 3207.66 - lr: 0.012500
2023-04-18 19:08:38,565 epoch 34 - iter 14/21 - loss 0.06758409 - time (sec): 4.40 - samples/sec: 3255.66 - lr: 0.012500
2023-04-18 19:08:39,112 epoch 34 - iter 16/21 - loss 0.06612026 - time (sec): 4.95 - samples/sec: 3335.15 - lr: 0.012500
2023-04-18 19:08:39,514 epoch 34 - i

100%|██████████| 5/5 [00:01<00:00,  3.74it/s]

2023-04-18 19:08:41,456 Evaluating as a multi-label problem: False
2023-04-18 19:08:41,490 DEV : loss 0.1353333741426468 - f1-score (micro avg)  0.193
2023-04-18 19:08:41,522 BAD EPOCHS (no improvement): 0
2023-04-18 19:08:41,528 saving best model
2023-04-18 19:08:43,703 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:44,748 epoch 35 - iter 2/21 - loss 0.06087575 - time (sec): 1.04 - samples/sec: 1867.63 - lr: 0.012500
2023-04-18 19:08:45,933 epoch 35 - iter 4/21 - loss 0.06097286 - time (sec): 2.22 - samples/sec: 1826.68 - lr: 0.012500
2023-04-18 19:08:46,861 epoch 35 - iter 6/21 - loss 0.05829249 - time (sec): 3.15 - samples/sec: 1903.44 - lr: 0.012500
2023-04-18 19:08:47,856 epoch 35 - iter 8/21 - loss 0.05704183 - time (sec): 4.15 - samples/sec: 1919.52 - lr: 0.012500
2023-04-18 19:08:49,052 epoch 35 - iter 10/21 - loss 0.05979413 - time (sec): 5.34 - samples/sec: 1877.94 - lr: 0.012500
2023-04-18 19:08:49,976 epoch 35 - iter 12/21 - loss 0.06351486 - time (sec): 6.27 - samples/sec: 1912.78 - lr: 0.012500
2023-04-18 19:08:50,789 epoch 35 - iter 14/21 - loss 0.06654803 - time (sec): 7.08 - samples/sec: 1996.65 - lr: 0.012500
2023-04-18 19:08:51,422 epoch 35

100%|██████████| 5/5 [00:02<00:00,  2.07it/s]

2023-04-18 19:08:55,500 Evaluating as a multi-label problem: False
2023-04-18 19:08:55,565 DEV : loss 0.13693246245384216 - f1-score (micro avg)  0.19
2023-04-18 19:08:55,608 BAD EPOCHS (no improvement): 1
2023-04-18 19:08:55,617 ----------------------------------------------------------------------------------------------------
2023-04-18 19:08:56,399 epoch 36 - iter 2/21 - loss 0.09463889 - time (sec): 0.77 - samples/sec: 2184.65 - lr: 0.012500
2023-04-18 19:08:56,961 epoch 36 - iter 4/21 - loss 0.08510583 - time (sec): 1.34 - samples/sec: 2608.50 - lr: 0.012500
2023-04-18 19:08:57,460 epoch 36 - iter 6/21 - loss 0.07333120 - time (sec): 1.84 - samples/sec: 3008.33 - lr: 0.012500
2023-04-18 19:08:57,973 epoch 36 - iter 8/21 - loss 0.06690047 - time (sec): 2.35 - samples/sec: 3221.91 - lr: 0.012500
2023-04-18 19:08:58,397 epoch 36 - iter 10/21 - loss 0.06726229 - time (sec): 2.77 - samples/sec: 3469.69 - lr: 0.012500
2023-04-18 19:08:58,793 epoch 36 - iter 12/21 - loss 0.06834066 - time (sec): 3.17 - samples/sec: 3597.04 - lr: 0.012500
2023-04-18 19:08:59,245 epoch 36 - iter 14/21 - loss 0.06956385 - time (sec): 3.62 - samples/sec: 3766.47 - lr: 0.012500
2023-04-18 19:08:59,718 epoch 36 - iter 16/21 - loss 0.06956186 - time (sec): 4.09 - samples/sec: 3924.34 - lr: 0.012500
2023-04-18 19:09:00,147 epoch 36 - i

100%|██████████| 5/5 [00:01<00:00,  2.59it/s]


2023-04-18 19:09:04,448 Evaluating as a multi-label problem: False
2023-04-18 19:09:04,507 DEV : loss 0.13829481601715088 - f1-score (micro avg)  0.1882
2023-04-18 19:09:04,551 BAD EPOCHS (no improvement): 2
2023-04-18 19:09:04,561 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:05,235 epoch 37 - iter 2/21 - loss 0.08078774 - time (sec): 0.67 - samples/sec: 3009.16 - lr: 0.012500
2023-04-18 19:09:05,781 epoch 37 - iter 4/21 - loss 0.07083391 - time (sec): 1.22 - samples/sec: 3363.59 - lr: 0.012500
2023-04-18 19:09:06,195 epoch 37 - iter 6/21 - loss 0.06751136 - time (sec): 1.63 - samples/sec: 3726.68 - lr: 0.012500
2023-04-18 19:09:06,647 epoch 37 - iter 8/21 - loss 0.06486511 - time (sec): 2.08 - samples/sec: 3939.25 - lr: 0.012500
2023-04-18 19:09:07,040 epoch 37 - iter 10/21 - loss 0.06436476 - time (sec): 2.48 - samples/sec: 4090.99 - lr: 0.012500
2023-04-18 19:09:07,526 epoch 37 - iter 12/21 - loss 0.06378047 - 

100%|██████████| 5/5 [00:01<00:00,  3.93it/s]


2023-04-18 19:09:10,505 Evaluating as a multi-label problem: False
2023-04-18 19:09:10,539 DEV : loss 0.13633853197097778 - f1-score (micro avg)  0.1904
2023-04-18 19:09:10,564 BAD EPOCHS (no improvement): 3
2023-04-18 19:09:10,570 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:10,986 epoch 38 - iter 2/21 - loss 0.05327627 - time (sec): 0.41 - samples/sec: 4927.07 - lr: 0.012500
2023-04-18 19:09:11,428 epoch 38 - iter 4/21 - loss 0.05600410 - time (sec): 0.85 - samples/sec: 4589.61 - lr: 0.012500
2023-04-18 19:09:11,848 epoch 38 - iter 6/21 - loss 0.06142990 - time (sec): 1.27 - samples/sec: 4702.59 - lr: 0.012500
2023-04-18 19:09:12,271 epoch 38 - iter 8/21 - loss 0.06743795 - time (sec): 1.70 - samples/sec: 4903.19 - lr: 0.012500
2023-04-18 19:09:12,638 epoch 38 - iter 10/21 - loss 0.06527019 - time (sec): 2.06 - samples/sec: 4854.25 - lr: 0.012500
2023-04-18 19:09:13,014 epoch 38 - iter 12/21 - loss 0.06708839 - 

100%|██████████| 5/5 [00:01<00:00,  3.22it/s]

2023-04-18 19:09:16,344 Evaluating as a multi-label problem: False
2023-04-18 19:09:16,399 DEV : loss 0.1407056301832199 - f1-score (micro avg)  0.1911
2023-04-18 19:09:16,444 Epoch    38: reducing learning rate of group 0 to 6.2500e-03.
2023-04-18 19:09:16,450 BAD EPOCHS (no improvement): 4
2023-04-18 19:09:16,454 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:16,987 epoch 39 - iter 2/21 - loss 0.07017260 - time (sec): 0.53 - samples/sec: 3391.47 - lr: 0.006250
2023-04-18 19:09:17,585 epoch 39 - iter 4/21 - loss 0.07758742 - time (sec): 1.13 - samples/sec: 3394.98 - lr: 0.006250
2023-04-18 19:09:18,110 epoch 39 - iter 6/21 - loss 0.07054878 - time (sec): 1.65 - samples/sec: 3549.90 - lr: 0.006250
2023-04-18 19:09:18,640 epoch 39 - iter 8/21 - loss 0.06919701 - time (sec): 2.18 - samples/sec: 3514.75 - lr: 0.006250
2023-04-18 19:09:19,222 epoch 39 - iter 10/21 - loss 0.06665778 - time (sec): 2.77 - samples/sec: 3568.60 - lr: 0.006250
2023-04-18 19:09:19,846 epoch 39 - iter 12/21 - loss 0.06605375 - time (sec): 3.39 - samples/sec: 3522.77 - lr: 0.006250
2023-04-18 19:09:20,595 epoch 39 - iter 14/21 - loss 0.06499671 - time (sec): 4.14 - samples/sec: 3434.04 - lr: 0.006250
2023-04-18 19:09:21,062 epoch 39 - iter 16/21 - loss 0.06458995 - time (sec): 4.61 - samples/sec: 3477.79 - lr: 0.006250
2023-04-18 19:09:21,497 epoch 39 - i

100%|██████████| 5/5 [00:01<00:00,  3.89it/s]


2023-04-18 19:09:23,440 Evaluating as a multi-label problem: False
2023-04-18 19:09:23,480 DEV : loss 0.1377653181552887 - f1-score (micro avg)  0.19
2023-04-18 19:09:23,507 BAD EPOCHS (no improvement): 1
2023-04-18 19:09:23,513 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:23,945 epoch 40 - iter 2/21 - loss 0.05496888 - time (sec): 0.43 - samples/sec: 4393.88 - lr: 0.006250
2023-04-18 19:09:24,378 epoch 40 - iter 4/21 - loss 0.06527235 - time (sec): 0.86 - samples/sec: 4905.15 - lr: 0.006250
2023-04-18 19:09:24,751 epoch 40 - iter 6/21 - loss 0.05761363 - time (sec): 1.24 - samples/sec: 4879.85 - lr: 0.006250
2023-04-18 19:09:25,144 epoch 40 - iter 8/21 - loss 0.05772867 - time (sec): 1.63 - samples/sec: 5059.39 - lr: 0.006250
2023-04-18 19:09:25,509 epoch 40 - iter 10/21 - loss 0.05840334 - time (sec): 1.99 - samples/sec: 5097.18 - lr: 0.006250
2023-04-18 19:09:25,882 epoch 40 - iter 12/21 - loss 0.05934638 - tim

100%|██████████| 5/5 [00:01<00:00,  3.89it/s]

2023-04-18 19:09:28,985 Evaluating as a multi-label problem: False
2023-04-18 19:09:29,019 DEV : loss 0.13802160322666168 - f1-score (micro avg)  0.193
2023-04-18 19:09:29,042 BAD EPOCHS (no improvement): 2
2023-04-18 19:09:29,047 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:29,482 epoch 41 - iter 2/21 - loss 0.04444286 - time (sec): 0.43 - samples/sec: 5148.13 - lr: 0.006250
2023-04-18 19:09:29,948 epoch 41 - iter 4/21 - loss 0.06284822 - time (sec): 0.90 - samples/sec: 4945.75 - lr: 0.006250
2023-04-18 19:09:30,335 epoch 41 - iter 6/21 - loss 0.06627426 - time (sec): 1.28 - samples/sec: 4815.22 - lr: 0.006250
2023-04-18 19:09:30,782 epoch 41 - iter 8/21 - loss 0.06178607 - time (sec): 1.73 - samples/sec: 4818.87 - lr: 0.006250
2023-04-18 19:09:31,345 epoch 41 - iter 10/21 - loss 0.06417340 - time (sec): 2.29 - samples/sec: 4537.65 - lr: 0.006250
2023-04-18 19:09:31,921 epoch 41 - iter 12/21 - loss 0.06582850 - time (sec): 2.87 - samples/sec: 4379.01 - lr: 0.006250
2023-04-18 19:09:32,452 epoch 41 - iter 14/21 - loss 0.06427934 - time (sec): 3.40 - samples/sec: 4317.30 - lr: 0.006250
2023-04-18 19:09:33,009 epoch 41 - iter 16/21 - loss 0.06190015 - time (sec): 3.96 - samples/sec: 4217.66 - lr: 0.006250
2023-04-18 19:09:33,522 epoch 41 - i

100%|██████████| 5/5 [00:02<00:00,  2.41it/s]

2023-04-18 19:09:36,414 Evaluating as a multi-label problem: False
2023-04-18 19:09:36,448 DEV : loss 0.13765326142311096 - f1-score (micro avg)  0.1889
2023-04-18 19:09:36,471 BAD EPOCHS (no improvement): 3
2023-04-18 19:09:36,479 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:36,964 epoch 42 - iter 2/21 - loss 0.05883834 - time (sec): 0.48 - samples/sec: 4510.98 - lr: 0.006250
2023-04-18 19:09:37,375 epoch 42 - iter 4/21 - loss 0.05966785 - time (sec): 0.89 - samples/sec: 4545.77 - lr: 0.006250
2023-04-18 19:09:37,759 epoch 42 - iter 6/21 - loss 0.05511524 - time (sec): 1.28 - samples/sec: 4724.85 - lr: 0.006250
2023-04-18 19:09:38,155 epoch 42 - iter 8/21 - loss 0.05876061 - time (sec): 1.68 - samples/sec: 4765.81 - lr: 0.006250
2023-04-18 19:09:38,568 epoch 42 - iter 10/21 - loss 0.05949513 - time (sec): 2.09 - samples/sec: 4846.98 - lr: 0.006250
2023-04-18 19:09:38,955 epoch 42 - iter 12/21 - loss 0.06009685 - time (sec): 2.47 - samples/sec: 4842.61 - lr: 0.006250
2023-04-18 19:09:39,346 epoch 42 - iter 14/21 - loss 0.05967133 - time (sec): 2.87 - samples/sec: 4853.73 - lr: 0.006250
2023-04-18 19:09:39,771 epoch 42 - iter 16/21 - loss 0.06438797 - time (sec): 3.29 - samples/sec: 4841.11 - lr: 0.006250
2023-04-18 19:09:40,157 epoch 42 - i

100%|██████████| 5/5 [00:01<00:00,  3.75it/s]

2023-04-18 19:09:42,153 Evaluating as a multi-label problem: False
2023-04-18 19:09:42,187 DEV : loss 0.13851115107536316 - f1-score (micro avg)  0.1911
2023-04-18 19:09:42,212 Epoch    42: reducing learning rate of group 0 to 3.1250e-03.
2023-04-18 19:09:42,213 BAD EPOCHS (no improvement): 4
2023-04-18 19:09:42,220 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:42,649 epoch 43 - iter 2/21 - loss 0.07652895 - time (sec): 0.43 - samples/sec: 4674.48 - lr: 0.003125
2023-04-18 19:09:43,070 epoch 43 - iter 4/21 - loss 0.06770600 - time (sec): 0.85 - samples/sec: 4686.03 - lr: 0.003125
2023-04-18 19:09:43,498 epoch 43 - iter 6/21 - loss 0.06333184 - time (sec): 1.28 - samples/sec: 4668.45 - lr: 0.003125
2023-04-18 19:09:43,922 epoch 43 - iter 8/21 - loss 0.06599364 - time (sec): 1.70 - samples/sec: 4859.84 - lr: 0.003125
2023-04-18 19:09:44,318 epoch 43 - iter 10/21 - loss 0.06555970 - time (sec): 2.10 - samples/sec: 4827.56 - lr: 0.003125
2023-04-18 19:09:44,797 epoch 43 - iter 12/21 - loss 0.06640495 - time (sec): 2.57 - samples/sec: 4824.89 - lr: 0.003125
2023-04-18 19:09:45,189 epoch 43 - iter 14/21 - loss 0.06583982 - time (sec): 2.97 - samples/sec: 4863.00 - lr: 0.003125
2023-04-18 19:09:45,561 epoch 43 - iter 16/21 - loss 0.06428669 - time (sec): 3.34 - samples/sec: 4879.67 - lr: 0.003125
2023-04-18 19:09:45,945 epoch 43 - i

100%|██████████| 5/5 [00:01<00:00,  2.61it/s]

2023-04-18 19:09:48,527 Evaluating as a multi-label problem: False
2023-04-18 19:09:48,589 DEV : loss 0.13714449107646942 - f1-score (micro avg)  0.1904
2023-04-18 19:09:48,635 BAD EPOCHS (no improvement): 1
2023-04-18 19:09:48,645 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:49,176 epoch 44 - iter 2/21 - loss 0.05090388 - time (sec): 0.53 - samples/sec: 3322.86 - lr: 0.003125
2023-04-18 19:09:49,749 epoch 44 - iter 4/21 - loss 0.05906730 - time (sec): 1.10 - samples/sec: 3370.00 - lr: 0.003125
2023-04-18 19:09:50,394 epoch 44 - iter 6/21 - loss 0.06768605 - time (sec): 1.75 - samples/sec: 3341.26 - lr: 0.003125
2023-04-18 19:09:50,993 epoch 44 - iter 8/21 - loss 0.06584609 - time (sec): 2.35 - samples/sec: 3347.41 - lr: 0.003125
2023-04-18 19:09:51,670 epoch 44 - iter 10/21 - loss 0.06656935 - time (sec): 3.02 - samples/sec: 3303.09 - lr: 0.003125
2023-04-18 19:09:52,191 epoch 44 - iter 12/21 - loss 0.06528729 - time (sec): 3.54 - samples/sec: 3438.14 - lr: 0.003125
2023-04-18 19:09:52,649 epoch 44 - iter 14/21 - loss 0.06304491 - time (sec): 4.00 - samples/sec: 3584.80 - lr: 0.003125
2023-04-18 19:09:53,052 epoch 44 - iter 16/21 - loss 0.06465381 - time (sec): 4.41 - samples/sec: 3700.06 - lr: 0.003125
2023-04-18 19:09:53,511 epoch 44 - i

100%|██████████| 5/5 [00:01<00:00,  3.65it/s]

2023-04-18 19:09:55,779 Evaluating as a multi-label problem: False
2023-04-18 19:09:55,814 DEV : loss 0.13726116716861725 - f1-score (micro avg)  0.1904
2023-04-18 19:09:55,842 BAD EPOCHS (no improvement): 2
2023-04-18 19:09:55,848 ----------------------------------------------------------------------------------------------------
2023-04-18 19:09:56,288 epoch 45 - iter 2/21 - loss 0.06630815 - time (sec): 0.44 - samples/sec: 4560.62 - lr: 0.003125
2023-04-18 19:09:56,678 epoch 45 - iter 4/21 - loss 0.05798296 - time (sec): 0.83 - samples/sec: 4388.20 - lr: 0.003125
2023-04-18 19:09:57,105 epoch 45 - iter 6/21 - loss 0.06086333 - time (sec): 1.26 - samples/sec: 4603.48 - lr: 0.003125
2023-04-18 19:09:57,557 epoch 45 - iter 8/21 - loss 0.06297152 - time (sec): 1.71 - samples/sec: 4553.71 - lr: 0.003125
2023-04-18 19:09:57,927 epoch 45 - iter 10/21 - loss 0.06316249 - time (sec): 2.08 - samples/sec: 4652.88 - lr: 0.003125
2023-04-18 19:09:58,389 epoch 45 - iter 12/21 - loss 0.06361830 - time (sec): 2.54 - samples/sec: 4648.85 - lr: 0.003125
2023-04-18 19:09:58,768 epoch 45 - iter 14/21 - loss 0.06405455 - time (sec): 2.92 - samples/sec: 4667.40 - lr: 0.003125
2023-04-18 19:09:59,168 epoch 45 - iter 16/21 - loss 0.06217668 - time (sec): 3.32 - samples/sec: 4746.01 - lr: 0.003125
2023-04-18 19:09:59,596 epoch 45 - i

100%|██████████| 5/5 [00:01<00:00,  3.85it/s]

2023-04-18 19:10:01,481 Evaluating as a multi-label problem: False
2023-04-18 19:10:01,519 DEV : loss 0.13733716309070587 - f1-score (micro avg)  0.1904
2023-04-18 19:10:01,545 BAD EPOCHS (no improvement): 3
2023-04-18 19:10:01,550 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:02,003 epoch 46 - iter 2/21 - loss 0.05202655 - time (sec): 0.45 - samples/sec: 4325.79 - lr: 0.003125
2023-04-18 19:10:02,598 epoch 46 - iter 4/21 - loss 0.06265441 - time (sec): 1.05 - samples/sec: 3891.53 - lr: 0.003125
2023-04-18 19:10:03,148 epoch 46 - iter 6/21 - loss 0.06036495 - time (sec): 1.60 - samples/sec: 3865.01 - lr: 0.003125
2023-04-18 19:10:03,728 epoch 46 - iter 8/21 - loss 0.05962838 - time (sec): 2.18 - samples/sec: 3770.18 - lr: 0.003125
2023-04-18 19:10:04,298 epoch 46 - iter 10/21 - loss 0.06088200 - time (sec): 2.75 - samples/sec: 3737.30 - lr: 0.003125
2023-04-18 19:10:04,867 epoch 46 - iter 12/21 - loss 0.06074638 - time (sec): 3.32 - samples/sec: 3679.05 - lr: 0.003125
2023-04-18 19:10:05,415 epoch 46 - iter 14/21 - loss 0.06203798 - time (sec): 3.86 - samples/sec: 3690.88 - lr: 0.003125
2023-04-18 19:10:05,982 epoch 46 - iter 16/21 - loss 0.06379159 - time (sec): 4.43 - samples/sec: 3666.14 - lr: 0.003125
2023-04-18 19:10:06,551 epoch 46 - i

100%|██████████| 5/5 [00:01<00:00,  3.81it/s]

2023-04-18 19:10:08,763 Evaluating as a multi-label problem: False
2023-04-18 19:10:08,802 DEV : loss 0.13694538176059723 - f1-score (micro avg)  0.1889
2023-04-18 19:10:08,827 Epoch    46: reducing learning rate of group 0 to 1.5625e-03.
2023-04-18 19:10:08,830 BAD EPOCHS (no improvement): 4
2023-04-18 19:10:08,837 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:09,285 epoch 47 - iter 2/21 - loss 0.05569834 - time (sec): 0.45 - samples/sec: 4605.51 - lr: 0.001563
2023-04-18 19:10:09,704 epoch 47 - iter 4/21 - loss 0.06466276 - time (sec): 0.86 - samples/sec: 4603.51 - lr: 0.001563
2023-04-18 19:10:10,071 epoch 47 - iter 6/21 - loss 0.06174817 - time (sec): 1.23 - samples/sec: 4711.71 - lr: 0.001563
2023-04-18 19:10:10,500 epoch 47 - iter 8/21 - loss 0.05973404 - time (sec): 1.66 - samples/sec: 4854.76 - lr: 0.001563
2023-04-18 19:10:10,904 epoch 47 - iter 10/21 - loss 0.05894313 - time (sec): 2.06 - samples/sec: 5019.57 - lr: 0.001563
2023-04-18 19:10:11,312 epoch 47 - iter 12/21 - loss 0.05799440 - time (sec): 2.47 - samples/sec: 5054.97 - lr: 0.001563
2023-04-18 19:10:11,694 epoch 47 - iter 14/21 - loss 0.05855957 - time (sec): 2.85 - samples/sec: 5078.78 - lr: 0.001563
2023-04-18 19:10:12,093 epoch 47 - iter 16/21 - loss 0.06021288 - time (sec): 3.25 - samples/sec: 5050.91 - lr: 0.001563
2023-04-18 19:10:12,535 epoch 47 - i

100%|██████████| 5/5 [00:01<00:00,  3.74it/s]

2023-04-18 19:10:14,472 Evaluating as a multi-label problem: False
2023-04-18 19:10:14,510 DEV : loss 0.1373702883720398 - f1-score (micro avg)  0.1908
2023-04-18 19:10:14,535 BAD EPOCHS (no improvement): 1
2023-04-18 19:10:14,541 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:14,997 epoch 48 - iter 2/21 - loss 0.06855756 - time (sec): 0.45 - samples/sec: 4486.95 - lr: 0.001563
2023-04-18 19:10:15,395 epoch 48 - iter 4/21 - loss 0.06549395 - time (sec): 0.85 - samples/sec: 4743.75 - lr: 0.001563
2023-04-18 19:10:15,846 epoch 48 - iter 6/21 - loss 0.06918624 - time (sec): 1.30 - samples/sec: 4654.16 - lr: 0.001563
2023-04-18 19:10:16,210 epoch 48 - iter 8/21 - loss 0.06907042 - time (sec): 1.67 - samples/sec: 4727.03 - lr: 0.001563
2023-04-18 19:10:16,606 epoch 48 - iter 10/21 - loss 0.07246528 - time (sec): 2.06 - samples/sec: 4756.15 - lr: 0.001563
2023-04-18 19:10:17,038 epoch 48 - iter 12/21 - loss 0.06969432 - time (sec): 2.50 - samples/sec: 4813.77 - lr: 0.001563
2023-04-18 19:10:17,490 epoch 48 - iter 14/21 - loss 0.06833460 - time (sec): 2.95 - samples/sec: 4707.64 - lr: 0.001563
2023-04-18 19:10:18,039 epoch 48 - iter 16/21 - loss 0.06880619 - time (sec): 3.50 - samples/sec: 4532.58 - lr: 0.001563
2023-04-18 19:10:18,621 epoch 48 - i

100%|██████████| 5/5 [00:02<00:00,  2.43it/s]

2023-04-18 19:10:21,569 Evaluating as a multi-label problem: False
2023-04-18 19:10:21,630 DEV : loss 0.13750578463077545 - f1-score (micro avg)  0.1904
2023-04-18 19:10:21,669 BAD EPOCHS (no improvement): 2
2023-04-18 19:10:21,674 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:22,270 epoch 49 - iter 2/21 - loss 0.05555748 - time (sec): 0.59 - samples/sec: 3407.95 - lr: 0.001563
2023-04-18 19:10:22,850 epoch 49 - iter 4/21 - loss 0.07249843 - time (sec): 1.17 - samples/sec: 3375.39 - lr: 0.001563
2023-04-18 19:10:23,290 epoch 49 - iter 6/21 - loss 0.06270150 - time (sec): 1.61 - samples/sec: 3625.51 - lr: 0.001563
2023-04-18 19:10:23,761 epoch 49 - iter 8/21 - loss 0.06567686 - time (sec): 2.09 - samples/sec: 3894.87 - lr: 0.001563
2023-04-18 19:10:24,148 epoch 49 - iter 10/21 - loss 0.06715542 - time (sec): 2.47 - samples/sec: 4111.19 - lr: 0.001563
2023-04-18 19:10:24,525 epoch 49 - iter 12/21 - loss 0.06533443 - time (sec): 2.85 - samples/sec: 4248.16 - lr: 0.001563
2023-04-18 19:10:24,946 epoch 49 - iter 14/21 - loss 0.06463988 - time (sec): 3.27 - samples/sec: 4331.57 - lr: 0.001563
2023-04-18 19:10:25,373 epoch 49 - iter 16/21 - loss 0.06246375 - time (sec): 3.70 - samples/sec: 4327.73 - lr: 0.001563
2023-04-18 19:10:25,801 epoch 49 - i

100%|██████████| 5/5 [00:01<00:00,  3.87it/s]

2023-04-18 19:10:27,717 Evaluating as a multi-label problem: False
2023-04-18 19:10:27,750 DEV : loss 0.137706458568573 - f1-score (micro avg)  0.1908
2023-04-18 19:10:27,774 BAD EPOCHS (no improvement): 3
2023-04-18 19:10:27,779 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:28,261 epoch 50 - iter 2/21 - loss 0.07078924 - time (sec): 0.48 - samples/sec: 4668.13 - lr: 0.001563
2023-04-18 19:10:28,688 epoch 50 - iter 4/21 - loss 0.06185084 - time (sec): 0.91 - samples/sec: 4566.64 - lr: 0.001563
2023-04-18 19:10:29,080 epoch 50 - iter 6/21 - loss 0.05995958 - time (sec): 1.30 - samples/sec: 4765.23 - lr: 0.001563
2023-04-18 19:10:29,502 epoch 50 - iter 8/21 - loss 0.06069771 - time (sec): 1.72 - samples/sec: 4812.13 - lr: 0.001563
2023-04-18 19:10:29,893 epoch 50 - iter 10/21 - loss 0.05621650 - time (sec): 2.11 - samples/sec: 4881.69 - lr: 0.001563
2023-04-18 19:10:30,358 epoch 50 - iter 12/21 - loss 0.05554816 - time (sec): 2.58 - samples/sec: 4821.62 - lr: 0.001563
2023-04-18 19:10:30,759 epoch 50 - iter 14/21 - loss 0.05656777 - time (sec): 2.98 - samples/sec: 4869.75 - lr: 0.001563
2023-04-18 19:10:31,133 epoch 50 - iter 16/21 - loss 0.05919135 - time (sec): 3.35 - samples/sec: 4890.86 - lr: 0.001563
2023-04-18 19:10:31,530 epoch 50 - i

100%|██████████| 5/5 [00:01<00:00,  3.09it/s]

2023-04-18 19:10:33,747 Evaluating as a multi-label problem: False
2023-04-18 19:10:33,804 DEV : loss 0.13756518065929413 - f1-score (micro avg)  0.1915
2023-04-18 19:10:33,844 Epoch    50: reducing learning rate of group 0 to 7.8125e-04.
2023-04-18 19:10:33,846 BAD EPOCHS (no improvement): 4
2023-04-18 19:10:33,851 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:34,469 epoch 51 - iter 2/21 - loss 0.06242178 - time (sec): 0.62 - samples/sec: 3467.49 - lr: 0.000781
2023-04-18 19:10:35,037 epoch 51 - iter 4/21 - loss 0.05961645 - time (sec): 1.18 - samples/sec: 3522.53 - lr: 0.000781
2023-04-18 19:10:35,612 epoch 51 - iter 6/21 - loss 0.05507582 - time (sec): 1.76 - samples/sec: 3522.31 - lr: 0.000781
2023-04-18 19:10:36,180 epoch 51 - iter 8/21 - loss 0.05307684 - time (sec): 2.33 - samples/sec: 3492.69 - lr: 0.000781
2023-04-18 19:10:36,858 epoch 51 - iter 10/21 - loss 0.05525203 - time (sec): 3.01 - samples/sec: 3422.24 - lr: 0.000781
2023-04-18 19:10:37,416 epoch 51 - iter 12/21 - loss 0.05547154 - time (sec): 3.56 - samples/sec: 3319.89 - lr: 0.000781
2023-04-18 19:10:38,022 epoch 51 - iter 14/21 - loss 0.05848523 - time (sec): 4.17 - samples/sec: 3345.98 - lr: 0.000781
2023-04-18 19:10:38,545 epoch 51 - iter 16/21 - loss 0.05909290 - time (sec): 4.69 - samples/sec: 3407.88 - lr: 0.000781
2023-04-18 19:10:38,979 epoch 51 - i

100%|██████████| 5/5 [00:01<00:00,  3.99it/s]


2023-04-18 19:10:40,883 Evaluating as a multi-label problem: False
2023-04-18 19:10:40,920 DEV : loss 0.1376531571149826 - f1-score (micro avg)  0.1904
2023-04-18 19:10:40,945 BAD EPOCHS (no improvement): 1
2023-04-18 19:10:40,951 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:41,391 epoch 52 - iter 2/21 - loss 0.07212329 - time (sec): 0.44 - samples/sec: 5065.44 - lr: 0.000781
2023-04-18 19:10:41,757 epoch 52 - iter 4/21 - loss 0.06598952 - time (sec): 0.80 - samples/sec: 5011.16 - lr: 0.000781
2023-04-18 19:10:42,192 epoch 52 - iter 6/21 - loss 0.06357974 - time (sec): 1.24 - samples/sec: 4882.76 - lr: 0.000781
2023-04-18 19:10:42,589 epoch 52 - iter 8/21 - loss 0.06102625 - time (sec): 1.64 - samples/sec: 4888.83 - lr: 0.000781
2023-04-18 19:10:42,998 epoch 52 - iter 10/21 - loss 0.06074436 - time (sec): 2.05 - samples/sec: 4907.49 - lr: 0.000781
2023-04-18 19:10:43,402 epoch 52 - iter 12/21 - loss 0.06109101 - t

100%|██████████| 5/5 [00:01<00:00,  3.99it/s]


2023-04-18 19:10:46,430 Evaluating as a multi-label problem: False
2023-04-18 19:10:46,464 DEV : loss 0.13768690824508667 - f1-score (micro avg)  0.1923
2023-04-18 19:10:46,489 BAD EPOCHS (no improvement): 2
2023-04-18 19:10:46,494 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:46,915 epoch 53 - iter 2/21 - loss 0.06706919 - time (sec): 0.41 - samples/sec: 4935.02 - lr: 0.000781
2023-04-18 19:10:47,347 epoch 53 - iter 4/21 - loss 0.06235737 - time (sec): 0.84 - samples/sec: 4949.77 - lr: 0.000781
2023-04-18 19:10:47,745 epoch 53 - iter 6/21 - loss 0.06353176 - time (sec): 1.24 - samples/sec: 5031.13 - lr: 0.000781
2023-04-18 19:10:48,149 epoch 53 - iter 8/21 - loss 0.06595269 - time (sec): 1.65 - samples/sec: 5039.56 - lr: 0.000781
2023-04-18 19:10:48,603 epoch 53 - iter 10/21 - loss 0.06542484 - time (sec): 2.10 - samples/sec: 4845.05 - lr: 0.000781
2023-04-18 19:10:49,105 epoch 53 - iter 12/21 - loss 0.06180723 - 

100%|██████████| 5/5 [00:04<00:00,  1.08it/s]

2023-04-18 19:10:57,818 Evaluating as a multi-label problem: False
2023-04-18 19:10:57,851 DEV : loss 0.13716845214366913 - f1-score (micro avg)  0.1926
2023-04-18 19:10:57,875 BAD EPOCHS (no improvement): 3
2023-04-18 19:10:57,881 ----------------------------------------------------------------------------------------------------
2023-04-18 19:10:58,503 epoch 54 - iter 2/21 - loss 0.07570976 - time (sec): 0.62 - samples/sec: 3762.10 - lr: 0.000781
2023-04-18 19:10:59,041 epoch 54 - iter 4/21 - loss 0.07072107 - time (sec): 1.16 - samples/sec: 3823.90 - lr: 0.000781
2023-04-18 19:10:59,597 epoch 54 - iter 6/21 - loss 0.06612990 - time (sec): 1.71 - samples/sec: 3660.54 - lr: 0.000781
2023-04-18 19:11:00,250 epoch 54 - iter 8/21 - loss 0.06570689 - time (sec): 2.37 - samples/sec: 3565.79 - lr: 0.000781
2023-04-18 19:11:00,764 epoch 54 - iter 10/21 - loss 0.06791306 - time (sec): 2.88 - samples/sec: 3606.49 - lr: 0.000781
2023-04-18 19:11:01,317 epoch 54 - iter 12/21 - loss 0.06792127 - time (sec): 3.43 - samples/sec: 3641.82 - lr: 0.000781
2023-04-18 19:11:01,898 epoch 54 - iter 14/21 - loss 0.06666692 - time (sec): 4.01 - samples/sec: 3543.65 - lr: 0.000781
2023-04-18 19:11:02,432 epoch 54 - iter 16/21 - loss 0.06461720 - time (sec): 4.55 - samples/sec: 3558.02 - lr: 0.000781
2023-04-18 19:11:02,985 epoch 54 - i

100%|██████████| 5/5 [00:02<00:00,  2.15it/s]

2023-04-18 19:11:06,089 Evaluating as a multi-label problem: False
2023-04-18 19:11:06,196 DEV : loss 0.13703589141368866 - f1-score (micro avg)  0.1926
2023-04-18 19:11:06,286 Epoch    54: reducing learning rate of group 0 to 3.9063e-04.
2023-04-18 19:11:06,290 BAD EPOCHS (no improvement): 4
2023-04-18 19:11:06,311 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:07,289 epoch 55 - iter 2/21 - loss 0.04894595 - time (sec): 0.97 - samples/sec: 1947.60 - lr: 0.000391
2023-04-18 19:11:08,222 epoch 55 - iter 4/21 - loss 0.05490827 - time (sec): 1.91 - samples/sec: 2026.04 - lr: 0.000391
2023-04-18 19:11:08,983 epoch 55 - iter 6/21 - loss 0.06059174 - time (sec): 2.67 - samples/sec: 2149.30 - lr: 0.000391
2023-04-18 19:11:10,288 epoch 55 - iter 8/21 - loss 0.05891756 - time (sec): 3.97 - samples/sec: 1981.90 - lr: 0.000391
2023-04-18 19:11:11,131 epoch 55 - iter 10/21 - loss 0.05870259 - time (sec): 4.82 - samples/sec: 2037.08 - lr: 0.000391
2023-04-18 19:11:11,744 epoch 55 - iter 12/21 - loss 0.06001115 - time (sec): 5.43 - samples/sec: 2170.40 - lr: 0.000391
2023-04-18 1

100%|██████████| 5/5 [00:01<00:00,  2.82it/s]

2023-04-18 19:11:16,245 Evaluating as a multi-label problem: False
2023-04-18 19:11:16,283 DEV : loss 0.1370275914669037 - f1-score (micro avg)  0.1926
2023-04-18 19:11:16,309 BAD EPOCHS (no improvement): 1
2023-04-18 19:11:16,314 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:16,806 epoch 56 - iter 2/21 - loss 0.05461170 - time (sec): 0.49 - samples/sec: 4705.18 - lr: 0.000391
2023-04-18 19:11:17,205 epoch 56 - iter 4/21 - loss 0.04637715 - time (sec): 0.89 - samples/sec: 4762.88 - lr: 0.000391
2023-04-18 19:11:17,630 epoch 56 - iter 6/21 - loss 0.05496760 - time (sec): 1.31 - samples/sec: 4826.05 - lr: 0.000391
2023-04-18 19:11:18,039 epoch 56 - iter 8/21 - loss 0.05353023 - time (sec): 1.72 - samples/sec: 4899.71 - lr: 0.000391
2023-04-18 19:11:18,410 epoch 56 - iter 10/21 - loss 0.05425722 - time (sec): 2.09 - samples/sec: 4865.74 - lr: 0.000391
2023-04-18 19:11:18,799 epoch 56 - iter 12/21 - loss 0.05522479 - time (sec): 2.48 - samples/sec: 4859.70 - lr: 0.000391
2023-04-18 19:11:19,269 epoch 56 - iter 14/21 - loss 0.05736328 - time (sec): 2.95 - samples/sec: 4841.26 - lr: 0.000391
2023-04-18 19:11:19,688 epoch 56 - iter 16/21 - loss 0.05923209 - time (sec): 3.37 - samples/sec: 4851.22 - lr: 0.000391
2023-04-18 19:11:20,108 epoch 56 - i

100%|██████████| 5/5 [00:01<00:00,  2.66it/s]

2023-04-18 19:11:22,625 Evaluating as a multi-label problem: False
2023-04-18 19:11:22,677 DEV : loss 0.1369936317205429 - f1-score (micro avg)  0.1926
2023-04-18 19:11:22,716 BAD EPOCHS (no improvement): 2
2023-04-18 19:11:22,724 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:23,298 epoch 57 - iter 2/21 - loss 0.08631243 - time (sec): 0.57 - samples/sec: 3250.84 - lr: 0.000391
2023-04-18 19:11:23,994 epoch 57 - iter 4/21 - loss 0.06129952 - time (sec): 1.27 - samples/sec: 3131.18 - lr: 0.000391
2023-04-18 19:11:24,625 epoch 57 - iter 6/21 - loss 0.06432249 - time (sec): 1.90 - samples/sec: 3280.32 - lr: 0.000391
2023-04-18 19:11:25,205 epoch 57 - iter 8/21 - loss 0.06274792 - time (sec): 2.48 - samples/sec: 3333.98 - lr: 0.000391
2023-04-18 19:11:25,808 epoch 57 - iter 10/21 - loss 0.06422724 - time (sec): 3.08 - samples/sec: 3318.75 - lr: 0.000391
2023-04-18 19:11:26,418 epoch 57 - iter 12/21 - loss 0.06446094 - time (sec): 3.69 - samples/sec: 3331.89 - lr: 0.000391
2023-04-18 19:11:26,869 epoch 57 - iter 14/21 - loss 0.06338751 - time (sec): 4.14 - samples/sec: 3483.87 - lr: 0.000391
2023-04-18 19:11:27,283 epoch 57 - iter 16/21 - loss 0.06118101 - time (sec): 4.56 - samples/sec: 3596.24 - lr: 0.000391
2023-04-18 19:11:27,660 epoch 57 - i

100%|██████████| 5/5 [00:01<00:00,  2.56it/s]

2023-04-18 19:11:30,418 Evaluating as a multi-label problem: False
2023-04-18 19:11:30,473 DEV : loss 0.13698343932628632 - f1-score (micro avg)  0.1919
2023-04-18 19:11:30,521 BAD EPOCHS (no improvement): 3
2023-04-18 19:11:30,530 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:31,156 epoch 58 - iter 2/21 - loss 0.08869075 - time (sec): 0.62 - samples/sec: 3433.00 - lr: 0.000391
2023-04-18 19:11:31,858 epoch 58 - iter 4/21 - loss 0.06364332 - time (sec): 1.33 - samples/sec: 3223.08 - lr: 0.000391
2023-04-18 19:11:32,501 epoch 58 - iter 6/21 - loss 0.06232101 - time (sec): 1.97 - samples/sec: 3304.87 - lr: 0.000391
2023-04-18 19:11:33,062 epoch 58 - iter 8/21 - loss 0.06495292 - time (sec): 2.53 - samples/sec: 3369.41 - lr: 0.000391
2023-04-18 19:11:33,538 epoch 58 - iter 10/21 - loss 0.06294052 - time (sec): 3.01 - samples/sec: 3475.09 - lr: 0.000391
2023-04-18 19:11:33,989 epoch 58 - iter 12/21 - loss 0.06382453 - time (sec): 3.46 - samples/sec: 3565.50 - lr: 0.000391
2023-04-18 19:11:34,413 epoch 58 - iter 14/21 - loss 0.06140591 - time (sec): 3.88 - samples/sec: 3770.33 - lr: 0.000391
2023-04-18 19:11:34,823 epoch 58 - iter 16/21 - loss 0.06210947 - time (sec): 4.29 - samples/sec: 3896.41 - lr: 0.000391
2023-04-18 19:11:35,244 epoch 58 - i

100%|██████████| 5/5 [00:01<00:00,  2.97it/s]

2023-04-18 19:11:37,495 Evaluating as a multi-label problem: False
2023-04-18 19:11:37,558 DEV : loss 0.1370149850845337 - f1-score (micro avg)  0.1919
2023-04-18 19:11:37,602 Epoch    58: reducing learning rate of group 0 to 1.9531e-04.
2023-04-18 19:11:37,606 BAD EPOCHS (no improvement): 4
2023-04-18 19:11:37,616 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:38,264 epoch 59 - iter 2/21 - loss 0.07655566 - time (sec): 0.65 - samples/sec: 3094.21 - lr: 0.000195
2023-04-18 19:11:38,879 epoch 59 - iter 4/21 - loss 0.06980470 - time (sec): 1.26 - samples/sec: 3252.91 - lr: 0.000195
2023-04-18 19:11:39,469 epoch 59 - iter 6/21 - loss 0.06624009 - time (sec): 1.85 - samples/sec: 3340.96 - lr: 0.000195
2023-04-18 19:11:40,096 epoch 59 - iter 8/21 - loss 0.06447733 - time (sec): 2.48 - samples/sec: 3288.88 - lr: 0.000195
2023-04-18 19:11:40,673 epoch 59 - iter 10/21 - loss 0.06491780 - time (sec): 3.06 - samples/sec: 3290.82 - lr: 0.000195
2023-04-18 19:11:41,220 epoch 59 - iter 12/21 - loss 0.06306786 - time (sec): 3.60 - samples/sec: 3335.69 - lr: 0.000195
2023-04-18 19:11:41,846 epoch 59 - iter 14/21 - loss 0.06362482 - time (sec): 4.23 - samples/sec: 3382.38 - lr: 0.000195
2023-04-18 19:11:42,308 epoch 59 - iter 16/21 - loss 0.06367515 - time (sec): 4.69 - samples/sec: 3474.85 - lr: 0.000195
2023-04-18 19:11:42,716 epoch 59 - i

100%|██████████| 5/5 [00:01<00:00,  3.89it/s]

2023-04-18 19:11:44,596 Evaluating as a multi-label problem: False
2023-04-18 19:11:44,629 DEV : loss 0.13697566092014313 - f1-score (micro avg)  0.1919
2023-04-18 19:11:44,654 BAD EPOCHS (no improvement): 1
2023-04-18 19:11:44,660 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:45,069 epoch 60 - iter 2/21 - loss 0.05246682 - time (sec): 0.41 - samples/sec: 5180.30 - lr: 0.000195
2023-04-18 19:11:45,474 epoch 60 - iter 4/21 - loss 0.04512298 - time (sec): 0.81 - samples/sec: 5194.85 - lr: 0.000195
2023-04-18 19:11:45,914 epoch 60 - iter 6/21 - loss 0.05155849 - time (sec): 1.25 - samples/sec: 5177.86 - lr: 0.000195
2023-04-18 19:11:46,280 epoch 60 - iter 8/21 - loss 0.05163838 - time (sec): 1.62 - samples/sec: 5167.70 - lr: 0.000195
2023-04-18 19:11:46,686 epoch 60 - iter 10/21 - loss 0.05554368 - time (sec): 2.02 - samples/sec: 5073.86 - lr: 0.000195
2023-04-18 19:11:47,112 epoch 60 - iter 12/21 - loss 0.05580978 - time (sec): 2.45 - samples/sec: 5100.15 - lr: 0.000195
2023-04-18 19:11:47,514 epoch 60 - iter 14/21 - loss 0.05776149 - time (sec): 2.85 - samples/sec: 5102.72 - lr: 0.000195
2023-04-18 19:11:47,951 epoch 60 - iter 16/21 - loss 0.05677826 - time (sec): 3.29 - samples/sec: 5012.34 - lr: 0.000195
2023-04-18 19:11:48,326 epoch 60 - i

100%|██████████| 5/5 [00:01<00:00,  3.91it/s]

2023-04-18 19:11:50,183 Evaluating as a multi-label problem: False
2023-04-18 19:11:50,221 DEV : loss 0.13702844083309174 - f1-score (micro avg)  0.1919
2023-04-18 19:11:50,247 BAD EPOCHS (no improvement): 2
2023-04-18 19:11:50,252 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:50,689 epoch 61 - iter 2/21 - loss 0.06549552 - time (sec): 0.44 - samples/sec: 4733.31 - lr: 0.000195
2023-04-18 19:11:51,094 epoch 61 - iter 4/21 - loss 0.06475379 - time (sec): 0.84 - samples/sec: 4620.02 - lr: 0.000195
2023-04-18 19:11:51,550 epoch 61 - iter 6/21 - loss 0.06002039 - time (sec): 1.30 - samples/sec: 4623.34 - lr: 0.000195
2023-04-18 19:11:51,892 epoch 61 - iter 8/21 - loss 0.06660406 - time (sec): 1.64 - samples/sec: 4744.93 - lr: 0.000195
2023-04-18 19:11:52,294 epoch 61 - iter 10/21 - loss 0.06753940 - time (sec): 2.04 - samples/sec: 4558.56 - lr: 0.000195
2023-04-18 19:11:52,900 epoch 61 - iter 12/21 - loss 0.06655925 - time (sec): 2.65 - samples/sec: 4398.34 - lr: 0.000195
2023-04-18 19:11:53,400 epoch 61 - iter 14/21 - loss 0.06576926 - time (sec): 3.15 - samples/sec: 4311.65 - lr: 0.000195
2023-04-18 19:11:53,945 epoch 61 - iter 16/21 - loss 0.06551468 - time (sec): 3.69 - samples/sec: 4197.88 - lr: 0.000195
2023-04-18 19:11:54,544 epoch 61 - i

100%|██████████| 5/5 [00:01<00:00,  2.51it/s]

2023-04-18 19:11:57,440 Evaluating as a multi-label problem: False
2023-04-18 19:11:57,474 DEV : loss 0.13710422813892365 - f1-score (micro avg)  0.1919
2023-04-18 19:11:57,498 BAD EPOCHS (no improvement): 3
2023-04-18 19:11:57,505 ----------------------------------------------------------------------------------------------------
2023-04-18 19:11:57,982 epoch 62 - iter 2/21 - loss 0.05498650 - time (sec): 0.47 - samples/sec: 3986.40 - lr: 0.000195
2023-04-18 19:11:58,378 epoch 62 - iter 4/21 - loss 0.06325301 - time (sec): 0.87 - samples/sec: 4306.18 - lr: 0.000195
2023-04-18 19:11:58,837 epoch 62 - iter 6/21 - loss 0.06357186 - time (sec): 1.33 - samples/sec: 4549.59 - lr: 0.000195
2023-04-18 19:11:59,265 epoch 62 - iter 8/21 - loss 0.06211690 - time (sec): 1.75 - samples/sec: 4677.10 - lr: 0.000195
2023-04-18 19:11:59,665 epoch 62 - iter 10/21 - loss 0.06386680 - time (sec): 2.15 - samples/sec: 4738.72 - lr: 0.000195
2023-04-18 19:12:00,076 epoch 62 - iter 12/21 - loss 0.06121704 - time (sec): 2.57 - samples/sec: 4802.80 - lr: 0.000195
2023-04-18 19:12:00,466 epoch 62 - iter 14/21 - loss 0.06113730 - time (sec): 2.95 - samples/sec: 4804.91 - lr: 0.000195
2023-04-18 19:12:00,834 epoch 62 - iter 16/21 - loss 0.06107926 - time (sec): 3.32 - samples/sec: 4803.55 - lr: 0.000195
2023-04-18 19:12:01,271 epoch 62 - i

100%|██████████| 5/5 [00:01<00:00,  3.84it/s]

2023-04-18 19:12:03,175 Evaluating as a multi-label problem: False
2023-04-18 19:12:03,210 DEV : loss 0.13701950013637543 - f1-score (micro avg)  0.1919
2023-04-18 19:12:03,239 Epoch    62: reducing learning rate of group 0 to 9.7656e-05.
2023-04-18 19:12:03,240 BAD EPOCHS (no improvement): 4
2023-04-18 19:12:03,249 ----------------------------------------------------------------------------------------------------
2023-04-18 19:12:03,252 ----------------------------------------------------------------------------------------------------
2023-04-18 19:12:03,255 learning rate too small - quitting training!
2023-04-18 19:12:03,256 ----------------------------------------------------------------------------------------------------
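
The learning-rate column in the log above is halved each time the "BAD EPOCHS" counter reaches 4, and training stops once the rate drops below a floor. The sketch below simulates that behaviour in plain Python. It is not Flair's implementation: the function name `anneal_on_plateau` and the values `factor=0.5`, `patience=3`, and `min_lr=1e-4` are assumptions inferred from the logged rates (0.0125, 0.00625, ..., 9.7656e-05), not read from the training configuration.

```python
def anneal_on_plateau(dev_scores, lr, factor=0.5, patience=3, min_lr=1e-4):
    """Simulate reduce-on-plateau annealing driven by a dev score.

    All default values are assumptions inferred from the log, not
    settings taken from the actual training call.
    Returns a list of (epoch, learning_rate) pairs, one per epoch run.
    """
    best = float("-inf")
    bad_epochs = 0
    history = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            # New best dev score: reset the counter.
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # Too many epochs without improvement: shrink the rate.
            lr *= factor
            bad_epochs = 0
        history.append((epoch, lr))
        if lr < min_lr:
            # Mirrors the "learning rate too small" stop in the log.
            break
    return history


# A flat dev score never improves after the first epoch.
history = anneal_on_plateau([0.19] * 40, lr=0.0125)
print(history[-1])
```

With a flat dev score, the simulated rate is halved every four epochs and the loop stops at 9.765625e-05, the same final value the log reports before quitting.
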
2023-04-18 19:12:05,060 ----------------------------------------------------------------------------------------------------
2023-04-18 19:12:07,076 SequenceTagger predicts: Dictionary with 11 tags: <unk>, Beginning of an organization name, Inside an organization name, Beginning of a person name, Beginning of a location name, Inside a person name, Inside a miscellaneous name, Beginning of a miscellaneous name, Inside a location name, <START>, <STOP>


100%|██████████| 4/4 [00:02<00:00,  1.41it/s]

2023-04-18 19:12:10,360 Evaluating as a multi-label problem: False
2023-04-18 19:12:10,412 0.0792	0.7512	0.1434	0.0792
2023-04-18 19:12:10,416 
Results:
- F-score (micro) 0.1434
- F-score (macro) 0.6454
- Accuracy 0.0792

By class:
                                   precision    recall  f1-score   support

                            <unk>     0.0000    0.0000    0.0000         0
Beginning of an organization name     0.7258    0.9000    0.8036       100
      Inside an organization name     0.7320    0.8765    0.7978        81
     Beginning of a location name     0.8594    0.6471    0.7383        85
       Beginning of a person name     0.8261    0.9268    0.8736        41
      Inside a miscellaneous name     0.6667    0.3182    0.4308        44
             Inside a person name     0.9583    0.9200    0.9388        25
Beginning of a miscellaneous name     0.7273    0.4211    0.5333        19
           Inside a location name     0.8182    0.6000    0.6923        15

                




{'test_score': 0.14335582964859206,
 'dev_score_history': [0.004098360655737704,
  0.04694485842026826,
  0.06967213114754099,
  0.13263785394932937,
  0.1568554396423249,
  0.13263785394932937,
  0.1583457526080477,
  0.1672876304023845,
  0.14083457526080478,
  0.17771982116244414,
  0.17101341281669152,
  0.18032786885245902,
  0.1721311475409836,
  0.1844262295081967,
  0.1795827123695976,
  0.1736214605067064,
  0.1836810730253353,
  0.1833084947839046,
  0.1825633383010432,
  0.1870342771982116,
  0.1870342771982116,
  0.1825633383010432,
  0.17101341281669152,
  0.1792101341281669,
  0.1855439642324888,
  0.18926974664679586,
  0.19038748137108794,
  0.18740685543964233,
  0.18926974664679586,
  0.18964232488822652,
  0.18777943368107303,
  0.18740685543964233,
  0.18591654247391953,
  0.19299552906110284,
  0.1900149031296572,
  0.18815201192250372,
  0.19038748137108794,
  0.19113263785394935,
  0.1900149031296572,
  0.19299552906110284,
  0.1888971684053651,
  0.1911326378539
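
The micro and macro F-scores reported above differ widely (0.1434 vs. 0.6454). The sketch below recomputes both from the printed numbers. It assumes that the four values on the summary line are precision, recall, F1 and accuracy, in that order, and that the macro score is the unweighted mean of the nine per-class rows that survive in the truncated table.

```python
# Values copied from the evaluation output above. Reading the summary line as
# precision, recall, F1, accuracy is an assumption about the log format.
micro_precision, micro_recall = 0.0792, 0.7512

per_class_f1 = {
    "<unk>": 0.0000,
    "Beginning of an organization name": 0.8036,
    "Inside an organization name": 0.7978,
    "Beginning of a location name": 0.7383,
    "Beginning of a person name": 0.8736,
    "Inside a miscellaneous name": 0.4308,
    "Inside a person name": 0.9388,
    "Beginning of a miscellaneous name": 0.5333,
    "Inside a location name": 0.6923,
}

# Micro F1: harmonic mean of the pooled precision and recall.
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

# Macro F1: unweighted mean of the per-class F1 scores (including <unk>).
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

print(f"micro F1 = {micro_f1:.4f}")
print(f"macro F1 = {macro_f1:.4f}")
```

Both values agree with the log up to rounding of the inputs. The low micro score comes from the pooled precision of 0.0792, so the per-class rows alone give a far more favourable picture than the overall result.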

## For German

In [None]:
# First, check which labels the dataset contains
from flair.datasets import NER_GERMAN_BIOFID

# Load the corpus
corpus = NER_GERMAN_BIOFID()

label_type = 'ner'

# Build the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Get the list of labels
labels = label_dict.get_items()

# Print the list of labels
print("Labels:", labels)


2023-04-19 09:22:35,475 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/train.conll not found in cache, downloading to /tmp/tmp6euhe5p6


5.67MB [00:00, 30.2MB/s]

2023-04-19 09:22:35,735 copying /tmp/tmp6euhe5p6 to cache at /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-19 09:22:35,753 removing temp file /tmp/tmp6euhe5p6





2023-04-19 09:22:36,027 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/dev.conll not found in cache, downloading to /tmp/tmpzw77vse6


543kB [00:00, 32.2MB/s]                   

2023-04-19 09:22:36,114 copying /tmp/tmpzw77vse6 to cache at /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-19 09:22:36,117 removing temp file /tmp/tmpzw77vse6





2023-04-19 09:22:36,393 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/test.conll not found in cache, downloading to /tmp/tmp280to5w7


742kB [00:00, 27.1MB/s]                   

2023-04-19 09:22:36,484 copying /tmp/tmp280to5w7 to cache at /root/.flair/datasets/ner_german_biofid/test.conll
2023-04-19 09:22:36,487 removing temp file /tmp/tmp280to5w7
2023-04-19 09:22:36,490 Reading data from /root/.flair/datasets/ner_german_biofid
2023-04-19 09:22:36,492 Train: /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-19 09:22:36,494 Dev: /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-19 09:22:36,495 Test: /root/.flair/datasets/ner_german_biofid/test.conll





2023-04-19 09:22:55,724 Computing label dictionary. Progress:


12668it [00:00, 42393.28it/s]


2023-04-19 09:22:56,030 Dictionary created for label 'ner' with 7 values: TAX (seen 12647 times), OTHER (seen 6147 times), LOC (seen 5541 times), PER (seen 4659 times), TME (seen 4244 times), ORG (seen 962 times)
Labels: ['<unk>', 'TAX', 'OTHER', 'LOC', 'PER', 'TME', 'ORG']
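
The counts in the log line above show how unevenly the labels are distributed. A minimal sketch, using only the numbers printed there, turns them into shares of all labelled spans:

```python
# Counts copied from the "Dictionary created for label 'ner'" log line above.
label_counts = {
    "TAX": 12647,
    "OTHER": 6147,
    "LOC": 5541,
    "PER": 4659,
    "TME": 4244,
    "ORG": 962,
}

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}

# Print the labels from most to least frequent.
for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<6}{share:6.1%}")
```

`TAX` accounts for more than a third of the spans while `ORG` stays below 3%, which is worth keeping in mind when reading per-class scores later on.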


In [None]:
from flair.data import Corpus, Sentence
from flair.datasets import NER_GERMAN_BIOFID
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger, TARSClassifier
from flair.trainers import ModelTrainer

# 1. Map the dataset's terse labels to natural-language names.
#    Note: BIOfid is a biodiversity corpus and TAX marks taxon names, so 'Taxon'
#    would be more descriptive. 'Tax' is kept because the output below uses it.
label_name_map = {'<unk>': 'Unknown',
                  'TAX': 'Tax',
                  'OTHER': 'Other',
                  'LOC': 'Location',
                  'PER': 'Person',
                  'TME': 'Time',
                  'ORG': 'Organization',
                  }

# 2. Load the corpus and keep roughly 5% of it
corpus: Corpus = NER_GERMAN_BIOFID(label_name_map=label_name_map).downsample(0.05)

# 3. Choose the label type to predict
label_type = 'ner'

# 4. Build the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 5. Load the pretrained English TARS base model
tars = TARSClassifier.load("tars-base")

# 5a. Alternatively, comment out the previous line and uncomment the next one
#     to train a new TARS model from scratch
#tars = TARSClassifier(embeddings="bert-base-uncased")

# 6. Switch to a new task (TARS can handle several tasks, so one must be defined)
tars.add_and_switch_to_new_task(task_name="ner-tagging",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )

# Note: the TARS model from steps 5 and 6 is not passed to the tagger or the
# trainer below. The model actually trained is the SequenceTagger from step 8.

# 7. Initialize the embedding stack with GloVe and Flair embeddings.
#    These are English embeddings. Flair also provides German ones
#    ('de', 'de-forward', 'de-backward'), which may suit this corpus better.
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 8. Initialize the sequence tagger (BiLSTM with a CRF output layer)
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)


2023-04-20 09:27:35,291 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/train.conll not found in cache, downloading to /tmp/tmptgkr3w6x


5.67MB [00:00, 96.2MB/s]                   

2023-04-20 09:27:35,393 copying /tmp/tmptgkr3w6x to cache at /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-20 09:27:35,407 removing temp file /tmp/tmptgkr3w6x





2023-04-20 09:27:35,669 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/dev.conll not found in cache, downloading to /tmp/tmp80aojc6s


543kB [00:00, 36.7MB/s]                   

2023-04-20 09:27:35,725 copying /tmp/tmp80aojc6s to cache at /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-20 09:27:35,728 removing temp file /tmp/tmp80aojc6s





2023-04-20 09:27:36,045 https://raw.githubusercontent.com/texttechnologylab/BIOfid/master/BIOfid-Dataset-NER/test.conll not found in cache, downloading to /tmp/tmpr612wb4e


742kB [00:00, 45.3MB/s]                   

2023-04-20 09:27:36,106 copying /tmp/tmpr612wb4e to cache at /root/.flair/datasets/ner_german_biofid/test.conll
2023-04-20 09:27:36,109 removing temp file /tmp/tmpr612wb4e
2023-04-20 09:27:36,110 Reading data from /root/.flair/datasets/ner_german_biofid
2023-04-20 09:27:36,111 Train: /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-20 09:27:36,112 Dev: /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-20 09:27:36,113 Test: /root/.flair/datasets/ner_german_biofid/test.conll





2023-04-20 09:27:52,267 Computing label dictionary. Progress:


633it [00:00, 37435.41it/s]

2023-04-20 09:27:52,292 Dictionary created for label 'ner' with 7 values: Tax (seen 518 times), Other (seen 319 times), Person (seen 259 times), Location (seen 246 times), Time (seen 206 times), Organization (seen 46 times)
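
The effect of `downsample(0.05)` can be checked against the two label-dictionary logs. The sketch below compares the sentence and label counts of the full corpus with those of the downsampled one. It assumes both dictionaries were computed on the same split, and it keys the downsampled counts by the original label codes for easier comparison.

```python
# Counts copied from the two "Dictionary created" log lines and the
# "12668it" / "633it" progress lines of the full and downsampled runs.
full_sentences, sampled_sentences = 12668, 633

full_counts = {"TAX": 12647, "OTHER": 6147, "LOC": 5541,
               "PER": 4659, "TME": 4244, "ORG": 962}
sampled_counts = {"TAX": 518, "OTHER": 319, "LOC": 246,
                  "PER": 259, "TME": 206, "ORG": 46}

sentence_ratio = sampled_sentences / full_sentences
retention = {label: sampled_counts[label] / full_counts[label]
             for label in full_counts}

print(f"sentences kept: {sentence_ratio:.1%}")
for label, ratio in retention.items():
    print(f"{label:<6}{ratio:6.1%} of spans kept")
```

The per-label ratios scatter around 5% because the sample is drawn over sentences, not over labels. With only 46 `ORG` spans left, scores for that class are likely to be noisy.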





2023-04-20 09:27:53,094 https://nlp.informatik.hu-berlin.de/resources/models/tars-base/tars-base-v8.pt not found in cache, downloading to /tmp/tmpt8bzpygz


100%|██████████| 418M/418M [00:24<00:00, 17.5MB/s]

2023-04-20 09:28:18,575 copying /tmp/tmpt8bzpygz to cache at /root/.flair/models/tars-base-v8.pt





2023-04-20 09:28:20,037 removing temp file /tmp/tmpt8bzpygz


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

2023-04-20 09:28:25,324 TARS initialized without a task. You need to call .add_and_switch_to_new_task() before training this model
2023-04-20 09:28:26,121 https://flair.informatik.hu-berlin.de/resources/embeddings/token/glove.gensim.vectors.npy not found in cache, downloading to /tmp/tmplm3ta0la


100%|██████████| 153M/153M [00:15<00:00, 10.0MB/s]

2023-04-20 09:28:42,589 copying /tmp/tmplm3ta0la to cache at /root/.flair/embeddings/glove.gensim.vectors.npy





2023-04-20 09:28:42,840 removing temp file /tmp/tmplm3ta0la
2023-04-20 09:28:43,359 https://flair.informatik.hu-berlin.de/resources/embeddings/token/glove.gensim not found in cache, downloading to /tmp/tmpb1wxrp4s


100%|██████████| 20.5M/20.5M [00:02<00:00, 9.19MB/s]

2023-04-20 09:28:46,195 copying /tmp/tmpb1wxrp4s to cache at /root/.flair/embeddings/glove.gensim
2023-04-20 09:28:46,213 removing temp file /tmp/tmpb1wxrp4s





2023-04-20 09:28:49,875 https://flair.informatik.hu-berlin.de/resources/embeddings/flair/news-forward-0.4.1.pt not found in cache, downloading to /tmp/tmpv4j6jywf


100%|██████████| 69.7M/69.7M [00:05<00:00, 14.4MB/s]

2023-04-20 09:28:55,444 copying /tmp/tmpv4j6jywf to cache at /root/.flair/embeddings/news-forward-0.4.1.pt





2023-04-20 09:28:55,513 removing temp file /tmp/tmpv4j6jywf
2023-04-20 09:28:56,343 https://flair.informatik.hu-berlin.de/resources/embeddings/flair/news-backward-0.4.1.pt not found in cache, downloading to /tmp/tmp762zsm6j


100%|██████████| 69.7M/69.7M [00:05<00:00, 14.3MB/s]

2023-04-20 09:29:01,948 copying /tmp/tmp762zsm6j to cache at /root/.flair/embeddings/news-backward-0.4.1.pt





2023-04-20 09:29:02,078 removing temp file /tmp/tmp762zsm6j
2023-04-20 09:29:02,267 SequenceTagger predicts: Dictionary with 25 tags: O, S-Tax, B-Tax, E-Tax, I-Tax, S-Other, B-Other, E-Other, I-Other, S-Person, B-Person, E-Person, I-Person, S-Location, B-Location, E-Location, I-Location, S-Time, B-Time, E-Time, I-Time, S-Organization, B-Organization, E-Organization, I-Organization
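
The 25 tags in the line above follow from the six entity labels and the BIOES span encoding: each label gets a single-token tag (`S-`), a begin tag (`B-`), an end tag (`E-`) and an inside tag (`I-`), plus one shared `O` tag for tokens outside any entity. A short sketch that rebuilds the list:

```python
# Entity labels and prefix order as they appear in the log line above.
labels = ["Tax", "Other", "Person", "Location", "Time", "Organization"]
prefixes = ["S", "B", "E", "I"]

# One "O" tag for non-entity tokens, then four prefixed tags per label.
tags = ["O"] + [f"{prefix}-{label}" for label in labels for prefix in prefixes]

print(len(tags))
print(tags[:5])
```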


In [None]:
# 9. Create a ModelTrainer and start training
trainer = ModelTrainer(tagger, corpus)

trainer.train(base_path='/content/drive/MyDrive/ColabNotebooks/nermodels/german',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)

2023-04-19 09:30:35,531 ----------------------------------------------------------------------------------------------------
2023-04-19 09:30:35,535 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=27, bias=True)
  (loss_f
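
The layer sizes in the model summary above can be derived from the configuration. The sketch below is a plain arithmetic check. It assumes that the two extra outputs of the final linear layer are the `<START>` and `<STOP>` tags used by the CRF, which the summary itself does not state.

```python
# Sizes read from the model summary above.
glove_dim = 100            # Embedding(400001, 100)
flair_forward_dim = 2048   # LSTM(100, 2048), forward language model
flair_backward_dim = 2048  # LSTM(100, 2048), backward language model

# Stacked embeddings are concatenated, so their sizes add up.
stacked_dim = glove_dim + flair_forward_dim + flair_backward_dim

# A bidirectional LSTM concatenates both directions.
hidden_size = 256
bilstm_output_dim = 2 * hidden_size

# 25 BIOES tags plus two CRF helper tags (assumed: <START> and <STOP>).
num_tags = 25
output_dim = num_tags + 2

print(stacked_dim, bilstm_output_dim, output_dim)
```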

100%|██████████| 3/3 [00:02<00:00,  1.18it/s]

2023-04-19 09:30:53,237 Evaluating as a multi-label problem: False
2023-04-19 09:30:53,266 DEV : loss 1.6010942459106445 - f1-score (micro avg)  0.0248
2023-04-19 09:30:53,277 BAD EPOCHS (no improvement): 0
2023-04-19 09:30:53,285 saving best model





2023-04-19 09:30:55,304 ----------------------------------------------------------------------------------------------------
2023-04-19 09:30:55,902 epoch 2 - iter 2/20 - loss 1.33800383 - time (sec): 0.58 - samples/sec: 2766.59 - lr: 0.100000
2023-04-19 09:30:56,394 epoch 2 - iter 4/20 - loss 1.37406819 - time (sec): 1.07 - samples/sec: 2775.80 - lr: 0.100000
2023-04-19 09:30:56,833 epoch 2 - iter 6/20 - loss 1.36895672 - time (sec): 1.51 - samples/sec: 2949.88 - lr: 0.100000
2023-04-19 09:30:57,154 epoch 2 - iter 8/20 - loss 1.37226361 - time (sec): 1.83 - samples/sec: 3073.50 - lr: 0.100000
2023-04-19 09:30:57,645 epoch 2 - iter 10/20 - loss 1.32344255 - time (sec): 2.32 - samples/sec: 3093.38 - lr: 0.100000
2023-04-19 09:30:58,175 epoch 2 - iter 12/20 - loss 1.33369324 - time (sec): 2.85 - samples/sec: 3028.98 - lr: 0.100000
2023-04-19 09:30:58,729 epoch 2 - iter 14/20 - loss 1.31765336 - time (sec): 3.40 - samples/sec: 3007.09 - lr: 0.100000
2023-04-19 09:30:59,195 epoch 2 - iter 

100%|██████████| 3/3 [00:00<00:00,  3.06it/s]

2023-04-19 09:31:00,992 Evaluating as a multi-label problem: False
2023-04-19 09:31:01,014 DEV : loss 1.3547207117080688 - f1-score (micro avg)  0.0533
2023-04-19 09:31:01,031 BAD EPOCHS (no improvement): 0
2023-04-19 09:31:01,049 saving best model





2023-04-19 09:31:03,078 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:03,530 epoch 3 - iter 2/20 - loss 1.07234421 - time (sec): 0.44 - samples/sec: 3469.01 - lr: 0.100000
2023-04-19 09:31:03,916 epoch 3 - iter 4/20 - loss 1.03744655 - time (sec): 0.83 - samples/sec: 3791.15 - lr: 0.100000
2023-04-19 09:31:04,255 epoch 3 - iter 6/20 - loss 1.01984831 - time (sec): 1.17 - samples/sec: 3933.35 - lr: 0.100000
2023-04-19 09:31:04,549 epoch 3 - iter 8/20 - loss 1.06841739 - time (sec): 1.46 - samples/sec: 4001.07 - lr: 0.100000
2023-04-19 09:31:04,978 epoch 3 - iter 10/20 - loss 1.08299769 - time (sec): 1.89 - samples/sec: 3873.65 - lr: 0.100000
2023-04-19 09:31:05,247 epoch 3 - iter 12/20 - loss 1.08807160 - time (sec): 2.16 - samples/sec: 3985.86 - lr: 0.100000
2023-04-19 09:31:05,658 epoch 3 - iter 14/20 - loss 1.08339946 - time (sec): 2.57 - samples/sec: 3979.43 - lr: 0.100000
2023-04-19 09:31:05,925 epoch 3 - iter 

100%|██████████| 3/3 [00:00<00:00,  5.00it/s]

2023-04-19 09:31:07,157 Evaluating as a multi-label problem: False
2023-04-19 09:31:07,172 DEV : loss 1.3167533874511719 - f1-score (micro avg)  0.0446
2023-04-19 09:31:07,182 BAD EPOCHS (no improvement): 1





2023-04-19 09:31:07,191 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:07,432 epoch 4 - iter 2/20 - loss 1.09564136 - time (sec): 0.24 - samples/sec: 4922.89 - lr: 0.100000
2023-04-19 09:31:07,866 epoch 4 - iter 4/20 - loss 1.10364967 - time (sec): 0.67 - samples/sec: 4009.17 - lr: 0.100000
2023-04-19 09:31:08,222 epoch 4 - iter 6/20 - loss 1.06774389 - time (sec): 1.03 - samples/sec: 4253.83 - lr: 0.100000
2023-04-19 09:31:08,505 epoch 4 - iter 8/20 - loss 1.03685302 - time (sec): 1.31 - samples/sec: 4421.18 - lr: 0.100000
2023-04-19 09:31:08,857 epoch 4 - iter 10/20 - loss 1.04423538 - time (sec): 1.66 - samples/sec: 4330.97 - lr: 0.100000
2023-04-19 09:31:09,110 epoch 4 - iter 12/20 - loss 1.02344532 - time (sec): 1.92 - samples/sec: 4396.69 - lr: 0.100000
2023-04-19 09:31:09,466 epoch 4 - iter 14/20 - loss 1.00293596 - time (sec): 2.27 - samples/sec: 4349.04 - lr: 0.100000
2023-04-19 09:31:09,933 epoch 4 - iter 

100%|██████████| 3/3 [00:00<00:00,  4.66it/s]

2023-04-19 09:31:11,187 Evaluating as a multi-label problem: False
2023-04-19 09:31:11,207 DEV : loss 1.1087384223937988 - f1-score (micro avg)  0.2302
2023-04-19 09:31:11,222 BAD EPOCHS (no improvement): 0
2023-04-19 09:31:11,231 saving best model





2023-04-19 09:31:13,975 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:14,436 epoch 5 - iter 2/20 - loss 0.93085511 - time (sec): 0.45 - samples/sec: 3265.61 - lr: 0.100000
2023-04-19 09:31:14,893 epoch 5 - iter 4/20 - loss 0.88768293 - time (sec): 0.91 - samples/sec: 3214.64 - lr: 0.100000
2023-04-19 09:31:15,523 epoch 5 - iter 6/20 - loss 0.95504054 - time (sec): 1.54 - samples/sec: 2761.42 - lr: 0.100000
2023-04-19 09:31:16,028 epoch 5 - iter 8/20 - loss 0.95521872 - time (sec): 2.04 - samples/sec: 2802.54 - lr: 0.100000
2023-04-19 09:31:16,414 epoch 5 - iter 10/20 - loss 0.94641905 - time (sec): 2.43 - samples/sec: 2895.10 - lr: 0.100000
2023-04-19 09:31:16,983 epoch 5 - iter 12/20 - loss 0.95350592 - time (sec): 3.00 - samples/sec: 2852.79 - lr: 0.100000
2023-04-19 09:31:17,308 epoch 5 - iter 14/20 - loss 0.92766856 - time (sec): 3.32 - samples/sec: 3037.30 - lr: 0.100000
2023-04-19 09:31:17,680 epoch 5 - iter 

100%|██████████| 3/3 [00:00<00:00,  4.97it/s]

2023-04-19 09:31:18,848 Evaluating as a multi-label problem: False
2023-04-19 09:31:18,862 DEV : loss 1.078316330909729 - f1-score (micro avg)  0.2846
2023-04-19 09:31:18,872 BAD EPOCHS (no improvement): 0





2023-04-19 09:31:18,879 saving best model
2023-04-19 09:31:20,909 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:21,425 epoch 6 - iter 2/20 - loss 0.90078799 - time (sec): 0.51 - samples/sec: 3004.75 - lr: 0.100000
2023-04-19 09:31:21,725 epoch 6 - iter 4/20 - loss 0.95396633 - time (sec): 0.81 - samples/sec: 3398.76 - lr: 0.100000
2023-04-19 09:31:22,100 epoch 6 - iter 6/20 - loss 0.89845003 - time (sec): 1.18 - samples/sec: 3624.03 - lr: 0.100000
2023-04-19 09:31:22,500 epoch 6 - iter 8/20 - loss 0.87480036 - time (sec): 1.58 - samples/sec: 3557.89 - lr: 0.100000
2023-04-19 09:31:22,869 epoch 6 - iter 10/20 - loss 0.84578787 - time (sec): 1.95 - samples/sec: 3683.89 - lr: 0.100000
2023-04-19 09:31:23,162 epoch 6 - iter 12/20 - loss 0.82276214 - time (sec): 2.25 - samples/sec: 3814.99 - lr: 0.100000
2023-04-19 09:31:23,475 epoch 6 - iter 14/20 - loss 0.83716117 - time (sec): 2.56 - samples/sec: 3994.96 - lr: 0.1000

100%|██████████| 3/3 [00:00<00:00,  4.93it/s]

2023-04-19 09:31:25,016 Evaluating as a multi-label problem: False
2023-04-19 09:31:25,031 DEV : loss 0.9068048000335693 - f1-score (micro avg)  0.2363
2023-04-19 09:31:25,041 BAD EPOCHS (no improvement): 1





2023-04-19 09:31:25,047 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:25,296 epoch 7 - iter 2/20 - loss 0.77802929 - time (sec): 0.24 - samples/sec: 4855.16 - lr: 0.100000
2023-04-19 09:31:25,634 epoch 7 - iter 4/20 - loss 0.75023800 - time (sec): 0.58 - samples/sec: 4650.66 - lr: 0.100000
2023-04-19 09:31:25,914 epoch 7 - iter 6/20 - loss 0.78326454 - time (sec): 0.86 - samples/sec: 4881.15 - lr: 0.100000
2023-04-19 09:31:26,220 epoch 7 - iter 8/20 - loss 0.78997560 - time (sec): 1.17 - samples/sec: 4804.92 - lr: 0.100000
2023-04-19 09:31:26,548 epoch 7 - iter 10/20 - loss 0.79227005 - time (sec): 1.50 - samples/sec: 4607.96 - lr: 0.100000
2023-04-19 09:31:26,876 epoch 7 - iter 12/20 - loss 0.79241818 - time (sec): 1.82 - samples/sec: 4552.43 - lr: 0.100000
2023-04-19 09:31:27,204 epoch 7 - iter 14/20 - loss 0.79413650 - time (sec): 2.15 - samples/sec: 4435.57 - lr: 0.100000
2023-04-19 09:31:27,682 epoch 7 - iter 

100%|██████████| 3/3 [00:01<00:00,  2.40it/s]

2023-04-19 09:31:30,393 Evaluating as a multi-label problem: False
2023-04-19 09:31:30,418 DEV : loss 0.7715145945549011 - f1-score (micro avg)  0.3214
2023-04-19 09:31:30,435 BAD EPOCHS (no improvement): 0
2023-04-19 09:31:30,444 saving best model





2023-04-19 09:31:33,035 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:33,509 epoch 8 - iter 2/20 - loss 0.66492973 - time (sec): 0.46 - samples/sec: 2911.10 - lr: 0.100000
2023-04-19 09:31:33,925 epoch 8 - iter 4/20 - loss 0.73589441 - time (sec): 0.88 - samples/sec: 3291.11 - lr: 0.100000
2023-04-19 09:31:34,366 epoch 8 - iter 6/20 - loss 0.70282009 - time (sec): 1.32 - samples/sec: 3254.68 - lr: 0.100000
2023-04-19 09:31:34,673 epoch 8 - iter 8/20 - loss 0.71342034 - time (sec): 1.63 - samples/sec: 3420.37 - lr: 0.100000
2023-04-19 09:31:35,012 epoch 8 - iter 10/20 - loss 0.68261798 - time (sec): 1.97 - samples/sec: 3600.61 - lr: 0.100000
2023-04-19 09:31:35,400 epoch 8 - iter 12/20 - loss 0.70294598 - time (sec): 2.35 - samples/sec: 3643.78 - lr: 0.100000
2023-04-19 09:31:35,770 epoch 8 - iter 14/20 - loss 0.69454059 - time (sec): 2.72 - samples/sec: 3716.22 - lr: 0.100000
2023-04-19 09:31:36,139 epoch 8 - iter 

100%|██████████| 3/3 [00:00<00:00,  4.88it/s]

2023-04-19 09:31:37,400 Evaluating as a multi-label problem: False
2023-04-19 09:31:37,415 DEV : loss 0.8505101203918457 - f1-score (micro avg)  0.3103





2023-04-19 09:31:37,425 BAD EPOCHS (no improvement): 1
2023-04-19 09:31:37,432 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:37,797 epoch 9 - iter 2/20 - loss 0.70203037 - time (sec): 0.36 - samples/sec: 4461.13 - lr: 0.100000
2023-04-19 09:31:38,099 epoch 9 - iter 4/20 - loss 0.66501340 - time (sec): 0.66 - samples/sec: 4806.55 - lr: 0.100000
2023-04-19 09:31:38,388 epoch 9 - iter 6/20 - loss 0.69828011 - time (sec): 0.95 - samples/sec: 4773.90 - lr: 0.100000
2023-04-19 09:31:38,726 epoch 9 - iter 8/20 - loss 0.69107463 - time (sec): 1.29 - samples/sec: 4506.64 - lr: 0.100000
2023-04-19 09:31:39,080 epoch 9 - iter 10/20 - loss 0.69721045 - time (sec): 1.65 - samples/sec: 4482.90 - lr: 0.100000
2023-04-19 09:31:39,349 epoch 9 - iter 12/20 - loss 0.71791867 - time (sec): 1.91 - samples/sec: 4512.18 - lr: 0.100000
2023-04-19 09:31:39,791 epoch 9 - iter 14/20 - loss 0.70872486 - time (sec): 2.36 - samples/sec: 4352.97

100%|██████████| 3/3 [00:00<00:00,  4.98it/s]

2023-04-19 09:31:41,313 Evaluating as a multi-label problem: False
2023-04-19 09:31:41,328 DEV : loss 0.7049003839492798 - f1-score (micro avg)  0.323





2023-04-19 09:31:41,340 BAD EPOCHS (no improvement): 0
2023-04-19 09:31:41,347 saving best model
2023-04-19 09:31:43,386 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:43,814 epoch 10 - iter 2/20 - loss 0.63216632 - time (sec): 0.42 - samples/sec: 3452.63 - lr: 0.100000
2023-04-19 09:31:44,285 epoch 10 - iter 4/20 - loss 0.67945410 - time (sec): 0.89 - samples/sec: 3146.03 - lr: 0.100000
2023-04-19 09:31:44,814 epoch 10 - iter 6/20 - loss 0.67360779 - time (sec): 1.42 - samples/sec: 3039.69 - lr: 0.100000
2023-04-19 09:31:45,173 epoch 10 - iter 8/20 - loss 0.69221979 - time (sec): 1.78 - samples/sec: 3191.86 - lr: 0.100000
2023-04-19 09:31:45,554 epoch 10 - iter 10/20 - loss 0.69951986 - time (sec): 2.16 - samples/sec: 3195.13 - lr: 0.100000
2023-04-19 09:31:46,019 epoch 10 - iter 12/20 - loss 0.68018192 - time (sec): 2.62 - samples/sec: 3214.12 - lr: 0.100000
2023-04-19 09:31:46,675 epoch 10 - iter 14/20 - loss 0.6

100%|██████████| 3/3 [00:00<00:00,  3.07it/s]

2023-04-19 09:31:48,880 Evaluating as a multi-label problem: False
2023-04-19 09:31:48,902 DEV : loss 0.6637966632843018 - f1-score (micro avg)  0.3123
2023-04-19 09:31:48,917 BAD EPOCHS (no improvement): 1
2023-04-19 09:31:48,924 ----------------------------------------------------------------------------------------------------





2023-04-19 09:31:49,347 epoch 11 - iter 2/20 - loss 0.65715643 - time (sec): 0.42 - samples/sec: 3075.41 - lr: 0.100000
2023-04-19 09:31:49,813 epoch 11 - iter 4/20 - loss 0.64153909 - time (sec): 0.89 - samples/sec: 2681.56 - lr: 0.100000
2023-04-19 09:31:50,264 epoch 11 - iter 6/20 - loss 0.66250041 - time (sec): 1.34 - samples/sec: 2859.38 - lr: 0.100000
2023-04-19 09:31:50,764 epoch 11 - iter 8/20 - loss 0.66518183 - time (sec): 1.84 - samples/sec: 2865.34 - lr: 0.100000
2023-04-19 09:31:51,168 epoch 11 - iter 10/20 - loss 0.66176437 - time (sec): 2.24 - samples/sec: 2928.85 - lr: 0.100000
2023-04-19 09:31:51,766 epoch 11 - iter 12/20 - loss 0.64720432 - time (sec): 2.84 - samples/sec: 2899.76 - lr: 0.100000
2023-04-19 09:31:52,137 epoch 11 - iter 14/20 - loss 0.64944374 - time (sec): 3.21 - samples/sec: 2991.88 - lr: 0.100000
2023-04-19 09:31:52,508 epoch 11 - iter 16/20 - loss 0.63207828 - time (sec): 3.58 - samples/sec: 3075.92 - lr: 0.100000
2023-04-19 09:31:53,007 epoch 11 - i

100%|██████████| 3/3 [00:00<00:00,  3.70it/s]

2023-04-19 09:31:54,307 Evaluating as a multi-label problem: False
2023-04-19 09:31:54,321 DEV : loss 0.6811742782592773 - f1-score (micro avg)  0.3509
2023-04-19 09:31:54,332 BAD EPOCHS (no improvement): 0
2023-04-19 09:31:54,339 saving best model





2023-04-19 09:31:56,342 ----------------------------------------------------------------------------------------------------
2023-04-19 09:31:56,923 epoch 12 - iter 2/20 - loss 0.57440710 - time (sec): 0.56 - samples/sec: 2809.61 - lr: 0.100000
2023-04-19 09:31:57,295 epoch 12 - iter 4/20 - loss 0.55861594 - time (sec): 0.93 - samples/sec: 3322.05 - lr: 0.100000
2023-04-19 09:31:57,626 epoch 12 - iter 6/20 - loss 0.57906778 - time (sec): 1.26 - samples/sec: 3550.01 - lr: 0.100000
2023-04-19 09:31:57,971 epoch 12 - iter 8/20 - loss 0.58076104 - time (sec): 1.61 - samples/sec: 3654.02 - lr: 0.100000
2023-04-19 09:31:58,368 epoch 12 - iter 10/20 - loss 0.64448154 - time (sec): 2.00 - samples/sec: 3720.06 - lr: 0.100000
2023-04-19 09:31:58,673 epoch 12 - iter 12/20 - loss 0.65472013 - time (sec): 2.31 - samples/sec: 3876.27 - lr: 0.100000
2023-04-19 09:31:58,989 epoch 12 - iter 14/20 - loss 0.64214567 - time (sec): 2.63 - samples/sec: 3943.56 - lr: 0.100000
2023-04-19 09:31:59,358 epoch 12

100%|██████████| 3/3 [00:00<00:00,  3.16it/s]

2023-04-19 09:32:01,025 Evaluating as a multi-label problem: False
2023-04-19 09:32:01,045 DEV : loss 0.6732955574989319 - f1-score (micro avg)  0.4582
2023-04-19 09:32:01,061 BAD EPOCHS (no improvement): 0
2023-04-19 09:32:01,066 saving best model





2023-04-19 09:32:03,663 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:04,176 epoch 13 - iter 2/20 - loss 0.58336662 - time (sec): 0.50 - samples/sec: 2586.61 - lr: 0.100000
2023-04-19 09:32:04,650 epoch 13 - iter 4/20 - loss 0.58496441 - time (sec): 0.97 - samples/sec: 2897.10 - lr: 0.100000
2023-04-19 09:32:05,082 epoch 13 - iter 6/20 - loss 0.60679968 - time (sec): 1.40 - samples/sec: 2908.82 - lr: 0.100000
2023-04-19 09:32:05,476 epoch 13 - iter 8/20 - loss 0.60719910 - time (sec): 1.80 - samples/sec: 3013.96 - lr: 0.100000
2023-04-19 09:32:06,003 epoch 13 - iter 10/20 - loss 0.60563541 - time (sec): 2.32 - samples/sec: 3007.03 - lr: 0.100000
2023-04-19 09:32:06,467 epoch 13 - iter 12/20 - loss 0.60391788 - time (sec): 2.79 - samples/sec: 2976.28 - lr: 0.100000
2023-04-19 09:32:06,838 epoch 13 - iter 14/20 - loss 0.58399189 - time (sec): 3.16 - samples/sec: 3126.08 - lr: 0.100000
2023-04-19 09:32:07,144 epoch 13

100%|██████████| 3/3 [00:00<00:00,  5.01it/s]

2023-04-19 09:32:08,570 Evaluating as a multi-label problem: False
2023-04-19 09:32:08,589 DEV : loss 0.6265740394592285 - f1-score (micro avg)  0.3946





2023-04-19 09:32:08,600 BAD EPOCHS (no improvement): 1
2023-04-19 09:32:08,608 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:08,925 epoch 14 - iter 2/20 - loss 0.62935671 - time (sec): 0.31 - samples/sec: 4572.86 - lr: 0.100000
2023-04-19 09:32:09,212 epoch 14 - iter 4/20 - loss 0.55427763 - time (sec): 0.60 - samples/sec: 4614.44 - lr: 0.100000
2023-04-19 09:32:09,607 epoch 14 - iter 6/20 - loss 0.56020044 - time (sec): 1.00 - samples/sec: 4185.15 - lr: 0.100000
2023-04-19 09:32:10,092 epoch 14 - iter 8/20 - loss 0.55483127 - time (sec): 1.48 - samples/sec: 3731.25 - lr: 0.100000
2023-04-19 09:32:10,561 epoch 14 - iter 10/20 - loss 0.56153392 - time (sec): 1.95 - samples/sec: 3572.91 - lr: 0.100000
2023-04-19 09:32:10,989 epoch 14 - iter 12/20 - loss 0.57551034 - time (sec): 2.38 - samples/sec: 3424.89 - lr: 0.100000
2023-04-19 09:32:11,434 epoch 14 - iter 14/20 - loss 0.56743252 - time (sec): 2.82 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.13it/s]

2023-04-19 09:32:13,826 Evaluating as a multi-label problem: False
2023-04-19 09:32:13,851 DEV : loss 0.5792855024337769 - f1-score (micro avg)  0.4759
2023-04-19 09:32:13,866 BAD EPOCHS (no improvement): 0
2023-04-19 09:32:13,881 saving best model





2023-04-19 09:32:15,925 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:16,510 epoch 15 - iter 2/20 - loss 0.56496252 - time (sec): 0.46 - samples/sec: 2868.49 - lr: 0.100000
2023-04-19 09:32:16,972 epoch 15 - iter 4/20 - loss 0.56339330 - time (sec): 0.93 - samples/sec: 2939.33 - lr: 0.100000
2023-04-19 09:32:17,439 epoch 15 - iter 6/20 - loss 0.55600886 - time (sec): 1.39 - samples/sec: 3061.75 - lr: 0.100000
2023-04-19 09:32:17,856 epoch 15 - iter 8/20 - loss 0.57332844 - time (sec): 1.81 - samples/sec: 3079.34 - lr: 0.100000
2023-04-19 09:32:18,325 epoch 15 - iter 10/20 - loss 0.56334826 - time (sec): 2.28 - samples/sec: 3078.96 - lr: 0.100000
2023-04-19 09:32:18,784 epoch 15 - iter 12/20 - loss 0.56280099 - time (sec): 2.74 - samples/sec: 3083.10 - lr: 0.100000
2023-04-19 09:32:19,255 epoch 15 - iter 14/20 - loss 0.56674287 - time (sec): 3.21 - samples/sec: 3106.54 - lr: 0.100000
2023-04-19 09:32:19,688 epoch 15

100%|██████████| 3/3 [00:00<00:00,  3.12it/s]

2023-04-19 09:32:21,717 Evaluating as a multi-label problem: False
2023-04-19 09:32:21,739 DEV : loss 0.5914086103439331 - f1-score (micro avg)  0.457
2023-04-19 09:32:21,756 BAD EPOCHS (no improvement): 1
2023-04-19 09:32:21,762 ----------------------------------------------------------------------------------------------------





2023-04-19 09:32:22,290 epoch 16 - iter 2/20 - loss 0.56752940 - time (sec): 0.52 - samples/sec: 2622.19 - lr: 0.100000
2023-04-19 09:32:22,630 epoch 16 - iter 4/20 - loss 0.53687792 - time (sec): 0.86 - samples/sec: 3295.20 - lr: 0.100000
2023-04-19 09:32:22,964 epoch 16 - iter 6/20 - loss 0.50631648 - time (sec): 1.20 - samples/sec: 3713.48 - lr: 0.100000
2023-04-19 09:32:23,475 epoch 16 - iter 8/20 - loss 0.53065691 - time (sec): 1.71 - samples/sec: 3547.00 - lr: 0.100000
2023-04-19 09:32:24,003 epoch 16 - iter 10/20 - loss 0.52541076 - time (sec): 2.24 - samples/sec: 3333.09 - lr: 0.100000
2023-04-19 09:32:24,466 epoch 16 - iter 12/20 - loss 0.53750639 - time (sec): 2.70 - samples/sec: 3241.47 - lr: 0.100000
2023-04-19 09:32:24,741 epoch 16 - iter 14/20 - loss 0.53935830 - time (sec): 2.98 - samples/sec: 3306.74 - lr: 0.100000
2023-04-19 09:32:25,116 epoch 16 - iter 16/20 - loss 0.53078759 - time (sec): 3.35 - samples/sec: 3404.86 - lr: 0.100000
2023-04-19 09:32:25,555 epoch 16 - i

100%|██████████| 3/3 [00:00<00:00,  3.75it/s]

2023-04-19 09:32:26,771 Evaluating as a multi-label problem: False
2023-04-19 09:32:26,796 DEV : loss 0.557583212852478 - f1-score (micro avg)  0.4367
2023-04-19 09:32:26,817 BAD EPOCHS (no improvement): 2
2023-04-19 09:32:26,826 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:27,236 epoch 17 - iter 2/20 - loss 0.53627022 - time (sec): 0.40 - samples/sec: 3245.74 - lr: 0.100000
2023-04-19 09:32:27,835 epoch 17 - iter 4/20 - loss 0.49791989 - time (sec): 1.00 - samples/sec: 2944.28 - lr: 0.100000
2023-04-19 09:32:28,317 epoch 17 - iter 6/20 - loss 0.50649307 - time (sec): 1.49 - samples/sec: 3185.32 - lr: 0.100000
2023-04-19 09:32:28,555 epoch 17 - iter 8/20 - loss 0.51268502 - time (sec): 1.72 - samples/sec: 3479.92 - lr: 0.100000
2023-04-19 09:32:28,860 epoch 17 - iter 10/20 - loss 0.50858502 - time (sec): 2.03 - samples/sec: 3567.62 - lr: 0.100000
2023-04-19 09:32:29,197 epoch 17 - iter 12/20 - loss 0.50142185 - time (sec): 2.37 - samples/sec: 3709.71 - lr: 0.100000
2023-04-19 09:32:29,563 epoch 17 - iter 14/20 - loss 0.49984055 - time (sec): 2.73 - samples/sec: 3721.87 - lr: 0.100000
2023-04-19 09:32:29,850 epoch 17 - iter 16/20 - loss 0.49102590 - time (sec): 3.02 - samples/sec: 3874.37 - lr: 0.100000
2023-04-19 09:32:30,109 epoch 17 - i

100%|██████████| 3/3 [00:00<00:00,  4.94it/s]

2023-04-19 09:32:31,063 Evaluating as a multi-label problem: False
2023-04-19 09:32:31,077 DEV : loss 0.6090171933174133 - f1-score (micro avg)  0.3642
2023-04-19 09:32:31,088 BAD EPOCHS (no improvement): 3
2023-04-19 09:32:31,094 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:31,432 epoch 18 - iter 2/20 - loss 0.54254503 - time (sec): 0.34 - samples/sec: 3917.92 - lr: 0.100000
2023-04-19 09:32:31,754 epoch 18 - iter 4/20 - loss 0.54586662 - time (sec): 0.66 - samples/sec: 4270.01 - lr: 0.100000
2023-04-19 09:32:32,061 epoch 18 - iter 6/20 - loss 0.52117453 - time (sec): 0.97 - samples/sec: 4350.60 - lr: 0.100000
2023-04-19 09:32:32,542 epoch 18 - iter 8/20 - loss 0.50178635 - time (sec): 1.45 - samples/sec: 4019.55 - lr: 0.100000
2023-04-19 09:32:33,093 epoch 18 - iter 10/20 - loss 0.52351530 - time (sec): 2.00 - samples/sec: 3691.37 - lr: 0.100000
2023-04-19 09:32:33,554 epoch 18 - iter 12/20 - loss 0.51498398 - time (sec): 2.46 - samples/sec: 3616.42 - lr: 0.100000
2023-04-19 09:32:33,928 epoch 18 - iter 14/20 - loss 0.51213270 - time (sec): 2.83 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.01it/s]

2023-04-19 09:32:36,177 Evaluating as a multi-label problem: False
2023-04-19 09:32:36,198 DEV : loss 0.48969781398773193 - f1-score (micro avg)  0.4451
2023-04-19 09:32:36,215 Epoch    18: reducing learning rate of group 0 to 5.0000e-02.
2023-04-19 09:32:36,220 BAD EPOCHS (no improvement): 4
2023-04-19 09:32:36,224 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:36,720 epoch 19 - iter 2/20 - loss 0.49211277 - time (sec): 0.49 - samples/sec: 2553.16 - lr: 0.050000
2023-04-19 09:32:37,158 epoch 19 - iter 4/20 - loss 0.43029247 - time (sec): 0.93 - samples/sec: 2946.61 - lr: 0.050000
2023-04-19 09:32:37,651 epoch 19 - iter 6/20 - loss 0.41170542 - time (sec): 1.42 - samples/sec: 2811.19 - lr: 0.050000
2023-04-19 09:32:38,020 epoch 19 - iter 8/20 - loss 0.42476385 - time (sec): 1.79 - samples/sec: 3082.80 - lr: 0.050000
2023-04-19 09:32:38,332 epoch 19 - iter 10/20 - loss 0.41812207 - time (sec): 2.11 - samples/sec: 3289.46 - lr: 0.050000
2023-04-19 09:32:38,711 epoch 19 - iter 12/20 - loss 0.44553911 - time (sec): 2.48 - samples/sec: 3379.63 - lr: 0.050000
2023-04-19 09:32:39,029 epoch 19 - iter 14/20 - loss 0.44923877 - time (sec): 2.80 - samples/sec: 3477.76 - lr: 0.050000
2023-04-19 09:32:39,437 epoch 19 - iter 16/20 - loss 0.45543712 - time (sec): 3.21 - samples/sec: 3563.16 - lr: 0.050000
2023-04-19 09:32:39,875 epoch 19 - i

100%|██████████| 3/3 [00:00<00:00,  4.96it/s]

2023-04-19 09:32:40,803 Evaluating as a multi-label problem: False
2023-04-19 09:32:40,817 DEV : loss 0.45113518834114075 - f1-score (micro avg)  0.5233
2023-04-19 09:32:40,828 BAD EPOCHS (no improvement): 0
2023-04-19 09:32:40,842 saving best model
2023-04-19 09:32:42,872 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:43,297 epoch 20 - iter 2/20 - loss 0.45449918 - time (sec): 0.41 - samples/sec: 3645.22 - lr: 0.050000
2023-04-19 09:32:43,672 epoch 20 - iter 4/20 - loss 0.46326154 - time (sec): 0.78 - samples/sec: 3757.86 - lr: 0.050000
2023-04-19 09:32:44,153 epoch 20 - iter 6/20 - loss 0.47724961 - time (sec): 1.26 - samples/sec: 3743.46 - lr: 0.050000
2023-04-19 09:32:44,483 epoch 20 - iter 8/20 - loss 0.46361693 - time (sec): 1.59 - samples/sec: 3885.39 - lr: 0.050000
2023-04-19 09:32:44,769 epoch 20 - iter 10/20 - loss 0.47028721 - time (sec): 1.88 - samples/sec: 4008.85 - lr: 0.050000
2023-04-19 09:32:45,026 epoch 20 - iter 12/20 - loss 0.46861160 - time (sec): 2.13 - samples/sec: 4055.06 - lr: 0.050000
2023-04-19 09:32:45,316 epoch 20 - iter 14/20 - loss 0.4

100%|██████████| 3/3 [00:00<00:00,  4.80it/s]

2023-04-19 09:32:46,832 Evaluating as a multi-label problem: False
2023-04-19 09:32:46,846 DEV : loss 0.48551392555236816 - f1-score (micro avg)  0.4904
2023-04-19 09:32:46,859 BAD EPOCHS (no improvement): 1
2023-04-19 09:32:46,868 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:47,164 epoch 21 - iter 2/20 - loss 0.41759146 - time (sec): 0.29 - samples/sec: 4648.79 - lr: 0.050000
2023-04-19 09:32:47,620 epoch 21 - iter 4/20 - loss 0.38382273 - time (sec): 0.75 - samples/sec: 3977.23 - lr: 0.050000
2023-04-19 09:32:48,042 epoch 21 - iter 6/20 - loss 0.39852055 - time (sec): 1.17 - samples/sec: 3786.61 - lr: 0.050000
2023-04-19 09:32:48,582 epoch 21 - iter 8/20 - loss 0.42630035 - time (sec): 1.71 - samples/sec: 3393.20 - lr: 0.050000
2023-04-19 09:32:49,234 epoch 21 - iter 10/20 - loss 0.41817946 - time (sec): 2.36 - samples/sec: 3132.93 - lr: 0.050000
2023-04-19 09:32:49,772 epoch 21 - iter 12/20 - loss 0.42074276 - time (sec): 2.90 - samples/sec: 3099.99 - lr: 0.050000
2023-04-19 09:32:50,283 epoch 21 - iter 14/20 - loss 0.41401239 - time (sec): 3.41 - samples/sec: 3071.68 - lr: 0.050000
2023-04-19 09:32:50,831 epoch 21

100%|██████████| 3/3 [00:01<00:00,  2.24it/s]

2023-04-19 09:32:53,123 Evaluating as a multi-label problem: False
2023-04-19 09:32:53,159 DEV : loss 0.4767385423183441 - f1-score (micro avg)  0.5029
2023-04-19 09:32:53,180 BAD EPOCHS (no improvement): 2
2023-04-19 09:32:53,189 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:53,708 epoch 22 - iter 2/20 - loss 0.36793860 - time (sec): 0.52 - samples/sec: 2884.82 - lr: 0.050000
2023-04-19 09:32:54,112 epoch 22 - iter 4/20 - loss 0.42393059 - time (sec): 0.92 - samples/sec: 2881.67 - lr: 0.050000
2023-04-19 09:32:54,635 epoch 22 - iter 6/20 - loss 0.43727550 - time (sec): 1.44 - samples/sec: 2893.21 - lr: 0.050000
2023-04-19 09:32:55,119 epoch 22 - iter 8/20 - loss 0.42216651 - time (sec): 1.93 - samples/sec: 2970.18 - lr: 0.050000
2023-04-19 09:32:55,428 epoch 22 - iter 10/20 - loss 0.42040450 - time (sec): 2.24 - samples/sec: 3186.98 - lr: 0.050000
2023-04-19 09:32:55,710 epoch 22 - iter 12/20 - loss 0.42161988 - time (sec): 2.52 - samples/sec: 3336.73 - lr: 0.050000
2023-04-19 09:32:56,033 epoch 22 - iter 14/20 - loss 0.42062096 - time (sec): 2.84 - samples/sec: 3528.06 - lr: 0.050000
2023-04-19 09:32:56,363 epoch 22 - iter 16/20 - loss 0.42685821 - time (sec): 3.17 - samples/sec: 3663.64 - lr: 0.050000
2023-04-19 09:32:56,692 epoch 22 - i

100%|██████████| 3/3 [00:00<00:00,  4.79it/s]

2023-04-19 09:32:57,582 Evaluating as a multi-label problem: False
2023-04-19 09:32:57,597 DEV : loss 0.4558273255825043 - f1-score (micro avg)  0.5061
2023-04-19 09:32:57,607 BAD EPOCHS (no improvement): 3
2023-04-19 09:32:57,616 ----------------------------------------------------------------------------------------------------
2023-04-19 09:32:58,084 epoch 23 - iter 2/20 - loss 0.37922816 - time (sec): 0.46 - samples/sec: 3122.14 - lr: 0.050000
2023-04-19 09:32:58,437 epoch 23 - iter 4/20 - loss 0.39035517 - time (sec): 0.82 - samples/sec: 3784.29 - lr: 0.050000
2023-04-19 09:32:58,771 epoch 23 - iter 6/20 - loss 0.39388498 - time (sec): 1.15 - samples/sec: 4093.19 - lr: 0.050000
2023-04-19 09:32:59,086 epoch 23 - iter 8/20 - loss 0.40702332 - time (sec): 1.46 - samples/sec: 4136.35 - lr: 0.050000
2023-04-19 09:32:59,417 epoch 23 - iter 10/20 - loss 0.42094081 - time (sec): 1.80 - samples/sec: 4171.38 - lr: 0.050000
2023-04-19 09:32:59,700 epoch 23 - iter 12/20 - loss 0.41068371 - time (sec): 2.08 - samples/sec: 4238.09 - lr: 0.050000
2023-04-19 09:32:59,971 epoch 23 - iter 14/20 - loss 0.41112623 - time (sec): 2.35 - samples/sec: 4273.76 - lr: 0.050000
2023-04-19 09:33:00,324 epoch 23

100%|██████████| 3/3 [00:00<00:00,  5.03it/s]

2023-04-19 09:33:01,533 Evaluating as a multi-label problem: False
2023-04-19 09:33:01,547 DEV : loss 0.4450351893901825 - f1-score (micro avg)  0.5133
2023-04-19 09:33:01,557 Epoch    23: reducing learning rate of group 0 to 2.5000e-02.
2023-04-19 09:33:01,561 BAD EPOCHS (no improvement): 4
2023-04-19 09:33:01,569 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:01,919 epoch 24 - iter 2/20 - loss 0.39483679 - time (sec): 0.35 - samples/sec: 4369.90 - lr: 0.025000
2023-04-19 09:33:02,255 epoch 24 - iter 4/20 - loss 0.38992237 - time (sec): 0.68 - samples/sec: 4482.15 - lr: 0.025000
2023-04-19 09:33:02,666 epoch 24 - iter 6/20 - loss 0.39590122 - time (sec): 1.09 - samples/sec: 4048.30 - lr: 0.025000
2023-04-19 09:33:03,030 epoch 24 - iter 8/20 - loss 0.41534995 - time (sec): 1.46 - samples/sec: 4053.48 - lr: 0.025000
2023-04-19 09:33:03,294 epoch 24 - iter 10/20 - loss 0.41111074 - time (sec): 1.72 - samples/sec: 4185.22 - lr: 0.025000
2023-04-19 09:33:03,571 epoch 24 - iter 12/20 - loss 0.40530280 - time (sec): 2.00 - samples/sec: 4219.21 - lr: 0.025000
2023-04-19 09:33:03,846 epoch 24 - iter 14/20 - loss 0.40657023 - time (sec): 2.27 - samples/sec: 4353.84 - lr: 0.025000
2023-04-19 09:33:04,136 epoch 24

100%|██████████| 3/3 [00:00<00:00,  3.02it/s]

2023-04-19 09:33:06,156 Evaluating as a multi-label problem: False
2023-04-19 09:33:06,178 DEV : loss 0.4331347942352295 - f1-score (micro avg)  0.5165
2023-04-19 09:33:06,197 BAD EPOCHS (no improvement): 1
2023-04-19 09:33:06,205 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:06,616 epoch 25 - iter 2/20 - loss 0.35885791 - time (sec): 0.41 - samples/sec: 3243.81 - lr: 0.025000
2023-04-19 09:33:07,120 epoch 25 - iter 4/20 - loss 0.38635084 - time (sec): 0.91 - samples/sec: 3078.89 - lr: 0.025000
2023-04-19 09:33:07,503 epoch 25 - iter 6/20 - loss 0.39949624 - time (sec): 1.29 - samples/sec: 3089.18 - lr: 0.025000
2023-04-19 09:33:07,957 epoch 25 - iter 8/20 - loss 0.39171257 - time (sec): 1.75 - samples/sec: 3130.03 - lr: 0.025000
2023-04-19 09:33:08,703 epoch 25 - iter 10/20 - loss 0.37588272 - time (sec): 2.49 - samples/sec: 2929.09 - lr: 0.025000
2023-04-19 09:33:09,154 epoch 25 - iter 12/20 - loss 0.38330449 - time (sec): 2.95 - samples/sec: 2926.26 - lr: 0.025000
2023-04-19 09:33:09,615 epoch 25 - iter 14/20 - loss 0.38493716 - time (sec): 3.41 - samples/sec: 2917.89 - lr: 0.025000
2023-04-19 09:33:10,085 epoch 25 - iter 16/20 - loss 0.39456731 - time (sec): 3.88 - samples/sec: 2993.69 - lr: 0.025000
2023-04-19 09:33:10,348 epoch 25 - i

100%|██████████| 3/3 [00:00<00:00,  4.86it/s]

2023-04-19 09:33:11,271 Evaluating as a multi-label problem: False
2023-04-19 09:33:11,301 DEV : loss 0.4221927523612976 - f1-score (micro avg)  0.5133
2023-04-19 09:33:11,311 BAD EPOCHS (no improvement): 2
2023-04-19 09:33:11,328 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:11,798 epoch 26 - iter 2/20 - loss 0.37285010 - time (sec): 0.47 - samples/sec: 3577.42 - lr: 0.025000
2023-04-19 09:33:12,167 epoch 26 - iter 4/20 - loss 0.37019951 - time (sec): 0.83 - samples/sec: 3825.05 - lr: 0.025000
2023-04-19 09:33:12,517 epoch 26 - iter 6/20 - loss 0.37932879 - time (sec): 1.18 - samples/sec: 3870.71 - lr: 0.025000
2023-04-19 09:33:12,768 epoch 26 - iter 8/20 - loss 0.38701715 - time (sec): 1.44 - samples/sec: 4106.49 - lr: 0.025000
2023-04-19 09:33:13,107 epoch 26 - iter 10/20 - loss 0.38483152 - time (sec): 1.77 - samples/sec: 4121.88 - lr: 0.025000
2023-04-19 09:33:13,431 epoch 26 - iter 12/20 - loss 0.38975462 - time (sec): 2.10 - samples/sec: 4168.04 - lr: 0.025000
2023-04-19 09

100%|██████████| 3/3 [00:00<00:00,  4.82it/s]

2023-04-19 09:33:15,356 Evaluating as a multi-label problem: False
2023-04-19 09:33:15,371 DEV : loss 0.4156075716018677 - f1-score (micro avg)  0.5103
2023-04-19 09:33:15,383 BAD EPOCHS (no improvement): 3
2023-04-19 09:33:15,388 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:15,739 epoch 27 - iter 2/20 - loss 0.36795918 - time (sec): 0.35 - samples/sec: 4371.86 - lr: 0.025000
2023-04-19 09:33:16,019 epoch 27 - iter 4/20 - loss 0.39495923 - time (sec): 0.63 - samples/sec: 4633.32 - lr: 0.025000
2023-04-19 09:33:16,408 epoch 27 - iter 6/20 - loss 0.37439782 - time (sec): 1.01 - samples/sec: 4403.47 - lr: 0.025000
2023-04-19 09:33:16,714 epoch 27 - iter 8/20 - loss 0.39307809 - time (sec): 1.32 - samples/sec: 4429.02 - lr: 0.025000
2023-04-19 09:33:17,091 epoch 27 - iter 10/20 - loss 0.37958794 - time (sec): 1.70 - samples/sec: 4384.44 - lr: 0.025000
2023-04-19 09:33:17,496 epoch 27 - iter 12/20 - loss 0.38007616 - time (sec): 2.10 - samples/sec: 4216.87 - lr: 0.025000
2023-04-19 09

100%|██████████| 3/3 [00:00<00:00,  4.99it/s]

2023-04-19 09:33:19,325 Evaluating as a multi-label problem: False
2023-04-19 09:33:19,342 DEV : loss 0.42147836089134216 - f1-score (micro avg)  0.5179
2023-04-19 09:33:19,358 Epoch    27: reducing learning rate of group 0 to 1.2500e-02.
2023-04-19 09:33:19,364 BAD EPOCHS (no improvement): 4
2023-04-19 09:33:19,369 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:19,651 epoch 28 - iter 2/20 - loss 0.34395159 - time (sec): 0.28 - samples/sec: 4621.88 - lr: 0.012500
2023-04-19 09:33:19,992 epoch 28 - iter 4/20 - loss 0.35050725 - time (sec): 0.62 - samples/sec: 4237.82 - lr: 0.012500
2023-04-19 09:33:20,435 epoch 28 - iter 6/20 - loss 0.37143149 - time (sec): 1.06 - samples/sec: 3806.28 - lr: 0.012500
2023-04-19 09:33:20,796 epoch 28 - iter 8/20 - loss 0.38161539 - time (sec): 1.42 - samples/sec: 3771.44 - lr: 0.012500
2023-04-19 09:33:21,262 epoch 28 - iter 10/20 - loss 0.37414924 - time (sec): 1.89 - samples/sec: 3671.73 - lr: 0.012500
2023-04-19 09:33:21,643 epoch 28 - iter 12/20 - loss 0.37588243 - time (sec): 2.27 - samples/sec: 3639.24 - lr: 0.012500
2023-04-19 0

100%|██████████| 3/3 [00:00<00:00,  3.08it/s]

2023-04-19 09:33:24,868 Evaluating as a multi-label problem: False
2023-04-19 09:33:24,891 DEV : loss 0.4152737259864807 - f1-score (micro avg)  0.5074
2023-04-19 09:33:24,906 BAD EPOCHS (no improvement): 1
2023-04-19 09:33:24,916 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:25,459 epoch 29 - iter 2/20 - loss 0.36926878 - time (sec): 0.54 - samples/sec: 3116.03 - lr: 0.012500
2023-04-19 09:33:25,741 epoch 29 - iter 4/20 - loss 0.36113594 - time (sec): 0.82 - samples/sec: 3797.83 - lr: 0.012500
2023-04-19 09:33:26,024 epoch 29 - iter 6/20 - loss 0.37404627 - time (sec): 1.11 - samples/sec: 4003.61 - lr: 0.012500
2023-04-19 09:33:26,327 epoch 29 - iter 8/20 - loss 0.37099417 - time (sec): 1.41 - samples/sec: 4073.08 - lr: 0.012500
2023-04-19 09:33:26,615 epoch 29 - iter 10/20 - loss 0.37197733 - time (sec): 1.70 - samples/sec: 4200.27 - lr: 0.012500
2023-04-19 09:33:26,967 epoch 29 - iter 12/20 - loss 0.37337099 - time (sec): 2.05 - samples/sec: 4194.67 - lr: 0.012500
2023-04-19 09:33:27,257 epoch 29 - iter 14/20 - loss 0.37536085 - time (sec): 2.34 - samples/sec: 4262.77 - lr: 0.012500
2023-04-19 09:33:27,578 epoch 29 - iter 16/20 - loss 0.37856912 - time (sec): 2.66 - samples/sec: 4245.40 - lr: 0.012500
2023-04-19 09:33:27,919 epoch 29 - i

100%|██████████| 3/3 [00:00<00:00,  4.95it/s]

2023-04-19 09:33:28,866 Evaluating as a multi-label problem: False
2023-04-19 09:33:28,882 DEV : loss 0.4059978127479553 - f1-score (micro avg)  0.5044
2023-04-19 09:33:28,895 BAD EPOCHS (no improvement): 2
2023-04-19 09:33:28,900 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:29,330 epoch 30 - iter 2/20 - loss 0.30916860 - time (sec): 0.42 - samples/sec: 3509.32 - lr: 0.012500
2023-04-19 09:33:29,643 epoch 30 - iter 4/20 - loss 0.38277916 - time (sec): 0.74 - samples/sec: 3776.95 - lr: 0.012500
2023-04-19 09:33:30,061 epoch 30 - iter 6/20 - loss 0.40186959 - time (sec): 1.16 - samples/sec: 3889.74 - lr: 0.012500
2023-04-19 09:33:30,328 epoch 30 - iter 8/20 - loss 0.39495076 - time (sec): 1.42 - samples/sec: 4007.37 - lr: 0.012500
2023-04-19 09:33:30,725 epoch 30 - iter 10/20 - loss 0.38797170 - time (sec): 1.82 - samples/sec: 4046.82 - lr: 0.012500
2023-04-19 09:33:31,043 epoch 30 - iter 12/20 - loss 0.38151279 - time (sec): 2.14 - samples/sec: 4217.34 - lr: 0.012500
2023-04-19 09:33:31,329 epoch 30 - iter 14/20 - loss 0.38817314 - time (sec): 2.42 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  4.85it/s]

2023-04-19 09:33:32,879 Evaluating as a multi-label problem: False
2023-04-19 09:33:32,892 DEV : loss 0.40301403403282166 - f1-score (micro avg)  0.5101
2023-04-19 09:33:32,903 BAD EPOCHS (no improvement): 3
2023-04-19 09:33:32,909 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:33,208 epoch 31 - iter 2/20 - loss 0.34839627 - time (sec): 0.29 - samples/sec: 4259.94 - lr: 0.012500
2023-04-19 09:33:33,632 epoch 31 - iter 4/20 - loss 0.40815440 - time (sec): 0.72 - samples/sec: 4033.74 - lr: 0.012500
2023-04-19 09:33:33,969 epoch 31 - iter 6/20 - loss 0.37697049 - time (sec): 1.06 - samples/sec: 4129.59 - lr: 0.012500
2023-04-19 09:33:34,284 epoch 31 - iter 8/20 - loss 0.37162832 - time (sec): 1.37 - samples/sec: 4165.85 - lr: 0.012500
2023-04-19 09:33:34,613 epoch 31 - iter 10/20 - loss 0.37027983 - time (sec): 1.70 - samples/sec: 4171.38 - lr: 0.012500
2023-04-19 09:33:34,870 epoch 31 - iter 12/20 - loss 0.37959754 - time (sec): 1.96 - samples/sec: 4240.98 - lr: 0.012500
2023-04-19 09:33:35,342 epoch 31 - iter 14/20 - loss 0.36975256 - time (sec): 2.43 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.06it/s]

2023-04-19 09:33:37,659 Evaluating as a multi-label problem: False
2023-04-19 09:33:37,680 DEV : loss 0.41470974683761597 - f1-score (micro avg)  0.5148
2023-04-19 09:33:37,700 Epoch    31: reducing learning rate of group 0 to 6.2500e-03.
2023-04-19 09:33:37,705 BAD EPOCHS (no improvement): 4
2023-04-19 09:33:37,712 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:38,167 epoch 32 - iter 2/20 - loss 0.37351335 - time (sec): 0.45 - samples/sec: 3018.26 - lr: 0.006250
2023-04-19 09:33:38,743 epoch 32 - iter 4/20 - loss 0.41449264 - time (sec): 1.03 - samples/sec: 2753.05 - lr: 0.006250
2023-04-19 09:33:39,170 epoch 32 - iter 6/20 - loss 0.39011191 - time (sec): 1.46 - samples/sec: 2907.89 - lr: 0.006250
2023-04-19 09:33:39,738 epoch 32 - iter 8/20 - loss 0.36875818 - time (sec): 2.02 - samples/sec: 2809.08 - lr: 0.006250
2023-04-19 09:33:40,405 epoch 32 - iter 10/20 - loss 0.35493189 - time (sec): 2.69 - samples/sec: 2675.08 - lr: 0.006250
2023-04-19 09:33:40,825 epoch 32 - iter 12/20 - loss 0.35919066 - time (sec): 3.11 - samples/sec: 2764.17 - lr: 0.006250
2023-04-19 09:33:41,107 epoch 32 - iter 14/20 - loss 0.36720360 - time (sec): 3.39 - samples/sec: 2940.78 - lr: 0.006250
2023-04-19 09:33:41,426 epoch 32 - iter 16/20 - loss 0.36782941 - time (sec): 3.71 - samples/sec: 3106.10 - lr: 0.006250
2023-04-19 09:33:41,758 epoch 32 - i

100%|██████████| 3/3 [00:00<00:00,  4.82it/s]

2023-04-19 09:33:42,725 Evaluating as a multi-label problem: False
2023-04-19 09:33:42,746 DEV : loss 0.4084714353084564 - f1-score (micro avg)  0.5088
2023-04-19 09:33:42,759 BAD EPOCHS (no improvement): 1
2023-04-19 09:33:42,765 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:43,155 epoch 33 - iter 2/20 - loss 0.36401862 - time (sec): 0.38 - samples/sec: 4487.40 - lr: 0.006250
2023-04-19 09:33:43,472 epoch 33 - iter 4/20 - loss 0.38146819 - time (sec): 0.70 - samples/sec: 4303.94 - lr: 0.006250
2023-04-19 09:33:43,808 epoch 33 - iter 6/20 - loss 0.37804442 - time (sec): 1.04 - samples/sec: 4329.06 - lr: 0.006250
2023-04-19 09:33:44,162 epoch 33 - iter 8/20 - loss 0.38655737 - time (sec): 1.39 - samples/sec: 4238.83 - lr: 0.006250
2023-04-19 09:33:44,438 epoch 33 - iter 10/20 - loss 0.37412458 - time (sec): 1.67 - samples/sec: 4351.86 - lr: 0.006250
2023-04-19 09:33:44,753 epoch 33 - iter 12/20 - loss 0.36795589 - time (sec): 1.98 - samples/sec: 4368.34 - lr: 0.006250
2023-04-19 09:33:45,099 epoch 33 - iter 14/20 - loss 0.36380937 - time (sec): 2.33 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  4.96it/s]

2023-04-19 09:33:46,760 Evaluating as a multi-label problem: False
2023-04-19 09:33:46,775 DEV : loss 0.406707227230072 - f1-score (micro avg)  0.5176
2023-04-19 09:33:46,786 BAD EPOCHS (no improvement): 2
2023-04-19 09:33:46,802 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:47,109 epoch 34 - iter 2/20 - loss 0.33880471 - time (sec): 0.30 - samples/sec: 4859.96 - lr: 0.006250
2023-04-19 09:33:47,420 epoch 34 - iter 4/20 - loss 0.36973554 - time (sec): 0.61 - samples/sec: 4655.93 - lr: 0.006250
2023-04-19 09:33:47,750 epoch 34 - iter 6/20 - loss 0.36537939 - time (sec): 0.94 - samples/sec: 4612.22 - lr: 0.006250
2023-04-19 09:33:48,072 epoch 34 - iter 8/20 - loss 0.36721331 - time (sec): 1.27 - samples/sec: 4554.80 - lr: 0.006250
2023-04-19 09:33:48,388 epoch 34 - iter 10/20 - loss 0.37911764 - time (sec): 1.58 - samples/sec: 4426.67 - lr: 0.006250
2023-04-19 09:33:48,829 epoch 34 - iter 12/20 - loss 0.36237873 - time (sec): 2.02 - samples/sec: 4346.16 - lr: 0.006250
2023-04-19 09:33:49,137 epoch 34 - iter 14/20 - loss 0.36146785 - time (sec): 2.33 - samples/sec: 4369.00 - lr: 0.006250
2023-04-19 09:33:49,462 epoch 34

100%|██████████| 3/3 [00:00<00:00,  4.40it/s]

2023-04-19 09:33:50,824 Evaluating as a multi-label problem: False
2023-04-19 09:33:50,846 DEV : loss 0.4112165570259094 - f1-score (micro avg)  0.5161
2023-04-19 09:33:50,866 BAD EPOCHS (no improvement): 3
2023-04-19 09:33:50,874 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:51,368 epoch 35 - iter 2/20 - loss 0.33639773 - time (sec): 0.49 - samples/sec: 2943.89 - lr: 0.006250
2023-04-19 09:33:51,780 epoch 35 - iter 4/20 - loss 0.34627016 - time (sec): 0.90 - samples/sec: 3151.34 - lr: 0.006250
2023-04-19 09:33:52,078 epoch 35 - iter 6/20 - loss 0.35230378 - time (sec): 1.20 - samples/sec: 3417.02 - lr: 0.006250
2023-04-19 09:33:52,705 epoch 35 - iter 8/20 - loss 0.34341660 - time (sec): 1.83 - samples/sec: 3264.12 - lr: 0.006250
2023-04-19 09:33:53,072 epoch 35 - iter 10/20 - loss 0.34691348 - time (sec): 2.19 - samples/sec: 3332.51 - lr: 0.006250
2023-04-19 09:33:53,515 epoch 35 - iter 12/20 - loss 0.35553121 - time (sec): 2.64 - samples/sec: 3307.41 - lr: 0.006250
2023-04-19 09:33:53,970 epoch 35 - iter 14/20 - loss 0.35414093 - time (sec): 3.09 - samples/sec: 3271.32 - lr: 0.006250
2023-04-19 09:33:54,414 epoch 35 - iter 16/20 - loss 0.35149348 - time (sec): 3.54 - samples/sec: 3226.51 - lr: 0.006250
2023-04-19 09:33:54,952 epoch 35 - i

100%|██████████| 3/3 [00:00<00:00,  3.29it/s]

2023-04-19 09:33:56,258 Evaluating as a multi-label problem: False
2023-04-19 09:33:56,277 DEV : loss 0.4082743227481842 - f1-score (micro avg)  0.5174
2023-04-19 09:33:56,288 Epoch    35: reducing learning rate of group 0 to 3.1250e-03.
2023-04-19 09:33:56,289 BAD EPOCHS (no improvement): 4
2023-04-19 09:33:56,297 ----------------------------------------------------------------------------------------------------
2023-04-19 09:33:56,649 epoch 36 - iter 2/20 - loss 0.32065441 - time (sec): 0.35 - samples/sec: 3873.41 - lr: 0.003125
2023-04-19 09:33:56,967 epoch 36 - iter 4/20 - loss 0.33604261 - time (sec): 0.66 - samples/sec: 4157.64 - lr: 0.003125
2023-04-19 09:33:57,333 epoch 36 - iter 6/20 - loss 0.35202672 - time (sec): 1.03 - samples/sec: 4084.33 - lr: 0.003125
2023-04-19 09:33:57,706 epoch 36 - iter 8/20 - loss 0.33963678 - time (sec): 1.40 - samples/sec: 4240.43 - lr: 0.003125
2023-04-19 09:33:58,037 epoch 36 - iter 10/20 - loss 0.35250546 - time (sec): 1.73 - samples/sec: 4219.19 - lr: 0.003125
2023-04-19 09:33:58,357 epoch 36 - iter 12/20 - 

100%|██████████| 3/3 [00:00<00:00,  4.98it/s]

2023-04-19 09:34:00,283 Evaluating as a multi-label problem: False
2023-04-19 09:34:00,299 DEV : loss 0.4052373766899109 - f1-score (micro avg)  0.5174
2023-04-19 09:34:00,313 BAD EPOCHS (no improvement): 1
2023-04-19 09:34:00,318 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:00,703 epoch 37 - iter 2/20 - loss 0.34589088 - time (sec): 0.38 - samples/sec: 4056.54 - lr: 0.003125
2023-04-19 09:34:00,998 epoch 37 - iter 4/20 - loss 0.34660888 - time (sec): 0.67 - samples/sec: 4329.67 - lr: 0.003125
2023-04-19 09:34:01,345 epoch 37 - iter 6/20 - loss 0.34765712 - time (sec): 1.02 - samples/sec: 4420.01 - lr: 0.003125
2023-04-19 09:34:01,613 epoch 37 - iter 8/20 - loss 0.36000429 - time (sec): 1.29 - samples/sec: 4595.68 - lr: 0.003125
2023-04-19 09:34:01,873 epoch 37 - iter 10/20 - loss 0.36117631 - time (sec): 1.55 - samples/sec: 4657.42 - lr: 0.003125
2023-04-19 09:34:02,326 epoch 37 - iter 12/20 - loss 0.35420413 - time (sec): 2.00 - samples/sec: 4299.47 - lr: 0.003125
2023-04-19 09:34:02,714 epoch 37 - iter 14/20 - loss 0.36252803 - time (sec): 2.39 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  4.92it/s]

2023-04-19 09:34:04,351 Evaluating as a multi-label problem: False
2023-04-19 09:34:04,369 DEV : loss 0.4041535556316376 - f1-score (micro avg)  0.5174
2023-04-19 09:34:04,381 BAD EPOCHS (no improvement): 2
2023-04-19 09:34:04,386 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:04,677 epoch 38 - iter 2/20 - loss 0.39117824 - time (sec): 0.29 - samples/sec: 4581.60 - lr: 0.003125
2023-04-19 09:34:05,026 epoch 38 - iter 4/20 - loss 0.38366734 - time (sec): 0.64 - samples/sec: 4398.06 - lr: 0.003125
2023-04-19 09:34:05,451 epoch 38 - iter 6/20 - loss 0.39410212 - time (sec): 1.06 - samples/sec: 4091.81 - lr: 0.003125
2023-04-19 09:34:05,820 epoch 38 - iter 8/20 - loss 0.38397313 - time (sec): 1.43 - samples/sec: 4015.60 - lr: 0.003125
2023-04-19 09:34:06,126 epoch 38 - iter 10/20 - loss 0.36641249 - time (sec): 1.74 - samples/sec: 4070.29 - lr: 0.003125
2023-04-19 09:34:06,547 epoch 38 - iter 12/20 - loss 0.36475076 - time (sec): 2.16 - samples/sec: 3971.22 - lr: 0.003125
2023-04-19 09

100%|██████████| 3/3 [00:00<00:00,  3.06it/s]

2023-04-19 09:34:09,297 Evaluating as a multi-label problem: False
2023-04-19 09:34:09,322 DEV : loss 0.4037749469280243 - f1-score (micro avg)  0.5205
2023-04-19 09:34:09,343 BAD EPOCHS (no improvement): 3
2023-04-19 09:34:09,352 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:10,135 epoch 39 - iter 2/20 - loss 0.32000198 - time (sec): 0.78 - samples/sec: 2179.21 - lr: 0.003125
2023-04-19 09:34:10,488 epoch 39 - iter 4/20 - loss 0.37500485 - time (sec): 1.13 - samples/sec: 2439.67 - lr: 0.003125
2023-04-19 09:34:10,971 epoch 39 - iter 6/20 - loss 0.36818181 - time (sec): 1.62 - samples/sec: 2593.77 - lr: 0.003125
2023-04-19 09:34:11,432 epoch 39 - iter 8/20 - loss 0.35281376 - time (sec): 2.08 - samples/sec: 2671.05 - lr: 0.003125
2023-04-19 09:34:11,765 epoch 39 - iter 10/20 - loss 0.35502606 - time (sec): 2.41 - samples/sec: 2897.74 - lr: 0.003125
2023-04-19 09:34:12,096 epoch 39 - iter 12/20 - loss 0.35098173 - time (sec): 2.74 - samples/sec: 3079.30 - lr: 0.003125
2023-04-19 09:34:12,350 epoch 39 - iter 14/20 - loss 0.35311585 - time (sec): 3.00 - samples/sec: 3237.04 - lr: 0.003125
2023-04-19 09:34:12,694 epoch 39 - iter 16/20 - loss 0.35186227 - time (sec): 3.34 - samples/sec: 3361.23 - lr: 0.003125
2023-04-19 09:34:13,038 epoch 39 - i

100%|██████████| 3/3 [00:00<00:00,  4.80it/s]

2023-04-19 09:34:14,051 Evaluating as a multi-label problem: False
2023-04-19 09:34:14,066 DEV : loss 0.40174388885498047 - f1-score (micro avg)  0.5174
2023-04-19 09:34:14,080 Epoch    39: reducing learning rate of group 0 to 1.5625e-03.
2023-04-19 09:34:14,084 BAD EPOCHS (no improvement): 4
2023-04-19 09:34:14,092 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:14,463 epoch 40 - iter 2/20 - loss 0.31092146 - time (sec): 0.37 - samples/sec: 4221.88 - lr: 0.001563
2023-04-19 09:34:14,772 epoch 40 - iter 4/20 - loss 0.34715446 - time (sec): 0.68 - samples/sec: 4420.58 - lr: 0.001563
2023-04-19 09:34:15,052 epoch 40 - iter 6/20 - loss 0.34086703 - time (sec): 0.96 - samples/sec: 4659.28 - lr: 0.001563
2023-04-19 09:34:15,350 epoch 40 - iter 8/20 - loss 0.35360283 - time (sec): 1.26 - samples/sec: 4584.37 - lr: 0.001563
2023-04-19 09:34:15,604 epoch 40 - iter 10/20 - loss 0.34569047 - time (sec): 1.51 - samples/sec: 4662.13 - lr: 0.001563
2023-04-19 09:34:15,901 epoch 40 - iter 12/20 - loss 0.33803625 - time (sec): 1.81 - samples/sec: 4753.74 - lr: 0.001563
2023-04-19 0

100%|██████████| 3/3 [00:00<00:00,  4.94it/s]

2023-04-19 09:34:17,989 Evaluating as a multi-label problem: False
2023-04-19 09:34:18,010 DEV : loss 0.40358057618141174 - f1-score (micro avg)  0.519
2023-04-19 09:34:18,025 BAD EPOCHS (no improvement): 1
2023-04-19 09:34:18,029 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:18,466 epoch 41 - iter 2/20 - loss 0.35168457 - time (sec): 0.44 - samples/sec: 3867.12 - lr: 0.001563
2023-04-19 09:34:18,749 epoch 41 - iter 4/20 - loss 0.34354527 - time (sec): 0.72 - samples/sec: 4090.40 - lr: 0.001563
2023-04-19 09:34:19,011 epoch 41 - iter 6/20 - loss 0.34472191 - time (sec): 0.98 - samples/sec: 4287.28 - lr: 0.001563
2023-04-19 09:34:19,366 epoch 41 - iter 8/20 - loss 0.35792591 - time (sec): 1.34 - samples/sec: 4225.83 - lr: 0.001563
2023-04-19 09:34:19,716 epoch 41 - iter 10/20 - loss 0.35994257 - time (sec): 1.69 - samples/sec: 4249.37 - lr: 0.001563
2023-04-19 09:34:20,052 epoch 41 - iter 12/20 - loss 0.35741220 - time (sec): 2.02 - samples/sec: 4229.37 - lr: 0.001563
2023-04-19 09:34:20,343 epoch 41 - iter 14/20 - loss 0.36248472 - time (sec): 2.31 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.47it/s]

2023-04-19 09:34:22,235 Evaluating as a multi-label problem: False
2023-04-19 09:34:22,259 DEV : loss 0.4042622745037079 - f1-score (micro avg)  0.5205
2023-04-19 09:34:22,278 BAD EPOCHS (no improvement): 2
2023-04-19 09:34:22,286 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:22,631 epoch 42 - iter 2/20 - loss 0.37282552 - time (sec): 0.34 - samples/sec: 3536.27 - lr: 0.001563
2023-04-19 09:34:23,027 epoch 42 - iter 4/20 - loss 0.38065866 - time (sec): 0.74 - samples/sec: 3498.35 - lr: 0.001563
2023-04-19 09:34:23,522 epoch 42 - iter 6/20 - loss 0.35450017 - time (sec): 1.23 - samples/sec: 3418.40 - lr: 0.001563
2023-04-19 09:34:23,963 epoch 42 - iter 8/20 - loss 0.36549036 - time (sec): 1.67 - samples/sec: 3262.91 - lr: 0.001563
2023-04-19 09:34:24,447 epoch 42 - iter 10/20 - loss 0.37573639 - time (sec): 2.16 - samples/sec: 3196.35 - lr: 0.001563
2023-04-19 09:34:24,961 epoch 42 - iter 12/20 - loss 0.36618477 - time (sec): 2.67 - samples/sec: 3145.68 - lr: 0.001563
2023-04-19 09:34:25,595 epoch 42 - iter 14/20 - loss 0.34990594 - time (sec): 3.31 - samples/sec: 3032.00 - lr: 0.001563
2023-04-19 09:34:25,988 epoch 42 - iter 16/20 - loss 0.35638705 - time (sec): 3.70 - samples/sec: 3090.42 - lr: 0.001563
2023-04-19 09:34:26,418 epoch 42 - i

100%|██████████| 3/3 [00:00<00:00,  4.82it/s]

2023-04-19 09:34:27,452 Evaluating as a multi-label problem: False
2023-04-19 09:34:27,467 DEV : loss 0.4049437940120697 - f1-score (micro avg)  0.5161
2023-04-19 09:34:27,478 BAD EPOCHS (no improvement): 3





2023-04-19 09:34:27,492 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:27,832 epoch 43 - iter 2/20 - loss 0.44264915 - time (sec): 0.34 - samples/sec: 3930.95 - lr: 0.001563
2023-04-19 09:34:28,101 epoch 43 - iter 4/20 - loss 0.37898666 - time (sec): 0.61 - samples/sec: 4401.85 - lr: 0.001563
2023-04-19 09:34:28,505 epoch 43 - iter 6/20 - loss 0.36827743 - time (sec): 1.01 - samples/sec: 4242.55 - lr: 0.001563
2023-04-19 09:34:28,881 epoch 43 - iter 8/20 - loss 0.35662583 - time (sec): 1.39 - samples/sec: 4334.78 - lr: 0.001563
2023-04-19 09:34:29,280 epoch 43 - iter 10/20 - loss 0.35461385 - time (sec): 1.79 - samples/sec: 4060.26 - lr: 0.001563
2023-04-19 09:34:29,584 epoch 43 - iter 12/20 - loss 0.35463767 - time (sec): 2.09 - samples/sec: 4070.79 - lr: 0.001563
2023-04-19 09:34:29,959 epoch 43 - iter 14/20 - loss 0.36200217 - time (sec): 2.47 - samples/sec: 4057.94 - lr: 0.001563
2023-04-19 09:34:30,238 epoch 43

100%|██████████| 3/3 [00:00<00:00,  4.83it/s]

2023-04-19 09:34:31,429 Evaluating as a multi-label problem: False
2023-04-19 09:34:31,443 DEV : loss 0.4048040211200714 - f1-score (micro avg)  0.519
2023-04-19 09:34:31,454 Epoch    43: reducing learning rate of group 0 to 7.8125e-04.
2023-04-19 09:34:31,456 BAD EPOCHS (no improvement): 4





2023-04-19 09:34:31,460 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:31,776 epoch 44 - iter 2/20 - loss 0.34372059 - time (sec): 0.31 - samples/sec: 4299.95 - lr: 0.000781
2023-04-19 09:34:32,062 epoch 44 - iter 4/20 - loss 0.34369911 - time (sec): 0.60 - samples/sec: 4581.32 - lr: 0.000781
2023-04-19 09:34:32,382 epoch 44 - iter 6/20 - loss 0.35075627 - time (sec): 0.92 - samples/sec: 4444.76 - lr: 0.000781
2023-04-19 09:34:32,667 epoch 44 - iter 8/20 - loss 0.34612971 - time (sec): 1.20 - samples/sec: 4500.11 - lr: 0.000781
2023-04-19 09:34:33,111 epoch 44 - iter 10/20 - loss 0.34859679 - time (sec): 1.65 - samples/sec: 4169.51 - lr: 0.000781
2023-04-19 09:34:33,415 epoch 44 - iter 12/20 - loss 0.35715351 - time (sec): 1.95 - samples/sec: 4304.98 - lr: 0.000781
2023-04-19 09:34:33,837 epoch 44 - iter 14/20 - loss 0.35456991 - time (sec): 2.37 - samples/sec: 4212.61 - lr: 0.000781
2023-04-19 09:34:34,089 epoch 44

100%|██████████| 3/3 [00:00<00:00,  5.06it/s]

2023-04-19 09:34:35,326 Evaluating as a multi-label problem: False
2023-04-19 09:34:35,341 DEV : loss 0.40361472964286804 - f1-score (micro avg)  0.519
2023-04-19 09:34:35,352 BAD EPOCHS (no improvement): 1
2023-04-19 09:34:35,357 ----------------------------------------------------------------------------------------------------





2023-04-19 09:34:35,711 epoch 45 - iter 2/20 - loss 0.34750778 - time (sec): 0.35 - samples/sec: 4006.30 - lr: 0.000781
2023-04-19 09:34:35,988 epoch 45 - iter 4/20 - loss 0.34939519 - time (sec): 0.63 - samples/sec: 4434.97 - lr: 0.000781
2023-04-19 09:34:36,344 epoch 45 - iter 6/20 - loss 0.35231149 - time (sec): 0.98 - samples/sec: 4333.08 - lr: 0.000781
2023-04-19 09:34:36,852 epoch 45 - iter 8/20 - loss 0.35581321 - time (sec): 1.49 - samples/sec: 3719.96 - lr: 0.000781
2023-04-19 09:34:37,313 epoch 45 - iter 10/20 - loss 0.35761367 - time (sec): 1.95 - samples/sec: 3618.48 - lr: 0.000781
2023-04-19 09:34:37,723 epoch 45 - iter 12/20 - loss 0.35073295 - time (sec): 2.36 - samples/sec: 3634.65 - lr: 0.000781
2023-04-19 09:34:38,134 epoch 45 - iter 14/20 - loss 0.35427657 - time (sec): 2.77 - samples/sec: 3580.86 - lr: 0.000781
2023-04-19 09:34:38,554 epoch 45 - iter 16/20 - loss 0.35856854 - time (sec): 3.19 - samples/sec: 3578.86 - lr: 0.000781
2023-04-19 09:34:38,973 epoch 45 - i

100%|██████████| 3/3 [00:00<00:00,  3.02it/s]

2023-04-19 09:34:40,358 Evaluating as a multi-label problem: False
2023-04-19 09:34:40,383 DEV : loss 0.4035283923149109 - f1-score (micro avg)  0.519
2023-04-19 09:34:40,401 BAD EPOCHS (no improvement): 2
2023-04-19 09:34:40,409 ----------------------------------------------------------------------------------------------------





2023-04-19 09:34:40,933 epoch 46 - iter 2/20 - loss 0.37602879 - time (sec): 0.52 - samples/sec: 2311.29 - lr: 0.000781
2023-04-19 09:34:41,328 epoch 46 - iter 4/20 - loss 0.35955044 - time (sec): 0.92 - samples/sec: 2622.96 - lr: 0.000781
2023-04-19 09:34:41,840 epoch 46 - iter 6/20 - loss 0.35038632 - time (sec): 1.43 - samples/sec: 2751.37 - lr: 0.000781
2023-04-19 09:34:42,454 epoch 46 - iter 8/20 - loss 0.34456265 - time (sec): 2.04 - samples/sec: 2664.57 - lr: 0.000781
2023-04-19 09:34:42,786 epoch 46 - iter 10/20 - loss 0.35957042 - time (sec): 2.37 - samples/sec: 2897.33 - lr: 0.000781
2023-04-19 09:34:43,117 epoch 46 - iter 12/20 - loss 0.35769316 - time (sec): 2.71 - samples/sec: 3117.65 - lr: 0.000781
2023-04-19 09:34:43,502 epoch 46 - iter 14/20 - loss 0.35947331 - time (sec): 3.09 - samples/sec: 3279.43 - lr: 0.000781
2023-04-19 09:34:43,826 epoch 46 - iter 16/20 - loss 0.36595151 - time (sec): 3.41 - samples/sec: 3362.68 - lr: 0.000781
2023-04-19 09:34:44,227 epoch 46 - i

100%|██████████| 3/3 [00:00<00:00,  4.93it/s]

2023-04-19 09:34:45,137 Evaluating as a multi-label problem: False





2023-04-19 09:34:45,165 DEV : loss 0.4045107662677765 - f1-score (micro avg)  0.519
2023-04-19 09:34:45,179 BAD EPOCHS (no improvement): 3
2023-04-19 09:34:45,187 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:45,446 epoch 47 - iter 2/20 - loss 0.37582418 - time (sec): 0.25 - samples/sec: 5261.37 - lr: 0.000781
2023-04-19 09:34:45,783 epoch 47 - iter 4/20 - loss 0.37864606 - time (sec): 0.59 - samples/sec: 4792.42 - lr: 0.000781
2023-04-19 09:34:46,188 epoch 47 - iter 6/20 - loss 0.36432727 - time (sec): 1.00 - samples/sec: 4450.46 - lr: 0.000781
2023-04-19 09:34:46,428 epoch 47 - iter 8/20 - loss 0.36607366 - time (sec): 1.23 - samples/sec: 4556.33 - lr: 0.000781
2023-04-19 09:34:46,716 epoch 47 - iter 10/20 - loss 0.36627214 - time (sec): 1.52 - samples/sec: 4594.05 - lr: 0.000781
2023-04-19 09:34:47,025 epoch 47 - iter 12/20 - loss 0.36634093 - time (sec): 1.83 - samples/sec: 4545.87 - lr: 0.000781
2023-04-19 09:

100%|██████████| 3/3 [00:00<00:00,  5.04it/s]

2023-04-19 09:34:49,051 Evaluating as a multi-label problem: False
2023-04-19 09:34:49,065 DEV : loss 0.40423327684402466 - f1-score (micro avg)  0.519
2023-04-19 09:34:49,076 Epoch    47: reducing learning rate of group 0 to 3.9063e-04.
2023-04-19 09:34:49,079 BAD EPOCHS (no improvement): 4





2023-04-19 09:34:49,084 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:49,389 epoch 48 - iter 2/20 - loss 0.34665942 - time (sec): 0.30 - samples/sec: 4323.21 - lr: 0.000391
2023-04-19 09:34:49,841 epoch 48 - iter 4/20 - loss 0.36410718 - time (sec): 0.75 - samples/sec: 4081.38 - lr: 0.000391
2023-04-19 09:34:50,125 epoch 48 - iter 6/20 - loss 0.35327952 - time (sec): 1.04 - samples/sec: 4278.62 - lr: 0.000391
2023-04-19 09:34:50,443 epoch 48 - iter 8/20 - loss 0.35453808 - time (sec): 1.35 - samples/sec: 4375.18 - lr: 0.000391
2023-04-19 09:34:50,733 epoch 48 - iter 10/20 - loss 0.36193229 - time (sec): 1.64 - samples/sec: 4486.99 - lr: 0.000391
2023-04-19 09:34:51,039 epoch 48 - iter 12/20 - loss 0.36689242 - time (sec): 1.95 - samples/sec: 4465.65 - lr: 0.000391
2023-04-19 09:34:51,305 epoch 48 - iter 14/20 - loss 0.36265959 - time (sec): 2.22 - samples/sec: 4521.98 - lr: 0.000391
2023-04-19 09:34:51,636 epoch 48

100%|██████████| 3/3 [00:00<00:00,  3.14it/s]

2023-04-19 09:34:53,278 Evaluating as a multi-label problem: False
2023-04-19 09:34:53,306 DEV : loss 0.404233455657959 - f1-score (micro avg)  0.519
2023-04-19 09:34:53,325 BAD EPOCHS (no improvement): 1
2023-04-19 09:34:53,332 ----------------------------------------------------------------------------------------------------





2023-04-19 09:34:53,768 epoch 49 - iter 2/20 - loss 0.40964554 - time (sec): 0.43 - samples/sec: 3681.75 - lr: 0.000391
2023-04-19 09:34:54,436 epoch 49 - iter 4/20 - loss 0.36452593 - time (sec): 1.10 - samples/sec: 2991.35 - lr: 0.000391
2023-04-19 09:34:54,913 epoch 49 - iter 6/20 - loss 0.35593476 - time (sec): 1.58 - samples/sec: 3101.45 - lr: 0.000391
2023-04-19 09:34:55,365 epoch 49 - iter 8/20 - loss 0.35511064 - time (sec): 2.03 - samples/sec: 3118.75 - lr: 0.000391
2023-04-19 09:34:55,827 epoch 49 - iter 10/20 - loss 0.36192060 - time (sec): 2.49 - samples/sec: 3106.50 - lr: 0.000391
2023-04-19 09:34:56,191 epoch 49 - iter 12/20 - loss 0.35993587 - time (sec): 2.86 - samples/sec: 3154.69 - lr: 0.000391
2023-04-19 09:34:56,717 epoch 49 - iter 14/20 - loss 0.35271594 - time (sec): 3.38 - samples/sec: 3100.83 - lr: 0.000391
2023-04-19 09:34:57,141 epoch 49 - iter 16/20 - loss 0.35018635 - time (sec): 3.80 - samples/sec: 3086.51 - lr: 0.000391
2023-04-19 09:34:57,553 epoch 49 - i

100%|██████████| 3/3 [00:00<00:00,  4.70it/s]

2023-04-19 09:34:58,604 Evaluating as a multi-label problem: False





2023-04-19 09:34:58,625 DEV : loss 0.4037018418312073 - f1-score (micro avg)  0.519
2023-04-19 09:34:58,636 BAD EPOCHS (no improvement): 2
2023-04-19 09:34:58,641 ----------------------------------------------------------------------------------------------------
2023-04-19 09:34:58,992 epoch 50 - iter 2/20 - loss 0.29755676 - time (sec): 0.35 - samples/sec: 4300.93 - lr: 0.000391
2023-04-19 09:34:59,304 epoch 50 - iter 4/20 - loss 0.36389844 - time (sec): 0.66 - samples/sec: 4168.58 - lr: 0.000391
2023-04-19 09:34:59,664 epoch 50 - iter 6/20 - loss 0.36966433 - time (sec): 1.02 - samples/sec: 4118.17 - lr: 0.000391
2023-04-19 09:35:00,104 epoch 50 - iter 8/20 - loss 0.35059390 - time (sec): 1.46 - samples/sec: 4058.14 - lr: 0.000391
2023-04-19 09:35:00,392 epoch 50 - iter 10/20 - loss 0.35200612 - time (sec): 1.75 - samples/sec: 4209.21 - lr: 0.000391
2023-04-19 09:35:00,728 epoch 50 - iter 12/20 - loss 0.35536234 - time (sec): 2.08 - samples/sec: 4156.94 - lr: 0.000391
2023-04-19 09:

100%|██████████| 3/3 [00:00<00:00,  4.73it/s]

2023-04-19 09:35:02,678 Evaluating as a multi-label problem: False
2023-04-19 09:35:02,694 DEV : loss 0.40357863903045654 - f1-score (micro avg)  0.519





2023-04-19 09:35:02,709 BAD EPOCHS (no improvement): 3
2023-04-19 09:35:02,714 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:03,194 epoch 51 - iter 2/20 - loss 0.35753425 - time (sec): 0.48 - samples/sec: 3680.07 - lr: 0.000391
2023-04-19 09:35:03,554 epoch 51 - iter 4/20 - loss 0.33323917 - time (sec): 0.84 - samples/sec: 3999.27 - lr: 0.000391
2023-04-19 09:35:03,913 epoch 51 - iter 6/20 - loss 0.34481057 - time (sec): 1.19 - samples/sec: 4091.52 - lr: 0.000391
2023-04-19 09:35:04,257 epoch 51 - iter 8/20 - loss 0.36119791 - time (sec): 1.54 - samples/sec: 4051.40 - lr: 0.000391
2023-04-19 09:35:04,541 epoch 51 - iter 10/20 - loss 0.35949527 - time (sec): 1.82 - samples/sec: 4137.76 - lr: 0.000391
2023-04-19 09:35:04,871 epoch 51 - iter 12/20 - loss 0.36016047 - time (sec): 2.15 - samples/sec: 4167.79 - lr: 0.000391
2023-04-19 09:35:05,132 epoch 51 - iter 14/20 - loss 0.36146433 - time (sec): 2.41 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  4.70it/s]

2023-04-19 09:35:06,716 Evaluating as a multi-label problem: False
2023-04-19 09:35:06,733 DEV : loss 0.4033395051956177 - f1-score (micro avg)  0.519





2023-04-19 09:35:06,746 Epoch    51: reducing learning rate of group 0 to 1.9531e-04.
2023-04-19 09:35:06,749 BAD EPOCHS (no improvement): 4
2023-04-19 09:35:06,759 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:07,114 epoch 52 - iter 2/20 - loss 0.39335777 - time (sec): 0.35 - samples/sec: 4361.34 - lr: 0.000195
2023-04-19 09:35:07,467 epoch 52 - iter 4/20 - loss 0.37828685 - time (sec): 0.70 - samples/sec: 4285.63 - lr: 0.000195
2023-04-19 09:35:07,810 epoch 52 - iter 6/20 - loss 0.38183875 - time (sec): 1.05 - samples/sec: 4314.64 - lr: 0.000195
2023-04-19 09:35:08,307 epoch 52 - iter 8/20 - loss 0.36577962 - time (sec): 1.54 - samples/sec: 3929.47 - lr: 0.000195
2023-04-19 09:35:08,713 epoch 52 - iter 10/20 - loss 0.36922195 - time (sec): 1.95 - samples/sec: 3817.74 - lr: 0.000195
2023-04-19 09:35:09,129 epoch 52 - iter 12/20 - loss 0.36434407 - time (sec): 2.36 - samples/sec: 3718.31 - lr: 0.000195
2023-04-19 0

100%|██████████| 3/3 [00:00<00:00,  3.00it/s]

2023-04-19 09:35:11,961 Evaluating as a multi-label problem: False
2023-04-19 09:35:11,983 DEV : loss 0.4031262695789337 - f1-score (micro avg)  0.519
2023-04-19 09:35:12,001 BAD EPOCHS (no improvement): 1
2023-04-19 09:35:12,010 ----------------------------------------------------------------------------------------------------





2023-04-19 09:35:12,505 epoch 53 - iter 2/20 - loss 0.35024196 - time (sec): 0.49 - samples/sec: 2452.43 - lr: 0.000195
2023-04-19 09:35:12,891 epoch 53 - iter 4/20 - loss 0.35865151 - time (sec): 0.88 - samples/sec: 2645.30 - lr: 0.000195
2023-04-19 09:35:13,364 epoch 53 - iter 6/20 - loss 0.35536228 - time (sec): 1.35 - samples/sec: 2691.97 - lr: 0.000195
2023-04-19 09:35:13,747 epoch 53 - iter 8/20 - loss 0.36442101 - time (sec): 1.73 - samples/sec: 2926.38 - lr: 0.000195
2023-04-19 09:35:14,087 epoch 53 - iter 10/20 - loss 0.36916153 - time (sec): 2.07 - samples/sec: 3170.36 - lr: 0.000195
2023-04-19 09:35:14,475 epoch 53 - iter 12/20 - loss 0.36383376 - time (sec): 2.46 - samples/sec: 3301.81 - lr: 0.000195
2023-04-19 09:35:14,813 epoch 53 - iter 14/20 - loss 0.36065820 - time (sec): 2.80 - samples/sec: 3460.29 - lr: 0.000195
2023-04-19 09:35:15,117 epoch 53 - iter 16/20 - loss 0.35404840 - time (sec): 3.10 - samples/sec: 3532.80 - lr: 0.000195
2023-04-19 09:35:15,454 epoch 53 - i

100%|██████████| 3/3 [00:00<00:00,  4.79it/s]

2023-04-19 09:35:16,564 Evaluating as a multi-label problem: False





2023-04-19 09:35:16,585 DEV : loss 0.40308117866516113 - f1-score (micro avg)  0.519
2023-04-19 09:35:16,599 BAD EPOCHS (no improvement): 2
2023-04-19 09:35:16,604 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:16,936 epoch 54 - iter 2/20 - loss 0.41088749 - time (sec): 0.33 - samples/sec: 4393.92 - lr: 0.000195
2023-04-19 09:35:17,196 epoch 54 - iter 4/20 - loss 0.38739083 - time (sec): 0.59 - samples/sec: 4757.09 - lr: 0.000195
2023-04-19 09:35:17,535 epoch 54 - iter 6/20 - loss 0.36174397 - time (sec): 0.93 - samples/sec: 4670.47 - lr: 0.000195
2023-04-19 09:35:17,825 epoch 54 - iter 8/20 - loss 0.36785856 - time (sec): 1.22 - samples/sec: 4615.14 - lr: 0.000195
2023-04-19 09:35:18,120 epoch 54 - iter 10/20 - loss 0.36891780 - time (sec): 1.51 - samples/sec: 4621.81 - lr: 0.000195
2023-04-19 09:35:18,407 epoch 54 - iter 12/20 - loss 0.36534085 - time (sec): 1.80 - samples/sec: 4618.13 - lr: 0.000195
2023-04-19 09

100%|██████████| 3/3 [00:00<00:00,  4.88it/s]

2023-04-19 09:35:20,489 Evaluating as a multi-label problem: False
2023-04-19 09:35:20,506 DEV : loss 0.4030531644821167 - f1-score (micro avg)  0.519





2023-04-19 09:35:20,522 BAD EPOCHS (no improvement): 3
2023-04-19 09:35:20,527 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:20,930 epoch 55 - iter 2/20 - loss 0.35968505 - time (sec): 0.40 - samples/sec: 4284.19 - lr: 0.000195
2023-04-19 09:35:21,297 epoch 55 - iter 4/20 - loss 0.34990156 - time (sec): 0.76 - samples/sec: 4259.56 - lr: 0.000195
2023-04-19 09:35:21,611 epoch 55 - iter 6/20 - loss 0.34570425 - time (sec): 1.08 - samples/sec: 4420.19 - lr: 0.000195
2023-04-19 09:35:22,034 epoch 55 - iter 8/20 - loss 0.34531203 - time (sec): 1.50 - samples/sec: 4145.30 - lr: 0.000195
2023-04-19 09:35:22,348 epoch 55 - iter 10/20 - loss 0.35468776 - time (sec): 1.82 - samples/sec: 4200.31 - lr: 0.000195
2023-04-19 09:35:22,641 epoch 55 - iter 12/20 - loss 0.34699337 - time (sec): 2.11 - samples/sec: 4238.83 - lr: 0.000195
2023-04-19 09:35:22,892 epoch 55 - iter 14/20 - loss 0.34446856 - time (sec): 2.36 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.12it/s]

2023-04-19 09:35:24,973 Evaluating as a multi-label problem: False
2023-04-19 09:35:24,993 DEV : loss 0.4028220772743225 - f1-score (micro avg)  0.519
2023-04-19 09:35:25,010 Epoch    55: reducing learning rate of group 0 to 9.7656e-05.
2023-04-19 09:35:25,012 BAD EPOCHS (no improvement): 4
2023-04-19 09:35:25,019 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:25,021 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:25,023 learning rate too small - quitting training!
2023-04-19 09:35:25,027 ----------------------------------------------------------------------------------------------------
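
The log above shows the learning rate being halved whenever the dev score goes four epochs without improving, until it falls below 1e-4 and training stops. The block below is a minimal sketch of that behaviour, not Flair's own scheduler. The function name `simulate_annealing` and the values `factor=0.5`, `patience=3` and `min_lr=1e-4` are assumptions inferred from the log lines. The starting rate 0.0015625 is assumed to be the unrounded form of the logged 0.001563, and `best=0.5233` is the rounded best dev score from this run's score history. The dev scores are the rounded values printed for epochs 40 to 55.

```python
def simulate_annealing(dev_scores, first_epoch, lr, best,
                       factor=0.5, patience=3, min_lr=1e-4):
    """Replay a plateau-based schedule and return (epoch, new_lr) reductions."""
    bad_epochs = 0
    reductions = []
    for epoch, score in enumerate(dev_scores, start=first_epoch):
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
        # Reduce once the number of bad epochs exceeds the patience.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
            reductions.append((epoch, lr))
        # Stop as soon as the learning rate falls below the floor.
        if lr < min_lr:
            break
    return reductions


# Rounded dev F1 scores for epochs 40..55, copied from the log.
dev_scores = [0.519, 0.5205, 0.5161] + [0.519] * 13

reductions = simulate_annealing(dev_scores, first_epoch=40,
                                lr=0.0015625, best=0.5233)
for epoch, lr in reductions:
    print(f"epoch {epoch}: learning rate reduced to {lr:.4e}")
```

Under these assumptions the reductions fall on epochs 43, 47, 51 and 55, matching the "reducing learning rate" lines in the log, and the last value is below the 1e-4 floor.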





2023-04-19 09:35:27,484 ----------------------------------------------------------------------------------------------------
2023-04-19 09:35:33,826 SequenceTagger predicts: Dictionary with 27 tags: O, S-Tax, B-Tax, E-Tax, I-Tax, S-Other, B-Other, E-Other, I-Other, S-Location, B-Location, E-Location, I-Location, S-Person, B-Person, E-Person, I-Person, S-Time, B-Time, E-Time, I-Time, S-Organization, B-Organization, E-Organization, I-Organization, <START>, <STOP>


100%|██████████| 3/3 [00:01<00:00,  1.54it/s]

2023-04-19 09:35:36,369 Evaluating as a multi-label problem: False
2023-04-19 09:35:36,400 0.6165	0.3981	0.4838	0.3203
2023-04-19 09:35:36,406 
Results:
- F-score (micro) 0.4838
- F-score (macro) 0.369
- Accuracy 0.3203

By class:
              precision    recall  f1-score   support

         Tax     0.6389    0.5750    0.6053        80
       Other     0.4091    0.2000    0.2687        45
        Time     0.8947    0.6800    0.7727        25
    Location     0.6000    0.0811    0.1429        37
      Person     0.4667    0.3889    0.4242        18
Organization     0.0000    0.0000    0.0000         1

   micro avg     0.6165    0.3981    0.4838       206
   macro avg     0.5016    0.3208    0.3690       206
weighted avg     0.5946    0.3981    0.4502       206

2023-04-19 09:35:36,411 ----------------------------------------------------------------------------------------------------





{'test_score': 0.4837758112094395,
 'dev_score_history': [0.024844720496894408,
  0.05333333333333334,
  0.04464285714285714,
  0.23021582733812948,
  0.2846153846153846,
  0.2363112391930836,
  0.3214285714285714,
  0.3103448275862069,
  0.32298136645962733,
  0.3123123123123123,
  0.3508771929824561,
  0.4582043343653251,
  0.39455782312925164,
  0.47593582887700536,
  0.456953642384106,
  0.4367088607594936,
  0.3641618497109827,
  0.44514106583072105,
  0.5232558139534884,
  0.49044585987261136,
  0.5028901734104047,
  0.5060975609756098,
  0.5132743362831858,
  0.5165165165165164,
  0.5132743362831858,
  0.5102639296187683,
  0.5178571428571429,
  0.5073746312684365,
  0.504398826979472,
  0.5101449275362319,
  0.5147928994082841,
  0.5087719298245613,
  0.5176470588235293,
  0.5161290322580645,
  0.5174418604651163,
  0.5174418604651163,
  0.5174418604651163,
  0.52046783625731,
  0.5174418604651163,
  0.5189504373177843,
  0.52046783625731,
  0.5161290322580645,
  0.518950437317
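
The averaged scores in the results table above can be recomputed from the per-class rows. The snippet below is a small sanity check rather than part of the training code. It uses only the rounded numbers from the table, so it agrees with the log to about four decimals. The true-positive and predicted-span counts are inferred from those rounded values and should be read as an assumption.

```python
# (precision, recall, f1, support) per class, copied from the results table.
per_class = {
    "Tax":          (0.6389, 0.5750, 0.6053, 80),
    "Other":        (0.4091, 0.2000, 0.2687, 45),
    "Time":         (0.8947, 0.6800, 0.7727, 25),
    "Location":     (0.6000, 0.0811, 0.1429, 37),
    "Person":       (0.4667, 0.3889, 0.4242, 18),
    "Organization": (0.0000, 0.0000, 0.0000, 1),
}
micro_p, micro_r = 0.6165, 0.3981  # micro-averaged precision and recall

total_support = sum(s for *_, s in per_class.values())

# Micro F1: harmonic mean of the pooled precision and recall.
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 weighted by support.
weighted_f1 = sum(f1 * s for _, _, f1, s in per_class.values()) / total_support

# Inferred counts (assumption): correct spans and predicted spans.
true_positives = round(micro_r * total_support)
predicted = round(true_positives / micro_p)
exact_micro_f1 = 2 * true_positives / (predicted + total_support)

print(f"micro F1    {micro_f1:.4f}")
print(f"macro F1    {macro_f1:.4f}")
print(f"weighted F1 {weighted_f1:.4f}")
print(f"micro F1 from inferred counts {exact_micro_f1:.6f}")
```

The gap between micro F1 (0.4838) and macro F1 (0.3690) comes mostly from `Location` and `Organization`: both have very low recall, and macro averaging gives them the same weight as `Tax` despite their smaller support.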

## For Basque

In [None]:
# First, check which labels the dataset contains
from flair.datasets import NER_BASQUE

# Load the corpus
corpus = NER_BASQUE()

label_type = 'ner'

# Get the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Get the list of labels
labels = label_dict.get_items()

# Print the list of labels
print("Labels:", labels)


2023-04-19 16:19:25,287 http://ixa2.si.ehu.eus/eiec//eiec_v1.0.tgz not found in cache, downloading to /tmp/tmp8fm8lhi4


100%|██████████| 167k/167k [00:00<00:00, 341kB/s]

2023-04-19 16:19:26,028 copying /tmp/tmp8fm8lhi4 to cache at /root/.flair/datasets/ner_basque/eiec_v1.0.tgz
2023-04-19 16:19:26,031 removing temp file /tmp/tmp8fm8lhi4
2023-04-19 16:19:26,044 Reading data from /root/.flair/datasets/ner_basque
2023-04-19 16:19:26,045 Train: /root/.flair/datasets/ner_basque/named_ent_eu.train
2023-04-19 16:19:26,046 Dev: None
2023-04-19 16:19:26,048 Test: /root/.flair/datasets/ner_basque/named_ent_eu.test





2023-04-19 16:19:27,326 Computing label dictionary. Progress:


2297it [00:00, 30642.91it/s]

2023-04-19 16:19:27,457 Dictionary created for label 'ner' with 5 values: ORG (seen 1110 times), PER (seen 1095 times), LOC (seen 1087 times), OTH (seen 135 times)
Labels: ['<unk>', 'ORG', 'PER', 'LOC', 'OTH']
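
The counts in the log line above show that the four labels are not evenly represented. The snippet below turns them into shares. It is only an illustration built on the numbers printed by `make_label_dictionary`, and the variable names are arbitrary.

```python
# Label counts copied from the "Dictionary created" log line.
label_counts = {"ORG": 1110, "PER": 1095, "LOC": 1087, "OTH": 135}

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}

for label, share in shares.items():
    print(f"{label}: {label_counts[label]:>5} spans ({share:.1%})")
```

`OTH` accounts for under 4% of the annotated spans, so its per-class score will rest on far fewer examples than the other three labels.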





In [None]:
from flair.data import Corpus, Sentence
from flair.datasets import NER_BASQUE
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger, TARSClassifier
from flair.trainers import ModelTrainer

# 1. Define label names in natural language, since some datasets use cryptic label sets
label_name_map = {'<unk>': 'Unknown',
                  'LOC': 'location entity',
                  'PER': 'person entity',
                  'ORG': 'organization entity',
                  'OTH': 'other label',
                  }

# 2. get the corpus
corpus: Corpus = NER_BASQUE(label_name_map=label_name_map)

# 3. what label do you want to predict?
label_type = 'ner'

# 4. make a label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 5. Start from the pre-trained English TARS base model.
#    Note: this TARS model is set up here, but the SequenceTagger defined in step 8
#    is the model that the next cell trains.
tars = TARSClassifier.load("tars-base")

# 5a. Alternatively, comment out the previous line and uncomment the next line to train a new TARS model from scratch
#tars = TARSClassifier(embeddings="bert-base-uncased")

# 6. Switch to a new task (TARS can handle multiple tasks, so you must define one)
tars.add_and_switch_to_new_task(task_name="ner-tagging-basque",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )


# 7. Initialize the embedding stack: GloVe word embeddings plus forward and backward Flair embeddings (all three are English models)
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 8. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

2023-04-20 09:45:39,042 Reading data from /root/.flair/datasets/ner_basque
2023-04-20 09:45:39,043 Train: /root/.flair/datasets/ner_basque/named_ent_eu.train
2023-04-20 09:45:39,045 Dev: None
2023-04-20 09:45:39,046 Test: /root/.flair/datasets/ner_basque/named_ent_eu.test
2023-04-20 09:45:40,008 Computing label dictionary. Progress:


2297it [00:00, 42392.81it/s]

2023-04-20 09:45:40,068 Dictionary created for label 'ner' with 5 values: organization entity (seen 1123 times), location entity (seen 1112 times), person entity (seen 1104 times), other label (seen 141 times)





2023-04-20 09:45:42,485 TARS initialized without a task. You need to call .add_and_switch_to_new_task() before training this model
2023-04-20 09:45:47,781 SequenceTagger predicts: Dictionary with 17 tags: O, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-location entity, B-location entity, E-location entity, I-location entity, S-person entity, B-person entity, E-person entity, I-person entity, S-other label, B-other label, E-other label, I-other label
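
The tag counts reported by `SequenceTagger` follow from the number of labels. Each label is expanded into four BIOES-style tags (`S-`, `B-`, `E-`, `I-`), plus a single `O` tag. The sketch below reproduces that count. The helper name `bioes_tags` is made up for illustration and this is not Flair's internal code. The extra two entries for `<START>` and `<STOP>` are taken from the 27-tag dictionary logged for the earlier run, and are assumed to explain `out_features=19` in the model summary.

```python
def bioes_tags(labels):
    """Expand entity labels into O plus S-/B-/E-/I- tags, in the logged order."""
    tags = ["O"]
    for label in labels:
        tags.extend(f"{prefix}-{label}" for prefix in ("S", "B", "E", "I"))
    return tags


# Basque label names after applying label_name_map.
basque_labels = ["organization entity", "location entity",
                 "person entity", "other label"]
basque_tags = bioes_tags(basque_labels)
print(len(basque_tags))      # tags listed by SequenceTagger
print(len(basque_tags) + 2)  # with <START> and <STOP> added

# Labels of the earlier run, whose logged dictionary already includes <START>/<STOP>.
earlier_labels = ["Tax", "Other", "Location", "Person", "Time", "Organization"]
print(len(bioes_tags(earlier_labels)) + 2)
```

With four labels this gives 17 tags, or 19 including the two transition markers. With six labels it gives 27, matching the earlier log.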


In [None]:
# 9. Create a ModelTrainer and start training
trainer = ModelTrainer(tagger, corpus)

trainer.train(base_path='/content/drive/MyDrive/ColabNotebooks/nermodels/basque',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)
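
The `iter …/72` counter in the training log below follows from the batch size. This is a quick arithmetic check that assumes the 2297 sentences counted while building the label dictionary form the training split. The variable names are illustrative.

```python
import math

train_sentences = 2297   # sentence count from the label-dictionary progress bar (assumed train split)
mini_batch_size = 32     # value passed to trainer.train
max_epochs = 150         # value passed to trainer.train

iterations_per_epoch = math.ceil(train_sentences / mini_batch_size)
max_updates = iterations_per_epoch * max_epochs

print(f"iterations per epoch: {iterations_per_epoch}")
print(f"upper bound on parameter updates: {max_updates}")
```

The upper bound is reached only if training runs for all 150 epochs. Training can stop earlier once the learning rate is annealed below its minimum.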

2023-04-20 09:49:50,160 ----------------------------------------------------------------------------------------------------
2023-04-20 09:49:50,163 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=19, bias=True)
  (loss_f



2023-04-20 09:49:54,397 epoch 1 - iter 7/72 - loss 1.70472670 - time (sec): 4.17 - samples/sec: 945.75 - lr: 0.100000
2023-04-20 09:49:56,920 epoch 1 - iter 14/72 - loss 1.25912823 - time (sec): 6.69 - samples/sec: 1167.14 - lr: 0.100000
2023-04-20 09:50:00,614 epoch 1 - iter 21/72 - loss 1.03510223 - time (sec): 10.39 - samples/sec: 1179.30 - lr: 0.100000
2023-04-20 09:50:03,857 epoch 1 - iter 28/72 - loss 0.92118756 - time (sec): 13.63 - samples/sec: 1185.19 - lr: 0.100000
2023-04-20 09:50:07,001 epoch 1 - iter 35/72 - loss 0.86380709 - time (sec): 16.77 - samples/sec: 1210.80 - lr: 0.100000
2023-04-20 09:50:09,466 epoch 1 - iter 42/72 - loss 0.81601820 - time (sec): 19.24 - samples/sec: 1242.01 - lr: 0.100000
2023-04-20 09:50:12,950 epoch 1 - iter 49/72 - loss 0.78744688 - time (sec): 22.72 - samples/sec: 1233.24 - lr: 0.100000
2023-04-20 09:50:16,327 epoch 1 - iter 56/72 - loss 0.75730716 - time (sec): 26.10 - samples/sec: 1209.12 - lr: 0.100000
2023-04-20 09:50:20,384 epoch 1 - it

100%|██████████| 8/8 [00:03<00:00,  2.25it/s]

2023-04-20 09:50:27,786 Evaluating as a multi-label problem: False
2023-04-20 09:50:27,803 DEV : loss 0.4311607778072357 - f1-score (micro avg)  0.2209
2023-04-20 09:50:27,828 BAD EPOCHS (no improvement): 0
2023-04-20 09:50:27,834 saving best model





2023-04-20 09:50:29,696 ----------------------------------------------------------------------------------------------------
2023-04-20 09:50:30,723 epoch 2 - iter 7/72 - loss 0.40147805 - time (sec): 1.01 - samples/sec: 3642.85 - lr: 0.100000
2023-04-20 09:50:31,865 epoch 2 - iter 14/72 - loss 0.40313576 - time (sec): 2.15 - samples/sec: 3487.52 - lr: 0.100000
2023-04-20 09:50:32,945 epoch 2 - iter 21/72 - loss 0.42177775 - time (sec): 3.23 - samples/sec: 3482.01 - lr: 0.100000
2023-04-20 09:50:34,215 epoch 2 - iter 28/72 - loss 0.41480260 - time (sec): 4.50 - samples/sec: 3445.09 - lr: 0.100000
2023-04-20 09:50:35,272 epoch 2 - iter 35/72 - loss 0.41119962 - time (sec): 5.56 - samples/sec: 3453.08 - lr: 0.100000
2023-04-20 09:50:36,331 epoch 2 - iter 42/72 - loss 0.41304041 - time (sec): 6.62 - samples/sec: 3526.94 - lr: 0.100000
2023-04-20 09:50:37,098 epoch 2 - iter 49/72 - loss 0.40687737 - time (sec): 7.38 - samples/sec: 3706.75 - lr: 0.100000
2023-04-20 09:50:37,804 epoch 2 - it

100%|██████████| 8/8 [00:01<00:00,  5.49it/s]

2023-04-20 09:50:41,383 Evaluating as a multi-label problem: False
2023-04-20 09:50:41,406 DEV : loss 0.3018215298652649 - f1-score (micro avg)  0.2782
2023-04-20 09:50:41,440 BAD EPOCHS (no improvement): 0
2023-04-20 09:50:41,447 saving best model





2023-04-20 09:50:43,364 ----------------------------------------------------------------------------------------------------
2023-04-20 09:50:44,257 epoch 3 - iter 7/72 - loss 0.32632578 - time (sec): 0.89 - samples/sec: 4350.28 - lr: 0.100000
2023-04-20 09:50:45,066 epoch 3 - iter 14/72 - loss 0.31950442 - time (sec): 1.70 - samples/sec: 4605.25 - lr: 0.100000
2023-04-20 09:50:45,987 epoch 3 - iter 21/72 - loss 0.29705222 - time (sec): 2.62 - samples/sec: 4523.73 - lr: 0.100000
2023-04-20 09:50:46,897 epoch 3 - iter 28/72 - loss 0.29542812 - time (sec): 3.53 - samples/sec: 4462.55 - lr: 0.100000
2023-04-20 09:50:48,040 epoch 3 - iter 35/72 - loss 0.29682175 - time (sec): 4.67 - samples/sec: 4177.03 - lr: 0.100000
2023-04-20 09:50:49,142 epoch 3 - iter 42/72 - loss 0.30111911 - time (sec): 5.77 - samples/sec: 4051.92 - lr: 0.100000
2023-04-20 09:50:50,326 epoch 3 - iter 49/72 - loss 0.29792615 - time (sec): 6.96 - samples/sec: 3901.00 - lr: 0.100000
2023-04-20 09:50:51,742 epoch 3 - it

100%|██████████| 8/8 [00:01<00:00,  7.16it/s]

2023-04-20 09:50:55,536 Evaluating as a multi-label problem: False
2023-04-20 09:50:55,551 DEV : loss 0.24561071395874023 - f1-score (micro avg)  0.4251





2023-04-20 09:50:55,584 BAD EPOCHS (no improvement): 0
2023-04-20 09:50:55,591 saving best model
2023-04-20 09:50:57,457 ----------------------------------------------------------------------------------------------------
2023-04-20 09:50:58,374 epoch 4 - iter 7/72 - loss 0.31540045 - time (sec): 0.90 - samples/sec: 4557.41 - lr: 0.100000
2023-04-20 09:50:59,259 epoch 4 - iter 14/72 - loss 0.29102335 - time (sec): 1.79 - samples/sec: 4492.92 - lr: 0.100000
2023-04-20 09:51:00,088 epoch 4 - iter 21/72 - loss 0.27589487 - time (sec): 2.62 - samples/sec: 4583.34 - lr: 0.100000
2023-04-20 09:51:00,900 epoch 4 - iter 28/72 - loss 0.27082085 - time (sec): 3.43 - samples/sec: 4655.86 - lr: 0.100000
2023-04-20 09:51:01,884 epoch 4 - iter 35/72 - loss 0.27326454 - time (sec): 4.41 - samples/sec: 4501.97 - lr: 0.100000
2023-04-20 09:51:03,023 epoch 4 - iter 42/72 - loss 0.26721439 - time (sec): 5.55 - samples/sec: 4268.73 - lr: 0.100000
2023-04-20 09:51:04,153 epoch 4 - iter 49/72 - loss 0.26513

100%|██████████| 8/8 [00:01<00:00,  7.55it/s]

2023-04-20 09:51:10,113 Evaluating as a multi-label problem: False
2023-04-20 09:51:10,130 DEV : loss 0.22475405037403107 - f1-score (micro avg)  0.4556





2023-04-20 09:51:10,156 BAD EPOCHS (no improvement): 0
2023-04-20 09:51:10,164 saving best model
2023-04-20 09:51:12,047 ----------------------------------------------------------------------------------------------------
2023-04-20 09:51:13,060 epoch 5 - iter 7/72 - loss 0.24002678 - time (sec): 1.00 - samples/sec: 4001.83 - lr: 0.100000
2023-04-20 09:51:13,836 epoch 5 - iter 14/72 - loss 0.23925880 - time (sec): 1.78 - samples/sec: 4440.87 - lr: 0.100000
2023-04-20 09:51:14,627 epoch 5 - iter 21/72 - loss 0.25767804 - time (sec): 2.57 - samples/sec: 4528.19 - lr: 0.100000
2023-04-20 09:51:15,386 epoch 5 - iter 28/72 - loss 0.24176917 - time (sec): 3.33 - samples/sec: 4659.67 - lr: 0.100000
2023-04-20 09:51:16,201 epoch 5 - iter 35/72 - loss 0.24226203 - time (sec): 4.14 - samples/sec: 4786.63 - lr: 0.100000
2023-04-20 09:51:17,297 epoch 5 - iter 42/72 - loss 0.23921790 - time (sec): 5.24 - samples/sec: 4571.09 - lr: 0.100000
2023-04-20 09:51:18,364 epoch 5 - iter 49/72 - loss 0.23751

100%|██████████| 8/8 [00:01<00:00,  6.80it/s]

2023-04-20 09:51:24,168 Evaluating as a multi-label problem: False
2023-04-20 09:51:24,187 DEV : loss 0.2043876349925995 - f1-score (micro avg)  0.4942
2023-04-20 09:51:24,220 BAD EPOCHS (no improvement): 0
2023-04-20 09:51:24,224 saving best model
2023-04-20 09:51:26,070 ----------------------------------------------------------------------------------------------------
2023-04-20 09:51:27,013 epoch 6 - iter 7/72 - loss 0.22277099 - time (sec): 0.89 - samples/sec: 4390.92 - lr: 0.100000
2023-04-20 09:51:27,791 epoch 6 - iter 14/72 - loss 0.21364306 - time (sec): 1.67 - samples/sec: 4773.68 - lr: 0.100000
2023-04-20 09:51:28,595 epoch 6 - iter 21/72 - loss 0.21995758 - time (sec): 2.47 - samples/sec: 4838.66 - lr: 0.100000
2023-04-20 09:51:29,285 epoch 6 - iter 28/72 - loss 0.21999619 - time (sec): 3.16 - samples/sec: 4986.55 - lr: 0.100000
2023-04-20 09:51:30,081 epoch 6 - iter 35/72 - loss 0.21939723 - time (sec): 3.96 - samples/sec: 4985.93 - lr: 0.100000
2023-04-20 09:51:30,846 epoch 6 - iter 42/72 - loss 0.21809341 - time (sec): 4.72 - samples/sec: 5030.94 - lr: 0.100000
2023-04-20 09:51:31,636 epoch 6 - iter 49/72 - loss 0.21920

100%|██████████| 8/8 [00:01<00:00,  4.60it/s]

2023-04-20 09:51:37,716 Evaluating as a multi-label problem: False
2023-04-20 09:51:37,738 DEV : loss 0.18809719383716583 - f1-score (micro avg)  0.5572
2023-04-20 09:51:37,778 BAD EPOCHS (no improvement): 0
2023-04-20 09:51:37,783 saving best model
2023-04-20 09:51:39,959 ----------------------------------------------------------------------------------------------------
2023-04-20 09:51:40,930 epoch 7 - iter 7/72 - loss 0.18572221 - time (sec): 0.93 - samples/sec: 4330.82 - lr: 0.100000
2023-04-20 09:51:41,720 epoch 7 - iter 14/72 - loss 0.20801224 - time (sec): 1.72 - samples/sec: 4617.10 - lr: 0.100000
2023-04-20 09:51:42,405 epoch 7 - iter 21/72 - loss 0.21468605 - time (sec): 2.40 - samples/sec: 4832.99 - lr: 0.100000
2023-04-20 09:51:43,249 epoch 7 - iter 28/72 - loss 0.21009678 - time (sec): 3.25 - samples/sec: 4850.41 - lr: 0.100000
2023-04-20 09:51:43,929 epoch 7 - iter 35/72 - loss 0.20802158 - time (sec): 3.93 - samples/sec: 4959.85 - lr: 0.100000
2023-04-20 09:51:44,659 epoch 7 - iter 42/72 - loss 0.21439986 - time (sec): 4.66 - samples/sec: 5002.98 - lr: 0.100000
2023-04-20 09:51:45,410 epoch 7 - iter 49/72 - loss 0.21118619 - time (sec): 5.41 - samples/sec: 5022.48 - lr: 0.100000
2023-04-20 09:51:46,244 epoch 7 - it

100%|██████████| 8/8 [00:01<00:00,  4.14it/s]

2023-04-20 09:51:50,457 Evaluating as a multi-label problem: False
2023-04-20 09:51:50,480 DEV : loss 0.1851503849029541 - f1-score (micro avg)  0.5751
2023-04-20 09:51:50,534 BAD EPOCHS (no improvement): 0
2023-04-20 09:51:50,543 saving best model
2023-04-20 09:51:53,072 ----------------------------------------------------------------------------------------------------
2023-04-20 09:51:54,237 epoch 8 - iter 7/72 - loss 0.20078820 - time (sec): 1.16 - samples/sec: 3211.55 - lr: 0.100000
2023-04-20 09:51:55,188 epoch 8 - iter 14/72 - loss 0.19746588 - time (sec): 2.11 - samples/sec: 3419.65 - lr: 0.100000
2023-04-20 09:51:56,099 epoch 8 - iter 21/72 - loss 0.19571741 - time (sec): 3.02 - samples/sec: 3805.99 - lr: 0.100000
2023-04-20 09:51:56,828 epoch 8 - iter 28/72 - loss 0.19306255 - time (sec): 3.75 - samples/sec: 4098.59 - lr: 0.100000
2023-04-20 09:51:57,731 epoch 8 - iter 35/72 - loss 0.20046012 - time (sec): 4.66 - samples/sec: 4226.24 - lr: 0.100000
2023-04-20 09:51:58,508 epoch 8 - iter 42/72 - loss 0.20012465 - time (sec): 5.43 - samples/sec: 4312.36 - lr: 0.100000
2023-04-20 09:51:59,447 epoch 8 - iter 49/72 - loss 0.19702801 - time (sec): 6.37 - samples/sec: 4297.91 - lr: 0.100000
2023-04-20 09:52:00,358 epoch 8 - it

100%|██████████| 8/8 [00:01<00:00,  4.24it/s]

2023-04-20 09:52:04,687 Evaluating as a multi-label problem: False
2023-04-20 09:52:04,702 DEV : loss 0.18110936880111694 - f1-score (micro avg)  0.5939
2023-04-20 09:52:04,730 BAD EPOCHS (no improvement): 0
2023-04-20 09:52:04,736 saving best model
2023-04-20 09:52:07,088 ----------------------------------------------------------------------------------------------------
2023-04-20 09:52:08,186 epoch 9 - iter 7/72 - loss 0.18000272 - time (sec): 1.10 - samples/sec: 3351.54 - lr: 0.100000
2023-04-20 09:52:09,442 epoch 9 - iter 14/72 - loss 0.19008086 - time (sec): 2.35 - samples/sec: 3285.59 - lr: 0.100000
2023-04-20 09:52:10,749 epoch 9 - iter 21/72 - loss 0.17866792 - time (sec): 3.66 - samples/sec: 3172.59 - lr: 0.100000
2023-04-20 09:52:11,546 epoch 9 - iter 28/72 - loss 0.17870203 - time (sec): 4.46 - samples/sec: 3475.03 - lr: 0.100000
2023-04-20 09:52:12,300 epoch 9 - iter 35/72 - loss 0.17754329 - time (sec): 5.21 - samples/sec: 3724.61 - lr: 0.100000
2023-04-20 09:52:13,008 epoch 9 - iter 42/72 - loss 0.17923286 - time (sec): 5.92 - samples/sec: 3930.87 - lr: 0.100000
2023-04-20 09:52:13,764 epoch 9 - iter 49/72 - loss 0.18254

100%|██████████| 8/8 [00:01<00:00,  5.70it/s]

2023-04-20 09:52:18,864 Evaluating as a multi-label problem: False
2023-04-20 09:52:18,879 DEV : loss 0.18324582278728485 - f1-score (micro avg)  0.5186
2023-04-20 09:52:18,907 BAD EPOCHS (no improvement): 1
2023-04-20 09:52:18,912 ----------------------------------------------------------------------------------------------------
2023-04-20 09:52:19,812 epoch 10 - iter 7/72 - loss 0.17362314 - time (sec): 0.89 - samples/sec: 4334.98 - lr: 0.100000
2023-04-20 09:52:20,634 epoch 10 - iter 14/72 - loss 0.17306972 - time (sec): 1.72 - samples/sec: 4695.98 - lr: 0.100000
2023-04-20 09:52:21,657 epoch 10 - iter 21/72 - loss 0.17743780 - time (sec): 2.74 - samples/sec: 4357.56 - lr: 0.100000
2023-04-20 09:52:22,637 epoch 10 - iter 28/72 - loss 0.17553640 - time (sec): 3.72 - samples/sec: 4262.62 - lr: 0.100000
2023-04-20 09:52:23,672 epoch 10 - iter 35/72 - loss 0.17438049 - time (sec): 4.76 - samples/sec: 4162.11 - lr: 0.100000
2023-04-20 09:52:25,102 epoch 10 - iter 42/72 - loss 0.17602305 - time (sec): 6.19 - samples/sec: 3829.80 - lr: 0.100000
2023-04-20 09:52:26,210 epoch 10 - iter 49/72 - loss 0.18111605 - time (sec): 7.29 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.61it/s]

2023-04-20 09:52:29,846 Evaluating as a multi-label problem: False
2023-04-20 09:52:29,862 DEV : loss 0.1736479252576828 - f1-score (micro avg)  0.5696
2023-04-20 09:52:29,887 BAD EPOCHS (no improvement): 2
2023-04-20 09:52:29,894 ----------------------------------------------------------------------------------------------------
2023-04-20 09:52:30,985 epoch 11 - iter 7/72 - loss 0.16178767 - time (sec): 1.09 - samples/sec: 3732.67 - lr: 0.100000
2023-04-20 09:52:31,889 epoch 11 - iter 14/72 - loss 0.16871134 - time (sec): 1.99 - samples/sec: 3859.36 - lr: 0.100000
2023-04-20 09:52:33,461 epoch 11 - iter 21/72 - loss 0.16661730 - time (sec): 3.56 - samples/sec: 3296.02 - lr: 0.100000
2023-04-20 09:52:34,616 epoch 11 - iter 28/72 - loss 0.16428471 - time (sec): 4.72 - samples/sec: 3340.93 - lr: 0.100000
2023-04-20 09:52:35,753 epoch 11 - iter 35/72 - loss 0.16577333 - time (sec): 5.85 - samples/sec: 3367.94 - lr: 0.100000
2023-04-20 09:52:37,819 epoch 11 - iter 42/72 - loss 0.16651466 - time (sec): 7.92 - samples/sec: 2974.70 - lr: 0.100000
2023-04-20 09:52:40,079 epoch 11 - iter 49/72 - loss 0.16999041 - time (sec): 10.18 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.11it/s]

2023-04-20 09:52:47,009 Evaluating as a multi-label problem: False
2023-04-20 09:52:47,031 DEV : loss 0.16427379846572876 - f1-score (micro avg)  0.5887
2023-04-20 09:52:47,074 BAD EPOCHS (no improvement): 3
2023-04-20 09:52:47,082 ----------------------------------------------------------------------------------------------------
2023-04-20 09:52:48,528 epoch 12 - iter 7/72 - loss 0.18607405 - time (sec): 1.44 - samples/sec: 2698.05 - lr: 0.100000
2023-04-20 09:52:49,720 epoch 12 - iter 14/72 - loss 0.18000065 - time (sec): 2.63 - samples/sec: 2865.93 - lr: 0.100000
2023-04-20 09:52:51,003 epoch 12 - iter 21/72 - loss 0.16784164 - time (sec): 3.92 - samples/sec: 2985.54 - lr: 0.100000
2023-04-20 09:52:52,206 epoch 12 - iter 28/72 - loss 0.17038950 - time (sec): 5.12 - samples/sec: 3009.18 - lr: 0.100000
2023-04-20 09:52:53,441 epoch 12 - iter 35/72 - loss 0.17080159 - time (sec): 6.35 - samples/sec: 3059.60 - lr: 0.100000
2023-04-20 09:52:54,450 epoch 12 - iter 42/72 - loss 0.17179488 - time (sec): 7.36 - samples/sec: 3169.90 - lr: 0.100000
2023-04-20 09:52:55,597 epoch 12 - iter 49/72 - loss 0.17438111 - time (sec): 8.51 - samples/sec: 3205.52 - lr: 0.100000
2023-04-20 09:52:56,724 epoch 12 - iter 56/72 - loss 0.17510411 - time (sec): 9.64 - samples/sec: 3234.47 - lr: 0.100000
2023-04-20 09:52:57,782 epoch 12 

100%|██████████| 8/8 [00:01<00:00,  7.49it/s]

2023-04-20 09:53:00,083 Evaluating as a multi-label problem: False
2023-04-20 09:53:00,098 DEV : loss 0.17266671359539032 - f1-score (micro avg)  0.5726
2023-04-20 09:53:00,133 Epoch    12: reducing learning rate of group 0 to 5.0000e-02.
2023-04-20 09:53:00,137 BAD EPOCHS (no improvement): 4
2023-04-20 09:53:00,142 ----------------------------------------------------------------------------------------------------
2023-04-20 09:53:00,925 epoch 13 - iter 7/72 - loss 0.15291909 - time (sec): 0.78 - samples/sec: 5173.84 - lr: 0.050000
2023-04-20 09:53:01,829 epoch 13 - iter 14/72 - loss 0.14019170 - time (sec): 1.69 - samples/sec: 4690.64 - lr: 0.050000
2023-04-20 09:53:02,581 epoch 13 - iter 21/72 - loss 0.14907880 - time (sec): 2.44 - samples/sec: 4820.82 - lr: 0.050000
2023-04-20 09:53:03,366 epoch 13 - iter 28/72 - loss 0.15516979 - time (sec): 3.22 - samples/sec: 4858.36 - lr: 0.050000
2023-04-20 09:53:04,104 epoch 13 - iter 35/72 - loss 0.15951685 - time (sec): 3.96 - samples/sec: 4869.73 - lr: 0.050000
2023-04-20 09:53:04,884 epoch 13 - iter 42/72 - loss 0.15718485 - time (sec): 4.74 - samples/sec: 4852.59 - lr: 0.050000
2023-04-2

100%|██████████| 8/8 [00:02<00:00,  3.50it/s]

2023-04-20 09:53:11,102 Evaluating as a multi-label problem: False
2023-04-20 09:53:11,125 DEV : loss 0.159393772482872 - f1-score (micro avg)  0.6125
2023-04-20 09:53:11,167 BAD EPOCHS (no improvement): 0
2023-04-20 09:53:11,174 saving best model
2023-04-20 09:53:13,439 ----------------------------------------------------------------------------------------------------
2023-04-20 09:53:14,299 epoch 14 - iter 7/72 - loss 0.14048312 - time (sec): 0.83 - samples/sec: 4713.17 - lr: 0.050000
2023-04-20 09:53:15,094 epoch 14 - iter 14/72 - loss 0.16260802 - time (sec): 1.63 - samples/sec: 4810.26 - lr: 0.050000
2023-04-20 09:53:15,941 epoch 14 - iter 21/72 - loss 0.15667704 - time (sec): 2.47 - samples/sec: 4667.29 - lr: 0.050000
2023-04-20 09:53:16,721 epoch 14 - iter 28/72 - loss 0.15415580 - time (sec): 3.25 - samples/sec: 4697.69 - lr: 0.050000
2023-04-20 09:53:17,486 epoch 14 - iter 35/72 - loss 0.15630968 - time (sec): 4.02 - samples/sec: 4746.11 - lr: 0.050000
2023-04-20 09:53:18,254 epoch 14 - iter 42/72 - loss 0.15705394 - time (sec): 4.79 - samples/sec: 4792.74 - lr: 0.050000
2023-04-20 09:53:19,154 epoch 14 - iter 49/72 - loss 0.16262778 - time (sec): 5.69 - samples/sec: 4709.96 - lr: 0.050000
2023-04-20 09:53:20,294 epoch

100%|██████████| 8/8 [00:01<00:00,  4.50it/s]

2023-04-20 09:53:24,692 Evaluating as a multi-label problem: False
2023-04-20 09:53:24,729 DEV : loss 0.1577455997467041 - f1-score (micro avg)  0.6081
2023-04-20 09:53:24,785 BAD EPOCHS (no improvement): 1
2023-04-20 09:53:24,797 ----------------------------------------------------------------------------------------------------
2023-04-20 09:53:25,945 epoch 15 - iter 7/72 - loss 0.15175008 - time (sec): 1.14 - samples/sec: 3318.57 - lr: 0.050000
2023-04-20 09:53:27,120 epoch 15 - iter 14/72 - loss 0.15335391 - time (sec): 2.32 - samples/sec: 3239.49 - lr: 0.050000
2023-04-20 09:53:28,476 epoch 15 - iter 21/72 - loss 0.15480240 - time (sec): 3.68 - samples/sec: 3166.41 - lr: 0.050000
2023-04-20 09:53:29,447 epoch 15 - iter 28/72 - loss 0.15347604 - time (sec): 4.65 - samples/sec: 3349.96 - lr: 0.050000
2023-04-20 09:53:30,265 epoch 15 - iter 35/72 - loss 0.14962030 - time (sec): 5.46 - samples/sec: 3563.02 - lr: 0.050000
2023-04-20 09:53:31,051 epoch 15 - iter 42/72 - loss 0.14707876 - time (sec): 6.25 - samples/sec: 3745.46 - lr: 0.050000
2023-04-20 09:53:31,837 epoch 15 - iter 49/72 - loss 0.15560385 - time (sec): 7.04 - samples/sec: 3890.53 - lr: 0.050000
2023-04-20 09:53:32,615 epoch 15 - iter 56/72 - loss 0.15443384 - time (sec): 7.82 - samples/sec: 4007.55 - lr: 0.050000
2023-04-20 09:53:33,276 epoch 15 

100%|██████████| 8/8 [00:01<00:00,  7.74it/s]

2023-04-20 09:53:35,430 Evaluating as a multi-label problem: False
2023-04-20 09:53:35,446 DEV : loss 0.1557440161705017 - f1-score (micro avg)  0.5977
2023-04-20 09:53:35,474 BAD EPOCHS (no improvement): 2
2023-04-20 09:53:35,482 ----------------------------------------------------------------------------------------------------
2023-04-20 09:53:36,447 epoch 16 - iter 7/72 - loss 0.16195832 - time (sec): 0.96 - samples/sec: 4398.28 - lr: 0.050000
2023-04-20 09:53:37,177 epoch 16 - iter 14/72 - loss 0.15494415 - time (sec): 1.69 - samples/sec: 4844.38 - lr: 0.050000
2023-04-20 09:53:38,035 epoch 16 - iter 21/72 - loss 0.15013687 - time (sec): 2.55 - samples/sec: 4789.00 - lr: 0.050000
2023-04-20 09:53:38,721 epoch 16 - iter 28/72 - loss 0.14804017 - time (sec): 3.24 - samples/sec: 4933.13 - lr: 0.050000
2023-04-20 09:53:39,729 epoch 16 - iter 35/72 - loss 0.14908854 - time (sec): 4.24 - samples/sec: 4684.34 - lr: 0.050000
2023-04-20 09:53:40,749 epoch 16 - iter 42/72 - loss 0.15005778 - time (sec): 5.26 - samples/sec: 4478.28 - lr: 0.050000
2023-04-20 09:53:41,871 epoch 16 - iter 49/72 - loss 0.15016498 - time (sec): 6.39 - samples/sec: 4313.98 - lr: 0.050000
2023-04-20 09:53:42,876 epoch

100%|██████████| 8/8 [00:01<00:00,  7.65it/s]

2023-04-20 09:53:46,317 Evaluating as a multi-label problem: False
2023-04-20 09:53:46,333 DEV : loss 0.15185897052288055 - f1-score (micro avg)  0.6198
2023-04-20 09:53:46,359 BAD EPOCHS (no improvement): 0
2023-04-20 09:53:46,367 saving best model
2023-04-20 09:53:48,476 ----------------------------------------------------------------------------------------------------
2023-04-20 09:53:49,341 epoch 17 - iter 7/72 - loss 0.14771235 - time (sec): 0.86 - samples/sec: 4497.40 - lr: 0.050000
2023-04-20 09:53:50,163 epoch 17 - iter 14/72 - loss 0.14754512 - time (sec): 1.68 - samples/sec: 4580.21 - lr: 0.050000
2023-04-20 09:53:50,817 epoch 17 - iter 21/72 - loss 0.15526068 - time (sec): 2.33 - samples/sec: 4903.87 - lr: 0.050000
2023-04-20 09:53:51,717 epoch 17 - iter 28/72 - loss 0.15523376 - time (sec): 3.23 - samples/sec: 4832.99 - lr: 0.050000
2023-04-20 09:53:52,399 epoch 17 - iter 35/72 - loss 0.15517559 - time (sec): 3.92 - samples/sec: 4942.75 - lr: 0.050000
2023-04-20 09:53:53,322 epoch 17 - iter 42/72 - loss 0.15123570 - time (sec): 4.84 - samples/sec: 4862.72 - lr: 0.050000
2023-04-20 09:53:53,997 epoch 17 - iter 49/72 - loss 

100%|██████████| 8/8 [00:02<00:00,  3.18it/s]

2023-04-20 09:54:00,359 Evaluating as a multi-label problem: False
2023-04-20 09:54:00,383 DEV : loss 0.1600029617547989 - f1-score (micro avg)  0.5801
2023-04-20 09:54:00,423 BAD EPOCHS (no improvement): 1
2023-04-20 09:54:00,431 ----------------------------------------------------------------------------------------------------
2023-04-20 09:54:01,404 epoch 18 - iter 7/72 - loss 0.12019405 - time (sec): 0.97 - samples/sec: 4045.61 - lr: 0.050000
2023-04-20 09:54:02,236 epoch 18 - iter 14/72 - loss 0.13346715 - time (sec): 1.80 - samples/sec: 4387.31 - lr: 0.050000
2023-04-20 09:54:03,023 epoch 18 - iter 21/72 - loss 0.13167005 - time (sec): 2.59 - samples/sec: 4526.58 - lr: 0.050000
2023-04-20 09:54:03,759 epoch 18 - iter 28/72 - loss 0.13550402 - time (sec): 3.32 - samples/sec: 4661.97 - lr: 0.050000
2023-04-20 09:54:04,547 epoch 18 - iter 35/72 - loss 0.13614760 - time (sec): 4.11 - samples/sec: 4724.56 - lr: 0.050000
2023-04-20 09:54:05,253 epoch 18 - iter 42/72 - loss 0.13700530 - time (sec): 4.81 - samples/sec: 4846.19 - lr: 0.050000
2023-04-20 09:54:06,078 epoch 18 - iter 49/72 - loss 0.13828683 - time (sec): 5.64 - samples/sec: 4837.10 - lr: 0.050000
2023-04-20 09:54:06,817 epoch 18 - iter 56/72 - loss 0.13770276 - time (sec): 6.38 - samples/sec: 4913.18 - lr: 0.050000
2023-04-20 09:54:07,688 epoch 18 

100%|██████████| 8/8 [00:01<00:00,  7.00it/s]

2023-04-20 09:54:10,290 Evaluating as a multi-label problem: False
2023-04-20 09:54:10,310 DEV : loss 0.1591956466436386 - f1-score (micro avg)  0.6204
2023-04-20 09:54:10,353 BAD EPOCHS (no improvement): 0
2023-04-20 09:54:10,364 saving best model
2023-04-20 09:54:12,764 ----------------------------------------------------------------------------------------------------
2023-04-20 09:54:14,052 epoch 19 - iter 7/72 - loss 0.13380456 - time (sec): 1.25 - samples/sec: 3115.22 - lr: 0.050000
2023-04-20 09:54:15,370 epoch 19 - iter 14/72 - loss 0.13808991 - time (sec): 2.57 - samples/sec: 3068.15 - lr: 0.050000
2023-04-20 09:54:16,505 epoch 19 - iter 21/72 - loss 0.13615337 - time (sec): 3.71 - samples/sec: 3193.71 - lr: 0.050000
2023-04-20 09:54:17,281 epoch 19 - iter 28/72 - loss 0.14231760 - time (sec): 4.48 - samples/sec: 3526.57 - lr: 0.050000
2023-04-20 09:54:17,945 epoch 19 - iter 35/72 - loss 0.14099076 - time (sec): 5.15 - samples/sec: 3783.21 - lr: 0.050000
2023-04-20 09:54:18,680 epoch 19 - iter 42/72 - loss 0.13825366 - time (sec): 5.88 - samples/sec: 3959.78 - lr: 0.050000
2023-04-20 09:54:19,755 epoch 19 - iter 49/72 - loss 0.13893219 - time (sec): 6.96 - samples/sec: 3922.46 - lr: 0.050000
2023-04-20 09:54:20,773 epoch

100%|██████████| 8/8 [00:01<00:00,  7.22it/s]

2023-04-20 09:54:24,550 Evaluating as a multi-label problem: False
2023-04-20 09:54:24,571 DEV : loss 0.14909514784812927 - f1-score (micro avg)  0.6241
2023-04-20 09:54:24,598 BAD EPOCHS (no improvement): 0
2023-04-20 09:54:24,603 saving best model
2023-04-20 09:54:26,712 ----------------------------------------------------------------------------------------------------
2023-04-20 09:54:27,943 epoch 20 - iter 7/72 - loss 0.11573977 - time (sec): 1.18 - samples/sec: 3198.62 - lr: 0.050000
2023-04-20 09:54:29,057 epoch 20 - iter 14/72 - loss 0.13246012 - time (sec): 2.29 - samples/sec: 3400.57 - lr: 0.050000
2023-04-20 09:54:30,328 epoch 20 - iter 21/72 - loss 0.13258961 - time (sec): 3.56 - samples/sec: 3295.12 - lr: 0.050000
2023-04-20 09:54:31,530 epoch 20 - iter 28/72 - loss 0.13272497 - time (sec): 4.76 - samples/sec: 3320.62 - lr: 0.050000
2023-04-20 09:54:32,414 epoch 20 - iter 35/72 - loss 0.13556645 - time (sec): 5.65 - samples/sec: 3507.37 - lr: 0.050000
2023-04-20 09:54:33,294 epoch 20 - iter 42/72 - loss 0.13717467 - time (sec): 6.53 - sam

100%|██████████| 8/8 [00:01<00:00,  5.32it/s]

2023-04-20 09:54:38,925 Evaluating as a multi-label problem: False
2023-04-20 09:54:38,943 DEV : loss 0.15071117877960205 - f1-score (micro avg)  0.6156
2023-04-20 09:54:38,978 BAD EPOCHS (no improvement): 1
2023-04-20 09:54:38,985 ----------------------------------------------------------------------------------------------------
2023-04-20 09:54:40,044 epoch 21 - iter 7/72 - loss 0.13148537 - time (sec): 1.05 - samples/sec: 3910.03 - lr: 0.050000
2023-04-20 09:54:40,803 epoch 21 - iter 14/72 - loss 0.13140160 - time (sec): 1.81 - samples/sec: 4342.95 - lr: 0.050000
2023-04-20 09:54:41,706 epoch 21 - iter 21/72 - loss 0.13621580 - time (sec): 2.72 - samples/sec: 4326.84 - lr: 0.050000
2023-04-20 09:54:42,588 epoch 21 - iter 28/72 - loss 0.13583865 - time (sec): 3.60 - samples/sec: 4325.40 - lr: 0.050000
2023-04-20 09:54:43,550 epoch 21 - iter 35/72 - loss 0.13547940 - time (sec): 4.56 - samples/sec: 4278.39 - lr: 0.050000
2023-04-20 09:54:44,560 epoch 21 - iter 42/72 - loss 0.13646354 - time (sec): 5.57 - samples/sec: 4204.69 - lr: 0.050000
2023-04-20 09:54:45,774 epoch 21 - iter 49/72 - loss 0.13719009 - time (sec): 6.78 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.48it/s]

2023-04-20 09:54:49,863 Evaluating as a multi-label problem: False
2023-04-20 09:54:49,879 DEV : loss 0.1512536257505417 - f1-score (micro avg)  0.6215
2023-04-20 09:54:49,903 BAD EPOCHS (no improvement): 2
2023-04-20 09:54:49,912 ----------------------------------------------------------------------------------------------------
2023-04-20 09:54:50,648 epoch 22 - iter 7/72 - loss 0.14308849 - time (sec): 0.73 - samples/sec: 5319.87 - lr: 0.050000
2023-04-20 09:54:51,451 epoch 22 - iter 14/72 - loss 0.13470146 - time (sec): 1.53 - samples/sec: 5182.73 - lr: 0.050000
2023-04-20 09:54:52,211 epoch 22 - iter 21/72 - loss 0.13162428 - time (sec): 2.29 - samples/sec: 5175.00 - lr: 0.050000
2023-04-20 09:54:52,984 epoch 22 - iter 28/72 - loss 0.12791929 - time (sec): 3.07 - samples/sec: 5145.29 - lr: 0.050000
2023-04-20 09:54:53,789 epoch 22 - iter 35/72 - loss 0.12986892 - time (sec): 3.87 - samples/sec: 5065.47 - lr: 0.050000
2023-04-20 09:54:54,564 epoch 22 - iter 42/72 - loss 0.13229687 - time (sec): 4.65 - samples/sec: 5054.42 - lr: 0.050000
2023-04-20 09:54:55,253 epoch 22 - iter 49/72 - loss 0.13085599 - time (sec): 5.34 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.88it/s]

2023-04-20 09:55:00,076 Evaluating as a multi-label problem: False
2023-04-20 09:55:00,103 DEV : loss 0.149658665060997 - f1-score (micro avg)  0.6097
2023-04-20 09:55:00,143 BAD EPOCHS (no improvement): 3
2023-04-20 09:55:00,148 ----------------------------------------------------------------------------------------------------
2023-04-20 09:55:02,062 epoch 23 - iter 7/72 - loss 0.12369764 - time (sec): 1.91 - samples/sec: 2029.78 - lr: 0.050000
2023-04-20 09:55:03,050 epoch 23 - iter 14/72 - loss 0.12344891 - time (sec): 2.90 - samples/sec: 2719.74 - lr: 0.050000
2023-04-20 09:55:03,807 epoch 23 - iter 21/72 - loss 0.13134784 - time (sec): 3.66 - samples/sec: 3261.63 - lr: 0.050000
2023-04-20 09:55:04,611 epoch 23 - iter 28/72 - loss 0.12965292 - time (sec): 4.46 - samples/sec: 3557.09 - lr: 0.050000
2023-04-20 09:55:05,344 epoch 23 - iter 35/72 - loss 0.13235985 - time (sec): 5.19 - samples/sec: 3806.60 - lr: 0.050000
2023-04-20 09:55:06,122 epoch 23 - iter 42/72 - loss 0.13479864 - time (sec): 5.97 - samples/sec: 3981.47 - lr: 0.050000
2023-04-20 09:55:06,783 epoch 23 - iter 49/72 - loss 0.13508200 - time (sec): 6.63 - samples/sec: 4142.00 - lr: 0.050000
2023-04-20 09:55:07,474 epoch 23 - iter 56/72 - loss 0.13536505 - time (sec): 7.32 - samples/sec: 4267.02 - lr: 0.050000
2023-04-20 09:55:08,311 epoch 23 

100%|██████████| 8/8 [00:01<00:00,  7.59it/s]

2023-04-20 09:55:10,426 Evaluating as a multi-label problem: False
2023-04-20 09:55:10,442 DEV : loss 0.15296469628810883 - f1-score (micro avg)  0.6062
2023-04-20 09:55:10,469 Epoch    23: reducing learning rate of group 0 to 2.5000e-02.
2023-04-20 09:55:10,472 BAD EPOCHS (no improvement): 4
2023-04-20 09:55:10,477 ----------------------------------------------------------------------------------------------------
2023-04-20 09:55:11,273 epoch 24 - iter 7/72 - loss 0.12601771 - time (sec): 0.79 - samples/sec: 4884.72 - lr: 0.025000
2023-04-20 09:55:11,962 epoch 24 - iter 14/72 - loss 0.13525511 - time (sec): 1.48 - samples/sec: 5194.90 - lr: 0.025000
2023-04-20 09:55:12,986 epoch 24 - iter 21/72 - loss 0.13331790 - time (sec): 2.51 - samples/sec: 4665.65 - lr: 0.025000
2023-04-20 09:55:13,974 epoch 24 - iter 28/72 - loss 0.13254826 - time (sec): 3.50 - samples/sec: 4421.39 - lr: 0.025000
2023-04-20 09:55:15,227 epoch 24 - iter 35/72 - loss 0.13851233 - time (sec): 4.75 - samples/sec: 4130.97 - lr: 0.025000
2023-04-20 09:55:16,336 epoch 24 - iter 42/72 - loss 0.13753702 - time (sec): 5.86 - samples/sec: 4032.97 - lr: 0.025000
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.65it/s]

2023-04-20 09:55:21,272 Evaluating as a multi-label problem: False
2023-04-20 09:55:21,289 DEV : loss 0.14635753631591797 - f1-score (micro avg)  0.625
2023-04-20 09:55:21,313 BAD EPOCHS (no improvement): 0
2023-04-20 09:55:21,320 saving best model
2023-04-20 09:55:23,162 ----------------------------------------------------------------------------------------------------
2023-04-20 09:55:23,982 epoch 25 - iter 7/72 - loss 0.12734347 - time (sec): 0.81 - samples/sec: 4726.95 - lr: 0.025000
2023-04-20 09:55:24,806 epoch 25 - iter 14/72 - loss 0.12165602 - time (sec): 1.64 - samples/sec: 4800.79 - lr: 0.025000
2023-04-20 09:55:25,505 epoch 25 - iter 21/72 - loss 0.12903966 - time (sec): 2.34 - samples/sec: 4989.11 - lr: 0.025000
2023-04-20 09:55:26,361 epoch 25 - iter 28/72 - loss 0.12980483 - time (sec): 3.19 - samples/sec: 4877.95 - lr: 0.025000
2023-04-20 09:55:27,125 epoch 25 - iter 35/72 - loss 0.12858873 - time (sec): 3.96 - samples/sec: 4928.54 - lr: 0.025000
2023-04-20 09:55:28,077 epoch 25 - iter 42/72 - loss 0.12587434 - time (sec): 4.91 - samples/sec: 4766.97 - lr: 0.025000
2023-04-20 09:55:29,116 epoch 25 - iter 49/72 - loss 0.12697937 - time (sec): 5.95 - samples/sec: 4575.86 - 

100%|██████████| 8/8 [00:01<00:00,  7.37it/s]

2023-04-20 09:55:35,594 Evaluating as a multi-label problem: False
2023-04-20 09:55:35,612 DEV : loss 0.14905215799808502 - f1-score (micro avg)  0.6183
2023-04-20 09:55:35,641 BAD EPOCHS (no improvement): 1
2023-04-20 09:55:35,648 ----------------------------------------------------------------------------------------------------
2023-04-20 09:55:36,419 epoch 26 - iter 7/72 - loss 0.12502837 - time (sec): 0.77 - samples/sec: 4922.70 - lr: 0.025000
2023-04-20 09:55:37,238 epoch 26 - iter 14/72 - loss 0.13479571 - time (sec): 1.58 - samples/sec: 4936.36 - lr: 0.025000
2023-04-20 09:55:38,082 epoch 26 - iter 21/72 - loss 0.12960040 - time (sec): 2.43 - samples/sec: 4919.91 - lr: 0.025000
2023-04-20 09:55:38,796 epoch 26 - iter 28/72 - loss 0.12599789 - time (sec): 3.14 - samples/sec: 5019.11 - lr: 0.025000
2023-04-20 09:55:39,691 epoch 26 - iter 35/72 - loss 0.12572536 - time (sec): 4.04 - samples/sec: 4903.34 - lr: 0.025000
2023-04-20 09:55:40,535 epoch 26 - iter 42/72 - loss 0.12850290 - time (sec): 4.88 - samples/sec: 4879.98 - lr: 0.025000
2023-04-20 09:55:41,429 epoch 26 - iter 49/72 - loss 0.12775076 - time (sec): 5.78 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.87it/s]

2023-04-20 09:55:45,746 Evaluating as a multi-label problem: False
2023-04-20 09:55:45,768 DEV : loss 0.14723962545394897 - f1-score (micro avg)  0.6252
2023-04-20 09:55:45,813 BAD EPOCHS (no improvement): 0
2023-04-20 09:55:45,824 saving best model
2023-04-20 09:55:48,395 ----------------------------------------------------------------------------------------------------
2023-04-20 09:55:49,600 epoch 27 - iter 7/72 - loss 0.16594365 - time (sec): 1.15 - samples/sec: 3464.44 - lr: 0.025000
2023-04-20 09:55:50,556 epoch 27 - iter 14/72 - loss 0.14179633 - time (sec): 2.11 - samples/sec: 3833.12 - lr: 0.025000
2023-04-20 09:55:51,271 epoch 27 - iter 21/72 - loss 0.13156950 - time (sec): 2.82 - samples/sec: 4186.27 - lr: 0.025000
2023-04-20 09:55:52,014 epoch 27 - iter 28/72 - loss 0.12755964 - time (sec): 3.57 - samples/sec: 4401.20 - lr: 0.025000
2023-04-20 09:55:52,790 epoch 27 - iter 35/72 - loss 0.12445160 - time (sec): 4.34 - samples/sec: 4514.05 - lr: 0.025000
2023-04-20 09:55:53,549 epoch 27 - iter 42/72 - loss 0.12640020 - time (sec): 5.10 - samples/sec: 4580.28 - lr: 0.025000
2023-04-20 09:55:54,396 epoch 27 - iter 49/72 - loss 0.12547958 - time (sec): 5.95 - samples/sec: 4612.49 - lr: 0.025000
2023-04-20 09:55:55,444 epoch

100%|██████████| 8/8 [00:01<00:00,  5.77it/s]

2023-04-20 09:55:59,549 Evaluating as a multi-label problem: False
2023-04-20 09:55:59,572 DEV : loss 0.14658001065254211 - f1-score (micro avg)  0.6329
2023-04-20 09:55:59,623 BAD EPOCHS (no improvement): 0
2023-04-20 09:55:59,633 saving best model
2023-04-20 09:56:02,242 ----------------------------------------------------------------------------------------------------
2023-04-20 09:56:04,312 epoch 28 - iter 7/72 - loss 0.11194734 - time (sec): 2.05 - samples/sec: 1939.39 - lr: 0.025000
2023-04-20 09:56:05,404 epoch 28 - iter 14/72 - loss 0.11774249 - time (sec): 3.14 - samples/sec: 2509.56 - lr: 0.025000
2023-04-20 09:56:06,366 epoch 28 - iter 21/72 - loss 0.12454166 - time (sec): 4.10 - samples/sec: 2930.99 - lr: 0.025000
2023-04-20 09:56:07,109 epoch 28 - iter 28/72 - loss 0.12303595 - time (sec): 4.84 - samples/sec: 3248.29 - lr: 0.025000
2023-04-20 09:56:07,925 epoch 28 - iter 35/72 - loss 0.12526479 - time (sec): 5.66 - samples/sec: 3499.12 - lr: 0.025000
2023-04-20 09:56:08,726 epoch 28 - iter 42/72 - loss 0.12432017 - time (sec): 6.46 - samples/sec: 3680.16 - lr: 0.025000
2023-04-20 09:56:09,514 epoch 28 - iter 49/72 - loss 0.12443273 - time (sec): 7.25 - samples/sec: 3813.69 - lr: 0.025000
2023-04-20 09:56:10,641 epoch

100%|██████████| 8/8 [00:01<00:00,  6.67it/s]

2023-04-20 09:56:14,410 Evaluating as a multi-label problem: False
2023-04-20 09:56:14,429 DEV : loss 0.14588893949985504 - f1-score (micro avg)  0.6291
2023-04-20 09:56:14,455 BAD EPOCHS (no improvement): 1
2023-04-20 09:56:14,459 ----------------------------------------------------------------------------------------------------
2023-04-20 09:56:15,543 epoch 29 - iter 7/72 - loss 0.12634713 - time (sec): 1.08 - samples/sec: 3651.27 - lr: 0.025000
2023-04-20 09:56:16,685 epoch 29 - iter 14/72 - loss 0.12902135 - time (sec): 2.22 - samples/sec: 3555.08 - lr: 0.025000
2023-04-20 09:56:17,869 epoch 29 - iter 21/72 - loss 0.12384756 - time (sec): 3.41 - samples/sec: 3497.56 - lr: 0.025000
2023-04-20 09:56:19,085 epoch 29 - iter 28/72 - loss 0.12042873 - time (sec): 4.62 - samples/sec: 3424.22 - lr: 0.025000
2023-04-20 09:56:20,310 epoch 29 - iter 35/72 - loss 0.11763695 - time (sec): 5.85 - samples/sec: 3377.06 - lr: 0.025000
2023-04-20 09:56:21,255 epoch 29 - iter 42/72 - loss 0.12018891

100%|██████████| 8/8 [00:01<00:00,  7.27it/s]

2023-04-20 09:56:25,551 Evaluating as a multi-label problem: False
2023-04-20 09:56:25,568 DEV : loss 0.1453029215335846 - f1-score (micro avg)  0.6335
2023-04-20 09:56:25,597 BAD EPOCHS (no improvement): 0
2023-04-20 09:56:25,605 saving best model
2023-04-20 09:56:27,480 ----------------------------------------------------------------------------------------------------
2023-04-20 09:56:28,392 epoch 30 - iter 7/72 - loss 0.12486831 - time (sec): 0.88 - samples/sec: 4553.98 - lr: 0.025000
2023-04-20 09:56:29,266 epoch 30 - iter 14/72 - loss 0.12131884 - time (sec): 1.75 - samples/sec: 4537.15 - lr: 0.025000
2023-04-20 09:56:30,002 epoch 30 - iter 21/72 - loss 0.13159782 - time (sec): 2.49 - samples/sec: 4737.76 - lr: 0.025000
2023-04-20 09:56:30,948 epoch 30 - iter 28/72 - loss 0.13137693 - time (sec): 3.44 - samples/sec: 4547.14 - lr: 0.025000
2023-04-20 09:56:31,989 epoch 30 - iter 35/72 - loss 0.12912195 - time (sec): 4.48 - samples/sec: 4397.99 - lr: 0.025000
2023-04-20 09:56:33,154 epoch 30 - iter 42/72 - loss 0.12576555 - time (sec): 5.64 - samples/sec: 4149.04 - lr: 0.025000
2023-04-20 09:56:34,754 epoch 30 - iter 49/72 - loss 

100%|██████████| 8/8 [00:01<00:00,  7.25it/s]

2023-04-20 09:56:39,698 Evaluating as a multi-label problem: False
2023-04-20 09:56:39,713 DEV : loss 0.1439448893070221 - f1-score (micro avg)  0.6439
2023-04-20 09:56:39,740 BAD EPOCHS (no improvement): 0
2023-04-20 09:56:39,755 saving best model
2023-04-20 09:56:41,760 ----------------------------------------------------------------------------------------------------
2023-04-20 09:56:42,805 epoch 31 - iter 7/72 - loss 0.11842076 - time (sec): 1.02 - samples/sec: 4011.48 - lr: 0.025000
2023-04-20 09:56:43,626 epoch 31 - iter 14/72 - loss 0.12020558 - time (sec): 1.84 - samples/sec: 4379.64 - lr: 0.025000
2023-04-20 09:56:44,423 epoch 31 - iter 21/72 - loss 0.11891520 - time (sec): 2.64 - samples/sec: 4553.20 - lr: 0.025000
2023-04-20 09:56:45,133 epoch 31 - iter 28/72 - loss 0.11606135 - time (sec): 3.35 - samples/sec: 4742.07 - lr: 0.025000
2023-04-20 09:56:45,826 epoch 31 - iter 35/72 - loss 0.11993193 - time (sec): 4.04 - samples/sec: 4909.72 - lr: 0.025000
2023-04-20 09:56:46,743 epoch 31 - iter 42/72 - loss 0.12258017 - time (sec): 4.96 - samples/sec: 4753.24 - lr: 0.025000
2023-04-20 09:56:47,761 epoch 31 - iter 49/72 - loss 

100%|██████████| 8/8 [00:01<00:00,  7.16it/s]

2023-04-20 09:56:53,890 Evaluating as a multi-label problem: False
2023-04-20 09:56:53,907 DEV : loss 0.1453489363193512 - f1-score (micro avg)  0.6322
2023-04-20 09:56:53,935 BAD EPOCHS (no improvement): 1
2023-04-20 09:56:53,941 ----------------------------------------------------------------------------------------------------
2023-04-20 09:56:54,780 epoch 32 - iter 7/72 - loss 0.11516199 - time (sec): 0.83 - samples/sec: 4709.62 - lr: 0.025000
2023-04-20 09:56:55,579 epoch 32 - iter 14/72 - loss 0.12238609 - time (sec): 1.63 - samples/sec: 4911.81 - lr: 0.025000
2023-04-20 09:56:56,326 epoch 32 - iter 21/72 - loss 0.12395003 - time (sec): 2.38 - samples/sec: 5036.35 - lr: 0.025000
2023-04-20 09:56:57,245 epoch 32 - iter 28/72 - loss 0.12389844 - time (sec): 3.30 - samples/sec: 4870.42 - lr: 0.025000
2023-04-20 09:56:57,916 epoch 32 - iter 35/72 - loss 0.12201628 - time (sec): 3.97 - samples/sec: 4972.23 - lr: 0.025000
2023-04-20 09:56:58,704 epoch 32 - iter 42/72 - loss 0.12619137 - time (sec): 4.76 - samples/sec: 4976.93 - lr: 0.025000
2023-04-20 09:56:59,476 epoch 32 - iter 49/72 - loss 0.12581178 - time (sec): 5.53 - samples/se

100%|██████████| 8/8 [00:02<00:00,  3.67it/s]

2023-04-20 09:57:04,242 Evaluating as a multi-label problem: False
2023-04-20 09:57:04,264 DEV : loss 0.1473478376865387 - f1-score (micro avg)  0.639
2023-04-20 09:57:04,307 BAD EPOCHS (no improvement): 2
2023-04-20 09:57:04,313 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:05,508 epoch 33 - iter 7/72 - loss 0.11437146 - time (sec): 1.19 - samples/sec: 3153.63 - lr: 0.025000
2023-04-20 09:57:06,642 epoch 33 - iter 14/72 - loss 0.11934715 - time (sec): 2.32 - samples/sec: 3204.48 - lr: 0.025000
2023-04-20 09:57:07,669 epoch 33 - iter 21/72 - loss 0.12216917 - time (sec): 3.35 - samples/sec: 3389.53 - lr: 0.025000
2023-04-20 09:57:08,495 epoch 33 - iter 28/72 - loss 0.12141986 - time (sec): 4.18 - samples/sec: 3633.06 - lr: 0.025000
2023-04-20 09:57:09,335 epoch 33 - iter 35/72 - loss 0.12091181 - time (sec): 5.02 - samples/sec: 3789.21 - lr: 0.025000
2023-04-20 09:57:10,031 epoch 33 - iter 42/72 - loss 0.11759149 - time (sec): 5.71 - samples/sec: 4037.28 - lr: 0.025000
2023-04-20 09:57:10,820 epoch 33 - iter 49/72 - loss 0.11809457 - time (sec): 6.50 - samples/sec: 4199.07 - lr: 0.025000
2023-04-20 09:57:11,700 epoch 33 - iter 56/72 - loss 0.12187564 - time (sec): 7.38 - samples/sec: 4268.74 - lr: 0.025000
2023-04-20 09:57:12,447 epoch 33 

100%|██████████| 8/8 [00:01<00:00,  7.41it/s]

2023-04-20 09:57:14,474 Evaluating as a multi-label problem: False
2023-04-20 09:57:14,491 DEV : loss 0.14835619926452637 - f1-score (micro avg)  0.6325
2023-04-20 09:57:14,519 BAD EPOCHS (no improvement): 3
2023-04-20 09:57:14,525 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:15,275 epoch 34 - iter 7/72 - loss 0.10938300 - time (sec): 0.75 - samples/sec: 5403.46 - lr: 0.025000
2023-04-20 09:57:16,014 epoch 34 - iter 14/72 - loss 0.10844240 - time (sec): 1.48 - samples/sec: 5184.99 - lr: 0.025000
2023-04-20 09:57:16,707 epoch 34 - iter 21/72 - loss 0.11389341 - time (sec): 2.18 - samples/sec: 5302.05 - lr: 0.025000
2023-04-20 09:57:17,455 epoch 34 - iter 28/72 - loss 0.11745936 - time (sec): 2.92 - samples/sec: 5173.49 - lr: 0.025000
2023-04-20 09:57:18,404 epoch 34 - iter 35/72 - loss 0.11533736 - time (sec): 3.87 - samples/sec: 4922.16 - lr: 0.025000
2023-04-20 09:57:19,660 epoch 34 - iter 42/72 - loss 0.11627675 - time (sec): 5.13 - samples/sec: 4524.22 - lr: 0.025000
2023-04-20 09:57:20,786 epoch 34 - iter 49/72 - loss 0.11876759 - time (sec): 6.26 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.30it/s]

2023-04-20 09:57:25,284 Evaluating as a multi-label problem: False
2023-04-20 09:57:25,302 DEV : loss 0.1471465677022934 - f1-score (micro avg)  0.6281
2023-04-20 09:57:25,330 Epoch    34: reducing learning rate of group 0 to 1.2500e-02.
2023-04-20 09:57:25,335 BAD EPOCHS (no improvement): 4
2023-04-20 09:57:25,345 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:26,046 epoch 35 - iter 7/72 - loss 0.12753648 - time (sec): 0.70 - samples/sec: 5526.63 - lr: 0.012500
2023-04-20 09:57:26,863 epoch 35 - iter 14/72 - loss 0.12473133 - time (sec): 1.51 - samples/sec: 5206.19 - lr: 0.012500
2023-04-20 09:57:27,665 epoch 35 - iter 21/72 - loss 0.12203824 - time (sec): 2.32 - samples/sec: 5075.31 - lr: 0.012500
2023-04-20 09:57:28,489 epoch 35 - iter 28/72 - loss 0.12400851 - time (sec): 3.14 - samples/sec: 4991.65 - lr: 0.012500
2023-04-20 09:57:29,186 epoch 35 - iter 35/72 - loss 0.12480325 - time (sec): 3.84 - samples/sec: 5125.30 - lr: 0.012500
2023-04-20 09:57:29,913 epoch 35 - iter 42/72 - loss 0.12131694 - time (sec): 4.56 - samples/sec: 5189.18 - lr: 0.012500
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  4.98it/s]

2023-04-20 09:57:35,036 Evaluating as a multi-label problem: False
2023-04-20 09:57:35,058 DEV : loss 0.14491912722587585 - f1-score (micro avg)  0.641
2023-04-20 09:57:35,102 BAD EPOCHS (no improvement): 1
2023-04-20 09:57:35,111 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:36,276 epoch 36 - iter 7/72 - loss 0.13136703 - time (sec): 1.16 - samples/sec: 3335.94 - lr: 0.012500
2023-04-20 09:57:37,510 epoch 36 - iter 14/72 - loss 0.12449858 - time (sec): 2.40 - samples/sec: 3209.52 - lr: 0.012500
2023-04-20 09:57:38,715 epoch 36 - iter 21/72 - loss 0.12666282 - time (sec): 3.60 - samples/sec: 3272.03 - lr: 0.012500
2023-04-20 09:57:39,481 epoch 36 - iter 28/72 - loss 0.12202675 - time (sec): 4.37 - samples/sec: 3554.21 - lr: 0.012500
2023-04-20 09:57:40,277 epoch 36 - iter 35/72 - loss 0.11834734 - time (sec): 5.16 - samples/sec: 3748.00 - lr: 0.012500
2023-04-20 09:57:41,125 epoch 36 - iter 42/72 - loss 0.11928951 - time (sec): 6.01 - samples/sec: 3869.88 - lr: 0.012500
2023-04-20 09:57:41,975 epoch 36 - iter 49/72 - loss 0.11671050 - time (sec): 6.86 - samples/sec: 3968.13 - lr: 0.012500
2023-04-20 09:57:42,716 epoch 36 - iter 56/72 - loss 0.11887098 - time (sec): 7.60 - samples/sec: 4115.14 - lr: 0.012500
2023-04-20 09:57:43,396 epoch 36 

100%|██████████| 8/8 [00:01<00:00,  7.69it/s]

2023-04-20 09:57:45,513 Evaluating as a multi-label problem: False
2023-04-20 09:57:45,531 DEV : loss 0.14578258991241455 - f1-score (micro avg)  0.632
2023-04-20 09:57:45,555 BAD EPOCHS (no improvement): 2
2023-04-20 09:57:45,563 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:46,801 epoch 37 - iter 7/72 - loss 0.11248701 - time (sec): 1.24 - samples/sec: 3137.05 - lr: 0.012500
2023-04-20 09:57:47,651 epoch 37 - iter 14/72 - loss 0.11030630 - time (sec): 2.09 - samples/sec: 3775.40 - lr: 0.012500
2023-04-20 09:57:48,382 epoch 37 - iter 21/72 - loss 0.11112053 - time (sec): 2.82 - samples/sec: 4184.84 - lr: 0.012500
2023-04-20 09:57:49,422 epoch 37 - iter 28/72 - loss 0.11317034 - time (sec): 3.86 - samples/sec: 4131.56 - lr: 0.012500
2023-04-20 09:57:50,415 epoch 37 - iter 35/72 - loss 0.11416458 - time (sec): 4.85 - samples/sec: 4101.98 - lr: 0.012500
2023-04-20 09:57:51,394 epoch 37 - iter 42/72 - loss 0.11658222 - time (sec): 5.83 - samples/sec: 4044.20 - lr: 0.012500
2023-04-20 09:57:52,492 epoch 37 - iter 49/72 - loss 0.11430988 - time (sec): 6.93 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.66it/s]

2023-04-20 09:57:56,750 Evaluating as a multi-label problem: False
2023-04-20 09:57:56,766 DEV : loss 0.14591199159622192 - f1-score (micro avg)  0.6339
2023-04-20 09:57:56,791 BAD EPOCHS (no improvement): 3
2023-04-20 09:57:56,800 ----------------------------------------------------------------------------------------------------
2023-04-20 09:57:57,547 epoch 38 - iter 7/72 - loss 0.11050271 - time (sec): 0.74 - samples/sec: 5356.83 - lr: 0.012500
2023-04-20 09:57:58,290 epoch 38 - iter 14/72 - loss 0.11513724 - time (sec): 1.48 - samples/sec: 5282.74 - lr: 0.012500
2023-04-20 09:57:59,031 epoch 38 - iter 21/72 - loss 0.11491352 - time (sec): 2.22 - samples/sec: 5257.40 - lr: 0.012500
2023-04-20 09:57:59,759 epoch 38 - iter 28/72 - loss 0.11230855 - time (sec): 2.95 - samples/sec: 5273.24 - lr: 0.012500
2023-04-20 09:58:00,732 epoch 38 - iter 35/72 - loss 0.11232938 - time (sec): 3.92 - samples/sec: 5025.22 - lr: 0.012500
2023-04-20 09:58:01,396 epoch 38 - iter 42/72 - loss 0.11618912 - time (sec): 4.59 - samples/sec: 5094.10 - lr: 0.012500
2023-04-20 09:58:02,235 epoch 38 - iter 49/72 - loss 0.12000228 - time (sec): 5.43 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.88it/s]

2023-04-20 09:58:06,648 Evaluating as a multi-label problem: False
2023-04-20 09:58:06,675 DEV : loss 0.1462680697441101 - f1-score (micro avg)  0.6406
2023-04-20 09:58:06,718 Epoch    38: reducing learning rate of group 0 to 6.2500e-03.
2023-04-20 09:58:06,724 BAD EPOCHS (no improvement): 4
2023-04-20 09:58:06,729 ----------------------------------------------------------------------------------------------------
2023-04-20 09:58:07,797 epoch 39 - iter 7/72 - loss 0.09938844 - time (sec): 1.07 - samples/sec: 3544.59 - lr: 0.006250
2023-04-20 09:58:09,052 epoch 39 - iter 14/72 - loss 0.10415087 - time (sec): 2.32 - samples/sec: 3286.75 - lr: 0.006250
2023-04-20 09:58:10,037 epoch 39 - iter 21/72 - loss 0.10553865 - time (sec): 3.31 - samples/sec: 3571.72 - lr: 0.006250
2023-04-20 09:58:10,766 epoch 39 - iter 28/72 - loss 0.10596866 - time (sec): 4.03 - samples/sec: 3861.52 - lr: 0.006250
2023-04-20 09:58:11,513 epoch 39 - iter 35/72 - loss 0.10718710 - time (sec): 4.78 - samples/sec: 4107.84 - lr: 0.006250
2023-04-20 09:58:12,371 epoch 39 - iter 42/72 - loss 0.10783262 - time (sec): 5.64 - samples/sec: 4184.19 - lr: 0.006250
2023-04-20 09:58:13,178 epoch 39 - iter 49/72 - loss 0.10873611 - time (sec): 6.45 - samples/sec: 4249.68 - lr: 0.006250
2023-04-20 09:58:13,958 epoch 39 - iter 56/72 - loss 0.10915407 - time (sec): 7.23 - samples/sec: 4343.73 - lr: 0.006250
2023-04-20 09:58:14,716 epoch 39 

100%|██████████| 8/8 [00:01<00:00,  7.53it/s]

2023-04-20 09:58:16,692 Evaluating as a multi-label problem: False
2023-04-20 09:58:16,708 DEV : loss 0.14557373523712158 - f1-score (micro avg)  0.6167
2023-04-20 09:58:16,736 BAD EPOCHS (no improvement): 1
2023-04-20 09:58:16,742 ----------------------------------------------------------------------------------------------------
2023-04-20 09:58:17,548 epoch 40 - iter 7/72 - loss 0.12941221 - time (sec): 0.80 - samples/sec: 5188.50 - lr: 0.006250
2023-04-20 09:58:18,373 epoch 40 - iter 14/72 - loss 0.11708692 - time (sec): 1.63 - samples/sec: 5087.12 - lr: 0.006250
2023-04-20 09:58:19,133 epoch 40 - iter 21/72 - loss 0.11707558 - time (sec): 2.39 - samples/sec: 5104.85 - lr: 0.006250
2023-04-20 09:58:20,136 epoch 40 - iter 28/72 - loss 0.11826867 - time (sec): 3.39 - samples/sec: 4777.32 - lr: 0.006250
2023-04-20 09:58:21,255 epoch 40 - iter 35/72 - loss 0.11651376 - time (sec): 4.51 - samples/sec: 4468.93 - lr: 0.006250
2023-04-20 09:58:22,344 epoch 40 - iter 42/72 - loss 0.11663675 - time (sec): 5.60 - samples/sec: 4276.72 - lr: 0.006250
2023-04-20 09:58:23,463 epoch 40 - iter 49/72 - loss 0.11485032 - time (sec): 6.72 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.49it/s]

2023-04-20 09:58:27,471 Evaluating as a multi-label problem: False
2023-04-20 09:58:27,488 DEV : loss 0.14551131427288055 - f1-score (micro avg)  0.636
2023-04-20 09:58:27,517 BAD EPOCHS (no improvement): 2
2023-04-20 09:58:27,524 ----------------------------------------------------------------------------------------------------
2023-04-20 09:58:28,280 epoch 41 - iter 7/72 - loss 0.11706296 - time (sec): 0.75 - samples/sec: 5069.87 - lr: 0.006250
2023-04-20 09:58:29,167 epoch 41 - iter 14/72 - loss 0.11982547 - time (sec): 1.64 - samples/sec: 4856.56 - lr: 0.006250
2023-04-20 09:58:29,938 epoch 41 - iter 21/72 - loss 0.11327514 - time (sec): 2.41 - samples/sec: 4943.25 - lr: 0.006250
2023-04-20 09:58:30,699 epoch 41 - iter 28/72 - loss 0.11135701 - time (sec): 3.17 - samples/sec: 4960.02 - lr: 0.006250
2023-04-20 09:58:31,564 epoch 41 - iter 35/72 - loss 0.11145008 - time (sec): 4.03 - samples/sec: 4910.39 - lr: 0.006250
2023-04-20 09:58:32,309 epoch 41 - iter 42/72 - loss 0.11304870 - time (sec): 4.78 - samples/sec: 4931.37 - lr: 0.006250
2023-04-20 09:58:33,031 epoch 41 - iter 49/72 - loss 0.11574221 - time (sec): 5.50 - samples/se

100%|██████████| 8/8 [00:02<00:00,  3.46it/s]

2023-04-20 09:58:38,165 Evaluating as a multi-label problem: False
2023-04-20 09:58:38,194 DEV : loss 0.14559075236320496 - f1-score (micro avg)  0.6428
2023-04-20 09:58:38,237 BAD EPOCHS (no improvement): 3
2023-04-20 09:58:38,244 ----------------------------------------------------------------------------------------------------
2023-04-20 09:58:39,333 epoch 42 - iter 7/72 - loss 0.09991172 - time (sec): 1.08 - samples/sec: 3405.79 - lr: 0.006250
2023-04-20 09:58:40,468 epoch 42 - iter 14/72 - loss 0.11362214 - time (sec): 2.22 - samples/sec: 3375.09 - lr: 0.006250
2023-04-20 09:58:41,357 epoch 42 - iter 21/72 - loss 0.11263612 - time (sec): 3.11 - samples/sec: 3711.15 - lr: 0.006250
2023-04-20 09:58:42,057 epoch 42 - iter 28/72 - loss 0.11352897 - time (sec): 3.81 - samples/sec: 4045.51 - lr: 0.006250
2023-04-20 09:58:42,836 epoch 42 - iter 35/72 - loss 0.11344835 - time (sec): 4.59 - samples/sec: 4240.29 - lr: 0.006250
2023-04-20 09:58:43,549 epoch 42 - iter 42/72 - loss 0.11393191 - time (sec): 5.30 - samples/sec: 4371.78 - lr: 0.006250
2023-04-20 09:58:44,350 epoch 42 - iter 49/72 - loss 0.11193515 - time (sec): 6.10 - samples/sec: 4423.67 - lr: 0.006250
2023-04-20 09:58:45,270 epoch 42 - iter 56/72 - loss 0.11321496 - time (sec): 7.02 - samples/sec: 4435.91 - lr: 0.006250
2023-04-20 09:58:46,038 epoch 42 

100%|██████████| 8/8 [00:01<00:00,  7.27it/s]

2023-04-20 09:58:48,256 Evaluating as a multi-label problem: False
2023-04-20 09:58:48,273 DEV : loss 0.14581158757209778 - f1-score (micro avg)  0.6388
2023-04-20 09:58:48,305 Epoch    42: reducing learning rate of group 0 to 3.1250e-03.
2023-04-20 09:58:48,308 BAD EPOCHS (no improvement): 4
2023-04-20 09:58:48,314 ----------------------------------------------------------------------------------------------------
2023-04-20 09:58:49,111 epoch 43 - iter 7/72 - loss 0.11530795 - time (sec): 0.79 - samples/sec: 4827.03 - lr: 0.003125
2023-04-20 09:58:50,097 epoch 43 - iter 14/72 - loss 0.11494585 - time (sec): 1.78 - samples/sec: 4552.50 - lr: 0.003125
2023-04-20 09:58:51,079 epoch 43 - iter 21/72 - loss 0.11044547 - time (sec): 2.76 - samples/sec: 4358.56 - lr: 0.003125
2023-04-20 09:58:52,129 epoch 43 - iter 28/72 - loss 0.10934875 - time (sec): 3.81 - samples/sec: 4156.97 - lr: 0.003125
2023-04-20 09:58:53,389 epoch 43 - iter 35/72 - loss 0.10859650 - time (sec): 5.07 - samples/sec: 3918.81 - lr: 0.003125
2023-04-20 09:58:54,434 epoch 43 - iter 42/72 - loss 0.10910052 - time (sec): 6.12 - samples/sec: 3896.22 - lr: 0.003125
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.35it/s]

2023-04-20 09:58:59,326 Evaluating as a multi-label problem: False
2023-04-20 09:58:59,343 DEV : loss 0.14562521874904633 - f1-score (micro avg)  0.6417
2023-04-20 09:58:59,373 BAD EPOCHS (no improvement): 1
2023-04-20 09:58:59,377 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:00,268 epoch 44 - iter 7/72 - loss 0.13049533 - time (sec): 0.89 - samples/sec: 4368.40 - lr: 0.003125
2023-04-20 09:59:01,012 epoch 44 - iter 14/72 - loss 0.11223241 - time (sec): 1.63 - samples/sec: 4731.66 - lr: 0.003125
2023-04-20 09:59:01,858 epoch 44 - iter 21/72 - loss 0.11208479 - time (sec): 2.48 - samples/sec: 4724.37 - lr: 0.003125
2023-04-20 09:59:02,684 epoch 44 - iter 28/72 - loss 0.10681613 - time (sec): 3.31 - samples/sec: 4713.94 - lr: 0.003125
2023-04-20 09:59:03,429 epoch 44 - iter 35/72 - loss 0.11143302 - time (sec): 4.05 - samples/sec: 4829.00 - lr: 0.003125
2023-04-20 09:59:04,291 epoch 44 - iter 42/72 - loss 0.11004933 - time (sec): 4.91 - samples/sec: 4783.56 - lr: 0.003125
2023-04-20 09:59:05,027 epoch 44 - iter 49/72 - loss 0.11232668 - time (sec): 5.65 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.57it/s]

2023-04-20 09:59:10,170 Evaluating as a multi-label problem: False
2023-04-20 09:59:10,194 DEV : loss 0.1448303759098053 - f1-score (micro avg)  0.6418
2023-04-20 09:59:10,236 BAD EPOCHS (no improvement): 2
2023-04-20 09:59:10,246 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:11,497 epoch 45 - iter 7/72 - loss 0.12642866 - time (sec): 1.25 - samples/sec: 3127.04 - lr: 0.003125
2023-04-20 09:59:12,263 epoch 45 - iter 14/72 - loss 0.11195318 - time (sec): 2.01 - samples/sec: 3820.80 - lr: 0.003125
2023-04-20 09:59:13,094 epoch 45 - iter 21/72 - loss 0.10986489 - time (sec): 2.84 - samples/sec: 4157.02 - lr: 0.003125
2023-04-20 09:59:13,950 epoch 45 - iter 28/72 - loss 0.10947342 - time (sec): 3.70 - samples/sec: 4291.36 - lr: 0.003125
2023-04-20 09:59:14,712 epoch 45 - iter 35/72 - loss 0.10985607 - time (sec): 4.46 - samples/sec: 4444.91 - lr: 0.003125
2023-04-20 09:59:15,500 epoch 45 - iter 42/72 - loss 0.11034831 - time (sec): 5.25 - samples/sec: 4517.63 - lr: 0.003125
2023-04-20 09:59:16,260 epoch 45 - iter 49/72 - loss 0.11011200 - time (sec): 6.01 - samples/sec: 4577.42 - lr: 0.003125
2023-04-20 09:59:17,069 epoch 45 - iter 56/72 - loss 0.11100667 - time (sec): 6.82 - samples/sec: 4622.18 - lr: 0.003125
2023-04-20 09:59:17,863 epoch 45 

100%|██████████| 8/8 [00:01<00:00,  7.21it/s]

2023-04-20 09:59:19,844 Evaluating as a multi-label problem: False
2023-04-20 09:59:19,861 DEV : loss 0.14461080729961395 - f1-score (micro avg)  0.6514
2023-04-20 09:59:19,890 BAD EPOCHS (no improvement): 0
2023-04-20 09:59:19,897 saving best model
2023-04-20 09:59:21,793 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:23,008 epoch 46 - iter 7/72 - loss 0.10341325 - time (sec): 1.17 - samples/sec: 3197.51 - lr: 0.003125
2023-04-20 09:59:24,219 epoch 46 - iter 14/72 - loss 0.10559280 - time (sec): 2.38 - samples/sec: 3253.87 - lr: 0.003125
2023-04-20 09:59:25,467 epoch 46 - iter 21/72 - loss 0.11169772 - time (sec): 3.63 - samples/sec: 3215.52 - lr: 0.003125
2023-04-20 09:59:27,157 epoch 46 - iter 28/72 - loss 0.11488671 - time (sec): 5.32 - samples/sec: 2926.94 - lr: 0.003125
2023-04-20 09:59:27,916 epoch 46 - iter 35/72 - loss 0.11325211 - time (sec): 6.08 - samples/sec: 3185.91 - lr: 0.003125
2023-04-20 09:59:28,700 epoch 46 - iter 42/72 - loss 0.10981070 - time (sec): 6.86 - samples/sec: 3382.44 - lr: 0.003125
2023-04-20 09:59:29,483 epoch 46 - iter 49/72 - loss 

100%|██████████| 8/8 [00:01<00:00,  5.81it/s]

2023-04-20 09:59:34,604 Evaluating as a multi-label problem: False
2023-04-20 09:59:34,620 DEV : loss 0.14512552320957184 - f1-score (micro avg)  0.639
2023-04-20 09:59:34,650 BAD EPOCHS (no improvement): 1
2023-04-20 09:59:34,655 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:35,588 epoch 47 - iter 7/72 - loss 0.10738431 - time (sec): 0.93 - samples/sec: 4128.90 - lr: 0.003125
2023-04-20 09:59:36,414 epoch 47 - iter 14/72 - loss 0.10487083 - time (sec): 1.75 - samples/sec: 4498.16 - lr: 0.003125
2023-04-20 09:59:37,525 epoch 47 - iter 21/72 - loss 0.11002578 - time (sec): 2.86 - samples/sec: 4092.01 - lr: 0.003125
2023-04-20 09:59:38,458 epoch 47 - iter 28/72 - loss 0.10805745 - time (sec): 3.79 - samples/sec: 4076.89 - lr: 0.003125
2023-04-20 09:59:39,571 epoch 47 - iter 35/72 - loss 0.10884207 - time (sec): 4.91 - samples/sec: 4004.36 - lr: 0.003125
2023-04-20 09:59:40,746 epoch 47 - iter 42/72 - loss 0.11217254 - time (sec): 6.08 - samples/sec: 3879.75 - lr: 0.003125
2023-04-20 09:59:41,992 epoch 47 - iter 49/72 - loss 0.11086857 - time (sec): 7.33 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.41it/s]

2023-04-20 09:59:45,629 Evaluating as a multi-label problem: False
2023-04-20 09:59:45,648 DEV : loss 0.14446893334388733 - f1-score (micro avg)  0.639
2023-04-20 09:59:45,676 BAD EPOCHS (no improvement): 2
2023-04-20 09:59:45,682 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:46,573 epoch 48 - iter 7/72 - loss 0.11180843 - time (sec): 0.89 - samples/sec: 4623.61 - lr: 0.003125
2023-04-20 09:59:47,331 epoch 48 - iter 14/72 - loss 0.11013330 - time (sec): 1.65 - samples/sec: 4788.42 - lr: 0.003125
2023-04-20 09:59:48,171 epoch 48 - iter 21/72 - loss 0.11152624 - time (sec): 2.49 - samples/sec: 4819.48 - lr: 0.003125
2023-04-20 09:59:48,884 epoch 48 - iter 28/72 - loss 0.10775173 - time (sec): 3.20 - samples/sec: 4907.92 - lr: 0.003125
2023-04-20 09:59:49,605 epoch 48 - iter 35/72 - loss 0.11411977 - time (sec): 3.92 - samples/sec: 4989.12 - lr: 0.003125
2023-04-20 09:59:50,378 epoch 48 - iter 42/72 - loss 0.11511920 - time (sec): 4.69 - samples/sec: 4987.92 - lr: 0.003125
2023-04-20 09:59:51,240 epoch 48 - iter 49/72 - loss 0.11246683 - time (sec): 5.55 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.59it/s]

2023-04-20 09:59:56,227 Evaluating as a multi-label problem: False
2023-04-20 09:59:56,251 DEV : loss 0.1444394588470459 - f1-score (micro avg)  0.6466
2023-04-20 09:59:56,295 BAD EPOCHS (no improvement): 3
2023-04-20 09:59:56,305 ----------------------------------------------------------------------------------------------------
2023-04-20 09:59:57,490 epoch 49 - iter 7/72 - loss 0.11280226 - time (sec): 1.18 - samples/sec: 3333.54 - lr: 0.003125
2023-04-20 09:59:58,387 epoch 49 - iter 14/72 - loss 0.10915454 - time (sec): 2.08 - samples/sec: 3774.56 - lr: 0.003125
2023-04-20 09:59:59,100 epoch 49 - iter 21/72 - loss 0.11069332 - time (sec): 2.79 - samples/sec: 4179.79 - lr: 0.003125
2023-04-20 09:59:59,877 epoch 49 - iter 28/72 - loss 0.10919326 - time (sec): 3.57 - samples/sec: 4335.98 - lr: 0.003125
2023-04-20 10:00:00,644 epoch 49 - iter 35/72 - loss 0.10692086 - time (sec): 4.34 - samples/sec: 4483.66 - lr: 0.003125
2023-04-20 10:00:01,348 epoch 49 - iter 42/72 - loss 0.10699659 - time (sec): 5.04 - samples/sec: 4610.14 - lr: 0.003125
2023-04-20 10:00:02,119 epoch 49 - iter 49/72 - loss 0.10933243 - time (sec): 5.81 - samples/sec: 4685.88 - lr: 0.003125
2023-04-20 10:00:02,904 epoch 49 - iter 56/72 - loss 0.11179219 - time (sec): 6.60 - samples/sec: 4727.91 - lr: 0.003125
2023-04-20 10:00:03,687 epoch 49 

100%|██████████| 8/8 [00:01<00:00,  7.25it/s]

2023-04-20 10:00:05,877 Evaluating as a multi-label problem: False
2023-04-20 10:00:05,895 DEV : loss 0.14534521102905273 - f1-score (micro avg)  0.6332
2023-04-20 10:00:05,923 Epoch    49: reducing learning rate of group 0 to 1.5625e-03.
2023-04-20 10:00:05,926 BAD EPOCHS (no improvement): 4
2023-04-20 10:00:05,933 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:06,784 epoch 50 - iter 7/72 - loss 0.11861985 - time (sec): 0.85 - samples/sec: 4880.47 - lr: 0.001563
2023-04-20 10:00:07,525 epoch 50 - iter 14/72 - loss 0.12695109 - time (sec): 1.59 - samples/sec: 4980.74 - lr: 0.001563
2023-04-20 10:00:08,665 epoch 50 - iter 21/72 - loss 0.12151844 - time (sec): 2.73 - samples/sec: 4365.34 - lr: 0.001563
2023-04-20 10:00:09,792 epoch 50 - iter 28/72 - loss 0.11902736 - time (sec): 3.85 - samples/sec: 4159.28 - lr: 0.001563
2023-04-20 10:00:10,811 epoch 50 - iter 35/72 - loss 0.11488072 - time (sec): 4.87 - samples/sec: 4033.46 - lr: 0.001563
2023-04-20 10:00:12,019 epoch 50 - iter 42/72 - loss 0.11380928 - time (sec): 6.08 - samples/sec: 3861.76 - lr: 0.001563
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  5.15it/s]

2023-04-20 10:00:17,308 Evaluating as a multi-label problem: False
2023-04-20 10:00:17,326 DEV : loss 0.1445985734462738 - f1-score (micro avg)  0.6429
2023-04-20 10:00:17,358 BAD EPOCHS (no improvement): 1
2023-04-20 10:00:17,366 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:18,264 epoch 51 - iter 7/72 - loss 0.09994509 - time (sec): 0.89 - samples/sec: 4430.35 - lr: 0.001563
2023-04-20 10:00:19,100 epoch 51 - iter 14/72 - loss 0.10513739 - time (sec): 1.73 - samples/sec: 4497.67 - lr: 0.001563
2023-04-20 10:00:19,869 epoch 51 - iter 21/72 - loss 0.11077243 - time (sec): 2.50 - samples/sec: 4585.15 - lr: 0.001563
2023-04-20 10:00:20,642 epoch 51 - iter 28/72 - loss 0.11293561 - time (sec): 3.27 - samples/sec: 4718.42 - lr: 0.001563
2023-04-20 10:00:21,423 epoch 51 - iter 35/72 - loss 0.11047168 - time (sec): 4.05 - samples/sec: 4791.44 - lr: 0.001563
2023-04-20 10:00:22,158 epoch 51 - iter 42/72 - loss 0.11035815 - time (sec): 4.79 - samples/sec: 4862.23 - lr: 0.001563
2023-04-20 10:00:22,875 epoch 51 - iter 49/72 - loss 0.11075365 - time (sec): 5.51 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.89it/s]

2023-04-20 10:00:28,246 Evaluating as a multi-label problem: False
2023-04-20 10:00:28,269 DEV : loss 0.1440802365541458 - f1-score (micro avg)  0.6448
2023-04-20 10:00:28,295 BAD EPOCHS (no improvement): 2
2023-04-20 10:00:28,300 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:29,210 epoch 52 - iter 7/72 - loss 0.10311099 - time (sec): 0.91 - samples/sec: 4280.21 - lr: 0.001563
2023-04-20 10:00:30,087 epoch 52 - iter 14/72 - loss 0.10637102 - time (sec): 1.78 - samples/sec: 4416.61 - lr: 0.001563
2023-04-20 10:00:30,822 epoch 52 - iter 21/72 - loss 0.11159974 - time (sec): 2.52 - samples/sec: 4676.18 - lr: 0.001563
2023-04-20 10:00:31,510 epoch 52 - iter 28/72 - loss 0.10897318 - time (sec): 3.21 - samples/sec: 4798.85 - lr: 0.001563
2023-04-20 10:00:32,241 epoch 52 - iter 35/72 - loss 0.10699509 - time (sec): 3.94 - samples/sec: 4893.09 - lr: 0.001563
2023-04-20 10:00:32,965 epoch 52 - iter 42/72 - loss 0.11228069 - time (sec): 4.66 - samples/sec: 4985.06 - lr: 0.001563
2023-04-20

100%|██████████| 8/8 [00:01<00:00,  7.34it/s]

2023-04-20 10:00:37,388 Evaluating as a multi-label problem: False
2023-04-20 10:00:37,406 DEV : loss 0.14436876773834229 - f1-score (micro avg)  0.6447
2023-04-20 10:00:37,435 BAD EPOCHS (no improvement): 3
2023-04-20 10:00:37,443 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:38,354 epoch 53 - iter 7/72 - loss 0.12418106 - time (sec): 0.90 - samples/sec: 4484.21 - lr: 0.001563
2023-04-20 10:00:39,317 epoch 53 - iter 14/72 - loss 0.11095730 - time (sec): 1.87 - samples/sec: 4180.44 - lr: 0.001563
2023-04-20 10:00:40,579 epoch 53 - iter 21/72 - loss 0.10864436 - time (sec): 3.13 - samples/sec: 3758.65 - lr: 0.001563
2023-04-20 10:00:41,616 epoch 53 - iter 28/72 - loss 0.11478586 - time (sec): 4.17 - samples/sec: 3763.65 - lr: 0.001563
2023-04-20 10:00:42,757 epoch 53 - iter 35/72 - loss 0.11217203 - time (sec): 5.31 - samples/sec: 3680.51 - lr: 0.001563
2023-04-20 10:00:43,844 epoch 53 - iter 42/72 - loss 0.11298562 - time (sec): 6.39 - samples/sec: 3690.60 - lr: 0.001563
2023-04-20 10:00:44,694 epoch 53 - iter 49/72 - loss 0.11230373 - time (sec): 7.24 - samples/se

100%|██████████| 8/8 [00:01<00:00,  7.37it/s]

2023-04-20 10:00:48,272 Evaluating as a multi-label problem: False
2023-04-20 10:00:48,290 DEV : loss 0.14440475404262543 - f1-score (micro avg)  0.6408
2023-04-20 10:00:48,318 Epoch    53: reducing learning rate of group 0 to 7.8125e-04.
2023-04-20 10:00:48,321 BAD EPOCHS (no improvement): 4
2023-04-20 10:00:48,327 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:49,218 epoch 54 - iter 7/72 - loss 0.08525839 - time (sec): 0.88 - samples/sec: 4528.22 - lr: 0.000781
2023-04-20 10:00:50,030 epoch 54 - iter 14/72 - loss 0.09582834 - time (sec): 1.70 - samples/sec: 4646.72 - lr: 0.000781
2023-04-20 10:00:50,720 epoch 54 - iter 21/72 - loss 0.09846044 - time (sec): 2.39 - samples/sec: 4938.16 - lr: 0.000781
2023-04-20 10:00:51,569 epoch 54 - iter 28/72 - loss 0.10434703 - time (sec): 3.24 - samples/sec: 4926.64 - lr: 0.000781
2023-04-20 10:00:52,304 epoch 54 - iter 35/72 - loss 0.10784503 - time (sec): 3.97 - samples/sec: 4984.08 - lr: 0.000781
2023-04-20 10:00:53,077 epoch 54 - iter 42/72 - loss 0.10654033 - time (sec): 4.74 - samples/sec: 5006.99 - lr: 0.000781
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  4.94it/s]

2023-04-20 10:00:59,078 Evaluating as a multi-label problem: False
2023-04-20 10:00:59,100 DEV : loss 0.14434406161308289 - f1-score (micro avg)  0.6476
2023-04-20 10:00:59,127 BAD EPOCHS (no improvement): 1
2023-04-20 10:00:59,135 ----------------------------------------------------------------------------------------------------
2023-04-20 10:00:59,936 epoch 55 - iter 7/72 - loss 0.11220347 - time (sec): 0.79 - samples/sec: 4883.74 - lr: 0.000781
2023-04-20 10:01:00,756 epoch 55 - iter 14/72 - loss 0.11405137 - time (sec): 1.61 - samples/sec: 4806.18 - lr: 0.000781
2023-04-20 10:01:01,574 epoch 55 - iter 21/72 - loss 0.11774407 - time (sec): 2.43 - samples/sec: 4806.95 - lr: 0.000781
2023-04-20 10:01:02,500 epoch 55 - iter 28/72 - loss 0.11455765 - time (sec): 3.36 - samples/sec: 4696.69 - lr: 0.000781
2023-04-20 10:01:03,182 epoch 55 - iter 35/72 - loss 0.11119166 - time (sec): 4.04 - samples/sec: 4826.57 - lr: 0.000781
2023-04-20 10:01:03,956 epoch 55 - iter 42/72 - loss 0.11157543 - time (sec): 4.81 - samples/sec: 4892.27 - lr: 0.000781
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.39it/s]

2023-04-20 10:01:08,882 Evaluating as a multi-label problem: False
2023-04-20 10:01:08,908 DEV : loss 0.1444486677646637 - f1-score (micro avg)  0.6524
2023-04-20 10:01:08,949 BAD EPOCHS (no improvement): 0
2023-04-20 10:01:08,958 saving best model
2023-04-20 10:01:11,510 ----------------------------------------------------------------------------------------------------
2023-04-20 10:01:12,804 epoch 56 - iter 7/72 - loss 0.11667714 - time (sec): 1.25 - samples/sec: 3325.35 - lr: 0.000781
2023-04-20 10:01:14,136 epoch 56 - iter 14/72 - loss 0.11077545 - time (sec): 2.58 - samples/sec: 3136.20 - lr: 0.000781
2023-04-20 10:01:15,045 epoch 56 - iter 21/72 - loss 0.11015466 - time (sec): 3.49 - samples/sec: 3422.87 - lr: 0.000781
2023-04-20 10:01:15,806 epoch 56 - iter 28/72 - loss 0.10896119 - time (sec): 4.25 - samples/sec: 3677.21 - lr: 0.000781
2023-04-20 10:01:16,560 epoch 56 - iter 35/72 - loss 0.11268445 - time (sec): 5.01 - samples/sec: 3890.84 - lr: 0.000781
2023-04-20 10:01:17,617 epoch 56 - iter 42/72 - loss 0.10944933 - time (sec): 6.06 - samp

100%|██████████| 8/8 [00:01<00:00,  7.05it/s]

2023-04-20 10:01:23,199 Evaluating as a multi-label problem: False
2023-04-20 10:01:23,220 DEV : loss 0.14441336691379547 - f1-score (micro avg)  0.6466
2023-04-20 10:01:23,247 BAD EPOCHS (no improvement): 1
2023-04-20 10:01:23,256 ----------------------------------------------------------------------------------------------------
2023-04-20 10:01:24,110 epoch 57 - iter 7/72 - loss 0.10082636 - time (sec): 0.85 - samples/sec: 4802.26 - lr: 0.000781
2023-04-20 10:01:25,130 epoch 57 - iter 14/72 - loss 0.10581923 - time (sec): 1.87 - samples/sec: 4342.12 - lr: 0.000781
2023-04-20 10:01:26,233 epoch 57 - iter 21/72 - loss 0.10922469 - time (sec): 2.97 - samples/sec: 4052.77 - lr: 0.000781
2023-04-20 10:01:27,570 epoch 57 - iter 28/72 - loss 0.10912887 - time (sec): 4.31 - samples/sec: 3762.28 - lr: 0.000781
2023-04-20 10:01:28,637 epoch 57 - iter 35/72 - loss 0.11107627 - time (sec): 5.38 - samples/sec: 3742.37 - lr: 0.000781
2023-04-20 10:01:29,670 epoch 57 - iter 42/72 - loss 0.11177171 - time (sec): 6.41 - samples/sec: 3707.31 - lr: 0.000781
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.43it/s]

2023-04-20 10:01:34,205 Evaluating as a multi-label problem: False
2023-04-20 10:01:34,222 DEV : loss 0.1443730741739273 - f1-score (micro avg)  0.6448





2023-04-20 10:01:34,251 BAD EPOCHS (no improvement): 2
2023-04-20 10:01:34,265 ----------------------------------------------------------------------------------------------------
2023-04-20 10:01:35,168 epoch 58 - iter 7/72 - loss 0.10008979 - time (sec): 0.90 - samples/sec: 4426.76 - lr: 0.000781
2023-04-20 10:01:35,943 epoch 58 - iter 14/72 - loss 0.09966550 - time (sec): 1.67 - samples/sec: 4620.36 - lr: 0.000781
2023-04-20 10:01:36,700 epoch 58 - iter 21/72 - loss 0.10485115 - time (sec): 2.43 - samples/sec: 4769.22 - lr: 0.000781
2023-04-20 10:01:37,430 epoch 58 - iter 28/72 - loss 0.10644323 - time (sec): 3.16 - samples/sec: 4905.08 - lr: 0.000781
2023-04-20 10:01:38,254 epoch 58 - iter 35/72 - loss 0.10477515 - time (sec): 3.98 - samples/sec: 4950.24 - lr: 0.000781
2023-04-20 10:01:38,977 epoch 58 - iter 42/72 - loss 0.10674197 - time (sec): 4.71 - samples/sec: 4984.42 - lr: 0.000781
2023-04-20 10:01:39,770 epoch 58 - iter 49/72 - loss 0.10505735 - time (sec): 5.50 - samples/se

100%|██████████| 8/8 [00:01<00:00,  4.60it/s]

2023-04-20 10:01:44,883 Evaluating as a multi-label problem: False
2023-04-20 10:01:44,913 DEV : loss 0.14431564509868622 - f1-score (micro avg)  0.6523
2023-04-20 10:01:44,959 BAD EPOCHS (no improvement): 3
2023-04-20 10:01:44,969 ----------------------------------------------------------------------------------------------------





2023-04-20 10:01:45,935 epoch 59 - iter 7/72 - loss 0.10753448 - time (sec): 0.96 - samples/sec: 4109.95 - lr: 0.000781
2023-04-20 10:01:46,812 epoch 59 - iter 14/72 - loss 0.10497238 - time (sec): 1.84 - samples/sec: 4397.08 - lr: 0.000781
2023-04-20 10:01:47,583 epoch 59 - iter 21/72 - loss 0.10593144 - time (sec): 2.61 - samples/sec: 4604.26 - lr: 0.000781
2023-04-20 10:01:48,498 epoch 59 - iter 28/72 - loss 0.10600556 - time (sec): 3.53 - samples/sec: 4616.85 - lr: 0.000781
2023-04-20 10:01:49,257 epoch 59 - iter 35/72 - loss 0.11115352 - time (sec): 4.29 - samples/sec: 4716.32 - lr: 0.000781
2023-04-20 10:01:49,963 epoch 59 - iter 42/72 - loss 0.11393480 - time (sec): 4.99 - samples/sec: 4818.36 - lr: 0.000781
2023-04-20 10:01:50,667 epoch 59 - iter 49/72 - loss 0.11333719 - time (sec): 5.70 - samples/sec: 4901.78 - lr: 0.000781
2023-04-20 10:01:51,478 epoch 59 - iter 56/72 - loss 0.11210640 - time (sec): 6.51 - samples/sec: 4893.30 - lr: 0.000781
2023-04-20 10:01:52,189 epoch 59 

100%|██████████| 8/8 [00:01<00:00,  7.44it/s]

2023-04-20 10:01:54,236 Evaluating as a multi-label problem: False
2023-04-20 10:01:54,254 DEV : loss 0.14426031708717346 - f1-score (micro avg)  0.6466





2023-04-20 10:01:54,288 Epoch    59: reducing learning rate of group 0 to 3.9063e-04.
2023-04-20 10:01:54,293 BAD EPOCHS (no improvement): 4
2023-04-20 10:01:54,300 ----------------------------------------------------------------------------------------------------
2023-04-20 10:01:55,726 epoch 60 - iter 7/72 - loss 0.10869827 - time (sec): 1.42 - samples/sec: 2825.62 - lr: 0.000391
2023-04-20 10:01:56,858 epoch 60 - iter 14/72 - loss 0.11162779 - time (sec): 2.56 - samples/sec: 3132.41 - lr: 0.000391
2023-04-20 10:01:57,945 epoch 60 - iter 21/72 - loss 0.11635657 - time (sec): 3.64 - samples/sec: 3322.06 - lr: 0.000391
2023-04-20 10:01:59,044 epoch 60 - iter 28/72 - loss 0.11279027 - time (sec): 4.74 - samples/sec: 3317.26 - lr: 0.000391
2023-04-20 10:02:00,220 epoch 60 - iter 35/72 - loss 0.11222094 - time (sec): 5.92 - samples/sec: 3355.83 - lr: 0.000391
2023-04-20 10:02:01,076 epoch 60 - iter 42/72 - loss 0.11069514 - time (sec): 6.77 - samples/sec: 3479.42 - lr: 0.000391
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.17it/s]

2023-04-20 10:02:05,729 Evaluating as a multi-label problem: False
2023-04-20 10:02:05,746 DEV : loss 0.1441991627216339 - f1-score (micro avg)  0.6466





2023-04-20 10:02:05,779 BAD EPOCHS (no improvement): 1
2023-04-20 10:02:05,785 ----------------------------------------------------------------------------------------------------
2023-04-20 10:02:06,581 epoch 61 - iter 7/72 - loss 0.10235341 - time (sec): 0.79 - samples/sec: 4819.37 - lr: 0.000391
2023-04-20 10:02:07,465 epoch 61 - iter 14/72 - loss 0.10481510 - time (sec): 1.67 - samples/sec: 4803.18 - lr: 0.000391
2023-04-20 10:02:08,474 epoch 61 - iter 21/72 - loss 0.10388460 - time (sec): 2.68 - samples/sec: 4567.20 - lr: 0.000391
2023-04-20 10:02:09,186 epoch 61 - iter 28/72 - loss 0.10628804 - time (sec): 3.40 - samples/sec: 4702.14 - lr: 0.000391
2023-04-20 10:02:09,876 epoch 61 - iter 35/72 - loss 0.11255590 - time (sec): 4.09 - samples/sec: 4827.71 - lr: 0.000391
2023-04-20 10:02:10,599 epoch 61 - iter 42/72 - loss 0.11292393 - time (sec): 4.81 - samples/sec: 4902.31 - lr: 0.000391
2023-04-20 10:02:11,652 epoch 61 - iter 49/72 - loss 0.11185287 - time (sec): 5.86 - samples/se

100%|██████████| 8/8 [00:01<00:00,  5.68it/s]

2023-04-20 10:02:16,681 Evaluating as a multi-label problem: False





2023-04-20 10:02:16,699 DEV : loss 0.1441868245601654 - f1-score (micro avg)  0.6486
2023-04-20 10:02:16,731 BAD EPOCHS (no improvement): 2
2023-04-20 10:02:16,736 ----------------------------------------------------------------------------------------------------
2023-04-20 10:02:17,580 epoch 62 - iter 7/72 - loss 0.10982832 - time (sec): 0.84 - samples/sec: 4544.88 - lr: 0.000391
2023-04-20 10:02:18,373 epoch 62 - iter 14/72 - loss 0.10449371 - time (sec): 1.64 - samples/sec: 4705.43 - lr: 0.000391
2023-04-20 10:02:19,228 epoch 62 - iter 21/72 - loss 0.10261729 - time (sec): 2.49 - samples/sec: 4680.47 - lr: 0.000391
2023-04-20 10:02:20,051 epoch 62 - iter 28/72 - loss 0.10380636 - time (sec): 3.31 - samples/sec: 4655.76 - lr: 0.000391
2023-04-20 10:02:20,826 epoch 62 - iter 35/72 - loss 0.10774404 - time (sec): 4.09 - samples/sec: 4755.99 - lr: 0.000391
2023-04-20 10:02:21,605 epoch 62 - iter 42/72 - loss 0.11271919 - time (sec): 4.87 - samples/sec: 4794.89 - lr: 0.000391
2023-04-20

100%|██████████| 8/8 [00:01<00:00,  6.77it/s]

2023-04-20 10:02:26,173 Evaluating as a multi-label problem: False
2023-04-20 10:02:26,198 DEV : loss 0.14420440793037415 - f1-score (micro avg)  0.6504
2023-04-20 10:02:26,241 BAD EPOCHS (no improvement): 3
2023-04-20 10:02:26,250 ----------------------------------------------------------------------------------------------------





2023-04-20 10:02:27,356 epoch 63 - iter 7/72 - loss 0.09740705 - time (sec): 1.10 - samples/sec: 3630.55 - lr: 0.000391
2023-04-20 10:02:28,506 epoch 63 - iter 14/72 - loss 0.10183047 - time (sec): 2.25 - samples/sec: 3582.55 - lr: 0.000391
2023-04-20 10:02:29,814 epoch 63 - iter 21/72 - loss 0.10341785 - time (sec): 3.56 - samples/sec: 3392.72 - lr: 0.000391
2023-04-20 10:02:30,986 epoch 63 - iter 28/72 - loss 0.10711968 - time (sec): 4.73 - samples/sec: 3370.13 - lr: 0.000391
2023-04-20 10:02:31,845 epoch 63 - iter 35/72 - loss 0.10739880 - time (sec): 5.59 - samples/sec: 3545.50 - lr: 0.000391
2023-04-20 10:02:32,623 epoch 63 - iter 42/72 - loss 0.10813806 - time (sec): 6.37 - samples/sec: 3715.08 - lr: 0.000391
2023-04-20 10:02:33,413 epoch 63 - iter 49/72 - loss 0.11018828 - time (sec): 7.16 - samples/sec: 3857.56 - lr: 0.000391
2023-04-20 10:02:34,261 epoch 63 - iter 56/72 - loss 0.10872681 - time (sec): 8.01 - samples/sec: 3972.11 - lr: 0.000391
2023-04-20 10:02:35,052 epoch 63 

100%|██████████| 8/8 [00:01<00:00,  7.35it/s]

2023-04-20 10:02:37,077 Evaluating as a multi-label problem: False
2023-04-20 10:02:37,095 DEV : loss 0.1441289335489273 - f1-score (micro avg)  0.6486





2023-04-20 10:02:37,126 Epoch    63: reducing learning rate of group 0 to 1.9531e-04.
2023-04-20 10:02:37,130 BAD EPOCHS (no improvement): 4
2023-04-20 10:02:37,139 ----------------------------------------------------------------------------------------------------
2023-04-20 10:02:37,954 epoch 64 - iter 7/72 - loss 0.11250082 - time (sec): 0.81 - samples/sec: 4966.45 - lr: 0.000195
2023-04-20 10:02:38,836 epoch 64 - iter 14/72 - loss 0.10777426 - time (sec): 1.70 - samples/sec: 4795.56 - lr: 0.000195
2023-04-20 10:02:39,611 epoch 64 - iter 21/72 - loss 0.10770865 - time (sec): 2.47 - samples/sec: 4840.31 - lr: 0.000195
2023-04-20 10:02:40,454 epoch 64 - iter 28/72 - loss 0.10847798 - time (sec): 3.31 - samples/sec: 4784.67 - lr: 0.000195
2023-04-20 10:02:41,193 epoch 64 - iter 35/72 - loss 0.11172606 - time (sec): 4.05 - samples/sec: 4858.95 - lr: 0.000195
2023-04-20 10:02:42,213 epoch 64 - iter 42/72 - loss 0.11309535 - time (sec): 5.07 - samples/sec: 4668.07 - lr: 0.000195
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  6.75it/s]

2023-04-20 10:02:48,112 Evaluating as a multi-label problem: False





2023-04-20 10:02:48,136 DEV : loss 0.14414629340171814 - f1-score (micro avg)  0.6486
2023-04-20 10:02:48,166 BAD EPOCHS (no improvement): 1
2023-04-20 10:02:48,173 ----------------------------------------------------------------------------------------------------
2023-04-20 10:02:49,584 epoch 65 - iter 7/72 - loss 0.09426811 - time (sec): 1.40 - samples/sec: 2815.56 - lr: 0.000195
2023-04-20 10:02:50,469 epoch 65 - iter 14/72 - loss 0.09870894 - time (sec): 2.29 - samples/sec: 3554.71 - lr: 0.000195
2023-04-20 10:02:51,164 epoch 65 - iter 21/72 - loss 0.10023272 - time (sec): 2.98 - samples/sec: 3988.23 - lr: 0.000195
2023-04-20 10:02:51,859 epoch 65 - iter 28/72 - loss 0.10595373 - time (sec): 3.68 - samples/sec: 4302.69 - lr: 0.000195
2023-04-20 10:02:52,660 epoch 65 - iter 35/72 - loss 0.10724285 - time (sec): 4.48 - samples/sec: 4417.82 - lr: 0.000195
2023-04-20 10:02:53,605 epoch 65 - iter 42/72 - loss 0.10875348 - time (sec): 5.42 - samples/sec: 4423.42 - lr: 0.000195
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  4.85it/s]

2023-04-20 10:02:58,536 Evaluating as a multi-label problem: False
2023-04-20 10:02:58,558 DEV : loss 0.1441836804151535 - f1-score (micro avg)  0.6495
2023-04-20 10:02:58,605 BAD EPOCHS (no improvement): 2
2023-04-20 10:02:58,615 ----------------------------------------------------------------------------------------------------





2023-04-20 10:02:59,713 epoch 66 - iter 7/72 - loss 0.10922172 - time (sec): 1.09 - samples/sec: 3562.20 - lr: 0.000195
2023-04-20 10:03:01,011 epoch 66 - iter 14/72 - loss 0.10676952 - time (sec): 2.39 - samples/sec: 3322.45 - lr: 0.000195
2023-04-20 10:03:02,269 epoch 66 - iter 21/72 - loss 0.11063922 - time (sec): 3.65 - samples/sec: 3303.04 - lr: 0.000195
2023-04-20 10:03:03,155 epoch 66 - iter 28/72 - loss 0.11007311 - time (sec): 4.54 - samples/sec: 3508.40 - lr: 0.000195
2023-04-20 10:03:03,874 epoch 66 - iter 35/72 - loss 0.10919210 - time (sec): 5.26 - samples/sec: 3750.13 - lr: 0.000195
2023-04-20 10:03:04,647 epoch 66 - iter 42/72 - loss 0.11181145 - time (sec): 6.03 - samples/sec: 3908.84 - lr: 0.000195
2023-04-20 10:03:05,368 epoch 66 - iter 49/72 - loss 0.11044929 - time (sec): 6.75 - samples/sec: 4061.67 - lr: 0.000195
2023-04-20 10:03:06,253 epoch 66 - iter 56/72 - loss 0.11086111 - time (sec): 7.63 - samples/sec: 4111.25 - lr: 0.000195
2023-04-20 10:03:07,052 epoch 66 

100%|██████████| 8/8 [00:01<00:00,  7.38it/s]

2023-04-20 10:03:09,054 Evaluating as a multi-label problem: False





2023-04-20 10:03:09,075 DEV : loss 0.14415979385375977 - f1-score (micro avg)  0.6486
2023-04-20 10:03:09,100 BAD EPOCHS (no improvement): 3
2023-04-20 10:03:09,104 ----------------------------------------------------------------------------------------------------
2023-04-20 10:03:09,957 epoch 67 - iter 7/72 - loss 0.11492984 - time (sec): 0.85 - samples/sec: 4518.50 - lr: 0.000195
2023-04-20 10:03:10,664 epoch 67 - iter 14/72 - loss 0.11257119 - time (sec): 1.55 - samples/sec: 4978.77 - lr: 0.000195
2023-04-20 10:03:11,427 epoch 67 - iter 21/72 - loss 0.10874200 - time (sec): 2.32 - samples/sec: 5073.10 - lr: 0.000195
2023-04-20 10:03:12,125 epoch 67 - iter 28/72 - loss 0.11555468 - time (sec): 3.02 - samples/sec: 5135.35 - lr: 0.000195
2023-04-20 10:03:13,162 epoch 67 - iter 35/72 - loss 0.11285344 - time (sec): 4.05 - samples/sec: 4777.95 - lr: 0.000195
2023-04-20 10:03:14,221 epoch 67 - iter 42/72 - loss 0.11122024 - time (sec): 5.11 - samples/sec: 4592.74 - lr: 0.000195
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  7.11it/s]

2023-04-20 10:03:19,972 Evaluating as a multi-label problem: False
2023-04-20 10:03:19,990 DEV : loss 0.1441495418548584 - f1-score (micro avg)  0.6486





2023-04-20 10:03:20,022 Epoch    67: reducing learning rate of group 0 to 9.7656e-05.
2023-04-20 10:03:20,023 BAD EPOCHS (no improvement): 4
2023-04-20 10:03:20,031 ----------------------------------------------------------------------------------------------------
2023-04-20 10:03:20,034 ----------------------------------------------------------------------------------------------------
2023-04-20 10:03:20,036 learning rate too small - quitting training!
2023-04-20 10:03:20,041 ----------------------------------------------------------------------------------------------------
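
The `BAD EPOCHS` counters and learning-rate reductions in the log above are consistent with a plateau-based schedule: the rate is halved once the dev score has failed to improve for more than three epochs, and training stops when the rate drops below a minimum. The sketch below replays the dev F1 scores logged for epochs 55 to 67. It is a simplified illustration, not Flair's scheduler code. `replay_plateau` is a hypothetical helper, and the patience of 3, the factor of 0.5, and the stopping threshold of 1e-4 are assumptions inferred from the log. The starting rate 0.00078125 is the assumed unrounded form of the logged `0.000781`.

```python
def replay_plateau(dev_scores, lr, best=0.0, patience=3, factor=0.5, min_lr=1e-4):
    """Replay a simplified reduce-on-plateau schedule over a list of dev scores.

    Returns one (score, bad_epochs, lr_after_epoch) tuple per epoch.
    """
    bad_epochs = 0
    history = []
    for score in dev_scores:
        if score > best:
            best, bad_epochs = score, 0   # new best score: reset the counter
        else:
            bad_epochs += 1               # no improvement this epoch
        reported = bad_epochs
        if bad_epochs > patience:
            lr *= factor                  # anneal the learning rate
            bad_epochs = 0
        history.append((score, reported, lr))
        if lr < min_lr:
            break                         # "learning rate too small"
    return history


# Dev F1 (micro avg) as logged for epochs 55..67.
dev_f1 = [0.6524, 0.6466, 0.6448, 0.6523, 0.6466, 0.6466, 0.6486,
          0.6504, 0.6486, 0.6486, 0.6495, 0.6486, 0.6486]

history = replay_plateau(dev_f1, lr=0.00078125)
for epoch, (score, bad, lr) in enumerate(history, start=55):
    print(f"epoch {epoch}: dev F1 {score:.4f}, bad epochs {bad}, lr {lr:.4e}")
```

Under these assumptions the replay ends at epoch 67 with a rate just under 1e-4, which matches the point where the logged run quits.
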
2023-04-20 10:03:21,910 ----------------------------------------------------------------------------------------------------
2023-04-20 10:03:23,755 SequenceTagger predicts: Dictionary with 19 tags: O, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-location entity, B-location entity, E-location entity, I-location entity, S-person entity, B-person entity, E-person en

100%|██████████| 27/27 [00:12<00:00,  2.22it/s]

2023-04-20 10:03:36,319 Evaluating as a multi-label problem: False





2023-04-20 10:03:36,345 0.6487	0.5951	0.6207	0.5353
2023-04-20 10:03:36,349 
Results:
- F-score (micro) 0.6207
- F-score (macro) 0.4756
- Accuracy 0.5353

By class:
                     precision    recall  f1-score   support

    location entity     0.5987    0.5968    0.5978       315
      person entity     0.7425    0.6792    0.7094       293
organization entity     0.6231    0.5700    0.5954       293
        other label     0.0000    0.0000    0.0000        30

          micro avg     0.6487    0.5951    0.6207       931
          macro avg     0.4911    0.4615    0.4756       931
       weighted avg     0.6324    0.5951    0.6129       931

2023-04-20 10:03:36,351 ----------------------------------------------------------------------------------------------------
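
The gap between the micro F-score (0.6207) and the macro F-score (0.4756) in the report above comes from how the averages are formed. The sketch below recomputes the three averages from the rounded per-class values in the table, so small rounding differences against the logged numbers are expected. The variable names are illustrative only.

```python
# Per-class (f1-score, support) copied from the "By class" table.
per_class = {
    "location entity": (0.5978, 315),
    "person entity": (0.7094, 293),
    "organization entity": (0.5954, 293),
    "other label": (0.0000, 30),
}

# Macro average: every class counts equally, so the zero-scoring
# "other label" class pulls the mean down by a full quarter.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: each class is weighted by its support.
total_support = sum(n for _, n in per_class.values())
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

# Micro average: harmonic mean of the pooled precision and recall.
micro_p, micro_r = 0.6487, 0.5951
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(f"macro    {macro_f1:.4f}")
print(f"weighted {weighted_f1:.4f}")
print(f"micro    {micro_f1:.4f}")
```

Because `other label` has only 30 of the 931 test spans, it barely affects the micro and weighted averages but accounts for most of the drop in the macro average.
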


{'test_score': 0.6207282913165266,
 'dev_score_history': [0.22093023255813954,
  0.27816901408450706,
  0.4250681198910082,
  0.45558739255014324,
  0.49415204678362573,
  0.5572289156626505,
  0.5751072961373391,
  0.5938864628820961,
  0.5186246418338109,
  0.569640062597809,
  0.5887445887445887,
  0.5726375176304654,
  0.6125356125356125,
  0.6080691642651297,
  0.5976676384839651,
  0.6197604790419162,
  0.5800865800865801,
  0.6203966005665722,
  0.6240928882438317,
  0.6156028368794326,
  0.6214689265536724,
  0.609720176730486,
  0.6062322946175637,
  0.6250000000000001,
  0.6182873730043541,
  0.6251768033946252,
  0.6329479768786127,
  0.6291486291486291,
  0.6335227272727273,
  0.6438746438746438,
  0.6321839080459771,
  0.6389684813753581,
  0.6324786324786326,
  0.6280752532561504,
  0.6410256410256411,
  0.6320346320346321,
  0.6338639652677279,
  0.6405797101449275,
  0.6167146974063401,
  0.6359712230215828,
  0.6427546628407461,
  0.6388489208633094,
  0.64172661870503
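
The returned dictionary can be used to locate the best dev epoch. The sketch below assumes that `dev_score_history` holds one score per epoch, in order, starting at epoch 1. `best_epoch` is a hypothetical helper, and the sample uses only the first ten values shown above because the printed list is truncated.

```python
def best_epoch(dev_scores):
    """Return (1-based epoch number, score) of the highest dev score."""
    best_index = max(range(len(dev_scores)), key=dev_scores.__getitem__)
    return best_index + 1, dev_scores[best_index]


# First ten entries of 'dev_score_history' from the output above.
sample_history = [
    0.22093023255813954, 0.27816901408450706, 0.4250681198910082,
    0.45558739255014324, 0.49415204678362573, 0.5572289156626505,
    0.5751072961373391, 0.5938864628820961, 0.5186246418338109,
    0.569640062597809,
]

epoch, score = best_epoch(sample_history)
print(f"best dev F1 within the sample: {score:.4f} at epoch {epoch}")
```
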

## For Japanese

In [None]:
# First, check which labels the dataset has
from flair.datasets import NER_JAPANESE

# Load the corpus
corpus = NER_JAPANESE()

label_type='ner'

# Get the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Get the list of labels
labels = label_dict.get_items()

# Print the list of labels
print("Labels:", labels)


2023-04-19 21:44:02,818 https://raw.githubusercontent.com/Hironsan/IOB2Corpus/master/hironsan.txt not found in cache, downloading to /tmp/tmpyk469oyh


862kB [00:00, 62.1MB/s]                   

2023-04-19 21:44:02,882 copying /tmp/tmpyk469oyh to cache at /root/.flair/datasets/ner_japanese/raw/hironsan.txt
2023-04-19 21:44:02,888 removing temp file /tmp/tmpyk469oyh





2023-04-19 21:44:03,258 https://raw.githubusercontent.com/Hironsan/IOB2Corpus/master/ja.wikipedia.conll not found in cache, downloading to /tmp/tmp_s_y_2nh


1.24MB [00:00, 43.9MB/s]                  

2023-04-19 21:44:03,349 copying /tmp/tmp_s_y_2nh to cache at /root/.flair/datasets/ner_japanese/raw/ja.wikipedia.conll
2023-04-19 21:44:03,353 removing temp file /tmp/tmp_s_y_2nh
2023-04-19 21:44:03,510 Reading data from /root/.flair/datasets/ner_japanese





2023-04-19 21:44:03,516 Train: /root/.flair/datasets/ner_japanese/train.txt
2023-04-19 21:44:03,521 Dev: None
2023-04-19 21:44:03,524 Test: None
2023-04-19 21:44:06,526 Computing label dictionary. Progress:


3621it [00:00, 43341.93it/s]

2023-04-19 21:44:06,616 Dictionary created for label 'ner' with 20 values: LOCATION (seen 2275 times), ORGANIZATION (seen 1277 times), DATE (seen 1224 times), PERSON (seen 1122 times), NUMBER (seen 850 times), ARTIFACT (seen 591 times), LOC (seen 330 times), OTHER (seen 329 times), EVENT (seen 229 times), DAT (seen 221 times), ORG (seen 214 times), PSN (seen 138 times), PERCENT (seen 131 times), ART (seen 89 times), MONEY (seen 56 times), TIM (seen 49 times), TIME (seen 19 times), PNT (seen 12 times), MNY (seen 9 times)
Labels: ['<unk>', 'LOCATION', 'ORGANIZATION', 'DATE', 'PERSON', 'NUMBER', 'ARTIFACT', 'LOC', 'OTHER', 'EVENT', 'DAT', 'ORG', 'PSN', 'PERCENT', 'ART', 'MONEY', 'TIM', 'TIME', 'PNT', 'MNY']
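
The label list above mixes long tag names with abbreviated ones (for example `LOCATION` and `LOC`), which appear to come from the two source files being tagged differently. The sketch below shows one way to fold the abbreviated tags into their long forms and to see how the counts would combine. The `aliases` table is an assumption: in particular, reading `PSN` as `PERSON` and `PNT` as `PERCENT` is a guess and should be checked against the data before it is used.

```python
from collections import Counter

# Counts copied from the "Dictionary created for label 'ner'" log line.
raw_counts = {
    "LOCATION": 2275, "ORGANIZATION": 1277, "DATE": 1224, "PERSON": 1122,
    "NUMBER": 850, "ARTIFACT": 591, "LOC": 330, "OTHER": 329, "EVENT": 229,
    "DAT": 221, "ORG": 214, "PSN": 138, "PERCENT": 131, "ART": 89,
    "MONEY": 56, "TIM": 49, "TIME": 19, "PNT": 12, "MNY": 9,
}

# Hypothetical alias table: abbreviated tag -> assumed long-form tag.
aliases = {
    "LOC": "LOCATION", "DAT": "DATE", "ORG": "ORGANIZATION", "PSN": "PERSON",
    "ART": "ARTIFACT", "TIM": "TIME", "MNY": "MONEY", "PNT": "PERCENT",
}

# Sum the counts of each abbreviated tag into its long form.
merged = Counter()
for tag, count in raw_counts.items():
    merged[aliases.get(tag, tag)] += count

# One natural-language name per merged class, in the style of a label_name_map.
unified_name_map = {
    tag: f"{aliases.get(tag, tag).lower()} entity" for tag in raw_counts
}

for tag, count in merged.most_common():
    print(f"{tag:<13}{count}")
```

With this alias table the 19 raw tags collapse into 11 classes, and both spellings of a tag map to the same natural-language name.
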





In [None]:
from flair.data import Corpus, Sentence
from flair.datasets import NER_JAPANESE
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger, TARSClassifier
from flair.trainers import ModelTrainer

# 1. Define label names in natural language, since the dataset ships with cryptic tag names.
#    Note: 'PSN' and 'PNT' are not mapped here, so they keep their raw names. The abbreviated
#    tags 'LOC', 'ART' and 'TIM' get names that differ from their long forms ('LOCATION',
#    'ARTIFACT', 'TIME'), so each of those pairs stays a separate class.
label_name_map = {'<unk>': 'Unknown',
                  'LOCATION': 'location label',
                  'ORGANIZATION': 'organization entity',
                  'DATE': 'date label',
                  'PERSON': 'person entity',
                  'NUMBER': 'number label',
                  'ARTIFACT': 'artifact label',
                  'LOC': 'localization label',
                  'OTHER': 'other labels',
                  'DAT': 'date label',
                  'EVENT': 'event entity',
                  'ORG': 'organization entity',
                  'PERCENT': 'percent label',
                  'ART': 'artifact entity',
                  'MONEY': 'money entity',
                  'TIM': 'time entity',
                  'TIME': 'time label',
                  'MNY': 'money entity'
                  }

# 2. get the corpus
corpus: Corpus = NER_JAPANESE(label_name_map=label_name_map)

# 3. what label do you want to predict?
label_type = 'ner'

# 4. make a label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 5. Start from the pretrained English TARS base model
tars = TARSClassifier.load("tars-base")

# 5a. Alternatively, comment out the previous line and uncomment the next one to train a new TARS model from scratch
#tars = TARSClassifier(embeddings="bert-base-uncased")

# 6. Switch to a new task (TARS can handle multiple tasks, so one must be defined).
#    Note: this `tars` object is not passed to the ModelTrainer; the model that is
#    actually trained in this section is the SequenceTagger defined in step 8.
tars.add_and_switch_to_new_task(task_name="ner-tagging-japanese",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )


# 7. Initialize the embedding stack with GloVe and Flair embeddings (both are English-language embeddings)
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 8. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

2023-04-19 22:27:30,469 Reading data from /root/.flair/datasets/ner_japanese
2023-04-19 22:27:30,470 Train: /root/.flair/datasets/ner_japanese/train.txt
2023-04-19 22:27:30,472 Dev: None
2023-04-19 22:27:30,474 Test: None
2023-04-19 22:27:35,636 Computing label dictionary. Progress:


3621it [00:00, 38919.55it/s]

2023-04-19 22:27:35,737 Dictionary created for label 'ner' with 17 values: location label (seen 2394 times), organization entity (seen 1482 times), date label (seen 1458 times), person entity (seen 1116 times), number label (seen 849 times), artifact label (seen 598 times), other labels (seen 306 times), localization label (seen 292 times), event entity (seen 232 times), PSN (seen 133 times), percent label (seen 131 times), artifact entity (seen 87 times), money entity (seen 62 times), time entity (seen 49 times), time label (seen 17 times), PNT (seen 12 times)





2023-04-19 22:27:40,117 TARS initialized without a task. You need to call .add_and_switch_to_new_task() before training this model
2023-04-19 22:27:50,332 SequenceTagger predicts: Dictionary with 65 tags: O, S-location label, B-location label, E-location label, I-location label, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-date label, B-date label, E-date label, I-date label, S-person entity, B-person entity, E-person entity, I-person entity, S-number label, B-number label, E-number label, I-number label, S-artifact label, B-artifact label, E-artifact label, I-artifact label, S-other labels, B-other labels, E-other labels, I-other labels, S-localization label, B-localization label, E-localization label, I-localization label, S-event entity, B-event entity, E-event entity, I-event entity, S-PSN, B-PSN, E-PSN, I-PSN, S-percent label, B-percent label, E-percent label, I-percent label, S-artifact entity, B-artifact entity, E-artifact entity,
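
The 65 tags reported above follow from the label dictionary: each entity label is expanded into four span-position tags (`S-`, `B-`, `E-`, `I-`), plus a single `O` tag for tokens outside any entity. The sketch below rebuilds that tag list from the 16 label names listed in the dictionary log line. It is an illustration of the counting, not Flair's internal code, and it assumes the "17 values" in the log include one extra entry such as `<unk>`.

```python
# The 16 label names listed in the "Dictionary created for label 'ner'" line.
labels = [
    "location label", "organization entity", "date label", "person entity",
    "number label", "artifact label", "other labels", "localization label",
    "event entity", "PSN", "percent label", "artifact entity",
    "money entity", "time entity", "time label", "PNT",
]

# S = single-token span, B = begin, E = end, I = inside.
prefixes = ("S", "B", "E", "I")

tags = ["O"] + [f"{prefix}-{label}" for label in labels for prefix in prefixes]

print(len(tags))   # 16 labels * 4 prefixes + 1
print(tags[:5])
```
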

In [None]:
# 9. Create a ModelTrainer and start training
trainer = ModelTrainer(tagger, corpus)

trainer.train(base_path='/content/drive/MyDrive/ColabNotebooks/nermodels/japanese',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)

2023-04-19 22:28:44,067 ----------------------------------------------------------------------------------------------------
2023-04-19 22:28:44,069 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=67, bias=True)
  (loss_f
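
The layer sizes in the model summary above can be checked by hand. The sketch below adds up the embedding widths and compares them with the printed `in_features` and `out_features`. Reading the two extra outputs of the final linear layer as the start and stop tags of the CRF is an assumption, not something the log states.

```python
# Widths read from the printed model summary.
glove_dim = 100            # WordEmbeddings: Embedding(400001, 100)
flair_dim = 2048           # each FlairEmbeddings language model: LSTM(100, 2048)
rnn_hidden = 256           # hidden_size passed to SequenceTagger
num_tags = 65              # "Dictionary with 65 tags"

# StackedEmbeddings concatenates GloVe with the forward and backward Flair models.
stacked_dim = glove_dim + 2 * flair_dim

# A bidirectional LSTM outputs the forward and backward hidden states side by side.
rnn_out = 2 * rnn_hidden

# Assumed: the CRF adds two special tags (start and stop) to the tag set.
linear_out = num_tags + 2

print(stacked_dim, rnn_out, linear_out)
```

These values line up with `Linear(in_features=4196, ...)`, `Linear(in_features=512, ...)` and `out_features=67` in the summary.
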

100%|██████████| 13/13 [00:13<00:00,  1.00s/it]

2023-04-19 22:30:07,748 Evaluating as a multi-label problem: False
2023-04-19 22:30:07,770 DEV : loss 0.8138943910598755 - f1-score (micro avg)  0.022
2023-04-19 22:30:07,839 BAD EPOCHS (no improvement): 0
2023-04-19 22:30:07,846 saving best model





2023-04-19 22:30:09,904 ----------------------------------------------------------------------------------------------------
2023-04-19 22:30:13,070 epoch 2 - iter 11/114 - loss 0.82302818 - time (sec): 3.16 - samples/sec: 3797.76 - lr: 0.100000
2023-04-19 22:30:17,412 epoch 2 - iter 22/114 - loss 0.82483473 - time (sec): 7.50 - samples/sec: 3191.91 - lr: 0.100000
2023-04-19 22:30:21,927 epoch 2 - iter 33/114 - loss 0.79238746 - time (sec): 12.02 - samples/sec: 3040.94 - lr: 0.100000
2023-04-19 22:30:25,660 epoch 2 - iter 44/114 - loss 0.79316838 - time (sec): 15.75 - samples/sec: 3106.71 - lr: 0.100000
2023-04-19 22:30:29,095 epoch 2 - iter 55/114 - loss 0.79039161 - time (sec): 19.19 - samples/sec: 3170.73 - lr: 0.100000
2023-04-19 22:30:32,629 epoch 2 - iter 66/114 - loss 0.77994085 - time (sec): 22.72 - samples/sec: 3212.34 - lr: 0.100000
2023-04-19 22:30:36,654 epoch 2 - iter 77/114 - loss 0.77335517 - time (sec): 26.75 - samples/sec: 3196.96 - lr: 0.100000
2023-04-19 22:30:40,189

100%|██████████| 13/13 [00:04<00:00,  2.79it/s]

2023-04-19 22:30:54,879 Evaluating as a multi-label problem: False
2023-04-19 22:30:54,901 DEV : loss 0.6446548104286194 - f1-score (micro avg)  0.1934
2023-04-19 22:30:54,969 BAD EPOCHS (no improvement): 0
2023-04-19 22:30:54,974 saving best model





2023-04-19 22:30:57,073 ----------------------------------------------------------------------------------------------------
2023-04-19 22:31:00,729 epoch 3 - iter 11/114 - loss 0.67816350 - time (sec): 3.65 - samples/sec: 3318.67 - lr: 0.100000
2023-04-19 22:31:04,391 epoch 3 - iter 22/114 - loss 0.71724152 - time (sec): 7.31 - samples/sec: 3353.63 - lr: 0.100000
2023-04-19 22:31:10,765 epoch 3 - iter 33/114 - loss 0.69180612 - time (sec): 13.69 - samples/sec: 2729.48 - lr: 0.100000
2023-04-19 22:31:14,389 epoch 3 - iter 44/114 - loss 0.68431803 - time (sec): 17.31 - samples/sec: 2885.98 - lr: 0.100000
2023-04-19 22:31:17,907 epoch 3 - iter 55/114 - loss 0.66879767 - time (sec): 20.83 - samples/sec: 2983.57 - lr: 0.100000
2023-04-19 22:31:22,551 epoch 3 - iter 66/114 - loss 0.67604554 - time (sec): 25.47 - samples/sec: 2933.69 - lr: 0.100000
2023-04-19 22:31:26,798 epoch 3 - iter 77/114 - loss 0.67059419 - time (sec): 29.72 - samples/sec: 2931.99 - lr: 0.100000
2023-04-19 22:31:30,696

100%|██████████| 13/13 [00:04<00:00,  2.73it/s]

2023-04-19 22:31:43,601 Evaluating as a multi-label problem: False
2023-04-19 22:31:43,621 DEV : loss 0.6108267903327942 - f1-score (micro avg)  0.1989
2023-04-19 22:31:43,692 BAD EPOCHS (no improvement): 0
2023-04-19 22:31:43,724 saving best model





2023-04-19 22:31:45,774 ----------------------------------------------------------------------------------------------------
2023-04-19 22:31:49,640 epoch 4 - iter 11/114 - loss 0.63071516 - time (sec): 3.86 - samples/sec: 3195.03 - lr: 0.100000
2023-04-19 22:31:54,109 epoch 4 - iter 22/114 - loss 0.63171911 - time (sec): 8.33 - samples/sec: 2967.50 - lr: 0.100000
2023-04-19 22:31:58,923 epoch 4 - iter 33/114 - loss 0.62239458 - time (sec): 13.15 - samples/sec: 2826.09 - lr: 0.100000
2023-04-19 22:32:02,414 epoch 4 - iter 44/114 - loss 0.61026358 - time (sec): 16.64 - samples/sec: 2995.35 - lr: 0.100000
2023-04-19 22:32:05,777 epoch 4 - iter 55/114 - loss 0.60391569 - time (sec): 20.00 - samples/sec: 3108.91 - lr: 0.100000
2023-04-19 22:32:09,825 epoch 4 - iter 66/114 - loss 0.60063663 - time (sec): 24.05 - samples/sec: 3089.99 - lr: 0.100000
2023-04-19 22:32:14,445 epoch 4 - iter 77/114 - loss 0.59193839 - time (sec): 28.67 - samples/sec: 3042.18 - lr: 0.100000
2023-04-19 22:32:17,692

100%|██████████| 13/13 [00:05<00:00,  2.52it/s]

2023-04-19 22:32:31,522 Evaluating as a multi-label problem: False
2023-04-19 22:32:31,542 DEV : loss 0.539375901222229 - f1-score (micro avg)  0.254
2023-04-19 22:32:31,618 BAD EPOCHS (no improvement): 0
2023-04-19 22:32:31,624 saving best model





2023-04-19 22:32:33,655 ----------------------------------------------------------------------------------------------------
2023-04-19 22:32:38,440 epoch 5 - iter 11/114 - loss 0.58550204 - time (sec): 4.77 - samples/sec: 2664.21 - lr: 0.100000
2023-04-19 22:32:43,418 epoch 5 - iter 22/114 - loss 0.55453481 - time (sec): 9.75 - samples/sec: 2546.16 - lr: 0.100000
2023-04-19 22:32:47,852 epoch 5 - iter 33/114 - loss 0.55768712 - time (sec): 14.18 - samples/sec: 2639.56 - lr: 0.100000
2023-04-19 22:32:52,179 epoch 5 - iter 44/114 - loss 0.54852200 - time (sec): 18.51 - samples/sec: 2699.05 - lr: 0.100000
2023-04-19 22:32:55,431 epoch 5 - iter 55/114 - loss 0.54010506 - time (sec): 21.76 - samples/sec: 2848.36 - lr: 0.100000
2023-04-19 22:32:59,480 epoch 5 - iter 66/114 - loss 0.54129520 - time (sec): 25.81 - samples/sec: 2881.28 - lr: 0.100000
2023-04-19 22:33:03,246 epoch 5 - iter 77/114 - loss 0.53837989 - time (sec): 29.57 - samples/sec: 2940.06 - lr: 0.100000
2023-04-19 22:33:06,270

100%|██████████| 13/13 [00:04<00:00,  2.67it/s]

2023-04-19 22:33:20,227 Evaluating as a multi-label problem: False
2023-04-19 22:33:20,249 DEV : loss 0.5127555131912231 - f1-score (micro avg)  0.2363
2023-04-19 22:33:20,320 BAD EPOCHS (no improvement): 1
2023-04-19 22:33:20,327 ----------------------------------------------------------------------------------------------------





2023-04-19 22:33:24,060 epoch 6 - iter 11/114 - loss 0.48182879 - time (sec): 3.73 - samples/sec: 3266.40 - lr: 0.100000
2023-04-19 22:33:27,640 epoch 6 - iter 22/114 - loss 0.48754391 - time (sec): 7.31 - samples/sec: 3424.71 - lr: 0.100000
2023-04-19 22:33:31,343 epoch 6 - iter 33/114 - loss 0.49610016 - time (sec): 11.01 - samples/sec: 3373.66 - lr: 0.100000
2023-04-19 22:33:34,561 epoch 6 - iter 44/114 - loss 0.49363102 - time (sec): 14.23 - samples/sec: 3464.20 - lr: 0.100000
2023-04-19 22:33:38,014 epoch 6 - iter 55/114 - loss 0.50264860 - time (sec): 17.68 - samples/sec: 3520.14 - lr: 0.100000
2023-04-19 22:33:41,285 epoch 6 - iter 66/114 - loss 0.49422204 - time (sec): 20.95 - samples/sec: 3543.71 - lr: 0.100000
2023-04-19 22:33:45,041 epoch 6 - iter 77/114 - loss 0.49750735 - time (sec): 24.71 - samples/sec: 3494.33 - lr: 0.100000
2023-04-19 22:33:48,562 epoch 6 - iter 88/114 - loss 0.49668477 - time (sec): 28.23 - samples/sec: 3479.20 - lr: 0.100000
2023-04-19 22:33:52,888 ep

100%|██████████| 13/13 [00:06<00:00,  2.11it/s]

2023-04-19 22:34:04,017 Evaluating as a multi-label problem: False
2023-04-19 22:34:04,040 DEV : loss 0.4618784189224243 - f1-score (micro avg)  0.2609
2023-04-19 22:34:04,121 BAD EPOCHS (no improvement): 0
2023-04-19 22:34:04,128 saving best model





2023-04-19 22:34:06,209 ----------------------------------------------------------------------------------------------------
2023-04-19 22:34:10,379 epoch 7 - iter 11/114 - loss 0.45706192 - time (sec): 4.16 - samples/sec: 2920.55 - lr: 0.100000
2023-04-19 22:34:14,371 epoch 7 - iter 22/114 - loss 0.47308147 - time (sec): 8.16 - samples/sec: 2936.84 - lr: 0.100000
2023-04-19 22:34:19,675 epoch 7 - iter 33/114 - loss 0.48273276 - time (sec): 13.46 - samples/sec: 2691.89 - lr: 0.100000
2023-04-19 22:34:23,649 epoch 7 - iter 44/114 - loss 0.48923665 - time (sec): 17.43 - samples/sec: 2796.85 - lr: 0.100000
2023-04-19 22:34:27,906 epoch 7 - iter 55/114 - loss 0.48209606 - time (sec): 21.69 - samples/sec: 2859.26 - lr: 0.100000
2023-04-19 22:34:31,410 epoch 7 - iter 66/114 - loss 0.47703806 - time (sec): 25.20 - samples/sec: 2951.75 - lr: 0.100000
2023-04-19 22:34:35,637 epoch 7 - iter 77/114 - loss 0.47381001 - time (sec): 29.42 - samples/sec: 2962.63 - lr: 0.100000
2023-04-19 22:34:38,778

100%|██████████| 13/13 [00:05<00:00,  2.25it/s]

2023-04-19 22:34:52,966 Evaluating as a multi-label problem: False
2023-04-19 22:34:52,993 DEV : loss 0.4596380293369293 - f1-score (micro avg)  0.2517
2023-04-19 22:34:53,065 BAD EPOCHS (no improvement): 1
2023-04-19 22:34:53,070 ----------------------------------------------------------------------------------------------------

2023-04-19 22:34:56,206 epoch 8 - iter 11/114 - loss 0.46008470 - time (sec): 3.13 - samples/sec: 3711.79 - lr: 0.100000
2023-04-19 22:34:59,826 epoch 8 - iter 22/114 - loss 0.46267183 - time (sec): 6.75 - samples/sec: 3569.53 - lr: 0.100000
2023-04-19 22:35:03,050 epoch 8 - iter 33/114 - loss 0.45260428 - time (sec): 9.98 - samples/sec: 3627.92 - lr: 0.100000
2023-04-19 22:35:07,383 epoch 8 - iter 44/114 - loss 0.45020869 - time (sec): 14.31 - samples/sec: 3403.73 - lr: 0.100000
2023-04-19 22:35:10,913 epoch 8 - iter 55/114 - loss 0.45143562 - time (sec): 17.84 - samples/sec: 3408.99 - lr: 0.100000
2023-04-19 22:35:14,405 epoch 8 - iter 66/114 - loss 0.45084093 - time (sec): 21.33 - samples/sec: 3436.01 - lr: 0.100000
2023-04-19 22:35:18,386 epoch 8 - iter 77/114 - loss 0.44963114 - time (sec): 25.31 - samples/sec: 3387.86 - lr: 0.100000
2023-04-19 22:35:24,138 epoch 8 - iter 88/114 - loss 0.45117081 - time (sec): 31.07 - samples/sec: 3169.10 - lr: 0.100000
2023-04-19 22:35:27,324 epo

100%|██████████| 13/13 [00:06<00:00,  2.10it/s]

2023-04-19 22:35:38,001 Evaluating as a multi-label problem: False
2023-04-19 22:35:38,039 DEV : loss 0.4358760416507721 - f1-score (micro avg)  0.252
2023-04-19 22:35:38,171 BAD EPOCHS (no improvement): 2

2023-04-19 22:35:38,187 ----------------------------------------------------------------------------------------------------
2023-04-19 22:35:42,566 epoch 9 - iter 11/114 - loss 0.42582990 - time (sec): 4.38 - samples/sec: 2884.96 - lr: 0.100000
2023-04-19 22:35:46,088 epoch 9 - iter 22/114 - loss 0.43814975 - time (sec): 7.90 - samples/sec: 3093.28 - lr: 0.100000
2023-04-19 22:35:50,824 epoch 9 - iter 33/114 - loss 0.44915057 - time (sec): 12.63 - samples/sec: 2948.62 - lr: 0.100000
2023-04-19 22:35:55,496 epoch 9 - iter 44/114 - loss 0.44294614 - time (sec): 17.31 - samples/sec: 2841.20 - lr: 0.100000
2023-04-19 22:35:59,216 epoch 9 - iter 55/114 - loss 0.44358264 - time (sec): 21.03 - samples/sec: 2919.10 - lr: 0.100000
2023-04-19 22:36:02,374 epoch 9 - iter 66/114 - loss 0.44004464 - time (sec): 24.18 - samples/sec: 3034.49 - lr: 0.100000
2023-04-19 22:36:06,341 epoch 9 - iter 77/114 - loss 0.44153073 - time (sec): 28.15 - samples/sec: 3055.19 - lr: 0.100000
2023-04-19 22:36:10,284

100%|██████████| 13/13 [00:05<00:00,  2.30it/s]

2023-04-19 22:36:24,141 Evaluating as a multi-label problem: False
2023-04-19 22:36:24,177 DEV : loss 0.41539159417152405 - f1-score (micro avg)  0.2048
2023-04-19 22:36:24,295 BAD EPOCHS (no improvement): 3
2023-04-19 22:36:24,303 ----------------------------------------------------------------------------------------------------

2023-04-19 22:36:27,996 epoch 10 - iter 11/114 - loss 0.41471581 - time (sec): 3.69 - samples/sec: 3405.47 - lr: 0.100000
2023-04-19 22:36:31,473 epoch 10 - iter 22/114 - loss 0.42228868 - time (sec): 7.17 - samples/sec: 3501.74 - lr: 0.100000
2023-04-19 22:36:34,652 epoch 10 - iter 33/114 - loss 0.42494903 - time (sec): 10.35 - samples/sec: 3535.32 - lr: 0.100000
2023-04-19 22:36:38,372 epoch 10 - iter 44/114 - loss 0.42008602 - time (sec): 14.07 - samples/sec: 3463.57 - lr: 0.100000
2023-04-19 22:36:42,916 epoch 10 - iter 55/114 - loss 0.42316287 - time (sec): 18.61 - samples/sec: 3287.71 - lr: 0.100000
2023-04-19 22:36:46,144 epoch 10 - iter 66/114 - loss 0.42033352 - time (sec): 21.84 - samples/sec: 3371.03 - lr: 0.100000
2023-04-19 22:36:50,372 epoch 10 - iter 77/114 - loss 0.41945903 - time (sec): 26.07 - samples/sec: 3315.94 - lr: 0.100000
2023-04-19 22:36:53,850 epoch 10 - iter 88/114 - loss 0.41990396 - time (sec): 29.54 - samples/sec: 3336.72 - lr: 0.100000
2023-04-19 22:36:5

100%|██████████| 13/13 [00:04<00:00,  2.82it/s]

2023-04-19 22:37:07,929 Evaluating as a multi-label problem: False
2023-04-19 22:37:07,959 DEV : loss 0.40387436747550964 - f1-score (micro avg)  0.2733
2023-04-19 22:37:08,089 BAD EPOCHS (no improvement): 0
2023-04-19 22:37:08,096 saving best model

2023-04-19 22:37:10,873 ----------------------------------------------------------------------------------------------------
2023-04-19 22:37:16,045 epoch 11 - iter 11/114 - loss 0.39789612 - time (sec): 5.17 - samples/sec: 2492.79 - lr: 0.100000
2023-04-19 22:37:20,330 epoch 11 - iter 22/114 - loss 0.39782062 - time (sec): 9.45 - samples/sec: 2678.22 - lr: 0.100000
2023-04-19 22:37:24,628 epoch 11 - iter 33/114 - loss 0.39801091 - time (sec): 13.75 - samples/sec: 2689.70 - lr: 0.100000
2023-04-19 22:37:29,311 epoch 11 - iter 44/114 - loss 0.40909075 - time (sec): 18.43 - samples/sec: 2693.59 - lr: 0.100000
2023-04-19 22:37:32,721 epoch 11 - iter 55/114 - loss 0.41354317 - time (sec): 21.84 - samples/sec: 2823.95 - lr: 0.100000
2023-04-19 22:37:36,261 epoch 11 - iter 66/114 - loss 0.41664740 - time (sec): 25.38 - samples/sec: 2915.05 - lr: 0.100000
2023-04-19 22:37:39,986 epoch 11 - iter 77/114 - loss 0.41504213 - time (sec): 29.11 - samples/sec: 2989.66 - lr: 0.100000
2023-04-19 22:37

100%|██████████| 13/13 [00:05<00:00,  2.50it/s]

2023-04-19 22:37:57,267 Evaluating as a multi-label problem: False
2023-04-19 22:37:57,302 DEV : loss 0.38703325390815735 - f1-score (micro avg)  0.2763
2023-04-19 22:37:57,434 BAD EPOCHS (no improvement): 0
2023-04-19 22:37:57,442 saving best model

2023-04-19 22:38:00,243 ----------------------------------------------------------------------------------------------------
2023-04-19 22:38:04,157 epoch 12 - iter 11/114 - loss 0.42560775 - time (sec): 3.91 - samples/sec: 3198.83 - lr: 0.100000
2023-04-19 22:38:07,526 epoch 12 - iter 22/114 - loss 0.39760031 - time (sec): 7.28 - samples/sec: 3377.13 - lr: 0.100000
2023-04-19 22:38:11,277 epoch 12 - iter 33/114 - loss 0.40404062 - time (sec): 11.03 - samples/sec: 3359.25 - lr: 0.100000
2023-04-19 22:38:15,619 epoch 12 - iter 44/114 - loss 0.40549002 - time (sec): 15.37 - samples/sec: 3199.26 - lr: 0.100000
2023-04-19 22:38:19,715 epoch 12 - iter 55/114 - loss 0.40877390 - time (sec): 19.47 - samples/sec: 3162.82 - lr: 0.100000
2023-04-19 22:38:23,869 epoch 12 - iter 66/114 - loss 0.40382495 - time (sec): 23.62 - samples/sec: 3130.02 - lr: 0.100000
2023-04-19 22:38:27,259 epoch 12 - iter 77/114 - loss 0.39971835 - time (sec): 27.01 - samples/sec: 3186.12 - lr: 0.100000
2023-04-19 22:38

100%|██████████| 13/13 [00:04<00:00,  2.73it/s]

2023-04-19 22:38:44,948 Evaluating as a multi-label problem: False
2023-04-19 22:38:44,982 DEV : loss 0.39129576086997986 - f1-score (micro avg)  0.2289
2023-04-19 22:38:45,105 BAD EPOCHS (no improvement): 1
2023-04-19 22:38:45,117 ----------------------------------------------------------------------------------------------------

2023-04-19 22:38:50,147 epoch 13 - iter 11/114 - loss 0.36141650 - time (sec): 5.03 - samples/sec: 2472.11 - lr: 0.100000
2023-04-19 22:38:53,590 epoch 13 - iter 22/114 - loss 0.36105978 - time (sec): 8.47 - samples/sec: 2927.28 - lr: 0.100000
2023-04-19 22:38:57,168 epoch 13 - iter 33/114 - loss 0.37473910 - time (sec): 12.05 - samples/sec: 3074.41 - lr: 0.100000
2023-04-19 22:39:00,622 epoch 13 - iter 44/114 - loss 0.37984455 - time (sec): 15.50 - samples/sec: 3190.87 - lr: 0.100000
2023-04-19 22:39:05,377 epoch 13 - iter 55/114 - loss 0.37618763 - time (sec): 20.26 - samples/sec: 3062.97 - lr: 0.100000
2023-04-19 22:39:08,674 epoch 13 - iter 66/114 - loss 0.37790417 - time (sec): 23.55 - samples/sec: 3140.41 - lr: 0.100000
2023-04-19 22:39:12,417 epoch 13 - iter 77/114 - loss 0.38429841 - time (sec): 27.30 - samples/sec: 3156.87 - lr: 0.100000
2023-04-19 22:39:16,673 epoch 13 - iter 88/114 - loss 0.38949615 - time (sec): 31.55 - samples/sec: 3137.86 - lr: 0.100000
2023-04-19 22:39:2

100%|██████████| 13/13 [00:04<00:00,  2.99it/s]

2023-04-19 22:39:29,640 Evaluating as a multi-label problem: False
2023-04-19 22:39:29,662 DEV : loss 0.37574803829193115 - f1-score (micro avg)  0.2737
2023-04-19 22:39:29,733 BAD EPOCHS (no improvement): 2
2023-04-19 22:39:29,740 ----------------------------------------------------------------------------------------------------

2023-04-19 22:39:34,013 epoch 14 - iter 11/114 - loss 0.36563330 - time (sec): 4.27 - samples/sec: 2870.30 - lr: 0.100000
2023-04-19 22:39:37,794 epoch 14 - iter 22/114 - loss 0.37050009 - time (sec): 8.05 - samples/sec: 3057.74 - lr: 0.100000
2023-04-19 22:39:41,166 epoch 14 - iter 33/114 - loss 0.37016450 - time (sec): 11.42 - samples/sec: 3236.30 - lr: 0.100000
2023-04-19 22:39:44,591 epoch 14 - iter 44/114 - loss 0.37369559 - time (sec): 14.85 - samples/sec: 3296.09 - lr: 0.100000
2023-04-19 22:39:48,658 epoch 14 - iter 55/114 - loss 0.37611038 - time (sec): 18.91 - samples/sec: 3229.77 - lr: 0.100000
2023-04-19 22:39:52,655 epoch 14 - iter 66/114 - loss 0.37565386 - time (sec): 22.91 - samples/sec: 3215.06 - lr: 0.100000
2023-04-19 22:39:56,217 epoch 14 - iter 77/114 - loss 0.37331260 - time (sec): 26.47 - samples/sec: 3268.12 - lr: 0.100000
2023-04-19 22:40:00,002 epoch 14 - iter 88/114 - loss 0.37898905 - time (sec): 30.26 - samples/sec: 3270.05 - lr: 0.100000
2023-04-19 22:40:0

100%|██████████| 13/13 [00:04<00:00,  3.01it/s]

2023-04-19 22:40:13,151 Evaluating as a multi-label problem: False
2023-04-19 22:40:13,170 DEV : loss 0.3730096220970154 - f1-score (micro avg)  0.2369
2023-04-19 22:40:13,244 BAD EPOCHS (no improvement): 3
2023-04-19 22:40:13,251 ----------------------------------------------------------------------------------------------------

2023-04-19 22:40:18,148 epoch 15 - iter 11/114 - loss 0.40073110 - time (sec): 4.89 - samples/sec: 2606.01 - lr: 0.100000
2023-04-19 22:40:22,989 epoch 15 - iter 22/114 - loss 0.38721312 - time (sec): 9.74 - samples/sec: 2618.88 - lr: 0.100000
2023-04-19 22:40:26,010 epoch 15 - iter 33/114 - loss 0.38728906 - time (sec): 12.76 - samples/sec: 2932.36 - lr: 0.100000
2023-04-19 22:40:29,834 epoch 15 - iter 44/114 - loss 0.37553684 - time (sec): 16.58 - samples/sec: 3013.32 - lr: 0.100000
2023-04-19 22:40:33,450 epoch 15 - iter 55/114 - loss 0.37579701 - time (sec): 20.20 - samples/sec: 3083.91 - lr: 0.100000
2023-04-19 22:40:37,460 epoch 15 - iter 66/114 - loss 0.37296272 - time (sec): 24.21 - samples/sec: 3065.72 - lr: 0.100000
2023-04-19 22:40:40,732 epoch 15 - iter 77/114 - loss 0.37369177 - time (sec): 27.48 - samples/sec: 3151.28 - lr: 0.100000
2023-04-19 22:40:44,185 epoch 15 - iter 88/114 - loss 0.37136782 - time (sec): 30.93 - samples/sec: 3198.27 - lr: 0.100000
2023-04-19 22:40:4

100%|██████████| 13/13 [00:04<00:00,  2.87it/s]

2023-04-19 22:40:57,235 Evaluating as a multi-label problem: False
2023-04-19 22:40:57,255 DEV : loss 0.35540899634361267 - f1-score (micro avg)  0.2755
2023-04-19 22:40:57,323 Epoch    15: reducing learning rate of group 0 to 5.0000e-02.
2023-04-19 22:40:57,325 BAD EPOCHS (no improvement): 4
2023-04-19 22:40:57,334 ----------------------------------------------------------------------------------------------------

2023-04-19 22:41:00,899 epoch 16 - iter 11/114 - loss 0.37103636 - time (sec): 3.56 - samples/sec: 3502.61 - lr: 0.050000
2023-04-19 22:41:05,030 epoch 16 - iter 22/114 - loss 0.37037386 - time (sec): 7.69 - samples/sec: 3250.04 - lr: 0.050000
2023-04-19 22:41:08,936 epoch 16 - iter 33/114 - loss 0.36233579 - time (sec): 11.60 - samples/sec: 3202.94 - lr: 0.050000
2023-04-19 22:41:12,140 epoch 16 - iter 44/114 - loss 0.35993064 - time (sec): 14.80 - samples/sec: 3305.62 - lr: 0.050000
2023-04-19 22:41:15,465 epoch 16 - iter 55/114 - loss 0.36566463 - time (sec): 18.13 - samples/sec: 3389.06 - lr: 0.050000
2023-04-19 22:41:18,400 epoch 16 - iter 66/114 - loss 0.36066288 - time (sec): 21.06 - samples/sec: 3469.32 - lr: 0.050000
2023-04-19 22:41:22,739 epoch 16 - iter 77/114 - loss 0.36042797 - time (sec): 25.40 - samples/sec: 3379.85 - lr: 0.050000
2023-04-19 22:41:27,170 epoch 16 - iter 88/114 - loss 0.35966931 - time (sec): 29.83 - samples/sec: 3301.46 - lr: 0.050000
2023-04-19 22:41:3

100%|██████████| 13/13 [00:05<00:00,  2.28it/s]

2023-04-19 22:41:41,726 Evaluating as a multi-label problem: False
2023-04-19 22:41:41,753 DEV : loss 0.3545457720756531 - f1-score (micro avg)  0.3206
2023-04-19 22:41:41,824 BAD EPOCHS (no improvement): 0
2023-04-19 22:41:41,829 saving best model

2023-04-19 22:41:43,925 ----------------------------------------------------------------------------------------------------
2023-04-19 22:41:48,153 epoch 17 - iter 11/114 - loss 0.37006517 - time (sec): 4.23 - samples/sec: 2899.31 - lr: 0.050000
2023-04-19 22:41:52,833 epoch 17 - iter 22/114 - loss 0.36459065 - time (sec): 8.91 - samples/sec: 2812.98 - lr: 0.050000
2023-04-19 22:41:57,806 epoch 17 - iter 33/114 - loss 0.36157602 - time (sec): 13.88 - samples/sec: 2706.51 - lr: 0.050000
2023-04-19 22:42:00,962 epoch 17 - iter 44/114 - loss 0.36196315 - time (sec): 17.03 - samples/sec: 2916.88 - lr: 0.050000
2023-04-19 22:42:03,977 epoch 17 - iter 55/114 - loss 0.36128467 - time (sec): 20.05 - samples/sec: 3082.24 - lr: 0.050000
2023-04-19 22:42:08,257 epoch 17 - iter 66/114 - loss 0.36177653 - time (sec): 24.33 - samples/sec: 3056.71 - lr: 0.050000
2023-04-19 22:42:12,416 epoch 17 - iter 77/114 - loss 0.35655077 - time (sec): 28.49 - samples/sec: 3036.01 - lr: 0.050000
2023-04-19 22:42

100%|██████████| 13/13 [00:05<00:00,  2.31it/s]

2023-04-19 22:42:30,445 Evaluating as a multi-label problem: False
2023-04-19 22:42:30,470 DEV : loss 0.34804514050483704 - f1-score (micro avg)  0.2945
2023-04-19 22:42:30,538 BAD EPOCHS (no improvement): 1
2023-04-19 22:42:30,545 ----------------------------------------------------------------------------------------------------

2023-04-19 22:42:34,246 epoch 18 - iter 11/114 - loss 0.35402671 - time (sec): 3.70 - samples/sec: 3342.43 - lr: 0.050000
2023-04-19 22:42:38,245 epoch 18 - iter 22/114 - loss 0.35829779 - time (sec): 7.70 - samples/sec: 3217.24 - lr: 0.050000
2023-04-19 22:42:42,670 epoch 18 - iter 33/114 - loss 0.35539369 - time (sec): 12.12 - samples/sec: 3055.69 - lr: 0.050000
2023-04-19 22:42:46,377 epoch 18 - iter 44/114 - loss 0.35608220 - time (sec): 15.83 - samples/sec: 3125.39 - lr: 0.050000
2023-04-19 22:42:49,689 epoch 18 - iter 55/114 - loss 0.35129077 - time (sec): 19.14 - samples/sec: 3227.68 - lr: 0.050000
2023-04-19 22:42:52,995 epoch 18 - iter 66/114 - loss 0.35281127 - time (sec): 22.45 - samples/sec: 3288.69 - lr: 0.050000
2023-04-19 22:42:57,003 epoch 18 - iter 77/114 - loss 0.35349901 - time (sec): 26.45 - samples/sec: 3273.74 - lr: 0.050000
2023-04-19 22:43:00,761 epoch 18 - iter 88/114 - loss 0.35202757 - time (sec): 30.21 - samples/sec: 3270.85 - lr: 0.050000
2023-04-19 22:43:0

100%|██████████| 13/13 [00:07<00:00,  1.82it/s]

2023-04-19 22:43:16,683 Evaluating as a multi-label problem: False
2023-04-19 22:43:16,703 DEV : loss 0.3415183126926422 - f1-score (micro avg)  0.3014
2023-04-19 22:43:16,778 BAD EPOCHS (no improvement): 2
2023-04-19 22:43:16,787 ----------------------------------------------------------------------------------------------------

2023-04-19 22:43:20,108 epoch 19 - iter 11/114 - loss 0.31066643 - time (sec): 3.32 - samples/sec: 3588.81 - lr: 0.050000
2023-04-19 22:43:23,392 epoch 19 - iter 22/114 - loss 0.32862958 - time (sec): 6.60 - samples/sec: 3639.76 - lr: 0.050000
2023-04-19 22:43:27,747 epoch 19 - iter 33/114 - loss 0.32941994 - time (sec): 10.96 - samples/sec: 3387.32 - lr: 0.050000
2023-04-19 22:43:32,093 epoch 19 - iter 44/114 - loss 0.33484555 - time (sec): 15.30 - samples/sec: 3227.80 - lr: 0.050000
2023-04-19 22:43:35,220 epoch 19 - iter 55/114 - loss 0.33856560 - time (sec): 18.43 - samples/sec: 3352.44 - lr: 0.050000
2023-04-19 22:43:38,662 epoch 19 - iter 66/114 - loss 0.33696894 - time (sec): 21.87 - samples/sec: 3421.79 - lr: 0.050000
2023-04-19 22:43:42,402 epoch 19 - iter 77/114 - loss 0.33731109 - time (sec): 25.61 - samples/sec: 3387.25 - lr: 0.050000
2023-04-19 22:43:46,914 epoch 19 - iter 88/114 - loss 0.33860636 - time (sec): 30.12 - samples/sec: 3284.22 - lr: 0.050000
2023-04-19 22:43:5

100%|██████████| 13/13 [00:06<00:00,  2.15it/s]

2023-04-19 22:44:01,428 Evaluating as a multi-label problem: False
2023-04-19 22:44:01,461 DEV : loss 0.3392186164855957 - f1-score (micro avg)  0.2945
2023-04-19 22:44:01,583 BAD EPOCHS (no improvement): 3
2023-04-19 22:44:01,592 ----------------------------------------------------------------------------------------------------

2023-04-19 22:44:05,368 epoch 20 - iter 11/114 - loss 0.32687758 - time (sec): 3.77 - samples/sec: 3389.79 - lr: 0.050000
2023-04-19 22:44:08,954 epoch 20 - iter 22/114 - loss 0.33964965 - time (sec): 7.36 - samples/sec: 3417.67 - lr: 0.050000
2023-04-19 22:44:12,258 epoch 20 - iter 33/114 - loss 0.34221841 - time (sec): 10.66 - samples/sec: 3478.21 - lr: 0.050000
2023-04-19 22:44:16,619 epoch 20 - iter 44/114 - loss 0.33676412 - time (sec): 15.02 - samples/sec: 3286.88 - lr: 0.050000
2023-04-19 22:44:20,909 epoch 20 - iter 55/114 - loss 0.34757208 - time (sec): 19.31 - samples/sec: 3235.11 - lr: 0.050000
2023-04-19 22:44:24,962 epoch 20 - iter 66/114 - loss 0.34380009 - time (sec): 23.37 - samples/sec: 3206.37 - lr: 0.050000
2023-04-19 22:44:28,182 epoch 20 - iter 77/114 - loss 0.34836528 - time (sec): 26.59 - samples/sec: 3274.15 - lr: 0.050000
2023-04-19 22:44:32,526 epoch 20 - iter 88/114 - loss 0.34516278 - time (sec): 30.93 - samples/sec: 3207.88 - lr: 0.050000
2023-04-19 22:44:3

100%|██████████| 13/13 [00:06<00:00,  2.13it/s]

2023-04-19 22:44:46,993 Evaluating as a multi-label problem: False
2023-04-19 22:44:47,028 DEV : loss 0.34844648838043213 - f1-score (micro avg)  0.3138
2023-04-19 22:44:47,158 Epoch    20: reducing learning rate of group 0 to 2.5000e-02.
2023-04-19 22:44:47,165 BAD EPOCHS (no improvement): 4

2023-04-19 22:44:47,175 ----------------------------------------------------------------------------------------------------
2023-04-19 22:44:51,348 epoch 21 - iter 11/114 - loss 0.32659131 - time (sec): 4.17 - samples/sec: 2886.02 - lr: 0.025000
2023-04-19 22:44:54,992 epoch 21 - iter 22/114 - loss 0.34720207 - time (sec): 7.81 - samples/sec: 3093.20 - lr: 0.025000
2023-04-19 22:44:58,047 epoch 21 - iter 33/114 - loss 0.33633948 - time (sec): 10.87 - samples/sec: 3340.31 - lr: 0.025000
2023-04-19 22:45:02,365 epoch 21 - iter 44/114 - loss 0.33746748 - time (sec): 15.19 - samples/sec: 3209.35 - lr: 0.025000
2023-04-19 22:45:06,301 epoch 21 - iter 55/114 - loss 0.34054743 - time (sec): 19.12 - samples/sec: 3184.74 - lr: 0.025000
2023-04-19 22:45:10,201 epoch 21 - iter 66/114 - loss 0.34314462 - time (sec): 23.02 - samples/sec: 3206.83 - lr: 0.025000
2023-04-19 22:45:13,788 epoch 21 - iter 77/114 - loss 0.34101671 - time (sec): 26.61 - samples/sec: 3251.67 - lr: 0.025000
2023-04-19 22:45

100%|██████████| 13/13 [00:04<00:00,  2.92it/s]

2023-04-19 22:45:30,950 Evaluating as a multi-label problem: False
2023-04-19 22:45:30,984 DEV : loss 0.33954429626464844 - f1-score (micro avg)  0.3164
2023-04-19 22:45:31,109 BAD EPOCHS (no improvement): 1
2023-04-19 22:45:31,115 ----------------------------------------------------------------------------------------------------

2023-04-19 22:45:35,553 epoch 22 - iter 11/114 - loss 0.34874713 - time (sec): 4.43 - samples/sec: 2920.85 - lr: 0.025000
2023-04-19 22:45:39,331 epoch 22 - iter 22/114 - loss 0.34747916 - time (sec): 8.21 - samples/sec: 3112.95 - lr: 0.025000
2023-04-19 22:45:42,787 epoch 22 - iter 33/114 - loss 0.34557280 - time (sec): 11.67 - samples/sec: 3227.81 - lr: 0.025000
2023-04-19 22:45:46,431 epoch 22 - iter 44/114 - loss 0.34586173 - time (sec): 15.31 - samples/sec: 3308.97 - lr: 0.025000
2023-04-19 22:45:50,668 epoch 22 - iter 55/114 - loss 0.34784188 - time (sec): 19.55 - samples/sec: 3192.29 - lr: 0.025000
2023-04-19 22:45:54,988 epoch 22 - iter 66/114 - loss 0.34271415 - time (sec): 23.87 - samples/sec: 3138.54 - lr: 0.025000
2023-04-19 22:45:58,564 epoch 22 - iter 77/114 - loss 0.33780809 - time (sec): 27.44 - samples/sec: 3168.51 - lr: 0.025000
2023-04-19 22:46:02,367 epoch 22 - iter 88/114 - loss 0.33492375 - time (sec): 31.25 - samples/sec: 3173.40 - lr: 0.025000
2023-04-19 22:46:0

100%|██████████| 13/13 [00:04<00:00,  3.01it/s]

2023-04-19 22:46:15,811 Evaluating as a multi-label problem: False
2023-04-19 22:46:15,832 DEV : loss 0.3334939777851105 - f1-score (micro avg)  0.3007
2023-04-19 22:46:15,908 BAD EPOCHS (no improvement): 2
2023-04-19 22:46:15,916 ----------------------------------------------------------------------------------------------------

2023-04-19 22:46:21,594 epoch 23 - iter 11/114 - loss 0.34111548 - time (sec): 5.68 - samples/sec: 2254.45 - lr: 0.025000
2023-04-19 22:46:25,188 epoch 23 - iter 22/114 - loss 0.33053225 - time (sec): 9.27 - samples/sec: 2730.95 - lr: 0.025000
2023-04-19 22:46:29,393 epoch 23 - iter 33/114 - loss 0.33549832 - time (sec): 13.48 - samples/sec: 2827.59 - lr: 0.025000
2023-04-19 22:46:32,443 epoch 23 - iter 44/114 - loss 0.33449023 - time (sec): 16.53 - samples/sec: 3011.18 - lr: 0.025000
2023-04-19 22:46:36,441 epoch 23 - iter 55/114 - loss 0.33582684 - time (sec): 20.52 - samples/sec: 3040.60 - lr: 0.025000
2023-04-19 22:46:40,189 epoch 23 - iter 66/114 - loss 0.33851797 - time (sec): 24.27 - samples/sec: 3063.15 - lr: 0.025000
2023-04-19 22:46:43,414 epoch 23 - iter 77/114 - loss 0.33671358 - time (sec): 27.50 - samples/sec: 3144.12 - lr: 0.025000
2023-04-19 22:46:46,663 epoch 23 - iter 88/114 - loss 0.33528994 - time (sec): 30.75 - samples/sec: 3189.80 - lr: 0.025000
2023-04-19 22:46:5

100%|██████████| 13/13 [00:04<00:00,  3.00it/s]

2023-04-19 22:47:00,862 Evaluating as a multi-label problem: False
2023-04-19 22:47:00,884 DEV : loss 0.3348504602909088 - f1-score (micro avg)  0.3063
2023-04-19 22:47:00,957 BAD EPOCHS (no improvement): 3
2023-04-19 22:47:00,967 ----------------------------------------------------------------------------------------------------

2023-04-19 22:47:04,609 epoch 24 - iter 11/114 - loss 0.32622224 - time (sec): 3.64 - samples/sec: 3472.29 - lr: 0.025000
2023-04-19 22:47:08,280 epoch 24 - iter 22/114 - loss 0.32750832 - time (sec): 7.31 - samples/sec: 3383.39 - lr: 0.025000
2023-04-19 22:47:12,509 epoch 24 - iter 33/114 - loss 0.32666210 - time (sec): 11.54 - samples/sec: 3205.27 - lr: 0.025000
2023-04-19 22:47:15,647 epoch 24 - iter 44/114 - loss 0.32781032 - time (sec): 14.68 - samples/sec: 3338.66 - lr: 0.025000
2023-04-19 22:47:19,022 epoch 24 - iter 55/114 - loss 0.33042262 - time (sec): 18.05 - samples/sec: 3401.48 - lr: 0.025000
2023-04-19 22:47:22,526 epoch 24 - iter 66/114 - loss 0.33121836 - time (sec): 21.56 - samples/sec: 3404.33 - lr: 0.025000
2023-04-19 22:47:27,242 epoch 24 - iter 77/114 - loss 0.33397179 - time (sec): 26.27 - samples/sec: 3280.92 - lr: 0.025000
2023-04-19 22:47:30,465 epoch 24 - iter 88/114 - loss 0.33271129 - time (sec): 29.49 - samples/sec: 3325.03 - lr: 0.025000
2023-04-19 22:47:3

100%|██████████| 13/13 [00:05<00:00,  2.59it/s]

2023-04-19 22:47:45,063 Evaluating as a multi-label problem: False
2023-04-19 22:47:45,085 DEV : loss 0.33321747183799744 - f1-score (micro avg)  0.3085
2023-04-19 22:47:45,158 Epoch    24: reducing learning rate of group 0 to 1.2500e-02.
2023-04-19 22:47:45,160 BAD EPOCHS (no improvement): 4
2023-04-19 22:47:45,169 ----------------------------------------------------------------------------------------------------

2023-04-19 22:47:50,118 epoch 25 - iter 11/114 - loss 0.30465223 - time (sec): 4.95 - samples/sec: 2512.97 - lr: 0.012500
2023-04-19 22:47:53,622 epoch 25 - iter 22/114 - loss 0.32202091 - time (sec): 8.45 - samples/sec: 2929.31 - lr: 0.012500
2023-04-19 22:47:57,838 epoch 25 - iter 33/114 - loss 0.32980442 - time (sec): 12.67 - samples/sec: 2936.51 - lr: 0.012500
2023-04-19 22:48:01,671 epoch 25 - iter 44/114 - loss 0.32863019 - time (sec): 16.50 - samples/sec: 3016.79 - lr: 0.012500
2023-04-19 22:48:04,833 epoch 25 - iter 55/114 - loss 0.32625869 - time (sec): 19.66 - samples/sec: 3171.63 - lr: 0.012500
2023-04-19 22:48:07,969 epoch 25 - iter 66/114 - loss 0.32466728 - time (sec): 22.80 - samples/sec: 3259.54 - lr: 0.012500
2023-04-19 22:48:11,943 epoch 25 - iter 77/114 - loss 0.32489155 - time (sec): 26.77 - samples/sec: 3232.43 - lr: 0.012500
2023-04-19 22:48:16,093 epoch 25 - iter 88/114 - loss 0.32779996 - time (sec): 30.92 - samples/sec: 3204.15 - lr: 0.012500
2023-04-19 22:48:1

100%|██████████| 13/13 [00:05<00:00,  2.27it/s]

2023-04-19 22:48:30,447 Evaluating as a multi-label problem: False
2023-04-19 22:48:30,469 DEV : loss 0.33139604330062866 - f1-score (micro avg)  0.3163
2023-04-19 22:48:30,549 BAD EPOCHS (no improvement): 1
2023-04-19 22:48:30,557 ----------------------------------------------------------------------------------------------------

2023-04-19 22:48:33,862 epoch 26 - iter 11/114 - loss 0.30360038 - time (sec): 3.30 - samples/sec: 3636.28 - lr: 0.012500
2023-04-19 22:48:37,597 epoch 26 - iter 22/114 - loss 0.30655607 - time (sec): 7.04 - samples/sec: 3463.31 - lr: 0.012500
2023-04-19 22:48:41,615 epoch 26 - iter 33/114 - loss 0.31847972 - time (sec): 11.06 - samples/sec: 3312.13 - lr: 0.012500
2023-04-19 22:48:45,340 epoch 26 - iter 44/114 - loss 0.31808918 - time (sec): 14.78 - samples/sec: 3283.70 - lr: 0.012500
2023-04-19 22:48:49,202 epoch 26 - iter 55/114 - loss 0.32078102 - time (sec): 18.64 - samples/sec: 3287.87 - lr: 0.012500
2023-04-19 22:48:53,126 epoch 26 - iter 66/114 - loss 0.32230033 - time (sec): 22.57 - samples/sec: 3274.20 - lr: 0.012500
2023-04-19 22:48:56,812 epoch 26 - iter 77/114 - loss 0.32327068 - time (sec): 26.25 - samples/sec: 3272.44 - lr: 0.012500
2023-04-19 22:49:01,154 epoch 26 - iter 88/114 - loss 0.32518970 - time (sec): 30.59 - samples/sec: 3222.56 - lr: 0.012500
2023-04-19 22:49:0

100%|██████████| 13/13 [00:07<00:00,  1.82it/s]

2023-04-19 22:49:16,324 Evaluating as a multi-label problem: False
2023-04-19 22:49:16,345 DEV : loss 0.3293048143386841 - f1-score (micro avg)  0.3134
2023-04-19 22:49:16,414 BAD EPOCHS (no improvement): 2
2023-04-19 22:49:16,421 ----------------------------------------------------------------------------------------------------

2023-04-19 22:49:19,803 epoch 27 - iter 11/114 - loss 0.32572252 - time (sec): 3.38 - samples/sec: 3606.88 - lr: 0.012500
2023-04-19 22:49:23,192 epoch 27 - iter 22/114 - loss 0.32560892 - time (sec): 6.77 - samples/sec: 3595.36 - lr: 0.012500
2023-04-19 22:49:26,332 epoch 27 - iter 33/114 - loss 0.32193713 - time (sec): 9.91 - samples/sec: 3659.82 - lr: 0.012500
2023-04-19 22:49:30,536 epoch 27 - iter 44/114 - loss 0.32197418 - time (sec): 14.11 - samples/sec: 3452.48 - lr: 0.012500
2023-04-19 22:49:34,042 epoch 27 - iter 55/114 - loss 0.32138377 - time (sec): 17.62 - samples/sec: 3461.84 - lr: 0.012500
2023-04-19 22:49:37,611 epoch 27 - iter 66/114 - loss 0.32594847 - time (sec): 21.19 - samples/sec: 3474.30 - lr: 0.012500
2023-04-19 22:49:41,457 epoch 27 - iter 77/114 - loss 0.32469352 - time (sec): 25.03 - samples/sec: 3436.33 - lr: 0.012500
2023-04-19 22:49:45,927 epoch 27 - iter 88/114 - loss 0.32528498 - time (sec): 29.50 - samples/sec: 3326.93 - lr: 0.012500
2023-04-19 22:49:49

100%|██████████| 13/13 [00:05<00:00,  2.28it/s]

2023-04-19 22:50:00,935 Evaluating as a multi-label problem: False
2023-04-19 22:50:00,973 DEV : loss 0.3312470614910126 - f1-score (micro avg)  0.3215
2023-04-19 22:50:01,105 BAD EPOCHS (no improvement): 0
2023-04-19 22:50:01,114 saving best model

2023-04-19 22:50:03,670 ----------------------------------------------------------------------------------------------------
2023-04-19 22:50:06,819 epoch 28 - iter 11/114 - loss 0.31881086 - time (sec): 3.15 - samples/sec: 3841.53 - lr: 0.012500
2023-04-19 22:50:10,135 epoch 28 - iter 22/114 - loss 0.32061282 - time (sec): 6.46 - samples/sec: 3747.48 - lr: 0.012500
2023-04-19 22:50:14,411 epoch 28 - iter 33/114 - loss 0.31383466 - time (sec): 10.74 - samples/sec: 3425.96 - lr: 0.012500
2023-04-19 22:50:19,337 epoch 28 - iter 44/114 - loss 0.31782794 - time (sec): 15.67 - samples/sec: 3133.31 - lr: 0.012500
2023-04-19 22:50:23,688 epoch 28 - iter 55/114 - loss 0.32331411 - time (sec): 20.02 - samples/sec: 3060.42 - lr: 0.012500
2023-04-19 22:50:27,453 epoch 28 - iter 66/114 - loss 0.32712561 - time (sec): 23.78 - samples/sec: 3106.15 - lr: 0.012500
2023-04-19 22:50:31,107 epoch 28 - iter 77/114 - loss 0.32845293 - time (sec): 27.44 - samples/sec: 3134.70 - lr: 0.012500
2023-04-19 22:50

100%|██████████| 13/13 [00:06<00:00,  2.04it/s]

2023-04-19 22:50:51,944 Evaluating as a multi-label problem: False
2023-04-19 22:50:51,967 DEV : loss 0.32893434166908264 - f1-score (micro avg)  0.3159
2023-04-19 22:50:52,035 BAD EPOCHS (no improvement): 1
2023-04-19 22:50:52,043 ----------------------------------------------------------------------------------------------------

2023-04-19 22:50:56,015 epoch 29 - iter 11/114 - loss 0.33077123 - time (sec): 3.97 - samples/sec: 3003.84 - lr: 0.012500
2023-04-19 22:51:00,002 epoch 29 - iter 22/114 - loss 0.30792157 - time (sec): 7.95 - samples/sec: 3149.23 - lr: 0.012500
2023-04-19 22:51:04,659 epoch 29 - iter 33/114 - loss 0.32181781 - time (sec): 12.61 - samples/sec: 2975.03 - lr: 0.012500
2023-04-19 22:51:08,993 epoch 29 - iter 44/114 - loss 0.33346352 - time (sec): 16.95 - samples/sec: 2968.48 - lr: 0.012500
2023-04-19 22:51:12,360 epoch 29 - iter 55/114 - loss 0.33068998 - time (sec): 20.31 - samples/sec: 3061.75 - lr: 0.012500
2023-04-19 22:51:15,527 epoch 29 - iter 66/114 - loss 0.33154055 - time (sec): 23.48 - samples/sec: 3172.18 - lr: 0.012500
2023-04-19 22:51:19,037 epoch 29 - iter 77/114 - loss 0.32871910 - time (sec): 26.99 - samples/sec: 3221.11 - lr: 0.012500
2023-04-19 22:51:23,297 epoch 29 - iter 88/114 - loss 0.32774286 - time (sec): 31.25 - samples/sec: 3177.24 - lr: 0.012500
2023-04-19 22:51:2

100%|██████████| 13/13 [00:06<00:00,  2.16it/s]

2023-04-19 22:51:37,397 Evaluating as a multi-label problem: False
2023-04-19 22:51:37,433 DEV : loss 0.32780176401138306 - f1-score (micro avg)  0.3156
2023-04-19 22:51:37,561 BAD EPOCHS (no improvement): 2

2023-04-19 22:51:37,568 ----------------------------------------------------------------------------------------------------
2023-04-19 22:51:41,248 epoch 30 - iter 11/114 - loss 0.35934029 - time (sec): 3.68 - samples/sec: 3325.65 - lr: 0.012500
2023-04-19 22:51:44,823 epoch 30 - iter 22/114 - loss 0.34077532 - time (sec): 7.25 - samples/sec: 3384.18 - lr: 0.012500
2023-04-19 22:51:47,973 epoch 30 - iter 33/114 - loss 0.32731010 - time (sec): 10.40 - samples/sec: 3548.97 - lr: 0.012500
2023-04-19 22:51:51,826 epoch 30 - iter 44/114 - loss 0.32851720 - time (sec): 14.25 - samples/sec: 3451.12 - lr: 0.012500
2023-04-19 22:51:55,206 epoch 30 - iter 55/114 - loss 0.32039613 - time (sec): 17.63 - samples/sec: 3461.98 - lr: 0.012500
2023-04-19 22:51:59,088 epoch 30 - iter 66/114 - loss 0.32197084 - time (sec): 21.52 - samples/sec: 3441.82 - lr: 0.012500
2023-04-19 22:52:02,973 epoch 30 - iter 77/114 - loss 0.32389018 - time (sec): 25.40 - samples/sec: 3426.81 - lr: 0.012500
2023-04-19 22:52

100%|██████████| 13/13 [00:04<00:00,  2.82it/s]

2023-04-19 22:52:20,365 Evaluating as a multi-label problem: False
2023-04-19 22:52:20,397 DEV : loss 0.3288080096244812 - f1-score (micro avg)  0.3149
2023-04-19 22:52:20,521 BAD EPOCHS (no improvement): 3
2023-04-19 22:52:20,531 ----------------------------------------------------------------------------------------------------

2023-04-19 22:52:25,927 epoch 31 - iter 11/114 - loss 0.34434053 - time (sec): 5.39 - samples/sec: 2335.27 - lr: 0.012500
2023-04-19 22:52:29,375 epoch 31 - iter 22/114 - loss 0.33698858 - time (sec): 8.84 - samples/sec: 2854.81 - lr: 0.012500
2023-04-19 22:52:32,864 epoch 31 - iter 33/114 - loss 0.33854309 - time (sec): 12.33 - samples/sec: 2995.36 - lr: 0.012500
2023-04-19 22:52:36,660 epoch 31 - iter 44/114 - loss 0.33129392 - time (sec): 16.13 - samples/sec: 3080.16 - lr: 0.012500
2023-04-19 22:52:41,216 epoch 31 - iter 55/114 - loss 0.32756982 - time (sec): 20.68 - samples/sec: 3030.39 - lr: 0.012500
2023-04-19 22:52:44,612 epoch 31 - iter 66/114 - loss 0.32742672 - time (sec): 24.08 - samples/sec: 3092.93 - lr: 0.012500
2023-04-19 22:52:48,483 epoch 31 - iter 77/114 - loss 0.32944866 - time (sec): 27.95 - samples/sec: 3106.86 - lr: 0.012500
2023-04-19 22:52:52,454 epoch 31 - iter 88/114 - loss 0.32793052 - time (sec): 31.92 - samples/sec: 3105.34 - lr: 0.012500
2023-04-19 22:52:5

100%|██████████| 13/13 [00:04<00:00,  3.00it/s]

2023-04-19 22:53:05,171 Evaluating as a multi-label problem: False
2023-04-19 22:53:05,192 DEV : loss 0.32843270897865295 - f1-score (micro avg)  0.3195
2023-04-19 22:53:05,262 Epoch    31: reducing learning rate of group 0 to 6.2500e-03.
2023-04-19 22:53:05,265 BAD EPOCHS (no improvement): 4
2023-04-19 22:53:05,275 ----------------------------------------------------------------------------------------------------





2023-04-19 22:53:09,978 epoch 32 - iter 11/114 - loss 0.29363330 - time (sec): 4.70 - samples/sec: 2640.48 - lr: 0.006250
2023-04-19 22:53:13,261 epoch 32 - iter 22/114 - loss 0.30137717 - time (sec): 7.98 - samples/sec: 3028.28 - lr: 0.006250
2023-04-19 22:53:17,115 epoch 32 - iter 33/114 - loss 0.31426191 - time (sec): 11.84 - samples/sec: 3098.20 - lr: 0.006250
2023-04-19 22:53:20,599 epoch 32 - iter 44/114 - loss 0.32069584 - time (sec): 15.32 - samples/sec: 3203.93 - lr: 0.006250
2023-04-19 22:53:25,208 epoch 32 - iter 55/114 - loss 0.32063365 - time (sec): 19.93 - samples/sec: 3103.27 - lr: 0.006250
2023-04-19 22:53:29,351 epoch 32 - iter 66/114 - loss 0.32302764 - time (sec): 24.07 - samples/sec: 3070.65 - lr: 0.006250
2023-04-19 22:53:32,572 epoch 32 - iter 77/114 - loss 0.32263427 - time (sec): 27.30 - samples/sec: 3152.84 - lr: 0.006250
2023-04-19 22:53:36,432 epoch 32 - iter 88/114 - loss 0.32453203 - time (sec): 31.16 - samples/sec: 3152.40 - lr: 0.006250
2023-04-19 22:53:4

100%|██████████| 13/13 [00:04<00:00,  3.01it/s]

2023-04-19 22:53:49,556 Evaluating as a multi-label problem: False
2023-04-19 22:53:49,577 DEV : loss 0.32722148299217224 - f1-score (micro avg)  0.3127
2023-04-19 22:53:49,653 BAD EPOCHS (no improvement): 1
2023-04-19 22:53:49,663 ----------------------------------------------------------------------------------------------------





2023-04-19 22:53:53,650 epoch 33 - iter 11/114 - loss 0.30002217 - time (sec): 3.98 - samples/sec: 3197.18 - lr: 0.006250
2023-04-19 22:53:57,193 epoch 33 - iter 22/114 - loss 0.33025462 - time (sec): 7.52 - samples/sec: 3263.69 - lr: 0.006250
2023-04-19 22:54:00,674 epoch 33 - iter 33/114 - loss 0.32164798 - time (sec): 11.00 - samples/sec: 3360.78 - lr: 0.006250
2023-04-19 22:54:05,155 epoch 33 - iter 44/114 - loss 0.32608958 - time (sec): 15.49 - samples/sec: 3205.06 - lr: 0.006250
2023-04-19 22:54:09,029 epoch 33 - iter 55/114 - loss 0.32823575 - time (sec): 19.36 - samples/sec: 3204.79 - lr: 0.006250
2023-04-19 22:54:13,082 epoch 33 - iter 66/114 - loss 0.32665269 - time (sec): 23.41 - samples/sec: 3177.98 - lr: 0.006250
2023-04-19 22:54:17,630 epoch 33 - iter 77/114 - loss 0.32126251 - time (sec): 27.96 - samples/sec: 3129.49 - lr: 0.006250
2023-04-19 22:54:21,289 epoch 33 - iter 88/114 - loss 0.31894431 - time (sec): 31.62 - samples/sec: 3138.89 - lr: 0.006250
2023-04-19 22:54:2

100%|██████████| 13/13 [00:04<00:00,  3.03it/s]

2023-04-19 22:54:34,463 Evaluating as a multi-label problem: False
2023-04-19 22:54:34,485 DEV : loss 0.3275053799152374 - f1-score (micro avg)  0.318
2023-04-19 22:54:34,556 BAD EPOCHS (no improvement): 2
2023-04-19 22:54:34,563 ----------------------------------------------------------------------------------------------------





2023-04-19 22:54:38,280 epoch 34 - iter 11/114 - loss 0.31399608 - time (sec): 3.72 - samples/sec: 3387.01 - lr: 0.006250
2023-04-19 22:54:42,181 epoch 34 - iter 22/114 - loss 0.31993740 - time (sec): 7.62 - samples/sec: 3248.03 - lr: 0.006250
2023-04-19 22:54:46,507 epoch 34 - iter 33/114 - loss 0.31826775 - time (sec): 11.94 - samples/sec: 3151.47 - lr: 0.006250
2023-04-19 22:54:49,568 epoch 34 - iter 44/114 - loss 0.31804403 - time (sec): 15.00 - samples/sec: 3303.39 - lr: 0.006250
2023-04-19 22:54:53,336 epoch 34 - iter 55/114 - loss 0.31828865 - time (sec): 18.77 - samples/sec: 3301.47 - lr: 0.006250
2023-04-19 22:54:57,633 epoch 34 - iter 66/114 - loss 0.31835450 - time (sec): 23.07 - samples/sec: 3227.21 - lr: 0.006250
2023-04-19 22:55:01,634 epoch 34 - iter 77/114 - loss 0.32213616 - time (sec): 27.07 - samples/sec: 3212.69 - lr: 0.006250
2023-04-19 22:55:04,970 epoch 34 - iter 88/114 - loss 0.31857645 - time (sec): 30.41 - samples/sec: 3251.82 - lr: 0.006250
2023-04-19 22:55:0

100%|██████████| 13/13 [00:05<00:00,  2.51it/s]

2023-04-19 22:55:19,076 Evaluating as a multi-label problem: False
2023-04-19 22:55:19,096 DEV : loss 0.327154278755188 - f1-score (micro avg)  0.3091
2023-04-19 22:55:19,172 BAD EPOCHS (no improvement): 3
2023-04-19 22:55:19,180 ----------------------------------------------------------------------------------------------------





2023-04-19 22:55:22,195 epoch 35 - iter 11/114 - loss 0.31083514 - time (sec): 3.01 - samples/sec: 3974.67 - lr: 0.006250
2023-04-19 22:55:25,150 epoch 35 - iter 22/114 - loss 0.31818953 - time (sec): 5.97 - samples/sec: 4038.41 - lr: 0.006250
2023-04-19 22:55:29,992 epoch 35 - iter 33/114 - loss 0.33004999 - time (sec): 10.81 - samples/sec: 3369.95 - lr: 0.006250
2023-04-19 22:55:34,457 epoch 35 - iter 44/114 - loss 0.32640023 - time (sec): 15.27 - samples/sec: 3196.10 - lr: 0.006250
2023-04-19 22:55:38,158 epoch 35 - iter 55/114 - loss 0.32813533 - time (sec): 18.98 - samples/sec: 3247.64 - lr: 0.006250
2023-04-19 22:55:42,018 epoch 35 - iter 66/114 - loss 0.32585440 - time (sec): 22.84 - samples/sec: 3255.50 - lr: 0.006250
2023-04-19 22:55:47,001 epoch 35 - iter 77/114 - loss 0.32570177 - time (sec): 27.82 - samples/sec: 3109.80 - lr: 0.006250
2023-04-19 22:55:50,531 epoch 35 - iter 88/114 - loss 0.32443538 - time (sec): 31.35 - samples/sec: 3168.06 - lr: 0.006250
2023-04-19 22:55:5

100%|██████████| 13/13 [00:05<00:00,  2.33it/s]

2023-04-19 22:56:04,390 Evaluating as a multi-label problem: False
2023-04-19 22:56:04,412 DEV : loss 0.3268199563026428 - f1-score (micro avg)  0.3172
2023-04-19 22:56:04,482 Epoch    35: reducing learning rate of group 0 to 3.1250e-03.
2023-04-19 22:56:04,484 BAD EPOCHS (no improvement): 4
2023-04-19 22:56:04,492 ----------------------------------------------------------------------------------------------------





2023-04-19 22:56:08,213 epoch 36 - iter 11/114 - loss 0.35920252 - time (sec): 3.72 - samples/sec: 3374.68 - lr: 0.003125
2023-04-19 22:56:11,643 epoch 36 - iter 22/114 - loss 0.32925864 - time (sec): 7.15 - samples/sec: 3451.86 - lr: 0.003125
2023-04-19 22:56:15,692 epoch 36 - iter 33/114 - loss 0.33489738 - time (sec): 11.20 - samples/sec: 3340.16 - lr: 0.003125
2023-04-19 22:56:19,986 epoch 36 - iter 44/114 - loss 0.32622486 - time (sec): 15.49 - samples/sec: 3220.81 - lr: 0.003125
2023-04-19 22:56:23,476 epoch 36 - iter 55/114 - loss 0.32283716 - time (sec): 18.98 - samples/sec: 3273.62 - lr: 0.003125
2023-04-19 22:56:26,800 epoch 36 - iter 66/114 - loss 0.32021865 - time (sec): 22.30 - samples/sec: 3339.46 - lr: 0.003125
2023-04-19 22:56:30,207 epoch 36 - iter 77/114 - loss 0.31798028 - time (sec): 25.71 - samples/sec: 3377.15 - lr: 0.003125
2023-04-19 22:56:34,183 epoch 36 - iter 88/114 - loss 0.32126587 - time (sec): 29.69 - samples/sec: 3319.55 - lr: 0.003125
2023-04-19 22:56:3

100%|██████████| 13/13 [00:06<00:00,  2.06it/s]

2023-04-19 22:56:49,007 Evaluating as a multi-label problem: False
2023-04-19 22:56:49,039 DEV : loss 0.32662126421928406 - f1-score (micro avg)  0.3131
2023-04-19 22:56:49,152 BAD EPOCHS (no improvement): 1
2023-04-19 22:56:49,159 ----------------------------------------------------------------------------------------------------





2023-04-19 22:56:53,129 epoch 37 - iter 11/114 - loss 0.31443448 - time (sec): 3.97 - samples/sec: 3163.05 - lr: 0.003125
2023-04-19 22:56:56,562 epoch 37 - iter 22/114 - loss 0.32188187 - time (sec): 7.40 - samples/sec: 3322.33 - lr: 0.003125
2023-04-19 22:57:01,346 epoch 37 - iter 33/114 - loss 0.32259748 - time (sec): 12.18 - samples/sec: 3018.22 - lr: 0.003125
2023-04-19 22:57:05,077 epoch 37 - iter 44/114 - loss 0.32105741 - time (sec): 15.92 - samples/sec: 3075.28 - lr: 0.003125
2023-04-19 22:57:08,549 epoch 37 - iter 55/114 - loss 0.32096357 - time (sec): 19.39 - samples/sec: 3179.07 - lr: 0.003125
2023-04-19 22:57:12,347 epoch 37 - iter 66/114 - loss 0.32145641 - time (sec): 23.18 - samples/sec: 3194.09 - lr: 0.003125
2023-04-19 22:57:15,830 epoch 37 - iter 77/114 - loss 0.31924584 - time (sec): 26.67 - samples/sec: 3229.30 - lr: 0.003125
2023-04-19 22:57:20,303 epoch 37 - iter 88/114 - loss 0.31743588 - time (sec): 31.14 - samples/sec: 3179.64 - lr: 0.003125
2023-04-19 22:57:2

100%|██████████| 13/13 [00:05<00:00,  2.28it/s]

2023-04-19 22:57:34,171 Evaluating as a multi-label problem: False
2023-04-19 22:57:34,207 DEV : loss 0.3265036642551422 - f1-score (micro avg)  0.314
2023-04-19 22:57:34,326 BAD EPOCHS (no improvement): 2
2023-04-19 22:57:34,332 ----------------------------------------------------------------------------------------------------





2023-04-19 22:57:37,598 epoch 38 - iter 11/114 - loss 0.34860065 - time (sec): 3.26 - samples/sec: 3674.75 - lr: 0.003125
2023-04-19 22:57:41,374 epoch 38 - iter 22/114 - loss 0.35041132 - time (sec): 7.04 - samples/sec: 3472.11 - lr: 0.003125
2023-04-19 22:57:45,252 epoch 38 - iter 33/114 - loss 0.34013967 - time (sec): 10.92 - samples/sec: 3427.93 - lr: 0.003125
2023-04-19 22:57:48,923 epoch 38 - iter 44/114 - loss 0.34213047 - time (sec): 14.59 - samples/sec: 3378.77 - lr: 0.003125
2023-04-19 22:57:52,669 epoch 38 - iter 55/114 - loss 0.33122567 - time (sec): 18.33 - samples/sec: 3376.65 - lr: 0.003125
2023-04-19 22:57:56,291 epoch 38 - iter 66/114 - loss 0.32952545 - time (sec): 21.96 - samples/sec: 3398.15 - lr: 0.003125
2023-04-19 22:58:00,064 epoch 38 - iter 77/114 - loss 0.32698586 - time (sec): 25.73 - samples/sec: 3373.25 - lr: 0.003125
2023-04-19 22:58:04,763 epoch 38 - iter 88/114 - loss 0.32559927 - time (sec): 30.43 - samples/sec: 3268.98 - lr: 0.003125
2023-04-19 22:58:0

100%|██████████| 13/13 [00:04<00:00,  2.86it/s]

2023-04-19 22:58:17,589 Evaluating as a multi-label problem: False
2023-04-19 22:58:17,629 DEV : loss 0.32615500688552856 - f1-score (micro avg)  0.3119
2023-04-19 22:58:17,777 BAD EPOCHS (no improvement): 3
2023-04-19 22:58:17,787 ----------------------------------------------------------------------------------------------------
2023-04-19 22:58:21,587 epoch 39 - iter 11/114 - loss 0.29763190 - time (sec): 3.80 - samples/sec: 3216.96 - lr: 0.003125
2023-04-19 22:58:25,394 epoch 39 - iter 22/114 - loss 0.31227008 - time (sec): 7.60 - samples/sec: 3216.53 - lr: 0.003125
2023-04-19 22:58:28,928 epoch 39 - iter 33/114 - loss 0.31801370 - time (sec): 11.14 - samples/sec: 3259.00 - lr: 0.003125
2023-04-19 22:58:33,100 epoch 39 - iter 44/114 - loss 0.31654319 - time (sec): 15.31 - samples/sec: 3180.23 - lr: 0.003125
2023-04-19 22:58:37,646 epoch 39 - iter 55/114 - loss 0.31949077 - time (sec): 19.86 - samples/sec: 3072.61 - lr: 0.003125
2023-04-19 22:58:41,948 epoch 39 - iter 66/114 - loss 0.32363776 - time (sec): 24.16 - samples/sec: 3067.31 - lr: 0.003125
2023-04-19 22:58:45,623 epoch 39 - iter 77/114 - loss 0.32177267 - time (sec): 27.83

100%|██████████| 13/13 [00:04<00:00,  3.02it/s]

2023-04-19 22:59:02,159 Evaluating as a multi-label problem: False
2023-04-19 22:59:02,182 DEV : loss 0.3261478543281555 - f1-score (micro avg)  0.3122
2023-04-19 22:59:02,250 Epoch    39: reducing learning rate of group 0 to 1.5625e-03.
2023-04-19 22:59:02,251 BAD EPOCHS (no improvement): 4
2023-04-19 22:59:02,261 ----------------------------------------------------------------------------------------------------





2023-04-19 22:59:06,831 epoch 40 - iter 11/114 - loss 0.31503378 - time (sec): 4.57 - samples/sec: 2632.45 - lr: 0.001563
2023-04-19 22:59:10,995 epoch 40 - iter 22/114 - loss 0.31770626 - time (sec): 8.73 - samples/sec: 2812.58 - lr: 0.001563
2023-04-19 22:59:14,119 epoch 40 - iter 33/114 - loss 0.31702168 - time (sec): 11.85 - samples/sec: 3097.69 - lr: 0.001563
2023-04-19 22:59:18,013 epoch 40 - iter 44/114 - loss 0.32726342 - time (sec): 15.75 - samples/sec: 3140.14 - lr: 0.001563
2023-04-19 22:59:21,829 epoch 40 - iter 55/114 - loss 0.32963872 - time (sec): 19.56 - samples/sec: 3161.13 - lr: 0.001563
2023-04-19 22:59:26,165 epoch 40 - iter 66/114 - loss 0.32663245 - time (sec): 23.90 - samples/sec: 3116.58 - lr: 0.001563
2023-04-19 22:59:29,913 epoch 40 - iter 77/114 - loss 0.32435959 - time (sec): 27.65 - samples/sec: 3138.32 - lr: 0.001563
2023-04-19 22:59:32,906 epoch 40 - iter 88/114 - loss 0.32074358 - time (sec): 30.64 - samples/sec: 3232.88 - lr: 0.001563
2023-04-19 22:59:3

100%|██████████| 13/13 [00:04<00:00,  3.04it/s]

2023-04-19 22:59:45,957 Evaluating as a multi-label problem: False
2023-04-19 22:59:45,980 DEV : loss 0.32619181275367737 - f1-score (micro avg)  0.314
2023-04-19 22:59:46,047 BAD EPOCHS (no improvement): 1
2023-04-19 22:59:46,054 ----------------------------------------------------------------------------------------------------





2023-04-19 22:59:49,957 epoch 41 - iter 11/114 - loss 0.33164718 - time (sec): 3.90 - samples/sec: 3245.42 - lr: 0.001563
2023-04-19 22:59:55,553 epoch 41 - iter 22/114 - loss 0.32352430 - time (sec): 9.49 - samples/sec: 2660.16 - lr: 0.001563
2023-04-19 22:59:58,857 epoch 41 - iter 33/114 - loss 0.32662781 - time (sec): 12.80 - samples/sec: 2935.65 - lr: 0.001563
2023-04-19 23:00:02,699 epoch 41 - iter 44/114 - loss 0.32794593 - time (sec): 16.64 - samples/sec: 3030.45 - lr: 0.001563
2023-04-19 23:00:06,507 epoch 41 - iter 55/114 - loss 0.32110918 - time (sec): 20.45 - samples/sec: 3065.63 - lr: 0.001563
2023-04-19 23:00:10,007 epoch 41 - iter 66/114 - loss 0.31903218 - time (sec): 23.95 - samples/sec: 3088.82 - lr: 0.001563
2023-04-19 23:00:13,219 epoch 41 - iter 77/114 - loss 0.31656773 - time (sec): 27.16 - samples/sec: 3163.71 - lr: 0.001563
2023-04-19 23:00:17,170 epoch 41 - iter 88/114 - loss 0.31957918 - time (sec): 31.11 - samples/sec: 3185.59 - lr: 0.001563
2023-04-19 23:00:2

100%|██████████| 13/13 [00:04<00:00,  2.88it/s]

2023-04-19 23:00:31,029 Evaluating as a multi-label problem: False
2023-04-19 23:00:31,051 DEV : loss 0.3259912431240082 - f1-score (micro avg)  0.3132
2023-04-19 23:00:31,122 BAD EPOCHS (no improvement): 2
2023-04-19 23:00:31,131 ----------------------------------------------------------------------------------------------------





2023-04-19 23:00:35,144 epoch 42 - iter 11/114 - loss 0.33696982 - time (sec): 4.01 - samples/sec: 3139.31 - lr: 0.001563
2023-04-19 23:00:39,026 epoch 42 - iter 22/114 - loss 0.33424615 - time (sec): 7.89 - samples/sec: 3202.24 - lr: 0.001563
2023-04-19 23:00:42,585 epoch 42 - iter 33/114 - loss 0.32627691 - time (sec): 11.45 - samples/sec: 3264.41 - lr: 0.001563
2023-04-19 23:00:46,543 epoch 42 - iter 44/114 - loss 0.33230614 - time (sec): 15.41 - samples/sec: 3233.85 - lr: 0.001563
2023-04-19 23:00:50,267 epoch 42 - iter 55/114 - loss 0.33080986 - time (sec): 19.13 - samples/sec: 3256.70 - lr: 0.001563
2023-04-19 23:00:53,792 epoch 42 - iter 66/114 - loss 0.32430638 - time (sec): 22.66 - samples/sec: 3295.09 - lr: 0.001563
2023-04-19 23:00:57,903 epoch 42 - iter 77/114 - loss 0.32616623 - time (sec): 26.77 - samples/sec: 3267.05 - lr: 0.001563
2023-04-19 23:01:01,473 epoch 42 - iter 88/114 - loss 0.32646568 - time (sec): 30.34 - samples/sec: 3293.48 - lr: 0.001563
2023-04-19 23:01:0

100%|██████████| 13/13 [00:05<00:00,  2.21it/s]

2023-04-19 23:01:15,190 Evaluating as a multi-label problem: False
2023-04-19 23:01:15,210 DEV : loss 0.3260800242424011 - f1-score (micro avg)  0.3134
2023-04-19 23:01:15,277 BAD EPOCHS (no improvement): 3
2023-04-19 23:01:15,284 ----------------------------------------------------------------------------------------------------





2023-04-19 23:01:18,795 epoch 43 - iter 11/114 - loss 0.32389705 - time (sec): 3.51 - samples/sec: 3491.57 - lr: 0.001563
2023-04-19 23:01:22,602 epoch 43 - iter 22/114 - loss 0.30960766 - time (sec): 7.31 - samples/sec: 3414.15 - lr: 0.001563
2023-04-19 23:01:27,524 epoch 43 - iter 33/114 - loss 0.31603709 - time (sec): 12.23 - samples/sec: 3067.77 - lr: 0.001563
2023-04-19 23:01:31,580 epoch 43 - iter 44/114 - loss 0.31640341 - time (sec): 16.29 - samples/sec: 3071.69 - lr: 0.001563
2023-04-19 23:01:34,758 epoch 43 - iter 55/114 - loss 0.31738841 - time (sec): 19.47 - samples/sec: 3165.06 - lr: 0.001563
2023-04-19 23:01:37,893 epoch 43 - iter 66/114 - loss 0.31773593 - time (sec): 22.60 - samples/sec: 3262.00 - lr: 0.001563
2023-04-19 23:01:42,455 epoch 43 - iter 77/114 - loss 0.32156930 - time (sec): 27.17 - samples/sec: 3189.77 - lr: 0.001563
2023-04-19 23:01:46,552 epoch 43 - iter 88/114 - loss 0.32459712 - time (sec): 31.26 - samples/sec: 3162.28 - lr: 0.001563
2023-04-19 23:01:4

100%|██████████| 13/13 [00:06<00:00,  2.08it/s]

2023-04-19 23:02:00,794 Evaluating as a multi-label problem: False
2023-04-19 23:02:00,819 DEV : loss 0.3260837495326996 - f1-score (micro avg)  0.3135
2023-04-19 23:02:00,894 Epoch    43: reducing learning rate of group 0 to 7.8125e-04.
2023-04-19 23:02:00,896 BAD EPOCHS (no improvement): 4
2023-04-19 23:02:00,904 ----------------------------------------------------------------------------------------------------





2023-04-19 23:02:04,349 epoch 44 - iter 11/114 - loss 0.30322412 - time (sec): 3.44 - samples/sec: 3499.25 - lr: 0.000781
2023-04-19 23:02:08,001 epoch 44 - iter 22/114 - loss 0.29320388 - time (sec): 7.10 - samples/sec: 3418.99 - lr: 0.000781
2023-04-19 23:02:11,970 epoch 44 - iter 33/114 - loss 0.30805291 - time (sec): 11.06 - samples/sec: 3312.11 - lr: 0.000781
2023-04-19 23:02:16,109 epoch 44 - iter 44/114 - loss 0.31823945 - time (sec): 15.20 - samples/sec: 3209.65 - lr: 0.000781
2023-04-19 23:02:19,356 epoch 44 - iter 55/114 - loss 0.32011359 - time (sec): 18.45 - samples/sec: 3285.54 - lr: 0.000781
2023-04-19 23:02:22,775 epoch 44 - iter 66/114 - loss 0.32221703 - time (sec): 21.87 - samples/sec: 3343.12 - lr: 0.000781
2023-04-19 23:02:26,819 epoch 44 - iter 77/114 - loss 0.32298653 - time (sec): 25.91 - samples/sec: 3319.85 - lr: 0.000781
2023-04-19 23:02:31,129 epoch 44 - iter 88/114 - loss 0.32290603 - time (sec): 30.22 - samples/sec: 3272.20 - lr: 0.000781
2023-04-19 23:02:3

100%|██████████| 13/13 [00:05<00:00,  2.47it/s]

2023-04-19 23:02:44,841 Evaluating as a multi-label problem: False
2023-04-19 23:02:44,869 DEV : loss 0.3258892595767975 - f1-score (micro avg)  0.313
2023-04-19 23:02:44,994 BAD EPOCHS (no improvement): 1
2023-04-19 23:02:45,008 ----------------------------------------------------------------------------------------------------





2023-04-19 23:02:48,887 epoch 45 - iter 11/114 - loss 0.30999831 - time (sec): 3.88 - samples/sec: 3204.52 - lr: 0.000781
2023-04-19 23:02:51,945 epoch 45 - iter 22/114 - loss 0.31087358 - time (sec): 6.93 - samples/sec: 3460.42 - lr: 0.000781
2023-04-19 23:02:56,243 epoch 45 - iter 33/114 - loss 0.31580270 - time (sec): 11.23 - samples/sec: 3264.61 - lr: 0.000781
2023-04-19 23:03:00,698 epoch 45 - iter 44/114 - loss 0.31445280 - time (sec): 15.69 - samples/sec: 3120.33 - lr: 0.000781
2023-04-19 23:03:04,759 epoch 45 - iter 55/114 - loss 0.31642638 - time (sec): 19.75 - samples/sec: 3107.92 - lr: 0.000781
2023-04-19 23:03:08,207 epoch 45 - iter 66/114 - loss 0.31598446 - time (sec): 23.20 - samples/sec: 3192.21 - lr: 0.000781
2023-04-19 23:03:11,387 epoch 45 - iter 77/114 - loss 0.31527024 - time (sec): 26.38 - samples/sec: 3264.82 - lr: 0.000781
2023-04-19 23:03:15,056 epoch 45 - iter 88/114 - loss 0.31462758 - time (sec): 30.05 - samples/sec: 3265.44 - lr: 0.000781
2023-04-19 23:03:1

100%|██████████| 13/13 [00:04<00:00,  2.72it/s]

2023-04-19 23:03:29,818 Evaluating as a multi-label problem: False
2023-04-19 23:03:29,853 DEV : loss 0.32574281096458435 - f1-score (micro avg)  0.3132
2023-04-19 23:03:29,987 BAD EPOCHS (no improvement): 2
2023-04-19 23:03:29,999 ----------------------------------------------------------------------------------------------------





2023-04-19 23:03:34,684 epoch 46 - iter 11/114 - loss 0.31970006 - time (sec): 4.68 - samples/sec: 2609.72 - lr: 0.000781
2023-04-19 23:03:37,761 epoch 46 - iter 22/114 - loss 0.30789246 - time (sec): 7.76 - samples/sec: 3093.01 - lr: 0.000781
2023-04-19 23:03:41,563 epoch 46 - iter 33/114 - loss 0.32773885 - time (sec): 11.56 - samples/sec: 3157.51 - lr: 0.000781
2023-04-19 23:03:44,976 epoch 46 - iter 44/114 - loss 0.31951037 - time (sec): 14.97 - samples/sec: 3261.68 - lr: 0.000781
2023-04-19 23:03:49,486 epoch 46 - iter 55/114 - loss 0.31306761 - time (sec): 19.48 - samples/sec: 3183.00 - lr: 0.000781
2023-04-19 23:03:52,498 epoch 46 - iter 66/114 - loss 0.31441387 - time (sec): 22.50 - samples/sec: 3287.47 - lr: 0.000781
2023-04-19 23:03:56,166 epoch 46 - iter 77/114 - loss 0.31896079 - time (sec): 26.16 - samples/sec: 3297.32 - lr: 0.000781
2023-04-19 23:03:59,703 epoch 46 - iter 88/114 - loss 0.31923309 - time (sec): 29.70 - samples/sec: 3328.17 - lr: 0.000781
2023-04-19 23:04:0

100%|██████████| 13/13 [00:04<00:00,  2.96it/s]

2023-04-19 23:04:13,694 Evaluating as a multi-label problem: False
2023-04-19 23:04:13,717 DEV : loss 0.3257792890071869 - f1-score (micro avg)  0.3123
2023-04-19 23:04:13,787 BAD EPOCHS (no improvement): 3
2023-04-19 23:04:13,796 ----------------------------------------------------------------------------------------------------





2023-04-19 23:04:17,605 epoch 47 - iter 11/114 - loss 0.32839736 - time (sec): 3.80 - samples/sec: 3324.17 - lr: 0.000781
2023-04-19 23:04:22,016 epoch 47 - iter 22/114 - loss 0.31083024 - time (sec): 8.22 - samples/sec: 3057.31 - lr: 0.000781
2023-04-19 23:04:25,776 epoch 47 - iter 33/114 - loss 0.31131606 - time (sec): 11.98 - samples/sec: 3164.23 - lr: 0.000781
2023-04-19 23:04:30,250 epoch 47 - iter 44/114 - loss 0.31199388 - time (sec): 16.45 - samples/sec: 3078.10 - lr: 0.000781
2023-04-19 23:04:34,405 epoch 47 - iter 55/114 - loss 0.31553234 - time (sec): 20.60 - samples/sec: 3055.15 - lr: 0.000781
2023-04-19 23:04:39,059 epoch 47 - iter 66/114 - loss 0.31268569 - time (sec): 25.26 - samples/sec: 2987.71 - lr: 0.000781
2023-04-19 23:04:42,176 epoch 47 - iter 77/114 - loss 0.31760274 - time (sec): 28.38 - samples/sec: 3096.37 - lr: 0.000781
2023-04-19 23:04:45,063 epoch 47 - iter 88/114 - loss 0.31756487 - time (sec): 31.26 - samples/sec: 3187.20 - lr: 0.000781
2023-04-19 23:04:4

100%|██████████| 13/13 [00:04<00:00,  3.00it/s]

2023-04-19 23:04:58,577 Evaluating as a multi-label problem: False
2023-04-19 23:04:58,601 DEV : loss 0.3258054554462433 - f1-score (micro avg)  0.3121
2023-04-19 23:04:58,677 Epoch    47: reducing learning rate of group 0 to 3.9063e-04.
2023-04-19 23:04:58,679 BAD EPOCHS (no improvement): 4
2023-04-19 23:04:58,687 ----------------------------------------------------------------------------------------------------





2023-04-19 23:05:02,154 epoch 48 - iter 11/114 - loss 0.30119853 - time (sec): 3.46 - samples/sec: 3426.17 - lr: 0.000391
2023-04-19 23:05:06,463 epoch 48 - iter 22/114 - loss 0.30521291 - time (sec): 7.77 - samples/sec: 3094.09 - lr: 0.000391
2023-04-19 23:05:10,375 epoch 48 - iter 33/114 - loss 0.31819786 - time (sec): 11.68 - samples/sec: 3117.58 - lr: 0.000391
2023-04-19 23:05:13,800 epoch 48 - iter 44/114 - loss 0.31569972 - time (sec): 15.11 - samples/sec: 3212.77 - lr: 0.000391
2023-04-19 23:05:17,262 epoch 48 - iter 55/114 - loss 0.31724620 - time (sec): 18.57 - samples/sec: 3283.41 - lr: 0.000391
2023-04-19 23:05:21,144 epoch 48 - iter 66/114 - loss 0.31585967 - time (sec): 22.45 - samples/sec: 3261.93 - lr: 0.000391
2023-04-19 23:05:25,692 epoch 48 - iter 77/114 - loss 0.31909873 - time (sec): 27.00 - samples/sec: 3176.60 - lr: 0.000391
2023-04-19 23:05:28,772 epoch 48 - iter 88/114 - loss 0.32020751 - time (sec): 30.08 - samples/sec: 3269.84 - lr: 0.000391
2023-04-19 23:05:3

100%|██████████| 13/13 [00:04<00:00,  2.63it/s]

2023-04-19 23:05:42,329 Evaluating as a multi-label problem: False
2023-04-19 23:05:42,351 DEV : loss 0.3258017897605896 - f1-score (micro avg)  0.3132
2023-04-19 23:05:42,419 BAD EPOCHS (no improvement): 1
2023-04-19 23:05:42,427 ----------------------------------------------------------------------------------------------------





2023-04-19 23:05:45,760 epoch 49 - iter 11/114 - loss 0.29517860 - time (sec): 3.33 - samples/sec: 3829.83 - lr: 0.000391
2023-04-19 23:05:49,508 epoch 49 - iter 22/114 - loss 0.31322983 - time (sec): 7.07 - samples/sec: 3575.02 - lr: 0.000391
2023-04-19 23:05:53,316 epoch 49 - iter 33/114 - loss 0.30915191 - time (sec): 10.88 - samples/sec: 3442.81 - lr: 0.000391
2023-04-19 23:05:58,271 epoch 49 - iter 44/114 - loss 0.32013410 - time (sec): 15.84 - samples/sec: 3140.50 - lr: 0.000391
2023-04-19 23:06:01,826 epoch 49 - iter 55/114 - loss 0.31957057 - time (sec): 19.39 - samples/sec: 3201.23 - lr: 0.000391
2023-04-19 23:06:05,157 epoch 49 - iter 66/114 - loss 0.32486878 - time (sec): 22.72 - samples/sec: 3262.62 - lr: 0.000391
2023-04-19 23:06:10,347 epoch 49 - iter 77/114 - loss 0.32545516 - time (sec): 27.91 - samples/sec: 3114.44 - lr: 0.000391
2023-04-19 23:06:13,984 epoch 49 - iter 88/114 - loss 0.31943124 - time (sec): 31.55 - samples/sec: 3153.36 - lr: 0.000391
2023-04-19 23:06:1

100%|██████████| 13/13 [00:05<00:00,  2.34it/s]

2023-04-19 23:06:27,802 Evaluating as a multi-label problem: False
2023-04-19 23:06:27,827 DEV : loss 0.3257328271865845 - f1-score (micro avg)  0.3132
2023-04-19 23:06:27,903 BAD EPOCHS (no improvement): 2
2023-04-19 23:06:27,908 ----------------------------------------------------------------------------------------------------





2023-04-19 23:06:31,902 epoch 50 - iter 11/114 - loss 0.31930536 - time (sec): 3.99 - samples/sec: 3248.93 - lr: 0.000391
2023-04-19 23:06:35,040 epoch 50 - iter 22/114 - loss 0.33100957 - time (sec): 7.13 - samples/sec: 3549.34 - lr: 0.000391
2023-04-19 23:06:39,206 epoch 50 - iter 33/114 - loss 0.32463232 - time (sec): 11.29 - samples/sec: 3294.61 - lr: 0.000391
2023-04-19 23:06:43,581 epoch 50 - iter 44/114 - loss 0.33090296 - time (sec): 15.67 - samples/sec: 3175.29 - lr: 0.000391
2023-04-19 23:06:46,969 epoch 50 - iter 55/114 - loss 0.32746717 - time (sec): 19.06 - samples/sec: 3246.07 - lr: 0.000391
2023-04-19 23:06:50,218 epoch 50 - iter 66/114 - loss 0.32808092 - time (sec): 22.31 - samples/sec: 3309.01 - lr: 0.000391
2023-04-19 23:06:54,487 epoch 50 - iter 77/114 - loss 0.32432160 - time (sec): 26.58 - samples/sec: 3260.60 - lr: 0.000391
2023-04-19 23:06:58,871 epoch 50 - iter 88/114 - loss 0.32327072 - time (sec): 30.96 - samples/sec: 3224.37 - lr: 0.000391
2023-04-19 23:07:0

100%|██████████| 13/13 [00:06<00:00,  2.06it/s]

2023-04-19 23:07:12,502 Evaluating as a multi-label problem: False
2023-04-19 23:07:12,532 DEV : loss 0.32569730281829834 - f1-score (micro avg)  0.3134
2023-04-19 23:07:12,614 BAD EPOCHS (no improvement): 3
2023-04-19 23:07:12,622 ----------------------------------------------------------------------------------------------------





2023-04-19 23:07:16,413 epoch 51 - iter 11/114 - loss 0.31251271 - time (sec): 3.79 - samples/sec: 3285.20 - lr: 0.000391
2023-04-19 23:07:19,927 epoch 51 - iter 22/114 - loss 0.31030512 - time (sec): 7.30 - samples/sec: 3329.37 - lr: 0.000391
2023-04-19 23:07:23,689 epoch 51 - iter 33/114 - loss 0.31427849 - time (sec): 11.06 - samples/sec: 3343.47 - lr: 0.000391
2023-04-19 23:07:27,666 epoch 51 - iter 44/114 - loss 0.31901516 - time (sec): 15.04 - samples/sec: 3266.68 - lr: 0.000391
2023-04-19 23:07:30,960 epoch 51 - iter 55/114 - loss 0.31623131 - time (sec): 18.34 - samples/sec: 3332.23 - lr: 0.000391
2023-04-19 23:07:35,434 epoch 51 - iter 66/114 - loss 0.31498941 - time (sec): 22.81 - samples/sec: 3217.73 - lr: 0.000391
2023-04-19 23:07:39,398 epoch 51 - iter 77/114 - loss 0.31695647 - time (sec): 26.77 - samples/sec: 3223.30 - lr: 0.000391
2023-04-19 23:07:43,476 epoch 51 - iter 88/114 - loss 0.31959759 - time (sec): 30.85 - samples/sec: 3211.61 - lr: 0.000391
2023-04-19 23:07:4

100%|██████████| 13/13 [00:05<00:00,  2.31it/s]

2023-04-19 23:07:57,376 Evaluating as a multi-label problem: False
2023-04-19 23:07:57,411 DEV : loss 0.32572031021118164 - f1-score (micro avg)  0.3134
2023-04-19 23:07:57,536 Epoch    51: reducing learning rate of group 0 to 1.9531e-04.
2023-04-19 23:07:57,538 BAD EPOCHS (no improvement): 4
2023-04-19 23:07:57,549 ----------------------------------------------------------------------------------------------------





2023-04-19 23:08:00,911 epoch 52 - iter 11/114 - loss 0.31325511 - time (sec): 3.36 - samples/sec: 3558.95 - lr: 0.000195
2023-04-19 23:08:04,732 epoch 52 - iter 22/114 - loss 0.30237798 - time (sec): 7.18 - samples/sec: 3445.38 - lr: 0.000195
2023-04-19 23:08:07,992 epoch 52 - iter 33/114 - loss 0.31780387 - time (sec): 10.44 - samples/sec: 3513.27 - lr: 0.000195
2023-04-19 23:08:12,521 epoch 52 - iter 44/114 - loss 0.32454651 - time (sec): 14.97 - samples/sec: 3344.02 - lr: 0.000195
2023-04-19 23:08:16,530 epoch 52 - iter 55/114 - loss 0.32573464 - time (sec): 18.98 - samples/sec: 3286.98 - lr: 0.000195
2023-04-19 23:08:19,850 epoch 52 - iter 66/114 - loss 0.32253870 - time (sec): 22.30 - samples/sec: 3336.32 - lr: 0.000195
2023-04-19 23:08:23,467 epoch 52 - iter 77/114 - loss 0.32438647 - time (sec): 25.92 - samples/sec: 3341.49 - lr: 0.000195
2023-04-19 23:08:27,707 epoch 52 - iter 88/114 - loss 0.32280488 - time (sec): 30.16 - samples/sec: 3284.12 - lr: 0.000195
2023-04-19 23:08:3

100%|██████████| 13/13 [00:04<00:00,  2.78it/s]

2023-04-19 23:08:41,092 Evaluating as a multi-label problem: False
2023-04-19 23:08:41,122 DEV : loss 0.3257114291191101 - f1-score (micro avg)  0.3134
2023-04-19 23:08:41,249 BAD EPOCHS (no improvement): 1
2023-04-19 23:08:41,259 ----------------------------------------------------------------------------------------------------





2023-04-19 23:08:45,308 epoch 53 - iter 11/114 - loss 0.34603494 - time (sec): 4.05 - samples/sec: 3016.45 - lr: 0.000195
2023-04-19 23:08:49,608 epoch 53 - iter 22/114 - loss 0.32286478 - time (sec): 8.35 - samples/sec: 3038.51 - lr: 0.000195
2023-04-19 23:08:52,950 epoch 53 - iter 33/114 - loss 0.32413933 - time (sec): 11.69 - samples/sec: 3203.49 - lr: 0.000195
2023-04-19 23:08:57,072 epoch 53 - iter 44/114 - loss 0.32424603 - time (sec): 15.81 - samples/sec: 3167.01 - lr: 0.000195
2023-04-19 23:09:02,191 epoch 53 - iter 55/114 - loss 0.32392061 - time (sec): 20.93 - samples/sec: 2986.78 - lr: 0.000195
2023-04-19 23:09:05,463 epoch 53 - iter 66/114 - loss 0.32278193 - time (sec): 24.20 - samples/sec: 3089.93 - lr: 0.000195
2023-04-19 23:09:08,616 epoch 53 - iter 77/114 - loss 0.32244183 - time (sec): 27.35 - samples/sec: 3174.18 - lr: 0.000195
2023-04-19 23:09:12,253 epoch 53 - iter 88/114 - loss 0.32125284 - time (sec): 30.99 - samples/sec: 3199.46 - lr: 0.000195
2023-04-19 23:09:1

100%|██████████| 13/13 [00:04<00:00,  3.01it/s]

2023-04-19 23:09:25,509 Evaluating as a multi-label problem: False
2023-04-19 23:09:25,530 DEV : loss 0.32573020458221436 - f1-score (micro avg)  0.3132
2023-04-19 23:09:25,601 BAD EPOCHS (no improvement): 2
2023-04-19 23:09:25,606 ----------------------------------------------------------------------------------------------------





2023-04-19 23:09:29,694 epoch 54 - iter 11/114 - loss 0.36318298 - time (sec): 4.09 - samples/sec: 3096.08 - lr: 0.000195
2023-04-19 23:09:33,659 epoch 54 - iter 22/114 - loss 0.34413052 - time (sec): 8.05 - samples/sec: 3151.53 - lr: 0.000195
2023-04-19 23:09:36,973 epoch 54 - iter 33/114 - loss 0.33930513 - time (sec): 11.37 - samples/sec: 3294.44 - lr: 0.000195
2023-04-19 23:09:40,544 epoch 54 - iter 44/114 - loss 0.33310368 - time (sec): 14.94 - samples/sec: 3325.73 - lr: 0.000195
2023-04-19 23:09:43,597 epoch 54 - iter 55/114 - loss 0.32545224 - time (sec): 17.99 - samples/sec: 3411.87 - lr: 0.000195
2023-04-19 23:09:48,527 epoch 54 - iter 66/114 - loss 0.32201932 - time (sec): 22.92 - samples/sec: 3221.25 - lr: 0.000195
2023-04-19 23:09:52,433 epoch 54 - iter 77/114 - loss 0.32351608 - time (sec): 26.83 - samples/sec: 3217.57 - lr: 0.000195
2023-04-19 23:09:56,055 epoch 54 - iter 88/114 - loss 0.32400654 - time (sec): 30.45 - samples/sec: 3241.06 - lr: 0.000195
2023-04-19 23:09:5

100%|██████████| 13/13 [00:04<00:00,  2.98it/s]

2023-04-19 23:10:09,315 Evaluating as a multi-label problem: False
2023-04-19 23:10:09,336 DEV : loss 0.32569947838783264 - f1-score (micro avg)  0.3134
2023-04-19 23:10:09,406 BAD EPOCHS (no improvement): 3
2023-04-19 23:10:09,413 ----------------------------------------------------------------------------------------------------





2023-04-19 23:10:12,602 epoch 55 - iter 11/114 - loss 0.32314702 - time (sec): 3.19 - samples/sec: 3700.95 - lr: 0.000195
2023-04-19 23:10:17,373 epoch 55 - iter 22/114 - loss 0.32823554 - time (sec): 7.96 - samples/sec: 3043.79 - lr: 0.000195
2023-04-19 23:10:21,690 epoch 55 - iter 33/114 - loss 0.31923696 - time (sec): 12.27 - samples/sec: 2998.16 - lr: 0.000195
2023-04-19 23:10:25,912 epoch 55 - iter 44/114 - loss 0.32140345 - time (sec): 16.50 - samples/sec: 3008.41 - lr: 0.000195
2023-04-19 23:10:29,108 epoch 55 - iter 55/114 - loss 0.32138787 - time (sec): 19.69 - samples/sec: 3142.89 - lr: 0.000195
2023-04-19 23:10:33,275 epoch 55 - iter 66/114 - loss 0.32128141 - time (sec): 23.86 - samples/sec: 3095.86 - lr: 0.000195
2023-04-19 23:10:36,955 epoch 55 - iter 77/114 - loss 0.31962513 - time (sec): 27.54 - samples/sec: 3131.75 - lr: 0.000195
2023-04-19 23:10:40,257 epoch 55 - iter 88/114 - loss 0.32041766 - time (sec): 30.84 - samples/sec: 3193.99 - lr: 0.000195
2023-04-19 23:10:4

100%|██████████| 13/13 [00:04<00:00,  3.01it/s]

2023-04-19 23:10:54,105 Evaluating as a multi-label problem: False
2023-04-19 23:10:54,130 DEV : loss 0.32568490505218506 - f1-score (micro avg)  0.3134
2023-04-19 23:10:54,201 Epoch    55: reducing learning rate of group 0 to 9.7656e-05.
2023-04-19 23:10:54,206 BAD EPOCHS (no improvement): 4
2023-04-19 23:10:54,213 ----------------------------------------------------------------------------------------------------
2023-04-19 23:10:54,219 ----------------------------------------------------------------------------------------------------
2023-04-19 23:10:54,220 learning rate too small - quitting training!
2023-04-19 23:10:54,225 ----------------------------------------------------------------------------------------------------





2023-04-19 23:10:56,260 ----------------------------------------------------------------------------------------------------
2023-04-19 23:10:59,695 SequenceTagger predicts: Dictionary with 67 tags: O, S-location label, B-location label, E-location label, I-location label, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-date label, B-date label, E-date label, I-date label, S-person entity, B-person entity, E-person entity, I-person entity, S-number label, B-number label, E-number label, I-number label, S-artifact label, B-artifact label, E-artifact label, I-artifact label, S-other labels, B-other labels, E-other labels, I-other labels, S-localization label, B-localization label, E-localization label, I-localization label, S-event entity, B-event entity, E-event entity, I-event entity, S-PSN, B-PSN, E-PSN, I-PSN, S-percent label, B-percent label, E-percent label, I-percent label, S-artifact entity, B-artifact entity, E-artifact entity, I-art

100%|██████████| 14/14 [00:08<00:00,  1.60it/s]

2023-04-19 23:11:08,904 Evaluating as a multi-label problem: False
2023-04-19 23:11:08,925 0.499	0.2076	0.2933	0.1805
2023-04-19 23:11:08,927 
Results:
- F-score (micro) 0.2933
- F-score (macro) 0.161
- Accuracy 0.1805

By class:
                     precision    recall  f1-score   support

     location label     0.2812    0.0664    0.1075       271
       number label     0.4124    0.6348    0.5000       115
         date label     0.7185    0.6382    0.6760       152
organization entity     0.8438    0.1444    0.2466       187
      person entity     0.2766    0.0922    0.1383       141
     artifact label     0.3333    0.0694    0.1149        72
 localization label     0.0000    0.0000    0.0000        66
       other labels     0.0000    0.0000    0.0000        55
    artifact entity     0.0000    0.0000    0.0000        27
       event entity     0.0000    0.0000    0.0000        25
      percent label     1.0000    0.4615    0.6316        13
                PSN     0.0000    0.0




{'test_score': 0.2932515337423313,
 'dev_score_history': [0.021978021978021976,
  0.19339984650805833,
  0.19892884468247896,
  0.2540415704387991,
  0.23632038065027758,
  0.2609351432880845,
  0.25166543301258326,
  0.25199362041467305,
  0.20476610767872902,
  0.2733423545331529,
  0.27627627627627627,
  0.22885572139303484,
  0.27370030581039756,
  0.23688663282571912,
  0.2754863813229572,
  0.32064128256513025,
  0.2945054945054945,
  0.3013899049012436,
  0.2944693572496263,
  0.3138105567606652,
  0.31639226914817464,
  0.30066322770817977,
  0.3063328424153166,
  0.30847212165097754,
  0.31631205673758866,
  0.3133574007220217,
  0.32152974504249293,
  0.31586503948312994,
  0.31556503198294245,
  0.31493745401030165,
  0.31953522149600583,
  0.3126801152737752,
  0.3180212014134276,
  0.30914368650217705,
  0.3172119487908962,
  0.3130807719799857,
  0.3139784946236559,
  0.31191335740072207,
  0.3122302158273382,
  0.3139784946236559,
  0.31321839080459773,
  0.3134435657800

Training with 'glove' is not the best option, but we were not able to train with embeddings for Japanese. The results are not that bad considering this (micro F1 of 0.2933 on the test set), but for better performance the experiment should be repeated with embeddings trained for Japanese, since GloVe is trained on English text and Japanese is very distant from English.

## Examples of use

Here we check the performance of the different models on some examples. To do so, we define a function that only needs the path of the model to use and the example text for it.

In [25]:
from flair.data import Sentence
from flair.models import SequenceTagger

def predict_ner(model_path, text):
    """
    Performs NER prediction on a text using a SequenceTagger model loaded from model_path.

    Args:
        model_path (str): Local path of the SequenceTagger model to load.
        text (str): text to process.

    Returns:
        sentence (Sentence): Sentence object with the recognized entities and their labels.
    """
    # Load SequenceTagger model
    tagger = SequenceTagger.load(model_path)

    # Create a Sentence object with the text
    sentence = Sentence(text)

    # Pass the sentence to the SequenceTagger model to perform NER prediction
    tagger.predict(sentence)

    # Return the Sentence object with the recognized entities and their tags.
    return sentence
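
Because `predict_ner` loads the model on every call, repeated predictions with the same model read it from disk each time. The sketch below shows one way to avoid that, using a cache keyed by model path. `make_cached_loader` is a hypothetical helper, not part of Flair, and the loader is passed in as an argument so the idea can be tried with a stand-in function.

```python
from functools import lru_cache


def make_cached_loader(load_fn):
    """Wrap a loading function so that each model path is loaded only once.

    `load_fn` is any function that loads a model from a path
    (hypothetical helper written for this illustration).
    """
    @lru_cache(maxsize=None)
    def cached_load(model_path):
        return load_fn(model_path)

    return cached_load


# Stand-in loader that records how often it is actually called.
calls = []


def fake_load(model_path):
    calls.append(model_path)
    return f"model loaded from {model_path}"


load_model = make_cached_loader(fake_load)
load_model("spanish.pt")
load_model("spanish.pt")  # served from the cache, fake_load is not called again
load_model("german.pt")
print(len(calls))
```

In the notebook, `make_cached_loader(SequenceTagger.load)` could take the place of the direct `SequenceTagger.load(model_path)` call inside `predict_ner`.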


In [None]:
# Example of use for Spanish
model_path = "/content/drive/MyDrive/ColabNotebooks/nermodels/spanish/final-model-spanish.pt"
text = "Juan trabaja en una empresa llamada Acme en Madrid."
sentence = predict_ner(model_path, text)

# Print the sentence with the recognized entities and their labels
print(sentence.to_tagged_string('ner'))

2023-04-19 21:40:31,991 SequenceTagger predicts: Dictionary with 11 tags: <unk>, Beginning of an organization name, Inside an organization name, Beginning of a person name, Beginning of a location name, Inside a person name, Inside a miscellaneous name, Beginning of a miscellaneous name, Inside a location name, <START>, <STOP>
Sentence[10]: "Juan trabaja en una empresa llamada Acme en Madrid." → ["Juan"/<unk>, "trabaja"/<unk>, "en"/<unk>, "una"/<unk>, "empresa"/<unk>, "llamada"/<unk>, "Acme"/Beginning of an organization name, "en"/<unk>, "Madrid"/Beginning of a location name, "."/<unk>]


In [None]:
# Example of use for German
model_path = "/content/drive/MyDrive/ColabNotebooks/nermodels/german/final-model-german.pt"
text = "Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet."
sentence = predict_ner(model_path, text)

# Print the sentence with the recognized entities and their labels
print(sentence.to_tagged_string('ner'))

2023-04-20 09:26:18,022 SequenceTagger predicts: Dictionary with 27 tags: O, S-Tax, B-Tax, E-Tax, I-Tax, S-Other, B-Other, E-Other, I-Other, S-Location, B-Location, E-Location, I-Location, S-Person, B-Person, E-Person, I-Person, S-Time, B-Time, E-Time, I-Time, S-Organization, B-Organization, E-Organization, I-Organization, <START>, <STOP>
Sentence[14]: "Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet." → ["2005"/Time]


In [None]:
# Example of use for Basque
model_path = "/content/drive/MyDrive/ColabNotebooks/nermodels/basque/final-model-basque.pt"
text = "Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak."
sentence = predict_ner(model_path, text)

# Print the sentence with the recognized entities and their labels
print(sentence.to_tagged_string('ner'))

2023-04-20 10:07:46,390 SequenceTagger predicts: Dictionary with 19 tags: O, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-location entity, B-location entity, E-location entity, I-location entity, S-person entity, B-person entity, E-person entity, I-person entity, S-other label, B-other label, E-other label, I-other label, <START>, <STOP>
Sentence[15]: "Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak." → ["Maite Perez"/person entity, "Jon Anderk"/person entity, "Eusko Jaurlaritzako Osasun Sailaren"/organization entity]


In [None]:
# Example of use for Japanese
model_path = "/content/drive/MyDrive/ColabNotebooks/nermodels/japanese/final-model-japanese.pt"  
text = "株式会社XYZは2023年4月20日に設立されました。"
sentence = predict_ner(model_path, text)

# Print the sentence with the recognized entities and their labels
print(sentence.to_tagged_string('ner'))

2023-04-20 09:23:18,165 SequenceTagger predicts: Dictionary with 67 tags: O, S-location label, B-location label, E-location label, I-location label, S-organization entity, B-organization entity, E-organization entity, I-organization entity, S-date label, B-date label, E-date label, I-date label, S-person entity, B-person entity, E-person entity, I-person entity, S-number label, B-number label, E-number label, I-number label, S-artifact label, B-artifact label, E-artifact label, I-artifact label, S-other labels, B-other labels, E-other labels, I-other labels, S-localization label, B-localization label, E-localization label, I-localization label, S-event entity, B-event entity, E-event entity, I-event entity, S-PSN, B-PSN, E-PSN, I-PSN, S-percent label, B-percent label, E-percent label, I-percent label, S-artifact entity, B-artifact entity, E-artifact entity, I-artifact entity, S-money entity
Sentence[10]: "株式会社XYZは2023年4月20日に設立されました。" → ["株式会社XYZ"/organization entity, "2023年4月"/date lab

- For Spanish, the TARS model performs well on the organization and the location in the example ("Acme" and "Madrid"), but it misses the person name "Juan", and every token without an entity label is tagged `<unk>`.
- For Basque, the labels are identified best: both person names and the organization in the example are recognized.
- For German, the time-related labels give the best results, while organization-related labels are not predicted accurately. This shows in the sentence output, where only "2005" is tagged and "ABC GmbH" and "Berlin" are missed.
- For Japanese, we observed better results for entities such as date and organization, while other labels show lower accuracy, so the model's performance may vary from sentence to sentence.

## Evaluation

From these results we conclude that performance tends to degrade as the number of labels increases. Combining the corpus labels with those already provided in the SequenceTagger only complicated matters, because it interferes with learning. To improve the models, it would be advisable to reduce the number of labels and make them more specific and concrete.

The models cannot be compared directly with each other, because each uses a different label set, even though some labels are shared. We can, however, evaluate each model label by label.

- For the TARS model in Spanish: 


```
                                      precision    recall  f1-score   support

                            <unk>     0.0000    0.0000    0.0000         0
Beginning of an organization name     0.7258    0.9000    0.8036       100
      Inside an organization name     0.7320    0.8765    0.7978        81
     Beginning of a location name     0.8594    0.6471    0.7383        85
       Beginning of a person name     0.8261    0.9268    0.8736        41
      Inside a miscellaneous name     0.6667    0.3182    0.4308        44
             Inside a person name     0.9583    0.9200    0.9388        25
Beginning of a miscellaneous name     0.7273    0.4211    0.5333        19
           Inside a location name     0.8182    0.6000    0.6923        15

                        micro avg     0.0792    0.7512    0.1434       410
                        macro avg     0.7015    0.6233    0.6454       410
                     weighted avg     0.7760    0.7512    0.7475       410

```
In this report the `<unk>` label has no gold instances (support 0); in the usage examples, however, it is assigned to every token that has no NER tag.

Overall, the results are fairly satisfactory, considering that the labels could have been better defined without mixing the SequenceTagger and dataset labels. The micro-averaged F1 is low (0.1434) because micro precision is only 0.0792, which is consistent with the `<unk>` predictions being counted as false positives. The macro average (F1 of 0.6454) and the weighted average (0.7475) are considerably better.

- For the TARS model in German:


```
                precision    recall  f1-score   support

         Tax     0.6389    0.5750    0.6053        80
       Other     0.4091    0.2000    0.2687        45
        Time     0.8947    0.6800    0.7727        25
    Location     0.6000    0.0811    0.1429        37
      Person     0.4667    0.3889    0.4242        18
Organization     0.0000    0.0000    0.0000         1

   micro avg     0.6165    0.3981    0.4838       206
   macro avg     0.5016    0.3208    0.3690       206
weighted avg     0.5946    0.3981    0.4502       206
```

For German, the tags carry less information. Since more descriptive labels are recommended, this could have affected the performance of the system. Some labels are learned well, such as Time (F1 of 0.7727) and Tax (0.6053), while 'Organization', with a single test example, gets no correct predictions at all. The micro and macro averages are poor, with F1-scores of 0.4838 and 0.3690 respectively.
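
The gap between the micro and macro averages can be reproduced from the report itself. The sketch below is an illustration, not part of the original evaluation. The true-positive and prediction counts are reconstructed from the rounded precision, recall and support values in the German table (for example, Tax: 46 correct out of 72 predicted and 80 gold), so they should be read as an assumption.

```python
# (true positives, predicted, gold) per label. The counts are reconstructed
# from the rounded values in the German report above (assumption).
counts = {
    "Tax": (46, 72, 80),
    "Other": (9, 22, 45),
    "Time": (17, 19, 25),
    "Location": (3, 5, 37),
    "Person": (7, 15, 18),
    "Organization": (0, 0, 1),
}


def f1(tp, predicted, gold):
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    return 2 * tp / (predicted + gold) if predicted + gold else 0.0


# Micro average: pool the counts of all labels, then compute a single score.
total_tp = sum(tp for tp, _, _ in counts.values())
total_pred = sum(pred for _, pred, _ in counts.values())
total_gold = sum(gold for _, _, gold in counts.values())
micro_f1 = f1(total_tp, total_pred, total_gold)

# Macro average: compute one score per label, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(round(micro_f1, 4), round(macro_f1, 4))
```

In the macro average the single 'Organization' example weighs as much as the 80 'Tax' examples, which is why it comes out lower than the micro average.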

- For the TARS model in Basque:


```
                      precision    recall  f1-score   support

    location entity     0.5987    0.5968    0.5978       315
      person entity     0.7425    0.6792    0.7094       293
organization entity     0.6231    0.5700    0.5954       293
        other label     0.0000    0.0000    0.0000        30

          micro avg     0.6487    0.5951    0.6207       931
          macro avg     0.4911    0.4615    0.4756       931
       weighted avg     0.6324    0.5951    0.6129       931

```
In this scenario, the overall performance of the model is satisfactory, except for 'other label', which scores zero and might have yielded better results if defined as 'others'. The micro-averaged F1 (0.6207) is higher than the macro-averaged F1 (0.4756), because the macro average gives the zero-scoring 'other label' the same weight as the three large classes. Even without optimal scores, the model performs well on the real usage examples.

- For the TARS model in Japanese:


```
                      precision    recall  f1-score   support

     location label     0.2812    0.0664    0.1075       271
       number label     0.4124    0.6348    0.5000       115
         date label     0.7185    0.6382    0.6760       152
organization entity     0.8438    0.1444    0.2466       187
      person entity     0.2766    0.0922    0.1383       141
     artifact label     0.3333    0.0694    0.1149        72
 localization label     0.0000    0.0000    0.0000        66
       other labels     0.0000    0.0000    0.0000        55
    artifact entity     0.0000    0.0000    0.0000        27
       event entity     0.0000    0.0000    0.0000        25
      percent label     1.0000    0.4615    0.6316        13
                PSN     0.0000    0.0000    0.0000        12
        time entity     0.0000    0.0000    0.0000         7
       money entity     0.0000    0.0000    0.0000         5
         time label     0.0000    0.0000    0.0000         3

          micro avg     0.4990    0.2076    0.2933      1151
          macro avg     0.2577    0.1405    0.1610      1151
       weighted avg     0.4054    0.2076    0.2359      1151
```
In this scenario, the main issue is the large number of labels. Combining the corpus labels with those from the SequenceTagger decreases performance, as the model struggles to differentiate between closely related labels such as 'artifact label' and 'artifact entity'. As a result, the micro and macro F1-scores are notably low (0.2933 and 0.1610).
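
One way to reduce the label set, sketched here as an illustration only, is to merge labels that appear to name the same category before training. The mapping below is an assumption made for this example (for instance, that 'localization label' and 'location label' refer to the same category), and it would need to be checked against the corpus annotation. `merge_labels` is a hypothetical helper.

```python
from collections import Counter

# Support per label, copied from the Japanese report above.
support = {
    "location label": 271, "number label": 115, "date label": 152,
    "organization entity": 187, "person entity": 141, "artifact label": 72,
    "localization label": 66, "other labels": 55, "artifact entity": 27,
    "event entity": 25, "percent label": 13, "PSN": 12,
    "time entity": 7, "money entity": 5, "time label": 3,
}

# Hypothetical mapping of near-duplicate labels to one canonical name.
MERGE = {
    "localization label": "location label",
    "artifact entity": "artifact label",
    "time entity": "time label",
}


def merge_labels(label_support, mapping):
    """Sum the support of labels that map to the same canonical label."""
    merged = Counter()
    for label, n in label_support.items():
        merged[mapping.get(label, label)] += n
    return dict(merged)


merged = merge_labels(support, MERGE)
print(len(support), "->", len(merged))
```

The same mapping would have to be applied to the tags in the training, dev and test files before building the label dictionary.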

Another potential factor contributing to the poor performance could be the use of English embeddings instead of Japanese embeddings during training. This discrepancy in language embeddings may have a negative impact on the model's ability to accurately capture the nuances and characteristics of the Japanese language, further affecting its performance.




# FLAIR models

In this section we train Flair models for the four languages, using the same datasets as in the previous approach.

The training is based on lab 4 from class, extended to the different languages. We use the same setup for all languages.

## For Spanish

In [3]:
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from typing import List

In [4]:
from flair.datasets import CONLL_03_SPANISH
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# get the corpus
corpus = CONLL_03_SPANISH()
print(corpus)

# load a second copy and keep 8% of each split
# (downsample modifies the corpus it is called on, so the original stays intact)
downsampled_corpus = CONLL_03_SPANISH().downsample(0.08)

print("--- 1 Original ---")
print(corpus)

print("--- 2 Downsampled ---")
print(downsampled_corpus)

2023-04-21 09:05:54,957 https://www.clips.uantwerpen.be/conll2002/ner/data/esp.testa not found in cache, downloading to /tmp/tmp13_pn7n9


100%|██████████| 415k/415k [00:00<00:00, 11.5MB/s]

2023-04-21 09:05:55,047 copying /tmp/tmp13_pn7n9 to cache at /root/.flair/datasets/conll_03_spanish/esp.testa
2023-04-21 09:05:55,050 removing temp file /tmp/tmp13_pn7n9
2023-04-21 09:05:55,102 https://www.clips.uantwerpen.be/conll2002/ner/data/esp.testb not found in cache, downloading to /tmp/tmpfm78mqek



100%|██████████| 401k/401k [00:00<00:00, 11.8MB/s]

2023-04-21 09:05:55,193 copying /tmp/tmpfm78mqek to cache at /root/.flair/datasets/conll_03_spanish/esp.testb
2023-04-21 09:05:55,202 removing temp file /tmp/tmpfm78mqek





2023-04-21 09:05:55,252 https://www.clips.uantwerpen.be/conll2002/ner/data/esp.train not found in cache, downloading to /tmp/tmpmgz51m_i


100%|██████████| 2.03M/2.03M [00:00<00:00, 29.9MB/s]

2023-04-21 09:05:55,372 copying /tmp/tmpmgz51m_i to cache at /root/.flair/datasets/conll_03_spanish/esp.train
2023-04-21 09:05:55,376 removing temp file /tmp/tmpmgz51m_i
2023-04-21 09:05:55,380 Reading data from /root/.flair/datasets/conll_03_spanish
2023-04-21 09:05:55,381 Train: /root/.flair/datasets/conll_03_spanish/esp.train
2023-04-21 09:05:55,383 Dev: /root/.flair/datasets/conll_03_spanish/esp.testa
2023-04-21 09:05:55,384 Test: /root/.flair/datasets/conll_03_spanish/esp.testb





Corpus: 8323 train + 1915 dev + 1517 test sentences
2023-04-21 09:06:00,051 Reading data from /root/.flair/datasets/conll_03_spanish
2023-04-21 09:06:00,053 Train: /root/.flair/datasets/conll_03_spanish/esp.train
2023-04-21 09:06:00,057 Dev: /root/.flair/datasets/conll_03_spanish/esp.testa
2023-04-21 09:06:00,059 Test: /root/.flair/datasets/conll_03_spanish/esp.testb
--- 1 Original ---
Corpus: 8323 train + 1915 dev + 1517 test sentences
--- 2 Downsampled ---
Corpus: 666 train + 153 dev + 121 test sentences
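
The downsampled sizes follow from applying the 0.08 proportion to each split. A quick check, assuming that each split size is multiplied by the proportion and rounded to the nearest integer (an assumption about the implementation, although it reproduces the sizes printed above):

```python
original = {"train": 8323, "dev": 1915, "test": 1517}
proportion = 0.08

# Assumed rule: downsampled size = round(original size * proportion)
downsampled = {split: round(n * proportion) for split, n in original.items()}
print(downsampled)
```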


In [5]:
# Just to check what we have in the corpus
print(len(downsampled_corpus.train))
print(len(downsampled_corpus.test))
print(len(downsampled_corpus.dev))
sentence=downsampled_corpus.test[3]
print(sentence)
print(downsampled_corpus)

666
121
153
Sentence[25]: "El paseo irá hasta el complejo de ocio Naturávila , propiedad de la Diputación , y la inversión ascenderá a 450 millones de pesetas ." → ["Naturávila"/LOC, "Diputación"/ORG]
Corpus: 666 train + 153 dev + 121 test sentences


In [6]:
# 2. Select the label we're going to predict
label_type = 'ner'

# 3. make the dictionary
label_dict = downsampled_corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embedding stack with Flair and GloVe
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)


2023-04-20 22:07:44,567 Computing label dictionary. Progress:


666it [00:00, 9043.52it/s]

2023-04-20 22:07:44,702 Dictionary created for label 'ner' with 5 values: ORG (seen 558 times), PER (seen 371 times), LOC (seen 352 times), MISC (seen 196 times)
Dictionary with 5 tags: <unk>, ORG, PER, LOC, MISC





2023-04-20 22:07:45,408 https://flair.informatik.hu-berlin.de/resources/embeddings/token/glove.gensim.vectors.npy not found in cache, downloading to /tmp/tmpw019sb9z


100%|██████████| 153M/153M [00:10<00:00, 15.8MB/s]

2023-04-20 22:07:56,069 copying /tmp/tmpw019sb9z to cache at /root/.flair/embeddings/glove.gensim.vectors.npy





2023-04-20 22:07:56,358 removing temp file /tmp/tmpw019sb9z
2023-04-20 22:07:56,908 https://flair.informatik.hu-berlin.de/resources/embeddings/token/glove.gensim not found in cache, downloading to /tmp/tmpf63emaw_


100%|██████████| 20.5M/20.5M [00:02<00:00, 8.68MB/s]

2023-04-20 22:07:59,904 copying /tmp/tmpf63emaw_ to cache at /root/.flair/embeddings/glove.gensim
2023-04-20 22:07:59,938 removing temp file /tmp/tmpf63emaw_





2023-04-20 22:08:06,206 https://flair.informatik.hu-berlin.de/resources/embeddings/flair/news-forward-0.4.1.pt not found in cache, downloading to /tmp/tmpvkjk8jr_


100%|██████████| 69.7M/69.7M [00:05<00:00, 14.0MB/s]

2023-04-20 22:08:11,964 copying /tmp/tmpvkjk8jr_ to cache at /root/.flair/embeddings/news-forward-0.4.1.pt





2023-04-20 22:08:12,173 removing temp file /tmp/tmpvkjk8jr_
2023-04-20 22:08:26,737 https://flair.informatik.hu-berlin.de/resources/embeddings/flair/news-backward-0.4.1.pt not found in cache, downloading to /tmp/tmpbbqf_0my


100%|██████████| 69.7M/69.7M [00:05<00:00, 12.3MB/s]

2023-04-20 22:08:33,378 copying /tmp/tmpbbqf_0my to cache at /root/.flair/embeddings/news-backward-0.4.1.pt





2023-04-20 22:08:33,505 removing temp file /tmp/tmpbbqf_0my


In [7]:
# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)


2023-04-20 22:08:36,400 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-MISC, B-MISC, E-MISC, I-MISC
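
The 17 tags reported above come from expanding the four entity labels into a BIOES-style scheme (single, begin, end, inside) plus the outside tag `O`. A minimal sketch of that expansion follows; `bioes_tags` is a hypothetical helper written for this illustration, not a Flair function.

```python
def bioes_tags(labels):
    """Expand entity labels into S/B/E/I tags, in the order shown in the log."""
    tags = ["O"]
    for label in labels:
        tags.extend(f"{prefix}-{label}" for prefix in ("S", "B", "E", "I"))
    return tags


tags = bioes_tags(["ORG", "PER", "LOC", "MISC"])
print(len(tags))
print(tags[:5])
```

Adding the two `<START>` and `<STOP>` tags that appear in the other tag dictionaries gives 19, which is consistent with `out_features=19` in the model summary printed when training starts.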


In [8]:
# 6. initialize trainer
trainer = ModelTrainer(tagger, downsampled_corpus)


In [9]:
# 7. start training
trainer.train('/content/drive/My Drive/ColabNotebooks/flairmodels/spanish',  #We use a different path to save our models
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=100,
              write_weights=True)

2023-04-20 22:08:43,307 ----------------------------------------------------------------------------------------------------
2023-04-20 22:08:43,309 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4196, out_features=4196, bias=True)
  (rnn): LSTM(4196, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=19, bias=True)
  (loss_f

100%|██████████| 5/5 [00:02<00:00,  1.68it/s]

2023-04-20 22:09:06,303 Evaluating as a multi-label problem: False
2023-04-20 22:09:06,319 DEV : loss 0.7000213861465454 - f1-score (micro avg)  0.086
2023-04-20 22:09:06,344 BAD EPOCHS (no improvement): 0
2023-04-20 22:09:06,349 saving best model





2023-04-20 22:09:08,128 ----------------------------------------------------------------------------------------------------
2023-04-20 22:09:08,584 epoch 2 - iter 2/21 - loss 0.63551632 - time (sec): 0.45 - samples/sec: 4913.47 - lr: 0.100000
2023-04-20 22:09:09,561 epoch 2 - iter 4/21 - loss 0.62161069 - time (sec): 1.43 - samples/sec: 2832.23 - lr: 0.100000
2023-04-20 22:09:10,928 epoch 2 - iter 6/21 - loss 0.66989551 - time (sec): 2.79 - samples/sec: 2234.84 - lr: 0.100000
2023-04-20 22:09:13,349 epoch 2 - iter 8/21 - loss 0.67048243 - time (sec): 5.21 - samples/sec: 1633.76 - lr: 0.100000
2023-04-20 22:09:16,026 epoch 2 - iter 10/21 - loss 0.67343143 - time (sec): 7.89 - samples/sec: 1327.49 - lr: 0.100000
2023-04-20 22:09:19,754 epoch 2 - iter 12/21 - loss 0.68094405 - time (sec): 11.62 - samples/sec: 1069.85 - lr: 0.100000
2023-04-20 22:09:21,275 epoch 2 - iter 14/21 - loss 0.66797977 - time (sec): 13.14 - samples/sec: 1099.16 - lr: 0.100000
2023-04-20 22:09:22,517 epoch 2 - ite

100%|██████████| 5/5 [00:01<00:00,  2.85it/s]

2023-04-20 22:09:28,169 Evaluating as a multi-label problem: False
2023-04-20 22:09:28,195 DEV : loss 0.5277140140533447 - f1-score (micro avg)  0.2514
2023-04-20 22:09:28,242 BAD EPOCHS (no improvement): 0
2023-04-20 22:09:28,249 saving best model





2023-04-20 22:09:31,361 ----------------------------------------------------------------------------------------------------
2023-04-20 22:09:32,213 epoch 3 - iter 2/21 - loss 0.53525261 - time (sec): 0.85 - samples/sec: 2033.99 - lr: 0.100000
2023-04-20 22:09:34,439 epoch 3 - iter 4/21 - loss 0.53803631 - time (sec): 3.08 - samples/sec: 1247.42 - lr: 0.100000
2023-04-20 22:09:36,753 epoch 3 - iter 6/21 - loss 0.53133382 - time (sec): 5.39 - samples/sec: 1063.35 - lr: 0.100000
2023-04-20 22:09:38,001 epoch 3 - iter 8/21 - loss 0.53356785 - time (sec): 6.64 - samples/sec: 1177.52 - lr: 0.100000
2023-04-20 22:09:40,270 epoch 3 - iter 10/21 - loss 0.52640890 - time (sec): 8.91 - samples/sec: 1115.81 - lr: 0.100000
2023-04-20 22:09:42,459 epoch 3 - iter 12/21 - loss 0.52580140 - time (sec): 11.10 - samples/sec: 1103.71 - lr: 0.100000
2023-04-20 22:09:43,790 epoch 3 - iter 14/21 - loss 0.52409988 - time (sec): 12.43 - samples/sec: 1137.05 - lr: 0.100000
2023-04-20 22:09:45,026 epoch 3 - ite

100%|██████████| 5/5 [00:02<00:00,  1.69it/s]

2023-04-20 22:09:51,694 Evaluating as a multi-label problem: False
2023-04-20 22:09:51,759 DEV : loss 0.4243457615375519 - f1-score (micro avg)  0.2929
2023-04-20 22:09:51,826 BAD EPOCHS (no improvement): 0
2023-04-20 22:09:51,847 saving best model





2023-04-20 22:09:54,297 ----------------------------------------------------------------------------------------------------
2023-04-20 22:09:54,886 epoch 4 - iter 2/21 - loss 0.56874348 - time (sec): 0.55 - samples/sec: 3893.49 - lr: 0.100000
2023-04-20 22:09:56,103 epoch 4 - iter 4/21 - loss 0.56454938 - time (sec): 1.77 - samples/sec: 2321.08 - lr: 0.100000
2023-04-20 22:09:57,328 epoch 4 - iter 6/21 - loss 0.50809296 - time (sec): 3.00 - samples/sec: 2079.99 - lr: 0.100000
2023-04-20 22:09:58,503 epoch 4 - iter 8/21 - loss 0.48621795 - time (sec): 4.17 - samples/sec: 1990.38 - lr: 0.100000
2023-04-20 22:09:59,600 epoch 4 - iter 10/21 - loss 0.46311326 - time (sec): 5.27 - samples/sec: 1991.49 - lr: 0.100000
2023-04-20 22:10:00,602 epoch 4 - iter 12/21 - loss 0.45309929 - time (sec): 6.27 - samples/sec: 1996.44 - lr: 0.100000
2023-04-20 22:10:01,679 epoch 4 - iter 14/21 - loss 0.46368537 - time (sec): 7.35 - samples/sec: 1979.05 - lr: 0.100000
2023-04-20 22:10:03,245 epoch 4 - iter 

100%|██████████| 5/5 [00:01<00:00,  2.58it/s]

2023-04-20 22:10:09,786 Evaluating as a multi-label problem: False
2023-04-20 22:10:09,814 DEV : loss 0.34873801469802856 - f1-score (micro avg)  0.3732
2023-04-20 22:10:09,867 BAD EPOCHS (no improvement): 0
2023-04-20 22:10:09,876 saving best model





2023-04-20 22:10:12,228 ----------------------------------------------------------------------------------------------------
2023-04-20 22:10:12,900 epoch 5 - iter 2/21 - loss 0.35838605 - time (sec): 0.66 - samples/sec: 3203.43 - lr: 0.100000
2023-04-20 22:10:14,295 epoch 5 - iter 4/21 - loss 0.36113589 - time (sec): 2.06 - samples/sec: 2067.19 - lr: 0.100000
2023-04-20 22:10:15,728 epoch 5 - iter 6/21 - loss 0.32304582 - time (sec): 3.49 - samples/sec: 1855.92 - lr: 0.100000
2023-04-20 22:10:16,851 epoch 5 - iter 8/21 - loss 0.35871350 - time (sec): 4.61 - samples/sec: 1867.86 - lr: 0.100000
2023-04-20 22:10:17,806 epoch 5 - iter 10/21 - loss 0.35620331 - time (sec): 5.57 - samples/sec: 1929.08 - lr: 0.100000
2023-04-20 22:10:19,167 epoch 5 - iter 12/21 - loss 0.35448403 - time (sec): 6.93 - samples/sec: 1804.55 - lr: 0.100000
2023-04-20 22:10:21,424 epoch 5 - iter 14/21 - loss 0.35119433 - time (sec): 9.19 - samples/sec: 1574.67 - lr: 0.100000
2023-04-20 22:10:23,751 epoch 5 - iter 

100%|██████████| 5/5 [00:01<00:00,  2.80it/s]

2023-04-20 22:10:30,064 Evaluating as a multi-label problem: False
2023-04-20 22:10:30,087 DEV : loss 0.2888263165950775 - f1-score (micro avg)  0.5082
2023-04-20 22:10:30,128 BAD EPOCHS (no improvement): 0
2023-04-20 22:10:30,135 saving best model





2023-04-20 22:10:32,549 ----------------------------------------------------------------------------------------------------
2023-04-20 22:10:33,119 epoch 6 - iter 2/21 - loss 0.23506782 - time (sec): 0.53 - samples/sec: 3619.65 - lr: 0.100000
2023-04-20 22:10:34,402 epoch 6 - iter 4/21 - loss 0.25790492 - time (sec): 1.82 - samples/sec: 2010.68 - lr: 0.100000
2023-04-20 22:10:35,599 epoch 6 - iter 6/21 - loss 0.31173551 - time (sec): 3.01 - samples/sec: 1921.72 - lr: 0.100000
2023-04-20 22:10:37,153 epoch 6 - iter 8/21 - loss 0.29624440 - time (sec): 4.57 - samples/sec: 1671.39 - lr: 0.100000
2023-04-20 22:10:39,425 epoch 6 - iter 10/21 - loss 0.31189610 - time (sec): 6.84 - samples/sec: 1406.87 - lr: 0.100000
2023-04-20 22:10:42,462 epoch 6 - iter 12/21 - loss 0.31697904 - time (sec): 9.88 - samples/sec: 1206.13 - lr: 0.100000
2023-04-20 22:10:43,858 epoch 6 - iter 14/21 - loss 0.31993734 - time (sec): 11.27 - samples/sec: 1240.47 - lr: 0.100000
2023-04-20 22:10:45,209 epoch 6 - iter

100%|██████████| 5/5 [00:03<00:00,  1.42it/s]

2023-04-20 22:10:52,133 Evaluating as a multi-label problem: False
2023-04-20 22:10:52,153 DEV : loss 0.2604008913040161 - f1-score (micro avg)  0.4869
2023-04-20 22:10:52,192 BAD EPOCHS (no improvement): 1
2023-04-20 22:10:52,199 ----------------------------------------------------------------------------------------------------

2023-04-20 22:10:53,082 epoch 7 - iter 2/21 - loss 0.27215121 - time (sec): 0.88 - samples/sec: 2315.33 - lr: 0.100000
2023-04-20 22:10:55,059 epoch 7 - iter 4/21 - loss 0.27060467 - time (sec): 2.86 - samples/sec: 1541.15 - lr: 0.100000
2023-04-20 22:10:57,230 epoch 7 - iter 6/21 - loss 0.29717280 - time (sec): 5.03 - samples/sec: 1301.59 - lr: 0.100000
2023-04-20 22:10:58,753 epoch 7 - iter 8/21 - loss 0.29517037 - time (sec): 6.55 - samples/sec: 1344.42 - lr: 0.100000
2023-04-20 22:10:59,954 epoch 7 - iter 10/21 - loss 0.29781421 - time (sec): 7.75 - samples/sec: 1417.85 - lr: 0.100000
2023-04-20 22:11:01,142 epoch 7 - iter 12/21 - loss 0.29087798 - time (sec): 8.94 - samples/sec: 1442.99 - lr: 0.100000
2023-04-20 22:11:02,357 epoch 7 - iter 14/21 - loss 0.28681150 - time (sec): 10.16 - samples/sec: 1457.63 - lr: 0.100000
2023-04-20 22:11:03,604 epoch 7 - iter 16/21 - loss 0.28061552 - time (sec): 11.40 - samples/sec: 1476.33 - lr: 0.100000
2023-04-20 22:11:04,592 epoch 7 - iter 18/

100%|██████████| 5/5 [00:01<00:00,  3.95it/s]

2023-04-20 22:11:07,714 Evaluating as a multi-label problem: False
2023-04-20 22:11:07,729 DEV : loss 0.22448617219924927 - f1-score (micro avg)  0.5255
2023-04-20 22:11:07,750 BAD EPOCHS (no improvement): 0
2023-04-20 22:11:07,754 saving best model

2023-04-20 22:11:10,738 ----------------------------------------------------------------------------------------------------
2023-04-20 22:11:11,306 epoch 8 - iter 2/21 - loss 0.27749283 - time (sec): 0.56 - samples/sec: 4053.08 - lr: 0.100000
2023-04-20 22:11:13,299 epoch 8 - iter 4/21 - loss 0.24105734 - time (sec): 2.55 - samples/sec: 1650.30 - lr: 0.100000
2023-04-20 22:11:14,442 epoch 8 - iter 6/21 - loss 0.24680142 - time (sec): 3.69 - samples/sec: 1693.04 - lr: 0.100000
2023-04-20 22:11:15,467 epoch 8 - iter 8/21 - loss 0.25209619 - time (sec): 4.72 - samples/sec: 1803.79 - lr: 0.100000
2023-04-20 22:11:16,493 epoch 8 - iter 10/21 - loss 0.27802188 - time (sec): 5.74 - samples/sec: 1861.54 - lr: 0.100000
2023-04-20 22:11:17,744 epoch 8 - iter 12/21 - loss 0.27732130 - time (sec): 7.00 - samples/sec: 1784.96 - lr: 0.100000
2023-04-20 22:11:19,063 epoch 8 - iter 14/21 - loss 0.27432763 - time (sec): 8.31 - samples/sec: 1751.46 - lr: 0.100000
2023-04-20 22:11:20,215 epoch 8 - iter 

100%|██████████| 5/5 [00:01<00:00,  4.51it/s]

2023-04-20 22:11:23,996 Evaluating as a multi-label problem: False
2023-04-20 22:11:24,017 DEV : loss 0.20366014540195465 - f1-score (micro avg)  0.5626
2023-04-20 22:11:24,054 BAD EPOCHS (no improvement): 0
2023-04-20 22:11:24,062 saving best model

2023-04-20 22:11:26,466 ----------------------------------------------------------------------------------------------------
2023-04-20 22:11:26,973 epoch 9 - iter 2/21 - loss 0.26535466 - time (sec): 0.50 - samples/sec: 3769.05 - lr: 0.100000
2023-04-20 22:11:28,341 epoch 9 - iter 4/21 - loss 0.27067162 - time (sec): 1.87 - samples/sec: 2078.60 - lr: 0.100000
2023-04-20 22:11:29,537 epoch 9 - iter 6/21 - loss 0.26079174 - time (sec): 3.06 - samples/sec: 1949.59 - lr: 0.100000
2023-04-20 22:11:30,606 epoch 9 - iter 8/21 - loss 0.27908544 - time (sec): 4.13 - samples/sec: 1947.02 - lr: 0.100000
2023-04-20 22:11:31,538 epoch 9 - iter 10/21 - loss 0.27516377 - time (sec): 5.06 - samples/sec: 1951.67 - lr: 0.100000
2023-04-20 22:11:32,464 epoch 9 - iter 12/21 - loss 0.26614464 - time (sec): 5.99 - samples/sec: 1970.67 - lr: 0.100000
2023-04-20 22:11:33,412 epoch 9 - iter 14/21 - loss 0.25893854 - time (sec): 6.94 - samples/sec: 1996.08 - lr: 0.100000
2023-04-20 22:11:34,592 epoch 9 - iter 

100%|██████████| 5/5 [00:01<00:00,  4.29it/s]

2023-04-20 22:11:39,518 Evaluating as a multi-label problem: False
2023-04-20 22:11:39,536 DEV : loss 0.19480913877487183 - f1-score (micro avg)  0.5615
2023-04-20 22:11:39,556 BAD EPOCHS (no improvement): 1
2023-04-20 22:11:39,561 ----------------------------------------------------------------------------------------------------

2023-04-20 22:11:40,005 epoch 10 - iter 2/21 - loss 0.24438386 - time (sec): 0.44 - samples/sec: 4274.18 - lr: 0.100000
2023-04-20 22:11:41,171 epoch 10 - iter 4/21 - loss 0.23353264 - time (sec): 1.61 - samples/sec: 2489.81 - lr: 0.100000
2023-04-20 22:11:42,282 epoch 10 - iter 6/21 - loss 0.23003343 - time (sec): 2.72 - samples/sec: 2203.26 - lr: 0.100000
2023-04-20 22:11:43,551 epoch 10 - iter 8/21 - loss 0.21631618 - time (sec): 3.99 - samples/sec: 2047.97 - lr: 0.100000
2023-04-20 22:11:45,023 epoch 10 - iter 10/21 - loss 0.22051206 - time (sec): 5.46 - samples/sec: 1843.85 - lr: 0.100000
2023-04-20 22:11:47,095 epoch 10 - iter 12/21 - loss 0.22499622 - time (sec): 7.53 - samples/sec: 1620.61 - lr: 0.100000
2023-04-20 22:11:48,712 epoch 10 - iter 14/21 - loss 0.22478989 - time (sec): 9.15 - samples/sec: 1565.63 - lr: 0.100000
2023-04-20 22:11:50,236 epoch 10 - iter 16/21 - loss 0.22072493 - time (sec): 10.67 - samples/sec: 1529.26 - lr: 0.100000
2023-04-20 22:11:51,926 epoch 10 - 

100%|██████████| 5/5 [00:02<00:00,  2.30it/s]

2023-04-20 22:11:56,598 Evaluating as a multi-label problem: False
2023-04-20 22:11:56,644 DEV : loss 0.1725565493106842 - f1-score (micro avg)  0.5986
2023-04-20 22:11:56,710 BAD EPOCHS (no improvement): 0
2023-04-20 22:11:56,724 saving best model

2023-04-20 22:12:00,880 ----------------------------------------------------------------------------------------------------
2023-04-20 22:12:01,972 epoch 11 - iter 2/21 - loss 0.18628441 - time (sec): 1.08 - samples/sec: 1991.47 - lr: 0.100000
2023-04-20 22:12:03,490 epoch 11 - iter 4/21 - loss 0.22526657 - time (sec): 2.60 - samples/sec: 1537.97 - lr: 0.100000
2023-04-20 22:12:04,549 epoch 11 - iter 6/21 - loss 0.23220705 - time (sec): 3.66 - samples/sec: 1630.82 - lr: 0.100000
2023-04-20 22:12:05,480 epoch 11 - iter 8/21 - loss 0.23347865 - time (sec): 4.59 - samples/sec: 1711.95 - lr: 0.100000
2023-04-20 22:12:06,561 epoch 11 - iter 10/21 - loss 0.22466759 - time (sec): 5.67 - samples/sec: 1787.74 - lr: 0.100000
2023-04-20 22:12:07,847 epoch 11 - iter 12/21 - loss 0.21733921 - time (sec): 6.95 - samples/sec: 1736.71 - lr: 0.100000
2023-04-20 22:12:09,176 epoch 11 - iter 14/21 - loss 0.20895211 - time (sec): 8.28 - samples/sec: 1709.90 - lr: 0.100000
2023-04-20 22:12:10,378 epoch 11

100%|██████████| 5/5 [00:01<00:00,  2.89it/s]

2023-04-20 22:12:14,852 Evaluating as a multi-label problem: False
2023-04-20 22:12:14,873 DEV : loss 0.15779036283493042 - f1-score (micro avg)  0.651
2023-04-20 22:12:14,908 BAD EPOCHS (no improvement): 0
2023-04-20 22:12:14,916 saving best model

2023-04-20 22:12:17,317 ----------------------------------------------------------------------------------------------------
2023-04-20 22:12:17,903 epoch 12 - iter 2/21 - loss 0.18540044 - time (sec): 0.58 - samples/sec: 3386.50 - lr: 0.100000
2023-04-20 22:12:19,401 epoch 12 - iter 4/21 - loss 0.17329262 - time (sec): 2.08 - samples/sec: 1917.56 - lr: 0.100000
2023-04-20 22:12:20,782 epoch 12 - iter 6/21 - loss 0.16814849 - time (sec): 3.46 - samples/sec: 1718.97 - lr: 0.100000
2023-04-20 22:12:22,001 epoch 12 - iter 8/21 - loss 0.18584710 - time (sec): 4.68 - samples/sec: 1754.90 - lr: 0.100000
2023-04-20 22:12:23,183 epoch 12 - iter 10/21 - loss 0.18763353 - time (sec): 5.86 - samples/sec: 1728.39 - lr: 0.100000
2023-04-20 22:12:24,587 epoch 12 - iter 12/21 - loss 0.19562329 - time (sec): 7.27 - samples/sec: 1692.48 - lr: 0.100000
2023-04-20 22:12:26,269 epoch 12 - iter 14/21 - loss 0.18963646 - time (sec): 8.95 - samples/sec: 1626.39 - lr: 0.100000
2023-04-20 22:12:27,631 epoch 12

100%|██████████| 5/5 [00:02<00:00,  2.08it/s]

2023-04-20 22:12:33,369 Evaluating as a multi-label problem: False
2023-04-20 22:12:33,407 DEV : loss 0.14831893146038055 - f1-score (micro avg)  0.6647
2023-04-20 22:12:33,476 BAD EPOCHS (no improvement): 0
2023-04-20 22:12:33,482 saving best model

2023-04-20 22:12:36,386 ----------------------------------------------------------------------------------------------------
2023-04-20 22:12:36,920 epoch 13 - iter 2/21 - loss 0.14568949 - time (sec): 0.53 - samples/sec: 3694.20 - lr: 0.100000
2023-04-20 22:12:38,393 epoch 13 - iter 4/21 - loss 0.15203702 - time (sec): 2.00 - samples/sec: 2074.80 - lr: 0.100000
2023-04-20 22:12:39,590 epoch 13 - iter 6/21 - loss 0.16761164 - time (sec): 3.20 - samples/sec: 1899.15 - lr: 0.100000
2023-04-20 22:12:40,860 epoch 13 - iter 8/21 - loss 0.17973632 - time (sec): 4.47 - samples/sec: 1798.57 - lr: 0.100000
2023-04-20 22:12:42,099 epoch 13 - iter 10/21 - loss 0.17761071 - time (sec): 5.71 - samples/sec: 1788.21 - lr: 0.100000
2023-04-20 22:12:43,403 epoch 13 - iter 12/21 - loss 0.18084452 - time (sec): 7.01 - samples/sec: 1764.45 - lr: 0.100000
2023-04-20 22:12:45,886 epoch 13 - iter 14/21 - loss 0.18255884 - time (sec): 9.50 - samples/sec: 1506.55 - lr: 0.100000
2023-04-20 22:12:50,032 epoch 13

100%|██████████| 5/5 [00:01<00:00,  2.82it/s]

2023-04-20 22:12:55,926 Evaluating as a multi-label problem: False
2023-04-20 22:12:55,947 DEV : loss 0.15990230441093445 - f1-score (micro avg)  0.6112
2023-04-20 22:12:55,984 BAD EPOCHS (no improvement): 1
2023-04-20 22:12:55,991 ----------------------------------------------------------------------------------------------------

2023-04-20 22:12:56,410 epoch 14 - iter 2/21 - loss 0.16244345 - time (sec): 0.42 - samples/sec: 4555.24 - lr: 0.100000
2023-04-20 22:12:57,380 epoch 14 - iter 4/21 - loss 0.18117512 - time (sec): 1.39 - samples/sec: 2734.08 - lr: 0.100000
2023-04-20 22:12:58,457 epoch 14 - iter 6/21 - loss 0.16701671 - time (sec): 2.47 - samples/sec: 2357.98 - lr: 0.100000
2023-04-20 22:12:59,685 epoch 14 - iter 8/21 - loss 0.16186557 - time (sec): 3.69 - samples/sec: 2162.61 - lr: 0.100000
2023-04-20 22:13:00,884 epoch 14 - iter 10/21 - loss 0.16871068 - time (sec): 4.89 - samples/sec: 2067.42 - lr: 0.100000
2023-04-20 22:13:02,695 epoch 14 - iter 12/21 - loss 0.15874325 - time (sec): 6.70 - samples/sec: 1767.55 - lr: 0.100000
2023-04-20 22:13:04,556 epoch 14 - iter 14/21 - loss 0.16683641 - time (sec): 8.56 - samples/sec: 1611.88 - lr: 0.100000
2023-04-20 22:13:06,673 epoch 14 - iter 16/21 - loss 0.16849088 - time (sec): 10.68 - samples/sec: 1510.00 - lr: 0.100000
2023-04-20 22:13:07,915 epoch 14 - 

100%|██████████| 5/5 [00:01<00:00,  4.44it/s]

2023-04-20 22:13:10,913 Evaluating as a multi-label problem: False
2023-04-20 22:13:10,929 DEV : loss 0.14311571419239044 - f1-score (micro avg)  0.6687
2023-04-20 22:13:10,950 BAD EPOCHS (no improvement): 0
2023-04-20 22:13:10,955 saving best model

2023-04-20 22:13:12,802 ----------------------------------------------------------------------------------------------------
2023-04-20 22:13:13,231 epoch 15 - iter 2/21 - loss 0.13218620 - time (sec): 0.43 - samples/sec: 4160.33 - lr: 0.100000
2023-04-20 22:13:14,199 epoch 15 - iter 4/21 - loss 0.14205006 - time (sec): 1.39 - samples/sec: 2586.77 - lr: 0.100000
2023-04-20 22:13:15,082 epoch 15 - iter 6/21 - loss 0.14303050 - time (sec): 2.28 - samples/sec: 2309.99 - lr: 0.100000
2023-04-20 22:13:16,110 epoch 15 - iter 8/21 - loss 0.15391939 - time (sec): 3.31 - samples/sec: 2282.46 - lr: 0.100000
2023-04-20 22:13:17,046 epoch 15 - iter 10/21 - loss 0.15314735 - time (sec): 4.24 - samples/sec: 2240.80 - lr: 0.100000
2023-04-20 22:13:18,263 epoch 15 - iter 12/21 - loss 0.15570530 - time (sec): 5.46 - samples/sec: 2151.19 - lr: 0.100000
2023-04-20 22:13:19,684 epoch 15 - iter 14/21 - loss 0.14962446 - time (sec): 6.88 - samples/sec: 2037.69 - lr: 0.100000
2023-04-20 22:13:21,351 epoch 15

100%|██████████| 5/5 [00:01<00:00,  3.59it/s]

2023-04-20 22:13:26,352 Evaluating as a multi-label problem: False
2023-04-20 22:13:26,380 DEV : loss 0.13241297006607056 - f1-score (micro avg)  0.6849
2023-04-20 22:13:26,440 BAD EPOCHS (no improvement): 0
2023-04-20 22:13:26,446 saving best model

2023-04-20 22:13:28,790 ----------------------------------------------------------------------------------------------------
2023-04-20 22:13:29,372 epoch 16 - iter 2/21 - loss 0.13371781 - time (sec): 0.58 - samples/sec: 3704.54 - lr: 0.100000
2023-04-20 22:13:30,638 epoch 16 - iter 4/21 - loss 0.14029113 - time (sec): 1.84 - samples/sec: 2262.35 - lr: 0.100000
2023-04-20 22:13:31,872 epoch 16 - iter 6/21 - loss 0.14345219 - time (sec): 3.08 - samples/sec: 1905.35 - lr: 0.100000
2023-04-20 22:13:32,946 epoch 16 - iter 8/21 - loss 0.14027640 - time (sec): 4.15 - samples/sec: 1934.80 - lr: 0.100000
2023-04-20 22:13:34,169 epoch 16 - iter 10/21 - loss 0.14707282 - time (sec): 5.37 - samples/sec: 1856.83 - lr: 0.100000
2023-04-20 22:13:36,384 epoch 16 - iter 12/21 - loss 0.14961221 - time (sec): 7.59 - samples/sec: 1577.89 - lr: 0.100000
2023-04-20 22:13:39,482 epoch 16 - iter 14/21 - loss 0.14786557 - time (sec): 10.69 - samples/sec: 1318.49 - lr: 0.100000
2023-04-20 22:13:41,216 epoch 1

100%|██████████| 5/5 [00:01<00:00,  4.40it/s]

2023-04-20 22:13:46,344 Evaluating as a multi-label problem: False
2023-04-20 22:13:46,363 DEV : loss 0.14533646404743195 - f1-score (micro avg)  0.686
2023-04-20 22:13:46,384 BAD EPOCHS (no improvement): 0
2023-04-20 22:13:46,390 saving best model

2023-04-20 22:13:48,174 ----------------------------------------------------------------------------------------------------
2023-04-20 22:13:48,639 epoch 17 - iter 2/21 - loss 0.11824273 - time (sec): 0.45 - samples/sec: 4680.04 - lr: 0.100000
2023-04-20 22:13:49,627 epoch 17 - iter 4/21 - loss 0.14028755 - time (sec): 1.44 - samples/sec: 2735.76 - lr: 0.100000
2023-04-20 22:13:50,786 epoch 17 - iter 6/21 - loss 0.13956346 - time (sec): 2.60 - samples/sec: 2351.14 - lr: 0.100000
2023-04-20 22:13:51,995 epoch 17 - iter 8/21 - loss 0.12930011 - time (sec): 3.80 - samples/sec: 2174.82 - lr: 0.100000
2023-04-20 22:13:53,105 epoch 17 - iter 10/21 - loss 0.13704418 - time (sec): 4.91 - samples/sec: 2082.95 - lr: 0.100000
2023-04-20 22:13:54,637 epoch 17 - iter 12/21 - loss 0.13317478 - time (sec): 6.45 - samples/sec: 1912.43 - lr: 0.100000
2023-04-20 22:13:56,116 epoch 17 - iter 14/21 - loss 0.13729094 - time (sec): 7.93 - samples/sec: 1752.71 - lr: 0.100000
2023-04-20 22:13:57,653 epoch 17

100%|██████████| 5/5 [00:01<00:00,  4.61it/s]

2023-04-20 22:14:01,539 Evaluating as a multi-label problem: False
2023-04-20 22:14:01,555 DEV : loss 0.1279839724302292 - f1-score (micro avg)  0.7044
2023-04-20 22:14:01,588 BAD EPOCHS (no improvement): 0
2023-04-20 22:14:01,593 saving best model

2023-04-20 22:14:03,413 ----------------------------------------------------------------------------------------------------
2023-04-20 22:14:03,857 epoch 18 - iter 2/21 - loss 0.17143134 - time (sec): 0.44 - samples/sec: 4361.19 - lr: 0.100000
2023-04-20 22:14:05,148 epoch 18 - iter 4/21 - loss 0.15308212 - time (sec): 1.73 - samples/sec: 2440.05 - lr: 0.100000
2023-04-20 22:14:06,942 epoch 18 - iter 6/21 - loss 0.13666387 - time (sec): 3.53 - samples/sec: 1847.15 - lr: 0.100000
2023-04-20 22:14:08,711 epoch 18 - iter 8/21 - loss 0.13490178 - time (sec): 5.30 - samples/sec: 1636.08 - lr: 0.100000
2023-04-20 22:14:11,533 epoch 18 - iter 10/21 - loss 0.13534604 - time (sec): 8.12 - samples/sec: 1328.67 - lr: 0.100000
2023-04-20 22:14:13,571 epoch 18 - iter 12/21 - loss 0.13569182 - time (sec): 10.16 - samples/sec: 1213.35 - lr: 0.100000
2023-04-20 22:14:14,700 epoch 18 - iter 14/21 - loss 0.13341407 - time (sec): 11.28 - samples/sec: 1267.36 - lr: 0.100000
2023-04-20 22:14:15,912 epoch 

100%|██████████| 5/5 [00:01<00:00,  4.45it/s]

2023-04-20 22:14:20,616 Evaluating as a multi-label problem: False
2023-04-20 22:14:20,631 DEV : loss 0.13269278407096863 - f1-score (micro avg)  0.6597
2023-04-20 22:14:20,651 BAD EPOCHS (no improvement): 1
2023-04-20 22:14:20,656 ----------------------------------------------------------------------------------------------------

2023-04-20 22:14:22,218 epoch 19 - iter 2/21 - loss 0.14582089 - time (sec): 1.56 - samples/sec: 1173.19 - lr: 0.100000
2023-04-20 22:14:24,216 epoch 19 - iter 4/21 - loss 0.12457250 - time (sec): 3.56 - samples/sec: 1131.19 - lr: 0.100000
2023-04-20 22:14:26,256 epoch 19 - iter 6/21 - loss 0.13743017 - time (sec): 5.60 - samples/sec: 1007.24 - lr: 0.100000
2023-04-20 22:14:27,990 epoch 19 - iter 8/21 - loss 0.12739666 - time (sec): 7.33 - samples/sec: 1047.65 - lr: 0.100000
2023-04-20 22:14:29,628 epoch 19 - iter 10/21 - loss 0.12513451 - time (sec): 8.97 - samples/sec: 1101.43 - lr: 0.100000
2023-04-20 22:14:30,856 epoch 19 - iter 12/21 - loss 0.13163133 - time (sec): 10.20 - samples/sec: 1190.56 - lr: 0.100000
2023-04-20 22:14:32,109 epoch 19 - iter 14/21 - loss 0.12806181 - time (sec): 11.45 - samples/sec: 1247.56 - lr: 0.100000
2023-04-20 22:14:33,372 epoch 19 - iter 16/21 - loss 0.12747023 - time (sec): 12.71 - samples/sec: 1284.35 - lr: 0.100000
2023-04-20 22:14:34,551 epoch 19 

100%|██████████| 5/5 [00:01<00:00,  2.97it/s]

2023-04-20 22:14:38,348 Evaluating as a multi-label problem: False
2023-04-20 22:14:38,369 DEV : loss 0.1227133497595787 - f1-score (micro avg)  0.7014
2023-04-20 22:14:38,405 BAD EPOCHS (no improvement): 2
2023-04-20 22:14:38,409 ----------------------------------------------------------------------------------------------------

2023-04-20 22:14:39,288 epoch 20 - iter 2/21 - loss 0.12960209 - time (sec): 0.88 - samples/sec: 2520.50 - lr: 0.100000
2023-04-20 22:14:41,639 epoch 20 - iter 4/21 - loss 0.12059309 - time (sec): 3.23 - samples/sec: 1355.88 - lr: 0.100000
2023-04-20 22:14:43,696 epoch 20 - iter 6/21 - loss 0.12239803 - time (sec): 5.28 - samples/sec: 1161.82 - lr: 0.100000
2023-04-20 22:14:45,839 epoch 20 - iter 8/21 - loss 0.11718156 - time (sec): 7.43 - samples/sec: 1074.28 - lr: 0.100000
2023-04-20 22:14:47,086 epoch 20 - iter 10/21 - loss 0.11409841 - time (sec): 8.67 - samples/sec: 1184.06 - lr: 0.100000
2023-04-20 22:14:48,283 epoch 20 - iter 12/21 - loss 0.11206072 - time (sec): 9.87 - samples/sec: 1258.80 - lr: 0.100000
2023-04-20 22:14:49,560 epoch 20 - iter 14/21 - loss 0.11769439 - time (sec): 11.15 - samples/sec: 1298.90 - lr: 0.100000
2023-04-20 22:14:50,742 epoch 20 - iter 16/21 - loss 0.12198349 - time (sec): 12.33 - samples/sec: 1328.68 - lr: 0.100000
2023-04-20 22:14:51,932 epoch 20 -

100%|██████████| 5/5 [00:01<00:00,  4.58it/s]

2023-04-20 22:14:55,000 Evaluating as a multi-label problem: False
2023-04-20 22:14:55,023 DEV : loss 0.12375012040138245 - f1-score (micro avg)  0.7189
2023-04-20 22:14:55,046 BAD EPOCHS (no improvement): 0
2023-04-20 22:14:55,052 saving best model

2023-04-20 22:14:57,007 ----------------------------------------------------------------------------------------------------
2023-04-20 22:14:57,559 epoch 21 - iter 2/21 - loss 0.11514432 - time (sec): 0.54 - samples/sec: 3723.57 - lr: 0.100000
2023-04-20 22:14:58,931 epoch 21 - iter 4/21 - loss 0.12720074 - time (sec): 1.91 - samples/sec: 2059.20 - lr: 0.100000
2023-04-20 22:15:00,151 epoch 21 - iter 6/21 - loss 0.12222515 - time (sec): 3.13 - samples/sec: 1900.17 - lr: 0.100000
2023-04-20 22:15:01,401 epoch 21 - iter 8/21 - loss 0.12081029 - time (sec): 4.38 - samples/sec: 1799.13 - lr: 0.100000
2023-04-20 22:15:02,421 epoch 21 - iter 10/21 - loss 0.12387634 - time (sec): 5.40 - samples/sec: 1824.72 - lr: 0.100000
2023-04-20 22:15:03,417 epoch 21 - iter 12/21 - loss 0.12190148 - time (sec): 6.40 - samples/sec: 1871.05 - lr: 0.100000
2023-04-20 22:15:04,594 epoch 21 - iter 14/21 - loss 0.12034492 - time (sec): 7.58 - samples/sec: 1881.30 - lr: 0.100000
2023-04-20 22:15:05,824 epoch 21

100%|██████████| 5/5 [00:01<00:00,  4.28it/s]

2023-04-20 22:15:10,361 Evaluating as a multi-label problem: False
2023-04-20 22:15:10,376 DEV : loss 0.11572648584842682 - f1-score (micro avg)  0.7002
2023-04-20 22:15:10,396 BAD EPOCHS (no improvement): 1
2023-04-20 22:15:10,402 ----------------------------------------------------------------------------------------------------

2023-04-20 22:15:10,848 epoch 22 - iter 2/21 - loss 0.09391689 - time (sec): 0.44 - samples/sec: 4355.47 - lr: 0.100000
2023-04-20 22:15:11,987 epoch 22 - iter 4/21 - loss 0.09261811 - time (sec): 1.58 - samples/sec: 2697.80 - lr: 0.100000
2023-04-20 22:15:13,258 epoch 22 - iter 6/21 - loss 0.09929004 - time (sec): 2.85 - samples/sec: 2269.43 - lr: 0.100000
2023-04-20 22:15:14,502 epoch 22 - iter 8/21 - loss 0.10421013 - time (sec): 4.10 - samples/sec: 2118.40 - lr: 0.100000
2023-04-20 22:15:15,666 epoch 22 - iter 10/21 - loss 0.10395436 - time (sec): 5.26 - samples/sec: 2015.39 - lr: 0.100000
2023-04-20 22:15:16,791 epoch 22 - iter 12/21 - loss 0.10703214 - time (sec): 6.39 - samples/sec: 1928.78 - lr: 0.100000
2023-04-20 22:15:17,773 epoch 22 - iter 14/21 - loss 0.10313771 - time (sec): 7.37 - samples/sec: 1924.36 - lr: 0.100000
2023-04-20 22:15:18,745 epoch 22 - iter 16/21 - loss 0.10055465 - time (sec): 8.34 - samples/sec: 1976.20 - lr: 0.100000
2023-04-20 22:15:19,719 epoch 22 - i

100%|██████████| 5/5 [00:01<00:00,  4.63it/s]

2023-04-20 22:15:22,515 Evaluating as a multi-label problem: False
2023-04-20 22:15:22,534 DEV : loss 0.12195662409067154 - f1-score (micro avg)  0.711
2023-04-20 22:15:22,556 BAD EPOCHS (no improvement): 2
2023-04-20 22:15:22,561 ----------------------------------------------------------------------------------------------------

2023-04-20 22:15:22,970 epoch 23 - iter 2/21 - loss 0.12906250 - time (sec): 0.41 - samples/sec: 5274.81 - lr: 0.100000
2023-04-20 22:15:23,956 epoch 23 - iter 4/21 - loss 0.10964015 - time (sec): 1.39 - samples/sec: 2860.69 - lr: 0.100000
2023-04-20 22:15:24,941 epoch 23 - iter 6/21 - loss 0.11963936 - time (sec): 2.38 - samples/sec: 2592.30 - lr: 0.100000
2023-04-20 22:15:25,888 epoch 23 - iter 8/21 - loss 0.11265344 - time (sec): 3.32 - samples/sec: 2528.14 - lr: 0.100000
2023-04-20 22:15:26,947 epoch 23 - iter 10/21 - loss 0.11286999 - time (sec): 4.38 - samples/sec: 2409.03 - lr: 0.100000
2023-04-20 22:15:28,048 epoch 23 - iter 12/21 - loss 0.10883096 - time (sec): 5.49 - samples/sec: 2263.66 - lr: 0.100000
2023-04-20 22:15:29,594 epoch 23 - iter 14/21 - loss 0.10872479 - time (sec): 7.03 - samples/sec: 2038.24 - lr: 0.100000
2023-04-20 22:15:32,692 epoch 23 - iter 16/21 - loss 0.10567795 - time (sec): 10.13 - samples/sec: 1623.78 - lr: 0.100000
2023-04-20 22:15:34,114 epoch 23 - 

100%|██████████| 5/5 [00:01<00:00,  2.84it/s]

2023-04-20 22:15:38,025 Evaluating as a multi-label problem: False
2023-04-20 22:15:38,046 DEV : loss 0.12460789084434509 - f1-score (micro avg)  0.6972
2023-04-20 22:15:38,094 BAD EPOCHS (no improvement): 3
2023-04-20 22:15:38,100 ----------------------------------------------------------------------------------------------------

2023-04-20 22:15:38,715 epoch 24 - iter 2/21 - loss 0.10098327 - time (sec): 0.61 - samples/sec: 3470.76 - lr: 0.100000
2023-04-20 22:15:40,031 epoch 24 - iter 4/21 - loss 0.11748329 - time (sec): 1.93 - samples/sec: 2350.69 - lr: 0.100000
2023-04-20 22:15:41,158 epoch 24 - iter 6/21 - loss 0.10712731 - time (sec): 3.06 - samples/sec: 2078.82 - lr: 0.100000
2023-04-20 22:15:42,589 epoch 24 - iter 8/21 - loss 0.10338814 - time (sec): 4.49 - samples/sec: 1872.76 - lr: 0.100000
2023-04-20 22:15:44,250 epoch 24 - iter 10/21 - loss 0.10727054 - time (sec): 6.15 - samples/sec: 1701.59 - lr: 0.100000
2023-04-20 22:15:46,158 epoch 24 - iter 12/21 - loss 0.10530373 - time (sec): 8.06 - samples/sec: 1575.76 - lr: 0.100000
2023-04-20 22:15:47,540 epoch 24 - iter 14/21 - loss 0.10148999 - time (sec): 9.44 - samples/sec: 1558.46 - lr: 0.100000
2023-04-20 22:15:49,038 epoch 24 - iter 16/21 - loss 0.10265208 - time (sec): 10.94 - samples/sec: 1508.52 - lr: 0.100000
2023-04-20 22:15:50,549 epoch 24 - 

100%|██████████| 5/5 [00:01<00:00,  2.82it/s]

2023-04-20 22:15:54,512 Evaluating as a multi-label problem: False
2023-04-20 22:15:54,538 DEV : loss 0.11488623172044754 - f1-score (micro avg)  0.7188
2023-04-20 22:15:54,577 Epoch    24: reducing learning rate of group 0 to 5.0000e-02.
2023-04-20 22:15:54,582 BAD EPOCHS (no improvement): 4
2023-04-20 22:15:54,588 ----------------------------------------------------------------------------------------------------

2023-04-20 22:15:55,228 epoch 25 - iter 2/21 - loss 0.08463477 - time (sec): 0.64 - samples/sec: 3182.62 - lr: 0.050000
2023-04-20 22:15:56,411 epoch 25 - iter 4/21 - loss 0.09684344 - time (sec): 1.82 - samples/sec: 2257.38 - lr: 0.050000
2023-04-20 22:15:57,633 epoch 25 - iter 6/21 - loss 0.09798807 - time (sec): 3.04 - samples/sec: 2072.81 - lr: 0.050000
2023-04-20 22:15:58,921 epoch 25 - iter 8/21 - loss 0.09323655 - time (sec): 4.33 - samples/sec: 1941.11 - lr: 0.050000
2023-04-20 22:16:00,090 epoch 25 - iter 10/21 - loss 0.09467274 - time (sec): 5.50 - samples/sec: 1914.60 - lr: 0.050000
2023-04-20 22:16:02,235 epoch 25 - iter 12/21 - loss 0.09445406 - time (sec): 7.64 - samples/sec: 1655.85 - lr: 0.050000
2023-04-20 22:16:04,381 epoch 25 - iter 14/21 - loss 0.09628808 - time (sec): 9.79 - samples/sec: 1480.95 - lr: 0.050000
2023-04-20 22:16:06,405 epoch 25 - iter 16/21 - loss 0.09928113 - time (sec): 11.81 - samples/sec: 1386.61 - lr: 0.050000
2023-04-20 22:16:07,633 epoch 25 - 

100%|██████████| 5/5 [00:01<00:00,  4.39it/s]

2023-04-20 22:16:10,848 Evaluating as a multi-label problem: False
2023-04-20 22:16:10,864 DEV : loss 0.115970678627491 - f1-score (micro avg)  0.7298
2023-04-20 22:16:10,884 BAD EPOCHS (no improvement): 0
2023-04-20 22:16:10,889 saving best model

2023-04-20 22:16:12,745 ----------------------------------------------------------------------------------------------------
2023-04-20 22:16:13,182 epoch 26 - iter 2/21 - loss 0.09304214 - time (sec): 0.41 - samples/sec: 5096.36 - lr: 0.050000
2023-04-20 22:16:14,170 epoch 26 - iter 4/21 - loss 0.08660290 - time (sec): 1.40 - samples/sec: 3029.17 - lr: 0.050000
2023-04-20 22:16:15,210 epoch 26 - iter 6/21 - loss 0.08813404 - time (sec): 2.44 - samples/sec: 2629.84 - lr: 0.050000
2023-04-20 22:16:16,200 epoch 26 - iter 8/21 - loss 0.08609667 - time (sec): 3.43 - samples/sec: 2455.94 - lr: 0.050000
2023-04-20 22:16:17,450 epoch 26 - iter 10/21 - loss 0.08955025 - time (sec): 4.68 - samples/sec: 2298.05 - lr: 0.050000
2023-04-20 22:16:18,601 epoch 26 - iter 12/21 - loss 0.09179067 - time (sec): 5.83 - samples/sec: 2159.78 - lr: 0.050000
2023-04-20 22:16:19,998 epoch 26 - iter 14/21 - loss 0.09106749 - time (sec): 7.23 - samples/sec: 2014.38 - lr: 0.050000
2023-04-20 22:16:21,521 epoch 26

100%|██████████| 5/5 [00:01<00:00,  4.23it/s]

2023-04-20 22:16:26,038 Evaluating as a multi-label problem: False
2023-04-20 22:16:26,060 DEV : loss 0.11369562894105911 - f1-score (micro avg)  0.7336
2023-04-20 22:16:26,085 BAD EPOCHS (no improvement): 0
2023-04-20 22:16:26,093 saving best model

2023-04-20 22:16:27,968 ----------------------------------------------------------------------------------------------------
2023-04-20 22:16:28,424 epoch 27 - iter 2/21 - loss 0.07719821 - time (sec): 0.45 - samples/sec: 4883.37 - lr: 0.050000
2023-04-20 22:16:29,407 epoch 27 - iter 4/21 - loss 0.08095547 - time (sec): 1.43 - samples/sec: 3023.41 - lr: 0.050000
2023-04-20 22:16:30,329 epoch 27 - iter 6/21 - loss 0.08385074 - time (sec): 2.35 - samples/sec: 2594.84 - lr: 0.050000
2023-04-20 22:16:31,316 epoch 27 - iter 8/21 - loss 0.08640794 - time (sec): 3.34 - samples/sec: 2508.32 - lr: 0.050000
2023-04-20 22:16:32,298 epoch 27 - iter 10/21 - loss 0.08541401 - time (sec): 4.32 - samples/sec: 2385.13 - lr: 0.050000
2023-04-20 22:16:33,370 epoch 27 - iter 12/21 - loss 0.08522388 - time (sec): 5.39 - samples/sec: 2248.91 - lr: 0.050000
2023-04-20 22:16:34,748 epoch 27 - iter 14/21 - loss 0.08361930 - time (sec): 6.77 - samples/sec: 2121.34 - lr: 0.050000
2023-04-20 22:16:36,477 epoch 27

100%|██████████| 5/5 [00:01<00:00,  4.50it/s]

2023-04-20 22:16:41,189 Evaluating as a multi-label problem: False
2023-04-20 22:16:41,211 DEV : loss 0.11194215714931488 - f1-score (micro avg)  0.7306
2023-04-20 22:16:41,238 BAD EPOCHS (no improvement): 1
2023-04-20 22:16:41,243 ----------------------------------------------------------------------------------------------------

2023-04-20 22:16:41,754 epoch 28 - iter 2/21 - loss 0.07141683 - time (sec): 0.51 - samples/sec: 4246.49 - lr: 0.050000
2023-04-20 22:16:42,986 epoch 28 - iter 4/21 - loss 0.08498876 - time (sec): 1.74 - samples/sec: 2378.96 - lr: 0.050000
2023-04-20 22:16:44,234 epoch 28 - iter 6/21 - loss 0.08476680 - time (sec): 2.99 - samples/sec: 2058.12 - lr: 0.050000
2023-04-20 22:16:45,441 epoch 28 - iter 8/21 - loss 0.08068043 - time (sec): 4.20 - samples/sec: 1916.41 - lr: 0.050000
2023-04-20 22:16:46,743 epoch 28 - iter 10/21 - loss 0.08047004 - time (sec): 5.50 - samples/sec: 1878.00 - lr: 0.050000
2023-04-20 22:16:47,930 epoch 28 - iter 12/21 - loss 0.08046762 - time (sec): 6.69 - samples/sec: 1863.18 - lr: 0.050000
2023-04-20 22:16:49,264 epoch 28 - iter 14/21 - loss 0.07772115 - time (sec): 8.02 - samples/sec: 1808.54 - lr: 0.050000
2023-04-20 22:16:50,395 epoch 28 - iter 16/21 - loss 0.08116717 - time (sec): 9.15 - samples/sec: 1797.04 - lr: 0.050000
2023-04-20 22:16:51,521 epoch 28 - i

100%|██████████| 5/5 [00:01<00:00,  4.10it/s]

2023-04-20 22:16:54,905 Evaluating as a multi-label problem: False
2023-04-20 22:16:54,920 DEV : loss 0.11262521892786026 - f1-score (micro avg)  0.7267
2023-04-20 22:16:54,941 BAD EPOCHS (no improvement): 2
2023-04-20 22:16:54,946 ----------------------------------------------------------------------------------------------------

2023-04-20 22:16:55,377 epoch 29 - iter 2/21 - loss 0.08195539 - time (sec): 0.43 - samples/sec: 4250.69 - lr: 0.050000
2023-04-20 22:16:56,348 epoch 29 - iter 4/21 - loss 0.07772783 - time (sec): 1.40 - samples/sec: 2812.05 - lr: 0.050000
2023-04-20 22:16:57,309 epoch 29 - iter 6/21 - loss 0.07904228 - time (sec): 2.36 - samples/sec: 2546.46 - lr: 0.050000
2023-04-20 22:16:58,261 epoch 29 - iter 8/21 - loss 0.08297072 - time (sec): 3.31 - samples/sec: 2423.21 - lr: 0.050000
2023-04-20 22:16:59,250 epoch 29 - iter 10/21 - loss 0.07871473 - time (sec): 4.30 - samples/sec: 2403.86 - lr: 0.050000
2023-04-20 22:17:00,294 epoch 29 - iter 12/21 - loss 0.08305243 - time (sec): 5.35 - samples/sec: 2365.92 - lr: 0.050000
2023-04-20 22:17:01,255 epoch 29 - iter 14/21 - loss 0.08383838 - time (sec): 6.31 - samples/sec: 2302.36 - lr: 0.050000
2023-04-20 22:17:02,175 epoch 29 - iter 16/21 - loss 0.08483013 - time (sec): 7.23 - samples/sec: 2269.30 - lr: 0.050000
2023-04-20 22:17:03,085 epoch 29 - i

100%|██████████| 5/5 [00:01<00:00,  2.89it/s]

2023-04-20 22:17:06,691 Evaluating as a multi-label problem: False
2023-04-20 22:17:06,712 DEV : loss 0.1163739264011383 - f1-score (micro avg)  0.7201
2023-04-20 22:17:06,753 BAD EPOCHS (no improvement): 3
2023-04-20 22:17:06,758 ----------------------------------------------------------------------------------------------------

2023-04-20 22:17:07,362 epoch 30 - iter 2/21 - loss 0.09327351 - time (sec): 0.60 - samples/sec: 3657.79 - lr: 0.050000
2023-04-20 22:17:08,599 epoch 30 - iter 4/21 - loss 0.09105005 - time (sec): 1.84 - samples/sec: 2399.52 - lr: 0.050000
2023-04-20 22:17:09,770 epoch 30 - iter 6/21 - loss 0.08616777 - time (sec): 3.01 - samples/sec: 2223.36 - lr: 0.050000
2023-04-20 22:17:10,739 epoch 30 - iter 8/21 - loss 0.08055712 - time (sec): 3.98 - samples/sec: 2141.44 - lr: 0.050000
2023-04-20 22:17:11,710 epoch 30 - iter 10/21 - loss 0.08267365 - time (sec): 4.95 - samples/sec: 2117.29 - lr: 0.050000
2023-04-20 22:17:12,651 epoch 30 - iter 12/21 - loss 0.08246549 - time (sec): 5.89 - samples/sec: 2110.91 - lr: 0.050000
2023-04-20 22:17:13,677 epoch 30 - iter 14/21 - loss 0.08349794 - time (sec): 6.92 - samples/sec: 2099.23 - lr: 0.050000
2023-04-20 22:17:14,606 epoch 30 - iter 16/21 - loss 0.08369352 - time (sec): 7.84 - samples/sec: 2089.08 - lr: 0.050000
2023-04-20 22:17:15,573 epoch 30 - i

100%|██████████| 5/5 [00:01<00:00,  3.68it/s]

2023-04-20 22:17:18,647 Evaluating as a multi-label problem: False
2023-04-20 22:17:18,663 DEV : loss 0.11568669229745865 - f1-score (micro avg)  0.7267
2023-04-20 22:17:18,686 Epoch    30: reducing learning rate of group 0 to 2.5000e-02.
2023-04-20 22:17:18,688 BAD EPOCHS (no improvement): 4
2023-04-20 22:17:18,711 ----------------------------------------------------------------------------------------------------
2023-04-20 22:17:19,390 epoch 31 - iter 2/21 - loss 0.08392984 - time (sec): 0.68 - samples/sec: 3123.93 - lr: 0.025000
2023-04-20 22:17:20,794 epoch 31 - iter 4/21 - loss 0.07849992 - time (sec): 2.08 - samples/sec: 2008.21 - lr: 0.025000
2023-04-20 22:17:22,007 epoch 31 - iter 6/21 - loss 0.08587348 - time (sec): 3.29 - samples/sec: 1959.10 - lr: 0.025000
2023-04-20 22:17:23,136 epoch 31 - iter 8/21 - loss 0.08795978 - time (sec): 4.42 - samples/sec: 1870.98 - lr: 0.025000
2023-04-20 22:17:25,560 epoch 31 - iter 10/21 - loss 0.08572002 - time (sec): 6.85 - samples/sec: 1520.94 - lr: 0.025000
2023-04-20 22:17:26,580 epoch 31 - iter 12/21 - loss 0.08430216 - time (sec): 7.87 - samples/sec: 1593.76 - lr: 0.025000
2023-04-20 22:17:27,519 epoch 31 - iter 14/21 - loss 0.08388822 - time (sec): 8.81 - samples/sec: 1621.65 - lr: 0.025000
2023-04-20 22:17:28,514 epoch 31 - iter 16/21 - loss 0.08447136 - time (sec): 9.80 - samples/sec: 1688.21 - lr: 0.025000
2023-04-20 22:17:29,463 epoch 31 - i

100%|██████████| 5/5 [00:01<00:00,  4.70it/s]

2023-04-20 22:17:32,206 Evaluating as a multi-label problem: False
2023-04-20 22:17:32,222 DEV : loss 0.10903520882129669 - f1-score (micro avg)  0.7409
2023-04-20 22:17:32,243 BAD EPOCHS (no improvement): 0
2023-04-20 22:17:32,248 saving best model
2023-04-20 22:17:34,010 ----------------------------------------------------------------------------------------------------
2023-04-20 22:17:34,447 epoch 32 - iter 2/21 - loss 0.08334329 - time (sec): 0.42 - samples/sec: 4419.45 - lr: 0.025000
2023-04-20 22:17:35,693 epoch 32 - iter 4/21 - loss 0.08445656 - time (sec): 1.67 - samples/sec: 2255.80 - lr: 0.025000
2023-04-20 22:17:36,828 epoch 32 - iter 6/21 - loss 0.08288036 - time (sec): 2.80 - samples/sec: 2039.51 - lr: 0.025000
2023-04-20 22:17:37,976 epoch 32 - iter 8/21 - loss 0.08240867 - time (sec): 3.95 - samples/sec: 1943.46 - lr: 0.025000
2023-04-20 22:17:39,205 epoch 32 - iter 10/21 - loss 0.07861794 - time (sec): 5.18 - samples/sec: 1879.90 - lr: 0.025000
2023-04-20 22:17:40,433 epoch 32 - iter 12/21 - loss 0.07517071 - time (sec): 6.41 - samples/sec: 1846.00 - lr: 0.025000
2023-04-20 22:17:41,447 epoch 32 - iter 14/21 - loss 0.07677720 - time (sec): 7.42 - samples/sec: 1879.77 - lr: 0.025000
2023-04-20 22:17:42,670 epoch 32

100%|██████████| 5/5 [00:01<00:00,  3.98it/s]

2023-04-20 22:17:47,304 Evaluating as a multi-label problem: False
2023-04-20 22:17:47,320 DEV : loss 0.104422926902771 - f1-score (micro avg)  0.7295
2023-04-20 22:17:47,340 BAD EPOCHS (no improvement): 1
2023-04-20 22:17:47,346 ----------------------------------------------------------------------------------------------------
2023-04-20 22:17:47,756 epoch 33 - iter 2/21 - loss 0.06720060 - time (sec): 0.41 - samples/sec: 4994.77 - lr: 0.025000
2023-04-20 22:17:48,850 epoch 33 - iter 4/21 - loss 0.06404106 - time (sec): 1.50 - samples/sec: 2903.09 - lr: 0.025000
2023-04-20 22:17:49,820 epoch 33 - iter 6/21 - loss 0.06467626 - time (sec): 2.47 - samples/sec: 2618.05 - lr: 0.025000
2023-04-20 22:17:50,939 epoch 33 - iter 8/21 - loss 0.06710625 - time (sec): 3.59 - samples/sec: 2352.88 - lr: 0.025000
2023-04-20 22:17:52,093 epoch 33 - iter 10/21 - loss 0.06956877 - time (sec): 4.75 - samples/sec: 2161.00 - lr: 0.025000
2023-04-20 22:17:53,325 epoch 33 - iter 12/21 - loss 0.07101702 - time (sec): 5.98 - samples/sec: 2067.16 - lr: 0.025000
2023-04-20 22:17:54,507 epoch 33 - iter 14/21 - loss 0.07134342 - time (sec): 7.16 - samples/sec: 1977.20 - lr: 0.025000
2023-04-20 22:17:55,703 epoch 33 - iter 16/21 - loss 0.07429884 - time (sec): 8.36 - samples/sec: 1918.96 - lr: 0.025000
2023-04-20 22:17:56,699 epoch 33 - i

100%|██████████| 5/5 [00:01<00:00,  4.64it/s]

2023-04-20 22:17:59,475 Evaluating as a multi-label problem: False
2023-04-20 22:17:59,490 DEV : loss 0.11024170368909836 - f1-score (micro avg)  0.7234
2023-04-20 22:17:59,513 BAD EPOCHS (no improvement): 2
2023-04-20 22:17:59,519 ----------------------------------------------------------------------------------------------------
2023-04-20 22:17:59,902 epoch 34 - iter 2/21 - loss 0.06790766 - time (sec): 0.38 - samples/sec: 5929.51 - lr: 0.025000
2023-04-20 22:18:00,874 epoch 34 - iter 4/21 - loss 0.07265424 - time (sec): 1.35 - samples/sec: 3014.10 - lr: 0.025000
2023-04-20 22:18:01,791 epoch 34 - iter 6/21 - loss 0.06932901 - time (sec): 2.27 - samples/sec: 2670.34 - lr: 0.025000
2023-04-20 22:18:02,832 epoch 34 - iter 8/21 - loss 0.06761708 - time (sec): 3.31 - samples/sec: 2518.46 - lr: 0.025000
2023-04-20 22:18:03,752 epoch 34 - iter 10/21 - loss 0.07047236 - time (sec): 4.23 - samples/sec: 2402.81 - lr: 0.025000
2023-04-20 22:18:04,683 epoch 34 - iter 12/21 - loss 0.07218609 - time (sec): 5.16 - samples/sec: 2397.98 - lr: 0.025000
2023-04-20 22:18:05,670 epoch 34 - iter 14/21 - loss 0.07294837 - time (sec): 6.15 - samples/sec: 2344.27 - lr: 0.025000
2023-04-20 22:18:06,790 epoch 34 - iter 16/21 - loss 0.07102440 - time (sec): 7.27 - samples/sec: 2276.25 - lr: 0.025000
2023-04-20 22:18:07,912 epoch 34 - i

100%|██████████| 5/5 [00:01<00:00,  3.40it/s]

2023-04-20 22:18:11,542 Evaluating as a multi-label problem: False
2023-04-20 22:18:11,558 DEV : loss 0.10955272614955902 - f1-score (micro avg)  0.7226
2023-04-20 22:18:11,584 BAD EPOCHS (no improvement): 3
2023-04-20 22:18:11,589 ----------------------------------------------------------------------------------------------------
2023-04-20 22:18:12,007 epoch 35 - iter 2/21 - loss 0.08353557 - time (sec): 0.42 - samples/sec: 4610.27 - lr: 0.025000
2023-04-20 22:18:12,991 epoch 35 - iter 4/21 - loss 0.07428944 - time (sec): 1.40 - samples/sec: 2880.38 - lr: 0.025000
2023-04-20 22:18:13,965 epoch 35 - iter 6/21 - loss 0.07009802 - time (sec): 2.37 - samples/sec: 2558.09 - lr: 0.025000
2023-04-20 22:18:14,921 epoch 35 - iter 8/21 - loss 0.06807710 - time (sec): 3.33 - samples/sec: 2416.66 - lr: 0.025000
2023-04-20 22:18:15,843 epoch 35 - iter 10/21 - loss 0.07140889 - time (sec): 4.25 - samples/sec: 2317.29 - lr: 0.025000
2023-04-20 22:18:16,815 epoch 35 - iter 12/21 - loss 0.07411346 - time (sec): 5.22 - samples/sec: 2323.91 - lr: 0.025000
2023-04-20 22:18:17,750 epoch 35 - iter 14/21 - loss 0.07387825 - time (sec): 6.16 - samples/sec: 2286.77 - lr: 0.025000
2023-04-20 22:18:18,700 epoch 35 - iter 16/21 - loss 0.07311303 - time (sec): 7.11 - samples/sec: 2302.65 - lr: 0.025000
2023-04-20 22:18:19,755 epoch 35 - i

100%|██████████| 5/5 [00:01<00:00,  2.87it/s]

2023-04-20 22:18:23,341 Evaluating as a multi-label problem: False
2023-04-20 22:18:23,367 DEV : loss 0.11042081564664841 - f1-score (micro avg)  0.7093
2023-04-20 22:18:23,419 Epoch    35: reducing learning rate of group 0 to 1.2500e-02.
2023-04-20 22:18:23,423 BAD EPOCHS (no improvement): 4
2023-04-20 22:18:23,431 ----------------------------------------------------------------------------------------------------
2023-04-20 22:18:23,971 epoch 36 - iter 2/21 - loss 0.07975244 - time (sec): 0.54 - samples/sec: 3587.00 - lr: 0.012500
2023-04-20 22:18:25,305 epoch 36 - iter 4/21 - loss 0.08148095 - time (sec): 1.87 - samples/sec: 2186.10 - lr: 0.012500
2023-04-20 22:18:26,457 epoch 36 - iter 6/21 - loss 0.07661010 - time (sec): 3.02 - samples/sec: 2025.20 - lr: 0.012500
2023-04-20 22:18:27,429 epoch 36 - iter 8/21 - loss 0.07514027 - time (sec): 4.00 - samples/sec: 1997.82 - lr: 0.012500
2023-04-20 22:18:28,435 epoch 36 - iter 10/21 - loss 0.07045334 - time (sec): 5.00 - samples/sec: 2058.15 - lr: 0.012500
2023-04-20 22:18:29,379 epoch 36 - iter 12/21 - loss 0.07192489 - time (sec): 5.95 - samples/sec: 2087.53 - lr: 0.012500
2023-04-20 22:18:30,314 epoch 36 - iter 14/21 - loss 0.07167355 - time (sec): 6.88 - samples/sec: 2117.06 - lr: 0.012500
2023-04-20 22:18:31,254 epoch 36 - iter 16/21 - loss 0.07331193 - time (sec): 7.82 - samples/sec: 2112.37 - lr: 0.012500
2023-04-20 22:18:32,205 epoch 36 - i

100%|██████████| 5/5 [00:01<00:00,  4.58it/s]

2023-04-20 22:18:35,019 Evaluating as a multi-label problem: False
2023-04-20 22:18:35,044 DEV : loss 0.11134158074855804 - f1-score (micro avg)  0.7215
2023-04-20 22:18:35,066 BAD EPOCHS (no improvement): 1
2023-04-20 22:18:35,072 ----------------------------------------------------------------------------------------------------
2023-04-20 22:18:35,472 epoch 37 - iter 2/21 - loss 0.06280685 - time (sec): 0.40 - samples/sec: 5124.19 - lr: 0.012500
2023-04-20 22:18:36,605 epoch 37 - iter 4/21 - loss 0.06808470 - time (sec): 1.53 - samples/sec: 2704.62 - lr: 0.012500
2023-04-20 22:18:37,713 epoch 37 - iter 6/21 - loss 0.06916187 - time (sec): 2.64 - samples/sec: 2312.98 - lr: 0.012500
2023-04-20 22:18:38,903 epoch 37 - iter 8/21 - loss 0.06791505 - time (sec): 3.83 - samples/sec: 2120.47 - lr: 0.012500
2023-04-20 22:18:40,214 epoch 37 - iter 10/21 - loss 0.06757006 - time (sec): 5.14 - samples/sec: 1993.70 - lr: 0.012500
2023-04-20 22:18:41,513 epoch 37 - iter 12/21 - loss 0.06624102 - time (sec): 6.44 - samples/sec: 1927.46 - lr: 0.012500
2023-04-20 22:18:42,563 epoch 37 - iter 14/21 - loss 0.06982104 - time (sec): 7.49 - samples/sec: 1947.12 - lr: 0.012500
2023-04-20 22:18:43,509 epoch 37 - iter 16/21 - loss 0.07107112 - time (sec): 8.44 - samples/sec: 1969.75 - lr: 0.012500
2023-04-20 22:18:44,459 epoch 37 - i

100%|██████████| 5/5 [00:01<00:00,  4.54it/s]

2023-04-20 22:18:47,294 Evaluating as a multi-label problem: False
2023-04-20 22:18:47,310 DEV : loss 0.11033914983272552 - f1-score (micro avg)  0.7173
2023-04-20 22:18:47,330 BAD EPOCHS (no improvement): 2
2023-04-20 22:18:47,335 ----------------------------------------------------------------------------------------------------
2023-04-20 22:18:47,727 epoch 38 - iter 2/21 - loss 0.05455446 - time (sec): 0.39 - samples/sec: 5250.47 - lr: 0.012500
2023-04-20 22:18:48,968 epoch 38 - iter 4/21 - loss 0.06111969 - time (sec): 1.63 - samples/sec: 2501.29 - lr: 0.012500
2023-04-20 22:18:50,264 epoch 38 - iter 6/21 - loss 0.06674653 - time (sec): 2.93 - samples/sec: 2139.72 - lr: 0.012500
2023-04-20 22:18:51,884 epoch 38 - iter 8/21 - loss 0.06501243 - time (sec): 4.55 - samples/sec: 1843.16 - lr: 0.012500
2023-04-20 22:18:53,965 epoch 38 - iter 10/21 - loss 0.06471219 - time (sec): 6.63 - samples/sec: 1537.87 - lr: 0.012500
2023-04-20 22:18:56,109 epoch 38 - iter 12/21 - loss 0.06971064 - time (sec): 8.77 - samples/sec: 1395.28 - lr: 0.012500
2023-04-20 22:18:57,529 epoch 38 - iter 14/21 - loss 0.06909689 - time (sec): 10.19 - samples/sec: 1383.28 - lr: 0.012500
2023-04-20 22:18:58,660 epoch 38 - iter 16/21 - loss 0.06878636 - time (sec): 11.32 - samples/sec: 1429.67 - lr: 0.012500
2023-04-20 22:18:59,730 epoch 38 -

100%|██████████| 5/5 [00:01<00:00,  3.90it/s]

2023-04-20 22:19:03,064 Evaluating as a multi-label problem: False
2023-04-20 22:19:03,084 DEV : loss 0.11245700716972351 - f1-score (micro avg)  0.7234
2023-04-20 22:19:03,106 BAD EPOCHS (no improvement): 3
2023-04-20 22:19:03,111 ----------------------------------------------------------------------------------------------------
2023-04-20 22:19:03,580 epoch 39 - iter 2/21 - loss 0.06701339 - time (sec): 0.46 - samples/sec: 4733.20 - lr: 0.012500
2023-04-20 22:19:04,691 epoch 39 - iter 4/21 - loss 0.05648654 - time (sec): 1.57 - samples/sec: 2764.58 - lr: 0.012500
2023-04-20 22:19:05,750 epoch 39 - iter 6/21 - loss 0.06158263 - time (sec): 2.63 - samples/sec: 2430.33 - lr: 0.012500
2023-04-20 22:19:06,879 epoch 39 - iter 8/21 - loss 0.06009785 - time (sec): 3.76 - samples/sec: 2218.67 - lr: 0.012500
2023-04-20 22:19:08,013 epoch 39 - iter 10/21 - loss 0.06560651 - time (sec): 4.89 - samples/sec: 2110.89 - lr: 0.012500
2023-04-20 22:19:09,262 epoch 39 - iter 12/21 - loss 0.06483833 - time (sec): 6.14 - samples/sec: 1988.27 - lr: 0.012500
2023-04-20 22:19:10,644 epoch 39 - iter 14/21 - loss 0.06656463 - time (sec): 7.52 - samples/sec: 1900.50 - lr: 0.012500
2023-04-20 22:19:12,686 epoch 39 - iter 16/21 - loss 0.06623946 - time (sec): 9.56 - samples/sec: 1707.64 - lr: 0.012500
2023-04-20 22:19:14,197 epoch 39 - i

100%|██████████| 5/5 [00:01<00:00,  2.63it/s]

2023-04-20 22:19:18,586 Evaluating as a multi-label problem: False
2023-04-20 22:19:18,608 DEV : loss 0.10727524012327194 - f1-score (micro avg)  0.7284
2023-04-20 22:19:18,657 Epoch    39: reducing learning rate of group 0 to 6.2500e-03.
2023-04-20 22:19:18,660 BAD EPOCHS (no improvement): 4
2023-04-20 22:19:18,668 ----------------------------------------------------------------------------------------------------
2023-04-20 22:19:19,177 epoch 40 - iter 2/21 - loss 0.08300242 - time (sec): 0.50 - samples/sec: 3588.31 - lr: 0.006250
2023-04-20 22:19:20,402 epoch 40 - iter 4/21 - loss 0.08070906 - time (sec): 1.73 - samples/sec: 2283.12 - lr: 0.006250
2023-04-20 22:19:21,590 epoch 40 - iter 6/21 - loss 0.07694147 - time (sec): 2.92 - samples/sec: 2083.92 - lr: 0.006250
2023-04-20 22:19:22,927 epoch 40 - iter 8/21 - loss 0.07780329 - time (sec): 4.26 - samples/sec: 1909.04 - lr: 0.006250
2023-04-20 22:19:23,937 epoch 40 - iter 10/21 - loss 0.07576125 - time (sec): 5.27 - samples/sec: 1912.04 - lr: 0.006250
2023-04-20 22:19:25,058 epoch 40 - iter 12/21 - loss 0.07304507 - time (sec): 6.39 - samples/sec: 1902.14 - lr: 0.006250
2023-04-20 22:19:26,219 epoch 40 - iter 14/21 - loss 0.07226917 - time (sec): 7.55 - samples/sec: 1896.18 - lr: 0.006250
2023-04-20 22:19:27,376 epoch 40 - iter 16/21 - loss 0.07374036 - time (sec): 8.70 - samples/sec: 1877.34 - lr: 0.006250
2023-04-20 22:19:28,616 epoch 40 - i

100%|██████████| 5/5 [00:01<00:00,  4.42it/s]

2023-04-20 22:19:31,822 Evaluating as a multi-label problem: False
2023-04-20 22:19:31,839 DEV : loss 0.10869959741830826 - f1-score (micro avg)  0.7223
2023-04-20 22:19:31,860 BAD EPOCHS (no improvement): 1
2023-04-20 22:19:31,865 ----------------------------------------------------------------------------------------------------
2023-04-20 22:19:32,252 epoch 41 - iter 2/21 - loss 0.06207089 - time (sec): 0.38 - samples/sec: 5141.48 - lr: 0.006250
2023-04-20 22:19:33,216 epoch 41 - iter 4/21 - loss 0.06083980 - time (sec): 1.35 - samples/sec: 3010.57 - lr: 0.006250
2023-04-20 22:19:34,175 epoch 41 - iter 6/21 - loss 0.06370834 - time (sec): 2.31 - samples/sec: 2637.12 - lr: 0.006250
2023-04-20 22:19:35,132 epoch 41 - iter 8/21 - loss 0.06509014 - time (sec): 3.26 - samples/sec: 2433.00 - lr: 0.006250
2023-04-20 22:19:36,079 epoch 41 - iter 10/21 - loss 0.06688402 - time (sec): 4.21 - samples/sec: 2361.02 - lr: 0.006250
2023-04-20 22:19:37,079 epoch 41 - iter 12/21 - loss 0.06658811 - time (sec): 5.21 - samples/sec: 2321.58 - lr: 0.006250
2023-04-20 22:19:38,036 epoch 41 - iter 14/21 - loss 0.07022729 - time (sec): 6.17 - samples/sec: 2291.59 - lr: 0.006250
2023-04-20 22:19:38,921 epoch 41 - iter 16/21 - loss 0.06938953 - time (sec): 7.05 - samples/sec: 2274.06 - lr: 0.006250
2023-04-20 22:19:39,955 epoch 41 - i

100%|██████████| 5/5 [00:01<00:00,  2.88it/s]

2023-04-20 22:19:43,876 Evaluating as a multi-label problem: False
2023-04-20 22:19:43,900 DEV : loss 0.10915666818618774 - f1-score (micro avg)  0.7275
2023-04-20 22:19:43,935 BAD EPOCHS (no improvement): 2
2023-04-20 22:19:43,940 ----------------------------------------------------------------------------------------------------
2023-04-20 22:19:44,510 epoch 42 - iter 2/21 - loss 0.07991738 - time (sec): 0.57 - samples/sec: 3558.52 - lr: 0.006250
2023-04-20 22:19:45,625 epoch 42 - iter 4/21 - loss 0.07663770 - time (sec): 1.68 - samples/sec: 2557.40 - lr: 0.006250
2023-04-20 22:19:46,609 epoch 42 - iter 6/21 - loss 0.07374214 - time (sec): 2.67 - samples/sec: 2321.47 - lr: 0.006250
2023-04-20 22:19:47,529 epoch 42 - iter 8/21 - loss 0.07205967 - time (sec): 3.59 - samples/sec: 2318.70 - lr: 0.006250
2023-04-20 22:19:48,487 epoch 42 - iter 10/21 - loss 0.06903439 - time (sec): 4.55 - samples/sec: 2261.32 - lr: 0.006250
2023-04-20 22:19:49,454 epoch 42 - iter 12/21 - loss 0.06897321 - time (sec): 5.51 - samples/sec: 2247.90 - lr: 0.006250
2023-04-20 22:19:50,361 epoch 42 - iter 14/21 - loss 0.07083424 - time (sec): 6.42 - samples/sec: 2239.12 - lr: 0.006250
2023-04-20 22:19:51,302 epoch 42 - iter 16/21 - loss 0.06768972 - time (sec): 7.36 - samples/sec: 2241.15 - lr: 0.006250
2023-04-20 22:19:52,275 epoch 42 - i

100%|██████████| 5/5 [00:01<00:00,  4.38it/s]

2023-04-20 22:19:55,073 Evaluating as a multi-label problem: False
2023-04-20 22:19:55,092 DEV : loss 0.10922128707170486 - f1-score (micro avg)  0.7287
2023-04-20 22:19:55,119 BAD EPOCHS (no improvement): 3
2023-04-20 22:19:55,125 ----------------------------------------------------------------------------------------------------
2023-04-20 22:19:55,657 epoch 43 - iter 2/21 - loss 0.05880409 - time (sec): 0.53 - samples/sec: 3918.63 - lr: 0.006250
2023-04-20 22:19:56,798 epoch 43 - iter 4/21 - loss 0.05771766 - time (sec): 1.67 - samples/sec: 2516.75 - lr: 0.006250
2023-04-20 22:19:57,952 epoch 43 - iter 6/21 - loss 0.06061314 - time (sec): 2.82 - samples/sec: 2187.45 - lr: 0.006250
2023-04-20 22:19:59,284 epoch 43 - iter 8/21 - loss 0.06500877 - time (sec): 4.15 - samples/sec: 1952.16 - lr: 0.006250
2023-04-20 22:20:00,481 epoch 43 - iter 10/21 - loss 0.06548288 - time (sec): 5.35 - samples/sec: 1887.01 - lr: 0.006250
2023-04-20 22:20:01,544 epoch 43 - iter 12/21 - loss 0.06673203 - time (sec): 6.41 - samples/sec: 1932.43 - lr: 0.006250
2023-04-20 22:20:02,515 epoch 43 - iter 14/21 - loss 0.06430942 - time (sec): 7.39 - samples/sec: 1937.94 - lr: 0.006250
2023-04-20 22:20:03,482 epoch 43 - iter 16/21 - loss 0.06322194 - time (sec): 8.35 - samples/sec: 1943.15 - lr: 0.006250
2023-04-20 22:20:04,414 epoch 43 - i

100%|██████████| 5/5 [00:02<00:00,  2.27it/s]

2023-04-20 22:20:08,351 Evaluating as a multi-label problem: False
2023-04-20 22:20:08,366 DEV : loss 0.10871978104114532 - f1-score (micro avg)  0.7256
2023-04-20 22:20:08,402 Epoch    43: reducing learning rate of group 0 to 3.1250e-03.
2023-04-20 22:20:08,403 BAD EPOCHS (no improvement): 4
2023-04-20 22:20:08,411 ----------------------------------------------------------------------------------------------------
2023-04-20 22:20:08,822 epoch 44 - iter 2/21 - loss 0.07274930 - time (sec): 0.41 - samples/sec: 5173.90 - lr: 0.003125
2023-04-20 22:20:09,825 epoch 44 - iter 4/21 - loss 0.06252033 - time (sec): 1.41 - samples/sec: 2923.12 - lr: 0.003125
2023-04-20 22:20:10,905 epoch 44 - iter 6/21 - loss 0.06879024 - time (sec): 2.49 - samples/sec: 2453.25 - lr: 0.003125
2023-04-20 22:20:11,986 epoch 44 - iter 8/21 - loss 0.06588900 - time (sec): 3.57 - samples/sec: 2244.93 - lr: 0.003125
2023-04-20 22:20:13,235 epoch 44 - iter 10/21 - loss 0.06709873 - time (sec): 4.82 - samples/sec: 2089.68 - lr: 0.003125
2023-04-20 22:20:14,403 epoch 44 - iter 12/21 - loss 0.06649113 - time (sec): 5.99 - samples/sec: 1997.32 - lr: 0.003125
2023-04-20 22:20:15,721 epoch 44 - iter 14/21 - loss 0.06751689 - time (sec): 7.31 - samples/sec: 1940.37 - lr: 0.003125
2023-04-20 22:20:16,772 epoch 44 - iter 16/21 - loss 0.06671951 - time (sec): 8.36 - samples/sec: 1953.34 - lr: 0.003125
2023-04-20 22:20:17,716 epoch 44 - i

100%|██████████| 5/5 [00:01<00:00,  4.35it/s]

2023-04-20 22:20:20,586 Evaluating as a multi-label problem: False
2023-04-20 22:20:20,605 DEV : loss 0.10901409387588501 - f1-score (micro avg)  0.7245
2023-04-20 22:20:20,627 BAD EPOCHS (no improvement): 1
2023-04-20 22:20:20,633 ----------------------------------------------------------------------------------------------------
2023-04-20 22:20:21,041 epoch 45 - iter 2/21 - loss 0.06281534 - time (sec): 0.41 - samples/sec: 4794.71 - lr: 0.003125
2023-04-20 22:20:22,005 epoch 45 - iter 4/21 - loss 0.07376054 - time (sec): 1.37 - samples/sec: 2818.94 - lr: 0.003125
2023-04-20 22:20:22,973 epoch 45 - iter 6/21 - loss 0.07034159 - time (sec): 2.34 - samples/sec: 2597.97 - lr: 0.003125
2023-04-20 22:20:23,936 epoch 45 - iter 8/21 - loss 0.06500812 - time (sec): 3.30 - samples/sec: 2511.49 - lr: 0.003125
2023-04-20 22:20:24,867 epoch 45 - iter 10/21 - loss 0.06281759 - time (sec): 4.23 - samples/sec: 2413.27 - lr: 0.003125
2023-04-20 22:20:25,762 epoch 45 - iter 12/21 - loss 0.06440643 - time (sec): 5.13 - samples/sec: 2304.60 - lr: 0.003125
2023-04-20 22:20:27,068 epoch 45 - iter 14/21 - loss 0.06301531 - time (sec): 6.43 - samples/sec: 2235.56 - lr: 0.003125
2023-04-20 22:20:28,168 epoch 45 - iter 16/21 - loss 0.06250278 - time (sec): 7.53 - samples/sec: 2147.46 - lr: 0.003125
2023-04-20 22:20:29,350 epoch 45 - i

100%|██████████| 5/5 [00:01<00:00,  4.35it/s]

2023-04-20 22:20:32,673 Evaluating as a multi-label problem: False
2023-04-20 22:20:32,689 DEV : loss 0.1089635118842125 - f1-score (micro avg)  0.7256
2023-04-20 22:20:32,710 BAD EPOCHS (no improvement): 2
2023-04-20 22:20:32,714 ----------------------------------------------------------------------------------------------------
2023-04-20 22:20:33,203 epoch 46 - iter 2/21 - loss 0.04422338 - time (sec): 0.48 - samples/sec: 4557.15 - lr: 0.003125
2023-04-20 22:20:34,156 epoch 46 - iter 4/21 - loss 0.05333923 - time (sec): 1.44 - samples/sec: 2935.19 - lr: 0.003125
2023-04-20 22:20:35,112 epoch 46 - iter 6/21 - loss 0.05915428 - time (sec): 2.39 - samples/sec: 2616.54 - lr: 0.003125
2023-04-20 22:20:36,043 epoch 46 - iter 8/21 - loss 0.05841947 - time (sec): 3.33 - samples/sec: 2477.03 - lr: 0.003125
2023-04-20 22:20:37,015 epoch 46 - iter 10/21 - loss 0.06637539 - time (sec): 4.30 - samples/sec: 2408.17 - lr: 0.003125
2023-04-20 22:20:38,012 epoch 46 - iter 12/21 - loss 0.06615636 - time (sec): 5.29 - samples/sec: 2358.59 - lr: 0.003125
2023-04-20 22:20:38,945 epoch 46 - iter 14/21 - loss 0.06711434 - time (sec): 6.23 - samples/sec: 2325.59 - lr: 0.003125
2023-04-20 22:20:39,852 epoch 46 - iter 16/21 - loss 0.06770668 - time (sec): 7.13 - samples/sec: 2311.61 - lr: 0.003125
2023-04-20 22:20:40,798 epoch 46 - i

100%|██████████| 5/5 [00:01<00:00,  2.90it/s]

2023-04-20 22:20:44,459 Evaluating as a multi-label problem: False
2023-04-20 22:20:44,480 DEV : loss 0.10892245173454285 - f1-score (micro avg)  0.7256
2023-04-20 22:20:44,518 BAD EPOCHS (no improvement): 3
2023-04-20 22:20:44,524 ----------------------------------------------------------------------------------------------------
2023-04-20 22:20:45,100 epoch 47 - iter 2/21 - loss 0.07562977 - time (sec): 0.57 - samples/sec: 3516.78 - lr: 0.003125
2023-04-20 22:20:46,233 epoch 47 - iter 4/21 - loss 0.08091167 - time (sec): 1.71 - samples/sec: 2330.46 - lr: 0.003125
2023-04-20 22:20:47,343 epoch 47 - iter 6/21 - loss 0.08132421 - time (sec): 2.82 - samples/sec: 2070.91 - lr: 0.003125
2023-04-20 22:20:48,392 epoch 47 - iter 8/21 - loss 0.07664143 - time (sec): 3.86 - samples/sec: 2145.43 - lr: 0.003125
2023-04-20 22:20:49,371 epoch 47 - iter 10/21 - loss 0.07635901 - time (sec): 4.84 - samples/sec: 2106.55 - lr: 0.003125
2023-04-20 22:20:50,309 epoch 47 - iter 12/21 - loss 0.07302415 - time (sec): 5.78 - samples/sec: 2130.47 - lr: 0.003125
2023-04-20 22:20:51,237 epoch 47 - iter 14/21 - loss 0.06989214 - time (sec): 6.71 - samples/sec: 2158.84 - lr: 0.003125
2023-04-20 22:20:52,192 epoch 47 - iter 16/21 - loss 0.07147576 - time (sec): 7.66 - samples/sec: 2145.81 - lr: 0.003125
2023-04-20 22:20:53,126 epoch 47 - i

100%|██████████| 5/5 [00:01<00:00,  4.45it/s]

2023-04-20 22:20:55,987 Evaluating as a multi-label problem: False
2023-04-20 22:20:56,002 DEV : loss 0.10873625427484512 - f1-score (micro avg)  0.7256
2023-04-20 22:20:56,024 Epoch    47: reducing learning rate of group 0 to 1.5625e-03.
2023-04-20 22:20:56,026 BAD EPOCHS (no improvement): 4
2023-04-20 22:20:56,032 ----------------------------------------------------------------------------------------------------
2023-04-20 22:20:56,433 epoch 48 - iter 2/21 - loss 0.07856068 - time (sec): 0.40 - samples/sec: 5310.49 - lr: 0.001563
2023-04-20 22:20:57,550 epoch 48 - iter 4/21 - loss 0.06583065 - time (sec): 1.51 - samples/sec: 2910.92 - lr: 0.001563
2023-04-20 22:20:58,642 epoch 48 - iter 6/21 - loss 0.06164931 - time (sec): 2.60 - samples/sec: 2463.37 - lr: 0.001563
2023-04-20 22:20:59,750 epoch 48 - iter 8/21 - loss 0.07021720 - time (sec): 3.71 - samples/sec: 2241.68 - lr: 0.001563
2023-04-20 22:21:00,879 epoch 48 - iter 10/21 - loss 0.07061452 - time (sec): 4.84 - samples/sec: 2102.53 - lr: 0.001563
2023-04-20 22:21:02,147 epoch 48 - iter 12/21 - loss 0.07078944 - time (sec): 6.11 - samples/sec: 2008.04 - lr: 0.001563
2023-04-20 22:21:03,176 epoch 48 - iter 14/21 - loss 0.07061578 - time (sec): 7.14 - samples/sec: 2002.97 - lr: 0.001563
2023-04-20 22:21:04,237 epoch 48 - iter 16/21 - loss 0.07072352 - time (sec): 8.20 - samples/sec: 2004.88 - lr: 0.001563
2023-04-20 22:21:05,176 epoch 48 - i

100%|██████████| 5/5 [00:01<00:00,  4.54it/s]

2023-04-20 22:21:08,013 Evaluating as a multi-label problem: False
2023-04-20 22:21:08,030 DEV : loss 0.10924031585454941 - f1-score (micro avg)  0.7256
2023-04-20 22:21:08,051 BAD EPOCHS (no improvement): 1
2023-04-20 22:21:08,057 ----------------------------------------------------------------------------------------------------
2023-04-20 22:21:08,465 epoch 49 - iter 2/21 - loss 0.07167473 - time (sec): 0.40 - samples/sec: 4958.16 - lr: 0.001563
2023-04-20 22:21:09,448 epoch 49 - iter 4/21 - loss 0.06556137 - time (sec): 1.39 - samples/sec: 3018.34 - lr: 0.001563
2023-04-20 22:21:10,430 epoch 49 - iter 6/21 - loss 0.06474045 - time (sec): 2.37 - samples/sec: 2628.75 - lr: 0.001563
2023-04-20 22:21:11,370 epoch 49 - iter 8/21 - loss 0.06987935 - time (sec): 3.31 - samples/sec: 2490.88 - lr: 0.001563
2023-04-20 22:21:12,301 epoch 49 - iter 10/21 - loss 0.06849141 - time (sec): 4.24 - samples/sec: 2441.92 - lr: 0.001563
2023-04-20 22:21:13,422 epoch 49 - iter 12/21 - loss 0.06731139 - time (sec): 5.36 - samples/sec: 2282.61 - lr: 0.001563
2023-04-20 22:21:14,598 epoch 49 - iter 14/21 - loss 0.06883587 - time (sec): 6.54 - samples/sec: 2180.49 - lr: 0.001563
2023-04-20 22:21:15,903 epoch 49 - iter 16/21 - loss 0.06608551 - time (sec): 7.84 - samples/sec: 2076.21 - lr: 0.001563
2023-04-20 22:21:17,135 epoch 49 - i

100%|██████████| 5/5 [00:01<00:00,  4.43it/s]

2023-04-20 22:21:20,159 Evaluating as a multi-label problem: False
2023-04-20 22:21:20,175 DEV : loss 0.10903137922286987 - f1-score (micro avg)  0.7287
2023-04-20 22:21:20,198 BAD EPOCHS (no improvement): 2
2023-04-20 22:21:20,214 ----------------------------------------------------------------------------------------------------
2023-04-20 22:21:20,670 epoch 50 - iter 2/21 - loss 0.06575357 - time (sec): 0.45 - samples/sec: 4837.65 - lr: 0.001563
2023-04-20 22:21:21,644 epoch 50 - iter 4/21 - loss 0.06351379 - time (sec): 1.42 - samples/sec: 3056.36 - lr: 0.001563
2023-04-20 22:21:22,579 epoch 50 - iter 6/21 - loss 0.06367403 - time (sec): 2.36 - samples/sec: 2684.76 - lr: 0.001563
2023-04-20 22:21:23,582 epoch 50 - iter 8/21 - loss 0.06843917 - time (sec): 3.36 - samples/sec: 2507.21 - lr: 0.001563
2023-04-20 22:21:24,550 epoch 50 - iter 10/21 - loss 0.06476448 - time (sec): 4.33 - samples/sec: 2419.32 - lr: 0.001563
2023-04-20 22:21:25,491 epoch 50 - iter 12/21 - loss 0.06649902 - time (sec): 5.27 - samples/sec: 2386.37 - lr: 0.001563
2023-04-20 22:21:26,421 epoch 50 - iter 14/21 - loss 0.06712112 - time (sec): 6.20 - samples/sec: 2370.17 - lr: 0.001563
2023-04-20 22:21:27,316 epoch 50 - iter 16/21 - loss 0.06725820 - time (sec): 7.10 - samples/sec: 2311.30 - lr: 0.001563
2023-04-20 22:21:28,354 epoch 50 - i

100%|██████████| 5/5 [00:01<00:00,  2.93it/s]

2023-04-20 22:21:32,107 Evaluating as a multi-label problem: False
2023-04-20 22:21:32,129 DEV : loss 0.10864727199077606 - f1-score (micro avg)  0.7287
2023-04-20 22:21:32,165 BAD EPOCHS (no improvement): 3
2023-04-20 22:21:32,170 ----------------------------------------------------------------------------------------------------
2023-04-20 22:21:32,664 epoch 51 - iter 2/21 - loss 0.05394393 - time (sec): 0.49 - samples/sec: 3759.25 - lr: 0.001563
2023-04-20 22:21:33,804 epoch 51 - iter 4/21 - loss 0.06336690 - time (sec): 1.63 - samples/sec: 2372.42 - lr: 0.001563
2023-04-20 22:21:34,776 epoch 51 - iter 6/21 - loss 0.06272037 - time (sec): 2.60 - samples/sec: 2321.73 - lr: 0.001563
2023-04-20 22:21:35,718 epoch 51 - iter 8/21 - loss 0.06208350 - time (sec): 3.54 - samples/sec: 2250.21 - lr: 0.001563
2023-04-20 22:21:36,666 epoch 51 - iter 10/21 - loss 0.06471859 - time (sec): 4.49 - samples/sec: 2189.62 - lr: 0.001563
2023-04-20 22:21:37,618 epoch 51 - iter 12/21 - loss 0.06177753 - time (sec): 5.44 - samples/sec: 2254.48 - lr: 0.001563
2023-04-20 22:21:38,575 epoch 51 - iter 14/21 - loss 0.06057762 - time (sec): 6.40 - samples/sec: 2258.00 - lr: 0.001563
2023-04-20 22:21:39,490 epoch 51 - iter 16/21 - loss 0.06024768 - time (sec): 7.31 - samples/sec: 2215.94 - lr: 0.001563
2023-04-20 22:21:40,477 epoch 51 - i

100%|██████████| 5/5 [00:01<00:00,  4.56it/s]

2023-04-20 22:21:43,384 Evaluating as a multi-label problem: False
2023-04-20 22:21:43,401 DEV : loss 0.10815875977277756 - f1-score (micro avg)  0.7226
2023-04-20 22:21:43,422 Epoch    51: reducing learning rate of group 0 to 7.8125e-04.
2023-04-20 22:21:43,423 BAD EPOCHS (no improvement): 4
2023-04-20 22:21:43,432 ----------------------------------------------------------------------------------------------------
2023-04-20 22:21:43,978 epoch 52 - iter 2/21 - loss 0.07386448 - time (sec): 0.54 - samples/sec: 3859.90 - lr: 0.000781
2023-04-20 22:21:45,125 epoch 52 - iter 4/21 - loss 0.06523649 - time (sec): 1.69 - samples/sec: 2407.86 - lr: 0.000781
2023-04-20 22:21:46,307 epoch 52 - iter 6/21 - loss 0.05930482 - time (sec): 2.87 - samples/sec: 2158.14 - lr: 0.000781
2023-04-20 22:21:47,450 epoch 52 - iter 8/21 - loss 0.05878322 - time (sec): 4.01 - samples/sec: 2055.33 - lr: 0.000781
2023-04-20 22:21:48,787 epoch 52 - iter 10/21 - loss 0.05788888 - time (sec): 5.35 - samples/sec: 1927.59 - lr: 0.000781
2023-04-20 22:21:49,793 epoch 52 - iter 12/21 - loss 0.05707650 - time (sec): 6.36 - samples/sec: 1998.86 - lr: 0.000781
2023-04-20 22:21:50,713 epoch 52 - iter 14/21 - loss 0.05932807 - time (sec): 7.28 - samples/sec: 2030.15 - lr: 0.000781
2023-04-20 22:21:51,660 epoch 52 - iter 16/21 - loss 0.06023523 - time (sec): 8.23 - samples/sec: 2036.27 - lr: 0.000781
2023-04-20 22:21:52,630 epoch 52 - i

100%|██████████| 5/5 [00:01<00:00,  4.65it/s]

2023-04-20 22:21:55,427 Evaluating as a multi-label problem: False
2023-04-20 22:21:55,442 DEV : loss 0.10860446840524673 - f1-score (micro avg)  0.7287
2023-04-20 22:21:55,466 BAD EPOCHS (no improvement): 1
2023-04-20 22:21:55,471 ----------------------------------------------------------------------------------------------------





2023-04-20 22:21:55,824 epoch 53 - iter 2/21 - loss 0.07109745 - time (sec): 0.35 - samples/sec: 5736.38 - lr: 0.000781
2023-04-20 22:21:56,781 epoch 53 - iter 4/21 - loss 0.05602716 - time (sec): 1.31 - samples/sec: 3087.15 - lr: 0.000781
2023-04-20 22:21:57,770 epoch 53 - iter 6/21 - loss 0.06098810 - time (sec): 2.30 - samples/sec: 2743.71 - lr: 0.000781
2023-04-20 22:21:58,685 epoch 53 - iter 8/21 - loss 0.06615426 - time (sec): 3.21 - samples/sec: 2575.50 - lr: 0.000781
2023-04-20 22:21:59,792 epoch 53 - iter 10/21 - loss 0.06293699 - time (sec): 4.32 - samples/sec: 2394.34 - lr: 0.000781
2023-04-20 22:22:00,962 epoch 53 - iter 12/21 - loss 0.06431712 - time (sec): 5.49 - samples/sec: 2255.51 - lr: 0.000781
2023-04-20 22:22:02,108 epoch 53 - iter 14/21 - loss 0.06380190 - time (sec): 6.63 - samples/sec: 2162.71 - lr: 0.000781
2023-04-20 22:22:03,366 epoch 53 - iter 16/21 - loss 0.06135458 - time (sec): 7.89 - samples/sec: 2112.70 - lr: 0.000781
2023-04-20 22:22:04,450 epoch 53 - i

100%|██████████| 5/5 [00:01<00:00,  4.61it/s]

2023-04-20 22:22:07,179 Evaluating as a multi-label problem: False
2023-04-20 22:22:07,195 DEV : loss 0.10886572301387787 - f1-score (micro avg)  0.7287
2023-04-20 22:22:07,218 BAD EPOCHS (no improvement): 2
2023-04-20 22:22:07,226 ----------------------------------------------------------------------------------------------------





2023-04-20 22:22:07,607 epoch 54 - iter 2/21 - loss 0.07343415 - time (sec): 0.38 - samples/sec: 4452.47 - lr: 0.000781
2023-04-20 22:22:08,582 epoch 54 - iter 4/21 - loss 0.07865145 - time (sec): 1.35 - samples/sec: 2730.54 - lr: 0.000781
2023-04-20 22:22:09,512 epoch 54 - iter 6/21 - loss 0.07676841 - time (sec): 2.28 - samples/sec: 2533.56 - lr: 0.000781
2023-04-20 22:22:10,432 epoch 54 - iter 8/21 - loss 0.07674850 - time (sec): 3.20 - samples/sec: 2442.56 - lr: 0.000781
2023-04-20 22:22:11,377 epoch 54 - iter 10/21 - loss 0.07470110 - time (sec): 4.15 - samples/sec: 2398.50 - lr: 0.000781
2023-04-20 22:22:12,322 epoch 54 - iter 12/21 - loss 0.07510006 - time (sec): 5.09 - samples/sec: 2399.09 - lr: 0.000781
2023-04-20 22:22:13,357 epoch 54 - iter 14/21 - loss 0.07283255 - time (sec): 6.13 - samples/sec: 2338.05 - lr: 0.000781
2023-04-20 22:22:14,254 epoch 54 - iter 16/21 - loss 0.07062074 - time (sec): 7.03 - samples/sec: 2298.18 - lr: 0.000781
2023-04-20 22:22:15,480 epoch 54 - i

100%|██████████| 5/5 [00:01<00:00,  2.84it/s]

2023-04-20 22:22:19,325 Evaluating as a multi-label problem: False
2023-04-20 22:22:19,355 DEV : loss 0.10884182155132294 - f1-score (micro avg)  0.7256
2023-04-20 22:22:19,393 BAD EPOCHS (no improvement): 3
2023-04-20 22:22:19,401 ----------------------------------------------------------------------------------------------------





2023-04-20 22:22:19,874 epoch 55 - iter 2/21 - loss 0.06036882 - time (sec): 0.47 - samples/sec: 4130.53 - lr: 0.000781
2023-04-20 22:22:20,823 epoch 55 - iter 4/21 - loss 0.05529091 - time (sec): 1.42 - samples/sec: 2699.06 - lr: 0.000781
2023-04-20 22:22:21,806 epoch 55 - iter 6/21 - loss 0.06023317 - time (sec): 2.40 - samples/sec: 2533.03 - lr: 0.000781
2023-04-20 22:22:22,741 epoch 55 - iter 8/21 - loss 0.06499147 - time (sec): 3.34 - samples/sec: 2436.67 - lr: 0.000781
2023-04-20 22:22:23,661 epoch 55 - iter 10/21 - loss 0.06396766 - time (sec): 4.26 - samples/sec: 2406.40 - lr: 0.000781
2023-04-20 22:22:24,617 epoch 55 - iter 12/21 - loss 0.05985395 - time (sec): 5.21 - samples/sec: 2414.44 - lr: 0.000781
2023-04-20 22:22:25,569 epoch 55 - iter 14/21 - loss 0.06049652 - time (sec): 6.17 - samples/sec: 2339.82 - lr: 0.000781
2023-04-20 22:22:26,490 epoch 55 - iter 16/21 - loss 0.06132296 - time (sec): 7.09 - samples/sec: 2316.93 - lr: 0.000781
2023-04-20 22:22:27,495 epoch 55 - i

100%|██████████| 5/5 [00:02<00:00,  1.80it/s]

2023-04-20 22:22:32,056 Evaluating as a multi-label problem: False
2023-04-20 22:22:32,077 DEV : loss 0.10895342379808426 - f1-score (micro avg)  0.7256
2023-04-20 22:22:32,118 Epoch    55: reducing learning rate of group 0 to 3.9063e-04.
2023-04-20 22:22:32,128 BAD EPOCHS (no improvement): 4
2023-04-20 22:22:32,133 ----------------------------------------------------------------------------------------------------





2023-04-20 22:22:32,661 epoch 56 - iter 2/21 - loss 0.07476916 - time (sec): 0.52 - samples/sec: 3730.91 - lr: 0.000391
2023-04-20 22:22:33,963 epoch 56 - iter 4/21 - loss 0.06929781 - time (sec): 1.83 - samples/sec: 2159.34 - lr: 0.000391
2023-04-20 22:22:35,171 epoch 56 - iter 6/21 - loss 0.06383289 - time (sec): 3.03 - samples/sec: 2137.85 - lr: 0.000391
2023-04-20 22:22:36,128 epoch 56 - iter 8/21 - loss 0.06371499 - time (sec): 3.99 - samples/sec: 2162.77 - lr: 0.000391
2023-04-20 22:22:37,115 epoch 56 - iter 10/21 - loss 0.06432588 - time (sec): 4.98 - samples/sec: 2167.49 - lr: 0.000391
2023-04-20 22:22:38,071 epoch 56 - iter 12/21 - loss 0.06579928 - time (sec): 5.93 - samples/sec: 2178.12 - lr: 0.000391
2023-04-20 22:22:39,024 epoch 56 - iter 14/21 - loss 0.06361464 - time (sec): 6.89 - samples/sec: 2171.81 - lr: 0.000391
2023-04-20 22:22:39,953 epoch 56 - iter 16/21 - loss 0.06438907 - time (sec): 7.82 - samples/sec: 2160.98 - lr: 0.000391
2023-04-20 22:22:40,853 epoch 56 - i

100%|██████████| 5/5 [00:01<00:00,  4.47it/s]

2023-04-20 22:22:43,660 Evaluating as a multi-label problem: False
2023-04-20 22:22:43,678 DEV : loss 0.10893312096595764 - f1-score (micro avg)  0.7298
2023-04-20 22:22:43,704 BAD EPOCHS (no improvement): 1
2023-04-20 22:22:43,708 ----------------------------------------------------------------------------------------------------





2023-04-20 22:22:44,172 epoch 57 - iter 2/21 - loss 0.07413840 - time (sec): 0.46 - samples/sec: 4441.28 - lr: 0.000391
2023-04-20 22:22:45,205 epoch 57 - iter 4/21 - loss 0.07708790 - time (sec): 1.50 - samples/sec: 2556.97 - lr: 0.000391
2023-04-20 22:22:46,281 epoch 57 - iter 6/21 - loss 0.07097870 - time (sec): 2.57 - samples/sec: 2170.78 - lr: 0.000391
2023-04-20 22:22:47,437 epoch 57 - iter 8/21 - loss 0.07410696 - time (sec): 3.73 - samples/sec: 1994.33 - lr: 0.000391
2023-04-20 22:22:48,609 epoch 57 - iter 10/21 - loss 0.06641541 - time (sec): 4.90 - samples/sec: 1957.84 - lr: 0.000391
2023-04-20 22:22:49,851 epoch 57 - iter 12/21 - loss 0.06630143 - time (sec): 6.14 - samples/sec: 1909.94 - lr: 0.000391
2023-04-20 22:22:50,893 epoch 57 - iter 14/21 - loss 0.06231237 - time (sec): 7.18 - samples/sec: 1903.28 - lr: 0.000391
2023-04-20 22:22:51,914 epoch 57 - iter 16/21 - loss 0.06366358 - time (sec): 8.20 - samples/sec: 1953.34 - lr: 0.000391
2023-04-20 22:22:52,876 epoch 57 - i

100%|██████████| 5/5 [00:01<00:00,  4.50it/s]

2023-04-20 22:22:55,820 Evaluating as a multi-label problem: False
2023-04-20 22:22:55,837 DEV : loss 0.10883799195289612 - f1-score (micro avg)  0.7287
2023-04-20 22:22:55,859 BAD EPOCHS (no improvement): 2
2023-04-20 22:22:55,862 ----------------------------------------------------------------------------------------------------





2023-04-20 22:22:56,219 epoch 58 - iter 2/21 - loss 0.06544057 - time (sec): 0.35 - samples/sec: 5567.74 - lr: 0.000391
2023-04-20 22:22:57,217 epoch 58 - iter 4/21 - loss 0.06873423 - time (sec): 1.35 - samples/sec: 3047.50 - lr: 0.000391
2023-04-20 22:22:58,195 epoch 58 - iter 6/21 - loss 0.06976033 - time (sec): 2.33 - samples/sec: 2633.34 - lr: 0.000391
2023-04-20 22:22:59,170 epoch 58 - iter 8/21 - loss 0.07271011 - time (sec): 3.31 - samples/sec: 2552.94 - lr: 0.000391
2023-04-20 22:23:00,193 epoch 58 - iter 10/21 - loss 0.07119176 - time (sec): 4.33 - samples/sec: 2427.62 - lr: 0.000391
2023-04-20 22:23:01,316 epoch 58 - iter 12/21 - loss 0.06928591 - time (sec): 5.45 - samples/sec: 2262.82 - lr: 0.000391
2023-04-20 22:23:02,497 epoch 58 - iter 14/21 - loss 0.06974051 - time (sec): 6.63 - samples/sec: 2173.87 - lr: 0.000391
2023-04-20 22:23:03,679 epoch 58 - iter 16/21 - loss 0.06958881 - time (sec): 7.82 - samples/sec: 2111.05 - lr: 0.000391
2023-04-20 22:23:04,829 epoch 58 - i

100%|██████████| 5/5 [00:01<00:00,  4.47it/s]

2023-04-20 22:23:07,818 Evaluating as a multi-label problem: False
2023-04-20 22:23:07,835 DEV : loss 0.10887116938829422 - f1-score (micro avg)  0.7256
2023-04-20 22:23:07,857 BAD EPOCHS (no improvement): 3
2023-04-20 22:23:07,862 ----------------------------------------------------------------------------------------------------





2023-04-20 22:23:08,318 epoch 59 - iter 2/21 - loss 0.05750715 - time (sec): 0.45 - samples/sec: 4240.80 - lr: 0.000391
2023-04-20 22:23:09,272 epoch 59 - iter 4/21 - loss 0.06153569 - time (sec): 1.41 - samples/sec: 2832.38 - lr: 0.000391
2023-04-20 22:23:10,220 epoch 59 - iter 6/21 - loss 0.06576493 - time (sec): 2.35 - samples/sec: 2673.95 - lr: 0.000391
2023-04-20 22:23:11,184 epoch 59 - iter 8/21 - loss 0.06175904 - time (sec): 3.32 - samples/sec: 2499.57 - lr: 0.000391
2023-04-20 22:23:12,162 epoch 59 - iter 10/21 - loss 0.06125935 - time (sec): 4.30 - samples/sec: 2450.34 - lr: 0.000391
2023-04-20 22:23:13,131 epoch 59 - iter 12/21 - loss 0.06095530 - time (sec): 5.27 - samples/sec: 2425.91 - lr: 0.000391
2023-04-20 22:23:14,042 epoch 59 - iter 14/21 - loss 0.06157448 - time (sec): 6.18 - samples/sec: 2357.37 - lr: 0.000391
2023-04-20 22:23:14,997 epoch 59 - iter 16/21 - loss 0.06416249 - time (sec): 7.13 - samples/sec: 2338.62 - lr: 0.000391
2023-04-20 22:23:16,033 epoch 59 - i

100%|██████████| 5/5 [00:01<00:00,  2.83it/s]

2023-04-20 22:23:19,822 Evaluating as a multi-label problem: False
2023-04-20 22:23:19,845 DEV : loss 0.10904043167829514 - f1-score (micro avg)  0.7256
2023-04-20 22:23:19,879 Epoch    59: reducing learning rate of group 0 to 1.9531e-04.
2023-04-20 22:23:19,881 BAD EPOCHS (no improvement): 4
2023-04-20 22:23:19,890 ----------------------------------------------------------------------------------------------------





2023-04-20 22:23:20,430 epoch 60 - iter 2/21 - loss 0.06606651 - time (sec): 0.54 - samples/sec: 3401.13 - lr: 0.000195
2023-04-20 22:23:21,536 epoch 60 - iter 4/21 - loss 0.08482649 - time (sec): 1.64 - samples/sec: 2346.69 - lr: 0.000195
2023-04-20 22:23:22,519 epoch 60 - iter 6/21 - loss 0.07284880 - time (sec): 2.63 - samples/sec: 2237.10 - lr: 0.000195
2023-04-20 22:23:23,582 epoch 60 - iter 8/21 - loss 0.07145456 - time (sec): 3.69 - samples/sec: 2238.62 - lr: 0.000195
2023-04-20 22:23:24,541 epoch 60 - iter 10/21 - loss 0.06705099 - time (sec): 4.65 - samples/sec: 2232.12 - lr: 0.000195
2023-04-20 22:23:25,479 epoch 60 - iter 12/21 - loss 0.06659517 - time (sec): 5.59 - samples/sec: 2214.12 - lr: 0.000195
2023-04-20 22:23:26,458 epoch 60 - iter 14/21 - loss 0.06955512 - time (sec): 6.57 - samples/sec: 2212.27 - lr: 0.000195
2023-04-20 22:23:27,403 epoch 60 - iter 16/21 - loss 0.07112152 - time (sec): 7.51 - samples/sec: 2202.47 - lr: 0.000195
2023-04-20 22:23:28,349 epoch 60 - i

100%|██████████| 5/5 [00:01<00:00,  4.52it/s]

2023-04-20 22:23:31,184 Evaluating as a multi-label problem: False
2023-04-20 22:23:31,202 DEV : loss 0.10906270146369934 - f1-score (micro avg)  0.7256
2023-04-20 22:23:31,226 BAD EPOCHS (no improvement): 1
2023-04-20 22:23:31,230 ----------------------------------------------------------------------------------------------------





2023-04-20 22:23:31,773 epoch 61 - iter 2/21 - loss 0.06630674 - time (sec): 0.54 - samples/sec: 3701.44 - lr: 0.000195
2023-04-20 22:23:32,955 epoch 61 - iter 4/21 - loss 0.06649452 - time (sec): 1.72 - samples/sec: 2337.75 - lr: 0.000195
2023-04-20 22:23:34,054 epoch 61 - iter 6/21 - loss 0.06439203 - time (sec): 2.82 - samples/sec: 2164.12 - lr: 0.000195
2023-04-20 22:23:35,264 epoch 61 - iter 8/21 - loss 0.06555776 - time (sec): 4.03 - samples/sec: 2006.62 - lr: 0.000195
2023-04-20 22:23:36,501 epoch 61 - iter 10/21 - loss 0.06359073 - time (sec): 5.27 - samples/sec: 1907.56 - lr: 0.000195
2023-04-20 22:23:37,496 epoch 61 - iter 12/21 - loss 0.06261051 - time (sec): 6.26 - samples/sec: 1927.16 - lr: 0.000195
2023-04-20 22:23:38,430 epoch 61 - iter 14/21 - loss 0.06651399 - time (sec): 7.20 - samples/sec: 1987.14 - lr: 0.000195
2023-04-20 22:23:39,376 epoch 61 - iter 16/21 - loss 0.06561061 - time (sec): 8.14 - samples/sec: 2019.43 - lr: 0.000195
2023-04-20 22:23:40,366 epoch 61 - i

100%|██████████| 5/5 [00:01<00:00,  4.54it/s]

2023-04-20 22:23:43,128 Evaluating as a multi-label problem: False
2023-04-20 22:23:43,143 DEV : loss 0.10907940566539764 - f1-score (micro avg)  0.7256
2023-04-20 22:23:43,166 BAD EPOCHS (no improvement): 2
2023-04-20 22:23:43,171 ----------------------------------------------------------------------------------------------------





2023-04-20 22:23:43,535 epoch 62 - iter 2/21 - loss 0.05653128 - time (sec): 0.36 - samples/sec: 5367.95 - lr: 0.000195
2023-04-20 22:23:44,522 epoch 62 - iter 4/21 - loss 0.06141218 - time (sec): 1.35 - samples/sec: 3043.60 - lr: 0.000195
2023-04-20 22:23:45,444 epoch 62 - iter 6/21 - loss 0.06595212 - time (sec): 2.27 - samples/sec: 2606.06 - lr: 0.000195
2023-04-20 22:23:46,424 epoch 62 - iter 8/21 - loss 0.06874356 - time (sec): 3.25 - samples/sec: 2445.46 - lr: 0.000195
2023-04-20 22:23:47,553 epoch 62 - iter 10/21 - loss 0.06964512 - time (sec): 4.38 - samples/sec: 2273.51 - lr: 0.000195
2023-04-20 22:23:48,686 epoch 62 - iter 12/21 - loss 0.06856941 - time (sec): 5.51 - samples/sec: 2199.39 - lr: 0.000195
2023-04-20 22:23:49,946 epoch 62 - iter 14/21 - loss 0.06652971 - time (sec): 6.77 - samples/sec: 2080.93 - lr: 0.000195
2023-04-20 22:23:51,142 epoch 62 - iter 16/21 - loss 0.06628651 - time (sec): 7.97 - samples/sec: 2026.14 - lr: 0.000195
2023-04-20 22:23:52,375 epoch 62 - i

100%|██████████| 5/5 [00:01<00:00,  4.49it/s]

2023-04-20 22:23:55,235 Evaluating as a multi-label problem: False
2023-04-20 22:23:55,255 DEV : loss 0.10901211947202682 - f1-score (micro avg)  0.7226
2023-04-20 22:23:55,276 BAD EPOCHS (no improvement): 3
2023-04-20 22:23:55,279 ----------------------------------------------------------------------------------------------------





2023-04-20 22:23:55,732 epoch 63 - iter 2/21 - loss 0.06304908 - time (sec): 0.45 - samples/sec: 4656.06 - lr: 0.000195
2023-04-20 22:23:56,690 epoch 63 - iter 4/21 - loss 0.05577938 - time (sec): 1.41 - samples/sec: 2763.20 - lr: 0.000195
2023-04-20 22:23:57,582 epoch 63 - iter 6/21 - loss 0.05390656 - time (sec): 2.30 - samples/sec: 2493.54 - lr: 0.000195
2023-04-20 22:23:58,518 epoch 63 - iter 8/21 - loss 0.06083940 - time (sec): 3.23 - samples/sec: 2461.76 - lr: 0.000195
2023-04-20 22:23:59,471 epoch 63 - iter 10/21 - loss 0.06447868 - time (sec): 4.19 - samples/sec: 2418.16 - lr: 0.000195
2023-04-20 22:24:00,441 epoch 63 - iter 12/21 - loss 0.06339747 - time (sec): 5.16 - samples/sec: 2367.93 - lr: 0.000195
2023-04-20 22:24:01,389 epoch 63 - iter 14/21 - loss 0.06505528 - time (sec): 6.11 - samples/sec: 2323.99 - lr: 0.000195
2023-04-20 22:24:02,510 epoch 63 - iter 16/21 - loss 0.06496837 - time (sec): 7.23 - samples/sec: 2255.13 - lr: 0.000195
2023-04-20 22:24:03,713 epoch 63 - i

100%|██████████| 5/5 [00:01<00:00,  2.91it/s]

2023-04-20 22:24:07,498 Evaluating as a multi-label problem: False
2023-04-20 22:24:07,514 DEV : loss 0.1090441420674324 - f1-score (micro avg)  0.7256
2023-04-20 22:24:07,535 Epoch    63: reducing learning rate of group 0 to 9.7656e-05.
2023-04-20 22:24:07,537 BAD EPOCHS (no improvement): 4
2023-04-20 22:24:07,543 ----------------------------------------------------------------------------------------------------
2023-04-20 22:24:07,544 ----------------------------------------------------------------------------------------------------
2023-04-20 22:24:07,546 learning rate too small - quitting training!
2023-04-20 22:24:07,548 ----------------------------------------------------------------------------------------------------
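
The log above shows how the trainer anneals the learning rate: each time the bad-epoch counter reaches 4 the rate is halved, and training stops once the rate falls below 1e-4. The snippet below is a minimal sketch of that schedule, not Flair's own scheduler code. The starting rate of 0.1, the factor of 0.5 and the 1e-4 threshold are assumptions read off the logged values, and `annealing_schedule` is a hypothetical helper name.

```python
def annealing_schedule(initial_lr=0.1, factor=0.5, min_lr=1e-4):
    """Return the successive learning rates until one falls below min_lr.

    All three defaults are assumptions inferred from the training log.
    """
    rates = [initial_lr]
    while rates[-1] >= min_lr:
        rates.append(rates[-1] * factor)
    return rates


rates = annealing_schedule()
for reductions, lr in enumerate(rates):
    print(f"after {reductions:2d} reductions: lr = {lr:.4e}")
```

Under these assumptions the rate drops below the threshold on the tenth reduction (9.7656e-05), which matches the point at which the log reports "learning rate too small - quitting training!".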





2023-04-20 22:24:09,286 ----------------------------------------------------------------------------------------------------
2023-04-20 22:24:11,248 SequenceTagger predicts: Dictionary with 19 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-MISC, B-MISC, E-MISC, I-MISC, <START>, <STOP>


100%|██████████| 4/4 [00:02<00:00,  1.87it/s]

2023-04-20 22:24:13,780 Evaluating as a multi-label problem: False
2023-04-20 22:24:13,796 0.8085	0.7944	0.8014	0.7451
2023-04-20 22:24:13,801 
Results:
- F-score (micro) 0.8014
- F-score (macro) 0.6916
- Accuracy 0.7451

By class:
              precision    recall  f1-score   support

         ORG     0.7857    0.8919    0.8354       111
         LOC     0.7381    0.6667    0.7006        93
         PER     0.9697    0.9697    0.9697        66
        MISC     0.5000    0.1765    0.2609        17

   micro avg     0.8085    0.7944    0.8014       287
   macro avg     0.7484    0.6762    0.6916       287
weighted avg     0.7957    0.7944    0.7886       287

2023-04-20 22:24:13,807 ----------------------------------------------------------------------------------------------------
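
The three averages in the table above answer different questions, and the gap between micro F1 (0.8014) and macro F1 (0.6916) is easier to read once they are recomputed by hand. The sketch below uses only the rounded figures printed in the table, so it reproduces the reported values up to rounding; the variable names are illustrative.

```python
# Per-class results copied from the table above: (f1-score, support)
per_class = {
    "ORG": (0.8354, 111),
    "LOC": (0.7006, 93),
    "PER": (0.9697, 66),
    "MISC": (0.2609, 17),
}

# Pooled precision and recall from the "micro avg" row
micro_precision, micro_recall = 0.8085, 0.7944

# Micro F1: harmonic mean of the pooled precision and recall
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

# Macro F1: unweighted mean of the per-class F1 scores
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 weighted by the number of gold entities
total_support = sum(n for _, n in per_class.values())
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"micro F1    : {micro_f1:.4f}")
print(f"macro F1    : {macro_f1:.4f}")
print(f"weighted F1 : {weighted_f1:.4f}")
```

Macro F1 is pulled down by MISC, which has only 17 test entities and an F1 of 0.2609 but counts as much as ORG in the unweighted mean.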





{'test_score': 0.8014059753954306,
 'dev_score_history': [0.08602150537634409,
  0.25136612021857924,
  0.2928759894459103,
  0.3731931668856768,
  0.5081723625557206,
  0.4869325997248969,
  0.5255474452554744,
  0.5625920471281297,
  0.5615050651230101,
  0.5985611510791367,
  0.6509572901325478,
  0.6646616541353384,
  0.6111951588502269,
  0.6686838124054463,
  0.684931506849315,
  0.6859756097560974,
  0.7044410413476264,
  0.6597014925373134,
  0.7013782542113323,
  0.7188940092165899,
  0.7001522070015221,
  0.7110438729198185,
  0.6972477064220184,
  0.7187969924812031,
  0.7297709923664122,
  0.7336377473363774,
  0.7305936073059361,
  0.7267175572519083,
  0.7201210287443268,
  0.7267175572519083,
  0.7408536585365855,
  0.729483282674772,
  0.723404255319149,
  0.7225609756097561,
  0.7092846270928463,
  0.7214611872146118,
  0.7173252279635258,
  0.723404255319149,
  0.7283763277693476,
  0.7223065250379364,
  0.7275494672754947,
  0.728658536585366,
  0.7256097560975611,
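
The `dev_score_history` returned by the trainer can be used to locate the epoch with the best dev score. This is a minimal sketch that assumes one entry per epoch, in training order; `best_epoch` is a hypothetical helper, and only the first five values from the output above are copied in (the printed list is longer).

```python
def best_epoch(history):
    """Return (epoch, score) for the highest dev score; epochs are 1-based."""
    epoch, score = max(enumerate(history, start=1), key=lambda pair: pair[1])
    return epoch, score


# First five values copied from the output above
dev_score_history = [
    0.08602150537634409,
    0.25136612021857924,
    0.2928759894459103,
    0.3731931668856768,
    0.5081723625557206,
]

epoch, score = best_epoch(dev_score_history)
print(f"best dev F1 in this excerpt: {score:.4f} at epoch {epoch}")
```

Passing the complete list from the training result instead of the excerpt gives the best epoch of the whole run.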
 

## For German 

In [7]:
from flair.datasets import NER_GERMAN_BIOFID
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus = NER_GERMAN_BIOFID()
print(corpus)

# downsample() shrinks the corpus it is called on, so load a second copy
# and keep `corpus` at full size for comparison (5% of each split is kept)
downsampled_corpus = NER_GERMAN_BIOFID().downsample(0.05)

print("--- 1 Original ---")
print(corpus)

print("--- 2 Downsampled ---")
print(downsampled_corpus)

# Sanity check: split sizes and one example sentence
print(len(downsampled_corpus.train))
print(len(downsampled_corpus.test))
print(len(downsampled_corpus.dev))
sentence = downsampled_corpus.test[3]
print(sentence)
print(downsampled_corpus)

2023-04-21 09:11:26,070 Reading data from /root/.flair/datasets/ner_german_biofid
2023-04-21 09:11:26,072 Train: /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-21 09:11:26,074 Dev: /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-21 09:11:26,077 Test: /root/.flair/datasets/ner_german_biofid/test.conll
Corpus: 12668 train + 1584 dev + 1584 test sentences
2023-04-21 09:11:46,841 Reading data from /root/.flair/datasets/ner_german_biofid
2023-04-21 09:11:46,845 Train: /root/.flair/datasets/ner_german_biofid/train.conll
2023-04-21 09:11:46,848 Dev: /root/.flair/datasets/ner_german_biofid/dev.conll
2023-04-21 09:11:46,851 Test: /root/.flair/datasets/ner_german_biofid/test.conll
--- 1 Original ---
Corpus: 12668 train + 1584 dev + 1584 test sentences
--- 2 Downsampled ---
Corpus: 633 train + 79 dev + 79 test sentences
633
79
79
Sentence[7]: "Arbeiten aus der Bundesanstalt für Vegetationskartierung ." → ["Arbeiten"/arbeiten/NN, "Arbeiten aus der Bundesanstalt für Vegetatio

In [11]:
# 2. what label do we want to predict?
label_type = 'ner'

# 3. make the label dictionary from the (downsampled) corpus
label_dict = downsampled_corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embedding stack with Flair and GloVe
# NOTE: 'glove' and 'news-forward'/'news-backward' are trained on English text.
# They are kept here to match the logged run; Flair also ships German
# embeddings ('de', 'de-forward', 'de-backward') that may suit this corpus better.
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger (BiLSTM with a CRF decoding layer)
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

# 6. initialize trainer
trainer = ModelTrainer(tagger, downsampled_corpus)

# 7. start training; checkpoints and logs are written to the given folder
trainer.train('/content/drive/My Drive/ColabNotebooks/flairmodels/german',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=100,
              write_weights=True)

2023-04-20 22:25:48,020 Computing label dictionary. Progress:


633it [00:00, 31018.46it/s]

2023-04-20 22:25:48,046 Dictionary created for label 'ner' with 7 values: TAX (seen 731 times), OTHER (seen 318 times), LOC (seen 313 times), PER (seen 219 times), TME (seen 204 times), ORG (seen 49 times)
Dictionary with 7 tags: <unk>, TAX, OTHER, LOC, PER, TME, ORG
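
The dictionary above lists six entity labels plus `<unk>`, while the tagger itself reports 25 tags. The difference comes from span encoding: each label is expanded into four positional tags (single, begin, end, inside) and one `O` tag is added for tokens outside any entity. The sketch below reconstructs that expansion from the logged tag list; it is an illustration of the scheme, not Flair's internal code, and `bioes_tags` is a hypothetical helper name.

```python
def bioes_tags(labels):
    """Expand entity labels into span tags: O plus S-/B-/E-/I- per label."""
    tags = ["O"]
    for label in labels:
        tags.extend(f"{prefix}-{label}" for prefix in ("S", "B", "E", "I"))
    return tags


# Entity labels from the label dictionary above (<unk> is not expanded)
labels = ["TAX", "OTHER", "LOC", "PER", "TME", "ORG"]

tags = bioes_tags(labels)
print(len(tags))
print(tags[:5])
```

Six labels give 6 × 4 + 1 = 25 tags. By the same rule four labels give 17 span tags; the 19 reported for the earlier model additionally count `<START>` and `<STOP>`.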





2023-04-20 22:25:52,961 SequenceTagger predicts: Dictionary with 25 tags: O, S-TAX, B-TAX, E-TAX, I-TAX, S-OTHER, B-OTHER, E-OTHER, I-OTHER, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-TME, B-TME, E-TME, I-TME, S-ORG, B-ORG, E-ORG, I-ORG
2023-04-20 22:25:53,188 ----------------------------------------------------------------------------------------------------
2023-04-20 22:25:53,192 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05

100%|██████████| 3/3 [00:02<00:00,  1.40it/s]

2023-04-20 22:26:23,074 Evaluating as a multi-label problem: False
2023-04-20 22:26:23,095 DEV : loss 1.6637364625930786 - f1-score (micro avg)  0.0
2023-04-20 22:26:23,110 BAD EPOCHS (no improvement): 0
2023-04-20 22:26:23,116 saving best model





2023-04-20 22:26:25,646 ----------------------------------------------------------------------------------------------------
2023-04-20 22:26:26,469 epoch 2 - iter 2/20 - loss 1.33099865 - time (sec): 0.81 - samples/sec: 1813.18 - lr: 0.100000
2023-04-20 22:26:27,859 epoch 2 - iter 4/20 - loss 1.40727572 - time (sec): 2.20 - samples/sec: 1310.23 - lr: 0.100000
2023-04-20 22:26:31,085 epoch 2 - iter 6/20 - loss 1.38695836 - time (sec): 5.43 - samples/sec: 898.60 - lr: 0.100000
2023-04-20 22:26:32,926 epoch 2 - iter 8/20 - loss 1.36897894 - time (sec): 7.27 - samples/sec: 886.21 - lr: 0.100000
2023-04-20 22:26:34,730 epoch 2 - iter 10/20 - loss 1.34668140 - time (sec): 9.08 - samples/sec: 888.43 - lr: 0.100000
2023-04-20 22:26:37,019 epoch 2 - iter 12/20 - loss 1.38739883 - time (sec): 11.36 - samples/sec: 882.68 - lr: 0.100000
2023-04-20 22:26:38,257 epoch 2 - iter 14/20 - loss 1.37962090 - time (sec): 12.60 - samples/sec: 896.47 - lr: 0.100000
2023-04-20 22:26:39,562 epoch 2 - iter 16/

100%|██████████| 3/3 [00:00<00:00,  4.86it/s]

2023-04-20 22:26:42,839 Evaluating as a multi-label problem: False
2023-04-20 22:26:42,853 DEV : loss 1.4584163427352905 - f1-score (micro avg)  0.0
2023-04-20 22:26:42,865 BAD EPOCHS (no improvement): 0
2023-04-20 22:26:42,870 ----------------------------------------------------------------------------------------------------





2023-04-20 22:26:43,216 epoch 3 - iter 2/20 - loss 1.11701227 - time (sec): 0.34 - samples/sec: 4336.32 - lr: 0.100000
2023-04-20 22:26:44,556 epoch 3 - iter 4/20 - loss 1.16479516 - time (sec): 1.68 - samples/sec: 1792.80 - lr: 0.100000
2023-04-20 22:26:45,895 epoch 3 - iter 6/20 - loss 1.18602279 - time (sec): 3.02 - samples/sec: 1490.29 - lr: 0.100000
2023-04-20 22:26:47,122 epoch 3 - iter 8/20 - loss 1.14187951 - time (sec): 4.25 - samples/sec: 1415.75 - lr: 0.100000
2023-04-20 22:26:49,187 epoch 3 - iter 10/20 - loss 1.11111615 - time (sec): 6.31 - samples/sec: 1193.60 - lr: 0.100000
2023-04-20 22:26:50,858 epoch 3 - iter 12/20 - loss 1.10995555 - time (sec): 7.98 - samples/sec: 1160.54 - lr: 0.100000
2023-04-20 22:26:52,060 epoch 3 - iter 14/20 - loss 1.10486768 - time (sec): 9.19 - samples/sec: 1188.45 - lr: 0.100000
2023-04-20 22:26:53,111 epoch 3 - iter 16/20 - loss 1.14714904 - time (sec): 10.24 - samples/sec: 1227.87 - lr: 0.100000
2023-04-20 22:26:54,137 epoch 3 - iter 18/2

100%|██████████| 3/3 [00:00<00:00,  5.83it/s]

2023-04-20 22:26:56,882 Evaluating as a multi-label problem: False
2023-04-20 22:26:56,896 DEV : loss 1.354112148284912 - f1-score (micro avg)  0.0
2023-04-20 22:26:56,906 BAD EPOCHS (no improvement): 0
2023-04-20 22:26:56,911 ----------------------------------------------------------------------------------------------------





2023-04-20 22:27:04,969 epoch 4 - iter 2/20 - loss 1.15561781 - time (sec): 8.05 - samples/sec: 183.26 - lr: 0.100000
2023-04-20 22:27:06,163 epoch 4 - iter 4/20 - loss 1.13396787 - time (sec): 9.25 - samples/sec: 315.95 - lr: 0.100000
2023-04-20 22:27:07,912 epoch 4 - iter 6/20 - loss 1.16047381 - time (sec): 11.00 - samples/sec: 426.93 - lr: 0.100000
2023-04-20 22:27:08,953 epoch 4 - iter 8/20 - loss 1.17069047 - time (sec): 12.04 - samples/sec: 515.20 - lr: 0.100000
2023-04-20 22:27:09,980 epoch 4 - iter 10/20 - loss 1.17001571 - time (sec): 13.06 - samples/sec: 579.49 - lr: 0.100000
2023-04-20 22:27:11,804 epoch 4 - iter 12/20 - loss 1.12981458 - time (sec): 14.89 - samples/sec: 643.77 - lr: 0.100000
2023-04-20 22:27:12,837 epoch 4 - iter 14/20 - loss 1.09774334 - time (sec): 15.92 - samples/sec: 708.83 - lr: 0.100000
2023-04-20 22:27:14,197 epoch 4 - iter 16/20 - loss 1.09548606 - time (sec): 17.28 - samples/sec: 753.36 - lr: 0.100000
2023-04-20 22:27:15,284 epoch 4 - iter 18/20 -

100%|██████████| 3/3 [00:01<00:00,  1.73it/s]

2023-04-20 22:27:19,089 Evaluating as a multi-label problem: False
2023-04-20 22:27:19,112 DEV : loss 1.2260231971740723 - f1-score (micro avg)  0.0498
2023-04-20 22:27:19,128 BAD EPOCHS (no improvement): 0
2023-04-20 22:27:19,135 saving best model





2023-04-20 22:27:22,044 ----------------------------------------------------------------------------------------------------
2023-04-20 22:27:22,936 epoch 5 - iter 2/20 - loss 0.88055528 - time (sec): 0.88 - samples/sec: 1991.98 - lr: 0.100000
2023-04-20 22:27:24,332 epoch 5 - iter 4/20 - loss 0.96354632 - time (sec): 2.28 - samples/sec: 1467.54 - lr: 0.100000
2023-04-20 22:27:25,643 epoch 5 - iter 6/20 - loss 1.01514324 - time (sec): 3.59 - samples/sec: 1334.82 - lr: 0.100000
2023-04-20 22:27:27,724 epoch 5 - iter 8/20 - loss 1.05080859 - time (sec): 5.67 - samples/sec: 1178.35 - lr: 0.100000
2023-04-20 22:27:29,518 epoch 5 - iter 10/20 - loss 1.01501933 - time (sec): 7.46 - samples/sec: 1104.05 - lr: 0.100000
2023-04-20 22:27:30,689 epoch 5 - iter 12/20 - loss 1.01009393 - time (sec): 8.63 - samples/sec: 1112.68 - lr: 0.100000
2023-04-20 22:27:33,064 epoch 5 - iter 14/20 - loss 0.96841043 - time (sec): 11.01 - samples/sec: 1048.76 - lr: 0.100000
2023-04-20 22:27:34,271 epoch 5 - iter

100%|██████████| 3/3 [00:00<00:00,  4.79it/s]

2023-04-20 22:27:38,023 Evaluating as a multi-label problem: False
2023-04-20 22:27:38,037 DEV : loss 1.154377818107605 - f1-score (micro avg)  0.1145
2023-04-20 22:27:38,045 BAD EPOCHS (no improvement): 0
2023-04-20 22:27:38,051 saving best model





2023-04-20 22:27:40,341 ----------------------------------------------------------------------------------------------------
2023-04-20 22:27:41,028 epoch 6 - iter 2/20 - loss 0.79522860 - time (sec): 0.68 - samples/sec: 2942.79 - lr: 0.100000
2023-04-20 22:27:42,084 epoch 6 - iter 4/20 - loss 0.89515607 - time (sec): 1.74 - samples/sec: 2028.14 - lr: 0.100000
2023-04-20 22:27:43,224 epoch 6 - iter 6/20 - loss 0.87847674 - time (sec): 2.88 - samples/sec: 1713.03 - lr: 0.100000
2023-04-20 22:27:44,450 epoch 6 - iter 8/20 - loss 0.89158993 - time (sec): 4.10 - samples/sec: 1517.95 - lr: 0.100000
2023-04-20 22:27:45,596 epoch 6 - iter 10/20 - loss 0.89059227 - time (sec): 5.25 - samples/sec: 1464.53 - lr: 0.100000
2023-04-20 22:27:47,067 epoch 6 - iter 12/20 - loss 0.89019865 - time (sec): 6.72 - samples/sec: 1350.58 - lr: 0.100000
2023-04-20 22:27:49,227 epoch 6 - iter 14/20 - loss 0.91085868 - time (sec): 8.88 - samples/sec: 1202.50 - lr: 0.100000
2023-04-20 22:27:51,772 epoch 6 - iter 

100%|██████████| 3/3 [00:00<00:00,  5.18it/s]

2023-04-20 22:27:56,788 Evaluating as a multi-label problem: False
2023-04-20 22:27:56,803 DEV : loss 1.056619644165039 - f1-score (micro avg)  0.1919
2023-04-20 22:27:56,812 BAD EPOCHS (no improvement): 0
2023-04-20 22:27:56,816 saving best model





2023-04-20 22:27:58,861 ----------------------------------------------------------------------------------------------------
2023-04-20 22:28:00,138 epoch 7 - iter 2/20 - loss 0.90528603 - time (sec): 1.25 - samples/sec: 1439.41 - lr: 0.100000
2023-04-20 22:28:01,635 epoch 7 - iter 4/20 - loss 0.87196849 - time (sec): 2.75 - samples/sec: 1239.05 - lr: 0.100000
2023-04-20 22:28:02,926 epoch 7 - iter 6/20 - loss 0.87016466 - time (sec): 4.04 - samples/sec: 1219.24 - lr: 0.100000
2023-04-20 22:28:04,802 epoch 7 - iter 8/20 - loss 0.85880338 - time (sec): 5.92 - samples/sec: 1086.74 - lr: 0.100000
2023-04-20 22:28:07,706 epoch 7 - iter 10/20 - loss 0.85528245 - time (sec): 8.82 - samples/sec: 899.62 - lr: 0.100000
2023-04-20 22:28:10,217 epoch 7 - iter 12/20 - loss 0.86989500 - time (sec): 11.33 - samples/sec: 835.08 - lr: 0.100000
2023-04-20 22:28:11,403 epoch 7 - iter 14/20 - loss 0.84026612 - time (sec): 12.52 - samples/sec: 879.05 - lr: 0.100000
2023-04-20 22:28:13,192 epoch 7 - iter 1

100%|██████████| 3/3 [00:00<00:00,  5.67it/s]

2023-04-20 22:28:16,141 Evaluating as a multi-label problem: False
2023-04-20 22:28:16,155 DEV : loss 0.9491550922393799 - f1-score (micro avg)  0.2022
2023-04-20 22:28:16,164 BAD EPOCHS (no improvement): 0
2023-04-20 22:28:16,171 saving best model





2023-04-20 22:28:18,035 ----------------------------------------------------------------------------------------------------
2023-04-20 22:28:18,516 epoch 8 - iter 2/20 - loss 0.80295225 - time (sec): 0.48 - samples/sec: 3481.60 - lr: 0.100000
2023-04-20 22:28:19,617 epoch 8 - iter 4/20 - loss 0.80997638 - time (sec): 1.58 - samples/sec: 1928.04 - lr: 0.100000
2023-04-20 22:28:21,711 epoch 8 - iter 6/20 - loss 0.79356619 - time (sec): 3.67 - samples/sec: 1364.54 - lr: 0.100000
2023-04-20 22:28:22,821 epoch 8 - iter 8/20 - loss 0.83645465 - time (sec): 4.78 - samples/sec: 1338.56 - lr: 0.100000
2023-04-20 22:28:24,057 epoch 8 - iter 10/20 - loss 0.83781213 - time (sec): 6.02 - samples/sec: 1313.03 - lr: 0.100000
2023-04-20 22:28:25,449 epoch 8 - iter 12/20 - loss 0.86057603 - time (sec): 7.41 - samples/sec: 1281.23 - lr: 0.100000
2023-04-20 22:28:27,245 epoch 8 - iter 14/20 - loss 0.85096473 - time (sec): 9.21 - samples/sec: 1245.58 - lr: 0.100000
2023-04-20 22:28:28,576 epoch 8 - iter 

100%|██████████| 3/3 [00:00<00:00,  5.89it/s]

2023-04-20 22:28:31,660 Evaluating as a multi-label problem: False
2023-04-20 22:28:31,675 DEV : loss 0.9722471237182617 - f1-score (micro avg)  0.1914
2023-04-20 22:28:31,684 BAD EPOCHS (no improvement): 1
2023-04-20 22:28:31,689 ----------------------------------------------------------------------------------------------------





2023-04-20 22:28:32,077 epoch 9 - iter 2/20 - loss 0.71561732 - time (sec): 0.38 - samples/sec: 3448.08 - lr: 0.100000
2023-04-20 22:28:33,774 epoch 9 - iter 4/20 - loss 0.76564072 - time (sec): 2.08 - samples/sec: 1582.07 - lr: 0.100000
2023-04-20 22:28:34,989 epoch 9 - iter 6/20 - loss 0.78965760 - time (sec): 3.30 - samples/sec: 1573.91 - lr: 0.100000
2023-04-20 22:28:36,634 epoch 9 - iter 8/20 - loss 0.79364395 - time (sec): 4.94 - samples/sec: 1398.75 - lr: 0.100000
2023-04-20 22:28:37,725 epoch 9 - iter 10/20 - loss 0.76717755 - time (sec): 6.03 - samples/sec: 1381.53 - lr: 0.100000
2023-04-20 22:28:38,888 epoch 9 - iter 12/20 - loss 0.75636667 - time (sec): 7.19 - samples/sec: 1363.48 - lr: 0.100000
2023-04-20 22:28:40,193 epoch 9 - iter 14/20 - loss 0.76183876 - time (sec): 8.50 - samples/sec: 1351.03 - lr: 0.100000
2023-04-20 22:28:41,192 epoch 9 - iter 16/20 - loss 0.76433498 - time (sec): 9.50 - samples/sec: 1373.36 - lr: 0.100000
2023-04-20 22:28:42,103 epoch 9 - iter 18/20

100%|██████████| 3/3 [00:00<00:00,  5.91it/s]

2023-04-20 22:28:44,244 Evaluating as a multi-label problem: False
2023-04-20 22:28:44,259 DEV : loss 0.8437137603759766 - f1-score (micro avg)  0.1745
2023-04-20 22:28:44,268 BAD EPOCHS (no improvement): 2
2023-04-20 22:28:44,274 ----------------------------------------------------------------------------------------------------
2023-04-20 22:28:44,790 epoch 10 - iter 2/20 - loss 0.69348177 - time (sec): 0.51 - samples/sec: 2902.20 - lr: 0.100000
2023-04-20 22:28:46,265 epoch 10 - iter 4/20 - loss 0.70861294 - time (sec): 1.99 - samples/sec: 1703.11 - lr: 0.100000
2023-04-20 22:28:47,251 epoch 10 - iter 6/20 - loss 0.69704446 - time (sec): 2.98 - samples/sec: 1664.48 - lr: 0.100000
2023-04-20 22:28:48,155 epoch 10 - iter 8/20 - loss 0.71719815 - time (sec): 3.88 - samples/sec: 1650.57 - lr: 0.100000
2023-04-20 22:28:49,176 epoch 10 - iter 10/20 - loss 0.72623410 - time (sec): 4.90 - samples/sec: 1654.99 - lr: 0.100000
2023-04-20 22:28:50,150 epoch 10 - iter 12/20 - loss 0.73238311 - time (sec): 5.87 - samples/sec: 1606.16 - lr: 0.100000
2023-04-20 22:28:51,310 epoch 10 - iter 14/20 - loss 0.75007400 - time (sec): 7.03 - samples/sec: 1588.04 - lr: 0.100000
2023-04-20 22:28:53,332 epoch 10 - iter 16/20 - loss 0.73357089 - time (sec): 9.06 - samples/sec: 1441.28 - lr: 0.100000
2023-04-20 22:28:54,473 epoch 10 - i

100%|██████████| 3/3 [00:00<00:00,  5.81it/s]

2023-04-20 22:28:56,804 Evaluating as a multi-label problem: False
2023-04-20 22:28:56,818 DEV : loss 0.7977104187011719 - f1-score (micro avg)  0.2151
2023-04-20 22:28:56,827 BAD EPOCHS (no improvement): 0
2023-04-20 22:28:56,832 saving best model
2023-04-20 22:28:58,663 ----------------------------------------------------------------------------------------------------
2023-04-20 22:28:59,226 epoch 11 - iter 2/20 - loss 0.77776736 - time (sec): 0.56 - samples/sec: 2869.31 - lr: 0.100000
2023-04-20 22:29:00,255 epoch 11 - iter 4/20 - loss 0.74587860 - time (sec): 1.58 - samples/sec: 2078.43 - lr: 0.100000
2023-04-20 22:29:01,596 epoch 11 - iter 6/20 - loss 0.73509002 - time (sec): 2.92 - samples/sec: 1745.69 - lr: 0.100000
2023-04-20 22:29:03,341 epoch 11 - iter 8/20 - loss 0.72404635 - time (sec): 4.67 - samples/sec: 1528.40 - lr: 0.100000
2023-04-20 22:29:04,261 epoch 11 - iter 10/20 - loss 0.70224119 - time (sec): 5.59 - samples/sec: 1561.85 - lr: 0.100000
2023-04-20 22:29:05,162 epoch 11 - iter 12/20 - loss 0.70979435 - time (sec): 6.49 - samples/sec: 1557.63 - lr: 0.100000
2023-04-20 22:29:06,481 epoch 11 - iter 14/20 - loss 0.69709488 - time (sec): 7.81 - samples/sec: 1480.07 - lr: 0.100000
2023-04-20 22:29:08,152 epoch 11

100%|██████████| 3/3 [00:00<00:00,  5.79it/s]

2023-04-20 22:29:12,553 Evaluating as a multi-label problem: False
2023-04-20 22:29:12,568 DEV : loss 0.737370491027832 - f1-score (micro avg)  0.2564
2023-04-20 22:29:12,577 BAD EPOCHS (no improvement): 0
2023-04-20 22:29:12,582 saving best model
2023-04-20 22:29:14,429 ----------------------------------------------------------------------------------------------------
2023-04-20 22:29:14,830 epoch 12 - iter 2/20 - loss 0.74490827 - time (sec): 0.39 - samples/sec: 3600.74 - lr: 0.100000
2023-04-20 22:29:16,275 epoch 12 - iter 4/20 - loss 0.70632111 - time (sec): 1.84 - samples/sec: 1760.10 - lr: 0.100000
2023-04-20 22:29:17,288 epoch 12 - iter 6/20 - loss 0.68470299 - time (sec): 2.85 - samples/sec: 1687.61 - lr: 0.100000
2023-04-20 22:29:18,239 epoch 12 - iter 8/20 - loss 0.66621473 - time (sec): 3.80 - samples/sec: 1641.69 - lr: 0.100000
2023-04-20 22:29:19,238 epoch 12 - iter 10/20 - loss 0.65862940 - time (sec): 4.80 - samples/sec: 1631.24 - lr: 0.100000
2023-04-20 22:29:20,401 epoch 12 - iter 12/20 - loss 0.66807390 - time (sec): 5.96 - samples/sec: 1613.15 - lr: 0.100000
2023-04-20 22:29:21,621 epoch 12 - iter 14/20 - loss 0.65796398 - time (sec): 7.18 - samples/sec: 1565.50 - lr: 0.100000
2023-04-20 22:29:23,418 epoch 12

100%|██████████| 3/3 [00:00<00:00,  5.82it/s]

2023-04-20 22:29:28,402 Evaluating as a multi-label problem: False
2023-04-20 22:29:28,416 DEV : loss 0.7082787156105042 - f1-score (micro avg)  0.1739
2023-04-20 22:29:28,425 BAD EPOCHS (no improvement): 1
2023-04-20 22:29:28,430 ----------------------------------------------------------------------------------------------------
2023-04-20 22:29:28,802 epoch 13 - iter 2/20 - loss 0.62158112 - time (sec): 0.37 - samples/sec: 4387.22 - lr: 0.100000
2023-04-20 22:29:29,750 epoch 13 - iter 4/20 - loss 0.61135894 - time (sec): 1.32 - samples/sec: 2421.21 - lr: 0.100000
2023-04-20 22:29:31,154 epoch 13 - iter 6/20 - loss 0.63799981 - time (sec): 2.72 - samples/sec: 1877.56 - lr: 0.100000
2023-04-20 22:29:32,022 epoch 13 - iter 8/20 - loss 0.64228846 - time (sec): 3.59 - samples/sec: 1772.72 - lr: 0.100000
2023-04-20 22:29:33,714 epoch 13 - iter 10/20 - loss 0.63998232 - time (sec): 5.28 - samples/sec: 1559.17 - lr: 0.100000
2023-04-20 22:29:34,637 epoch 13 - iter 12/20 - loss 0.63989185 - time (sec): 6.20 - samples/sec: 1575.28 - lr: 0.100000
2023-04-20 22:29:35,558 epoch 13 - iter 14/20 - loss 0.63069729 - time (sec): 7.12 - samples/sec: 1585.13 - lr: 0.100000
2023-04-20 22:29:36,736 epoch 13 - iter 16/20 - loss 0.62928920 - time (sec): 8.30 - samples/sec: 1576.55 - lr: 0.100000
2023-04-20 22:29:37,813 epoch 13 - i

100%|██████████| 3/3 [00:00<00:00,  3.89it/s]

2023-04-20 22:29:40,485 Evaluating as a multi-label problem: False
2023-04-20 22:29:40,507 DEV : loss 0.6827486753463745 - f1-score (micro avg)  0.2137
2023-04-20 22:29:40,522 BAD EPOCHS (no improvement): 2
2023-04-20 22:29:40,528 ----------------------------------------------------------------------------------------------------
2023-04-20 22:29:41,206 epoch 14 - iter 2/20 - loss 0.63916892 - time (sec): 0.68 - samples/sec: 2390.12 - lr: 0.100000
2023-04-20 22:29:43,258 epoch 14 - iter 4/20 - loss 0.65864218 - time (sec): 2.73 - samples/sec: 1341.50 - lr: 0.100000
2023-04-20 22:29:44,294 epoch 14 - iter 6/20 - loss 0.67900631 - time (sec): 3.76 - samples/sec: 1350.82 - lr: 0.100000
2023-04-20 22:29:45,283 epoch 14 - iter 8/20 - loss 0.67340223 - time (sec): 4.75 - samples/sec: 1429.93 - lr: 0.100000
2023-04-20 22:29:46,340 epoch 14 - iter 10/20 - loss 0.65004593 - time (sec): 5.81 - samples/sec: 1448.97 - lr: 0.100000
2023-04-20 22:29:47,317 epoch 14 - iter 12/20 - loss 0.63566424 - time (sec): 6.79 - samples/sec: 1448.75 - lr: 0.100000
2023-04-20 22:29:48,287 epoch 14 - iter 14/20 - loss 0.63594047 - time (sec): 7.76 - samples/sec: 1497.72 - lr: 0.100000
2023-04-20 22:29:49,149 epoch 14 - iter 16/20 - loss 0.63543880 - time (sec): 8.62 - samples/sec: 1510.01 - lr: 0.100000
2023-04-20 22:29:50,522 epoch 14 - i

100%|██████████| 3/3 [00:00<00:00,  5.90it/s]

2023-04-20 22:29:52,578 Evaluating as a multi-label problem: False
2023-04-20 22:29:52,592 DEV : loss 0.6971041560173035 - f1-score (micro avg)  0.1975
2023-04-20 22:29:52,602 BAD EPOCHS (no improvement): 3
2023-04-20 22:29:52,608 ----------------------------------------------------------------------------------------------------
2023-04-20 22:29:53,211 epoch 15 - iter 2/20 - loss 0.62596358 - time (sec): 0.60 - samples/sec: 2635.38 - lr: 0.100000
2023-04-20 22:29:54,965 epoch 15 - iter 4/20 - loss 0.61313355 - time (sec): 2.35 - samples/sec: 1551.80 - lr: 0.100000
2023-04-20 22:29:56,138 epoch 15 - iter 6/20 - loss 0.59839754 - time (sec): 3.53 - samples/sec: 1499.39 - lr: 0.100000
2023-04-20 22:29:57,365 epoch 15 - iter 8/20 - loss 0.60127429 - time (sec): 4.75 - samples/sec: 1429.98 - lr: 0.100000
2023-04-20 22:29:58,310 epoch 15 - iter 10/20 - loss 0.60344660 - time (sec): 5.70 - samples/sec: 1399.60 - lr: 0.100000
2023-04-20 22:30:00,110 epoch 15 - iter 12/20 - loss 0.60808677 - time (sec): 7.50 - samples/sec: 1309.99 - lr: 0.100000
2023-04-20 22:30:01,001 epoch 15 - iter 14/20 - loss 0.61401116 - time (sec): 8.39 - samples/sec: 1330.76 - lr: 0.100000
2023-04-20 22:30:02,028 epoch 15 - iter 16/20 - loss 0.60608113 - time (sec): 9.42 - samples/sec: 1364.39 - lr: 0.100000
2023-04-20 22:30:03,129 epoch 15 - i

100%|██████████| 3/3 [00:00<00:00,  5.88it/s]

2023-04-20 22:30:05,219 Evaluating as a multi-label problem: False
2023-04-20 22:30:05,237 DEV : loss 0.6363776922225952 - f1-score (micro avg)  0.297
2023-04-20 22:30:05,248 BAD EPOCHS (no improvement): 0
2023-04-20 22:30:05,254 saving best model
2023-04-20 22:30:07,077 ----------------------------------------------------------------------------------------------------
2023-04-20 22:30:08,358 epoch 16 - iter 2/20 - loss 0.64542523 - time (sec): 1.28 - samples/sec: 1687.52 - lr: 0.100000
2023-04-20 22:30:09,998 epoch 16 - iter 4/20 - loss 0.57192001 - time (sec): 2.92 - samples/sec: 1362.70 - lr: 0.100000
2023-04-20 22:30:11,458 epoch 16 - iter 6/20 - loss 0.55344652 - time (sec): 4.38 - samples/sec: 1328.17 - lr: 0.100000
2023-04-20 22:30:12,601 epoch 16 - iter 8/20 - loss 0.60939159 - time (sec): 5.52 - samples/sec: 1322.06 - lr: 0.100000
2023-04-20 22:30:13,724 epoch 16 - iter 10/20 - loss 0.59508180 - time (sec): 6.64 - samples/sec: 1341.14 - lr: 0.100000
2023-04-20 22:30:14,941 epoch 16 - iter 12/20 - loss 0.59031425 - time (sec): 7.86 - samples/sec: 1323.82 - lr: 0.100000
2023-04-20 22:30:16,205 epoch 16 - iter 14/20 - loss 0.58834186 - time (sec): 9.13 - samples/sec: 1294.03 - lr: 0.100000
2023-04-20 22:30:17,357 epoch 16

100%|██████████| 3/3 [00:00<00:00,  5.91it/s]

2023-04-20 22:30:20,725 Evaluating as a multi-label problem: False
2023-04-20 22:30:20,739 DEV : loss 0.6743456125259399 - f1-score (micro avg)  0.2602
2023-04-20 22:30:20,749 BAD EPOCHS (no improvement): 1
2023-04-20 22:30:20,753 ----------------------------------------------------------------------------------------------------
2023-04-20 22:30:21,894 epoch 17 - iter 2/20 - loss 0.59700945 - time (sec): 1.14 - samples/sec: 1717.34 - lr: 0.100000
2023-04-20 22:30:22,787 epoch 17 - iter 4/20 - loss 0.58065549 - time (sec): 2.03 - samples/sec: 1626.53 - lr: 0.100000
2023-04-20 22:30:23,831 epoch 17 - iter 6/20 - loss 0.59679899 - time (sec): 3.08 - samples/sec: 1560.34 - lr: 0.100000
2023-04-20 22:30:25,088 epoch 17 - iter 8/20 - loss 0.57313320 - time (sec): 4.33 - samples/sec: 1451.15 - lr: 0.100000
2023-04-20 22:30:26,109 epoch 17 - iter 10/20 - loss 0.58987594 - time (sec): 5.35 - samples/sec: 1429.52 - lr: 0.100000
2023-04-20 22:30:27,957 epoch 17 - iter 12/20 - loss 0.58187286 - time (sec): 7.20 - samples/sec: 1315.74 - lr: 0.100000
2023-04-20 22:30:29,046 epoch 17 - iter 14/20 - loss 0.57807638 - time (sec): 8.29 - samples/sec: 1328.27 - lr: 0.100000
2023-04-20 22:30:30,043 epoch 17 - iter 16/20 - loss 0.57943000 - time (sec): 9.29 - samples/sec: 1344.41 - lr: 0.100000
2023-04-20 22:30:31,176 epoch 17 - i

100%|██████████| 3/3 [00:00<00:00,  6.03it/s]

2023-04-20 22:30:33,275 Evaluating as a multi-label problem: False
2023-04-20 22:30:33,290 DEV : loss 0.6569291353225708 - f1-score (micro avg)  0.2591
2023-04-20 22:30:33,299 BAD EPOCHS (no improvement): 2
2023-04-20 22:30:33,307 ----------------------------------------------------------------------------------------------------
2023-04-20 22:30:33,711 epoch 18 - iter 2/20 - loss 0.56906594 - time (sec): 0.40 - samples/sec: 3508.04 - lr: 0.100000
2023-04-20 22:30:34,632 epoch 18 - iter 4/20 - loss 0.55308760 - time (sec): 1.32 - samples/sec: 2202.76 - lr: 0.100000
2023-04-20 22:30:36,144 epoch 18 - iter 6/20 - loss 0.51798860 - time (sec): 2.83 - samples/sec: 1753.59 - lr: 0.100000
2023-04-20 22:30:37,148 epoch 18 - iter 8/20 - loss 0.54348721 - time (sec): 3.84 - samples/sec: 1727.87 - lr: 0.100000
2023-04-20 22:30:38,039 epoch 18 - iter 10/20 - loss 0.53448847 - time (sec): 4.73 - samples/sec: 1715.27 - lr: 0.100000
2023-04-20 22:30:39,214 epoch 18 - iter 12/20 - loss 0.52578852 - time (sec): 5.90 - samples/sec: 1634.98 - lr: 0.100000
2023-04-20 22:30:41,160 epoch 18 - iter 14/20 - loss 0.53031602 - time (sec): 7.85 - samples/sec: 1459.68 - lr: 0.100000
2023-04-20 22:30:42,208 epoch 18 - iter 16/20 - loss 0.52991449 - time (sec): 8.90 - samples/sec: 1437.81 - lr: 0.100000
2023-04-20 22:30:43,538 epoch 18 - i

100%|██████████| 3/3 [00:00<00:00,  5.65it/s]

2023-04-20 22:30:45,728 Evaluating as a multi-label problem: False
2023-04-20 22:30:45,743 DEV : loss 0.6197336316108704 - f1-score (micro avg)  0.2771
2023-04-20 22:30:45,754 BAD EPOCHS (no improvement): 3
2023-04-20 22:30:45,761 ----------------------------------------------------------------------------------------------------
2023-04-20 22:30:46,258 epoch 19 - iter 2/20 - loss 0.61387605 - time (sec): 0.49 - samples/sec: 3995.50 - lr: 0.100000
2023-04-20 22:30:47,309 epoch 19 - iter 4/20 - loss 0.55857745 - time (sec): 1.54 - samples/sec: 2222.91 - lr: 0.100000
2023-04-20 22:30:48,249 epoch 19 - iter 6/20 - loss 0.53701777 - time (sec): 2.48 - samples/sec: 1942.34 - lr: 0.100000
2023-04-20 22:30:49,346 epoch 19 - iter 8/20 - loss 0.51211674 - time (sec): 3.58 - samples/sec: 1818.26 - lr: 0.100000
2023-04-20 22:30:50,216 epoch 19 - iter 10/20 - loss 0.52511117 - time (sec): 4.45 - samples/sec: 1766.50 - lr: 0.100000
2023-04-20 22:30:51,922 epoch 19 - iter 12/20 - loss 0.51790439 - time (sec): 6.15 - samples/sec: 1589.12 - lr: 0.100000
2023-04-20 22:30:52,903 epoch 19 - iter 14/20 - loss 0.51131144 - time (sec): 7.14 - samples/sec: 1600.91 - lr: 0.100000
2023-04-20 22:30:53,858 epoch 19 - iter 16/20 - loss 0.52378253 - time (sec): 8.09 - samples/sec: 1587.53 - lr: 0.100000
2023-04-20 22:30:54,992 epoch 19 - i

100%|██████████| 3/3 [00:00<00:00,  3.83it/s]

2023-04-20 22:30:57,988 Evaluating as a multi-label problem: False
2023-04-20 22:30:58,009 DEV : loss 0.5348063111305237 - f1-score (micro avg)  0.3253
2023-04-20 22:30:58,023 BAD EPOCHS (no improvement): 0
2023-04-20 22:30:58,031 saving best model
2023-04-20 22:31:00,301 ----------------------------------------------------------------------------------------------------
2023-04-20 22:31:01,274 epoch 20 - iter 2/20 - loss 0.44728127 - time (sec): 0.96 - samples/sec: 2085.17 - lr: 0.100000
2023-04-20 22:31:02,315 epoch 20 - iter 4/20 - loss 0.56169753 - time (sec): 2.00 - samples/sec: 1802.64 - lr: 0.100000
2023-04-20 22:31:03,214 epoch 20 - iter 6/20 - loss 0.53934240 - time (sec): 2.90 - samples/sec: 1719.48 - lr: 0.100000
2023-04-20 22:31:04,247 epoch 20 - iter 8/20 - loss 0.52013046 - time (sec): 3.93 - samples/sec: 1658.36 - lr: 0.100000
2023-04-20 22:31:05,170 epoch 20 - iter 10/20 - loss 0.52867340 - time (sec): 4.85 - samples/sec: 1627.07 - lr: 0.100000
2023-04-20 22:31:06,188 epoch 20 - iter 12/20 - loss 0.51099114 - time (sec): 5.87 - samples/sec: 1615.80 - lr: 0.100000
2023-04-20 22:31:07,598 epoch 20 - iter 14/20 - loss 0.5

100%|██████████| 3/3 [00:00<00:00,  3.61it/s]

2023-04-20 22:31:13,556 Evaluating as a multi-label problem: False
2023-04-20 22:31:13,577 DEV : loss 0.6092150211334229 - f1-score (micro avg)  0.3158
2023-04-20 22:31:13,595 BAD EPOCHS (no improvement): 1
2023-04-20 22:31:13,601 ----------------------------------------------------------------------------------------------------
2023-04-20 22:31:14,745 epoch 21 - iter 2/20 - loss 0.36326577 - time (sec): 1.14 - samples/sec: 1906.88 - lr: 0.100000
2023-04-20 22:31:16,688 epoch 21 - iter 4/20 - loss 0.42612825 - time (sec): 3.08 - samples/sec: 1390.42 - lr: 0.100000
2023-04-20 22:31:17,666 epoch 21 - iter 6/20 - loss 0.43217444 - time (sec): 4.06 - samples/sec: 1414.85 - lr: 0.100000
2023-04-20 22:31:18,590 epoch 21 - iter 8/20 - loss 0.44800963 - time (sec): 4.99 - samples/sec: 1441.04 - lr: 0.100000
2023-04-20 22:31:19,548 epoch 21 - iter 10/20 - loss 0.46299481 - time (sec): 5.94 - samples/sec: 1475.42 - lr: 0.100000
2023-04-20 22:31:20,530 epoch 21 - iter 12/20 - loss 0.47480012 - time (sec): 6.93 - samples/sec: 1497.59 - lr: 0.100000
2023-04-20 22:31:21,458 epoch 21 - iter 14/20 - loss 0.48376114 - time (sec): 7.85 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  4.47it/s]

2023-04-20 22:31:25,567 Evaluating as a multi-label problem: False
2023-04-20 22:31:25,593 DEV : loss 0.5179629921913147 - f1-score (micro avg)  0.2628
2023-04-20 22:31:25,610 BAD EPOCHS (no improvement): 2
2023-04-20 22:31:25,617 ----------------------------------------------------------------------------------------------------
2023-04-20 22:31:26,327 epoch 22 - iter 2/20 - loss 0.46823225 - time (sec): 0.71 - samples/sec: 2389.50 - lr: 0.100000
2023-04-20 22:31:27,570 epoch 22 - iter 4/20 - loss 0.46189109 - time (sec): 1.95 - samples/sec: 1688.92 - lr: 0.100000
2023-04-20 22:31:28,714 epoch 22 - iter 6/20 - loss 0.48509149 - time (sec): 3.09 - samples/sec: 1619.79 - lr: 0.100000
2023-04-20 22:31:29,802 epoch 22 - iter 8/20 - loss 0.48937673 - time (sec): 4.18 - samples/sec: 1491.19 - lr: 0.100000
2023-04-20 22:31:30,854 epoch 22 - iter 10/20 - loss 0.50014703 - time (sec): 5.23 - samples/sec: 1413.43 - lr: 0.100000
2023-04-20 22:31:31,777 epoch 22 - iter 12/20 - loss 0.49720533 - time (sec): 6.16 - samples/sec: 1451.56 - lr: 0.100000
2023-04-20 22:31:32,881 epoch 22 - iter 14/20 - loss 0.51893333 - time (sec): 7.26 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  5.73it/s]

2023-04-20 22:31:38,056 Evaluating as a multi-label problem: False
2023-04-20 22:31:38,074 DEV : loss 0.49626293778419495 - f1-score (micro avg)  0.3484
2023-04-20 22:31:38,085 BAD EPOCHS (no improvement): 0
2023-04-20 22:31:38,089 saving best model
2023-04-20 22:31:40,242 ----------------------------------------------------------------------------------------------------
2023-04-20 22:31:40,759 epoch 23 - iter 2/20 - loss 0.41618902 - time (sec): 0.51 - samples/sec: 2885.87 - lr: 0.100000
2023-04-20 22:31:41,928 epoch 23 - iter 4/20 - loss 0.44595145 - time (sec): 1.68 - samples/sec: 1855.41 - lr: 0.100000
2023-04-20 22:31:43,217 epoch 23 - iter 6/20 - loss 0.43891944 - time (sec): 2.97 - samples/sec: 1621.03 - lr: 0.100000
2023-04-20 22:31:45,407 epoch 23 - iter 8/20 - loss 0.46353027 - time (sec): 5.16 - samples/sec: 1363.63 - lr: 0.100000
2023-04-20 22:31:46,890 epoch 23 - iter 10/20 - loss 0.45195593 - time (sec): 6.64 - samples/sec: 1361.20 - lr: 0.100000
2023-04-20 22:31:48,393 epoch 23 - iter 12/20 - loss 0.45465919 - time (sec): 8.15 - samples/sec: 1285.13 - lr: 0.100000
2023-04-20 22:31:49,643 epoch 23 - iter 14/20 - loss 0.44793915 - time (sec): 9.40 - samples/sec: 1250.81 - lr: 0.100000
2023-04-20 22:31:50,813 epoch 23

100%|██████████| 3/3 [00:00<00:00,  5.91it/s]

2023-04-20 22:31:53,910 Evaluating as a multi-label problem: False
2023-04-20 22:31:53,925 DEV : loss 0.49198031425476074 - f1-score (micro avg)  0.3497
2023-04-20 22:31:53,935 BAD EPOCHS (no improvement): 0
2023-04-20 22:31:53,938 saving best model
2023-04-20 22:31:55,848 ----------------------------------------------------------------------------------------------------
2023-04-20 22:31:57,021 epoch 24 - iter 2/20 - loss 0.49572500 - time (sec): 1.16 - samples/sec: 1780.87 - lr: 0.100000
2023-04-20 22:31:58,152 epoch 24 - iter 4/20 - loss 0.46744254 - time (sec): 2.29 - samples/sec: 1558.15 - lr: 0.100000
2023-04-20 22:31:59,349 epoch 24 - iter 6/20 - loss 0.46986890 - time (sec): 3.49 - samples/sec: 1428.44 - lr: 0.100000
2023-04-20 22:32:00,908 epoch 24 - iter 8/20 - loss 0.44110272 - time (sec): 5.05 - samples/sec: 1290.40 - lr: 0.100000
2023-04-20 22:32:01,970 epoch 24 - iter 10/20 - loss 0.44031112 - time (sec): 6.11 - samples/sec: 1296.05 - lr: 0.100000
2023-04-20 22:32:02,887 epoch 24 - iter 12/20 - loss 0.43643403 - time (sec): 7.03 - samples/sec: 1332.89 - lr: 0.100000
2023-04-20 22:32:04,762 epoch 24 - iter 14/20 - loss 0.44636864 - time (sec): 8.90 - samples/sec: 1252.14 - lr: 0.100000
2023-04-20 22:32:06,172 epoch 24

100%|██████████| 3/3 [00:00<00:00,  5.48it/s]

2023-04-20 22:32:09,797 Evaluating as a multi-label problem: False
2023-04-20 22:32:09,811 DEV : loss 0.5142960548400879 - f1-score (micro avg)  0.3695
2023-04-20 22:32:09,821 BAD EPOCHS (no improvement): 0
2023-04-20 22:32:09,825 saving best model
2023-04-20 22:32:11,751 ----------------------------------------------------------------------------------------------------
2023-04-20 22:32:13,159 epoch 25 - iter 2/20 - loss 0.44791096 - time (sec): 1.40 - samples/sec: 1210.95 - lr: 0.100000
2023-04-20 22:32:14,385 epoch 25 - iter 4/20 - loss 0.47838651 - time (sec): 2.63 - samples/sec: 1319.38 - lr: 0.100000
2023-04-20 22:32:16,317 epoch 25 - iter 6/20 - loss 0.42772296 - time (sec): 4.56 - samples/sec: 1234.00 - lr: 0.100000
2023-04-20 22:32:17,447 epoch 25 - iter 8/20 - loss 0.43126893 - time (sec): 5.69 - samples/sec: 1244.56 - lr: 0.100000
2023-04-20 22:32:18,407 epoch 25 - iter 10/20 - loss 0.43908637 - time (sec): 6.65 - samples/sec: 1272.42 - lr: 0.100000
2023-04-20 22:32:19,750 epoch 25 - iter 12/20 - loss 0.43185706 - time (sec): 7.99 - samples/sec: 1240.70 - lr: 0.100000
2023-04-20 22:32:21,079 epoch 25 - iter 14/20 - loss 0.43938871 - time (sec): 9.32 - samples/sec: 1214.31 - lr: 0.100000
2023-04-20 22:32:22,395 epoch 25

100%|██████████| 3/3 [00:00<00:00,  5.87it/s]

2023-04-20 22:32:25,805 Evaluating as a multi-label problem: False
2023-04-20 22:32:25,821 DEV : loss 0.4645352065563202 - f1-score (micro avg)  0.3958
2023-04-20 22:32:25,832 BAD EPOCHS (no improvement): 0
2023-04-20 22:32:25,837 saving best model
2023-04-20 22:32:27,888 ----------------------------------------------------------------------------------------------------
2023-04-20 22:32:28,944 epoch 26 - iter 2/20 - loss 0.38044096 - time (sec): 1.03 - samples/sec: 1743.14 - lr: 0.100000
2023-04-20 22:32:30,189 epoch 26 - iter 4/20 - loss 0.39968126 - time (sec): 2.28 - samples/sec: 1551.08 - lr: 0.100000
2023-04-20 22:32:31,528 epoch 26 - iter 6/20 - loss 0.40386477 - time (sec): 3.62 - samples/sec: 1436.73 - lr: 0.100000
2023-04-20 22:32:32,673 epoch 26 - iter 8/20 - loss 0.40923269 - time (sec): 4.76 - samples/sec: 1389.51 - lr: 0.100000
2023-04-20 22:32:33,700 epoch 26 - iter 10/20 - loss 0.41852963 - time (sec): 5.79 - samples/sec: 1396.91 - lr: 0.100000
2023-04-20 22:32:34,763 epoch 26 - iter 12/20 - loss 0.43446265 - time (sec): 6.85 - samples/sec: 1366.90 - lr: 0.100000
2023-04-20 22:32:36,090 epoch 26 - iter 14/20 - loss 0.43317759 - time (sec): 8.18 - samples/sec: 1361.87 - lr: 0.100000
2023-04-20 22:32:38,219 epoch 26

100%|██████████| 3/3 [00:00<00:00,  5.94it/s]

2023-04-20 22:32:41,350 Evaluating as a multi-label problem: False
2023-04-20 22:32:41,369 DEV : loss 0.4301404356956482 - f1-score (micro avg)  0.4211
2023-04-20 22:32:41,379 BAD EPOCHS (no improvement): 0
2023-04-20 22:32:41,383 saving best model
2023-04-20 22:32:43,390 ----------------------------------------------------------------------------------------------------
2023-04-20 22:32:44,051 epoch 27 - iter 2/20 - loss 0.41188761 - time (sec): 0.66 - samples/sec: 2294.08 - lr: 0.100000
2023-04-20 22:32:45,835 epoch 27 - iter 4/20 - loss 0.44099879 - time (sec): 2.44 - samples/sec: 1490.32 - lr: 0.100000
2023-04-20 22:32:47,315 epoch 27 - iter 6/20 - loss 0.43173865 - time (sec): 3.92 - samples/sec: 1363.42 - lr: 0.100000
2023-04-20 22:32:49,320 epoch 27 - iter 8/20 - loss 0.44478254 - time (sec): 5.93 - samples/sec: 1223.72 - lr: 0.100000
2023-04-20 22:32:50,487 epoch 27 - iter 10/20 - loss 0.44222001 - time (sec): 7.09 - samples/sec: 1220.51 - lr: 0.100000
2023-04-20 22:32:51,632 epoch 27 - iter 12/20 - loss 0.43483664 - time (sec): 8.24 - samples/sec: 1239.52 - lr: 0.100000
2023-04-20 22:32:52,927 epoch 27 - iter 14/20 - loss 0.42516722 - time (sec): 9.53 - samples/sec: 1220.09 - lr: 0.100000
2023-04-20 22:32:54,110 epoch 27

100%|██████████| 3/3 [00:00<00:00,  5.68it/s]

2023-04-20 22:32:57,260 Evaluating as a multi-label problem: False
2023-04-20 22:32:57,275 DEV : loss 0.45160385966300964 - f1-score (micro avg)  0.4015
2023-04-20 22:32:57,285 BAD EPOCHS (no improvement): 1
2023-04-20 22:32:57,292 ----------------------------------------------------------------------------------------------------
2023-04-20 22:32:57,670 epoch 28 - iter 2/20 - loss 0.41790056 - time (sec): 0.38 - samples/sec: 3900.62 - lr: 0.100000
2023-04-20 22:32:58,667 epoch 28 - iter 4/20 - loss 0.42046061 - time (sec): 1.37 - samples/sec: 2120.36 - lr: 0.100000
2023-04-20 22:33:00,851 epoch 28 - iter 6/20 - loss 0.43127253 - time (sec): 3.56 - samples/sec: 1386.19 - lr: 0.100000
2023-04-20 22:33:02,495 epoch 28 - iter 8/20 - loss 0.41741471 - time (sec): 5.20 - samples/sec: 1290.45 - lr: 0.100000
2023-04-20 22:33:03,706 epoch 28 - iter 10/20 - loss 0.42000073 - time (sec): 6.41 - samples/sec: 1270.83 - lr: 0.100000
2023-04-20 22:33:04,781 epoch 28 - iter 12/20 - loss 0.42013719 - time (sec): 7.49 - samples/sec: 1323.56 - lr: 0.100000
2023-04-20 22:33:05,648 epoch 28 - iter 14/20 - loss 0.41681188 - time (sec): 8.35 - samples/sec: 1358.74 - lr: 0.100000
2023-04-20 22:33:06,658 epoch 28 - iter 16/20 - loss 0.42364697 - time (sec): 9.36 - samples/sec: 1389.13 - lr: 0.100000
2023-04-20 22:33:07,839 epoch 28 - i

100%|██████████| 3/3 [00:00<00:00,  5.72it/s]

2023-04-20 22:33:09,918 Evaluating as a multi-label problem: False
2023-04-20 22:33:09,933 DEV : loss 0.4694359302520752 - f1-score (micro avg)  0.4116
2023-04-20 22:33:09,943 BAD EPOCHS (no improvement): 2
2023-04-20 22:33:09,948 ----------------------------------------------------------------------------------------------------
2023-04-20 22:33:10,743 epoch 29 - iter 2/20 - loss 0.43621912 - time (sec): 0.79 - samples/sec: 2057.67 - lr: 0.100000
2023-04-20 22:33:11,732 epoch 29 - iter 4/20 - loss 0.41420172 - time (sec): 1.78 - samples/sec: 1774.89 - lr: 0.100000
2023-04-20 22:33:12,648 epoch 29 - iter 6/20 - loss 0.40588012 - time (sec): 2.70 - samples/sec: 1702.87 - lr: 0.100000
2023-04-20 22:33:13,708 epoch 29 - iter 8/20 - loss 0.39877795 - time (sec): 3.76 - samples/sec: 1612.43 - lr: 0.100000
2023-04-20 22:33:15,059 epoch 29 - iter 10/20 - loss 0.38107756 - time (sec): 5.11 - samples/sec: 1516.02 - lr: 0.100000
2023-04-20 22:33:16,308 epoch 29 - iter 12/20 - loss 0.39320826 - time (sec): 6.36 - samples/sec: 1475.03 - lr: 0.100000
2023-04-20 22:33:18,479 epoch 29 - iter 14/20 - loss 0.38772567 - time (sec): 8.53 - samples/sec: 1347.75 - lr: 0.100000
2023-04-20 22:33:19,502 epoch 29 - iter 16/20 - loss 0.39241298 - time (sec): 9.55 - samples/sec: 1354.65 - lr: 0.100000
2023-04-20 22:33:20,556 epoch 29 - i

100%|██████████| 3/3 [00:00<00:00,  5.78it/s]

2023-04-20 22:33:22,617 Evaluating as a multi-label problem: False
2023-04-20 22:33:22,633 DEV : loss 0.4396190047264099 - f1-score (micro avg)  0.3895
2023-04-20 22:33:22,644 BAD EPOCHS (no improvement): 3
2023-04-20 22:33:22,649 ----------------------------------------------------------------------------------------------------
2023-04-20 22:33:23,029 epoch 30 - iter 2/20 - loss 0.35947787 - time (sec): 0.38 - samples/sec: 3821.16 - lr: 0.100000
2023-04-20 22:33:24,110 epoch 30 - iter 4/20 - loss 0.39145374 - time (sec): 1.46 - samples/sec: 2076.67 - lr: 0.100000
2023-04-20 22:33:25,774 epoch 30 - iter 6/20 - loss 0.37306685 - time (sec): 3.12 - samples/sec: 1510.91 - lr: 0.100000
2023-04-20 22:33:26,655 epoch 30 - iter 8/20 - loss 0.37786917 - time (sec): 4.00 - samples/sec: 1497.24 - lr: 0.100000
2023-04-20 22:33:27,637 epoch 30 - iter 10/20 - loss 0.38130592 - time (sec): 4.99 - samples/sec: 1518.14 - lr: 0.100000
2023-04-20 22:33:28,570 epoch 30 - iter 12/20 - loss 0.38275845 - time (sec): 5.92 - samples/sec: 1561.55 - lr: 0.100000
2023-04-20 22:33:29,800 epoch 30 - iter 14/20 - loss 0.38279578 - time (sec): 7.15 - samples/sec: 1517.31 - lr: 0.100000
2023-04-20 22:33:31,405 epoch 30 - iter 16/20 - loss 0.37226812 - time (sec): 8.75 - samples/sec: 1459.69 - lr: 0.100000
2023-04-20 22:33:32,667 epoch 30 - i

100%|██████████| 3/3 [00:00<00:00,  5.51it/s]

2023-04-20 22:33:35,115 Evaluating as a multi-label problem: False
2023-04-20 22:33:35,130 DEV : loss 0.42041081190109253 - f1-score (micro avg)  0.461
2023-04-20 22:33:35,139 BAD EPOCHS (no improvement): 0
2023-04-20 22:33:35,145 saving best model
2023-04-20 22:33:37,232 ----------------------------------------------------------------------------------------------------
2023-04-20 22:33:37,705 epoch 31 - iter 2/20 - loss 0.41359159 - time (sec): 0.45 - samples/sec: 3760.69 - lr: 0.100000
2023-04-20 22:33:38,756 epoch 31 - iter 4/20 - loss 0.40000094 - time (sec): 1.50 - samples/sec: 2083.11 - lr: 0.100000
2023-04-20 22:33:39,665 epoch 31 - iter 6/20 - loss 0.39318804 - time (sec): 2.41 - samples/sec: 1874.91 - lr: 0.100000
2023-04-20 22:33:40,751 epoch 31 - iter 8/20 - loss 0.37762679 - time (sec): 3.50 - samples/sec: 1784.45 - lr: 0.100000
2023-04-20 22:33:41,790 epoch 31 - iter 10/20 - loss 0.37702717 - time (sec): 4.53 - samples/sec: 1757.12 - lr: 0.100000
2023-04-20 22:33:42,719 epoch 31 - iter 12/20 - loss 0.38485545 - time (sec): 5.46 - samples/sec: 1700.38 - lr: 0.100000
2023-04-20 22:33:44,254 epoch 31 - iter 14/20 - loss 0.37568132 - time (sec): 7.00 - samples/sec: 1579.07 - lr: 0.100000
2023-04-20 22:33:46,297 epoch 31

100%|██████████| 3/3 [00:00<00:00,  5.71it/s]

2023-04-20 22:33:51,340 Evaluating as a multi-label problem: False
2023-04-20 22:33:51,357 DEV : loss 0.3963017761707306 - f1-score (micro avg)  0.4411
2023-04-20 22:33:51,373 BAD EPOCHS (no improvement): 1
2023-04-20 22:33:51,377 ----------------------------------------------------------------------------------------------------
2023-04-20 22:33:51,790 epoch 32 - iter 2/20 - loss 0.37304434 - time (sec): 0.41 - samples/sec: 3275.81 - lr: 0.100000
2023-04-20 22:33:52,966 epoch 32 - iter 4/20 - loss 0.36491895 - time (sec): 1.59 - samples/sec: 1950.90 - lr: 0.100000
2023-04-20 22:33:53,936 epoch 32 - iter 6/20 - loss 0.37988823 - time (sec): 2.56 - samples/sec: 1795.68 - lr: 0.100000
2023-04-20 22:33:54,819 epoch 32 - iter 8/20 - loss 0.37978762 - time (sec): 3.44 - samples/sec: 1753.42 - lr: 0.100000
2023-04-20 22:33:56,222 epoch 32 - iter 10/20 - loss 0.35851866 - time (sec): 4.84 - samples/sec: 1637.59 - lr: 0.100000
2023-04-20 22:33:57,214 epoch 32 - iter 12/20 - loss 0.36771136 - time (sec): 5.84 - samples/sec: 1627.44 - lr: 0.100000
2023-04-20 22:33:58,944 epoch 32 - iter 14/20 - loss 0.37306804 - time (sec): 7.57 - samples/sec: 1500.11 - lr: 0.100000
2023-04-20 22:33:59,870 epoch 32 - iter 16/20 - loss 0.37976993 - time (sec): 8.49 - samples/sec: 1496.50 - lr: 0.100000
2023-04-20 22:34:00,948 epoch 32 - i

100%|██████████| 3/3 [00:00<00:00,  3.74it/s]

2023-04-20 22:34:03,682 Evaluating as a multi-label problem: False
2023-04-20 22:34:03,706 DEV : loss 0.4166812002658844 - f1-score (micro avg)  0.4317
2023-04-20 22:34:03,724 BAD EPOCHS (no improvement): 2
2023-04-20 22:34:03,733 ----------------------------------------------------------------------------------------------------
2023-04-20 22:34:04,353 epoch 33 - iter 2/20 - loss 0.34987571 - time (sec): 0.62 - samples/sec: 2595.30 - lr: 0.100000
2023-04-20 22:34:05,594 epoch 33 - iter 4/20 - loss 0.36992907 - time (sec): 1.86 - samples/sec: 1635.67 - lr: 0.100000
2023-04-20 22:34:06,633 epoch 33 - iter 6/20 - loss 0.38638558 - time (sec): 2.90 - samples/sec: 1549.98 - lr: 0.100000
2023-04-20 22:34:07,970 epoch 33 - iter 8/20 - loss 0.37709968 - time (sec): 4.24 - samples/sec: 1456.17 - lr: 0.100000
2023-04-20 22:34:09,728 epoch 33 - iter 10/20 - loss 0.37341463 - time (sec): 5.99 - samples/sec: 1382.92 - lr: 0.100000
2023-04-20 22:34:10,786 epoch 33 - iter 12/20 - loss 0.37101527 - time (sec): 7.05 - samples/sec: 1408.17 - lr: 0.100000
2023-04-20 22:34:11,730 epoch 33 - iter 14/20 - loss 0.37213272 - time (sec): 8.00 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  5.58it/s]

2023-04-20 22:34:15,721 Evaluating as a multi-label problem: False
2023-04-20 22:34:15,745 DEV : loss 0.4016941785812378 - f1-score (micro avg)  0.4803
2023-04-20 22:34:15,758 BAD EPOCHS (no improvement): 0
2023-04-20 22:34:15,766 saving best model
2023-04-20 22:34:18,108 ----------------------------------------------------------------------------------------------------
2023-04-20 22:34:18,723 epoch 34 - iter 2/20 - loss 0.37675405 - time (sec): 0.60 - samples/sec: 2419.95 - lr: 0.100000
2023-04-20 22:34:20,901 epoch 34 - iter 4/20 - loss 0.35810268 - time (sec): 2.78 - samples/sec: 1234.27 - lr: 0.100000
2023-04-20 22:34:22,140 epoch 34 - iter 6/20 - loss 0.33122500 - time (sec): 4.02 - samples/sec: 1341.65 - lr: 0.100000
2023-04-20 22:34:23,043 epoch 34 - iter 8/20 - loss 0.33474084 - time (sec): 4.92 - samples/sec: 1393.24 - lr: 0.100000
2023-04-20 22:34:24,162 epoch 34 - iter 10/20 - loss 0.33807461 - time (sec): 6.04 - samples/sec: 1389.07 - lr: 0.100000
2023-04-20 22:34:25,843 epoch 34 - iter 12/20 - loss 0.33508850 - time (sec): 7.72 - samples/sec: 1327.78 - lr: 0.100000
2023-04-20 22:34:26,968 epoch 34 - iter 14/20 - loss 0.34543962 - time (sec): 8.85 - samples/sec: 1313.95 - lr: 0.100000
2023-04-20 22:34:28,164 epoch 34

100%|██████████| 3/3 [00:00<00:00,  5.30it/s]

2023-04-20 22:34:31,492 Evaluating as a multi-label problem: False
2023-04-20 22:34:31,516 DEV : loss 0.4096400737762451 - f1-score (micro avg)  0.4502
2023-04-20 22:34:31,533 BAD EPOCHS (no improvement): 1
2023-04-20 22:34:31,539 ----------------------------------------------------------------------------------------------------
2023-04-20 22:34:32,330 epoch 35 - iter 2/20 - loss 0.31595221 - time (sec): 0.79 - samples/sec: 2024.40 - lr: 0.100000
2023-04-20 22:34:33,417 epoch 35 - iter 4/20 - loss 0.33091861 - time (sec): 1.88 - samples/sec: 1648.64 - lr: 0.100000
2023-04-20 22:34:35,148 epoch 35 - iter 6/20 - loss 0.33485588 - time (sec): 3.61 - samples/sec: 1378.58 - lr: 0.100000
2023-04-20 22:34:36,284 epoch 35 - iter 8/20 - loss 0.33239531 - time (sec): 4.74 - samples/sec: 1374.76 - lr: 0.100000
2023-04-20 22:34:37,307 epoch 35 - iter 10/20 - loss 0.33479872 - time (sec): 5.77 - samples/sec: 1362.45 - lr: 0.100000
2023-04-20 22:34:38,313 epoch 35 - iter 12/20 - loss 0.34286459 - time (sec): 6.77 - samples/sec: 1413.57 - lr: 0.100000
2023-04-20 22:34:39,193 epoch 35 - iter 14/20 - loss 0.34851964 - time (sec): 7.65 - samples/sec: 1425.33 - lr: 0.100000
2023-04-20 22:34:41,043 epoch 35 - iter 16/20 - loss 0.34821582 - time (sec): 9.50 - samples/sec: 1366.65 - lr: 0.100000
2023-04-20 22:34:42,055 epoch 35 - i

100%|██████████| 3/3 [00:00<00:00,  6.06it/s]

2023-04-20 22:34:44,193 Evaluating as a multi-label problem: False
2023-04-20 22:34:44,209 DEV : loss 0.43903854489326477 - f1-score (micro avg)  0.3969
2023-04-20 22:34:44,219 BAD EPOCHS (no improvement): 2
2023-04-20 22:34:44,224 ----------------------------------------------------------------------------------------------------
2023-04-20 22:34:44,663 epoch 36 - iter 2/20 - loss 0.43430363 - time (sec): 0.43 - samples/sec: 3510.26 - lr: 0.100000
2023-04-20 22:34:45,603 epoch 36 - iter 4/20 - loss 0.36464325 - time (sec): 1.37 - samples/sec: 2183.93 - lr: 0.100000
2023-04-20 22:34:46,466 epoch 36 - iter 6/20 - loss 0.36526251 - time (sec): 2.24 - samples/sec: 1910.01 - lr: 0.100000
2023-04-20 22:34:48,621 epoch 36 - iter 8/20 - loss 0.37376600 - time (sec): 4.39 - samples/sec: 1450.61 - lr: 0.100000
2023-04-20 22:34:49,889 epoch 36 - iter 10/20 - loss 0.36649755 - time (sec): 5.66 - samples/sec: 1425.38 - lr: 0.100000
2023-04-20 22:34:50,994 epoch 36 - iter 12/20 - loss 0.36297344 - time (sec): 6.76 - samples/sec: 1376.51 - lr: 0.100000
2023-04-20 22:34:52,249 epoch 36 - iter 14/20 - loss 0.35814797 - time (sec): 8.02 - samples/sec: 1347.73 - lr: 0.100000
2023-04-20 22:34:53,207 epoch 36 - iter 16/20 - loss 0.35673999 - time (sec): 8.98 - samples/sec: 1372.83 - lr: 0.100000
2023-04-20 22:34:54,194 epoch 36 - i

100%|██████████| 3/3 [00:00<00:00,  5.90it/s]

2023-04-20 22:34:56,761 Evaluating as a multi-label problem: False
2023-04-20 22:34:56,776 DEV : loss 0.3630940318107605 - f1-score (micro avg)  0.4823
2023-04-20 22:34:56,785 BAD EPOCHS (no improvement): 0
2023-04-20 22:34:56,790 saving best model
2023-04-20 22:34:58,633 ----------------------------------------------------------------------------------------------------
2023-04-20 22:34:59,079 epoch 37 - iter 2/20 - loss 0.32025555 - time (sec): 0.44 - samples/sec: 3140.00 - lr: 0.100000
2023-04-20 22:35:00,297 epoch 37 - iter 4/20 - loss 0.30347584 - time (sec): 1.66 - samples/sec: 1869.28 - lr: 0.100000
2023-04-20 22:35:01,352 epoch 37 - iter 6/20 - loss 0.32377213 - time (sec): 2.71 - samples/sec: 1796.11 - lr: 0.100000
2023-04-20 22:35:02,488 epoch 37 - iter 8/20 - loss 0.34410224 - time (sec): 3.85 - samples/sec: 1635.48 - lr: 0.100000
2023-04-20 22:35:03,570 epoch 37 - iter 10/20 - loss 0.34047885 - time (sec): 4.93 - samples/sec: 1581.54 - lr: 0.100000
2023-04-20 22:35:05,985 epoch 37 - iter 12/20 - loss 0.34760387 - time (sec): 7.35 - samples/sec: 1319.22 - lr: 0.100000
2023-04-20 22:35:07,617 epoch 37 - iter 14/20 - loss 0.35122784 - time (sec): 8.98 - samples/sec: 1255.38 - lr: 0.100000
2023-04-20 22:35:09,380 epoch 37

100%|██████████| 3/3 [00:00<00:00,  5.75it/s]

2023-04-20 22:35:12,323 Evaluating as a multi-label problem: False
2023-04-20 22:35:12,345 DEV : loss 0.41674861311912537 - f1-score (micro avg)  0.412
2023-04-20 22:35:12,358 BAD EPOCHS (no improvement): 1
2023-04-20 22:35:12,364 ----------------------------------------------------------------------------------------------------
2023-04-20 22:35:12,648 epoch 38 - iter 2/20 - loss 0.40123861 - time (sec): 0.28 - samples/sec: 4487.35 - lr: 0.100000
2023-04-20 22:35:13,594 epoch 38 - iter 4/20 - loss 0.35447060 - time (sec): 1.23 - samples/sec: 2287.41 - lr: 0.100000
2023-04-20 22:35:15,016 epoch 38 - iter 6/20 - loss 0.31272404 - time (sec): 2.65 - samples/sec: 1765.72 - lr: 0.100000
2023-04-20 22:35:15,907 epoch 38 - iter 8/20 - loss 0.31252377 - time (sec): 3.54 - samples/sec: 1729.52 - lr: 0.100000
2023-04-20 22:35:16,878 epoch 38 - iter 10/20 - loss 0.31179854 - time (sec): 4.51 - samples/sec: 1714.83 - lr: 0.100000
2023-04-20 22:35:17,941 epoch 38 - iter 12/20 - loss 0.32024039 - time (sec): 5.57 - samples/sec: 1677.87 - lr: 0.100000
2023-04-20 22:35:19,327 epoch 38 - iter 14/20 - loss 0.32654811 - time (sec): 6.96 - samples/sec: 1592.36 - lr: 0.100000
2023-04-20 22:35:21,399 epoch 38 - iter 16/20 - loss 0.32558902 - time (sec): 9.03 - samples/sec: 1445.23 - lr: 0.100000
2023-04-20 22:35:22,559 epoch 38 - i

100%|██████████| 3/3 [00:00<00:00,  5.61it/s]

2023-04-20 22:35:24,762 Evaluating as a multi-label problem: False
2023-04-20 22:35:24,777 DEV : loss 0.37492287158966064 - f1-score (micro avg)  0.4915
2023-04-20 22:35:24,789 BAD EPOCHS (no improvement): 0
2023-04-20 22:35:24,794 saving best model
2023-04-20 22:35:26,608 ----------------------------------------------------------------------------------------------------
2023-04-20 22:35:27,026 epoch 39 - iter 2/20 - loss 0.31849669 - time (sec): 0.40 - samples/sec: 3810.16 - lr: 0.100000
2023-04-20 22:35:27,957 epoch 39 - iter 4/20 - loss 0.32694259 - time (sec): 1.33 - samples/sec: 2142.38 - lr: 0.100000
2023-04-20 22:35:29,721 epoch 39 - iter 6/20 - loss 0.33074487 - time (sec): 3.10 - samples/sec: 1603.51 - lr: 0.100000
2023-04-20 22:35:30,780 epoch 39 - iter 8/20 - loss 0.32255106 - time (sec): 4.16 - samples/sec: 1626.02 - lr: 0.100000
2023-04-20 22:35:31,810 epoch 39 - iter 10/20 - loss 0.32825317 - time (sec): 5.19 - samples/sec: 1621.39 - lr: 0.100000
2023-04-20 22:35:32,680 epoch 39 - iter 12/20 - loss 0.33899098 - time (sec): 6.06 - samples/sec: 1624.06 - lr: 0.100000
2023-04-20 22:35:34,628 epoch 39 - iter 14/20 - loss 0.32927300 - time (sec): 8.00 - samples/sec: 1453.87 - lr: 0.100000
2023-04-20 22:35:36,631 epoch 39

100%|██████████| 3/3 [00:00<00:00,  5.61it/s]

2023-04-20 22:35:40,644 Evaluating as a multi-label problem: False
2023-04-20 22:35:40,659 DEV : loss 0.4175657629966736 - f1-score (micro avg)  0.3969
2023-04-20 22:35:40,669 BAD EPOCHS (no improvement): 1
2023-04-20 22:35:40,673 ----------------------------------------------------------------------------------------------------
2023-04-20 22:35:41,612 epoch 40 - iter 2/20 - loss 0.25568184 - time (sec): 0.94 - samples/sec: 2021.76 - lr: 0.100000
2023-04-20 22:35:42,606 epoch 40 - iter 4/20 - loss 0.27911323 - time (sec): 1.93 - samples/sec: 1885.13 - lr: 0.100000
2023-04-20 22:35:43,515 epoch 40 - iter 6/20 - loss 0.28885521 - time (sec): 2.84 - samples/sec: 1796.07 - lr: 0.100000
2023-04-20 22:35:44,451 epoch 40 - iter 8/20 - loss 0.31249211 - time (sec): 3.78 - samples/sec: 1737.32 - lr: 0.100000
2023-04-20 22:35:45,564 epoch 40 - iter 10/20 - loss 0.31529463 - time (sec): 4.89 - samples/sec: 1688.07 - lr: 0.100000
2023-04-20 22:35:47,310 epoch 40 - iter 12/20 - loss 0.32880610 - time (sec): 6.64 - samples/sec: 1541.52 - lr: 0.100000
2023-04-20 22:35:48,162 epoch 40 - iter 14/20 - loss 0.32390081 - time (sec): 7.49 - samples/sec: 1558.83 - lr: 0.100000
2023-04-20 22:35:49,203 epoch 40 - iter 16/20 - loss 0.32327726 - time (sec): 8.53 - samples/sec: 1563.81 - lr: 0.100000
2023-04-20 22:35:50,266 epoch 40 - i

100%|██████████| 3/3 [00:00<00:00,  3.73it/s]

2023-04-20 22:35:52,769 Evaluating as a multi-label problem: False
2023-04-20 22:35:52,806 DEV : loss 0.38088518381118774 - f1-score (micro avg)  0.5163
2023-04-20 22:35:52,826 BAD EPOCHS (no improvement): 0
2023-04-20 22:35:52,831 saving best model
2023-04-20 22:35:55,119 ----------------------------------------------------------------------------------------------------
2023-04-20 22:35:55,577 epoch 41 - iter 2/20 - loss 0.31217622 - time (sec): 0.45 - samples/sec: 3307.02 - lr: 0.100000
2023-04-20 22:35:56,555 epoch 41 - iter 4/20 - loss 0.29181036 - time (sec): 1.43 - samples/sec: 2136.02 - lr: 0.100000
2023-04-20 22:35:57,984 epoch 41 - iter 6/20 - loss 0.28312918 - time (sec): 2.86 - samples/sec: 1738.53 - lr: 0.100000
2023-04-20 22:35:58,909 epoch 41 - iter 8/20 - loss 0.30568872 - time (sec): 3.79 - samples/sec: 1679.39 - lr: 0.100000
2023-04-20 22:35:59,906 epoch 41 - iter 10/20 - loss 0.31508971 - time (sec): 4.78 - samples/sec: 1641.41 - lr: 0.100000
2023-04-20 22:36:00,846 epoch 41 - iter 12/20 - loss 0.31812974 - time (sec): 5.72 - sample

100%|██████████| 3/3 [00:00<00:00,  3.87it/s]

2023-04-20 22:36:08,429 Evaluating as a multi-label problem: False
2023-04-20 22:36:08,474 DEV : loss 0.4228930175304413 - f1-score (micro avg)  0.4768
2023-04-20 22:36:08,488 BAD EPOCHS (no improvement): 1
2023-04-20 22:36:08,499 ----------------------------------------------------------------------------------------------------
2023-04-20 22:36:09,164 epoch 42 - iter 2/20 - loss 0.32502459 - time (sec): 0.66 - samples/sec: 2252.06 - lr: 0.100000
2023-04-20 22:36:10,401 epoch 42 - iter 4/20 - loss 0.32441923 - time (sec): 1.90 - samples/sec: 1660.20 - lr: 0.100000
2023-04-20 22:36:11,425 epoch 42 - iter 6/20 - loss 0.33335041 - time (sec): 2.92 - samples/sec: 1584.29 - lr: 0.100000
2023-04-20 22:36:12,319 epoch 42 - iter 8/20 - loss 0.31895536 - time (sec): 3.82 - samples/sec: 1573.11 - lr: 0.100000
2023-04-20 22:36:14,081 epoch 42 - iter 10/20 - loss 0.32910138 - time (sec): 5.58 - samples/sec: 1413.87 - lr: 0.100000
2023-04-20 22:36:14,972 epoch 42 - iter 12/20 - loss 0.32778357 - time (sec): 6.47 - samples/sec: 1444.81 - lr: 0.100000
2023-04-20 22

100%|██████████| 3/3 [00:00<00:00,  5.42it/s]

2023-04-20 22:36:20,533 Evaluating as a multi-label problem: False
2023-04-20 22:36:20,553 DEV : loss 0.4201347231864929 - f1-score (micro avg)  0.3969
2023-04-20 22:36:20,568 BAD EPOCHS (no improvement): 2
2023-04-20 22:36:20,573 ----------------------------------------------------------------------------------------------------
2023-04-20 22:36:22,081 epoch 43 - iter 2/20 - loss 0.34314369 - time (sec): 1.51 - samples/sec: 1343.38 - lr: 0.100000
2023-04-20 22:36:23,216 epoch 43 - iter 4/20 - loss 0.33443770 - time (sec): 2.64 - samples/sec: 1295.89 - lr: 0.100000
2023-04-20 22:36:24,425 epoch 43 - iter 6/20 - loss 0.32020665 - time (sec): 3.85 - samples/sec: 1274.78 - lr: 0.100000
2023-04-20 22:36:25,642 epoch 43 - iter 8/20 - loss 0.32773039 - time (sec): 5.07 - samples/sec: 1248.93 - lr: 0.100000
2023-04-20 22:36:26,697 epoch 43 - iter 10/20 - loss 0.31656781 - time (sec): 6.12 - samples/sec: 1302.51 - lr: 0.100000
2023-04-20 22:36:27,741 epoch 43 - iter 12/20 - loss 0.31659029 - time (sec): 7.17 - samples/sec: 1317.36 - lr: 0.100000
2023-04-20 22:36:28,695 epoch 43 - iter 14/20 - loss 0.31246587 - time (sec): 8.12 - samples/sec: 1370.91 - lr: 0.100000
2023-04-20 22:36:29,647 epoch 43 - iter 16/20 - loss 0.31590549 - time (sec): 9.07 - samples/sec: 1393.09 - lr: 0.100000
2023-04-20 22:36:30,733 epoch 43 - i

100%|██████████| 3/3 [00:00<00:00,  6.05it/s]

2023-04-20 22:36:33,205 Evaluating as a multi-label problem: False
2023-04-20 22:36:33,220 DEV : loss 0.37935692071914673 - f1-score (micro avg)  0.4242
2023-04-20 22:36:33,230 BAD EPOCHS (no improvement): 3
2023-04-20 22:36:33,236 ----------------------------------------------------------------------------------------------------
2023-04-20 22:36:34,446 epoch 44 - iter 2/20 - loss 0.31844324 - time (sec): 1.21 - samples/sec: 1818.67 - lr: 0.100000
2023-04-20 22:36:35,364 epoch 44 - iter 4/20 - loss 0.31124180 - time (sec): 2.12 - samples/sec: 1753.04 - lr: 0.100000
2023-04-20 22:36:36,337 epoch 44 - iter 6/20 - loss 0.31629335 - time (sec): 3.10 - samples/sec: 1568.21 - lr: 0.100000
2023-04-20 22:36:37,590 epoch 44 - iter 8/20 - loss 0.31085459 - time (sec): 4.35 - samples/sec: 1460.07 - lr: 0.100000
2023-04-20 22:36:38,645 epoch 44 - iter 10/20 - loss 0.31694353 - time (sec): 5.41 - samples/sec: 1452.60 - lr: 0.100000
2023-04-20 22:36:39,837 epoch 44 - iter 12/20 - loss 0.31601342 - time (sec): 6.60 - samples/sec: 1396.33 - lr: 0.100000
2023-04-20 22:36:41,313 epoch 44 - iter 14/20 - loss 0.31362293 - time (sec): 8.07 - samples/sec: 1400.82 - lr: 0.100000
2023-04-20 22:36:42,269 epoch 44 - iter 16/20 - loss 0.30916131 - time (sec): 9.03 - samples/sec: 1413.31 - lr: 0.100000
2023-04-20 22:36:43,375 epoch 44 - i

100%|██████████| 3/3 [00:00<00:00,  6.04it/s]

2023-04-20 22:36:45,807 Evaluating as a multi-label problem: False
2023-04-20 22:36:45,823 DEV : loss 0.3217867612838745 - f1-score (micro avg)  0.5271
2023-04-20 22:36:45,833 BAD EPOCHS (no improvement): 0
2023-04-20 22:36:45,839 saving best model
2023-04-20 22:36:47,657 ----------------------------------------------------------------------------------------------------
2023-04-20 22:36:49,021 epoch 45 - iter 2/20 - loss 0.28707846 - time (sec): 1.35 - samples/sec: 1573.71 - lr: 0.100000
2023-04-20 22:36:49,947 epoch 45 - iter 4/20 - loss 0.29383593 - time (sec): 2.28 - samples/sec: 1624.24 - lr: 0.100000
2023-04-20 22:36:50,852 epoch 45 - iter 6/20 - loss 0.30438662 - time (sec): 3.19 - samples/sec: 1600.58 - lr: 0.100000
2023-04-20 22:36:51,947 epoch 45 - iter 8/20 - loss 0.30704793 - time (sec): 4.28 - samples/sec: 1566.44 - lr: 0.100000
2023-04-20 22:36:53,069 epoch 45 - iter 10/20 - loss 0.30427910 - time (sec): 5.40 - samples/sec: 1492.39 - lr: 0.100000
2023-04-20 22:36:55,236 epoch 45 - iter 12/20 - loss 0.30684555 - time (sec): 7.57 - samples/sec: 1346.35 - lr: 0.100000
2023-04-20 22:36:56,627 epoch 45 - iter 14/20 - loss 0.31316251 - time (sec): 8.96 - samples/sec: 1296.58 - lr: 0.100000
2023-04-20 22:36:57,890 epoch 45

100%|██████████| 3/3 [00:00<00:00,  5.52it/s]

2023-04-20 22:37:01,345 Evaluating as a multi-label problem: False
2023-04-20 22:37:01,360 DEV : loss 0.3962732255458832 - f1-score (micro avg)  0.4468
2023-04-20 22:37:01,370 BAD EPOCHS (no improvement): 1
2023-04-20 22:37:01,377 ----------------------------------------------------------------------------------------------------
2023-04-20 22:37:01,750 epoch 46 - iter 2/20 - loss 0.27017110 - time (sec): 0.37 - samples/sec: 3817.01 - lr: 0.100000
2023-04-20 22:37:02,802 epoch 46 - iter 4/20 - loss 0.28631068 - time (sec): 1.42 - samples/sec: 2114.29 - lr: 0.100000
2023-04-20 22:37:03,860 epoch 46 - iter 6/20 - loss 0.29466746 - time (sec): 2.48 - samples/sec: 1906.07 - lr: 0.100000
2023-04-20 22:37:05,176 epoch 46 - iter 8/20 - loss 0.30406347 - time (sec): 3.80 - samples/sec: 1665.07 - lr: 0.100000
2023-04-20 22:37:06,157 epoch 46 - iter 10/20 - loss 0.30455761 - time (sec): 4.78 - samples/sec: 1690.51 - lr: 0.100000
2023-04-20 22:37:07,334 epoch 46 - iter 12/20 - loss 0.30202088 - time (sec): 5.95 - samples/sec: 1657.63 - lr: 0.100000
2023-04-20 22:37:08,343 epoch 46 - iter 14/20 - loss 0.29725071 - time (sec): 6.96 - samples/sec: 1611.63 - lr: 0.100000
2023-04-20 22:37:09,389 epoch 46 - iter 16/20 - loss 0.30811579 - time (sec): 8.01 - samples/sec: 1582.64 - lr: 0.100000
2023-04-20 22:37:11,516 epoch 46 - i

100%|██████████| 3/3 [00:00<00:00,  5.89it/s]

2023-04-20 22:37:13,739 Evaluating as a multi-label problem: False
2023-04-20 22:37:13,754 DEV : loss 0.36322733759880066 - f1-score (micro avg)  0.4727
2023-04-20 22:37:13,763 BAD EPOCHS (no improvement): 2
2023-04-20 22:37:13,769 ----------------------------------------------------------------------------------------------------
2023-04-20 22:37:14,108 epoch 47 - iter 2/20 - loss 0.29354438 - time (sec): 0.33 - samples/sec: 4039.46 - lr: 0.100000
2023-04-20 22:37:15,052 epoch 47 - iter 4/20 - loss 0.27658446 - time (sec): 1.28 - samples/sec: 2090.20 - lr: 0.100000
2023-04-20 22:37:16,132 epoch 47 - iter 6/20 - loss 0.30549455 - time (sec): 2.36 - samples/sec: 1835.48 - lr: 0.100000
2023-04-20 22:37:17,889 epoch 47 - iter 8/20 - loss 0.29756481 - time (sec): 4.11 - samples/sec: 1540.01 - lr: 0.100000
2023-04-20 22:37:18,950 epoch 47 - iter 10/20 - loss 0.29223886 - time (sec): 5.18 - samples/sec: 1526.44 - lr: 0.100000
2023-04-20 22:37:20,000 epoch 47 - iter 12/20 - loss 0.28906263 - time (sec): 6.23 - samples/sec: 1534.92 - lr: 0.100000
2023-04-20 22:37:21,056 epoch 47 - iter 14/20 - loss 0.28717890 - time (sec): 7.28 - samples/sec: 1542.63 - lr: 0.100000
2023-04-20 22:37:21,920 epoch 47 - iter 16/20 - loss 0.29015860 - time (sec): 8.15 - samples/sec: 1555.12 - lr: 0.100000
2023-04-20 22:37:23,235 epoch 47 - i

100%|██████████| 3/3 [00:00<00:00,  3.81it/s]

2023-04-20 22:37:26,269 Evaluating as a multi-label problem: False
2023-04-20 22:37:26,289 DEV : loss 0.37041768431663513 - f1-score (micro avg)  0.4348
2023-04-20 22:37:26,303 BAD EPOCHS (no improvement): 3
2023-04-20 22:37:26,309 ----------------------------------------------------------------------------------------------------
2023-04-20 22:37:27,853 epoch 48 - iter 2/20 - loss 0.25963876 - time (sec): 1.54 - samples/sec: 1352.59 - lr: 0.100000
2023-04-20 22:37:29,304 epoch 48 - iter 4/20 - loss 0.28268813 - time (sec): 2.99 - samples/sec: 1288.59 - lr: 0.100000
2023-04-20 22:37:30,228 epoch 48 - iter 6/20 - loss 0.27923113 - time (sec): 3.92 - samples/sec: 1328.91 - lr: 0.100000
2023-04-20 22:37:31,148 epoch 48 - iter 8/20 - loss 0.28266724 - time (sec): 4.84 - samples/sec: 1392.93 - lr: 0.100000
2023-04-20 22:37:32,069 epoch 48 - iter 10/20 - loss 0.27451832 - time (sec): 5.76 - samples/sec: 1406.99 - lr: 0.100000
2023-04-20 22:37:33,065 epoch 48 - iter 12/20 - loss 0.26997104 - time (sec): 6.75 - samples/sec: 1433.90 - lr: 0.100000
2023-04-20 22:37:34,013 epoch 48 - iter 14/20 - loss 0.27884392 - time (sec): 7.70 - samples/sec: 1471.23 - lr: 0.100000
2023-04-20 22:37:35,010 epoch 48

100%|██████████| 3/3 [00:00<00:00,  3.94it/s]

2023-04-20 22:37:38,429 Evaluating as a multi-label problem: False
2023-04-20 22:37:38,452 DEV : loss 0.3786267638206482 - f1-score (micro avg)  0.4903
2023-04-20 22:37:38,468 Epoch    48: reducing learning rate of group 0 to 5.0000e-02.
2023-04-20 22:37:38,469 BAD EPOCHS (no improvement): 4
2023-04-20 22:37:38,476 ----------------------------------------------------------------------------------------------------
2023-04-20 22:37:39,059 epoch 49 - iter 2/20 - loss 0.34212347 - time (sec): 0.58 - samples/sec: 2631.51 - lr: 0.050000
2023-04-20 22:37:40,206 epoch 49 - iter 4/20 - loss 0.28131228 - time (sec): 1.73 - samples/sec: 1816.89 - lr: 0.050000
2023-04-20 22:37:42,004 epoch 49 - iter 6/20 - loss 0.26958361 - time (sec): 3.52 - samples/sec: 1392.07 - lr: 0.050000
2023-04-20 22:37:43,209 epoch 49 - iter 8/20 - loss 0.26328033 - time (sec): 4.73 - samples/sec: 1386.64 - lr: 0.050000
2023-04-20 22:37:44,290 epoch 49 - iter 10/20 - loss 0.26560651 - time (sec): 5.81 - samples/sec: 1377.25 - lr: 0.050000
2023-04-20 22:37:46,033 epoch 49 - iter 12/20 - loss 0.26492909 - time (sec): 7.55 - samples/sec: 1326.05 - lr: 0.050000
2023-04-20 22:37:47,114 epoch 49 - iter 14/20 - loss 0.25921385 - time (sec): 8.63 - samples/sec: 1355.60 - lr: 0.050000
2023-04-20 22:37:48,002 epoch 49 - iter 16/20 - loss 0.25694152 - time (sec): 9.52 - samples/sec: 1379.73 - lr: 0.050000
2023-04-20 22:37:48,961 epoch 49 - i

100%|██████████| 3/3 [00:00<00:00,  5.82it/s]

2023-04-20 22:37:50,949 Evaluating as a multi-label problem: False
2023-04-20 22:37:50,968 DEV : loss 0.35375842452049255 - f1-score (micro avg)  0.4582
2023-04-20 22:37:50,977 BAD EPOCHS (no improvement): 1
2023-04-20 22:37:50,982 ----------------------------------------------------------------------------------------------------
2023-04-20 22:37:51,312 epoch 50 - iter 2/20 - loss 0.23023772 - time (sec): 0.33 - samples/sec: 3981.93 - lr: 0.050000
2023-04-20 22:37:52,390 epoch 50 - iter 4/20 - loss 0.22870977 - time (sec): 1.41 - samples/sec: 2081.34 - lr: 0.050000
2023-04-20 22:37:54,326 epoch 50 - iter 6/20 - loss 0.24455948 - time (sec): 3.34 - samples/sec: 1469.23 - lr: 0.050000
2023-04-20 22:37:55,427 epoch 50 - iter 8/20 - loss 0.25104109 - time (sec): 4.44 - samples/sec: 1434.06 - lr: 0.050000
2023-04-20 22:37:57,039 epoch 50 - iter 10/20 - loss 0.25497764 - time (sec): 6.05 - samples/sec: 1336.02 - lr: 0.050000
2023-04-20 22:37:58,242 epoch 50 - iter 12/20 - loss 0.25163252 - time (sec): 7.26 - samples/sec: 1305.22 - lr: 0.050000
2023-04-20 22:37:59,378 epoch 50 - iter 14/20 - loss 0.25193164 - time (sec): 8.39 - samples/sec: 1350.90 - lr: 0.050000
2023-04-20 22:38:00,287 epoch 50 - iter 16/20 - loss 0.25405674 - time (sec): 9.30 - samples/sec: 1377.39 - lr: 0.050000
2023-04-20 22:38:01,233 epoch 50 - i

100%|██████████| 3/3 [00:00<00:00,  5.79it/s]

2023-04-20 22:38:03,455 Evaluating as a multi-label problem: False
2023-04-20 22:38:03,471 DEV : loss 0.3649383783340454 - f1-score (micro avg)  0.4396
2023-04-20 22:38:03,483 BAD EPOCHS (no improvement): 2
2023-04-20 22:38:03,491 ----------------------------------------------------------------------------------------------------
2023-04-20 22:38:03,793 epoch 51 - iter 2/20 - loss 0.25245650 - time (sec): 0.30 - samples/sec: 4532.36 - lr: 0.050000
2023-04-20 22:38:04,871 epoch 51 - iter 4/20 - loss 0.26307631 - time (sec): 1.38 - samples/sec: 2179.10 - lr: 0.050000
2023-04-20 22:38:06,593 epoch 51 - iter 6/20 - loss 0.27736091 - time (sec): 3.10 - samples/sec: 1565.25 - lr: 0.050000
2023-04-20 22:38:07,553 epoch 51 - iter 8/20 - loss 0.27292322 - time (sec): 4.06 - samples/sec: 1630.70 - lr: 0.050000
2023-04-20 22:38:08,551 epoch 51 - iter 10/20 - loss 0.26460013 - time (sec): 5.06 - samples/sec: 1623.69 - lr: 0.050000
2023-04-20 22:38:10,287 epoch 51 - iter 12/20 - loss 0.25993220 - time (sec): 6.79 - samples/sec: 1499.44 - lr: 0.050000
2023-04-20 22:38:11,509 epoch 51 - iter 14/20 - loss 0.25404790 - time (sec): 8.02 - samples/sec: 1469.97 - lr: 0.050000
2023-04-20 22:38:12,610 epoch 51 - iter 16/20 - loss 0.25057861 - time (sec): 9.12 - samples/sec: 1452.38 - lr: 0.050000
2023-04-20 22:38:13,785 epoch 51 - i

100%|██████████| 3/3 [00:00<00:00,  5.81it/s]

2023-04-20 22:38:15,821 Evaluating as a multi-label problem: False
2023-04-20 22:38:15,837 DEV : loss 0.3532470762729645 - f1-score (micro avg)  0.4582
2023-04-20 22:38:15,848 BAD EPOCHS (no improvement): 3
2023-04-20 22:38:15,855 ----------------------------------------------------------------------------------------------------
2023-04-20 22:38:16,303 epoch 52 - iter 2/20 - loss 0.23791389 - time (sec): 0.44 - samples/sec: 3358.97 - lr: 0.050000
2023-04-20 22:38:18,080 epoch 52 - iter 4/20 - loss 0.23182580 - time (sec): 2.22 - samples/sec: 1580.61 - lr: 0.050000
2023-04-20 22:38:19,097 epoch 52 - iter 6/20 - loss 0.23619620 - time (sec): 3.24 - samples/sec: 1580.90 - lr: 0.050000
2023-04-20 22:38:20,160 epoch 52 - iter 8/20 - loss 0.25324243 - time (sec): 4.30 - samples/sec: 1582.42 - lr: 0.050000
2023-04-20 22:38:21,596 epoch 52 - iter 10/20 - loss 0.25087924 - time (sec): 5.73 - samples/sec: 1485.66 - lr: 0.050000
2023-04-20 22:38:22,578 epoch 52 - iter 12/20 - loss 0.24619677 - time (sec): 6.72 - samples/sec: 1514.45 - lr: 0.050000
2023-04-20 22:38:23,480 epoch 52 - iter 14/20 - loss 0.24891279 - time (sec): 7.62 - samples/sec: 1532.12 - lr: 0.050000
2023-04-20 22:38:24,567 epoch 52 - iter 16/20 - loss 0.25157114 - time (sec): 8.71 - samples/sec: 1499.62 - lr: 0.050000
2023-04-20 22:38:25,875 epoch 52 - i

100%|██████████| 3/3 [00:00<00:00,  3.73it/s]

2023-04-20 22:38:28,569 Evaluating as a multi-label problem: False
2023-04-20 22:38:28,591 DEV : loss 0.3556252121925354 - f1-score (micro avg)  0.4552
2023-04-20 22:38:28,604 Epoch    52: reducing learning rate of group 0 to 2.5000e-02.
2023-04-20 22:38:28,607 BAD EPOCHS (no improvement): 4
2023-04-20 22:38:28,613 ----------------------------------------------------------------------------------------------------
2023-04-20 22:38:29,078 epoch 53 - iter 2/20 - loss 0.23099343 - time (sec): 0.46 - samples/sec: 3083.98 - lr: 0.025000
2023-04-20 22:38:30,181 epoch 53 - iter 4/20 - loss 0.24483493 - time (sec): 1.56 - samples/sec: 1882.76 - lr: 0.025000
2023-04-20 22:38:31,078 epoch 53 - iter 6/20 - loss 0.24164985 - time (sec): 2.46 - samples/sec: 1739.81 - lr: 0.025000
2023-04-20 22:38:32,841 epoch 53 - iter 8/20 - loss 0.25457402 - time (sec): 4.22 - samples/sec: 1538.46 - lr: 0.025000
2023-04-20 22:38:33,700 epoch 53 - iter 10/20 - loss 0.24988669 - time (sec): 5.08 - samples/sec: 1538.51 - lr: 0.025000
2023-04-20 22:38:34,660 epoch 53 - iter 12/20 - loss 0.25265524 - time (sec): 6.04 - samples/sec: 1559.71 - lr: 0.025000
2023-04-20 22:38:35,743 epoch 53 - iter 14/20 - loss 0.24811725 - time (sec): 7.13 - samples/sec: 1577.26 - lr: 0.025000
2023-04-20 22:38:36,720 epoch 53

100%|██████████| 3/3 [00:00<00:00,  3.97it/s]

2023-04-20 22:38:40,427 Evaluating as a multi-label problem: False
2023-04-20 22:38:40,458 DEV : loss 0.3458041548728943 - f1-score (micro avg)  0.4509
2023-04-20 22:38:40,482 BAD EPOCHS (no improvement): 1
2023-04-20 22:38:40,490 ----------------------------------------------------------------------------------------------------
2023-04-20 22:38:41,137 epoch 54 - iter 2/20 - loss 0.25400762 - time (sec): 0.64 - samples/sec: 2368.35 - lr: 0.025000
2023-04-20 22:38:42,439 epoch 54 - iter 4/20 - loss 0.24220255 - time (sec): 1.95 - samples/sec: 1583.54 - lr: 0.025000
2023-04-20 22:38:43,501 epoch 54 - iter 6/20 - loss 0.24292302 - time (sec): 3.01 - samples/sec: 1442.64 - lr: 0.025000
2023-04-20 22:38:45,280 epoch 54 - iter 8/20 - loss 0.22467019 - time (sec): 4.79 - samples/sec: 1339.70 - lr: 0.025000
2023-04-20 22:38:46,184 epoch 54 - iter 10/20 - loss 0.22859675 - time (sec): 5.69 - samples/sec: 1355.69 - lr: 0.025000
2023-04-20 22:38:47,101 epoch 54 - iter 12/20 - loss 0.22569630 - time (sec): 6.61 - samples/sec: 1393.06 - lr: 0.025000
2023-04-20 22:38:48,088 epoch 54 - iter 14/20 - loss 0.23021345 - time (sec): 7.60 - samples/sec: 1430.93 - lr: 0.025000
2023-04-20 22:38:49,870 epoch 54

100%|██████████| 3/3 [00:00<00:00,  5.48it/s]

2023-04-20 22:38:52,877 Evaluating as a multi-label problem: False
2023-04-20 22:38:52,894 DEV : loss 0.3481781780719757 - f1-score (micro avg)  0.4477
2023-04-20 22:38:52,904 BAD EPOCHS (no improvement): 2
2023-04-20 22:38:52,912 ----------------------------------------------------------------------------------------------------
2023-04-20 22:38:53,262 epoch 55 - iter 2/20 - loss 0.21950780 - time (sec): 0.34 - samples/sec: 4056.11 - lr: 0.025000
2023-04-20 22:38:54,386 epoch 55 - iter 4/20 - loss 0.22653041 - time (sec): 1.47 - samples/sec: 2084.06 - lr: 0.025000
2023-04-20 22:38:55,563 epoch 55 - iter 6/20 - loss 0.23622709 - time (sec): 2.64 - samples/sec: 1707.04 - lr: 0.025000
2023-04-20 22:38:56,688 epoch 55 - iter 8/20 - loss 0.24693355 - time (sec): 3.77 - samples/sec: 1597.63 - lr: 0.025000
2023-04-20 22:38:57,798 epoch 55 - iter 10/20 - loss 0.24140236 - time (sec): 4.88 - samples/sec: 1536.24 - lr: 0.025000
2023-04-20 22:38:59,811 epoch 55 - iter 12/20 - loss 0.23465607 - time (sec): 6.89 - samples/sec: 1346.55 - lr: 0.025000
2023-04-20 22:39:00,802 epoch 55 - iter 14/20 - loss 0.23728800 - time (sec): 7.88 - samples/sec: 1351.28 - lr: 0.025000
2023-04-20 22:39:01,823 epoch 55 - iter 16/20 - loss 0.23870368 - time (sec): 8.90 - samples/sec: 1381.56 - lr: 0.025000
2023-04-20 22:39:02,853 epoch 55 - i

100%|██████████| 3/3 [00:00<00:00,  5.92it/s]

2023-04-20 22:39:05,435 Evaluating as a multi-label problem: False
2023-04-20 22:39:05,451 DEV : loss 0.34389325976371765 - f1-score (micro avg)  0.4436
2023-04-20 22:39:05,461 BAD EPOCHS (no improvement): 3
2023-04-20 22:39:05,469 ----------------------------------------------------------------------------------------------------
2023-04-20 22:39:06,629 epoch 56 - iter 2/20 - loss 0.25739659 - time (sec): 1.16 - samples/sec: 1680.39 - lr: 0.025000
2023-04-20 22:39:07,597 epoch 56 - iter 4/20 - loss 0.25349794 - time (sec): 2.12 - samples/sec: 1629.72 - lr: 0.025000
2023-04-20 22:39:08,549 epoch 56 - iter 6/20 - loss 0.25483635 - time (sec): 3.08 - samples/sec: 1626.04 - lr: 0.025000
2023-04-20 22:39:09,485 epoch 56 - iter 8/20 - loss 0.24980052 - time (sec): 4.01 - samples/sec: 1648.29 - lr: 0.025000
2023-04-20 22:39:10,550 epoch 56 - iter 10/20 - loss 0.24723708 - time (sec): 5.08 - samples/sec: 1576.65 - lr: 0.025000
2023-04-20 22:39:11,954 epoch 56 - iter 12/20 - loss 0.23319512 - time (sec): 6.48 - samples/sec: 1488.30 - lr: 0.025000
2023-04-20 22:39:13,021 epoch 56 - iter 14/20 - loss 0.23720640 - time (sec): 7.55 - samples/sec: 1465.17 - lr: 0.025000
2023-04-20 22:39:14,176 epoch 56 - iter 16/20 - loss 0.23377000 - time (sec): 8.70 - samples/sec: 1437.13 - lr: 0.025000
2023-04-20 22:39:15,804 epoch 56 - i

100%|██████████| 3/3 [00:00<00:00,  5.90it/s]

2023-04-20 22:39:17,904 Evaluating as a multi-label problem: False
2023-04-20 22:39:17,920 DEV : loss 0.3504921495914459 - f1-score (micro avg)  0.4436
2023-04-20 22:39:17,930 Epoch    56: reducing learning rate of group 0 to 1.2500e-02.
2023-04-20 22:39:17,932 BAD EPOCHS (no improvement): 4
2023-04-20 22:39:17,939 ----------------------------------------------------------------------------------------------------
2023-04-20 22:39:18,237 epoch 57 - iter 2/20 - loss 0.28844674 - time (sec): 0.29 - samples/sec: 4484.99 - lr: 0.012500
2023-04-20 22:39:19,215 epoch 57 - iter 4/20 - loss 0.26547756 - time (sec): 1.27 - samples/sec: 2250.81 - lr: 0.012500
2023-04-20 22:39:20,241 epoch 57 - iter 6/20 - loss 0.25092931 - time (sec): 2.30 - samples/sec: 1985.26 - lr: 0.012500
2023-04-20 22:39:21,282 epoch 57 - iter 8/20 - loss 0.23403320 - time (sec): 3.34 - samples/sec: 1789.02 - lr: 0.012500
2023-04-20 22:39:22,236 epoch 57 - iter 10/20 - loss 0.23192462 - time (sec): 4.29 - samples/sec: 1736.53 - lr: 0.012500
2023-04-20 22:39:23,231 epoch 57 - iter 12/20 - loss 0.23620672 - time (sec): 5.29 - samples/sec: 1711.19 - lr: 0.012500
2023-04-20 22:39:24,138 epoch 57 - iter 14/20 - loss 0.23641209 - time (sec): 6.19 - samples/sec: 1689.54 - lr: 0.012500
2023-04-20 22:39:25,718 epoch 57 - iter 16/20 - loss 0.23216313 - time (sec): 7.77 - samples/sec: 1593.43 - lr: 0.012500
2023-04-20 22:39:26,832 epoch 57 - i

100%|██████████| 3/3 [00:00<00:00,  3.95it/s]

2023-04-20 22:39:30,290 Evaluating as a multi-label problem: False
2023-04-20 22:39:30,315 DEV : loss 0.34198662638664246 - f1-score (micro avg)  0.4388
2023-04-20 22:39:30,329 BAD EPOCHS (no improvement): 1
2023-04-20 22:39:30,334 ----------------------------------------------------------------------------------------------------
2023-04-20 22:39:30,886 epoch 58 - iter 2/20 - loss 0.23084860 - time (sec): 0.55 - samples/sec: 2505.60 - lr: 0.012500
2023-04-20 22:39:31,887 epoch 58 - iter 4/20 - loss 0.23714898 - time (sec): 1.55 - samples/sec: 1933.65 - lr: 0.012500
2023-04-20 22:39:32,806 epoch 58 - iter 6/20 - loss 0.23548835 - time (sec): 2.47 - samples/sec: 1847.25 - lr: 0.012500
2023-04-20 22:39:34,181 epoch 58 - iter 8/20 - loss 0.21710055 - time (sec): 3.84 - samples/sec: 1661.00 - lr: 0.012500
2023-04-20 22:39:35,379 epoch 58 - iter 10/20 - loss 0.21805807 - time (sec): 5.04 - samples/sec: 1632.35 - lr: 0.012500
2023-04-20 22:39:36,331 epoch 58 - iter 12/20 - loss 0.21869772 - time (sec): 5.99 - samples/sec: 1607.93 - lr: 0.012500
2023-04-20 22:39:38,140 epoch 58 - iter 14/20 - loss 0.22264582 - time (sec): 7.80 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  3.82it/s]

2023-04-20 22:39:42,413 Evaluating as a multi-label problem: False
2023-04-20 22:39:42,437 DEV : loss 0.35242322087287903 - f1-score (micro avg)  0.4436
2023-04-20 22:39:42,459 BAD EPOCHS (no improvement): 2
2023-04-20 22:39:42,464 ----------------------------------------------------------------------------------------------------
2023-04-20 22:39:42,992 epoch 59 - iter 2/20 - loss 0.25585586 - time (sec): 0.53 - samples/sec: 2909.58 - lr: 0.012500
2023-04-20 22:39:44,281 epoch 59 - iter 4/20 - loss 0.23138322 - time (sec): 1.81 - samples/sec: 1686.30 - lr: 0.012500
2023-04-20 22:39:46,121 epoch 59 - iter 6/20 - loss 0.21984986 - time (sec): 3.65 - samples/sec: 1365.60 - lr: 0.012500
2023-04-20 22:39:47,134 epoch 59 - iter 8/20 - loss 0.22719464 - time (sec): 4.67 - samples/sec: 1348.14 - lr: 0.012500
2023-04-20 22:39:48,816 epoch 59 - iter 10/20 - loss 0.23124234 - time (sec): 6.35 - samples/sec: 1302.96 - lr: 0.012500
2023-04-20 22:39:49,718 epoch 59 - iter 12/20 - loss 0.23161178 - time (sec): 7.25 - samples/sec: 1327.59 - lr: 0.012500
2023-04-20 22:39:50,744 epoch 59 - iter 14/20 - loss 0.22586376 - time (sec): 8.28 - samples/sec: 1370.43 - lr: 0.012500
2023-04-20 22:39:51,643 epoch 59

100%|██████████| 3/3 [00:00<00:00,  5.84it/s]

2023-04-20 22:39:54,761 Evaluating as a multi-label problem: False
2023-04-20 22:39:54,777 DEV : loss 0.35598060488700867 - f1-score (micro avg)  0.4453
2023-04-20 22:39:54,789 BAD EPOCHS (no improvement): 3
2023-04-20 22:39:54,795 ----------------------------------------------------------------------------------------------------
2023-04-20 22:39:55,589 epoch 60 - iter 2/20 - loss 0.19986058 - time (sec): 0.79 - samples/sec: 2206.35 - lr: 0.012500
2023-04-20 22:39:56,964 epoch 60 - iter 4/20 - loss 0.21976782 - time (sec): 2.17 - samples/sec: 1610.84 - lr: 0.012500
2023-04-20 22:39:57,982 epoch 60 - iter 6/20 - loss 0.21914021 - time (sec): 3.19 - samples/sec: 1562.89 - lr: 0.012500
2023-04-20 22:39:59,063 epoch 60 - iter 8/20 - loss 0.22245246 - time (sec): 4.27 - samples/sec: 1525.78 - lr: 0.012500
2023-04-20 22:40:00,184 epoch 60 - iter 10/20 - loss 0.23215481 - time (sec): 5.39 - samples/sec: 1458.99 - lr: 0.012500
2023-04-20 22:40:01,410 epoch 60 - iter 12/20 - loss 0.22827819 - time (sec): 6.61 - samples/sec: 1448.06 - lr: 0.012500
2023-04-20 22:40:02,499 epoch 60 - iter 14/20 - loss 0.22783124 - time (sec): 7.70 - samples/sec: 1456.26 - lr: 0.012500
2023-04-20 22:40:03,518 epoch 60 - iter 16/20 - loss 0.22913011 - time (sec): 8.72 - samples/sec: 1477.33 - lr: 0.012500
2023-04-20 22:40:05,241 epoch 60 - i

100%|██████████| 3/3 [00:00<00:00,  6.01it/s]

2023-04-20 22:40:07,199 Evaluating as a multi-label problem: False
2023-04-20 22:40:07,217 DEV : loss 0.34443199634552 - f1-score (micro avg)  0.4565
2023-04-20 22:40:07,229 Epoch    60: reducing learning rate of group 0 to 6.2500e-03.
2023-04-20 22:40:07,230 BAD EPOCHS (no improvement): 4
2023-04-20 22:40:07,243 ----------------------------------------------------------------------------------------------------
2023-04-20 22:40:07,593 epoch 61 - iter 2/20 - loss 0.21428047 - time (sec): 0.35 - samples/sec: 3657.07 - lr: 0.006250
2023-04-20 22:40:08,666 epoch 61 - iter 4/20 - loss 0.22612233 - time (sec): 1.42 - samples/sec: 2347.33 - lr: 0.006250
2023-04-20 22:40:09,759 epoch 61 - iter 6/20 - loss 0.21623124 - time (sec): 2.52 - samples/sec: 2017.50 - lr: 0.006250
2023-04-20 22:40:10,662 epoch 61 - iter 8/20 - loss 0.21588431 - time (sec): 3.42 - samples/sec: 1917.49 - lr: 0.006250
2023-04-20 22:40:11,735 epoch 61 - iter 10/20 - loss 0.21946624 - time (sec): 4.49 - samples/sec: 1827.55 - lr: 0.006250
2023-04-20 22:40:13,318 epoch 61 - iter 12/20 - loss 0.22354688 - time (sec): 6.07 - samples/sec: 1642.65 - lr: 0.006250
2023-04-20 22:40:14,330 epoch 61 - iter 14/20 - loss 0.22462382 - time (sec): 7.09 - samples/sec: 1607.45 - lr: 0.006250
2023-04-20 22:40:16,435 epoch 61 - iter 16/20 - loss 0.22241662 - time (sec): 9.19 - samples/sec: 1443.48 - lr: 0.006250
2023-04-20 22:40:17,412 epoch 61 - i

100%|██████████| 3/3 [00:00<00:00,  5.85it/s]

2023-04-20 22:40:19,457 Evaluating as a multi-label problem: False
2023-04-20 22:40:19,473 DEV : loss 0.34268122911453247 - f1-score (micro avg)  0.4509
2023-04-20 22:40:19,485 BAD EPOCHS (no improvement): 1
2023-04-20 22:40:19,493 ----------------------------------------------------------------------------------------------------
2023-04-20 22:40:19,935 epoch 62 - iter 2/20 - loss 0.25336976 - time (sec): 0.44 - samples/sec: 3581.25 - lr: 0.006250
2023-04-20 22:40:20,921 epoch 62 - iter 4/20 - loss 0.22734884 - time (sec): 1.43 - samples/sec: 2313.18 - lr: 0.006250
2023-04-20 22:40:22,422 epoch 62 - iter 6/20 - loss 0.21386092 - time (sec): 2.93 - samples/sec: 1783.14 - lr: 0.006250
2023-04-20 22:40:24,125 epoch 62 - iter 8/20 - loss 0.21370507 - time (sec): 4.63 - samples/sec: 1536.23 - lr: 0.006250
2023-04-20 22:40:25,133 epoch 62 - iter 10/20 - loss 0.22456647 - time (sec): 5.64 - samples/sec: 1545.98 - lr: 0.006250
2023-04-20 22:40:26,088 epoch 62 - iter 12/20 - loss 0.22121587 - time (sec): 6.59 - samples/sec: 1546.89 - lr: 0.006250
2023-04-20 22:40:27,177 epoch 62 - iter 14/20 - loss 0.21881125 - time (sec): 7.68 - samples/sec: 1499.29 - lr: 0.006250
2023-04-20 22:40:28,376 epoch 62 - iter 16/20 - loss 0.22117038 - time (sec): 8.88 - samples/sec: 1464.27 - lr: 0.006250
2023-04-20 22:40:29,516 epoch 62 - i

100%|██████████| 3/3 [00:00<00:00,  4.26it/s]

2023-04-20 22:40:32,191 Evaluating as a multi-label problem: False
2023-04-20 22:40:32,211 DEV : loss 0.3406372368335724 - f1-score (micro avg)  0.4565
2023-04-20 22:40:32,224 BAD EPOCHS (no improvement): 2
2023-04-20 22:40:32,230 ----------------------------------------------------------------------------------------------------
2023-04-20 22:40:33,500 epoch 63 - iter 2/20 - loss 0.21348000 - time (sec): 1.27 - samples/sec: 1569.97 - lr: 0.006250
2023-04-20 22:40:34,432 epoch 63 - iter 4/20 - loss 0.20300993 - time (sec): 2.20 - samples/sec: 1665.42 - lr: 0.006250
2023-04-20 22:40:35,549 epoch 63 - iter 6/20 - loss 0.20645635 - time (sec): 3.32 - samples/sec: 1611.57 - lr: 0.006250
2023-04-20 22:40:36,510 epoch 63 - iter 8/20 - loss 0.21701163 - time (sec): 4.28 - samples/sec: 1571.53 - lr: 0.006250
2023-04-20 22:40:37,430 epoch 63 - iter 10/20 - loss 0.21239611 - time (sec): 5.20 - samples/sec: 1581.14 - lr: 0.006250
2023-04-20 22:40:38,753 epoch 63 - iter 12/20 - loss 0.21008450 - time (sec): 6.52 - samples/sec: 1493.33 - lr: 0.006250
2023-04-20 22:40:39,820 epoch 63 - iter 14/20 - loss 0.20963724 - time (sec): 7.59 - samples/sec: 1515.60 - lr: 0.006250
2023-04-20 22:40:40,760 epoch 63 - iter 16/20 - loss 0.21304232 - time (sec): 8.53 - samples/sec: 1503.09 - lr: 0.006250
2023-04-20 22:40:41,775 epoch 63 - i

100%|██████████| 3/3 [00:00<00:00,  3.81it/s]

2023-04-20 22:40:44,333 Evaluating as a multi-label problem: False
2023-04-20 22:40:44,355 DEV : loss 0.33791881799697876 - f1-score (micro avg)  0.4582
2023-04-20 22:40:44,370 BAD EPOCHS (no improvement): 3
2023-04-20 22:40:44,377 ----------------------------------------------------------------------------------------------------
2023-04-20 22:40:44,874 epoch 64 - iter 2/20 - loss 0.22293684 - time (sec): 0.50 - samples/sec: 2948.22 - lr: 0.006250
2023-04-20 22:40:46,021 epoch 64 - iter 4/20 - loss 0.21139939 - time (sec): 1.64 - samples/sec: 1701.71 - lr: 0.006250
2023-04-20 22:40:47,313 epoch 64 - iter 6/20 - loss 0.21736932 - time (sec): 2.93 - samples/sec: 1552.00 - lr: 0.006250
2023-04-20 22:40:48,274 epoch 64 - iter 8/20 - loss 0.21361608 - time (sec): 3.90 - samples/sec: 1555.85 - lr: 0.006250
2023-04-20 22:40:49,405 epoch 64 - iter 10/20 - loss 0.21135700 - time (sec): 5.03 - samples/sec: 1565.08 - lr: 0.006250
2023-04-20 22:40:50,430 epoch 64 - iter 12/20 - loss 0.22557403 - time (sec): 6.05 - samples/sec: 1579.36 - lr: 0.006250
2023-04-20 22:40:51,344 epoch 64 - iter 14/20 - loss 0.22282571 - time (sec): 6.97 - samples/sec: 1587.22 - lr: 0.006250
2023-04-20 22:40:52,681 epoch 64

100%|██████████| 3/3 [00:00<00:00,  5.82it/s]

2023-04-20 22:40:56,435 Evaluating as a multi-label problem: False
2023-04-20 22:40:56,451 DEV : loss 0.3382642865180969 - f1-score (micro avg)  0.4727
2023-04-20 22:40:56,463 Epoch    64: reducing learning rate of group 0 to 3.1250e-03.
2023-04-20 22:40:56,468 BAD EPOCHS (no improvement): 4
2023-04-20 22:40:56,476 ----------------------------------------------------------------------------------------------------
2023-04-20 22:40:56,975 epoch 65 - iter 2/20 - loss 0.25518219 - time (sec): 0.49 - samples/sec: 3359.94 - lr: 0.003125
2023-04-20 22:40:58,160 epoch 65 - iter 4/20 - loss 0.26023957 - time (sec): 1.68 - samples/sec: 1946.44 - lr: 0.003125
2023-04-20 22:41:00,194 epoch 65 - iter 6/20 - loss 0.23418798 - time (sec): 3.71 - samples/sec: 1432.64 - lr: 0.003125
2023-04-20 22:41:01,240 epoch 65 - iter 8/20 - loss 0.23144936 - time (sec): 4.76 - samples/sec: 1393.91 - lr: 0.003125
2023-04-20 22:41:02,372 epoch 65 - iter 10/20 - loss 0.22530345 - time (sec): 5.89 - samples/sec: 1388.47 - lr: 0.003125
2023-04-20 22:41:03,494 epoch 65 - iter 12/20 - loss 0.22245230 - time (sec): 7.01 - samples/sec: 1381.36 - lr: 0.003125
2023-04-20 22:41:04,432 epoch 65 - iter 14/20 - loss 0.22029651 - time (sec): 7.95 - samples/sec: 1405.36 - lr: 0.003125
2023-04-20 22:41:05,750 epoch 65 - iter 16/20 - loss 0.21944370 - time (sec): 9.27 - samples/sec: 1385.94 - lr: 0.003125
2023-04-20 22:41:06,842 epoch 65 - i

100%|██████████| 3/3 [00:00<00:00,  5.91it/s]

2023-04-20 22:41:08,892 Evaluating as a multi-label problem: False
2023-04-20 22:41:08,909 DEV : loss 0.3416958153247833 - f1-score (micro avg)  0.4509
2023-04-20 22:41:08,919 BAD EPOCHS (no improvement): 1
2023-04-20 22:41:08,926 ----------------------------------------------------------------------------------------------------
2023-04-20 22:41:09,718 epoch 66 - iter 2/20 - loss 0.20666808 - time (sec): 0.79 - samples/sec: 2403.19 - lr: 0.003125
2023-04-20 22:41:10,669 epoch 66 - iter 4/20 - loss 0.21793336 - time (sec): 1.74 - samples/sec: 2010.34 - lr: 0.003125
2023-04-20 22:41:11,638 epoch 66 - iter 6/20 - loss 0.21790092 - time (sec): 2.71 - samples/sec: 1868.41 - lr: 0.003125
2023-04-20 22:41:13,365 epoch 66 - iter 8/20 - loss 0.21753788 - time (sec): 4.43 - samples/sec: 1521.32 - lr: 0.003125
2023-04-20 22:41:14,552 epoch 66 - iter 10/20 - loss 0.21405533 - time (sec): 5.62 - samples/sec: 1475.32 - lr: 0.003125
2023-04-20 22:41:15,635 epoch 66 - iter 12/20 - loss 0.21619159 - time (sec): 6.70 - samples/sec: 1434.37 - lr: 0.003125
2023-04-20 22:41:16,938 epoch 66 - iter 14/20 - loss 0.22030400 - time (sec): 8.01 - samples/sec: 1400.20 - lr: 0.003125
2023-04-20 22:41:18,091 epoch 66 - iter 16/20 - loss 0.21649846 - time (sec): 9.16 - samples/sec: 1390.75 - lr: 0.003125
2023-04-20 22:41:19,096 epoch 66 - i

100%|██████████| 3/3 [00:00<00:00,  5.90it/s]

2023-04-20 22:41:21,395 Evaluating as a multi-label problem: False
2023-04-20 22:41:21,417 DEV : loss 0.33991652727127075 - f1-score (micro avg)  0.4604
2023-04-20 22:41:21,430 BAD EPOCHS (no improvement): 2
2023-04-20 22:41:21,434 ----------------------------------------------------------------------------------------------------
2023-04-20 22:41:21,822 epoch 67 - iter 2/20 - loss 0.29046551 - time (sec): 0.39 - samples/sec: 3562.56 - lr: 0.003125
2023-04-20 22:41:22,847 epoch 67 - iter 4/20 - loss 0.24364581 - time (sec): 1.41 - samples/sec: 2225.39 - lr: 0.003125
2023-04-20 22:41:24,663 epoch 67 - iter 6/20 - loss 0.22494687 - time (sec): 3.23 - samples/sec: 1685.34 - lr: 0.003125
2023-04-20 22:41:26,059 epoch 67 - iter 8/20 - loss 0.21419381 - time (sec): 4.62 - samples/sec: 1601.60 - lr: 0.003125
2023-04-20 22:41:26,938 epoch 67 - iter 10/20 - loss 0.21621930 - time (sec): 5.50 - samples/sec: 1562.53 - lr: 0.003125
2023-04-20 22:41:27,877 epoch 67 - iter 12/20 - loss 0.21808618 - time (sec): 6.44 - samples/sec: 1564.73 - lr: 0.003125
2023-04-20 22:41:28,943 epoch 67 - iter 14/20 - loss 0.21770912 - time (sec): 7.51 - samples/sec: 1509.02 - lr: 0.003125
2023-04-20 22:41:30,282 epoch 67 - iter 16/20 - loss 0.21697079 - time (sec): 8.85 - samples/sec: 1473.43 - lr: 0.003125
2023-04-20 22:41:31,445 epoch 67 - i

100%|██████████| 3/3 [00:00<00:00,  4.33it/s]

2023-04-20 22:41:33,934 Evaluating as a multi-label problem: False
2023-04-20 22:41:33,949 DEV : loss 0.34137457609176636 - f1-score (micro avg)  0.4604
2023-04-20 22:41:33,960 BAD EPOCHS (no improvement): 3
2023-04-20 22:41:33,964 ----------------------------------------------------------------------------------------------------
2023-04-20 22:41:34,873 epoch 68 - iter 2/20 - loss 0.18924672 - time (sec): 0.91 - samples/sec: 2151.27 - lr: 0.003125
2023-04-20 22:41:35,763 epoch 68 - iter 4/20 - loss 0.21030512 - time (sec): 1.80 - samples/sec: 1833.59 - lr: 0.003125
2023-04-20 22:41:37,514 epoch 68 - iter 6/20 - loss 0.23152053 - time (sec): 3.55 - samples/sec: 1525.56 - lr: 0.003125
2023-04-20 22:41:38,426 epoch 68 - iter 8/20 - loss 0.23184654 - time (sec): 4.46 - samples/sec: 1530.30 - lr: 0.003125
2023-04-20 22:41:39,461 epoch 68 - iter 10/20 - loss 0.23745213 - time (sec): 5.50 - samples/sec: 1554.75 - lr: 0.003125
2023-04-20 22:41:40,530 epoch 68 - iter 12/20 - loss 0.23401377 - time (sec): 6.57 - samples/sec: 1547.72 - lr: 0.003125
2023-04-20 22:41:41,458 epoch 68 - iter 14/20 - loss 0.23004771 - time (sec): 7.49 - samples/sec: 1549.68 - lr: 0.003125
2023-04-20 22:41:42,278 epoch 68 - iter 16/20 - loss 0.23029532 - time (sec): 8.31 - samples/sec: 1547.21 - lr: 0.003125
2023-04-20 22:41:43,302 epoch 68 - i

100%|██████████| 3/3 [00:00<00:00,  3.79it/s]

2023-04-20 22:41:45,863 Evaluating as a multi-label problem: False
2023-04-20 22:41:45,886 DEV : loss 0.3410501182079315 - f1-score (micro avg)  0.4565
2023-04-20 22:41:45,901 Epoch    68: reducing learning rate of group 0 to 1.5625e-03.
2023-04-20 22:41:45,905 BAD EPOCHS (no improvement): 4
2023-04-20 22:41:45,911 ----------------------------------------------------------------------------------------------------
2023-04-20 22:41:47,319 epoch 69 - iter 2/20 - loss 0.20546066 - time (sec): 1.41 - samples/sec: 1228.40 - lr: 0.001563
2023-04-20 22:41:48,419 epoch 69 - iter 4/20 - loss 0.20488230 - time (sec): 2.51 - samples/sec: 1227.82 - lr: 0.001563
2023-04-20 22:41:49,489 epoch 69 - iter 6/20 - loss 0.21475603 - time (sec): 3.58 - samples/sec: 1245.30 - lr: 0.001563
2023-04-20 22:41:50,877 epoch 69 - iter 8/20 - loss 0.20800316 - time (sec): 4.96 - samples/sec: 1246.32 - lr: 0.001563
2023-04-20 22:41:51,919 epoch 69 - iter 10/20 - loss 0.20831126 - time (sec): 6.01 - samples/sec: 1339.44 - lr: 0.001563
2023-04-20 22:41:52,826 epoch 69 - iter 12/20 - loss 0.21557692 - time (sec): 6.91 - samples/sec: 1372.28 - lr: 0.001563
2023-04-20 22:41:53,897 epoch 69 - iter 14/20 - loss 0.21454988 - time (sec): 7.98 - samples/sec: 

100%|██████████| 3/3 [00:00<00:00,  6.00it/s]

2023-04-20 22:41:57,918 Evaluating as a multi-label problem: False
2023-04-20 22:41:57,934 DEV : loss 0.3425144851207733 - f1-score (micro avg)  0.4565
2023-04-20 22:41:57,945 BAD EPOCHS (no improvement): 1
2023-04-20 22:41:57,951 ----------------------------------------------------------------------------------------------------
2023-04-20 22:41:59,162 epoch 70 - iter 2/20 - loss 0.24026985 - time (sec): 1.21 - samples/sec: 1627.19 - lr: 0.001563
2023-04-20 22:42:00,219 epoch 70 - iter 4/20 - loss 0.23347298 - time (sec): 2.27 - samples/sec: 1543.58 - lr: 0.001563
2023-04-20 22:42:01,274 epoch 70 - iter 6/20 - loss 0.23533348 - time (sec): 3.32 - samples/sec: 1463.15 - lr: 0.001563
2023-04-20 22:42:02,552 epoch 70 - iter 8/20 - loss 0.22326238 - time (sec): 4.60 - samples/sec: 1364.83 - lr: 0.001563
2023-04-20 22:42:03,635 epoch 70 - iter 10/20 - loss 0.22885819 - time (sec): 5.68 - samples/sec: 1335.61 - lr: 0.001563
2023-04-20 22:42:04,825 epoch 70 - iter 12/20 - loss 0.22525276 - time (sec): 6.87 - samples/sec: 1344.85 - lr: 0.001563
2023-04-20 22:42:05,797 epoch 70 - iter 14/20 - loss 0.22285291 - time (sec): 7.84 - samples/sec: 1369.33 - lr: 0.001563
2023-04-20 22:42:06,829 epoch 70 - iter 16/20 - loss 0.22051057 - time (sec): 8.88 - samples/sec: 1404.90 - lr: 0.001563
2023-04-20 22:42:07,881 epoch 70 - i

100%|██████████| 3/3 [00:00<00:00,  6.03it/s]

2023-04-20 22:42:10,400 Evaluating as a multi-label problem: False
2023-04-20 22:42:10,415 DEV : loss 0.3420298993587494 - f1-score (micro avg)  0.4549
2023-04-20 22:42:10,425 BAD EPOCHS (no improvement): 2
2023-04-20 22:42:10,430 ----------------------------------------------------------------------------------------------------
2023-04-20 22:42:10,865 epoch 71 - iter 2/20 - loss 0.23286309 - time (sec): 0.43 - samples/sec: 3778.74 - lr: 0.001563
2023-04-20 22:42:11,748 epoch 71 - iter 4/20 - loss 0.24331013 - time (sec): 1.32 - samples/sec: 2236.70 - lr: 0.001563
2023-04-20 22:42:12,679 epoch 71 - iter 6/20 - loss 0.24177447 - time (sec): 2.25 - samples/sec: 1968.61 - lr: 0.001563
2023-04-20 22:42:13,759 epoch 71 - iter 8/20 - loss 0.23706685 - time (sec): 3.33 - samples/sec: 1840.41 - lr: 0.001563
2023-04-20 22:42:15,300 epoch 71 - iter 10/20 - loss 0.22019900 - time (sec): 4.87 - samples/sec: 1594.86 - lr: 0.001563
2023-04-20 22:42:16,392 epoch 71 - iter 12/20 - loss 0.22645850 - time (sec): 5.96 - samples/sec: 1570.68 - lr: 0.001563
2023-04-20 22:42:17,665 epoch 71 - iter 14/20 - loss 0.22647174 - time (sec): 7.23 - samples/sec: 1514.39 - lr: 0.001563
2023-04-20 22:42:18,988 epoch 71 - iter 16/20 - loss 0.22185484 - time (sec): 8.56 - samples/sec: 1466.08 - lr: 0.001563
2023-04-20 22:42:20,917 epoch 71 - i

100%|██████████| 3/3 [00:00<00:00,  5.80it/s]

2023-04-20 22:42:22,943 Evaluating as a multi-label problem: False
2023-04-20 22:42:22,962 DEV : loss 0.34224584698677063 - f1-score (micro avg)  0.4549
2023-04-20 22:42:22,973 BAD EPOCHS (no improvement): 3
2023-04-20 22:42:22,980 ----------------------------------------------------------------------------------------------------
2023-04-20 22:42:23,549 epoch 72 - iter 2/20 - loss 0.22519150 - time (sec): 0.57 - samples/sec: 3076.75 - lr: 0.001563
2023-04-20 22:42:24,456 epoch 72 - iter 4/20 - loss 0.22266767 - time (sec): 1.47 - samples/sec: 2147.76 - lr: 0.001563
2023-04-20 22:42:25,879 epoch 72 - iter 6/20 - loss 0.22626454 - time (sec): 2.90 - samples/sec: 1687.94 - lr: 0.001563
2023-04-20 22:42:26,773 epoch 72 - iter 8/20 - loss 0.22375907 - time (sec): 3.79 - samples/sec: 1639.27 - lr: 0.001563
2023-04-20 22:42:27,694 epoch 72 - iter 10/20 - loss 0.22463120 - time (sec): 4.71 - samples/sec: 1628.64 - lr: 0.001563
2023-04-20 22:42:28,679 epoch 72 - iter 12/20 - loss 0.22512276 - time (sec): 5.70 - samples/sec: 1604.82 - lr: 0.001563
2023-04-20 22:42:30,487 epoch 72 - iter 14/20 - loss 0.22058548 - time (sec): 7.50 - samples/sec: 1473.86 - lr: 0.001563
2023-04-20 22:42:31,590 epoch 72 - iter 16/20 - loss 0.22107104 - time (sec): 8.61 - samples/sec: 1460.38 - lr: 0.001563
2023-04-20 22:42:32,951 epoch 72 - i

100%|██████████| 3/3 [00:00<00:00,  4.45it/s]

2023-04-20 22:42:35,556 Evaluating as a multi-label problem: False
2023-04-20 22:42:35,573 DEV : loss 0.3421279191970825 - f1-score (micro avg)  0.4549
2023-04-20 22:42:35,583 Epoch    72: reducing learning rate of group 0 to 7.8125e-04.
2023-04-20 22:42:35,585 BAD EPOCHS (no improvement): 4
2023-04-20 22:42:35,592 ----------------------------------------------------------------------------------------------------
2023-04-20 22:42:36,164 epoch 73 - iter 2/20 - loss 0.24417387 - time (sec): 0.57 - samples/sec: 3191.86 - lr: 0.000781
2023-04-20 22:42:37,101 epoch 73 - iter 4/20 - loss 0.24754099 - time (sec): 1.51 - samples/sec: 2190.10 - lr: 0.000781
2023-04-20 22:42:38,143 epoch 73 - iter 6/20 - loss 0.23223076 - time (sec): 2.55 - samples/sec: 1896.40 - lr: 0.000781
2023-04-20 22:42:39,021 epoch 73 - iter 8/20 - loss 0.22683884 - time (sec): 3.43 - samples/sec: 1807.93 - lr: 0.000781
2023-04-20 22:42:40,435 epoch 73 - iter 10/20 - loss 0.22245135 - time (sec): 4.84 - samples/sec: 1682.51 - lr: 0.000781
2023-04-20 22:42:41,310 epoch 73 - iter 12/20 - loss 0.22611206 - time (sec): 5.72 - samples/sec: 1646.63 - lr: 0.000781
2023-04-20 22:42:42,368 epoch 73 - iter 14/20 - loss 0.22407083 - time (sec): 6.77 - samples/sec: 1595.39 - lr: 0.000781
2023-04-20 22:42:44,140 epoch 73 - iter 16/20 - loss 0.22407139 - time (sec): 8.55 - samples/sec: 1514.55 - lr: 0.000781
2023-04-20 22:42:45,155 epoch 73 - i

100%|██████████| 3/3 [00:00<00:00,  3.76it/s]

2023-04-20 22:42:47,742 Evaluating as a multi-label problem: False
2023-04-20 22:42:47,761 DEV : loss 0.3419511914253235 - f1-score (micro avg)  0.4549
2023-04-20 22:42:47,780 BAD EPOCHS (no improvement): 1
2023-04-20 22:42:47,793 ----------------------------------------------------------------------------------------------------
2023-04-20 22:42:49,310 epoch 74 - iter 2/20 - loss 0.21153641 - time (sec): 1.51 - samples/sec: 1281.88 - lr: 0.000781
2023-04-20 22:42:50,575 epoch 74 - iter 4/20 - loss 0.22228446 - time (sec): 2.78 - samples/sec: 1266.17 - lr: 0.000781
2023-04-20 22:42:51,644 epoch 74 - iter 6/20 - loss 0.21593270 - time (sec): 3.85 - samples/sec: 1411.77 - lr: 0.000781
2023-04-20 22:42:52,606 epoch 74 - iter 8/20 - loss 0.21618135 - time (sec): 4.81 - samples/sec: 1417.99 - lr: 0.000781
2023-04-20 22:42:53,797 epoch 74 - iter 10/20 - loss 0.21416761 - time (sec): 6.00 - samples/sec: 1430.21 - lr: 0.000781
2023-04-20 22:42:55,245 epoch 74 - iter 12/20 - loss 0.21252506 - time (sec): 7.45 - samples/sec: 1419.05 - lr: 0.000781
2023-04-20 22:42:56,255 epoch 74 - iter 14/20 - loss 0.21433547 - time (sec): 8.46 - samples/sec: 1411.27 - lr: 0.000781
2023-04-20 22:42:57,114 epoch 74

100%|██████████| 3/3 [00:00<00:00,  5.83it/s]

2023-04-20 22:43:00,056 Evaluating as a multi-label problem: False
2023-04-20 22:43:00,073 DEV : loss 0.34234216809272766 - f1-score (micro avg)  0.4549
2023-04-20 22:43:00,088 BAD EPOCHS (no improvement): 2
2023-04-20 22:43:00,094 ----------------------------------------------------------------------------------------------------
2023-04-20 22:43:00,403 epoch 75 - iter 2/20 - loss 0.22940061 - time (sec): 0.31 - samples/sec: 4457.40 - lr: 0.000781
2023-04-20 22:43:01,712 epoch 75 - iter 4/20 - loss 0.22272844 - time (sec): 1.62 - samples/sec: 1876.08 - lr: 0.000781
2023-04-20 22:43:02,846 epoch 75 - iter 6/20 - loss 0.22627827 - time (sec): 2.75 - samples/sec: 1698.61 - lr: 0.000781
2023-04-20 22:43:04,177 epoch 75 - iter 8/20 - loss 0.22021745 - time (sec): 4.08 - samples/sec: 1515.98 - lr: 0.000781
2023-04-20 22:43:05,300 epoch 75 - iter 10/20 - loss 0.21827869 - time (sec): 5.20 - samples/sec: 1448.40 - lr: 0.000781
2023-04-20 22:43:06,507 epoch 75 - iter 12/20 - loss 0.21794974 - time (sec): 6.41 - samples/sec: 1417.03 - lr: 0.000781
2023-04-20 22:43:08,277 epoch 75 - iter 14/20 - loss 0.21106433 - time (sec): 8.18 - samples/sec: 1362.44 - lr: 0.000781
2023-04-20 22:43:09,243 epoch 75 - iter 16/20 - loss 0.20770248 - time (sec): 9.15 - samples/sec: 1403.26 - lr: 0.000781
2023-04-20 22:43:10,202 epoch 75 - i

100%|██████████| 3/3 [00:00<00:00,  5.89it/s]

2023-04-20 22:43:12,312 Evaluating as a multi-label problem: False
2023-04-20 22:43:12,331 DEV : loss 0.3422749638557434 - f1-score (micro avg)  0.4549
2023-04-20 22:43:12,343 BAD EPOCHS (no improvement): 3
2023-04-20 22:43:12,352 ----------------------------------------------------------------------------------------------------
2023-04-20 22:43:13,180 epoch 76 - iter 2/20 - loss 0.17246219 - time (sec): 0.82 - samples/sec: 1989.72 - lr: 0.000781
2023-04-20 22:43:14,180 epoch 76 - iter 4/20 - loss 0.21506827 - time (sec): 1.82 - samples/sec: 1749.34 - lr: 0.000781
2023-04-20 22:43:16,004 epoch 76 - iter 6/20 - loss 0.22044073 - time (sec): 3.64 - samples/sec: 1493.30 - lr: 0.000781
2023-04-20 22:43:17,138 epoch 76 - iter 8/20 - loss 0.21808687 - time (sec): 4.78 - samples/sec: 1504.05 - lr: 0.000781
2023-04-20 22:43:18,205 epoch 76 - iter 10/20 - loss 0.22186428 - time (sec): 5.85 - samples/sec: 1466.45 - lr: 0.000781
2023-04-20 22:43:19,561 epoch 76 - iter 12/20 - loss 0.21857814 - time (sec): 7.20 - samples/sec: 1398.34 - lr: 0.000781
2023-04-20 22:43:20,788 epoch 76 - iter 14/20 - loss 0.21661193 - time (sec): 8.43 - samples/sec: 1360.62 - lr: 0.000781
2023-04-20 22:43:21,907 epoch 76 - iter 16/20 - loss 0.21684773 - time (sec): 9.55 - samples/sec: 1366.51 - lr: 0.000781
2023-04-20 22:43:22,850 epoch 76 - i

100%|██████████| 3/3 [00:00<00:00,  5.56it/s]

2023-04-20 22:43:24,964 Evaluating as a multi-label problem: False
2023-04-20 22:43:24,979 DEV : loss 0.341854453086853 - f1-score (micro avg)  0.4549
2023-04-20 22:43:24,991 Epoch    76: reducing learning rate of group 0 to 3.9063e-04.
2023-04-20 22:43:24,992 BAD EPOCHS (no improvement): 4
2023-04-20 22:43:25,002 ----------------------------------------------------------------------------------------------------
2023-04-20 22:43:25,604 epoch 77 - iter 2/20 - loss 0.21063968 - time (sec): 0.60 - samples/sec: 2848.98 - lr: 0.000391
2023-04-20 22:43:26,518 epoch 77 - iter 4/20 - loss 0.20859986 - time (sec): 1.51 - samples/sec: 2091.04 - lr: 0.000391
2023-04-20 22:43:27,552 epoch 77 - iter 6/20 - loss 0.22299016 - time (sec): 2.55 - samples/sec: 1895.71 - lr: 0.000391
2023-04-20 22:43:29,345 epoch 77 - iter 8/20 - loss 0.21650462 - time (sec): 4.34 - samples/sec: 1533.61 - lr: 0.000391
2023-04-20 22:43:30,271 epoch 77 - iter 10/20 - loss 0.21242586 - time (sec): 5.27 - samples/sec: 1557.34 - lr: 0.000391
2023-04-20 22:43:31,267 epoch 77 - iter 12/20 - loss 0.21704439 - time (sec): 6.26 - samples/sec: 1548.49 - lr: 0.000391
2023-04-20 22:43:32,615 epoch 77 - iter 14/20 - loss 0.21740785 - time (sec): 7.61 - samples/sec: 1517.89 - lr: 0.000391
2023-04-20 22:43:34,182 epoch 77 - iter 16/20 - loss 0.21915470 - time (sec): 9.18 - samples/sec: 1459.52 - lr: 0.000391
2023-04-20 22:43:35,184 epoch 77 - i

100%|██████████| 3/3 [00:00<00:00,  5.46it/s]

2023-04-20 22:43:37,608 Evaluating as a multi-label problem: False
2023-04-20 22:43:37,625 DEV : loss 0.3417886793613434 - f1-score (micro avg)  0.4549
2023-04-20 22:43:37,640 BAD EPOCHS (no improvement): 1
2023-04-20 22:43:37,647 ----------------------------------------------------------------------------------------------------
2023-04-20 22:43:38,529 epoch 78 - iter 2/20 - loss 0.20914604 - time (sec): 0.88 - samples/sec: 2022.30 - lr: 0.000391
2023-04-20 22:43:39,519 epoch 78 - iter 4/20 - loss 0.20004513 - time (sec): 1.87 - samples/sec: 1863.26 - lr: 0.000391
2023-04-20 22:43:40,497 epoch 78 - iter 6/20 - loss 0.21981782 - time (sec): 2.85 - samples/sec: 1759.46 - lr: 0.000391
2023-04-20 22:43:42,196 epoch 78 - iter 8/20 - loss 0.22340874 - time (sec): 4.55 - samples/sec: 1477.20 - lr: 0.000391
2023-04-20 22:43:43,153 epoch 78 - iter 10/20 - loss 0.22392805 - time (sec): 5.50 - samples/sec: 1489.19 - lr: 0.000391
2023-04-20 22:43:44,111 epoch 78 - iter 12/20 - loss 0.22322129 - time (sec): 6.46 - samples/sec: 1491.76 - lr: 0.000391
2023-04-20 22:43:45,155 epoch 78 - iter 14/20 - loss 0.22305796 - time (sec): 7.50 - samples/sec: 1508.44 - lr: 0.000391
2023-04-20 22:43:46,106 epoch 78 - iter 16/20 - loss 0.22417322 - time (sec): 8.46 - samples/sec: 1538.83 - lr: 0.000391
2023-04-20 22:43:47,179 epoch 78 - i

100%|██████████| 3/3 [00:00<00:00,  3.93it/s]

2023-04-20 22:43:49,906 Evaluating as a multi-label problem: False
2023-04-20 22:43:49,928 DEV : loss 0.34180015325546265 - f1-score (micro avg)  0.4549
2023-04-20 22:43:49,946 BAD EPOCHS (no improvement): 2
2023-04-20 22:43:49,954 ----------------------------------------------------------------------------------------------------
2023-04-20 22:43:50,435 epoch 79 - iter 2/20 - loss 0.21151868 - time (sec): 0.48 - samples/sec: 3124.75 - lr: 0.000391
2023-04-20 22:43:52,031 epoch 79 - iter 4/20 - loss 0.21990079 - time (sec): 2.08 - samples/sec: 1580.17 - lr: 0.000391
2023-04-20 22:43:53,611 epoch 79 - iter 6/20 - loss 0.21221239 - time (sec): 3.66 - samples/sec: 1421.32 - lr: 0.000391
2023-04-20 22:43:54,502 epoch 79 - iter 8/20 - loss 0.20898011 - time (sec): 4.55 - samples/sec: 1436.70 - lr: 0.000391
2023-04-20 22:43:55,450 epoch 79 - iter 10/20 - loss 0.21139227 - time (sec): 5.49 - samples/sec: 1466.50 - lr: 0.000391
2023-04-20 22:43:56,413 epoch 79 - iter 12/20 - loss 0.21364020 - time (sec): 6.46 - samples/sec: 1474.81 - lr: 0.000391
2023-04-20 22:43:57,451 epoch 79 - iter 14/20 - loss 0.21479255 - time (sec): 7.50 - samples/sec: 1477.14 - lr: 0.000391
2023-04-20 22:43:58,384 epoch 79 - iter 16/20 - loss 0.21281303 - time (sec): 8.43 - samples/sec: 1516.43 - lr: 0.000391
2023-04-20 22:44:00,088 epoch 79 - i

100%|██████████| 3/3 [00:00<00:00,  5.81it/s]

2023-04-20 22:44:02,213 Evaluating as a multi-label problem: False
2023-04-20 22:44:02,235 DEV : loss 0.34201735258102417 - f1-score (micro avg)  0.4549
2023-04-20 22:44:02,246 BAD EPOCHS (no improvement): 3
2023-04-20 22:44:02,264 ----------------------------------------------------------------------------------------------------
2023-04-20 22:44:02,798 epoch 80 - iter 2/20 - loss 0.22771936 - time (sec): 0.53 - samples/sec: 3263.83 - lr: 0.000391
2023-04-20 22:44:03,928 epoch 80 - iter 4/20 - loss 0.21791179 - time (sec): 1.66 - samples/sec: 1921.03 - lr: 0.000391
2023-04-20 22:44:05,059 epoch 80 - iter 6/20 - loss 0.22037088 - time (sec): 2.79 - samples/sec: 1707.35 - lr: 0.000391
2023-04-20 22:44:06,431 epoch 80 - iter 8/20 - loss 0.21361914 - time (sec): 4.17 - samples/sec: 1558.50 - lr: 0.000391
2023-04-20 22:44:07,749 epoch 80 - iter 10/20 - loss 0.22219208 - time (sec): 5.48 - samples/sec: 1473.71 - lr: 0.000391
2023-04-20 22:44:08,640 epoch 80 - iter 12/20 - loss 0.21983495 - time (sec): 6.37 - samples/sec: 1454.44 - lr: 0.000391
2023-04-20 22:44:10,361 epoch 80 - iter 14/20 - loss 0.22069512 - time (sec): 8.10 - samples/sec: 1400.90 - lr: 0.000391
2023-04-20 22:44:11,358 epoch 80 - iter 16/20 - loss 0.22036703 - time (sec): 9.09 - samples/sec: 1433.83 - lr: 0.000391
2023-04-20 22:44:12,249 epoch 80 - i

100%|██████████| 3/3 [00:00<00:00,  5.91it/s]

2023-04-20 22:44:14,398 Evaluating as a multi-label problem: False
2023-04-20 22:44:14,414 DEV : loss 0.34180185198783875 - f1-score (micro avg)  0.4549
2023-04-20 22:44:14,425 Epoch    80: reducing learning rate of group 0 to 1.9531e-04.
2023-04-20 22:44:14,426 BAD EPOCHS (no improvement): 4
2023-04-20 22:44:14,433 ----------------------------------------------------------------------------------------------------





2023-04-20 22:44:14,981 epoch 81 - iter 2/20 - loss 0.21598067 - time (sec): 0.55 - samples/sec: 3346.77 - lr: 0.000195
2023-04-20 22:44:15,933 epoch 81 - iter 4/20 - loss 0.22902063 - time (sec): 1.50 - samples/sec: 2201.46 - lr: 0.000195
2023-04-20 22:44:16,959 epoch 81 - iter 6/20 - loss 0.22316267 - time (sec): 2.53 - samples/sec: 1812.56 - lr: 0.000195
2023-04-20 22:44:17,896 epoch 81 - iter 8/20 - loss 0.23429123 - time (sec): 3.46 - samples/sec: 1722.67 - lr: 0.000195
2023-04-20 22:44:19,087 epoch 81 - iter 10/20 - loss 0.22942210 - time (sec): 4.65 - samples/sec: 1606.71 - lr: 0.000195
2023-04-20 22:44:20,231 epoch 81 - iter 12/20 - loss 0.22261126 - time (sec): 5.80 - samples/sec: 1574.64 - lr: 0.000195
2023-04-20 22:44:21,837 epoch 81 - iter 14/20 - loss 0.21666742 - time (sec): 7.40 - samples/sec: 1478.03 - lr: 0.000195
2023-04-20 22:44:22,951 epoch 81 - iter 16/20 - loss 0.21787352 - time (sec): 8.52 - samples/sec: 1441.88 - lr: 0.000195
2023-04-20 22:44:24,754 epoch 81 - i

100%|██████████| 3/3 [00:00<00:00,  5.66it/s]

2023-04-20 22:44:26,841 Evaluating as a multi-label problem: False
2023-04-20 22:44:26,863 DEV : loss 0.3420756161212921 - f1-score (micro avg)  0.4549
2023-04-20 22:44:26,877 BAD EPOCHS (no improvement): 1
2023-04-20 22:44:26,881 ----------------------------------------------------------------------------------------------------





2023-04-20 22:44:27,241 epoch 82 - iter 2/20 - loss 0.25430137 - time (sec): 0.36 - samples/sec: 3822.99 - lr: 0.000195
2023-04-20 22:44:28,304 epoch 82 - iter 4/20 - loss 0.24109150 - time (sec): 1.42 - samples/sec: 2056.48 - lr: 0.000195
2023-04-20 22:44:29,254 epoch 82 - iter 6/20 - loss 0.24601357 - time (sec): 2.37 - samples/sec: 1856.75 - lr: 0.000195
2023-04-20 22:44:30,188 epoch 82 - iter 8/20 - loss 0.23379385 - time (sec): 3.31 - samples/sec: 1738.62 - lr: 0.000195
2023-04-20 22:44:31,177 epoch 82 - iter 10/20 - loss 0.23391286 - time (sec): 4.29 - samples/sec: 1674.17 - lr: 0.000195
2023-04-20 22:44:32,872 epoch 82 - iter 12/20 - loss 0.22455349 - time (sec): 5.99 - samples/sec: 1495.25 - lr: 0.000195
2023-04-20 22:44:34,594 epoch 82 - iter 14/20 - loss 0.22116224 - time (sec): 7.71 - samples/sec: 1424.38 - lr: 0.000195
2023-04-20 22:44:35,747 epoch 82 - iter 16/20 - loss 0.22295842 - time (sec): 8.86 - samples/sec: 1438.53 - lr: 0.000195
2023-04-20 22:44:36,838 epoch 82 - i

100%|██████████| 3/3 [00:00<00:00,  5.69it/s]

2023-04-20 22:44:39,398 Evaluating as a multi-label problem: False
2023-04-20 22:44:39,414 DEV : loss 0.3418343961238861 - f1-score (micro avg)  0.4549
2023-04-20 22:44:39,425 BAD EPOCHS (no improvement): 2
2023-04-20 22:44:39,433 ----------------------------------------------------------------------------------------------------





2023-04-20 22:44:39,875 epoch 83 - iter 2/20 - loss 0.19013307 - time (sec): 0.44 - samples/sec: 3727.17 - lr: 0.000195
2023-04-20 22:44:40,884 epoch 83 - iter 4/20 - loss 0.21499961 - time (sec): 1.45 - samples/sec: 2380.83 - lr: 0.000195
2023-04-20 22:44:41,853 epoch 83 - iter 6/20 - loss 0.21756384 - time (sec): 2.42 - samples/sec: 2045.07 - lr: 0.000195
2023-04-20 22:44:42,736 epoch 83 - iter 8/20 - loss 0.21938776 - time (sec): 3.30 - samples/sec: 1906.20 - lr: 0.000195
2023-04-20 22:44:43,635 epoch 83 - iter 10/20 - loss 0.22698634 - time (sec): 4.20 - samples/sec: 1808.70 - lr: 0.000195
2023-04-20 22:44:44,532 epoch 83 - iter 12/20 - loss 0.22316726 - time (sec): 5.09 - samples/sec: 1775.54 - lr: 0.000195
2023-04-20 22:44:46,308 epoch 83 - iter 14/20 - loss 0.22059119 - time (sec): 6.87 - samples/sec: 1666.01 - lr: 0.000195
2023-04-20 22:44:47,204 epoch 83 - iter 16/20 - loss 0.21944852 - time (sec): 7.77 - samples/sec: 1662.38 - lr: 0.000195
2023-04-20 22:44:48,225 epoch 83 - i

100%|██████████| 3/3 [00:00<00:00,  3.93it/s]

2023-04-20 22:44:50,924 Evaluating as a multi-label problem: False
2023-04-20 22:44:50,947 DEV : loss 0.3417668044567108 - f1-score (micro avg)  0.4549
2023-04-20 22:44:50,962 BAD EPOCHS (no improvement): 3
2023-04-20 22:44:50,969 ----------------------------------------------------------------------------------------------------





2023-04-20 22:44:51,542 epoch 84 - iter 2/20 - loss 0.20286367 - time (sec): 0.57 - samples/sec: 2564.56 - lr: 0.000195
2023-04-20 22:44:52,631 epoch 84 - iter 4/20 - loss 0.24607304 - time (sec): 1.66 - samples/sec: 1789.20 - lr: 0.000195
2023-04-20 22:44:54,782 epoch 84 - iter 6/20 - loss 0.24578940 - time (sec): 3.81 - samples/sec: 1297.27 - lr: 0.000195
2023-04-20 22:44:55,786 epoch 84 - iter 8/20 - loss 0.23724209 - time (sec): 4.81 - samples/sec: 1351.01 - lr: 0.000195
2023-04-20 22:44:56,820 epoch 84 - iter 10/20 - loss 0.22803665 - time (sec): 5.85 - samples/sec: 1373.12 - lr: 0.000195
2023-04-20 22:44:58,213 epoch 84 - iter 12/20 - loss 0.22229344 - time (sec): 7.24 - samples/sec: 1337.54 - lr: 0.000195
2023-04-20 22:44:59,152 epoch 84 - iter 14/20 - loss 0.22531228 - time (sec): 8.18 - samples/sec: 1371.91 - lr: 0.000195
2023-04-20 22:45:00,086 epoch 84 - iter 16/20 - loss 0.22306919 - time (sec): 9.11 - samples/sec: 1390.68 - lr: 0.000195
2023-04-20 22:45:01,121 epoch 84 - i

100%|██████████| 3/3 [00:00<00:00,  5.60it/s]

2023-04-20 22:45:03,261 Evaluating as a multi-label problem: False
2023-04-20 22:45:03,278 DEV : loss 0.3417571783065796 - f1-score (micro avg)  0.4549
2023-04-20 22:45:03,289 Epoch    84: reducing learning rate of group 0 to 9.7656e-05.
2023-04-20 22:45:03,290 BAD EPOCHS (no improvement): 4
2023-04-20 22:45:03,301 ----------------------------------------------------------------------------------------------------
2023-04-20 22:45:03,303 ----------------------------------------------------------------------------------------------------
2023-04-20 22:45:03,312 learning rate too small - quitting training!
2023-04-20 22:45:03,313 ----------------------------------------------------------------------------------------------------





2023-04-20 22:45:05,335 ----------------------------------------------------------------------------------------------------
2023-04-20 22:45:10,041 SequenceTagger predicts: Dictionary with 27 tags: O, S-TAX, B-TAX, E-TAX, I-TAX, S-OTHER, B-OTHER, E-OTHER, I-OTHER, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-TME, B-TME, E-TME, I-TME, S-ORG, B-ORG, E-ORG, I-ORG, <START>, <STOP>


100%|██████████| 3/3 [00:01<00:00,  1.51it/s]

2023-04-20 22:45:12,432 Evaluating as a multi-label problem: False
2023-04-20 22:45:12,456 0.6528	0.5497	0.5968	0.4312
2023-04-20 22:45:12,459 
Results:
- F-score (micro) 0.5968
- F-score (macro) 0.4624
- Accuracy 0.4312

By class:
              precision    recall  f1-score   support

         TAX     0.7385    0.6957    0.7164        69
         TME     0.7778    0.6562    0.7119        32
       OTHER     0.3500    0.2188    0.2692        32
         LOC     0.5714    0.4000    0.4706        20
         PER     0.5882    0.6250    0.6061        16
         ORG     0.0000    0.0000    0.0000         2

   micro avg     0.6528    0.5497    0.5968       171
   macro avg     0.5043    0.4326    0.4624       171
weighted avg     0.6309    0.5497    0.5844       171

2023-04-20 22:45:12,465 ----------------------------------------------------------------------------------------------------
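The averages in the report can be re-derived from its own rows, which is a quick way to see why the macro score sits so far below the micro score. The sketch below is plain Python with the numbers copied by hand from the report above; the variable and function names are ours and are not part of Flair.

```python
# Per-class F1 and support, copied from the "By class" report above.
per_class_f1 = {
    "TAX": 0.7164, "TME": 0.7119, "OTHER": 0.2692,
    "LOC": 0.4706, "PER": 0.6061, "ORG": 0.0000,
}
support = {"TAX": 69, "TME": 32, "OTHER": 32, "LOC": 20, "PER": 16, "ORG": 2}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro F1: harmonic mean of the pooled (micro) precision and recall.
micro_f1 = f1(0.6528, 0.5497)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted F1: per-class F1 weighted by the number of gold spans.
weighted_f1 = sum(per_class_f1[c] * support[c] for c in support) / sum(support.values())

print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```

ORG contributes an F1 of 0 from only two examples, yet it counts as much as TAX in the macro average, while the micro and weighted averages are dominated by the larger classes.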





{'test_score': 0.5968253968253968,
 'dev_score_history': [0.0,
  0.0,
  0.0,
  0.04975124378109452,
  0.11450381679389314,
  0.19191919191919193,
  0.20224719101123595,
  0.19138755980861247,
  0.1745454545454545,
  0.21505376344086022,
  0.25641025641025644,
  0.1739130434782609,
  0.2137404580152672,
  0.19753086419753085,
  0.297029702970297,
  0.2601626016260163,
  0.2591093117408907,
  0.277056277056277,
  0.3252595155709343,
  0.31578947368421056,
  0.26277372262773724,
  0.34843205574912894,
  0.3496503496503497,
  0.3694779116465864,
  0.39583333333333337,
  0.4210526315789474,
  0.4014598540145985,
  0.41155234657039713,
  0.38951310861423216,
  0.4610169491525424,
  0.4410646387832699,
  0.43165467625899284,
  0.48026315789473684,
  0.45018450184501846,
  0.3969465648854962,
  0.4822695035460993,
  0.4119850187265917,
  0.4914675767918089,
  0.3969465648854962,
  0.5163398692810457,
  0.4768211920529802,
  0.3969465648854962,
  0.42424242424242425,
  0.5270758122743682,
  0.4
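
The dev scores above also drive the learning-rate schedule seen in the log: after four consecutive epochs without improvement the rate is halved (0.000391, then 1.9531e-04, then 9.7656e-05), and training stops once it falls below a minimum. Here is a minimal sketch of that behaviour, assuming a start rate of 0.1, a factor of 0.5, a patience of 3 and a floor of 1e-4. These values are inferred from the log lines, not read from the training call, and `simulate_annealing` is a hypothetical helper, not a Flair function.

```python
def simulate_annealing(dev_scores, lr=0.1, factor=0.5, patience=3, min_lr=1e-4):
    """Replay anneal-on-plateau over a list of dev scores.

    Returns a list of (epoch, learning_rate) pairs, stopping early
    once the learning rate drops below min_lr.
    All default values are assumptions inferred from the log.
    """
    best = float("-inf")
    bad_epochs = 0
    history = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only when the number of bad epochs exceeds the patience.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append((epoch, lr))
        if lr < min_lr:
            break  # corresponds to "learning rate too small - quitting training!"
    return history


# A completely flat dev score, like the 0.4549 plateau in the log.
history = simulate_annealing([0.4549] * 50)
last_epoch, last_lr = history[-1]
print(last_epoch, last_lr)
```

With a flat dev score the rate is halved every four epochs, so ten halvings take it from 0.1 to 9.765625e-05, the value at which the log reports that the learning rate is too small.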

## For Basque

In [17]:
from flair.datasets import NER_BASQUE
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus = NER_BASQUE()
print(corpus)

# Sanity check: split sizes and one example sentence from the test set
print(len(corpus.train))
print(len(corpus.test))
print(len(corpus.dev))
sentence = corpus.test[0]
print(sentence)
print(corpus)

2023-04-21 00:19:43,231 Reading data from /root/.flair/datasets/ner_basque
2023-04-21 00:19:43,233 Train: /root/.flair/datasets/ner_basque/named_ent_eu.train
2023-04-21 00:19:43,235 Dev: None
2023-04-21 00:19:43,238 Test: /root/.flair/datasets/ner_basque/named_ent_eu.test
Corpus: 2297 train + 255 dev + 842 test sentences
2297
842
255
Sentence[26]: "Garikoitz Plazaola * Eguzki-ko kidea * « Apaingarri birziklagarriak bultzatu behar dira » Gabonetako apainduren inguruan hitz egin dugu , eta gomendio zenbait egin dizkigu ." → ["Garikoitz Plazaola"/PER, "Eguzki-ko"/ORG, "Gabonetako"/OTH]
Corpus: 2297 train + 255 dev + 842 test sentences
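
The span labels in the sentence above (PER, ORG, OTH) are not what the sequence tagger predicts directly. The training logs in this notebook list per-token tags with S-, B-, E- and I- prefixes plus O (a BIOES-style scheme). A minimal sketch of that expansion follows; `bioes_tags` is a hypothetical helper written for illustration and is not Flair's implementation.

```python
def bioes_tags(labels, add_start_stop=False):
    """Expand span labels into per-token tags: O plus S-/B-/E-/I- per label."""
    tags = ["O"]
    for label in labels:
        tags.extend(f"{prefix}-{label}" for prefix in ("S", "B", "E", "I"))
    if add_start_stop:
        # Extra markers that appear in some of the logged tag dictionaries.
        tags.extend(["<START>", "<STOP>"])
    return tags


basque_tags = bioes_tags(["ORG", "LOC", "PER", "OTH"])
previous_tags = bioes_tags(
    ["TAX", "OTHER", "LOC", "PER", "TME", "ORG"], add_start_stop=True
)
print(len(basque_tags), len(previous_tags))
```

Four Basque labels give 17 tags, and six labels plus `<START>` and `<STOP>` give the 27 tags reported for the previous model.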


In [13]:
# 2. what label do we want to predict?
label_type = 'ner'

# 3. make the label dictionary from the corpus
#    (`corpus` is the NER_BASQUE corpus loaded above; its 2297 training
#    sentences match the count in the log below)
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embedding stack with Flair and GloVe
#    (note: 'glove' and 'news-forward'/'news-backward' are English-trained embeddings)
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train('/content/drive/My Drive/ColabNotebooks/flairmodels/basque',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=100,
              write_weights=True)

2023-04-20 22:45:35,554 Computing label dictionary. Progress:


2297it [00:00, 28298.35it/s]

2023-04-20 22:45:35,647 Dictionary created for label 'ner' with 5 values: ORG (seen 1147 times), LOC (seen 1085 times), PER (seen 1067 times), OTH (seen 137 times)
Dictionary with 5 tags: <unk>, ORG, LOC, PER, OTH





2023-04-20 22:45:41,226 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-OTH, B-OTH, E-OTH, I-OTH
2023-04-20 22:45:41,473 ----------------------------------------------------------------------------------------------------
2023-04-20 22:45:41,475 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Lin

100%|██████████| 8/8 [00:04<00:00,  1.97it/s]

2023-04-20 22:46:30,611 Evaluating as a multi-label problem: False
2023-04-20 22:46:30,631 DEV : loss 0.4238862693309784 - f1-score (micro avg)  0.3155
2023-04-20 22:46:30,654 BAD EPOCHS (no improvement): 0
2023-04-20 22:46:30,659 saving best model





2023-04-20 22:46:32,590 ----------------------------------------------------------------------------------------------------
2023-04-20 22:46:33,500 epoch 2 - iter 7/72 - loss 0.47891467 - time (sec): 0.91 - samples/sec: 4094.92 - lr: 0.100000
2023-04-20 22:46:34,998 epoch 2 - iter 14/72 - loss 0.45667368 - time (sec): 2.40 - samples/sec: 3033.24 - lr: 0.100000
2023-04-20 22:46:37,006 epoch 2 - iter 21/72 - loss 0.43031643 - time (sec): 4.41 - samples/sec: 2517.58 - lr: 0.100000
2023-04-20 22:46:40,484 epoch 2 - iter 28/72 - loss 0.41964073 - time (sec): 7.89 - samples/sec: 1910.96 - lr: 0.100000
2023-04-20 22:46:43,315 epoch 2 - iter 35/72 - loss 0.40374340 - time (sec): 10.72 - samples/sec: 1751.36 - lr: 0.100000
2023-04-20 22:46:45,447 epoch 2 - iter 42/72 - loss 0.40261187 - time (sec): 12.85 - samples/sec: 1770.21 - lr: 0.100000
2023-04-20 22:46:47,218 epoch 2 - iter 49/72 - loss 0.38947906 - time (sec): 14.62 - samples/sec: 1837.57 - lr: 0.100000
2023-04-20 22:46:48,696 epoch 2 -

100%|██████████| 8/8 [00:01<00:00,  4.62it/s]

2023-04-20 22:46:54,305 Evaluating as a multi-label problem: False
2023-04-20 22:46:54,326 DEV : loss 0.2905963361263275 - f1-score (micro avg)  0.3978
2023-04-20 22:46:54,366 BAD EPOCHS (no improvement): 0
2023-04-20 22:46:54,371 saving best model





2023-04-20 22:46:56,889 ----------------------------------------------------------------------------------------------------
2023-04-20 22:47:08,630 epoch 3 - iter 7/72 - loss 0.29862593 - time (sec): 11.74 - samples/sec: 335.22 - lr: 0.100000
2023-04-20 22:47:10,677 epoch 3 - iter 14/72 - loss 0.31269996 - time (sec): 13.79 - samples/sec: 582.43 - lr: 0.100000
2023-04-20 22:47:12,532 epoch 3 - iter 21/72 - loss 0.31134999 - time (sec): 15.64 - samples/sec: 764.03 - lr: 0.100000
2023-04-20 22:47:14,025 epoch 3 - iter 28/72 - loss 0.30608700 - time (sec): 17.13 - samples/sec: 922.94 - lr: 0.100000
2023-04-20 22:47:15,433 epoch 3 - iter 35/72 - loss 0.30130636 - time (sec): 18.54 - samples/sec: 1062.85 - lr: 0.100000
2023-04-20 22:47:16,813 epoch 3 - iter 42/72 - loss 0.30307779 - time (sec): 19.92 - samples/sec: 1174.43 - lr: 0.100000
2023-04-20 22:47:18,238 epoch 3 - iter 49/72 - loss 0.30841503 - time (sec): 21.35 - samples/sec: 1278.48 - lr: 0.100000
2023-04-20 22:47:19,746 epoch 3 -

100%|██████████| 8/8 [00:01<00:00,  4.31it/s]

2023-04-20 22:47:25,410 Evaluating as a multi-label problem: False
2023-04-20 22:47:25,432 DEV : loss 0.25518640875816345 - f1-score (micro avg)  0.431
2023-04-20 22:47:25,475 BAD EPOCHS (no improvement): 0
2023-04-20 22:47:25,483 saving best model





2023-04-20 22:47:27,921 ----------------------------------------------------------------------------------------------------
2023-04-20 22:47:28,773 epoch 4 - iter 7/72 - loss 0.24077674 - time (sec): 0.85 - samples/sec: 4485.19 - lr: 0.100000
2023-04-20 22:47:30,436 epoch 4 - iter 14/72 - loss 0.24810824 - time (sec): 2.51 - samples/sec: 3153.83 - lr: 0.100000
2023-04-20 22:47:32,103 epoch 4 - iter 21/72 - loss 0.26706667 - time (sec): 4.18 - samples/sec: 2896.06 - lr: 0.100000
2023-04-20 22:47:33,515 epoch 4 - iter 28/72 - loss 0.26379570 - time (sec): 5.59 - samples/sec: 2830.21 - lr: 0.100000
2023-04-20 22:47:34,874 epoch 4 - iter 35/72 - loss 0.26343438 - time (sec): 6.95 - samples/sec: 2836.93 - lr: 0.100000
2023-04-20 22:47:36,863 epoch 4 - iter 42/72 - loss 0.25867438 - time (sec): 8.94 - samples/sec: 2636.04 - lr: 0.100000
2023-04-20 22:47:39,049 epoch 4 - iter 49/72 - loss 0.26729800 - time (sec): 11.13 - samples/sec: 2451.50 - lr: 0.100000
2023-04-20 22:47:41,025 epoch 4 - i

100%|██████████| 8/8 [00:01<00:00,  6.37it/s]

2023-04-20 22:47:46,605 Evaluating as a multi-label problem: False
2023-04-20 22:47:46,621 DEV : loss 0.226133331656456 - f1-score (micro avg)  0.48
2023-04-20 22:47:46,647 BAD EPOCHS (no improvement): 0
2023-04-20 22:47:46,652 saving best model
2023-04-20 22:47:48,512 ----------------------------------------------------------------------------------------------------
2023-04-20 22:47:49,502 epoch 5 - iter 7/72 - loss 0.23000037 - time (sec): 0.99 - samples/sec: 3991.47 - lr: 0.100000
2023-04-20 22:47:50,958 epoch 5 - iter 14/72 - loss 0.22641800 - time (sec): 2.44 - samples/sec: 3289.66 - lr: 0.100000
2023-04-20 22:47:52,428 epoch 5 - iter 21/72 - loss 0.23586178 - time (sec): 3.91 - samples/sec: 3068.98 - lr: 0.100000
2023-04-20 22:47:53,901 epoch 5 - iter 28/72 - loss 0.23514719 - time (sec): 5.39 - samples/sec: 2925.34 - lr: 0.100000
2023-04-20 22:47:55,592 epoch 5 - iter 35/72 - loss 0.23561175 - time (sec): 7.08 - samples/sec: 2737.33 - lr: 0.100000
2023-04-20 22:47:57,877 epoch 5 - iter 42/72 - loss 0.23772740 - time (sec): 9.36 - samples/sec: 2491.83 - lr: 0.100000
2023-04-20 22:48:00,299 epoch 5 - iter 49/72 - loss 0.23676

100%|██████████| 8/8 [00:01<00:00,  6.74it/s]

2023-04-20 22:48:07,105 Evaluating as a multi-label problem: False
2023-04-20 22:48:07,120 DEV : loss 0.21024435758590698 - f1-score (micro avg)  0.506
2023-04-20 22:48:07,149 BAD EPOCHS (no improvement): 0
2023-04-20 22:48:07,154 saving best model
2023-04-20 22:48:08,989 ----------------------------------------------------------------------------------------------------
2023-04-20 22:48:10,065 epoch 6 - iter 7/72 - loss 0.22322228 - time (sec): 1.07 - samples/sec: 3585.65 - lr: 0.100000
2023-04-20 22:48:12,164 epoch 6 - iter 14/72 - loss 0.22480956 - time (sec): 3.17 - samples/sec: 2560.81 - lr: 0.100000
2023-04-20 22:48:14,139 epoch 6 - iter 21/72 - loss 0.20899924 - time (sec): 5.15 - samples/sec: 2322.35 - lr: 0.100000
2023-04-20 22:48:15,800 epoch 6 - iter 28/72 - loss 0.21431769 - time (sec): 6.81 - samples/sec: 2334.50 - lr: 0.100000
2023-04-20 22:48:17,242 epoch 6 - iter 35/72 - loss 0.21582777 - time (sec): 8.25 - samples/sec: 2373.51 - lr: 0.100000
2023-04-20 22:48:19,549 epoch 6 - iter 42/72 - loss 0.21802203 - time (sec): 10.56 - samples/sec: 2230.58 - lr: 0.100000
2023-04-20 22:48:21,492 epoch 6 - iter 49/72 - loss 0.2149

100%|██████████| 8/8 [00:01<00:00,  4.25it/s]

2023-04-20 22:48:28,883 Evaluating as a multi-label problem: False
2023-04-20 22:48:28,907 DEV : loss 0.20029477775096893 - f1-score (micro avg)  0.5114
2023-04-20 22:48:28,948 BAD EPOCHS (no improvement): 0
2023-04-20 22:48:28,955 saving best model





2023-04-20 22:48:31,220 ----------------------------------------------------------------------------------------------------
2023-04-20 22:48:32,212 epoch 7 - iter 7/72 - loss 0.21073284 - time (sec): 0.99 - samples/sec: 3864.65 - lr: 0.100000
2023-04-20 22:48:33,581 epoch 7 - iter 14/72 - loss 0.19633286 - time (sec): 2.36 - samples/sec: 3223.91 - lr: 0.100000
2023-04-20 22:48:35,000 epoch 7 - iter 21/72 - loss 0.19691522 - time (sec): 3.78 - samples/sec: 3012.31 - lr: 0.100000
2023-04-20 22:48:36,404 epoch 7 - iter 28/72 - loss 0.20283583 - time (sec): 5.18 - samples/sec: 2973.66 - lr: 0.100000
2023-04-20 22:48:38,557 epoch 7 - iter 35/72 - loss 0.20271442 - time (sec): 7.33 - samples/sec: 2633.25 - lr: 0.100000
2023-04-20 22:48:41,274 epoch 7 - iter 42/72 - loss 0.20315465 - time (sec): 10.05 - samples/sec: 2318.33 - lr: 0.100000
2023-04-20 22:48:43,168 epoch 7 - iter 49/72 - loss 0.20356147 - time (sec): 11.94 - samples/sec: 2275.53 - lr: 0.100000
2023-04-20 22:48:45,233 epoch 7 - 

100%|██████████| 8/8 [00:01<00:00,  6.22it/s]

2023-04-20 22:48:50,400 Evaluating as a multi-label problem: False
2023-04-20 22:48:50,415 DEV : loss 0.19312630593776703 - f1-score (micro avg)  0.5286
2023-04-20 22:48:50,445 BAD EPOCHS (no improvement): 0
2023-04-20 22:48:50,449 saving best model
2023-04-20 22:48:52,283 ----------------------------------------------------------------------------------------------------
2023-04-20 22:48:53,235 epoch 8 - iter 7/72 - loss 0.20543053 - time (sec): 0.95 - samples/sec: 4280.89 - lr: 0.100000
2023-04-20 22:48:55,347 epoch 8 - iter 14/72 - loss 0.20312400 - time (sec): 3.06 - samples/sec: 2636.55 - lr: 0.100000
2023-04-20 22:48:57,004 epoch 8 - iter 21/72 - loss 0.19473626 - time (sec): 4.72 - samples/sec: 2474.19 - lr: 0.100000
2023-04-20 22:48:58,881 epoch 8 - iter 28/72 - loss 0.19558153 - time (sec): 6.60 - samples/sec: 2339.82 - lr: 0.100000
2023-04-20 22:49:01,147 epoch 8 - iter 35/72 - loss 0.19567340 - time (sec): 8.86 - samples/sec: 2159.29 - lr: 0.100000
2023-04-20 22:49:03,188 epoch 8 - iter 42/72 - loss 0.19505058 - time (sec): 10.90 - samples/sec: 2130.65 - lr: 0.100000
2023-04-20 22:49:05,174 epoch 8 - iter 49/72 - loss 0.1913

100%|██████████| 8/8 [00:01<00:00,  5.33it/s]

2023-04-20 22:49:11,763 Evaluating as a multi-label problem: False
2023-04-20 22:49:11,787 DEV : loss 0.20162229239940643 - f1-score (micro avg)  0.4699
2023-04-20 22:49:11,831 BAD EPOCHS (no improvement): 1
2023-04-20 22:49:11,837 ----------------------------------------------------------------------------------------------------





2023-04-20 22:49:13,202 epoch 9 - iter 7/72 - loss 0.19892004 - time (sec): 1.36 - samples/sec: 3055.86 - lr: 0.100000
2023-04-20 22:49:14,896 epoch 9 - iter 14/72 - loss 0.20214699 - time (sec): 3.05 - samples/sec: 2542.85 - lr: 0.100000
2023-04-20 22:49:16,768 epoch 9 - iter 21/72 - loss 0.19585196 - time (sec): 4.92 - samples/sec: 2358.63 - lr: 0.100000
2023-04-20 22:49:18,285 epoch 9 - iter 28/72 - loss 0.19916756 - time (sec): 6.44 - samples/sec: 2411.13 - lr: 0.100000
2023-04-20 22:49:19,703 epoch 9 - iter 35/72 - loss 0.19281245 - time (sec): 7.86 - samples/sec: 2493.20 - lr: 0.100000
2023-04-20 22:49:21,122 epoch 9 - iter 42/72 - loss 0.18996909 - time (sec): 9.28 - samples/sec: 2521.67 - lr: 0.100000
2023-04-20 22:49:22,580 epoch 9 - iter 49/72 - loss 0.18947077 - time (sec): 10.74 - samples/sec: 2543.97 - lr: 0.100000
2023-04-20 22:49:24,102 epoch 9 - iter 56/72 - loss 0.19005553 - time (sec): 12.26 - samples/sec: 2552.80 - lr: 0.100000
2023-04-20 22:49:25,448 epoch 9 - iter 

100%|██████████| 8/8 [00:01<00:00,  4.25it/s]

2023-04-20 22:49:30,104 Evaluating as a multi-label problem: False
2023-04-20 22:49:30,126 DEV : loss 0.19254715740680695 - f1-score (micro avg)  0.5144
2023-04-20 22:49:30,170 BAD EPOCHS (no improvement): 2
2023-04-20 22:49:30,177 ----------------------------------------------------------------------------------------------------





2023-04-20 22:49:31,689 epoch 10 - iter 7/72 - loss 0.16193555 - time (sec): 1.51 - samples/sec: 2566.11 - lr: 0.100000
2023-04-20 22:49:33,161 epoch 10 - iter 14/72 - loss 0.16862695 - time (sec): 2.98 - samples/sec: 2602.16 - lr: 0.100000
2023-04-20 22:49:34,581 epoch 10 - iter 21/72 - loss 0.17845537 - time (sec): 4.40 - samples/sec: 2660.40 - lr: 0.100000
2023-04-20 22:49:36,149 epoch 10 - iter 28/72 - loss 0.18209190 - time (sec): 5.97 - samples/sec: 2662.16 - lr: 0.100000
2023-04-20 22:49:37,468 epoch 10 - iter 35/72 - loss 0.18370001 - time (sec): 7.29 - samples/sec: 2698.28 - lr: 0.100000
2023-04-20 22:49:39,007 epoch 10 - iter 42/72 - loss 0.19001436 - time (sec): 8.83 - samples/sec: 2694.14 - lr: 0.100000
2023-04-20 22:49:40,513 epoch 10 - iter 49/72 - loss 0.18570382 - time (sec): 10.33 - samples/sec: 2680.76 - lr: 0.100000
2023-04-20 22:49:41,964 epoch 10 - iter 56/72 - loss 0.18551221 - time (sec): 11.78 - samples/sec: 2673.82 - lr: 0.100000
2023-04-20 22:49:43,649 epoch 1

100%|██████████| 8/8 [00:01<00:00,  5.34it/s]

2023-04-20 22:49:47,945 Evaluating as a multi-label problem: False
2023-04-20 22:49:47,962 DEV : loss 0.17256633937358856 - f1-score (micro avg)  0.5682
2023-04-20 22:49:47,991 BAD EPOCHS (no improvement): 0
2023-04-20 22:49:47,995 saving best model
2023-04-20 22:49:49,837 ----------------------------------------------------------------------------------------------------
2023-04-20 22:49:50,842 epoch 11 - iter 7/72 - loss 0.18424583 - time (sec): 0.99 - samples/sec: 4117.27 - lr: 0.100000
2023-04-20 22:49:52,386 epoch 11 - iter 14/72 - loss 0.18483373 - time (sec): 2.53 - samples/sec: 3200.69 - lr: 0.100000
2023-04-20 22:49:53,700 epoch 11 - iter 21/72 - loss 0.18331979 - time (sec): 3.85 - samples/sec: 3042.89 - lr: 0.100000
2023-04-20 22:49:55,103 epoch 11 - iter 28/72 - loss 0.17573934 - time (sec): 5.25 - samples/sec: 2959.06 - lr: 0.100000
2023-04-20 22:49:56,618 epoch 11 - iter 35/72 - loss 0.17084214 - time (sec): 6.76 - samples/sec: 2904.56 - lr: 0.100000
2023-04-20 22:49:58,913 epoch 11 - iter 42/72 - loss 0.17041266 - time (sec): 9.06 - samples/sec: 2606.58 - lr: 0.100000
2023-04-20 22:50:01,526 epoch 11 - iter 49/72 - loss 

100%|██████████| 8/8 [00:01<00:00,  6.41it/s]

2023-04-20 22:50:08,445 Evaluating as a multi-label problem: False
2023-04-20 22:50:08,465 DEV : loss 0.1774461567401886 - f1-score (micro avg)  0.5814
2023-04-20 22:50:08,493 BAD EPOCHS (no improvement): 0
2023-04-20 22:50:08,498 saving best model
2023-04-20 22:50:10,341 ----------------------------------------------------------------------------------------------------
2023-04-20 22:50:11,252 epoch 12 - iter 7/72 - loss 0.14672635 - time (sec): 0.91 - samples/sec: 4225.31 - lr: 0.100000
2023-04-20 22:50:12,816 epoch 12 - iter 14/72 - loss 0.15751310 - time (sec): 2.47 - samples/sec: 3228.04 - lr: 0.100000
2023-04-20 22:50:14,482 epoch 12 - iter 21/72 - loss 0.15862715 - time (sec): 4.14 - samples/sec: 2806.67 - lr: 0.100000
2023-04-20 22:50:16,249 epoch 12 - iter 28/72 - loss 0.16376390 - time (sec): 5.90 - samples/sec: 2598.42 - lr: 0.100000
2023-04-20 22:50:18,768 epoch 12 - iter 35/72 - loss 0.17380483 - time (sec): 8.42 - samples/sec: 2284.25 - lr: 0.100000
2023-04-20 22:50:20,766 epoch 12 - iter 42/72 - loss 0.16720348 - time (sec): 10.42 - samples/sec: 2215.88 - lr: 0.100000
2023-04-20 22:50:22,507 epoch 12 - iter 49/72 - loss

100%|██████████| 8/8 [00:01<00:00,  5.37it/s]

2023-04-20 22:50:29,235 Evaluating as a multi-label problem: False
2023-04-20 22:50:29,255 DEV : loss 0.16690368950366974 - f1-score (micro avg)  0.5583
2023-04-20 22:50:29,297 BAD EPOCHS (no improvement): 1
2023-04-20 22:50:29,303 ----------------------------------------------------------------------------------------------------





2023-04-20 22:50:30,559 epoch 13 - iter 7/72 - loss 0.17687004 - time (sec): 1.25 - samples/sec: 3160.53 - lr: 0.100000
2023-04-20 22:50:32,961 epoch 13 - iter 14/72 - loss 0.16820789 - time (sec): 3.65 - samples/sec: 2109.83 - lr: 0.100000
2023-04-20 22:50:34,711 epoch 13 - iter 21/72 - loss 0.16666485 - time (sec): 5.40 - samples/sec: 2194.43 - lr: 0.100000
2023-04-20 22:50:36,049 epoch 13 - iter 28/72 - loss 0.16201759 - time (sec): 6.74 - samples/sec: 2310.79 - lr: 0.100000
2023-04-20 22:50:37,558 epoch 13 - iter 35/72 - loss 0.16219831 - time (sec): 8.25 - samples/sec: 2368.88 - lr: 0.100000
2023-04-20 22:50:38,928 epoch 13 - iter 42/72 - loss 0.16107913 - time (sec): 9.62 - samples/sec: 2414.91 - lr: 0.100000
2023-04-20 22:50:40,406 epoch 13 - iter 49/72 - loss 0.16329567 - time (sec): 11.10 - samples/sec: 2440.39 - lr: 0.100000
2023-04-20 22:50:41,888 epoch 13 - iter 56/72 - loss 0.17053758 - time (sec): 12.58 - samples/sec: 2474.58 - lr: 0.100000
2023-04-20 22:50:43,346 epoch 1

100%|██████████| 8/8 [00:01<00:00,  4.14it/s]

2023-04-20 22:50:48,022 Evaluating as a multi-label problem: False
2023-04-20 22:50:48,045 DEV : loss 0.16806629300117493 - f1-score (micro avg)  0.5562
2023-04-20 22:50:48,085 BAD EPOCHS (no improvement): 2
2023-04-20 22:50:48,092 ----------------------------------------------------------------------------------------------------





2023-04-20 22:50:49,459 epoch 14 - iter 7/72 - loss 0.16162961 - time (sec): 1.37 - samples/sec: 2854.87 - lr: 0.100000
2023-04-20 22:50:50,866 epoch 14 - iter 14/72 - loss 0.16307019 - time (sec): 2.77 - samples/sec: 2762.99 - lr: 0.100000
2023-04-20 22:50:52,305 epoch 14 - iter 21/72 - loss 0.15748383 - time (sec): 4.21 - samples/sec: 2746.03 - lr: 0.100000
2023-04-20 22:50:53,680 epoch 14 - iter 28/72 - loss 0.17026276 - time (sec): 5.59 - samples/sec: 2739.91 - lr: 0.100000
2023-04-20 22:50:55,182 epoch 14 - iter 35/72 - loss 0.16658832 - time (sec): 7.09 - samples/sec: 2713.96 - lr: 0.100000
2023-04-20 22:50:56,651 epoch 14 - iter 42/72 - loss 0.16701042 - time (sec): 8.56 - samples/sec: 2728.48 - lr: 0.100000
2023-04-20 22:50:58,098 epoch 14 - iter 49/72 - loss 0.16902163 - time (sec): 10.00 - samples/sec: 2742.42 - lr: 0.100000
2023-04-20 22:50:59,762 epoch 14 - iter 56/72 - loss 0.16539526 - time (sec): 11.67 - samples/sec: 2686.73 - lr: 0.100000
2023-04-20 22:51:01,512 epoch 1

100%|██████████| 8/8 [00:01<00:00,  6.16it/s]

2023-04-20 22:51:05,795 Evaluating as a multi-label problem: False
2023-04-20 22:51:05,810 DEV : loss 0.161653533577919 - f1-score (micro avg)  0.5776
2023-04-20 22:51:05,834 BAD EPOCHS (no improvement): 3
2023-04-20 22:51:05,842 ----------------------------------------------------------------------------------------------------
2023-04-20 22:51:06,821 epoch 15 - iter 7/72 - loss 0.12694741 - time (sec): 0.98 - samples/sec: 3932.46 - lr: 0.100000
2023-04-20 22:51:08,237 epoch 15 - iter 14/72 - loss 0.13548105 - time (sec): 2.39 - samples/sec: 3131.91 - lr: 0.100000
2023-04-20 22:51:09,640 epoch 15 - iter 21/72 - loss 0.13807785 - time (sec): 3.80 - samples/sec: 2977.53 - lr: 0.100000
2023-04-20 22:51:11,092 epoch 15 - iter 28/72 - loss 0.14484652 - time (sec): 5.25 - samples/sec: 2928.51 - lr: 0.100000
2023-04-20 22:51:12,679 epoch 15 - iter 35/72 - loss 0.14658043 - time (sec): 6.84 - samples/sec: 2852.04 - lr: 0.100000
2023-04-20 22:51:14,081 epoch 15 - iter 42/72 - loss 0.15257365 - time (sec): 8.24 - samples/sec: 2824.68 - lr: 0.100000
2023-04-20 22:51:15,799 epoch 15 - iter 49/72 - loss 0.15126156 - time (sec): 9.96 - samples/se

100%|██████████| 8/8 [00:01<00:00,  6.63it/s]

2023-04-20 22:51:23,302 Evaluating as a multi-label problem: False
2023-04-20 22:51:23,323 DEV : loss 0.16575351357460022 - f1-score (micro avg)  0.5881
2023-04-20 22:51:23,347 BAD EPOCHS (no improvement): 0
2023-04-20 22:51:23,353 saving best model
2023-04-20 22:51:25,131 ----------------------------------------------------------------------------------------------------
2023-04-20 22:51:26,121 epoch 16 - iter 7/72 - loss 0.19205511 - time (sec): 0.99 - samples/sec: 4073.11 - lr: 0.100000
2023-04-20 22:51:27,570 epoch 16 - iter 14/72 - loss 0.16884308 - time (sec): 2.44 - samples/sec: 3264.10 - lr: 0.100000
2023-04-20 22:51:28,970 epoch 16 - iter 21/72 - loss 0.16134375 - time (sec): 3.84 - samples/sec: 3078.71 - lr: 0.100000
2023-04-20 22:51:30,827 epoch 16 - iter 28/72 - loss 0.16112223 - time (sec): 5.69 - samples/sec: 2820.72 - lr: 0.100000
2023-04-20 22:51:32,869 epoch 16 - iter 35/72 - loss 0.16240953 - time (sec): 7.73 - samples/sec: 2547.30 - lr: 0.100000
2023-04-20 22:51:35,618 epoch 16 - iter 42/72 - loss 0.15753735 - time (sec): 10.48 - samples/sec: 2261.95 - lr: 0.100000
2023-04-20 22:51:37,322 epoch 16 - iter 49/72 - loss 0.15536005 - time (sec): 12.19 - samples/sec: 2265.26 

100%|██████████| 8/8 [00:01<00:00,  6.33it/s]

2023-04-20 22:51:43,754 Evaluating as a multi-label problem: False
2023-04-20 22:51:43,770 DEV : loss 0.16387419402599335 - f1-score (micro avg)  0.5992
2023-04-20 22:51:43,805 BAD EPOCHS (no improvement): 0
2023-04-20 22:51:43,810 saving best model
2023-04-20 22:51:45,730 ----------------------------------------------------------------------------------------------------
2023-04-20 22:51:47,088 epoch 17 - iter 7/72 - loss 0.12759565 - time (sec): 1.36 - samples/sec: 2903.69 - lr: 0.100000
2023-04-20 22:51:49,032 epoch 17 - iter 14/72 - loss 0.13667093 - time (sec): 3.30 - samples/sec: 2355.70 - lr: 0.100000
2023-04-20 22:51:50,931 epoch 17 - iter 21/72 - loss 0.14243666 - time (sec): 5.20 - samples/sec: 2239.11 - lr: 0.100000
2023-04-20 22:51:52,603 epoch 17 - iter 28/72 - loss 0.14922420 - time (sec): 6.87 - samples/sec: 2262.77 - lr: 0.100000
2023-04-20 22:51:54,402 epoch 17 - iter 35/72 - loss 0.15881717 - time (sec): 8.67 - samples/sec: 2253.29 - lr: 0.100000
2023-04-20 22:51:56,272 epoch 17 - iter 42/72 - loss 0.15442938 - time (sec): 10.54 - samples/sec: 2207.07 - lr: 0.100000
2023-04-20 22:51:57,942 epoch 17 - iter 49/72 - loss

100%|██████████| 8/8 [00:01<00:00,  4.12it/s]

2023-04-20 22:52:05,733 Evaluating as a multi-label problem: False
2023-04-20 22:52:05,756 DEV : loss 0.16127386689186096 - f1-score (micro avg)  0.5805
2023-04-20 22:52:05,796 BAD EPOCHS (no improvement): 1
2023-04-20 22:52:05,803 ----------------------------------------------------------------------------------------------------
2023-04-20 22:52:06,880 epoch 18 - iter 7/72 - loss 0.13257099 - time (sec): 1.07 - samples/sec: 3460.52 - lr: 0.100000
2023-04-20 22:52:08,269 epoch 18 - iter 14/72 - loss 0.13654324 - time (sec): 2.46 - samples/sec: 3047.56 - lr: 0.100000
2023-04-20 22:52:10,466 epoch 18 - iter 21/72 - loss 0.14383738 - time (sec): 4.66 - samples/sec: 2517.89 - lr: 0.100000
2023-04-20 22:52:11,853 epoch 18 - iter 28/72 - loss 0.13988638 - time (sec): 6.05 - samples/sec: 2586.31 - lr: 0.100000
2023-04-20 22:52:13,244 epoch 18 - iter 35/72 - loss 0.14164493 - time (sec): 7.44 - samples/sec: 2622.93 - lr: 0.100000
2023-04-20 22:52:14,715 epoch 18 - iter 42/72 - loss 0.14296639 - time (sec): 8.91 - samples/sec: 2637.73 - lr: 0.100000
2023-04-20 22:52:16,138 epoch 18 - iter 49/72 - loss 0.14644605 - time (sec): 10.33 - samples/sec: 2662.12 - lr: 0.100000
2023-04-20 22:52:18,001 epoch 18 - iter 56/72 - loss 0.14589911 - time (sec): 12.20 - samples/sec: 2566.56 - lr: 0.100000
2023-04-20 22:52:19,811 epoch 1

100%|██████████| 8/8 [00:01<00:00,  6.41it/s]

2023-04-20 22:52:23,907 Evaluating as a multi-label problem: False
2023-04-20 22:52:23,924 DEV : loss 0.17370785772800446 - f1-score (micro avg)  0.5714
2023-04-20 22:52:23,950 BAD EPOCHS (no improvement): 2
2023-04-20 22:52:23,955 ----------------------------------------------------------------------------------------------------
2023-04-20 22:52:24,930 epoch 19 - iter 7/72 - loss 0.13814484 - time (sec): 0.97 - samples/sec: 4088.18 - lr: 0.100000
2023-04-20 22:52:26,389 epoch 19 - iter 14/72 - loss 0.14355587 - time (sec): 2.43 - samples/sec: 3232.98 - lr: 0.100000
2023-04-20 22:52:27,963 epoch 19 - iter 21/72 - loss 0.14304887 - time (sec): 4.01 - samples/sec: 2967.67 - lr: 0.100000
2023-04-20 22:52:29,482 epoch 19 - iter 28/72 - loss 0.14597317 - time (sec): 5.52 - samples/sec: 2848.89 - lr: 0.100000
2023-04-20 22:52:30,884 epoch 19 - iter 35/72 - loss 0.14392353 - time (sec): 6.93 - samples/sec: 2826.88 - lr: 0.100000
2023-04-20 22:52:32,539 epoch 19 - iter 42/72 - loss 0.14247743 - time (sec): 8.58 - samples/sec: 2713.77 - lr: 0.100000
2023-04-20 22:52:34,478 epoch 19 - iter 49/72 - loss 0.14104996 - time (sec): 10.52 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.52it/s]

2023-04-20 22:52:41,382 Evaluating as a multi-label problem: False
2023-04-20 22:52:41,398 DEV : loss 0.157558411359787 - f1-score (micro avg)  0.5647
2023-04-20 22:52:41,422 BAD EPOCHS (no improvement): 3
2023-04-20 22:52:41,429 ----------------------------------------------------------------------------------------------------
2023-04-20 22:52:42,263 epoch 20 - iter 7/72 - loss 0.11917993 - time (sec): 0.83 - samples/sec: 4628.98 - lr: 0.100000
2023-04-20 22:52:43,733 epoch 20 - iter 14/72 - loss 0.13635244 - time (sec): 2.30 - samples/sec: 3350.94 - lr: 0.100000
2023-04-20 22:52:45,190 epoch 20 - iter 21/72 - loss 0.14261072 - time (sec): 3.76 - samples/sec: 3078.83 - lr: 0.100000
2023-04-20 22:52:46,694 epoch 20 - iter 28/72 - loss 0.15332551 - time (sec): 5.26 - samples/sec: 2966.22 - lr: 0.100000
2023-04-20 22:52:48,690 epoch 20 - iter 35/72 - loss 0.14688383 - time (sec): 7.26 - samples/sec: 2708.25 - lr: 0.100000
2023-04-20 22:52:50,502 epoch 20 - iter 42/72 - loss 0.14684171 - time (sec): 9.07 - samples/sec: 2608.27 - lr: 0.100000
2023-04-20 22:52:52,445 epoch 20 - iter 49/72 - loss 0.14880289 - time (sec): 11.01 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.36it/s]

2023-04-20 22:52:58,705 Evaluating as a multi-label problem: False
2023-04-20 22:52:58,720 DEV : loss 0.16118288040161133 - f1-score (micro avg)  0.6034
2023-04-20 22:52:58,743 BAD EPOCHS (no improvement): 0
2023-04-20 22:52:58,749 saving best model
2023-04-20 22:53:00,919 ----------------------------------------------------------------------------------------------------
2023-04-20 22:53:01,941 epoch 21 - iter 7/72 - loss 0.14228202 - time (sec): 1.01 - samples/sec: 4080.05 - lr: 0.100000
2023-04-20 22:53:03,797 epoch 21 - iter 14/72 - loss 0.13409148 - time (sec): 2.86 - samples/sec: 2807.88 - lr: 0.100000
2023-04-20 22:53:05,823 epoch 21 - iter 21/72 - loss 0.13766312 - time (sec): 4.89 - samples/sec: 2484.54 - lr: 0.100000
2023-04-20 22:53:08,255 epoch 21 - iter 28/72 - loss 0.14033643 - time (sec): 7.32 - samples/sec: 2205.96 - lr: 0.100000
2023-04-20 22:53:10,742 epoch 21 - iter 35/72 - loss 0.13782092 - time (sec): 9.81 - samples/sec: 2038.71 - lr: 0.100000
2023-04-20 22:53:12,359 epoch 21 - iter 42/72 - loss 0.13765648 - time (sec): 11.43 - samples/sec: 2101.86 - lr: 0.100000
2023-04-20 22:53:13,702 epoch 21 - iter 49/72 - loss

100%|██████████| 8/8 [00:01<00:00,  4.29it/s]

2023-04-20 22:53:20,991 Evaluating as a multi-label problem: False
2023-04-20 22:53:21,016 DEV : loss 0.1536140739917755 - f1-score (micro avg)  0.6053
2023-04-20 22:53:21,065 BAD EPOCHS (no improvement): 0
2023-04-20 22:53:21,073 saving best model
2023-04-20 22:53:23,390 ----------------------------------------------------------------------------------------------------
2023-04-20 22:53:24,331 epoch 22 - iter 7/72 - loss 0.14636891 - time (sec): 0.90 - samples/sec: 4163.74 - lr: 0.100000
2023-04-20 22:53:25,893 epoch 22 - iter 14/72 - loss 0.13628426 - time (sec): 2.47 - samples/sec: 3106.60 - lr: 0.100000
2023-04-20 22:53:27,209 epoch 22 - iter 21/72 - loss 0.14096053 - time (sec): 3.78 - samples/sec: 3028.17 - lr: 0.100000
2023-04-20 22:53:28,758 epoch 22 - iter 28/72 - loss 0.14148311 - time (sec): 5.33 - samples/sec: 2851.28 - lr: 0.100000
2023-04-20 22:53:30,606 epoch 22 - iter 35/72 - loss 0.14535357 - time (sec): 7.18 - samples/sec: 2680.17 - lr: 0.100000
2023-04-20 22:53:32,560 epoch 22 - iter 42/72 - loss 0.14255136 - time (sec): 9.13 - samples/sec: 2537.17 - lr: 0.100000
2023-04-20 22:53:34,778 epoch 22 - iter 49/72 - loss 0.14140157 - time (sec): 11.35 - samples/sec: 2394.68 - lr: 0.100000
2023-04-20 22:53:36,655 epoc

100%|██████████| 8/8 [00:01<00:00,  6.38it/s]

2023-04-20 22:53:42,122 Evaluating as a multi-label problem: False
2023-04-20 22:53:42,138 DEV : loss 0.16394257545471191 - f1-score (micro avg)  0.5626
2023-04-20 22:53:42,164 BAD EPOCHS (no improvement): 1
2023-04-20 22:53:42,171 ----------------------------------------------------------------------------------------------------
2023-04-20 22:53:43,116 epoch 23 - iter 7/72 - loss 0.12413713 - time (sec): 0.94 - samples/sec: 4061.74 - lr: 0.100000
2023-04-20 22:53:44,544 epoch 23 - iter 14/72 - loss 0.12432607 - time (sec): 2.37 - samples/sec: 3242.19 - lr: 0.100000
2023-04-20 22:53:45,901 epoch 23 - iter 21/72 - loss 0.12979038 - time (sec): 3.73 - samples/sec: 3121.74 - lr: 0.100000
2023-04-20 22:53:47,339 epoch 23 - iter 28/72 - loss 0.13057476 - time (sec): 5.17 - samples/sec: 3015.82 - lr: 0.100000
2023-04-20 22:53:48,781 epoch 23 - iter 35/72 - loss 0.12973491 - time (sec): 6.61 - samples/sec: 2924.60 - lr: 0.100000
2023-04-20 22:53:51,353 epoch 23 - iter 42/72 - loss 0.13048444 - time (sec): 9.18 - samples/sec: 2531.52 - lr: 0.100000
2023-04-20 22:53:53,244 epoch 23 - iter 49/72 - loss 0.13179908 - time (sec): 11.07 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.42it/s]

2023-04-20 22:53:59,894 Evaluating as a multi-label problem: False
2023-04-20 22:53:59,911 DEV : loss 0.15591329336166382 - f1-score (micro avg)  0.6326
2023-04-20 22:53:59,936 BAD EPOCHS (no improvement): 0
2023-04-20 22:53:59,944 saving best model
2023-04-20 22:54:02,072 ----------------------------------------------------------------------------------------------------
2023-04-20 22:54:03,249 epoch 24 - iter 7/72 - loss 0.12962605 - time (sec): 1.15 - samples/sec: 3465.76 - lr: 0.100000
2023-04-20 22:54:06,355 epoch 24 - iter 14/72 - loss 0.13628059 - time (sec): 4.26 - samples/sec: 1858.81 - lr: 0.100000
2023-04-20 22:54:08,922 epoch 24 - iter 21/72 - loss 0.14095085 - time (sec): 6.82 - samples/sec: 1724.24 - lr: 0.100000
2023-04-20 22:54:11,760 epoch 24 - iter 28/72 - loss 0.13451491 - time (sec): 9.66 - samples/sec: 1629.35 - lr: 0.100000
2023-04-20 22:54:14,036 epoch 24 - iter 35/72 - loss 0.13770383 - time (sec): 11.94 - samples/sec: 1651.42 - lr: 0.100000
2023-04-20 22:54:15,582 epoch 24 - iter 42/72 - loss 0.13577988 - time (sec): 13.48 - samples/sec: 1746.86 - lr: 0.100000
2023-04-20 22:54:17,235 epoch 24 - iter 49/72 - los

100%|██████████| 8/8 [00:03<00:00,  2.47it/s]

2023-04-20 22:54:27,218 Evaluating as a multi-label problem: False
2023-04-20 22:54:27,242 DEV : loss 0.14981813728809357 - f1-score (micro avg)  0.6422
2023-04-20 22:54:27,286 BAD EPOCHS (no improvement): 0
2023-04-20 22:54:27,295 saving best model
2023-04-20 22:54:29,788 ----------------------------------------------------------------------------------------------------
2023-04-20 22:54:30,976 epoch 25 - iter 7/72 - loss 0.11153346 - time (sec): 1.18 - samples/sec: 3206.17 - lr: 0.100000
2023-04-20 22:54:32,643 epoch 25 - iter 14/72 - loss 0.12160749 - time (sec): 2.84 - samples/sec: 2678.46 - lr: 0.100000
2023-04-20 22:54:34,451 epoch 25 - iter 21/72 - loss 0.12535192 - time (sec): 4.65 - samples/sec: 2556.48 - lr: 0.100000
2023-04-20 22:54:36,184 epoch 25 - iter 28/72 - loss 0.12827873 - time (sec): 6.38 - samples/sec: 2481.38 - lr: 0.100000
2023-04-20 22:54:39,174 epoch 25 - iter 35/72 - loss 0.12912537 - time (sec): 9.37 - samples/sec: 2111.71 - lr: 0.100000
2023-04-20 22:54:42,775 epoch 25 - iter 42/72 - loss 0.12930417 - time (sec): 12.97 - samples/sec: 1830.41 - lr: 0.100000
2023-04-20 22:54:44,669 epoch 25 - iter 49/72 - loss 0.13042966 - time (sec): 14.87 - samples/sec: 1864.90 - lr: 0.100000
2023-04-20 22:54:46,453 epo

100%|██████████| 8/8 [00:01<00:00,  5.68it/s]

2023-04-20 22:54:52,458 Evaluating as a multi-label problem: False
2023-04-20 22:54:52,497 DEV : loss 0.14972126483917236 - f1-score (micro avg)  0.6147
2023-04-20 22:54:52,583 BAD EPOCHS (no improvement): 1
2023-04-20 22:54:52,600 ----------------------------------------------------------------------------------------------------
2023-04-20 22:54:54,714 epoch 26 - iter 7/72 - loss 0.12726499 - time (sec): 2.11 - samples/sec: 1953.72 - lr: 0.100000
2023-04-20 22:54:57,122 epoch 26 - iter 14/72 - loss 0.13280896 - time (sec): 4.52 - samples/sec: 1799.78 - lr: 0.100000
2023-04-20 22:54:59,214 epoch 26 - iter 21/72 - loss 0.12835014 - time (sec): 6.61 - samples/sec: 1816.22 - lr: 0.100000
2023-04-20 22:55:01,212 epoch 26 - iter 28/72 - loss 0.12651741 - time (sec): 8.61 - samples/sec: 1847.27 - lr: 0.100000
2023-04-20 22:55:03,140 epoch 26 - iter 35/72 - loss 0.12465673 - time (sec): 10.54 - samples/sec: 1860.22 - lr: 0.100000
2023-04-20 22:55:04,926 epoch 26 - iter 42/72 - loss 0.1269436

100%|██████████| 8/8 [00:02<00:00,  3.86it/s]

2023-04-20 22:55:17,993 Evaluating as a multi-label problem: False
2023-04-20 22:55:18,011 DEV : loss 0.1578214317560196 - f1-score (micro avg)  0.6029
2023-04-20 22:55:18,037 BAD EPOCHS (no improvement): 2
2023-04-20 22:55:18,044 ----------------------------------------------------------------------------------------------------
2023-04-20 22:55:19,123 epoch 27 - iter 7/72 - loss 0.15341237 - time (sec): 1.07 - samples/sec: 3795.93 - lr: 0.100000
2023-04-20 22:55:20,751 epoch 27 - iter 14/72 - loss 0.13266727 - time (sec): 2.70 - samples/sec: 2989.26 - lr: 0.100000
2023-04-20 22:55:22,273 epoch 27 - iter 21/72 - loss 0.12649875 - time (sec): 4.22 - samples/sec: 2781.82 - lr: 0.100000
2023-04-20 22:55:24,063 epoch 27 - iter 28/72 - loss 0.12737702 - time (sec): 6.01 - samples/sec: 2608.77 - lr: 0.100000
2023-04-20 22:55:26,128 epoch 27 - iter 35/72 - loss 0.12740637 - time (sec): 8.08 - samples/sec: 2428.74 - lr: 0.100000
2023-04-20 22:55:28,140 epoch 27 - iter 42/72 - loss 0.12552930 - time (sec): 10.09 - samples/sec: 2309.77 - lr: 0.100000
2023-04-20 22:55:30,352 epoch 27 - iter 49/72 - loss 0.12418957 - time (sec): 12.30 - samples/

100%|██████████| 8/8 [00:01<00:00,  4.11it/s]

2023-04-20 22:55:39,129 Evaluating as a multi-label problem: False
2023-04-20 22:55:39,154 DEV : loss 0.1503022313117981 - f1-score (micro avg)  0.6182
2023-04-20 22:55:39,197 BAD EPOCHS (no improvement): 3
2023-04-20 22:55:39,206 ----------------------------------------------------------------------------------------------------
2023-04-20 22:55:40,540 epoch 28 - iter 7/72 - loss 0.11415570 - time (sec): 1.33 - samples/sec: 2868.86 - lr: 0.100000
2023-04-20 22:55:42,446 epoch 28 - iter 14/72 - loss 0.11268204 - time (sec): 3.24 - samples/sec: 2407.22 - lr: 0.100000
2023-04-20 22:55:44,263 epoch 28 - iter 21/72 - loss 0.11115070 - time (sec): 5.05 - samples/sec: 2320.98 - lr: 0.100000
2023-04-20 22:55:46,087 epoch 28 - iter 28/72 - loss 0.11126113 - time (sec): 6.88 - samples/sec: 2245.70 - lr: 0.100000
2023-04-20 22:55:48,007 epoch 28 - iter 35/72 - loss 0.11654628 - time (sec): 8.80 - samples/sec: 2196.33 - lr: 0.100000
2023-04-20 22:55:49,958 epoch 28 - iter 42/72 - loss 0.11927185 - time (sec): 10.75 - samples/sec: 2168.81 - lr: 0.100000
2023-04-20 22:55:52,040 epoch 28 - iter 49/72 - loss 0.11784627 - time (sec): 12.83 - samples/sec: 2133.79 - lr: 0.100000
2023-04-20 22:55:54,032 epoch 28 - iter 56/72 - loss 0.12278602 - time (sec): 14.82 - samples/sec: 2098.51 - lr: 0.100000
2023-04-20 22:55:55,499 epoch 

100%|██████████| 8/8 [00:01<00:00,  4.43it/s]

2023-04-20 22:56:00,891 Evaluating as a multi-label problem: False
2023-04-20 22:56:00,912 DEV : loss 0.14623424410820007 - f1-score (micro avg)  0.6196
2023-04-20 22:56:00,940 Epoch    28: reducing learning rate of group 0 to 5.0000e-02.
2023-04-20 22:56:00,941 BAD EPOCHS (no improvement): 4
2023-04-20 22:56:00,946 ----------------------------------------------------------------------------------------------------
2023-04-20 22:56:01,958 epoch 29 - iter 7/72 - loss 0.10130741 - time (sec): 1.01 - samples/sec: 4137.27 - lr: 0.050000
2023-04-20 22:56:03,436 epoch 29 - iter 14/72 - loss 0.10465726 - time (sec): 2.49 - samples/sec: 3253.87 - lr: 0.050000
2023-04-20 22:56:04,796 epoch 29 - iter 21/72 - loss 0.10472630 - time (sec): 3.85 - samples/sec: 3082.89 - lr: 0.050000
2023-04-20 22:56:06,216 epoch 29 - iter 28/72 - loss 0.10660265 - time (sec): 5.27 - samples/sec: 2979.06 - lr: 0.050000
2023-04-20 22:56:07,565 epoch 29 - iter 35/72 - loss 0.10720359 - time (sec): 6.62 - samples/sec: 2932.30 - lr: 0.050000
2023-04-20 22:56:08,908 epoch 29 - iter 42/72 - loss 0.10921924 - time (sec): 7.96 - samples/sec: 2903.30 - lr: 0.050000
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  6.19it/s]

2023-04-20 22:56:18,424 Evaluating as a multi-label problem: False
2023-04-20 22:56:18,441 DEV : loss 0.14410649240016937 - f1-score (micro avg)  0.6576
2023-04-20 22:56:18,467 BAD EPOCHS (no improvement): 0
2023-04-20 22:56:18,471 saving best model
2023-04-20 22:56:20,358 ----------------------------------------------------------------------------------------------------
2023-04-20 22:56:21,484 epoch 30 - iter 7/72 - loss 0.10369067 - time (sec): 1.12 - samples/sec: 3854.85 - lr: 0.050000
2023-04-20 22:56:22,966 epoch 30 - iter 14/72 - loss 0.10760555 - time (sec): 2.60 - samples/sec: 3254.54 - lr: 0.050000
2023-04-20 22:56:24,487 epoch 30 - iter 21/72 - loss 0.10889801 - time (sec): 4.12 - samples/sec: 3034.43 - lr: 0.050000
2023-04-20 22:56:25,875 epoch 30 - iter 28/72 - loss 0.11424274 - time (sec): 5.51 - samples/sec: 2976.17 - lr: 0.050000
2023-04-20 22:56:28,062 epoch 30 - iter 35/72 - loss 0.11085500 - time (sec): 7.70 - samples/sec: 2628.29 - lr: 0.050000
2023-04-20 22:56:31,471 epoch 30 - iter 42/72 - loss 0.11447106 - time (sec): 11.10 - samples/sec: 2170.83 - lr: 0.050000
2023-04-20 22:56:32,945 epoch 30 - iter 49/72 - loss

100%|██████████| 8/8 [00:01<00:00,  6.53it/s]

2023-04-20 22:56:39,110 Evaluating as a multi-label problem: False
2023-04-20 22:56:39,126 DEV : loss 0.15226994454860687 - f1-score (micro avg)  0.638
2023-04-20 22:56:39,149 BAD EPOCHS (no improvement): 1
2023-04-20 22:56:39,158 ----------------------------------------------------------------------------------------------------
2023-04-20 22:56:40,072 epoch 31 - iter 7/72 - loss 0.10823782 - time (sec): 0.91 - samples/sec: 4498.29 - lr: 0.050000
2023-04-20 22:56:41,554 epoch 31 - iter 14/72 - loss 0.11111959 - time (sec): 2.39 - samples/sec: 3293.62 - lr: 0.050000
2023-04-20 22:56:43,153 epoch 31 - iter 21/72 - loss 0.10703730 - time (sec): 3.99 - samples/sec: 2933.20 - lr: 0.050000
2023-04-20 22:56:45,013 epoch 31 - iter 28/72 - loss 0.10516071 - time (sec): 5.85 - samples/sec: 2633.24 - lr: 0.050000
2023-04-20 22:56:47,101 epoch 31 - iter 35/72 - loss 0.10441267 - time (sec): 7.94 - samples/sec: 2455.06 - lr: 0.050000
2023-04-20 22:56:48,661 epoch 31 - iter 42/72 - loss 0.10601895 - time (sec): 9.50 - samples/sec: 2464.09 - lr: 0.050000
2023-04-20 22:56:50,112 epoch 31 - iter 49/72 - loss 0.10955840 - time (sec): 10.95 - samples/sec: 2490.05 - lr: 0.050000
2023-04-20 22:56:51,590 epoc

100%|██████████| 8/8 [00:01<00:00,  6.43it/s]

2023-04-20 22:56:56,714 Evaluating as a multi-label problem: False
2023-04-20 22:56:56,730 DEV : loss 0.14091570675373077 - f1-score (micro avg)  0.6428
2023-04-20 22:56:56,754 BAD EPOCHS (no improvement): 2
2023-04-20 22:56:56,762 ----------------------------------------------------------------------------------------------------
2023-04-20 22:56:57,590 epoch 32 - iter 7/72 - loss 0.10892340 - time (sec): 0.83 - samples/sec: 4625.86 - lr: 0.050000
2023-04-20 22:56:59,226 epoch 32 - iter 14/72 - loss 0.10421053 - time (sec): 2.46 - samples/sec: 3092.76 - lr: 0.050000
2023-04-20 22:57:01,265 epoch 32 - iter 21/72 - loss 0.10428644 - time (sec): 4.50 - samples/sec: 2563.28 - lr: 0.050000
2023-04-20 22:57:03,071 epoch 32 - iter 28/72 - loss 0.10881131 - time (sec): 6.31 - samples/sec: 2407.55 - lr: 0.050000
2023-04-20 22:57:04,577 epoch 32 - iter 35/72 - loss 0.11037573 - time (sec): 7.81 - samples/sec: 2454.52 - lr: 0.050000
2023-04-20 22:57:06,122 epoch 32 - iter 42/72 - loss 0.11015185 - time (sec): 9.36 - samples/sec: 2488.25 - lr: 0.050000
2023-04-20 22:57:07,520 epoch 32 - iter 49/72 - loss 0.11031102 - time (sec): 10.76 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.43it/s]

2023-04-20 22:57:14,455 Evaluating as a multi-label problem: False
2023-04-20 22:57:14,481 DEV : loss 0.1453622281551361 - f1-score (micro avg)  0.6603
2023-04-20 22:57:14,527 BAD EPOCHS (no improvement): 0
2023-04-20 22:57:14,535 saving best model
2023-04-20 22:57:17,064 ----------------------------------------------------------------------------------------------------
2023-04-20 22:57:18,552 epoch 33 - iter 7/72 - loss 0.11109413 - time (sec): 1.49 - samples/sec: 2786.78 - lr: 0.050000
2023-04-20 22:57:20,162 epoch 33 - iter 14/72 - loss 0.09711852 - time (sec): 3.10 - samples/sec: 2596.15 - lr: 0.050000
2023-04-20 22:57:21,630 epoch 33 - iter 21/72 - loss 0.10264396 - time (sec): 4.56 - samples/sec: 2634.22 - lr: 0.050000
2023-04-20 22:57:22,962 epoch 33 - iter 28/72 - loss 0.10465107 - time (sec): 5.90 - samples/sec: 2663.85 - lr: 0.050000
2023-04-20 22:57:24,439 epoch 33 - iter 35/72 - loss 0.10582614 - time (sec): 7.37 - samples/sec: 2649.39 - lr: 0.050000
2023-04-20 22:57:26,508 epoch 33 - iter 42/72 - loss 0.10995612 - time (sec): 9.44 - samples/sec: 2485.29 - lr: 0.050000
2023-04-20 22:57:28,485 epoch 33 - iter 49/72 - loss 0.10911305 - time (sec): 11.42 - samples/sec: 2385.85 - lr: 0.050000
2023-04-20 22:57:30,383 epoc

100%|██████████| 8/8 [00:01<00:00,  6.03it/s]

2023-04-20 22:57:36,894 Evaluating as a multi-label problem: False
2023-04-20 22:57:36,909 DEV : loss 0.14342041313648224 - f1-score (micro avg)  0.6383
2023-04-20 22:57:36,937 BAD EPOCHS (no improvement): 1
2023-04-20 22:57:36,943 ----------------------------------------------------------------------------------------------------
2023-04-20 22:57:37,927 epoch 34 - iter 7/72 - loss 0.11334524 - time (sec): 0.98 - samples/sec: 4021.25 - lr: 0.050000
2023-04-20 22:57:39,366 epoch 34 - iter 14/72 - loss 0.09441131 - time (sec): 2.42 - samples/sec: 3311.48 - lr: 0.050000
2023-04-20 22:57:40,712 epoch 34 - iter 21/72 - loss 0.10011134 - time (sec): 3.76 - samples/sec: 3138.02 - lr: 0.050000
2023-04-20 22:57:42,232 epoch 34 - iter 28/72 - loss 0.10209986 - time (sec): 5.28 - samples/sec: 2949.88 - lr: 0.050000
2023-04-20 22:57:43,757 epoch 34 - iter 35/72 - loss 0.10068251 - time (sec): 6.81 - samples/sec: 2883.49 - lr: 0.050000
2023-04-20 22:57:45,349 epoch 34 - iter 42/72 - loss 0.10432033 - time (sec): 8.40 - samples/sec: 2778.39 - lr: 0.050000
2023-04-20 22:57:47,438 epoch 34 - iter 49/72 - loss 0.10230924 - time (sec): 10.49 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.55it/s]

2023-04-20 22:57:54,228 Evaluating as a multi-label problem: False
2023-04-20 22:57:54,247 DEV : loss 0.13923689723014832 - f1-score (micro avg)  0.6539
2023-04-20 22:57:54,272 BAD EPOCHS (no improvement): 2
2023-04-20 22:57:54,279 ----------------------------------------------------------------------------------------------------
2023-04-20 22:57:55,394 epoch 35 - iter 7/72 - loss 0.09526544 - time (sec): 1.11 - samples/sec: 3754.15 - lr: 0.050000
2023-04-20 22:57:56,765 epoch 35 - iter 14/72 - loss 0.10038586 - time (sec): 2.48 - samples/sec: 3153.14 - lr: 0.050000
2023-04-20 22:57:58,226 epoch 35 - iter 21/72 - loss 0.09734562 - time (sec): 3.95 - samples/sec: 3005.53 - lr: 0.050000
2023-04-20 22:57:59,640 epoch 35 - iter 28/72 - loss 0.09507067 - time (sec): 5.36 - samples/sec: 2894.28 - lr: 0.050000
2023-04-20 22:58:01,369 epoch 35 - iter 35/72 - loss 0.10080373 - time (sec): 7.09 - samples/sec: 2724.31 - lr: 0.050000
2023-04-20 22:58:03,512 epoch 35 - iter 42/72 - loss 0.10645067 - time (sec): 9.23 - samples/sec: 2529.57 - lr: 0.050000
2023-04-20 22:58:05,277 epoch 35 - iter 49/72 - loss 0.10455418 - time (sec): 11.00 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.40it/s]

2023-04-20 22:58:11,573 Evaluating as a multi-label problem: False
2023-04-20 22:58:11,590 DEV : loss 0.14527881145477295 - f1-score (micro avg)  0.6424
2023-04-20 22:58:11,617 BAD EPOCHS (no improvement): 3
2023-04-20 22:58:11,624 ----------------------------------------------------------------------------------------------------
2023-04-20 22:58:12,639 epoch 36 - iter 7/72 - loss 0.10590596 - time (sec): 1.01 - samples/sec: 4025.46 - lr: 0.050000
2023-04-20 22:58:14,108 epoch 36 - iter 14/72 - loss 0.10755173 - time (sec): 2.48 - samples/sec: 3275.08 - lr: 0.050000
2023-04-20 22:58:15,768 epoch 36 - iter 21/72 - loss 0.10681344 - time (sec): 4.14 - samples/sec: 2841.56 - lr: 0.050000
2023-04-20 22:58:17,657 epoch 36 - iter 28/72 - loss 0.10790418 - time (sec): 6.03 - samples/sec: 2599.96 - lr: 0.050000
2023-04-20 22:58:19,418 epoch 36 - iter 35/72 - loss 0.10481321 - time (sec): 7.79 - samples/sec: 2491.44 - lr: 0.050000
2023-04-20 22:58:21,063 epoch 36 - iter 42/72 - loss 0.10480881 - time (sec): 9.44 - samples/sec: 2489.55 - lr: 0.050000
2023-04-20 22:58:22,571 epoch 36 - iter 49/72 - loss 0.10485068 - time (sec): 10.94 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.44it/s]

2023-04-20 22:58:28,884 Evaluating as a multi-label problem: False
2023-04-20 22:58:28,905 DEV : loss 0.1425899863243103 - f1-score (micro avg)  0.6391
2023-04-20 22:58:28,932 Epoch    36: reducing learning rate of group 0 to 2.5000e-02.
2023-04-20 22:58:28,934 BAD EPOCHS (no improvement): 4
2023-04-20 22:58:28,943 ----------------------------------------------------------------------------------------------------
2023-04-20 22:58:29,956 epoch 37 - iter 7/72 - loss 0.13254435 - time (sec): 1.01 - samples/sec: 3908.95 - lr: 0.025000
2023-04-20 22:58:31,749 epoch 37 - iter 14/72 - loss 0.11938006 - time (sec): 2.80 - samples/sec: 2851.69 - lr: 0.025000
2023-04-20 22:58:33,535 epoch 37 - iter 21/72 - loss 0.10930035 - time (sec): 4.59 - samples/sec: 2535.27 - lr: 0.025000
2023-04-20 22:58:35,526 epoch 37 - iter 28/72 - loss 0.10755403 - time (sec): 6.58 - samples/sec: 2404.92 - lr: 0.025000
2023-04-20 22:58:37,099 epoch 37 - iter 35/72 - loss 0.10575690 - time (sec): 8.15 - samples/sec: 2434.70 - lr: 0.025000
2023-04-20 22:58:38,441 epoch 37 - iter 42/72 - loss 0.10450319 - time (sec): 9.50 - samples/sec: 2480.46 - lr: 0.025000
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  4.41it/s]

2023-04-20 22:58:46,916 Evaluating as a multi-label problem: False
2023-04-20 22:58:46,941 DEV : loss 0.14373552799224854 - f1-score (micro avg)  0.6577
2023-04-20 22:58:46,985 BAD EPOCHS (no improvement): 1
2023-04-20 22:58:46,994 ----------------------------------------------------------------------------------------------------
2023-04-20 22:58:48,215 epoch 38 - iter 7/72 - loss 0.10484084 - time (sec): 1.22 - samples/sec: 3280.49 - lr: 0.025000
2023-04-20 22:58:50,094 epoch 38 - iter 14/72 - loss 0.10985199 - time (sec): 3.10 - samples/sec: 2540.38 - lr: 0.025000
2023-04-20 22:58:51,795 epoch 38 - iter 21/72 - loss 0.10521933 - time (sec): 4.80 - samples/sec: 2459.44 - lr: 0.025000
2023-04-20 22:58:53,250 epoch 38 - iter 28/72 - loss 0.10889902 - time (sec): 6.25 - samples/sec: 2526.73 - lr: 0.025000
2023-04-20 22:58:54,635 epoch 38 - iter 35/72 - loss 0.10504470 - time (sec): 7.64 - samples/sec: 2570.58 - lr: 0.025000
2023-04-20 22:58:56,118 epoch 38 - iter 42/72 - loss 0.10132966 - time (sec): 9.12 - samples/sec: 2574.35 - lr: 0.025000
2023-04-20 22:58:57,450 epoch 38 - iter 49/72 - loss 0.10220999 - time (sec): 10.45 - samples/sec: 2606.66 - lr: 0.025000
2023-04-20 22:58:58,925 epoch 38 - iter 56/72 - loss 0.10373848 - time (sec): 11.93 - samples/sec: 2599.01 - lr: 0.025000
2023-04-20 22:59:00,587 epoch 3

100%|██████████| 8/8 [00:02<00:00,  3.13it/s]

2023-04-20 22:59:05,888 Evaluating as a multi-label problem: False
2023-04-20 22:59:05,904 DEV : loss 0.1418641209602356 - f1-score (micro avg)  0.663
2023-04-20 22:59:05,928 BAD EPOCHS (no improvement): 0
2023-04-20 22:59:05,933 saving best model
2023-04-20 22:59:07,804 ----------------------------------------------------------------------------------------------------
2023-04-20 22:59:08,791 epoch 39 - iter 7/72 - loss 0.09293758 - time (sec): 0.96 - samples/sec: 4046.84 - lr: 0.025000
2023-04-20 22:59:10,217 epoch 39 - iter 14/72 - loss 0.09612735 - time (sec): 2.39 - samples/sec: 3265.12 - lr: 0.025000
2023-04-20 22:59:11,532 epoch 39 - iter 21/72 - loss 0.09392498 - time (sec): 3.70 - samples/sec: 3135.55 - lr: 0.025000
2023-04-20 22:59:13,222 epoch 39 - iter 28/72 - loss 0.09681749 - time (sec): 5.39 - samples/sec: 2944.61 - lr: 0.025000
2023-04-20 22:59:14,704 epoch 39 - iter 35/72 - loss 0.10134454 - time (sec): 6.87 - samples/sec: 2885.35 - lr: 0.025000
2023-04-20 22:59:16,826 epoch 39 - iter 42/72 - loss 0.10229536 - time (sec): 9.00 - samples/sec: 2625.38 - lr: 0.025000
2023-04-20 22:59:19,545 epoch 39 - iter 49/72 - loss 

100%|██████████| 8/8 [00:01<00:00,  6.28it/s]

2023-04-20 22:59:26,546 Evaluating as a multi-label problem: False
2023-04-20 22:59:26,562 DEV : loss 0.1433732807636261 - f1-score (micro avg)  0.6551
2023-04-20 22:59:26,590 BAD EPOCHS (no improvement): 1
2023-04-20 22:59:26,595 ----------------------------------------------------------------------------------------------------
2023-04-20 22:59:27,630 epoch 40 - iter 7/72 - loss 0.09450105 - time (sec): 1.03 - samples/sec: 3968.86 - lr: 0.025000
2023-04-20 22:59:29,051 epoch 40 - iter 14/72 - loss 0.09568016 - time (sec): 2.45 - samples/sec: 3303.75 - lr: 0.025000
2023-04-20 22:59:30,462 epoch 40 - iter 21/72 - loss 0.10455672 - time (sec): 3.86 - samples/sec: 3030.31 - lr: 0.025000
2023-04-20 22:59:31,947 epoch 40 - iter 28/72 - loss 0.10218107 - time (sec): 5.35 - samples/sec: 2875.58 - lr: 0.025000
2023-04-20 22:59:33,523 epoch 40 - iter 35/72 - loss 0.09993666 - time (sec): 6.92 - samples/sec: 2745.57 - lr: 0.025000
2023-04-20 22:59:35,707 epoch 40 - iter 42/72 - loss 0.09834849 - time (sec): 9.11 - samples/sec: 2538.31 - lr: 0.025000
2023-04-20 22:59:37,465 epoch 40 - iter 49/72 - loss 0.09705336 - time (sec): 10.87 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.40it/s]

2023-04-20 22:59:43,846 Evaluating as a multi-label problem: False
2023-04-20 22:59:43,862 DEV : loss 0.14183326065540314 - f1-score (micro avg)  0.6433
2023-04-20 22:59:43,894 BAD EPOCHS (no improvement): 2
2023-04-20 22:59:43,900 ----------------------------------------------------------------------------------------------------
2023-04-20 22:59:44,761 epoch 41 - iter 7/72 - loss 0.09694443 - time (sec): 0.85 - samples/sec: 4482.15 - lr: 0.025000
2023-04-20 22:59:46,138 epoch 41 - iter 14/72 - loss 0.09709349 - time (sec): 2.23 - samples/sec: 3479.67 - lr: 0.025000
2023-04-20 22:59:47,802 epoch 41 - iter 21/72 - loss 0.09227305 - time (sec): 3.89 - samples/sec: 2972.62 - lr: 0.025000
2023-04-20 22:59:49,757 epoch 41 - iter 28/72 - loss 0.09643824 - time (sec): 5.85 - samples/sec: 2630.79 - lr: 0.025000
2023-04-20 22:59:51,724 epoch 41 - iter 35/72 - loss 0.09536635 - time (sec): 7.82 - samples/sec: 2481.59 - lr: 0.025000
2023-04-20 22:59:53,269 epoch 41 - iter 42/72 - loss 0.09461066 - time (sec): 9.36 - samples/sec: 2489.15 - lr: 0.025000
2023-04-20 22:59:54,733 epoch 41 - iter 49/72 - loss 0.09568251 - time (sec): 10.83 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.53it/s]

2023-04-20 23:00:01,137 Evaluating as a multi-label problem: False
2023-04-20 23:00:01,153 DEV : loss 0.14255066215991974 - f1-score (micro avg)  0.6568
2023-04-20 23:00:01,178 BAD EPOCHS (no improvement): 3
2023-04-20 23:00:01,186 ----------------------------------------------------------------------------------------------------
2023-04-20 23:00:02,046 epoch 42 - iter 7/72 - loss 0.08554818 - time (sec): 0.86 - samples/sec: 4434.12 - lr: 0.025000
2023-04-20 23:00:03,807 epoch 42 - iter 14/72 - loss 0.08512724 - time (sec): 2.62 - samples/sec: 3018.75 - lr: 0.025000
2023-04-20 23:00:05,818 epoch 42 - iter 21/72 - loss 0.08768861 - time (sec): 4.63 - samples/sec: 2542.52 - lr: 0.025000
2023-04-20 23:00:07,854 epoch 42 - iter 28/72 - loss 0.08630336 - time (sec): 6.67 - samples/sec: 2413.13 - lr: 0.025000
2023-04-20 23:00:09,258 epoch 42 - iter 35/72 - loss 0.09119046 - time (sec): 8.07 - samples/sec: 2466.00 - lr: 0.025000
2023-04-20 23:00:10,570 epoch 42 - iter 42/72 - loss 0.09388679 - time (sec): 9.38 - samples/sec: 2517.34 - lr: 0.025000
2023-04-20 23:00:12,006 epoch 42 - iter 49/72 - loss 0.09388537 - time (sec): 10.82 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.34it/s]

2023-04-20 23:00:19,047 Evaluating as a multi-label problem: False
2023-04-20 23:00:19,074 DEV : loss 0.143371120095253 - f1-score (micro avg)  0.6649
2023-04-20 23:00:19,120 BAD EPOCHS (no improvement): 0
2023-04-20 23:00:19,128 saving best model
2023-04-20 23:00:21,684 ----------------------------------------------------------------------------------------------------
2023-04-20 23:00:23,203 epoch 43 - iter 7/72 - loss 0.09582240 - time (sec): 1.51 - samples/sec: 2678.24 - lr: 0.025000
2023-04-20 23:00:24,799 epoch 43 - iter 14/72 - loss 0.10063411 - time (sec): 3.11 - samples/sec: 2545.35 - lr: 0.025000
2023-04-20 23:00:26,273 epoch 43 - iter 21/72 - loss 0.09902772 - time (sec): 4.58 - samples/sec: 2568.68 - lr: 0.025000
2023-04-20 23:00:27,866 epoch 43 - iter 28/72 - loss 0.09838986 - time (sec): 6.18 - samples/sec: 2532.77 - lr: 0.025000
2023-04-20 23:00:29,936 epoch 43 - iter 35/72 - loss 0.09641835 - time (sec): 8.25 - samples/sec: 2376.96 - lr: 0.025000
2023-04-20 23:00:31,707 epoch 43 - iter 42/72 - loss 0.09583154 - time (sec): 10.02 - samples/sec: 2332.55 - lr: 0.025000
2023-04-20 23:00:33,418 epoch 43 - iter 49/72 - loss 0.09653736 - time (sec): 11.73 - samples/sec: 2330.85 - lr: 0.025000
2023-04-20 23:00:35,187 epo

100%|██████████| 8/8 [00:01<00:00,  6.51it/s]

2023-04-20 23:00:40,944 Evaluating as a multi-label problem: False
2023-04-20 23:00:40,960 DEV : loss 0.1428695023059845 - f1-score (micro avg)  0.6632
2023-04-20 23:00:40,984 BAD EPOCHS (no improvement): 1
2023-04-20 23:00:40,992 ----------------------------------------------------------------------------------------------------
2023-04-20 23:00:42,603 epoch 44 - iter 7/72 - loss 0.08636944 - time (sec): 1.61 - samples/sec: 2580.22 - lr: 0.025000
2023-04-20 23:00:44,004 epoch 44 - iter 14/72 - loss 0.09455063 - time (sec): 3.01 - samples/sec: 2692.12 - lr: 0.025000
2023-04-20 23:00:45,388 epoch 44 - iter 21/72 - loss 0.08887497 - time (sec): 4.39 - samples/sec: 2726.28 - lr: 0.025000
2023-04-20 23:00:46,975 epoch 44 - iter 28/72 - loss 0.09227054 - time (sec): 5.98 - samples/sec: 2698.59 - lr: 0.025000
2023-04-20 23:00:48,548 epoch 44 - iter 35/72 - loss 0.09388061 - time (sec): 7.55 - samples/sec: 2680.56 - lr: 0.025000
2023-04-20 23:00:50,229 epoch 44 - iter 42/72 - loss 0.09284582 - time (sec): 9.24 - samples/sec: 2617.06 - lr: 0.025000
2023-04-20 23:00:52,068 epoch 44 - iter 49/72 - loss 0.09319218 - time (sec): 11.07 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.51it/s]

2023-04-20 23:00:58,776 Evaluating as a multi-label problem: False
2023-04-20 23:00:58,791 DEV : loss 0.1443365067243576 - f1-score (micro avg)  0.6631
2023-04-20 23:00:58,819 BAD EPOCHS (no improvement): 2
2023-04-20 23:00:58,823 ----------------------------------------------------------------------------------------------------
2023-04-20 23:00:59,706 epoch 45 - iter 7/72 - loss 0.11549347 - time (sec): 0.88 - samples/sec: 4445.50 - lr: 0.025000
2023-04-20 23:01:01,228 epoch 45 - iter 14/72 - loss 0.10604167 - time (sec): 2.40 - samples/sec: 3319.92 - lr: 0.025000
2023-04-20 23:01:02,562 epoch 45 - iter 21/72 - loss 0.10012816 - time (sec): 3.73 - samples/sec: 3151.62 - lr: 0.025000
2023-04-20 23:01:04,211 epoch 45 - iter 28/72 - loss 0.10211885 - time (sec): 5.38 - samples/sec: 2941.26 - lr: 0.025000
2023-04-20 23:01:05,892 epoch 45 - iter 35/72 - loss 0.10074116 - time (sec): 7.06 - samples/sec: 2768.51 - lr: 0.025000
2023-04-20 23:01:07,780 epoch 45 - iter 42/72 - loss 0.09976406 - time (sec): 8.95 - samples/sec: 2626.31 - lr: 0.025000
2023-04-20 23:01:09,568 epoch 45 - iter 49/72 - loss 0.09874513 - time (sec): 10.74 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.35it/s]

2023-04-20 23:01:16,149 Evaluating as a multi-label problem: False
2023-04-20 23:01:16,165 DEV : loss 0.14651036262512207 - f1-score (micro avg)  0.6513
2023-04-20 23:01:16,189 BAD EPOCHS (no improvement): 3
2023-04-20 23:01:16,196 ----------------------------------------------------------------------------------------------------
2023-04-20 23:01:17,108 epoch 46 - iter 7/72 - loss 0.09544525 - time (sec): 0.91 - samples/sec: 4298.25 - lr: 0.025000
2023-04-20 23:01:18,679 epoch 46 - iter 14/72 - loss 0.09538991 - time (sec): 2.48 - samples/sec: 3274.01 - lr: 0.025000
2023-04-20 23:01:20,330 epoch 46 - iter 21/72 - loss 0.08958032 - time (sec): 4.13 - samples/sec: 2898.13 - lr: 0.025000
2023-04-20 23:01:22,116 epoch 46 - iter 28/72 - loss 0.09178188 - time (sec): 5.92 - samples/sec: 2680.41 - lr: 0.025000
2023-04-20 23:01:24,212 epoch 46 - iter 35/72 - loss 0.09246136 - time (sec): 8.01 - samples/sec: 2455.00 - lr: 0.025000
2023-04-20 23:01:25,751 epoch 46 - iter 42/72 - loss 0.09244417 - time (sec): 9.55 - samples/sec: 2450.07 - lr: 0.025000
2023-04-20 23:01:27,259 epoch 46 - iter 49/72 - loss 0.09329814 - time (sec): 11.06 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.41it/s]

2023-04-20 23:01:33,744 Evaluating as a multi-label problem: False
2023-04-20 23:01:33,761 DEV : loss 0.14665018022060394 - f1-score (micro avg)  0.657
2023-04-20 23:01:33,790 Epoch    46: reducing learning rate of group 0 to 1.2500e-02.
2023-04-20 23:01:33,792 BAD EPOCHS (no improvement): 4
2023-04-20 23:01:33,800 ----------------------------------------------------------------------------------------------------
2023-04-20 23:01:35,099 epoch 47 - iter 7/72 - loss 0.09429906 - time (sec): 1.30 - samples/sec: 3147.82 - lr: 0.012500
2023-04-20 23:01:36,808 epoch 47 - iter 14/72 - loss 0.10636391 - time (sec): 3.01 - samples/sec: 2604.47 - lr: 0.012500
2023-04-20 23:01:38,970 epoch 47 - iter 21/72 - loss 0.10443386 - time (sec): 5.17 - samples/sec: 2303.51 - lr: 0.012500
2023-04-20 23:01:40,525 epoch 47 - iter 28/72 - loss 0.10150447 - time (sec): 6.72 - samples/sec: 2353.15 - lr: 0.012500
2023-04-20 23:01:41,937 epoch 47 - iter 35/72 - loss 0.10059531 - time (sec): 8.13 - samples/sec: 2401.00 - lr: 0.012500
2023-04-20 23:01:43,239 epoch 47 - iter 42/72 - loss 0.09718862 - time (sec): 9.44 - samples/sec: 2437.98 - lr: 0.012500
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  4.32it/s]

2023-04-20 23:01:52,007 Evaluating as a multi-label problem: False
2023-04-20 23:01:52,035 DEV : loss 0.1423102468252182 - f1-score (micro avg)  0.6514
2023-04-20 23:01:52,079 BAD EPOCHS (no improvement): 1
2023-04-20 23:01:52,083 ----------------------------------------------------------------------------------------------------
2023-04-20 23:01:53,297 epoch 48 - iter 7/72 - loss 0.08766265 - time (sec): 1.21 - samples/sec: 3275.07 - lr: 0.012500
2023-04-20 23:01:55,215 epoch 48 - iter 14/72 - loss 0.08987244 - time (sec): 3.13 - samples/sec: 2522.50 - lr: 0.012500
2023-04-20 23:01:56,764 epoch 48 - iter 21/72 - loss 0.08664916 - time (sec): 4.68 - samples/sec: 2487.41 - lr: 0.012500
2023-04-20 23:01:58,339 epoch 48 - iter 28/72 - loss 0.08686916 - time (sec): 6.25 - samples/sec: 2467.02 - lr: 0.012500
2023-04-20 23:01:59,869 epoch 48 - iter 35/72 - loss 0.08626589 - time (sec): 7.78 - samples/sec: 2496.15 - lr: 0.012500
2023-04-20 23:02:01,237 epoch 48 - iter 42/72 - loss 0.08544474 - time (sec): 9.15 - samples/sec: 2563.14 - lr: 0.012500
2023-04-20 23:02:02,872 epoch 48 - iter 49/72 - loss 0.08793080 - time (sec): 10.79 - samples/sec: 2567.93 - lr: 0.012500
2023-04-20 23:02:04,192 epoch 48 - iter 56/72 - loss 0.08767718 - time (sec): 12.11 - samples/sec: 2591.63 - lr: 0.012500
2023-04-20 23:02:05,735 epoch 4

100%|██████████| 8/8 [00:01<00:00,  4.12it/s]

2023-04-20 23:02:10,374 Evaluating as a multi-label problem: False
2023-04-20 23:02:10,390 DEV : loss 0.14385996758937836 - f1-score (micro avg)  0.6514
2023-04-20 23:02:10,414 BAD EPOCHS (no improvement): 2
2023-04-20 23:02:10,420 ----------------------------------------------------------------------------------------------------
2023-04-20 23:02:11,951 epoch 49 - iter 7/72 - loss 0.09057951 - time (sec): 1.53 - samples/sec: 2728.40 - lr: 0.012500
2023-04-20 23:02:13,493 epoch 49 - iter 14/72 - loss 0.08337475 - time (sec): 3.07 - samples/sec: 2666.39 - lr: 0.012500
2023-04-20 23:02:14,802 epoch 49 - iter 21/72 - loss 0.08467041 - time (sec): 4.38 - samples/sec: 2721.73 - lr: 0.012500
2023-04-20 23:02:16,447 epoch 49 - iter 28/72 - loss 0.08898056 - time (sec): 6.02 - samples/sec: 2661.52 - lr: 0.012500
2023-04-20 23:02:17,834 epoch 49 - iter 35/72 - loss 0.09020246 - time (sec): 7.41 - samples/sec: 2697.40 - lr: 0.012500
2023-04-20 23:02:19,234 epoch 49 - iter 42/72 - loss 0.08950805 - time (sec): 8.81 - samples/sec: 2678.26 - lr: 0.012500
2023-04-20 23:02:20,829 epoch 49 - iter 49/72 - loss 0.08932522 - time (sec): 10.41 - samples/sec: 2665.15 - lr: 0.012500
2023-04-20 23:02:22,581 epoch 49 - iter 56/72 - loss 0.08931487 - time (sec): 12.16 - samples/sec: 2598.42 - lr: 0.012500
2023-04-20 23:02:24,364 epoch 4

100%|██████████| 8/8 [00:01<00:00,  6.37it/s]

2023-04-20 23:02:28,176 Evaluating as a multi-label problem: False
2023-04-20 23:02:28,194 DEV : loss 0.1434168964624405 - f1-score (micro avg)  0.6469
2023-04-20 23:02:28,218 BAD EPOCHS (no improvement): 3
2023-04-20 23:02:28,227 ----------------------------------------------------------------------------------------------------
2023-04-20 23:02:29,111 epoch 50 - iter 7/72 - loss 0.09028378 - time (sec): 0.88 - samples/sec: 4413.39 - lr: 0.012500
2023-04-20 23:02:30,474 epoch 50 - iter 14/72 - loss 0.09919299 - time (sec): 2.24 - samples/sec: 3512.25 - lr: 0.012500
2023-04-20 23:02:31,998 epoch 50 - iter 21/72 - loss 0.09266660 - time (sec): 3.77 - samples/sec: 3086.03 - lr: 0.012500
2023-04-20 23:02:33,358 epoch 50 - iter 28/72 - loss 0.09310315 - time (sec): 5.13 - samples/sec: 3013.01 - lr: 0.012500
2023-04-20 23:02:34,897 epoch 50 - iter 35/72 - loss 0.09003776 - time (sec): 6.67 - samples/sec: 2936.66 - lr: 0.012500
2023-04-20 23:02:36,588 epoch 50 - iter 42/72 - loss 0.09084556 - time (sec): 8.36 - samples/sec: 2811.24 - lr: 0.012500
2023-04-20 23:02:38,356 epoch 50 - iter 49/72 - loss 0.09249986 - time (sec): 10.13 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.60it/s]

2023-04-20 23:02:45,314 Evaluating as a multi-label problem: False
2023-04-20 23:02:45,329 DEV : loss 0.1422778069972992 - f1-score (micro avg)  0.663
2023-04-20 23:02:45,353 Epoch    50: reducing learning rate of group 0 to 6.2500e-03.
2023-04-20 23:02:45,362 BAD EPOCHS (no improvement): 4
2023-04-20 23:02:45,368 ----------------------------------------------------------------------------------------------------
2023-04-20 23:02:46,336 epoch 51 - iter 7/72 - loss 0.09607714 - time (sec): 0.96 - samples/sec: 4244.65 - lr: 0.006250
2023-04-20 23:02:47,814 epoch 51 - iter 14/72 - loss 0.09442062 - time (sec): 2.44 - samples/sec: 3256.32 - lr: 0.006250
2023-04-20 23:02:49,319 epoch 51 - iter 21/72 - loss 0.09545365 - time (sec): 3.95 - samples/sec: 3015.65 - lr: 0.006250
2023-04-20 23:02:50,739 epoch 51 - iter 28/72 - loss 0.09199795 - time (sec): 5.37 - samples/sec: 2947.81 - lr: 0.006250
2023-04-20 23:02:52,490 epoch 51 - iter 35/72 - loss 0.08834696 - time (sec): 7.12 - samples/sec: 2755.47 - lr: 0.006250
2023-04-20 23:02:54,561 epoch 51 - iter 42/72 - loss 0.09008549 - time (sec): 9.19 - samples/sec: 2577.90 - lr: 0.006250
2023-04-20 23:02:56,395 epoch 51 - iter 49/72 - loss 0.08915448 - time (sec): 11.02 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.18it/s]

2023-04-20 23:03:02,942 Evaluating as a multi-label problem: False
2023-04-20 23:03:02,959 DEV : loss 0.1406830996274948 - f1-score (micro avg)  0.6595
2023-04-20 23:03:02,985 BAD EPOCHS (no improvement): 1
2023-04-20 23:03:02,990 ----------------------------------------------------------------------------------------------------
2023-04-20 23:03:03,843 epoch 52 - iter 7/72 - loss 0.09566765 - time (sec): 0.85 - samples/sec: 4403.03 - lr: 0.006250
2023-04-20 23:03:05,359 epoch 52 - iter 14/72 - loss 0.09288854 - time (sec): 2.37 - samples/sec: 3199.95 - lr: 0.006250
2023-04-20 23:03:06,943 epoch 52 - iter 21/72 - loss 0.09007634 - time (sec): 3.95 - samples/sec: 2915.07 - lr: 0.006250
2023-04-20 23:03:08,901 epoch 52 - iter 28/72 - loss 0.08554355 - time (sec): 5.91 - samples/sec: 2611.50 - lr: 0.006250
2023-04-20 23:03:10,939 epoch 52 - iter 35/72 - loss 0.08677749 - time (sec): 7.95 - samples/sec: 2459.60 - lr: 0.006250
2023-04-20 23:03:12,477 epoch 52 - iter 42/72 - loss 0.09027683 - time (sec): 9.48 - samples/sec: 2480.75 - lr: 0.006250
2023-04-20

100%|██████████| 8/8 [00:01<00:00,  6.52it/s]

2023-04-20 23:03:20,236 Evaluating as a multi-label problem: False
2023-04-20 23:03:20,255 DEV : loss 0.1417848914861679 - f1-score (micro avg)  0.6622
2023-04-20 23:03:20,281 BAD EPOCHS (no improvement): 2
2023-04-20 23:03:20,290 ----------------------------------------------------------------------------------------------------
2023-04-20 23:03:21,127 epoch 53 - iter 7/72 - loss 0.08218498 - time (sec): 0.83 - samples/sec: 4466.74 - lr: 0.006250
2023-04-20 23:03:22,896 epoch 53 - iter 14/72 - loss 0.08410358 - time (sec): 2.60 - samples/sec: 2867.61 - lr: 0.006250
2023-04-20 23:03:24,582 epoch 53 - iter 21/72 - loss 0.08738709 - time (sec): 4.29 - samples/sec: 2606.38 - lr: 0.006250
2023-04-20 23:03:26,659 epoch 53 - iter 28/72 - loss 0.08573925 - time (sec): 6.36 - samples/sec: 2379.74 - lr: 0.006250
2023-04-20 23:03:28,190 epoch 53 - iter 35/72 - loss 0.08383086 - time (sec): 7.89 - samples/sec: 2419.45 - lr: 0.006250
2023-04-20 23:03:29,573 epoch 53 - iter 42/72 - loss 0.08345046 - time (sec): 9.28 - samples/sec: 2462.68 - lr: 0.006250
2023-04-20 23:03:31,116 epoch 53 - iter 49/72 - loss 0.08626519 - time (sec): 10.82 - samples/s

100%|██████████| 8/8 [00:01<00:00,  5.09it/s]

2023-04-20 23:03:38,010 Evaluating as a multi-label problem: False
2023-04-20 23:03:38,037 DEV : loss 0.1424335539340973 - f1-score (micro avg)  0.6521
2023-04-20 23:03:38,087 BAD EPOCHS (no improvement): 3
2023-04-20 23:03:38,095 ----------------------------------------------------------------------------------------------------
2023-04-20 23:03:39,426 epoch 54 - iter 7/72 - loss 0.10003261 - time (sec): 1.33 - samples/sec: 3102.56 - lr: 0.006250
2023-04-20 23:03:41,219 epoch 54 - iter 14/72 - loss 0.09271067 - time (sec): 3.12 - samples/sec: 2559.09 - lr: 0.006250
2023-04-20 23:03:42,935 epoch 54 - iter 21/72 - loss 0.09401770 - time (sec): 4.84 - samples/sec: 2469.64 - lr: 0.006250
2023-04-20 23:03:44,314 epoch 54 - iter 28/72 - loss 0.08868612 - time (sec): 6.22 - samples/sec: 2524.26 - lr: 0.006250
2023-04-20 23:03:45,795 epoch 54 - iter 35/72 - loss 0.08931867 - time (sec): 7.70 - samples/sec: 2553.00 - lr: 0.006250
2023-04-20 23:03:47,218 epoch 54 - iter 42/72 - loss 0.09002580 - time (sec): 9.12 - samples/sec: 2571.45 - lr: 0.006250
2023-04-20 23:03:48,674 epoch 54 - iter 49/72 - loss 0.08959859 - time (sec): 10.58 - samples/sec: 2578.21 - lr: 0.006250
2023-04-20 23:03:50,237 epoch 54 - iter 56/72 - loss 0.08907501 - time (sec): 12.14 - samples/sec: 2569.85 - lr: 0.006250
2023-04-20 23:03:51,759 epoch 5

100%|██████████| 8/8 [00:02<00:00,  3.04it/s]

2023-04-20 23:03:56,852 Evaluating as a multi-label problem: False
2023-04-20 23:03:56,875 DEV : loss 0.14222998917102814 - f1-score (micro avg)  0.6559
2023-04-20 23:03:56,914 Epoch    54: reducing learning rate of group 0 to 3.1250e-03.
2023-04-20 23:03:56,915 BAD EPOCHS (no improvement): 4
2023-04-20 23:03:56,922 ----------------------------------------------------------------------------------------------------
2023-04-20 23:03:58,077 epoch 55 - iter 7/72 - loss 0.10228856 - time (sec): 1.15 - samples/sec: 3329.20 - lr: 0.003125
2023-04-20 23:03:59,660 epoch 55 - iter 14/72 - loss 0.09023072 - time (sec): 2.74 - samples/sec: 2870.47 - lr: 0.003125
2023-04-20 23:04:01,103 epoch 55 - iter 21/72 - loss 0.08620251 - time (sec): 4.18 - samples/sec: 2822.82 - lr: 0.003125
2023-04-20 23:04:02,544 epoch 55 - iter 28/72 - loss 0.09236542 - time (sec): 5.62 - samples/sec: 2789.96 - lr: 0.003125
2023-04-20 23:04:04,086 epoch 55 - iter 35/72 - loss 0.08853577 - time (sec): 7.16 - samples/sec: 2756.32 - lr: 0.003125
2023-04-20 23:04:05,405 epoch 55 - iter 42/72 - loss 0.08773807 - time (sec): 8.48 - samples/sec: 2780.87 - lr: 0.003125
2023-04-20 23:04:06,973 epoch 55 - iter 49/72 - loss 0.08644562 - time (sec): 10.05 - samples/sec: 2743.60 - lr: 0.003125
2023-04-20 23:04:08,756 epoch 55 - iter 56/72 - loss 0.08574833 - time (sec): 11.83 - samples/sec: 2666.62 - lr: 0.003125
2023-04-20 23:04:10,444 epoch 5

100%|██████████| 8/8 [00:01<00:00,  6.34it/s]

2023-04-20 23:04:14,564 Evaluating as a multi-label problem: False
2023-04-20 23:04:14,580 DEV : loss 0.14180274307727814 - f1-score (micro avg)  0.6577
2023-04-20 23:04:14,604 BAD EPOCHS (no improvement): 1
2023-04-20 23:04:14,614 ----------------------------------------------------------------------------------------------------
2023-04-20 23:04:15,636 epoch 56 - iter 7/72 - loss 0.08958956 - time (sec): 1.02 - samples/sec: 3950.66 - lr: 0.003125
2023-04-20 23:04:17,130 epoch 56 - iter 14/72 - loss 0.08811163 - time (sec): 2.52 - samples/sec: 3284.61 - lr: 0.003125
2023-04-20 23:04:18,525 epoch 56 - iter 21/72 - loss 0.08652581 - time (sec): 3.91 - samples/sec: 3079.78 - lr: 0.003125
2023-04-20 23:04:19,924 epoch 56 - iter 28/72 - loss 0.08412036 - time (sec): 5.31 - samples/sec: 2972.24 - lr: 0.003125
2023-04-20 23:04:21,375 epoch 56 - iter 35/72 - loss 0.08381122 - time (sec): 6.76 - samples/sec: 2915.95 - lr: 0.003125
2023-04-20 23:04:22,856 epoch 56 - iter 42/72 - loss 0.08334067 - time (sec): 8.24 - samples/sec: 2879.58 - lr: 0.003125
2023-04-20 23:04:24,616 epoch 56 - iter 49/72 - loss 0.08195146 - time (sec): 10.00 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.26it/s]

2023-04-20 23:04:31,980 Evaluating as a multi-label problem: False
2023-04-20 23:04:31,997 DEV : loss 0.14141641557216644 - f1-score (micro avg)  0.6622
2023-04-20 23:04:32,026 BAD EPOCHS (no improvement): 2
2023-04-20 23:04:32,032 ----------------------------------------------------------------------------------------------------
2023-04-20 23:04:32,986 epoch 57 - iter 7/72 - loss 0.09389103 - time (sec): 0.95 - samples/sec: 3898.50 - lr: 0.003125
2023-04-20 23:04:34,408 epoch 57 - iter 14/72 - loss 0.09425304 - time (sec): 2.38 - samples/sec: 3217.12 - lr: 0.003125
2023-04-20 23:04:35,818 epoch 57 - iter 21/72 - loss 0.08976526 - time (sec): 3.79 - samples/sec: 3025.46 - lr: 0.003125
2023-04-20 23:04:37,467 epoch 57 - iter 28/72 - loss 0.09028867 - time (sec): 5.43 - samples/sec: 2889.02 - lr: 0.003125
2023-04-20 23:04:39,223 epoch 57 - iter 35/72 - loss 0.08799923 - time (sec): 7.19 - samples/sec: 2712.10 - lr: 0.003125
2023-04-20 23:04:41,075 epoch 57 - iter 42/72 - loss 0.08901519 - time (sec): 9.04 - samples/sec: 2589.17 - lr: 0.003125
2023-04-20 23:04:42,948 epoch 57 - iter 49/72 - loss 0.08842335 - time (sec): 10.91 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.43it/s]

2023-04-20 23:04:49,475 Evaluating as a multi-label problem: False
2023-04-20 23:04:49,493 DEV : loss 0.1418592631816864 - f1-score (micro avg)  0.6595
2023-04-20 23:04:49,519 BAD EPOCHS (no improvement): 3
2023-04-20 23:04:49,528 ----------------------------------------------------------------------------------------------------
2023-04-20 23:04:50,415 epoch 58 - iter 7/72 - loss 0.08499103 - time (sec): 0.88 - samples/sec: 4402.11 - lr: 0.003125
2023-04-20 23:04:51,837 epoch 58 - iter 14/72 - loss 0.08303767 - time (sec): 2.31 - samples/sec: 3381.75 - lr: 0.003125
2023-04-20 23:04:53,194 epoch 58 - iter 21/72 - loss 0.08185521 - time (sec): 3.66 - samples/sec: 3220.37 - lr: 0.003125
2023-04-20 23:04:54,868 epoch 58 - iter 28/72 - loss 0.08010187 - time (sec): 5.34 - samples/sec: 2912.52 - lr: 0.003125
2023-04-20 23:04:56,749 epoch 58 - iter 35/72 - loss 0.07883075 - time (sec): 7.22 - samples/sec: 2686.11 - lr: 0.003125
2023-04-20 23:04:58,639 epoch 58 - iter 42/72 - loss 0.08093390 - time (sec): 9.11 - samples/sec: 2541.97 - lr: 0.003125
2023-04-20 23:05:00,170 epoch 58 - iter 49/72 - loss 0.08200010 - time (sec): 10.64 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.73it/s]

2023-04-20 23:05:06,942 Evaluating as a multi-label problem: False
2023-04-20 23:05:06,958 DEV : loss 0.14164473116397858 - f1-score (micro avg)  0.6558
2023-04-20 23:05:06,982 Epoch    58: reducing learning rate of group 0 to 1.5625e-03.
2023-04-20 23:05:06,983 BAD EPOCHS (no improvement): 4
2023-04-20 23:05:06,988 ----------------------------------------------------------------------------------------------------
2023-04-20 23:05:07,888 epoch 59 - iter 7/72 - loss 0.08250196 - time (sec): 0.90 - samples/sec: 4520.60 - lr: 0.001563
2023-04-20 23:05:09,613 epoch 59 - iter 14/72 - loss 0.09204223 - time (sec): 2.62 - samples/sec: 3143.95 - lr: 0.001563
2023-04-20 23:05:11,295 epoch 59 - iter 21/72 - loss 0.09944847 - time (sec): 4.31 - samples/sec: 2804.88 - lr: 0.001563
2023-04-20 23:05:13,322 epoch 59 - iter 28/72 - loss 0.09624755 - time (sec): 6.33 - samples/sec: 2523.07 - lr: 0.001563
2023-04-20 23:05:14,852 epoch 59 - iter 35/72 - loss 0.09536741 - time (sec): 7.86 - samples/sec: 2495.58 - lr: 0.001563
2023-04-20 23:05:16,322 epoch 59 - iter 42/72 - loss 0.09346114 - time (sec): 9.33 - samples/sec: 2521.11 - lr: 0.001563
2023-04-20 23:05:17,751 epoch 59 - iter 49/72 - loss 0.09191957 - time (sec): 10.76 - samples/sec: 2550.29 - lr: 0.001563
2023-04-20 23:05:19,096 epoch 59 - iter 56/72 - loss 0.09160560 - time (sec): 12.11 - samples/sec: 2582.20 - lr: 0.001563
2023-04-20 23:05:20,469 epoch 5

100%|██████████| 8/8 [00:01<00:00,  4.01it/s]

2023-04-20 23:05:24,762 Evaluating as a multi-label problem: False
2023-04-20 23:05:24,784 DEV : loss 0.14164191484451294 - f1-score (micro avg)  0.6577
2023-04-20 23:05:24,830 BAD EPOCHS (no improvement): 1
2023-04-20 23:05:24,839 ----------------------------------------------------------------------------------------------------
2023-04-20 23:05:26,035 epoch 60 - iter 7/72 - loss 0.09646998 - time (sec): 1.19 - samples/sec: 3407.39 - lr: 0.001563
2023-04-20 23:05:27,804 epoch 60 - iter 14/72 - loss 0.08966987 - time (sec): 2.96 - samples/sec: 2657.35 - lr: 0.001563
2023-04-20 23:05:29,665 epoch 60 - iter 21/72 - loss 0.09722641 - time (sec): 4.83 - samples/sec: 2425.95 - lr: 0.001563
2023-04-20 23:05:31,292 epoch 60 - iter 28/72 - loss 0.09476039 - time (sec): 6.45 - samples/sec: 2423.23 - lr: 0.001563
2023-04-20 23:05:32,680 epoch 60 - iter 35/72 - loss 0.09033700 - time (sec): 7.84 - samples/sec: 2492.68 - lr: 0.001563
2023-04-20 23:05:34,112 epoch 60 - iter 42/72 - loss 0.08872928 - time (sec): 9.27 - samples/sec: 2520.79 - lr: 0.001563
2023-04-20 23:05:35,651 epoch 60 - iter 49/72 - loss 0.08776878 - time (sec): 10.81 - samples/sec: 2531.30 - lr: 0.001563
2023-04-20 23:05:37,035 epoch 60 - iter 56/72 - loss 0.08940414 - time (sec): 12.20 - samples/sec: 2549.23 - lr: 0.001563
2023-04-20 23:05:38,529 epoch 6

100%|██████████| 8/8 [00:01<00:00,  4.29it/s]

2023-04-20 23:05:42,989 Evaluating as a multi-label problem: False
2023-04-20 23:05:43,011 DEV : loss 0.14177392423152924 - f1-score (micro avg)  0.6586
2023-04-20 23:05:43,056 BAD EPOCHS (no improvement): 2
2023-04-20 23:05:43,066 ----------------------------------------------------------------------------------------------------
2023-04-20 23:05:44,217 epoch 61 - iter 7/72 - loss 0.08164649 - time (sec): 1.15 - samples/sec: 3277.87 - lr: 0.001563
2023-04-20 23:05:45,835 epoch 61 - iter 14/72 - loss 0.09185086 - time (sec): 2.77 - samples/sec: 2733.24 - lr: 0.001563
2023-04-20 23:05:47,364 epoch 61 - iter 21/72 - loss 0.09370211 - time (sec): 4.30 - samples/sec: 2684.26 - lr: 0.001563
2023-04-20 23:05:48,727 epoch 61 - iter 28/72 - loss 0.09360301 - time (sec): 5.66 - samples/sec: 2713.76 - lr: 0.001563
2023-04-20 23:05:50,257 epoch 61 - iter 35/72 - loss 0.09688293 - time (sec): 7.19 - samples/sec: 2718.84 - lr: 0.001563
2023-04-20 23:05:51,599 epoch 61 - iter 42/72 - loss 0.09487736 - time (sec): 8.53 - samples/sec: 2723.47 - lr: 0.001563
2023-04-20 23:05:53,062 epoch 61 - iter 49/72 - loss 0.09518435 - time (sec): 9.99 - samples/sec: 2720.44 - lr: 0.001563
2023-04-20 23:05:54,524 epoch 61 - iter 56/72 - loss 0.09366274 - time (sec): 11.46 - samples/sec: 2716.85 - lr: 0.001563
2023-04-20 23:05:56,455 epoch 61

100%|██████████| 8/8 [00:01<00:00,  4.95it/s]

2023-04-20 23:06:01,026 Evaluating as a multi-label problem: False
2023-04-20 23:06:01,042 DEV : loss 0.14177630841732025 - f1-score (micro avg)  0.6541
2023-04-20 23:06:01,069 BAD EPOCHS (no improvement): 3
2023-04-20 23:06:01,075 ----------------------------------------------------------------------------------------------------
2023-04-20 23:06:02,046 epoch 62 - iter 7/72 - loss 0.10763200 - time (sec): 0.97 - samples/sec: 4199.75 - lr: 0.001563
2023-04-20 23:06:03,462 epoch 62 - iter 14/72 - loss 0.10247851 - time (sec): 2.38 - samples/sec: 3320.85 - lr: 0.001563
2023-04-20 23:06:04,946 epoch 62 - iter 21/72 - loss 0.09580810 - time (sec): 3.87 - samples/sec: 3058.82 - lr: 0.001563
2023-04-20 23:06:06,544 epoch 62 - iter 28/72 - loss 0.09381079 - time (sec): 5.46 - samples/sec: 2894.23 - lr: 0.001563
2023-04-20 23:06:07,965 epoch 62 - iter 35/72 - loss 0.09462369 - time (sec): 6.88 - samples/sec: 2849.29 - lr: 0.001563
2023-04-20 23:06:09,354 epoch 62 - iter 42/72 - loss 0.09252353 - time (sec): 8.27 - samples/sec: 2822.76 - lr: 0.001563
2023-04-20 23:06:11,057 epoch 62 - iter 49/72 - loss 0.09130714 - time (sec): 9.98 - samples/se

100%|██████████| 8/8 [00:01<00:00,  6.47it/s]

2023-04-20 23:06:18,486 Evaluating as a multi-label problem: False
2023-04-20 23:06:18,504 DEV : loss 0.14156121015548706 - f1-score (micro avg)  0.6595
2023-04-20 23:06:18,531 Epoch    62: reducing learning rate of group 0 to 7.8125e-04.
2023-04-20 23:06:18,533 BAD EPOCHS (no improvement): 4
2023-04-20 23:06:18,538 ----------------------------------------------------------------------------------------------------
2023-04-20 23:06:19,515 epoch 63 - iter 7/72 - loss 0.06489992 - time (sec): 0.97 - samples/sec: 4009.70 - lr: 0.000781
2023-04-20 23:06:21,016 epoch 63 - iter 14/72 - loss 0.08122714 - time (sec): 2.48 - samples/sec: 3234.85 - lr: 0.000781
2023-04-20 23:06:22,418 epoch 63 - iter 21/72 - loss 0.08434168 - time (sec): 3.88 - samples/sec: 3051.98 - lr: 0.000781
2023-04-20 23:06:23,879 epoch 63 - iter 28/72 - loss 0.08615816 - time (sec): 5.34 - samples/sec: 2941.35 - lr: 0.000781
2023-04-20 23:06:25,310 epoch 63 - iter 35/72 - loss 0.08811642 - time (sec): 6.77 - samples/sec: 2891.97 - lr: 0.000781
2023-04-20 23:06:27,038 epoch 63 - iter 42/72 - loss 0.09029190 - time (sec): 8.50 - samples/sec: 2753.52 - lr: 0.000781
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  6.55it/s]

2023-04-20 23:06:36,030 Evaluating as a multi-label problem: False
2023-04-20 23:06:36,053 DEV : loss 0.14170803129673004 - f1-score (micro avg)  0.6613
2023-04-20 23:06:36,083 BAD EPOCHS (no improvement): 1
2023-04-20 23:06:36,088 ----------------------------------------------------------------------------------------------------
2023-04-20 23:06:36,995 epoch 64 - iter 7/72 - loss 0.08878864 - time (sec): 0.90 - samples/sec: 4226.23 - lr: 0.000781
2023-04-20 23:06:38,436 epoch 64 - iter 14/72 - loss 0.08694885 - time (sec): 2.35 - samples/sec: 3326.53 - lr: 0.000781
2023-04-20 23:06:39,936 epoch 64 - iter 21/72 - loss 0.08347787 - time (sec): 3.85 - samples/sec: 3099.12 - lr: 0.000781
2023-04-20 23:06:41,386 epoch 64 - iter 28/72 - loss 0.08282095 - time (sec): 5.30 - samples/sec: 2958.70 - lr: 0.000781
2023-04-20 23:06:43,110 epoch 64 - iter 35/72 - loss 0.08373759 - time (sec): 7.02 - samples/sec: 2785.35 - lr: 0.000781
2023-04-20 23:06:45,065 epoch 64 - iter 42/72 - loss 0.08488124 - time (sec): 8.98 - samples/sec: 2604.82 - lr: 0.000781
2023-04-20 23:06:46,866 epoch 64 - iter 49/72 - loss 0.08503952 - time (sec): 10.78 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.56it/s]

2023-04-20 23:06:53,737 Evaluating as a multi-label problem: False
2023-04-20 23:06:53,754 DEV : loss 0.14168362319469452 - f1-score (micro avg)  0.6586
2023-04-20 23:06:53,779 BAD EPOCHS (no improvement): 2
2023-04-20 23:06:53,785 ----------------------------------------------------------------------------------------------------
2023-04-20 23:06:54,684 epoch 65 - iter 7/72 - loss 0.07385105 - time (sec): 0.89 - samples/sec: 4220.63 - lr: 0.000781
2023-04-20 23:06:56,285 epoch 65 - iter 14/72 - loss 0.07940284 - time (sec): 2.49 - samples/sec: 3140.73 - lr: 0.000781
2023-04-20 23:06:58,178 epoch 65 - iter 21/72 - loss 0.08416991 - time (sec): 4.39 - samples/sec: 2692.32 - lr: 0.000781
2023-04-20 23:07:00,001 epoch 65 - iter 28/72 - loss 0.08494763 - time (sec): 6.21 - samples/sec: 2517.51 - lr: 0.000781
2023-04-20 23:07:01,826 epoch 65 - iter 35/72 - loss 0.08328417 - time (sec): 8.04 - samples/sec: 2436.66 - lr: 0.000781
2023-04-20 23:07:03,281 epoch 65 - iter 42/72 - loss 0.08222081 - time (sec): 9.49 - samples/sec: 2468.57 - lr: 0.000781
2023-04-20 23:07:04,922 epoch 65 - iter 49/72 - loss 0.08459372 - time (sec): 11.13 - samples/sec: 2478.91 - lr: 0.000781
2023-04-20 23:07:06,366 epoch 65 - iter 56/72 - loss 0.08435889 - time (sec): 12.58 - samples/sec: 2499.96 - lr: 0.000781
2023-04-20 23:07:07,729 epoch 6

100%|██████████| 8/8 [00:01<00:00,  6.40it/s]

2023-04-20 23:07:11,180 Evaluating as a multi-label problem: False
2023-04-20 23:07:11,196 DEV : loss 0.14179766178131104 - f1-score (micro avg)  0.6649
2023-04-20 23:07:11,226 BAD EPOCHS (no improvement): 3
2023-04-20 23:07:11,232 ----------------------------------------------------------------------------------------------------
2023-04-20 23:07:12,442 epoch 66 - iter 7/72 - loss 0.07262128 - time (sec): 1.21 - samples/sec: 3457.02 - lr: 0.000781
2023-04-20 23:07:14,157 epoch 66 - iter 14/72 - loss 0.07797969 - time (sec): 2.92 - samples/sec: 2680.05 - lr: 0.000781
2023-04-20 23:07:16,135 epoch 66 - iter 21/72 - loss 0.08060763 - time (sec): 4.90 - samples/sec: 2397.61 - lr: 0.000781
2023-04-20 23:07:17,780 epoch 66 - iter 28/72 - loss 0.08178050 - time (sec): 6.55 - samples/sec: 2375.26 - lr: 0.000781
2023-04-20 23:07:19,296 epoch 66 - iter 35/72 - loss 0.08393167 - time (sec): 8.06 - samples/sec: 2392.47 - lr: 0.000781
2023-04-20 23:07:20,715 epoch 66 - iter 42/72 - loss 0.08312971 - time (sec): 9.48 - samples/sec: 2436.63 - lr: 0.000781
2023-04-20 23:07:22,086 epoch 66 - iter 49/72 - loss 0.08354836 - time (sec): 10.85 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.39it/s]

2023-04-20 23:07:29,238 Evaluating as a multi-label problem: False
2023-04-20 23:07:29,269 DEV : loss 0.14171414077281952 - f1-score (micro avg)  0.6577
2023-04-20 23:07:29,317 Epoch    66: reducing learning rate of group 0 to 3.9063e-04.
2023-04-20 23:07:29,323 BAD EPOCHS (no improvement): 4
2023-04-20 23:07:29,329 ----------------------------------------------------------------------------------------------------
2023-04-20 23:07:30,595 epoch 67 - iter 7/72 - loss 0.08135035 - time (sec): 1.26 - samples/sec: 3159.76 - lr: 0.000391
2023-04-20 23:07:32,405 epoch 67 - iter 14/72 - loss 0.09095581 - time (sec): 3.07 - samples/sec: 2572.29 - lr: 0.000391
2023-04-20 23:07:33,894 epoch 67 - iter 21/72 - loss 0.09100551 - time (sec): 4.56 - samples/sec: 2585.30 - lr: 0.000391
2023-04-20 23:07:35,379 epoch 67 - iter 28/72 - loss 0.08917594 - time (sec): 6.05 - samples/sec: 2593.65 - lr: 0.000391
2023-04-20 23:07:36,848 epoch 67 - iter 35/72 - loss 0.09028531 - time (sec): 7.52 - samples/sec: 2594.85 - lr: 0.000391
2023-04-20 23:07:38,259 epoch 67 - iter 42/72 - loss 0.09078027 - time (sec): 8.93 - samples/sec: 2620.47 - lr: 0.000391
2023-04-20 23:07:39,677 epoch 67 - iter 49/72 - loss 0.08951419 - time (sec): 10.34 - samples/sec: 2639.02 - lr: 0.000391
2023-04-20 23:07:41,045 epoch 67 - iter 56/72 - loss 0.08988916 - time (sec): 11.71 - samples/sec: 2661.58 - lr: 0.000391
2023-04-20 23:07:42,692 epoch 6

100%|██████████| 8/8 [00:01<00:00,  4.24it/s]

2023-04-20 23:07:47,524 Evaluating as a multi-label problem: False
2023-04-20 23:07:47,541 DEV : loss 0.14170554280281067 - f1-score (micro avg)  0.6577
2023-04-20 23:07:47,567 BAD EPOCHS (no improvement): 1
2023-04-20 23:07:47,572 ----------------------------------------------------------------------------------------------------





2023-04-20 23:07:48,695 epoch 68 - iter 7/72 - loss 0.09193384 - time (sec): 1.12 - samples/sec: 3766.08 - lr: 0.000391
2023-04-20 23:07:50,115 epoch 68 - iter 14/72 - loss 0.10150342 - time (sec): 2.54 - samples/sec: 3246.14 - lr: 0.000391
2023-04-20 23:07:51,709 epoch 68 - iter 21/72 - loss 0.09499684 - time (sec): 4.13 - samples/sec: 2983.03 - lr: 0.000391
2023-04-20 23:07:53,082 epoch 68 - iter 28/72 - loss 0.08932923 - time (sec): 5.51 - samples/sec: 2929.62 - lr: 0.000391
2023-04-20 23:07:54,447 epoch 68 - iter 35/72 - loss 0.08652536 - time (sec): 6.87 - samples/sec: 2885.63 - lr: 0.000391
2023-04-20 23:07:55,813 epoch 68 - iter 42/72 - loss 0.08495984 - time (sec): 8.24 - samples/sec: 2865.27 - lr: 0.000391
2023-04-20 23:07:57,275 epoch 68 - iter 49/72 - loss 0.09010658 - time (sec): 9.70 - samples/sec: 2842.39 - lr: 0.000391
2023-04-20 23:07:58,995 epoch 68 - iter 56/72 - loss 0.08984558 - time (sec): 11.42 - samples/sec: 2730.50 - lr: 0.000391
2023-04-20 23:08:00,780 epoch 68

100%|██████████| 8/8 [00:01<00:00,  6.37it/s]

2023-04-20 23:08:04,902 Evaluating as a multi-label problem: False
2023-04-20 23:08:04,922 DEV : loss 0.14167502522468567 - f1-score (micro avg)  0.6595





2023-04-20 23:08:04,948 BAD EPOCHS (no improvement): 2
2023-04-20 23:08:04,954 ----------------------------------------------------------------------------------------------------
2023-04-20 23:08:05,970 epoch 69 - iter 7/72 - loss 0.08842669 - time (sec): 1.02 - samples/sec: 4066.67 - lr: 0.000391
2023-04-20 23:08:07,506 epoch 69 - iter 14/72 - loss 0.08914878 - time (sec): 2.55 - samples/sec: 3202.45 - lr: 0.000391
2023-04-20 23:08:08,893 epoch 69 - iter 21/72 - loss 0.08911975 - time (sec): 3.94 - samples/sec: 3101.11 - lr: 0.000391
2023-04-20 23:08:10,340 epoch 69 - iter 28/72 - loss 0.08600907 - time (sec): 5.38 - samples/sec: 2993.70 - lr: 0.000391
2023-04-20 23:08:11,759 epoch 69 - iter 35/72 - loss 0.08446393 - time (sec): 6.80 - samples/sec: 2921.24 - lr: 0.000391
2023-04-20 23:08:13,489 epoch 69 - iter 42/72 - loss 0.08574225 - time (sec): 8.53 - samples/sec: 2777.50 - lr: 0.000391
2023-04-20 23:08:15,362 epoch 69 - iter 49/72 - loss 0.08651294 - time (sec): 10.41 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.43it/s]

2023-04-20 23:08:22,449 Evaluating as a multi-label problem: False
2023-04-20 23:08:22,472 DEV : loss 0.141811341047287 - f1-score (micro avg)  0.6577





2023-04-20 23:08:22,499 BAD EPOCHS (no improvement): 3
2023-04-20 23:08:22,509 ----------------------------------------------------------------------------------------------------
2023-04-20 23:08:24,087 epoch 70 - iter 7/72 - loss 0.07830049 - time (sec): 1.58 - samples/sec: 2426.61 - lr: 0.000391
2023-04-20 23:08:25,536 epoch 70 - iter 14/72 - loss 0.07487148 - time (sec): 3.02 - samples/sec: 2556.55 - lr: 0.000391
2023-04-20 23:08:26,933 epoch 70 - iter 21/72 - loss 0.07768623 - time (sec): 4.42 - samples/sec: 2614.30 - lr: 0.000391
2023-04-20 23:08:28,488 epoch 70 - iter 28/72 - loss 0.07902915 - time (sec): 5.98 - samples/sec: 2569.99 - lr: 0.000391
2023-04-20 23:08:30,423 epoch 70 - iter 35/72 - loss 0.08590578 - time (sec): 7.91 - samples/sec: 2447.44 - lr: 0.000391
2023-04-20 23:08:32,282 epoch 70 - iter 42/72 - loss 0.08571846 - time (sec): 9.77 - samples/sec: 2393.02 - lr: 0.000391
2023-04-20 23:08:33,979 epoch 70 - iter 49/72 - loss 0.09013256 - time (sec): 11.47 - samples/s

100%|██████████| 8/8 [00:01<00:00,  6.56it/s]

2023-04-20 23:08:40,283 Evaluating as a multi-label problem: False
2023-04-20 23:08:40,299 DEV : loss 0.14184999465942383 - f1-score (micro avg)  0.6595





2023-04-20 23:08:40,326 Epoch    70: reducing learning rate of group 0 to 1.9531e-04.
2023-04-20 23:08:40,328 BAD EPOCHS (no improvement): 4
2023-04-20 23:08:40,333 ----------------------------------------------------------------------------------------------------
2023-04-20 23:08:41,197 epoch 71 - iter 7/72 - loss 0.06178396 - time (sec): 0.86 - samples/sec: 4371.66 - lr: 0.000195
2023-04-20 23:08:42,682 epoch 71 - iter 14/72 - loss 0.08893628 - time (sec): 2.34 - samples/sec: 3219.95 - lr: 0.000195
2023-04-20 23:08:44,404 epoch 71 - iter 21/72 - loss 0.08420364 - time (sec): 4.07 - samples/sec: 2773.89 - lr: 0.000195
2023-04-20 23:08:46,101 epoch 71 - iter 28/72 - loss 0.08327308 - time (sec): 5.76 - samples/sec: 2623.21 - lr: 0.000195
2023-04-20 23:08:47,993 epoch 71 - iter 35/72 - loss 0.08860185 - time (sec): 7.65 - samples/sec: 2480.05 - lr: 0.000195
2023-04-20 23:08:49,633 epoch 71 - iter 42/72 - loss 0.08829373 - time (sec): 9.30 - samples/sec: 2454.47 - lr: 0.000195
2023-04-2

100%|██████████| 8/8 [00:01<00:00,  6.41it/s]

2023-04-20 23:08:57,574 Evaluating as a multi-label problem: False
2023-04-20 23:08:57,593 DEV : loss 0.14187464118003845 - f1-score (micro avg)  0.6595





2023-04-20 23:08:57,621 BAD EPOCHS (no improvement): 1
2023-04-20 23:08:57,627 ----------------------------------------------------------------------------------------------------
2023-04-20 23:08:58,466 epoch 72 - iter 7/72 - loss 0.09117393 - time (sec): 0.84 - samples/sec: 4503.83 - lr: 0.000195
2023-04-20 23:09:00,259 epoch 72 - iter 14/72 - loss 0.08689917 - time (sec): 2.63 - samples/sec: 2930.12 - lr: 0.000195
2023-04-20 23:09:02,095 epoch 72 - iter 21/72 - loss 0.08732103 - time (sec): 4.47 - samples/sec: 2568.23 - lr: 0.000195
2023-04-20 23:09:04,065 epoch 72 - iter 28/72 - loss 0.08828219 - time (sec): 6.44 - samples/sec: 2403.47 - lr: 0.000195
2023-04-20 23:09:05,539 epoch 72 - iter 35/72 - loss 0.08666300 - time (sec): 7.91 - samples/sec: 2448.79 - lr: 0.000195
2023-04-20 23:09:06,873 epoch 72 - iter 42/72 - loss 0.08775105 - time (sec): 9.24 - samples/sec: 2487.53 - lr: 0.000195
2023-04-20 23:09:08,554 epoch 72 - iter 49/72 - loss 0.09034220 - time (sec): 10.93 - samples/s

100%|██████████| 8/8 [00:01<00:00,  4.51it/s]

2023-04-20 23:09:15,381 Evaluating as a multi-label problem: False
2023-04-20 23:09:15,404 DEV : loss 0.14189475774765015 - f1-score (micro avg)  0.6595
2023-04-20 23:09:15,445 BAD EPOCHS (no improvement): 2
2023-04-20 23:09:15,451 ----------------------------------------------------------------------------------------------------





2023-04-20 23:09:16,676 epoch 73 - iter 7/72 - loss 0.07552306 - time (sec): 1.22 - samples/sec: 3291.88 - lr: 0.000195
2023-04-20 23:09:18,583 epoch 73 - iter 14/72 - loss 0.08720915 - time (sec): 3.13 - samples/sec: 2459.61 - lr: 0.000195
2023-04-20 23:09:20,232 epoch 73 - iter 21/72 - loss 0.08693510 - time (sec): 4.78 - samples/sec: 2437.30 - lr: 0.000195
2023-04-20 23:09:21,672 epoch 73 - iter 28/72 - loss 0.08851387 - time (sec): 6.22 - samples/sec: 2493.58 - lr: 0.000195
2023-04-20 23:09:23,272 epoch 73 - iter 35/72 - loss 0.08474355 - time (sec): 7.82 - samples/sec: 2509.13 - lr: 0.000195
2023-04-20 23:09:24,673 epoch 73 - iter 42/72 - loss 0.08720850 - time (sec): 9.22 - samples/sec: 2566.51 - lr: 0.000195
2023-04-20 23:09:26,227 epoch 73 - iter 49/72 - loss 0.08916590 - time (sec): 10.77 - samples/sec: 2576.76 - lr: 0.000195
2023-04-20 23:09:27,681 epoch 73 - iter 56/72 - loss 0.08680379 - time (sec): 12.23 - samples/sec: 2589.56 - lr: 0.000195
2023-04-20 23:09:29,101 epoch 7

100%|██████████| 8/8 [00:01<00:00,  4.12it/s]

2023-04-20 23:09:33,666 Evaluating as a multi-label problem: False
2023-04-20 23:09:33,695 DEV : loss 0.1418944150209427 - f1-score (micro avg)  0.6595
2023-04-20 23:09:33,742 BAD EPOCHS (no improvement): 3
2023-04-20 23:09:33,749 ----------------------------------------------------------------------------------------------------





2023-04-20 23:09:34,877 epoch 74 - iter 7/72 - loss 0.07758271 - time (sec): 1.13 - samples/sec: 3381.13 - lr: 0.000195
2023-04-20 23:09:36,267 epoch 74 - iter 14/72 - loss 0.09600624 - time (sec): 2.52 - samples/sec: 2996.00 - lr: 0.000195
2023-04-20 23:09:37,756 epoch 74 - iter 21/72 - loss 0.09283723 - time (sec): 4.01 - samples/sec: 2877.76 - lr: 0.000195
2023-04-20 23:09:39,223 epoch 74 - iter 28/72 - loss 0.09208321 - time (sec): 5.47 - samples/sec: 2810.22 - lr: 0.000195
2023-04-20 23:09:40,577 epoch 74 - iter 35/72 - loss 0.09217346 - time (sec): 6.83 - samples/sec: 2826.87 - lr: 0.000195
2023-04-20 23:09:42,168 epoch 74 - iter 42/72 - loss 0.08805977 - time (sec): 8.42 - samples/sec: 2778.29 - lr: 0.000195
2023-04-20 23:09:43,646 epoch 74 - iter 49/72 - loss 0.08771718 - time (sec): 9.89 - samples/sec: 2768.92 - lr: 0.000195
2023-04-20 23:09:45,330 epoch 74 - iter 56/72 - loss 0.08726912 - time (sec): 11.58 - samples/sec: 2704.99 - lr: 0.000195
2023-04-20 23:09:47,157 epoch 74

100%|██████████| 8/8 [00:01<00:00,  6.13it/s]

2023-04-20 23:09:51,374 Evaluating as a multi-label problem: False
2023-04-20 23:09:51,392 DEV : loss 0.14191746711730957 - f1-score (micro avg)  0.6595





2023-04-20 23:09:51,421 Epoch    74: reducing learning rate of group 0 to 9.7656e-05.
2023-04-20 23:09:51,423 BAD EPOCHS (no improvement): 4
2023-04-20 23:09:51,430 ----------------------------------------------------------------------------------------------------
2023-04-20 23:09:51,432 ----------------------------------------------------------------------------------------------------
2023-04-20 23:09:51,434 learning rate too small - quitting training!
2023-04-20 23:09:51,440 ----------------------------------------------------------------------------------------------------
2023-04-20 23:09:53,262 ----------------------------------------------------------------------------------------------------
2023-04-20 23:09:55,647 SequenceTagger predicts: Dictionary with 19 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-OTH, B-OTH, E-OTH, I-OTH, <START>, <STOP>


100%|██████████| 27/27 [00:13<00:00,  2.00it/s]


2023-04-20 23:10:09,526 Evaluating as a multi-label problem: False
2023-04-20 23:10:09,548 0.6422	0.6284	0.6352	0.5503
2023-04-20 23:10:09,550 
Results:
- F-score (micro) 0.6352
- F-score (macro) 0.4869
- Accuracy 0.5503

By class:
              precision    recall  f1-score   support

         LOC     0.5468    0.7048    0.6158       315
         PER     0.7519    0.6724    0.7099       293
         ORG     0.6888    0.5666    0.6217       293
         OTH     0.0000    0.0000    0.0000        30

   micro avg     0.6422    0.6284    0.6352       931
   macro avg     0.4969    0.4859    0.4869       931
weighted avg     0.6384    0.6284    0.6274       931

2023-04-20 23:10:09,551 ----------------------------------------------------------------------------------------------------
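
The three averages in the report above can be recomputed from the per-class rows. The following is a minimal sketch in plain Python, with the numbers copied by hand from the report. The variable names are illustrative and are not part of the notebook. It shows why the macro F-score (0.4869) is so much lower than the micro F-score (0.6352): the `OTH` class has an F1 of 0 and counts as much as every other class in the unweighted mean.

```python
# Per-class (precision, recall, f1, support), copied from the report above.
report = {
    "LOC": (0.5468, 0.7048, 0.6158, 315),
    "PER": (0.7519, 0.6724, 0.7099, 293),
    "ORG": (0.6888, 0.5666, 0.6217, 293),
    "OTH": (0.0000, 0.0000, 0.0000, 30),
}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in report.values()) / len(report)

# Weighted average: per-class F1 weighted by support (number of gold spans).
total_support = sum(support for _, _, _, support in report.values())
weighted_f1 = sum(f1 * support for _, _, f1, support in report.values()) / total_support

# Micro average: harmonic mean of the pooled precision and recall
# (the "micro avg" row of the report).
micro_precision, micro_recall = 0.6422, 0.6284
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"macro F1:    {macro_f1:.4f}")
print(f"weighted F1: {weighted_f1:.4f}")
print(f"micro F1:    {micro_f1:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the report in the last digit.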


{'test_score': 0.6351791530944625,
 'dev_score_history': [0.3155397390272835,
  0.39780521262002744,
  0.4309859154929578,
  0.48,
  0.5059920106524634,
  0.5113636363636362,
  0.5286103542234333,
  0.46986089644513135,
  0.5144429160935351,
  0.5682137834036568,
  0.5813630041724618,
  0.5582655826558266,
  0.5561643835616438,
  0.5775978407557355,
  0.5880794701986755,
  0.5991902834008098,
  0.5804676753782667,
  0.5714285714285714,
  0.5647382920110193,
  0.6033519553072626,
  0.6052998605299861,
  0.5625841184387617,
  0.63257065948856,
  0.6421768707482993,
  0.6147308781869688,
  0.6028571428571429,
  0.6181818181818182,
  0.6196403872752421,
  0.657608695652174,
  0.6379542395693136,
  0.6427586206896551,
  0.6603260869565217,
  0.6382978723404256,
  0.6538987688098497,
  0.6423751686909581,
  0.6391478029294274,
  0.6576819407008087,
  0.663013698630137,
  0.6550802139037433,
  0.6433378196500673,
  0.6568364611260055,
  0.6649076517150396,
  0.6631578947368422,
  0.6630872483
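
The end of the training log above shows the learning rate being halved each time the count of "bad epochs" reaches 4, and training stopping once the rate drops below a threshold ("learning rate too small"). The sketch below reproduces that behaviour in plain Python. It is a simplified illustration and not Flair's actual scheduler. The function name and the parameter values (`factor=0.5`, `patience=3`, `min_lr=1e-4`) are assumptions chosen to be consistent with the logged values (0.000781, 3.9063e-04, 1.9531e-04, 9.7656e-05).

```python
def anneal_on_plateau(dev_scores, lr=0.1, factor=0.5, patience=3, min_lr=1e-4):
    """Simplified anneal-on-plateau schedule (illustrative, not Flair's code).

    The learning rate is multiplied by `factor` whenever more than `patience`
    consecutive epochs pass without a new best dev score. Training stops as
    soon as the learning rate falls below `min_lr`.
    Returns a list of (epoch, learning_rate_after_epoch) pairs.
    """
    best = float("-inf")
    bad_epochs = 0
    history = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor      # anneal the learning rate
            bad_epochs = 0    # start counting again
        history.append((epoch, lr))
        if lr < min_lr:
            break             # "learning rate too small"
    return history


# Usage: one good epoch followed by a long plateau.
schedule = anneal_on_plateau([0.5] + [0.4] * 100)
last_epoch, last_lr = schedule[-1]
print(f"stopped after epoch {last_epoch} with lr={last_lr:.4e}")
```

Under these assumptions, a starting rate of 0.1 can be halved ten times before it falls below 1e-4, which is why a run that plateaus early still trains for many epochs before stopping.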

## For Japanese

In [9]:
from flair.datasets import NER_JAPANESE
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# get the corpus
corpus = NER_JAPANESE()
print(corpus)

# Sanity check: split sizes and one sample sentence
print(len(corpus.train))
print(len(corpus.test))
print(len(corpus.dev))
sentence = corpus.test[3]
print(sentence)
print(corpus)

2023-04-21 09:27:29,225 Reading data from /root/.flair/datasets/ner_japanese
2023-04-21 09:27:29,226 Train: /root/.flair/datasets/ner_japanese/train.txt
2023-04-21 09:27:29,227 Dev: None
2023-04-21 09:27:29,229 Test: None
Corpus: 3621 train + 402 dev + 447 test sentences
3621
447
402
Sentence[30]: "世界遺産会議はユネスコ主催で年一回開催され、世界遺産の新規登録や登録物件の現状の評価などを行う。" → ["世界遺産会議"/ORG, "ユネスコ"/ORG]
Corpus: 3621 train + 402 dev + 447 test sentences


In [16]:
# 2. what label do we want to predict?
label_type = 'ner'

# 3. make the label dictionary from the corpus
# (downsampled_corpus is not created in this cell; it must already be defined by an earlier cell)
label_dict = downsampled_corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embedding stack with Flair and GloVe
# (note: 'glove', 'news-forward' and 'news-backward' are trained on English text)
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

# 6. initialize trainer
trainer = ModelTrainer(tagger, downsampled_corpus)

# 7. start training
trainer.train('/content/drive/My Drive/ColabNotebooks/flairmodels/japanese',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=100,
              write_weights=True)

2023-04-20 23:12:52,628 Computing label dictionary. Progress:


3621it [00:00, 42368.24it/s]

2023-04-20 23:12:52,720 Dictionary created for label 'ner' with 20 values: LOCATION (seen 2337 times), ORGANIZATION (seen 1293 times), DATE (seen 1212 times), PERSON (seen 1155 times), NUMBER (seen 874 times), ARTIFACT (seen 575 times), LOC (seen 324 times), OTHER (seen 288 times), DAT (seen 216 times), EVENT (seen 216 times), ORG (seen 199 times), PERCENT (seen 123 times), PSN (seen 122 times), ART (seen 103 times), MONEY (seen 59 times), TIM (seen 40 times), TIME (seen 17 times), PNT (seen 10 times), MNY (seen 7 times)
Dictionary with 20 tags: <unk>, LOCATION, ORGANIZATION, DATE, PERSON, NUMBER, ARTIFACT, LOC, OTHER, DAT, EVENT, ORG, PERCENT, PSN, ART, MONEY, TIM, TIME, PNT, MNY
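
The dictionary above lists several labels in both a long and a short form (for example `LOCATION` and `LOC`, or `PERSON` and `PSN`), so the tagger has to learn them as separate classes. The sketch below, in plain Python, merges the counts from the log under one canonical name and shows the effect on the size of the BIOES tag set. The short-to-long mapping is an assumption based on the label names and has not been checked against the corpus annotation guidelines. The helper names are hypothetical and the code does not touch the Flair corpus itself.

```python
# Label counts copied from the "Dictionary created" log line above.
label_counts = {
    "LOCATION": 2337, "ORGANIZATION": 1293, "DATE": 1212, "PERSON": 1155,
    "NUMBER": 874, "ARTIFACT": 575, "LOC": 324, "OTHER": 288, "DAT": 216,
    "EVENT": 216, "ORG": 199, "PERCENT": 123, "PSN": 122, "ART": 103,
    "MONEY": 59, "TIM": 40, "TIME": 17, "PNT": 10, "MNY": 7,
}

# ASSUMPTION: each short label is an abbreviation of the long label it maps to.
SHORT_TO_LONG = {
    "LOC": "LOCATION", "ORG": "ORGANIZATION", "DAT": "DATE", "PSN": "PERSON",
    "ART": "ARTIFACT", "MNY": "MONEY", "TIM": "TIME", "PNT": "PERCENT",
}


def merge_label_counts(counts, mapping):
    """Sum the counts of labels that map to the same canonical name."""
    merged = {}
    for label, n in counts.items():
        canonical = mapping.get(label, label)
        merged[canonical] = merged.get(canonical, 0) + n
    return merged


def bioes_tag_count(num_labels):
    """One 'O' tag plus S-, B-, E- and I- variants for every label."""
    return 1 + 4 * num_labels


merged = merge_label_counts(label_counts, SHORT_TO_LONG)
print(len(label_counts), "labels ->", bioes_tag_count(len(label_counts)), "tags")
print(len(merged), "labels ->", bioes_tag_count(len(merged)), "tags")
```

With the 19 labels from the log, `bioes_tag_count` gives the same 77 tags that the `SequenceTagger` reports for this run. Whether merging the labels would improve the scores has not been tested here.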





2023-04-20 23:13:01,813 SequenceTagger predicts: Dictionary with 77 tags: O, S-LOCATION, B-LOCATION, E-LOCATION, I-LOCATION, S-ORGANIZATION, B-ORGANIZATION, E-ORGANIZATION, I-ORGANIZATION, S-DATE, B-DATE, E-DATE, I-DATE, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-NUMBER, B-NUMBER, E-NUMBER, I-NUMBER, S-ARTIFACT, B-ARTIFACT, E-ARTIFACT, I-ARTIFACT, S-LOC, B-LOC, E-LOC, I-LOC, S-OTHER, B-OTHER, E-OTHER, I-OTHER, S-DAT, B-DAT, E-DAT, I-DAT, S-EVENT, B-EVENT, E-EVENT, I-EVENT, S-ORG, B-ORG, E-ORG, I-ORG, S-PERCENT, B-PERCENT, E-PERCENT, I-PERCENT, S-PSN
2023-04-20 23:13:02,046 ----------------------------------------------------------------------------------------------------
2023-04-20 23:13:02,051 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): E

100%|██████████| 13/13 [00:08<00:00,  1.55it/s]

2023-04-20 23:14:40,594 Evaluating as a multi-label problem: False
2023-04-20 23:14:40,617 DEV : loss 1.0012956857681274 - f1-score (micro avg)  0.0063
2023-04-20 23:14:40,679 BAD EPOCHS (no improvement): 0
2023-04-20 23:14:40,685 saving best model





2023-04-20 23:14:42,971 ----------------------------------------------------------------------------------------------------
2023-04-20 23:14:47,915 epoch 2 - iter 11/114 - loss 0.93357233 - time (sec): 4.94 - samples/sec: 2551.00 - lr: 0.100000
2023-04-20 23:14:55,397 epoch 2 - iter 22/114 - loss 0.88063252 - time (sec): 12.42 - samples/sec: 2046.68 - lr: 0.100000
2023-04-20 23:15:03,339 epoch 2 - iter 33/114 - loss 0.85374571 - time (sec): 20.37 - samples/sec: 1845.76 - lr: 0.100000
2023-04-20 23:15:08,419 epoch 2 - iter 44/114 - loss 0.83430432 - time (sec): 25.45 - samples/sec: 1962.79 - lr: 0.100000
2023-04-20 23:15:13,462 epoch 2 - iter 55/114 - loss 0.82010354 - time (sec): 30.49 - samples/sec: 2049.58 - lr: 0.100000
2023-04-20 23:15:18,909 epoch 2 - iter 66/114 - loss 0.80862013 - time (sec): 35.94 - samples/sec: 2079.57 - lr: 0.100000
2023-04-20 23:15:23,415 epoch 2 - iter 77/114 - loss 0.79740978 - time (sec): 40.44 - samples/sec: 2158.69 - lr: 0.100000
2023-04-20 23:15:27,33

100%|██████████| 13/13 [00:03<00:00,  3.34it/s]

2023-04-20 23:15:41,646 Evaluating as a multi-label problem: False
2023-04-20 23:15:41,665 DEV : loss 0.7073206901550293 - f1-score (micro avg)  0.1123





2023-04-20 23:15:41,734 BAD EPOCHS (no improvement): 0
2023-04-20 23:15:41,740 saving best model
2023-04-20 23:15:43,595 ----------------------------------------------------------------------------------------------------
2023-04-20 23:15:47,528 epoch 3 - iter 11/114 - loss 0.75183849 - time (sec): 3.93 - samples/sec: 3082.04 - lr: 0.100000
2023-04-20 23:15:52,863 epoch 3 - iter 22/114 - loss 0.73070059 - time (sec): 9.26 - samples/sec: 2635.47 - lr: 0.100000
2023-04-20 23:15:58,545 epoch 3 - iter 33/114 - loss 0.73960392 - time (sec): 14.94 - samples/sec: 2530.31 - lr: 0.100000
2023-04-20 23:16:03,355 epoch 3 - iter 44/114 - loss 0.72957218 - time (sec): 19.75 - samples/sec: 2540.29 - lr: 0.100000
2023-04-20 23:16:07,694 epoch 3 - iter 55/114 - loss 0.71657639 - time (sec): 24.09 - samples/sec: 2597.16 - lr: 0.100000
2023-04-20 23:16:12,802 epoch 3 - iter 66/114 - loss 0.70769899 - time (sec): 29.20 - samples/sec: 2571.66 - lr: 0.100000
2023-04-20 23:16:17,427 epoch 3 - iter 77/114 - 

100%|██████████| 13/13 [00:04<00:00,  2.70it/s]

2023-04-20 23:16:37,477 Evaluating as a multi-label problem: False





2023-04-20 23:16:37,500 DEV : loss 0.6131987571716309 - f1-score (micro avg)  0.1696
2023-04-20 23:16:37,562 BAD EPOCHS (no improvement): 0
2023-04-20 23:16:37,568 saving best model
2023-04-20 23:16:39,649 ----------------------------------------------------------------------------------------------------
2023-04-20 23:16:44,139 epoch 4 - iter 11/114 - loss 0.68470553 - time (sec): 4.49 - samples/sec: 2853.66 - lr: 0.100000
2023-04-20 23:16:51,278 epoch 4 - iter 22/114 - loss 0.65458462 - time (sec): 11.62 - samples/sec: 2158.65 - lr: 0.100000
2023-04-20 23:16:55,732 epoch 4 - iter 33/114 - loss 0.64407808 - time (sec): 16.08 - samples/sec: 2306.68 - lr: 0.100000
2023-04-20 23:16:59,762 epoch 4 - iter 44/114 - loss 0.63869681 - time (sec): 20.11 - samples/sec: 2429.18 - lr: 0.100000
2023-04-20 23:17:03,926 epoch 4 - iter 55/114 - loss 0.62444731 - time (sec): 24.27 - samples/sec: 2512.62 - lr: 0.100000
2023-04-20 23:17:08,189 epoch 4 - iter 66/114 - loss 0.60961335 - time (sec): 28.54 

100%|██████████| 13/13 [00:04<00:00,  2.67it/s]

2023-04-20 23:17:34,505 Evaluating as a multi-label problem: False
2023-04-20 23:17:34,533 DEV : loss 0.5502644777297974 - f1-score (micro avg)  0.1827
2023-04-20 23:17:34,649 BAD EPOCHS (no improvement): 0
2023-04-20 23:17:34,656 saving best model





2023-04-20 23:17:37,220 ----------------------------------------------------------------------------------------------------
2023-04-20 23:17:40,339 epoch 5 - iter 11/114 - loss 0.60593663 - time (sec): 3.12 - samples/sec: 3740.08 - lr: 0.100000
2023-04-20 23:17:45,133 epoch 5 - iter 22/114 - loss 0.59283988 - time (sec): 7.91 - samples/sec: 3059.26 - lr: 0.100000
2023-04-20 23:17:49,792 epoch 5 - iter 33/114 - loss 0.58123691 - time (sec): 12.57 - samples/sec: 2897.75 - lr: 0.100000
2023-04-20 23:17:54,433 epoch 5 - iter 44/114 - loss 0.56540691 - time (sec): 17.21 - samples/sec: 2831.12 - lr: 0.100000
2023-04-20 23:17:59,079 epoch 5 - iter 55/114 - loss 0.55888244 - time (sec): 21.86 - samples/sec: 2832.33 - lr: 0.100000
2023-04-20 23:18:03,048 epoch 5 - iter 66/114 - loss 0.55589122 - time (sec): 25.83 - samples/sec: 2866.17 - lr: 0.100000
2023-04-20 23:18:08,713 epoch 5 - iter 77/114 - loss 0.56047282 - time (sec): 31.49 - samples/sec: 2752.59 - lr: 0.100000
2023-04-20 23:18:13,390

100%|██████████| 13/13 [00:03<00:00,  3.35it/s]

2023-04-20 23:18:28,812 Evaluating as a multi-label problem: False
2023-04-20 23:18:28,831 DEV : loss 0.5507055521011353 - f1-score (micro avg)  0.2108





2023-04-20 23:18:28,900 BAD EPOCHS (no improvement): 0
2023-04-20 23:18:28,906 saving best model
2023-04-20 23:18:30,858 ----------------------------------------------------------------------------------------------------
2023-04-20 23:18:34,519 epoch 6 - iter 11/114 - loss 0.58065917 - time (sec): 3.66 - samples/sec: 3379.24 - lr: 0.100000
2023-04-20 23:18:40,446 epoch 6 - iter 22/114 - loss 0.58905837 - time (sec): 9.59 - samples/sec: 2613.85 - lr: 0.100000
2023-04-20 23:18:45,076 epoch 6 - iter 33/114 - loss 0.55866197 - time (sec): 14.22 - samples/sec: 2628.74 - lr: 0.100000
2023-04-20 23:18:50,049 epoch 6 - iter 44/114 - loss 0.55519185 - time (sec): 19.19 - samples/sec: 2584.56 - lr: 0.100000
2023-04-20 23:18:55,431 epoch 6 - iter 55/114 - loss 0.55412162 - time (sec): 24.57 - samples/sec: 2530.30 - lr: 0.100000
2023-04-20 23:18:59,553 epoch 6 - iter 66/114 - loss 0.54728353 - time (sec): 28.69 - samples/sec: 2609.97 - lr: 0.100000
2023-04-20 23:19:03,487 epoch 6 - iter 77/114 - 

100%|██████████| 13/13 [00:06<00:00,  2.01it/s]

2023-04-20 23:19:25,165 Evaluating as a multi-label problem: False





2023-04-20 23:19:25,185 DEV : loss 0.5088964700698853 - f1-score (micro avg)  0.1656
2023-04-20 23:19:25,248 BAD EPOCHS (no improvement): 1
2023-04-20 23:19:25,253 ----------------------------------------------------------------------------------------------------
2023-04-20 23:19:29,218 epoch 7 - iter 11/114 - loss 0.50033403 - time (sec): 3.96 - samples/sec: 3098.99 - lr: 0.100000
2023-04-20 23:19:33,358 epoch 7 - iter 22/114 - loss 0.49787364 - time (sec): 8.10 - samples/sec: 3037.19 - lr: 0.100000
2023-04-20 23:19:38,300 epoch 7 - iter 33/114 - loss 0.51268468 - time (sec): 13.05 - samples/sec: 2859.42 - lr: 0.100000
2023-04-20 23:19:43,154 epoch 7 - iter 44/114 - loss 0.52353322 - time (sec): 17.90 - samples/sec: 2772.05 - lr: 0.100000
2023-04-20 23:19:47,828 epoch 7 - iter 55/114 - loss 0.52469694 - time (sec): 22.57 - samples/sec: 2772.63 - lr: 0.100000
2023-04-20 23:19:52,848 epoch 7 - iter 66/114 - loss 0.51691411 - time (sec): 27.59 - samples/sec: 2719.55 - lr: 0.100000
2023-

100%|██████████| 13/13 [00:03<00:00,  3.30it/s]

2023-04-20 23:20:16,610 Evaluating as a multi-label problem: False
2023-04-20 23:20:16,629 DEV : loss 0.4769771695137024 - f1-score (micro avg)  0.2258





2023-04-20 23:20:16,701 BAD EPOCHS (no improvement): 0
2023-04-20 23:20:16,706 saving best model
2023-04-20 23:20:18,887 ----------------------------------------------------------------------------------------------------
2023-04-20 23:20:23,268 epoch 8 - iter 11/114 - loss 0.49353748 - time (sec): 4.36 - samples/sec: 2942.20 - lr: 0.100000
2023-04-20 23:20:28,265 epoch 8 - iter 22/114 - loss 0.50814427 - time (sec): 9.35 - samples/sec: 2676.38 - lr: 0.100000
2023-04-20 23:20:33,137 epoch 8 - iter 33/114 - loss 0.50175526 - time (sec): 14.23 - samples/sec: 2584.81 - lr: 0.100000
2023-04-20 23:20:37,614 epoch 8 - iter 44/114 - loss 0.49582221 - time (sec): 18.70 - samples/sec: 2630.78 - lr: 0.100000
2023-04-20 23:20:41,566 epoch 8 - iter 55/114 - loss 0.49902424 - time (sec): 22.66 - samples/sec: 2699.69 - lr: 0.100000
2023-04-20 23:20:45,768 epoch 8 - iter 66/114 - loss 0.48948160 - time (sec): 26.86 - samples/sec: 2732.66 - lr: 0.100000
2023-04-20 23:20:49,990 epoch 8 - iter 77/114 - 

100%|██████████| 13/13 [00:05<00:00,  2.40it/s]

2023-04-20 23:21:12,633 Evaluating as a multi-label problem: False
2023-04-20 23:21:12,652 DEV : loss 0.4592331051826477 - f1-score (micro avg)  0.2425





2023-04-20 23:21:12,720 BAD EPOCHS (no improvement): 0
2023-04-20 23:21:12,740 saving best model
2023-04-20 23:21:14,691 ----------------------------------------------------------------------------------------------------
2023-04-20 23:21:19,415 epoch 9 - iter 11/114 - loss 0.48312731 - time (sec): 4.70 - samples/sec: 2526.22 - lr: 0.100000
2023-04-20 23:21:24,113 epoch 9 - iter 22/114 - loss 0.46101839 - time (sec): 9.40 - samples/sec: 2588.79 - lr: 0.100000
2023-04-20 23:21:30,623 epoch 9 - iter 33/114 - loss 0.46452660 - time (sec): 15.91 - samples/sec: 2348.34 - lr: 0.100000
2023-04-20 23:21:35,090 epoch 9 - iter 44/114 - loss 0.46562047 - time (sec): 20.37 - samples/sec: 2435.75 - lr: 0.100000
2023-04-20 23:21:39,891 epoch 9 - iter 55/114 - loss 0.46895883 - time (sec): 25.17 - samples/sec: 2471.29 - lr: 0.100000
2023-04-20 23:21:44,603 epoch 9 - iter 66/114 - loss 0.46322059 - time (sec): 29.89 - samples/sec: 2499.19 - lr: 0.100000
2023-04-20 23:21:48,907 epoch 9 - iter 77/114 - 

100%|██████████| 13/13 [00:03<00:00,  3.39it/s]

2023-04-20 23:22:07,517 Evaluating as a multi-label problem: False





2023-04-20 23:22:07,540 DEV : loss 0.44397181272506714 - f1-score (micro avg)  0.2132
2023-04-20 23:22:07,615 BAD EPOCHS (no improvement): 1
2023-04-20 23:22:07,620 ----------------------------------------------------------------------------------------------------
2023-04-20 23:22:12,262 epoch 10 - iter 11/114 - loss 0.46357768 - time (sec): 4.64 - samples/sec: 2727.25 - lr: 0.100000
2023-04-20 23:22:16,817 epoch 10 - iter 22/114 - loss 0.46510168 - time (sec): 9.19 - samples/sec: 2727.33 - lr: 0.100000
2023-04-20 23:22:21,851 epoch 10 - iter 33/114 - loss 0.47122798 - time (sec): 14.23 - samples/sec: 2652.88 - lr: 0.100000
2023-04-20 23:22:26,008 epoch 10 - iter 44/114 - loss 0.46563459 - time (sec): 18.39 - samples/sec: 2690.45 - lr: 0.100000
2023-04-20 23:22:30,530 epoch 10 - iter 55/114 - loss 0.45869367 - time (sec): 22.91 - samples/sec: 2692.30 - lr: 0.100000
2023-04-20 23:22:34,704 epoch 10 - iter 66/114 - loss 0.45373771 - time (sec): 27.08 - samples/sec: 2723.84 - lr: 0.10000

100%|██████████| 13/13 [00:05<00:00,  2.34it/s]

2023-04-20 23:22:59,978 Evaluating as a multi-label problem: False
2023-04-20 23:22:59,997 DEV : loss 0.43390411138534546 - f1-score (micro avg)  0.2305





2023-04-20 23:23:00,073 BAD EPOCHS (no improvement): 2
2023-04-20 23:23:00,079 ----------------------------------------------------------------------------------------------------
2023-04-20 23:23:03,619 epoch 11 - iter 11/114 - loss 0.42512927 - time (sec): 3.54 - samples/sec: 3414.22 - lr: 0.100000
2023-04-20 23:23:07,845 epoch 11 - iter 22/114 - loss 0.42618895 - time (sec): 7.76 - samples/sec: 3145.47 - lr: 0.100000
2023-04-20 23:23:13,155 epoch 11 - iter 33/114 - loss 0.43179881 - time (sec): 13.07 - samples/sec: 2827.26 - lr: 0.100000
2023-04-20 23:23:17,278 epoch 11 - iter 44/114 - loss 0.42099813 - time (sec): 17.20 - samples/sec: 2854.88 - lr: 0.100000
2023-04-20 23:23:21,157 epoch 11 - iter 55/114 - loss 0.42776962 - time (sec): 21.08 - samples/sec: 2912.09 - lr: 0.100000
2023-04-20 23:23:26,707 epoch 11 - iter 66/114 - loss 0.43560626 - time (sec): 26.63 - samples/sec: 2804.00 - lr: 0.100000
2023-04-20 23:23:31,493 epoch 11 - iter 77/114 - loss 0.43525574 - time (sec): 31.41

100%|██████████| 13/13 [00:04<00:00,  2.66it/s]

2023-04-20 23:23:52,277 Evaluating as a multi-label problem: False
2023-04-20 23:23:52,297 DEV : loss 0.4213787615299225 - f1-score (micro avg)  0.2082





2023-04-20 23:23:52,367 BAD EPOCHS (no improvement): 3
2023-04-20 23:23:52,372 ----------------------------------------------------------------------------------------------------
2023-04-20 23:23:56,530 epoch 12 - iter 11/114 - loss 0.39256225 - time (sec): 4.15 - samples/sec: 3012.85 - lr: 0.100000
2023-04-20 23:24:02,001 epoch 12 - iter 22/114 - loss 0.42399478 - time (sec): 9.62 - samples/sec: 2621.25 - lr: 0.100000
2023-04-20 23:24:06,136 epoch 12 - iter 33/114 - loss 0.43312361 - time (sec): 13.76 - samples/sec: 2687.71 - lr: 0.100000
2023-04-20 23:24:09,799 epoch 12 - iter 44/114 - loss 0.43281154 - time (sec): 17.42 - samples/sec: 2815.63 - lr: 0.100000
2023-04-20 23:24:14,884 epoch 12 - iter 55/114 - loss 0.43427915 - time (sec): 22.51 - samples/sec: 2740.66 - lr: 0.100000
2023-04-20 23:24:19,242 epoch 12 - iter 66/114 - loss 0.43038623 - time (sec): 26.87 - samples/sec: 2753.92 - lr: 0.100000
2023-04-20 23:24:23,728 epoch 12 - iter 77/114 - loss 0.43104277 - time (sec): 31.35

100%|██████████| 13/13 [00:05<00:00,  2.48it/s]

2023-04-20 23:24:45,033 Evaluating as a multi-label problem: False
2023-04-20 23:24:45,065 DEV : loss 0.41006913781166077 - f1-score (micro avg)  0.2711
2023-04-20 23:24:45,175 BAD EPOCHS (no improvement): 0
2023-04-20 23:24:45,187 saving best model





2023-04-20 23:24:47,469 ----------------------------------------------------------------------------------------------------
2023-04-20 23:24:50,840 epoch 13 - iter 11/114 - loss 0.39469117 - time (sec): 3.36 - samples/sec: 3565.01 - lr: 0.100000
2023-04-20 23:24:55,598 epoch 13 - iter 22/114 - loss 0.39979575 - time (sec): 8.12 - samples/sec: 3009.52 - lr: 0.100000
2023-04-20 23:25:01,569 epoch 13 - iter 33/114 - loss 0.39720307 - time (sec): 14.09 - samples/sec: 2631.52 - lr: 0.100000
2023-04-20 23:25:06,172 epoch 13 - iter 44/114 - loss 0.40828056 - time (sec): 18.70 - samples/sec: 2624.49 - lr: 0.100000
2023-04-20 23:25:10,640 epoch 13 - iter 55/114 - loss 0.41577418 - time (sec): 23.16 - samples/sec: 2641.57 - lr: 0.100000
2023-04-20 23:25:14,735 epoch 13 - iter 66/114 - loss 0.41369810 - time (sec): 27.26 - samples/sec: 2700.35 - lr: 0.100000
2023-04-20 23:25:19,726 epoch 13 - iter 77/114 - loss 0.41939023 - time (sec): 32.25 - samples/sec: 2665.47 - lr: 0.100000
2023-04-20 23:25

100%|██████████| 13/13 [00:04<00:00,  2.74it/s]

2023-04-20 23:25:40,845 Evaluating as a multi-label problem: False
2023-04-20 23:25:40,865 DEV : loss 0.39932021498680115 - f1-score (micro avg)  0.2644





2023-04-20 23:25:40,942 BAD EPOCHS (no improvement): 1
2023-04-20 23:25:40,947 ----------------------------------------------------------------------------------------------------
2023-04-20 23:25:44,694 epoch 14 - iter 11/114 - loss 0.46143817 - time (sec): 3.75 - samples/sec: 3208.82 - lr: 0.100000
2023-04-20 23:25:50,058 epoch 14 - iter 22/114 - loss 0.43789579 - time (sec): 9.11 - samples/sec: 2691.62 - lr: 0.100000
2023-04-20 23:25:54,855 epoch 14 - iter 33/114 - loss 0.43774427 - time (sec): 13.91 - samples/sec: 2686.65 - lr: 0.100000
2023-04-20 23:25:59,735 epoch 14 - iter 44/114 - loss 0.42948365 - time (sec): 18.79 - samples/sec: 2643.19 - lr: 0.100000
2023-04-20 23:26:04,092 epoch 14 - iter 55/114 - loss 0.42216488 - time (sec): 23.14 - samples/sec: 2670.25 - lr: 0.100000
2023-04-20 23:26:08,402 epoch 14 - iter 66/114 - loss 0.41117817 - time (sec): 27.45 - samples/sec: 2699.15 - lr: 0.100000
2023-04-20 23:26:12,716 epoch 14 - iter 77/114 - loss 0.41186107 - time (sec): 31.77

100%|██████████| 13/13 [00:05<00:00,  2.26it/s]

2023-04-20 23:26:33,770 Evaluating as a multi-label problem: False
2023-04-20 23:26:33,790 DEV : loss 0.3932626247406006 - f1-score (micro avg)  0.254





2023-04-20 23:26:33,855 BAD EPOCHS (no improvement): 2
2023-04-20 23:26:33,861 ----------------------------------------------------------------------------------------------------
2023-04-20 23:26:37,568 epoch 15 - iter 11/114 - loss 0.39169523 - time (sec): 3.70 - samples/sec: 3257.49 - lr: 0.100000
2023-04-20 23:26:41,383 epoch 15 - iter 22/114 - loss 0.40201333 - time (sec): 7.52 - samples/sec: 3256.02 - lr: 0.100000
2023-04-20 23:26:46,567 epoch 15 - iter 33/114 - loss 0.40342939 - time (sec): 12.70 - samples/sec: 2920.25 - lr: 0.100000
2023-04-20 23:26:51,258 epoch 15 - iter 44/114 - loss 0.39167642 - time (sec): 17.39 - samples/sec: 2845.13 - lr: 0.100000
2023-04-20 23:26:55,863 epoch 15 - iter 55/114 - loss 0.40232301 - time (sec): 22.00 - samples/sec: 2825.60 - lr: 0.100000
2023-04-20 23:27:01,413 epoch 15 - iter 66/114 - loss 0.40321960 - time (sec): 27.55 - samples/sec: 2715.10 - lr: 0.100000
2023-04-20 23:27:05,535 epoch 15 - iter 77/114 - loss 0.40155034 - time (sec): 31.67

100%|██████████| 13/13 [00:03<00:00,  3.36it/s]

2023-04-20 23:27:25,013 Evaluating as a multi-label problem: False
2023-04-20 23:27:25,032 DEV : loss 0.38680773973464966 - f1-score (micro avg)  0.2443
2023-04-20 23:27:25,098 BAD EPOCHS (no improvement): 3
2023-04-20 23:27:25,105 ----------------------------------------------------------------------------------------------------
2023-04-20 23:27:29,005 epoch 16 - iter 11/114 - loss 0.39271834 - time (sec): 3.90 - samples/sec: 3240.78 - lr: 0.100000
2023-04-20 23:27:34,837 epoch 16 - iter 22/114 - loss 0.39472101 - time (sec): 9.73 - samples/sec: 2566.45 - lr: 0.100000
2023-04-20 23:27:39,650 epoch 16 - iter 33/114 - loss 0.38440476 - time (sec): 14.54 - samples/sec: 2581.33 - lr: 0.100000
2023-04-20 23:27:44,384 epoch 16 - iter 44/114 - loss 0.39588240 - time (sec): 19.27 - samples/sec: 2602.56 - lr: 0.100000
2023-04-20 23:27:49,241 epoch 16 - iter 55/114 - loss 0.40003701 - time (sec): 24.13 - samples/sec: 2581.89 - lr: 0.100000
2023-04-20 23:27:53,719 epoch 16 - iter 66/114 - loss 0.39754217 - time (sec): 28.61 - samples/sec: 2602.78 - lr: 0.100000
2023-04-20 23:27:57,619 epoch 16 - iter 77/114 - loss 0.39446657 - time (sec): 32.51

100%|██████████| 13/13 [00:05<00:00,  2.40it/s]

2023-04-20 23:28:19,170 Evaluating as a multi-label problem: False
2023-04-20 23:28:19,201 DEV : loss 0.39348238706588745 - f1-score (micro avg)  0.2764
2023-04-20 23:28:19,312 BAD EPOCHS (no improvement): 0
2023-04-20 23:28:19,321 saving best model
2023-04-20 23:28:21,570 ----------------------------------------------------------------------------------------------------
2023-04-20 23:28:25,203 epoch 17 - iter 11/114 - loss 0.37847569 - time (sec): 3.60 - samples/sec: 3452.12 - lr: 0.100000
2023-04-20 23:28:30,298 epoch 17 - iter 22/114 - loss 0.37707333 - time (sec): 8.70 - samples/sec: 2868.14 - lr: 0.100000
2023-04-20 23:28:36,624 epoch 17 - iter 33/114 - loss 0.37298555 - time (sec): 15.02 - samples/sec: 2496.34 - lr: 0.100000
2023-04-20 23:28:40,002 epoch 17 - iter 44/114 - loss 0.37365979 - time (sec): 18.40 - samples/sec: 2677.15 - lr: 0.100000
2023-04-20 23:28:44,666 epoch 17 - iter 55/114 - loss 0.38420245 - time (sec): 23.07 - samples/sec: 2663.52 - lr: 0.100000
2023-04-20 23:28:50,000 epoch 17 - iter 66/114 - loss 0.38773152 - time (sec): 28.40 - samples/sec: 2597.84 - lr: 0.100000
2023-04-20 23:28:55,076 epoch 17 - iter 77/114 - loss 0.38326645 - time (sec): 33.48 - samples/sec: 2595.71 - lr: 0.100000
2023-04-20 23:28

100%|██████████| 13/13 [00:03<00:00,  3.38it/s]

2023-04-20 23:29:13,381 Evaluating as a multi-label problem: False
2023-04-20 23:29:13,401 DEV : loss 0.3778543770313263 - f1-score (micro avg)  0.267
2023-04-20 23:29:13,467 BAD EPOCHS (no improvement): 1
2023-04-20 23:29:13,472 ----------------------------------------------------------------------------------------------------
2023-04-20 23:29:16,826 epoch 18 - iter 11/114 - loss 0.38388559 - time (sec): 3.35 - samples/sec: 3637.37 - lr: 0.100000
2023-04-20 23:29:21,351 epoch 18 - iter 22/114 - loss 0.40071615 - time (sec): 7.87 - samples/sec: 3098.40 - lr: 0.100000
2023-04-20 23:29:25,731 epoch 18 - iter 33/114 - loss 0.39847178 - time (sec): 12.25 - samples/sec: 2987.78 - lr: 0.100000
2023-04-20 23:29:29,942 epoch 18 - iter 44/114 - loss 0.39677382 - time (sec): 16.47 - samples/sec: 2954.93 - lr: 0.100000
2023-04-20 23:29:35,099 epoch 18 - iter 55/114 - loss 0.39165630 - time (sec): 21.62 - samples/sec: 2838.49 - lr: 0.100000
2023-04-20 23:29:39,737 epoch 18 - iter 66/114 - loss 0.38557465 - time (sec): 26.26 - samples/sec: 2801.85 - lr: 0.100000
2023-04-20 23:29:43,448 epoch 18 - iter 77/114 - loss 0.38500208 - time (sec): 29.97

100%|██████████| 13/13 [00:06<00:00,  2.03it/s]

2023-04-20 23:30:07,010 Evaluating as a multi-label problem: False
2023-04-20 23:30:07,038 DEV : loss 0.37206709384918213 - f1-score (micro avg)  0.2491
2023-04-20 23:30:07,141 BAD EPOCHS (no improvement): 2
2023-04-20 23:30:07,147 ----------------------------------------------------------------------------------------------------
2023-04-20 23:30:11,013 epoch 19 - iter 11/114 - loss 0.38255257 - time (sec): 3.86 - samples/sec: 3298.65 - lr: 0.100000
2023-04-20 23:30:15,984 epoch 19 - iter 22/114 - loss 0.38393884 - time (sec): 8.83 - samples/sec: 2872.44 - lr: 0.100000
2023-04-20 23:30:20,764 epoch 19 - iter 33/114 - loss 0.37569660 - time (sec): 13.61 - samples/sec: 2750.34 - lr: 0.100000
2023-04-20 23:30:25,145 epoch 19 - iter 44/114 - loss 0.37375541 - time (sec): 17.99 - samples/sec: 2749.23 - lr: 0.100000
2023-04-20 23:30:29,515 epoch 19 - iter 55/114 - loss 0.37493413 - time (sec): 22.36 - samples/sec: 2762.91 - lr: 0.100000
2023-04-20 23:30:34,125 epoch 19 - iter 66/114 - loss 0.36959279 - time (sec): 26.97 - samples/sec: 2751.39 - lr: 0.100000
2023-04-20 23:30:39,052 epoch 19 - iter 77/114 - loss 0.37144075 - time (sec): 31.90 - samples/sec: 2720.82 - lr: 0.100000
2023-04-20 23:30:43,274 epoch 19 - iter 88/114 - loss 0.37379085 - time (sec): 36.12 - samples/sec: 2744.81 - lr: 0.100000
2023-04-20 23:30:4

100%|██████████| 13/13 [00:03<00:00,  3.37it/s]

2023-04-20 23:30:57,256 Evaluating as a multi-label problem: False
2023-04-20 23:30:57,275 DEV : loss 0.385672390460968 - f1-score (micro avg)  0.2462
2023-04-20 23:30:57,341 BAD EPOCHS (no improvement): 3
2023-04-20 23:30:57,347 ----------------------------------------------------------------------------------------------------
2023-04-20 23:31:00,791 epoch 20 - iter 11/114 - loss 0.39977209 - time (sec): 3.44 - samples/sec: 3630.48 - lr: 0.100000
2023-04-20 23:31:05,209 epoch 20 - iter 22/114 - loss 0.40089286 - time (sec): 7.86 - samples/sec: 3085.59 - lr: 0.100000
2023-04-20 23:31:10,001 epoch 20 - iter 33/114 - loss 0.39457256 - time (sec): 12.65 - samples/sec: 2877.87 - lr: 0.100000
2023-04-20 23:31:15,774 epoch 20 - iter 44/114 - loss 0.38828583 - time (sec): 18.42 - samples/sec: 2740.43 - lr: 0.100000
2023-04-20 23:31:20,202 epoch 20 - iter 55/114 - loss 0.38366564 - time (sec): 22.85 - samples/sec: 2737.99 - lr: 0.100000
2023-04-20 23:31:24,797 epoch 20 - iter 66/114 - loss 0.38164961 - time (sec): 27.45 - samples/sec: 2719.03 - lr: 0.100000
2023-04-20 23:31:29,472 epoch 20 - iter 77/114 - loss 0.37619083 - time (sec): 32.12

100%|██████████| 13/13 [00:03<00:00,  3.37it/s]

2023-04-20 23:31:48,064 Evaluating as a multi-label problem: False
2023-04-20 23:31:48,086 DEV : loss 0.3650381863117218 - f1-score (micro avg)  0.2749
2023-04-20 23:31:48,149 Epoch    20: reducing learning rate of group 0 to 5.0000e-02.
2023-04-20 23:31:48,150 BAD EPOCHS (no improvement): 4
2023-04-20 23:31:48,156 ----------------------------------------------------------------------------------------------------
2023-04-20 23:31:52,633 epoch 21 - iter 11/114 - loss 0.33550317 - time (sec): 4.47 - samples/sec: 2715.21 - lr: 0.050000
2023-04-20 23:31:56,695 epoch 21 - iter 22/114 - loss 0.34276980 - time (sec): 8.53 - samples/sec: 2846.08 - lr: 0.050000
2023-04-20 23:32:01,508 epoch 21 - iter 33/114 - loss 0.34862766 - time (sec): 13.35 - samples/sec: 2763.08 - lr: 0.050000
2023-04-20 23:32:06,882 epoch 21 - iter 44/114 - loss 0.35527086 - time (sec): 18.72 - samples/sec: 2669.52 - lr: 0.050000
2023-04-20 23:32:12,623 epoch 21 - iter 55/114 - loss 0.35453157 - time (sec): 24.46 - samples/sec: 2559.25 - lr: 0.050000
2023-04-20 23:32:16,761 epoch 21 - i

100%|██████████| 13/13 [00:04<00:00,  2.67it/s]

2023-04-20 23:32:41,930 Evaluating as a multi-label problem: False
2023-04-20 23:32:41,951 DEV : loss 0.3614034354686737 - f1-score (micro avg)  0.2887
2023-04-20 23:32:42,018 BAD EPOCHS (no improvement): 0
2023-04-20 23:32:42,022 saving best model
2023-04-20 23:32:43,958 ----------------------------------------------------------------------------------------------------
2023-04-20 23:32:48,993 epoch 22 - iter 11/114 - loss 0.36493654 - time (sec): 5.01 - samples/sec: 2621.69 - lr: 0.050000
2023-04-20 23:32:54,261 epoch 22 - iter 22/114 - loss 0.35100238 - time (sec): 10.28 - samples/sec: 2500.30 - lr: 0.050000
2023-04-20 23:32:58,756 epoch 22 - iter 33/114 - loss 0.34127564 - time (sec): 14.77 - samples/sec: 2575.32 - lr: 0.050000
2023-04-20 23:33:02,770 epoch 22 - iter 44/114 - loss 0.34503021 - time (sec): 18.78 - samples/sec: 2666.27 - lr: 0.050000
2023-04-20 23:33:06,931 epoch 22 - iter 55/114 - loss 0.34590416 - time (sec): 22.95 - samples/sec: 2711.97 - lr: 0.050000
2023-04-20 23:33:11,474 epoch 22 - iter 66/114 - loss 0.34262849 - time (sec): 27.49 - samples/sec: 2717.80 - lr: 0.050000
2023-04-20 23:33:15,856 epoch 22 - iter 7

100%|██████████| 13/13 [00:03<00:00,  3.35it/s]

2023-04-20 23:33:35,379 Evaluating as a multi-label problem: False
2023-04-20 23:33:35,399 DEV : loss 0.36018940806388855 - f1-score (micro avg)  0.2886
2023-04-20 23:33:35,465 BAD EPOCHS (no improvement): 1
2023-04-20 23:33:35,471 ----------------------------------------------------------------------------------------------------
2023-04-20 23:33:40,172 epoch 23 - iter 11/114 - loss 0.31595372 - time (sec): 4.70 - samples/sec: 2669.45 - lr: 0.050000
2023-04-20 23:33:44,808 epoch 23 - iter 22/114 - loss 0.32880604 - time (sec): 9.33 - samples/sec: 2689.66 - lr: 0.050000
2023-04-20 23:33:49,722 epoch 23 - iter 33/114 - loss 0.33859181 - time (sec): 14.25 - samples/sec: 2663.50 - lr: 0.050000
2023-04-20 23:33:54,530 epoch 23 - iter 44/114 - loss 0.34689874 - time (sec): 19.05 - samples/sec: 2636.60 - lr: 0.050000
2023-04-20 23:33:58,525 epoch 23 - iter 55/114 - loss 0.34548517 - time (sec): 23.05 - samples/sec: 2699.53 - lr: 0.050000
2023-04-20 23:34:02,620 epoch 23 - iter 66/114 - loss 0.34334534 - time (sec): 27.15 - samples/sec: 2738.37 - lr: 0.050000
2023-04-20 23:34:06,721 epoch 23 - iter 77/114 - loss 0.34362996 - time (sec): 31.25

100%|██████████| 13/13 [00:06<00:00,  2.12it/s]

2023-04-20 23:34:29,109 Evaluating as a multi-label problem: False
2023-04-20 23:34:29,132 DEV : loss 0.3532733619213104 - f1-score (micro avg)  0.27
2023-04-20 23:34:29,202 BAD EPOCHS (no improvement): 2
2023-04-20 23:34:29,208 ----------------------------------------------------------------------------------------------------
2023-04-20 23:34:32,406 epoch 24 - iter 11/114 - loss 0.36159612 - time (sec): 3.20 - samples/sec: 3772.01 - lr: 0.050000
2023-04-20 23:34:36,801 epoch 24 - iter 22/114 - loss 0.34915697 - time (sec): 7.59 - samples/sec: 3228.44 - lr: 0.050000
2023-04-20 23:34:42,374 epoch 24 - iter 33/114 - loss 0.33798448 - time (sec): 13.16 - samples/sec: 2839.48 - lr: 0.050000
2023-04-20 23:34:46,646 epoch 24 - iter 44/114 - loss 0.33370599 - time (sec): 17.44 - samples/sec: 2856.08 - lr: 0.050000
2023-04-20 23:34:52,140 epoch 24 - iter 55/114 - loss 0.34356326 - time (sec): 22.93 - samples/sec: 2738.53 - lr: 0.050000
2023-04-20 23:34:56,574 epoch 24 - iter 66/114 - loss 0.34455390 - time (sec): 27.36 - samples/sec: 2709.94 - lr: 0.050000
2023-04-20 23:35:00,739 epoch 24 - iter 77/114 - loss 0.34501916 - time (sec): 31.53

100%|██████████| 13/13 [00:03<00:00,  3.29it/s]

2023-04-20 23:35:20,912 Evaluating as a multi-label problem: False
2023-04-20 23:35:20,932 DEV : loss 0.35446885228157043 - f1-score (micro avg)  0.2937
2023-04-20 23:35:20,998 BAD EPOCHS (no improvement): 0
2023-04-20 23:35:21,003 saving best model
2023-04-20 23:35:23,185 ----------------------------------------------------------------------------------------------------
2023-04-20 23:35:27,114 epoch 25 - iter 11/114 - loss 0.32417230 - time (sec): 3.90 - samples/sec: 3000.52 - lr: 0.050000
2023-04-20 23:35:31,926 epoch 25 - iter 22/114 - loss 0.33493012 - time (sec): 8.71 - samples/sec: 2722.81 - lr: 0.050000
2023-04-20 23:35:36,783 epoch 25 - iter 33/114 - loss 0.34746729 - time (sec): 13.57 - samples/sec: 2719.36 - lr: 0.050000
2023-04-20 23:35:41,161 epoch 25 - iter 44/114 - loss 0.34713264 - time (sec): 17.95 - samples/sec: 2719.09 - lr: 0.050000
2023-04-20 23:35:46,015 epoch 25 - iter 55/114 - loss 0.34404871 - time (sec): 22.80 - samples/sec: 2684.53 - lr: 0.050000
2023-04-20 23:35:50,402 epoch 25 - iter 66/114 - loss 0.34123536 - time (sec): 27.19 - samples/sec: 2721.07 - lr: 0.050000
2023-04-20 23:35:54,071 epoch 25 - iter 77

100%|██████████| 13/13 [00:04<00:00,  2.76it/s]

2023-04-20 23:36:16,220 Evaluating as a multi-label problem: False
2023-04-20 23:36:16,242 DEV : loss 0.3500347137451172 - f1-score (micro avg)  0.2947
2023-04-20 23:36:16,313 BAD EPOCHS (no improvement): 0
2023-04-20 23:36:16,317 saving best model
2023-04-20 23:36:18,264 ----------------------------------------------------------------------------------------------------
2023-04-20 23:36:21,309 epoch 26 - iter 11/114 - loss 0.35828949 - time (sec): 3.01 - samples/sec: 3926.94 - lr: 0.050000
2023-04-20 23:36:28,704 epoch 26 - iter 22/114 - loss 0.33541846 - time (sec): 10.41 - samples/sec: 2367.13 - lr: 0.050000
2023-04-20 23:36:33,558 epoch 26 - iter 33/114 - loss 0.33384176 - time (sec): 15.26 - samples/sec: 2422.02 - lr: 0.050000
2023-04-20 23:36:38,084 epoch 26 - iter 44/114 - loss 0.33651843 - time (sec): 19.79 - samples/sec: 2517.64 - lr: 0.050000
2023-04-20 23:36:42,975 epoch 26 - iter 55/114 - loss 0.33663143 - time (sec): 24.68 - samples/sec: 2522.50 - lr: 0.050000
2023-04-20 23:36:47,631 epoch 26 - iter 66/114 - loss 0.33566537 - time (sec): 29.33 - samples/sec: 2554.84 - lr: 0.050000
2023-04-20 23:36:52,474 epoch 26 - iter 7

100%|██████████| 13/13 [00:04<00:00,  2.78it/s]

2023-04-20 23:37:12,224 Evaluating as a multi-label problem: False
2023-04-20 23:37:12,253 DEV : loss 0.3557974100112915 - f1-score (micro avg)  0.2985
2023-04-20 23:37:12,373 BAD EPOCHS (no improvement): 0
2023-04-20 23:37:12,380 saving best model
2023-04-20 23:37:14,940 ----------------------------------------------------------------------------------------------------
2023-04-20 23:37:18,732 epoch 27 - iter 11/114 - loss 0.34041347 - time (sec): 3.76 - samples/sec: 3310.47 - lr: 0.050000
2023-04-20 23:37:23,347 epoch 27 - iter 22/114 - loss 0.33676042 - time (sec): 8.38 - samples/sec: 2980.33 - lr: 0.050000
2023-04-20 23:37:28,358 epoch 27 - iter 33/114 - loss 0.32628681 - time (sec): 13.39 - samples/sec: 2774.96 - lr: 0.050000
2023-04-20 23:37:32,698 epoch 27 - iter 44/114 - loss 0.32727415 - time (sec): 17.73 - samples/sec: 2754.84 - lr: 0.050000
2023-04-20 23:37:37,163 epoch 27 - iter 55/114 - loss 0.33227436 - time (sec): 22.19 - samples/sec: 2738.28 - lr: 0.050000
2023-04-20 23:37:42,381 epoch 27 - iter 66/114 - loss 0.33297190 - time (sec): 27.41 - samples/sec: 2679.06 - lr: 0.050000
2023-04-20 23:37:47,292 epoch 27 - iter 77/114 - loss 0.33396624 - time (sec): 32.32 - samples/sec: 2671.09 - lr: 0.050000
2023-04-20 23:37

100%|██████████| 13/13 [00:03<00:00,  3.34it/s]

2023-04-20 23:38:07,020 Evaluating as a multi-label problem: False
2023-04-20 23:38:07,040 DEV : loss 0.34997889399528503 - f1-score (micro avg)  0.29
2023-04-20 23:38:07,111 BAD EPOCHS (no improvement): 1
2023-04-20 23:38:07,116 ----------------------------------------------------------------------------------------------------
2023-04-20 23:38:10,379 epoch 28 - iter 11/114 - loss 0.33354535 - time (sec): 3.26 - samples/sec: 3736.56 - lr: 0.050000
2023-04-20 23:38:15,701 epoch 28 - iter 22/114 - loss 0.33005572 - time (sec): 8.58 - samples/sec: 2929.18 - lr: 0.050000
2023-04-20 23:38:19,894 epoch 28 - iter 33/114 - loss 0.32633697 - time (sec): 12.77 - samples/sec: 2953.44 - lr: 0.050000
2023-04-20 23:38:24,617 epoch 28 - iter 44/114 - loss 0.32644667 - time (sec): 17.50 - samples/sec: 2865.37 - lr: 0.050000
2023-04-20 23:38:29,313 epoch 28 - iter 55/114 - loss 0.32835760 - time (sec): 22.19 - samples/sec: 2798.22 - lr: 0.050000
2023-04-20 23:38:34,526 epoch 28 - iter 66/114 - loss 0.32935021 - time (sec): 27.40 - samples/sec: 2706.22 - lr: 0.050000
2023-04-20 23:38:38,558 epoch 28 - iter 77/114 - loss 0.32948891 - time (sec): 31.44

100%|██████████| 13/13 [00:05<00:00,  2.30it/s]

2023-04-20 23:38:59,487 Evaluating as a multi-label problem: False
2023-04-20 23:38:59,515 DEV : loss 0.34856942296028137 - f1-score (micro avg)  0.2948
2023-04-20 23:38:59,633 BAD EPOCHS (no improvement): 2
2023-04-20 23:38:59,640 ----------------------------------------------------------------------------------------------------
2023-04-20 23:39:03,570 epoch 29 - iter 11/114 - loss 0.34513080 - time (sec): 3.93 - samples/sec: 3035.55 - lr: 0.050000
2023-04-20 23:39:08,111 epoch 29 - iter 22/114 - loss 0.35061807 - time (sec): 8.47 - samples/sec: 2883.24 - lr: 0.050000
2023-04-20 23:39:11,809 epoch 29 - iter 33/114 - loss 0.33857043 - time (sec): 12.17 - samples/sec: 2959.15 - lr: 0.050000
2023-04-20 23:39:17,448 epoch 29 - iter 44/114 - loss 0.33909390 - time (sec): 17.81 - samples/sec: 2735.18 - lr: 0.050000
2023-04-20 23:39:21,578 epoch 29 - iter 55/114 - loss 0.33321343 - time (sec): 21.94 - samples/sec: 2792.17 - lr: 0.050000
2023-04-20 23:39:26,401 epoch 29 - iter 66/114 - loss 0.33978917 - time (sec): 26.76 - samples/sec: 2760.30 - lr: 0.050000
2023-04-20 23:39:31,483 epoch 29 - iter 77/114 - loss 0.33587099 - time (sec): 31.84 - samples/sec: 2704.38 - lr: 0.050000
2023-04-20 23:39:36,363 epoch 29 - iter 88/114 - loss 0.33366011 - time (sec): 36.72 - samples/sec: 2692.05 - lr: 0.050000
2023-04-20 23:39:4

100%|██████████| 13/13 [00:04<00:00,  3.05it/s]

2023-04-20 23:39:51,548 Evaluating as a multi-label problem: False
2023-04-20 23:39:51,568 DEV : loss 0.3459908068180084 - f1-score (micro avg)  0.2702
2023-04-20 23:39:51,634 BAD EPOCHS (no improvement): 3
2023-04-20 23:39:51,639 ----------------------------------------------------------------------------------------------------
2023-04-20 23:39:54,812 epoch 30 - iter 11/114 - loss 0.33000280 - time (sec): 3.17 - samples/sec: 3744.10 - lr: 0.050000
2023-04-20 23:39:58,847 epoch 30 - iter 22/114 - loss 0.33570703 - time (sec): 7.20 - samples/sec: 3369.02 - lr: 0.050000
2023-04-20 23:40:03,844 epoch 30 - iter 33/114 - loss 0.32957002 - time (sec): 12.20 - samples/sec: 3036.12 - lr: 0.050000
2023-04-20 23:40:08,555 epoch 30 - iter 44/114 - loss 0.33107319 - time (sec): 16.91 - samples/sec: 2950.83 - lr: 0.050000
2023-04-20 23:40:13,141 epoch 30 - iter 55/114 - loss 0.33035692 - time (sec): 21.50 - samples/sec: 2907.26 - lr: 0.050000
2023-04-20 23:40:18,315 epoch 30 - iter 66/114 - loss 0.33187404 - time (sec): 26.67 - samples/sec: 2800.42 - lr: 0.050000
2023-04-20 23:40:22,660 epoch 30 - iter 77/114 - loss 0.33145072 - time (sec): 31.02

100%|██████████| 13/13 [00:03<00:00,  3.29it/s]

2023-04-20 23:40:42,402 Evaluating as a multi-label problem: False
2023-04-20 23:40:42,429 DEV : loss 0.3457631468772888 - f1-score (micro avg)  0.2924
2023-04-20 23:40:42,497 Epoch    30: reducing learning rate of group 0 to 2.5000e-02.
2023-04-20 23:40:42,500 BAD EPOCHS (no improvement): 4
2023-04-20 23:40:42,505 ----------------------------------------------------------------------------------------------------
2023-04-20 23:40:48,288 epoch 31 - iter 11/114 - loss 0.35045857 - time (sec): 5.78 - samples/sec: 2140.61 - lr: 0.025000
2023-04-20 23:40:53,024 epoch 31 - iter 22/114 - loss 0.34414505 - time (sec): 10.52 - samples/sec: 2371.95 - lr: 0.025000
2023-04-20 23:40:57,752 epoch 31 - iter 33/114 - loss 0.33734244 - time (sec): 15.24 - samples/sec: 2438.44 - lr: 0.025000
2023-04-20 23:41:02,771 epoch 31 - iter 44/114 - loss 0.33199957 - time (sec): 20.26 - samples/sec: 2464.58 - lr: 0.025000
2023-04-20 23:41:07,219 epoch 31 - iter 55/114 - loss 0.33293411 - time (sec): 24.71 - samples/sec: 2518.47 - lr: 0.025000
2023-04-20 23:41:11,590 epoch 31 - 

100%|██████████| 13/13 [00:05<00:00,  2.39it/s]

2023-04-20 23:41:36,443 Evaluating as a multi-label problem: False
2023-04-20 23:41:36,463 DEV : loss 0.3413424491882324 - f1-score (micro avg)  0.3013
2023-04-20 23:41:36,534 BAD EPOCHS (no improvement): 0
2023-04-20 23:41:36,539 saving best model
2023-04-20 23:41:38,413 ----------------------------------------------------------------------------------------------------
2023-04-20 23:41:41,900 epoch 32 - iter 11/114 - loss 0.30874579 - time (sec): 3.47 - samples/sec: 3526.02 - lr: 0.025000
2023-04-20 23:41:46,809 epoch 32 - iter 22/114 - loss 0.31373341 - time (sec): 8.38 - samples/sec: 2904.62 - lr: 0.025000
2023-04-20 23:41:52,225 epoch 32 - iter 33/114 - loss 0.31654110 - time (sec): 13.79 - samples/sec: 2653.07 - lr: 0.025000
2023-04-20 23:41:56,276 epoch 32 - iter 44/114 - loss 0.31416389 - time (sec): 17.85 - samples/sec: 2742.54 - lr: 0.025000
2023-04-20 23:42:00,474 epoch 32 - iter 55/114 - loss 0.31804466 - time (sec): 22.04 - samples/sec: 2781.74 - lr: 0.025000
2023-04-20 23:42:05,824 epoch 32 - iter 66/114 - loss 0.32099722 - time (sec): 27.39 - samples/sec: 2686.90 - lr: 0.025000
2023-04-20 23:42:10,794 epoch 32 - iter 77

100%|██████████| 13/13 [00:03<00:00,  3.38it/s]

2023-04-20 23:42:30,323 Evaluating as a multi-label problem: False
2023-04-20 23:42:30,343 DEV : loss 0.3403237760066986 - f1-score (micro avg)  0.2977
2023-04-20 23:42:30,410 BAD EPOCHS (no improvement): 1
2023-04-20 23:42:30,415 ----------------------------------------------------------------------------------------------------
2023-04-20 23:42:35,484 epoch 33 - iter 11/114 - loss 0.31283241 - time (sec): 5.06 - samples/sec: 2565.72 - lr: 0.025000
2023-04-20 23:42:39,620 epoch 33 - iter 22/114 - loss 0.31329126 - time (sec): 9.20 - samples/sec: 2691.55 - lr: 0.025000
2023-04-20 23:42:43,834 epoch 33 - iter 33/114 - loss 0.31983514 - time (sec): 13.41 - samples/sec: 2785.62 - lr: 0.025000
2023-04-20 23:42:48,603 epoch 33 - iter 44/114 - loss 0.32226407 - time (sec): 18.18 - samples/sec: 2734.72 - lr: 0.025000
2023-04-20 23:42:53,154 epoch 33 - iter 55/114 - loss 0.32282458 - time (sec): 22.73 - samples/sec: 2707.21 - lr: 0.025000
2023-04-20 23:42:57,663 epoch 33 - iter 66/114 - loss 0.32077854 - time (sec): 27.24 - samples/sec: 2704.19 - lr: 0.025000
2023-04-20 23:43:01,337 epoch 33 - iter 77/114 - loss 0.32171272 - time (sec): 30.92

100%|██████████| 13/13 [00:06<00:00,  1.98it/s]

2023-04-20 23:43:24,294 Evaluating as a multi-label problem: False
2023-04-20 23:43:24,314 DEV : loss 0.33933961391448975 - f1-score (micro avg)  0.3062
2023-04-20 23:43:24,388 BAD EPOCHS (no improvement): 0
2023-04-20 23:43:24,392 saving best model
2023-04-20 23:43:26,547 ----------------------------------------------------------------------------------------------------
2023-04-20 23:43:29,966 epoch 34 - iter 11/114 - loss 0.31339599 - time (sec): 3.41 - samples/sec: 3571.76 - lr: 0.025000
2023-04-20 23:43:34,855 epoch 34 - iter 22/114 - loss 0.31466620 - time (sec): 8.30 - samples/sec: 2973.87 - lr: 0.025000
2023-04-20 23:43:40,606 epoch 34 - iter 33/114 - loss 0.32509091 - time (sec): 14.05 - samples/sec: 2618.88 - lr: 0.025000
2023-04-20 23:43:45,438 epoch 34 - iter 44/114 - loss 0.31980707 - time (sec): 18.88 - samples/sec: 2614.87 - lr: 0.025000
2023-04-20 23:43:49,763 epoch 34 - iter 55/114 - loss 0.31890187 - time (sec): 23.21 - samples/sec: 2660.05 - lr: 0.025000
2023-04-20 23:43:54,938 epoch 34 - iter 66/114 - loss 0.31900049 - time (sec): 28.38 - samples/sec: 2615.58 - lr: 0.025000
2023-04-20 23:43:59,027 epoch 34 - iter 77

100%|██████████| 13/13 [00:03<00:00,  3.35it/s]

2023-04-20 23:44:19,091 Evaluating as a multi-label problem: False
2023-04-20 23:44:19,111 DEV : loss 0.3379875719547272 - f1-score (micro avg)  0.3051
2023-04-20 23:44:19,183 BAD EPOCHS (no improvement): 1
2023-04-20 23:44:19,189 ----------------------------------------------------------------------------------------------------
2023-04-20 23:44:23,629 epoch 35 - iter 11/114 - loss 0.31254742 - time (sec): 4.44 - samples/sec: 2803.28 - lr: 0.025000
2023-04-20 23:44:27,920 epoch 35 - iter 22/114 - loss 0.31618618 - time (sec): 8.73 - samples/sec: 2833.14 - lr: 0.025000
2023-04-20 23:44:31,790 epoch 35 - iter 33/114 - loss 0.31794200 - time (sec): 12.60 - samples/sec: 2894.72 - lr: 0.025000
2023-04-20 23:44:36,850 epoch 35 - iter 44/114 - loss 0.31875391 - time (sec): 17.66 - samples/sec: 2784.74 - lr: 0.025000
2023-04-20 23:44:41,137 epoch 35 - iter 55/114 - loss 0.31923182 - time (sec): 21.95 - samples/sec: 2790.70 - lr: 0.025000
2023-04-20 23:44:45,019 epoch 35 - iter 66/114 - loss 0.32180658 - time (sec): 25.83 - samples/sec: 2834.49 - lr: 0.025000
2023-04-20 23:44:48,909 epoch 35 - iter 77/114 - loss 0.32017698 - time (sec): 29.72

100%|██████████| 13/13 [00:05<00:00,  2.30it/s]

2023-04-20 23:45:11,347 Evaluating as a multi-label problem: False
2023-04-20 23:45:11,367 DEV : loss 0.3364706337451935 - f1-score (micro avg)  0.3001
2023-04-20 23:45:11,437 BAD EPOCHS (no improvement): 2
2023-04-20 23:45:11,442 ----------------------------------------------------------------------------------------------------
2023-04-20 23:45:16,829 epoch 36 - iter 11/114 - loss 0.30166891 - time (sec): 5.38 - samples/sec: 2366.78 - lr: 0.025000
2023-04-20 23:45:22,063 epoch 36 - iter 22/114 - loss 0.31650238 - time (sec): 10.62 - samples/sec: 2393.96 - lr: 0.025000
2023-04-20 23:45:26,867 epoch 36 - iter 33/114 - loss 0.32096102 - time (sec): 15.42 - samples/sec: 2469.45 - lr: 0.025000
2023-04-20 23:45:32,132 epoch 36 - iter 44/114 - loss 0.32205283 - time (sec): 20.68 - samples/sec: 2478.21 - lr: 0.025000
2023-04-20 23:45:35,952 epoch 36 - iter 55/114 - loss 0.31940227 - time (sec): 24.50 - samples/sec: 2582.72 - lr: 0.025000
2023-04-20 23:45:40,298 epoch 36 - iter 66/114 - loss 0.32073744 - time (sec): 28.85 - samples/sec: 2617.89 - lr: 0.025000
2023-04-20 23:45:44,918 epoch 36 - iter 77/114 - loss 0.32331401 - time (sec): 33.4

100%|██████████| 13/13 [00:03<00:00,  3.30it/s]

2023-04-20 23:46:03,770 Evaluating as a multi-label problem: False
2023-04-20 23:46:03,793 DEV : loss 0.33571454882621765 - f1-score (micro avg)  0.3092
2023-04-20 23:46:03,857 BAD EPOCHS (no improvement): 0
2023-04-20 23:46:03,862 saving best model
2023-04-20 23:46:05,824 ----------------------------------------------------------------------------------------------------
2023-04-20 23:46:10,459 epoch 37 - iter 11/114 - loss 0.32755454 - time (sec): 4.61 - samples/sec: 2746.24 - lr: 0.025000
2023-04-20 23:46:14,829 epoch 37 - iter 22/114 - loss 0.32621150 - time (sec): 8.98 - samples/sec: 2755.47 - lr: 0.025000
2023-04-20 23:46:19,517 epoch 37 - iter 33/114 - loss 0.32526168 - time (sec): 13.67 - samples/sec: 2685.66 - lr: 0.025000
2023-04-20 23:46:24,725 epoch 37 - iter 44/114 - loss 0.32690269 - time (sec): 18.88 - samples/sec: 2632.89 - lr: 0.025000
2023-04-20 23:46:29,543 epoch 37 - iter 55/114 - loss 0.33043457 - time (sec): 23.70 - samples/sec: 2616.07 - lr: 0.025000
2023-04-20 23:46:33,740 epoch 37 - iter 66/114 - loss 0.32862593 - time (sec): 

100%|██████████| 13/13 [00:04<00:00,  2.65it/s]

2023-04-20 23:46:59,695 Evaluating as a multi-label problem: False
2023-04-20 23:46:59,716 DEV : loss 0.3380512297153473 - f1-score (micro avg)  0.3004
2023-04-20 23:46:59,786 BAD EPOCHS (no improvement): 1
2023-04-20 23:46:59,791 ----------------------------------------------------------------------------------------------------
2023-04-20 23:47:03,134 epoch 38 - iter 11/114 - loss 0.30863932 - time (sec): 3.34 - samples/sec: 3694.48 - lr: 0.025000
2023-04-20 23:47:07,844 epoch 38 - iter 22/114 - loss 0.32256377 - time (sec): 8.05 - samples/sec: 3098.19 - lr: 0.025000
2023-04-20 23:47:12,597 epoch 38 - iter 33/114 - loss 0.31326756 - time (sec): 12.80 - samples/sec: 2888.71 - lr: 0.025000
2023-04-20 23:47:16,126 epoch 38 - iter 44/114 - loss 0.31125165 - time (sec): 16.33 - samples/sec: 2977.31 - lr: 0.025000
2023-04-20 23:47:20,682 epoch 38 - iter 55/114 - loss 0.31088481 - time (sec): 20.89 - samples/sec: 2904.69 - lr: 0.025000
2023-04-20 23:47:26,030 epoch 38 - iter 66/114 - loss 0.31431835 - time (sec): 26.24 - samples/sec: 2791.39 - lr: 0.025000
2023-04-20 23:47:30,677 epoch 38 - iter 77/114 - loss 0.31596549 - time (sec): 30.88

100%|██████████| 13/13 [00:04<00:00,  2.64it/s]

2023-04-20 23:47:51,253 Evaluating as a multi-label problem: False
2023-04-20 23:47:51,274 DEV : loss 0.33527660369873047 - f1-score (micro avg)  0.3
2023-04-20 23:47:51,337 BAD EPOCHS (no improvement): 2
2023-04-20 23:47:51,342 ----------------------------------------------------------------------------------------------------
2023-04-20 23:47:55,847 epoch 39 - iter 11/114 - loss 0.31013440 - time (sec): 4.49 - samples/sec: 2833.15 - lr: 0.025000
2023-04-20 23:48:00,247 epoch 39 - iter 22/114 - loss 0.31360560 - time (sec): 8.89 - samples/sec: 2809.57 - lr: 0.025000
2023-04-20 23:48:04,805 epoch 39 - iter 33/114 - loss 0.31866153 - time (sec): 13.45 - samples/sec: 2772.04 - lr: 0.025000
2023-04-20 23:48:08,910 epoch 39 - iter 44/114 - loss 0.32402335 - time (sec): 17.56 - samples/sec: 2786.28 - lr: 0.025000
2023-04-20 23:48:14,380 epoch 39 - iter 55/114 - loss 0.31850702 - time (sec): 23.03 - samples/sec: 2678.28 - lr: 0.025000
2023-04-20 23:48:18,851 epoch 39 - iter 66/114 - loss 0.3

100%|██████████| 13/13 [00:05<00:00,  2.31it/s]

2023-04-20 23:48:44,074 Evaluating as a multi-label problem: False
2023-04-20 23:48:44,103 DEV : loss 0.3357560634613037 - f1-score (micro avg)  0.3062
2023-04-20 23:48:44,223 BAD EPOCHS (no improvement): 3
2023-04-20 23:48:44,232 ----------------------------------------------------------------------------------------------------
2023-04-20 23:48:48,341 epoch 40 - iter 11/114 - loss 0.30801024 - time (sec): 4.11 - samples/sec: 2980.44 - lr: 0.025000
2023-04-20 23:48:52,395 epoch 40 - iter 22/114 - loss 0.31671924 - time (sec): 8.16 - samples/sec: 3013.37 - lr: 0.025000
2023-04-20 23:48:56,924 epoch 40 - iter 33/114 - loss 0.32002598 - time (sec): 12.69 - samples/sec: 2914.09 - lr: 0.025000
2023-04-20 23:49:01,565 epoch 40 - iter 44/114 - loss 0.31798779 - time (sec): 17.33 - samples/sec: 2848.50 - lr: 0.025000
2023-04-20 23:49:05,755 epoch 40 - iter 55/114 - loss 0.31812069 - time (sec): 21.52 - samples/sec: 2864.53 - lr: 0.025000
2023-04-20 23:49:10,358 epoch 40 - iter 66/114 - loss 0.31804493 - time (sec): 26.12 - samples/sec: 2829.32 - lr: 0.025000
2023-04-20 23:49:15,817 epoch 40 - iter 77/114 - loss 0.32075488 - time (sec): 31.58 - samples/sec: 2728.53 - lr: 0.025000
2023-04-20 23:49:20,349 epoch 40 - iter 88/114 - loss 0.31982936 - time (sec): 36.11 - samples/sec: 2745.11 - lr: 0.025000
2023-04-20 23:49:2

100%|██████████| 13/13 [00:03<00:00,  3.38it/s]

2023-04-20 23:49:35,150 Evaluating as a multi-label problem: False
2023-04-20 23:49:35,178 DEV : loss 0.33563533425331116 - f1-score (micro avg)  0.3071
2023-04-20 23:49:35,243 Epoch    40: reducing learning rate of group 0 to 1.2500e-02.
2023-04-20 23:49:35,246 BAD EPOCHS (no improvement): 4
2023-04-20 23:49:35,255 ----------------------------------------------------------------------------------------------------
2023-04-20 23:49:40,377 epoch 41 - iter 11/114 - loss 0.31741405 - time (sec): 5.12 - samples/sec: 2424.31 - lr: 0.012500
2023-04-20 23:49:45,634 epoch 41 - iter 22/114 - loss 0.31496367 - time (sec): 10.38 - samples/sec: 2367.23 - lr: 0.012500
2023-04-20 23:49:49,522 epoch 41 - iter 33/114 - loss 0.30530830 - time (sec): 14.26 - samples/sec: 2571.40 - lr: 0.012500
2023-04-20 23:49:54,092 epoch 41 - iter 44/114 - loss 0.30771358 - time (sec): 18.83 - samples/sec: 2624.81 - lr: 0.012500
2023-04-20 23:49:58,509 epoch 41 - iter 55/114 - loss 0.31258724 - time (sec): 23.25 - samples/sec: 2624.91 - lr: 0.012500
2023-04-20 23:50:03,514 epoch 41 -

100%|██████████| 13/13 [00:04<00:00,  3.03it/s]

2023-04-20 23:50:27,346 Evaluating as a multi-label problem: False
2023-04-20 23:50:27,375 DEV : loss 0.33339038491249084 - f1-score (micro avg)  0.3072
2023-04-20 23:50:27,490 BAD EPOCHS (no improvement): 1
2023-04-20 23:50:27,499 ----------------------------------------------------------------------------------------------------
2023-04-20 23:50:31,663 epoch 42 - iter 11/114 - loss 0.33099608 - time (sec): 4.16 - samples/sec: 2857.76 - lr: 0.012500
2023-04-20 23:50:35,977 epoch 42 - iter 22/114 - loss 0.31644243 - time (sec): 8.48 - samples/sec: 2848.33 - lr: 0.012500
2023-04-20 23:50:40,170 epoch 42 - iter 33/114 - loss 0.31260959 - time (sec): 12.67 - samples/sec: 2879.67 - lr: 0.012500
2023-04-20 23:50:45,075 epoch 42 - iter 44/114 - loss 0.31029466 - time (sec): 17.58 - samples/sec: 2790.95 - lr: 0.012500
2023-04-20 23:50:49,543 epoch 42 - iter 55/114 - loss 0.31176473 - time (sec): 22.04 - samples/sec: 2795.94 - lr: 0.012500
2023-04-20 23:50:53,526 epoch 42 - iter 66/114 - loss 0.31235642 - time (sec): 26.03 - samples/sec: 2821.28 - lr: 0.012500
2023-04-20 23:50:58,572 epoch 42 - iter 77/114 - loss 0.31261524 - time (sec): 31.07 - samples/sec: 2759.02 - lr: 0.012500
2023-04-20 23:51:03,878 epoch 42 - iter 88/114 - loss 0.31417005 - time (sec): 36.38 - samples/sec: 2707.62 - lr: 0.012500
2023-04-20 23:51:0

100%|██████████| 13/13 [00:04<00:00,  2.75it/s]

2023-04-20 23:51:19,971 Evaluating as a multi-label problem: False
2023-04-20 23:51:19,991 DEV : loss 0.3333870768547058 - f1-score (micro avg)  0.3073
2023-04-20 23:51:20,059 BAD EPOCHS (no improvement): 2
2023-04-20 23:51:20,064 ----------------------------------------------------------------------------------------------------
2023-04-20 23:51:23,786 epoch 43 - iter 11/114 - loss 0.29614189 - time (sec): 3.72 - samples/sec: 3341.36 - lr: 0.012500
2023-04-20 23:51:28,100 epoch 43 - iter 22/114 - loss 0.30509216 - time (sec): 8.03 - samples/sec: 3056.01 - lr: 0.012500
2023-04-20 23:51:33,161 epoch 43 - iter 33/114 - loss 0.30323558 - time (sec): 13.09 - samples/sec: 2834.95 - lr: 0.012500
2023-04-20 23:51:37,286 epoch 43 - iter 44/114 - loss 0.30590732 - time (sec): 17.22 - samples/sec: 2873.24 - lr: 0.012500
2023-04-20 23:51:40,702 epoch 43 - iter 55/114 - loss 0.31387180 - time (sec): 20.63 - samples/sec: 2972.17 - lr: 0.012500
2023-04-20 23:51:45,037 epoch 43 - iter 66/114 - loss 0.31737571 - time (sec): 24.97 - samples/sec: 2940.82 - lr: 0.012500
2023-04-20 23:51:50,736 epoch 43 - iter 77/114 - loss 0.32001632 - time (sec): 30.67

100%|██████████| 13/13 [00:04<00:00,  2.67it/s]

2023-04-20 23:52:11,492 Evaluating as a multi-label problem: False
2023-04-20 23:52:11,515 DEV : loss 0.3324538767337799 - f1-score (micro avg)  0.3067
2023-04-20 23:52:11,581 BAD EPOCHS (no improvement): 3
2023-04-20 23:52:11,585 ----------------------------------------------------------------------------------------------------
2023-04-20 23:52:16,252 epoch 44 - iter 11/114 - loss 0.31258341 - time (sec): 4.66 - samples/sec: 2653.27 - lr: 0.012500
2023-04-20 23:52:20,421 epoch 44 - iter 22/114 - loss 0.30034308 - time (sec): 8.83 - samples/sec: 2741.22 - lr: 0.012500
2023-04-20 23:52:24,894 epoch 44 - iter 33/114 - loss 0.31270623 - time (sec): 13.31 - samples/sec: 2776.02 - lr: 0.012500
2023-04-20 23:52:29,486 epoch 44 - iter 44/114 - loss 0.31370837 - time (sec): 17.90 - samples/sec: 2761.44 - lr: 0.012500
2023-04-20 23:52:35,286 epoch 44 - iter 55/114 - loss 0.31687565 - time (sec): 23.70 - samples/sec: 2616.54 - lr: 0.012500
2023-04-20 23:52:40,070 epoch 44 - iter 66/114 - loss 0.31706712 - time (sec): 28.48 - samples/sec: 2633.23 - lr: 0.012500

100%|██████████| 13/13 [00:05<00:00,  2.38it/s]

2023-04-20 23:53:05,101 Evaluating as a multi-label problem: False
2023-04-20 23:53:05,123 DEV : loss 0.33195674419403076 - f1-score (micro avg)  0.3037
2023-04-20 23:53:05,189 Epoch    44: reducing learning rate of group 0 to 6.2500e-03.
2023-04-20 23:53:05,190 BAD EPOCHS (no improvement): 4
2023-04-20 23:53:05,196 ----------------------------------------------------------------------------------------------------
2023-04-20 23:53:09,089 epoch 45 - iter 11/114 - loss 0.27912449 - time (sec): 3.89 - samples/sec: 3206.83 - lr: 0.006250
2023-04-20 23:53:13,239 epoch 45 - iter 22/114 - loss 0.30176010 - time (sec): 8.04 - samples/sec: 3032.39 - lr: 0.006250
2023-04-20 23:53:17,856 epoch 45 - iter 33/114 - loss 0.30693239 - time (sec): 12.66 - samples/sec: 2895.72 - lr: 0.006250
2023-04-20 23:53:22,024 epoch 45 - iter 44/114 - loss 0.31187535 - time (sec): 16.83 - samples/sec: 2911.42 - lr: 0.006250
2023-04-20 23:53:27,177 epoch 45 - iter 55/114 - loss 0.31170820 - time (sec): 21.98 - samples/sec: 2800.78 - lr: 0.006250
2023-04-20 23:53:32,038 epoch 45 - iter 66/114 - loss 0.31199257 - time (sec): 26.84 - samples/sec: 2739.60 - lr: 0.00625

100%|██████████| 13/13 [00:04<00:00,  2.72it/s]

2023-04-20 23:53:57,476 Evaluating as a multi-label problem: False
2023-04-20 23:53:57,498 DEV : loss 0.3318275213241577 - f1-score (micro avg)  0.3074
2023-04-20 23:53:57,562 BAD EPOCHS (no improvement): 1
2023-04-20 23:53:57,566 ----------------------------------------------------------------------------------------------------
2023-04-20 23:54:01,339 epoch 46 - iter 11/114 - loss 0.32004264 - time (sec): 3.77 - samples/sec: 3224.40 - lr: 0.006250
2023-04-20 23:54:05,999 epoch 46 - iter 22/114 - loss 0.31281400 - time (sec): 8.43 - samples/sec: 2873.80 - lr: 0.006250
2023-04-20 23:54:11,203 epoch 46 - iter 33/114 - loss 0.29836835 - time (sec): 13.64 - samples/sec: 2731.42 - lr: 0.006250
2023-04-20 23:54:15,752 epoch 46 - iter 44/114 - loss 0.30519839 - time (sec): 18.18 - samples/sec: 2725.96 - lr: 0.006250
2023-04-20 23:54:20,702 epoch 46 - iter 55/114 - loss 0.31124825 - time (sec): 23.13 - samples/sec: 2673.09 - lr: 0.006250
2023-04-20 23:54:25,439 epoch 46 - iter 66/114 - loss 0.30935142 - time (sec): 27.87 - samples/sec: 2659.91 - lr: 0.006250
2023-04-20 23:54:30,821 epoch 46 - iter 77/114 - loss 0.31150472 - time (sec): 33.25 - samples/sec: 2624.44 - lr: 0.006250
2023-04-20 23:54:35,613 epoch 46 - iter 88/114 - loss 0.31454315 - time (sec): 38.04 - samples/sec: 2612.80 - lr: 0.006250
2023-04-20 23:54:3

100%|██████████| 13/13 [00:05<00:00,  2.23it/s]

2023-04-20 23:54:50,980 Evaluating as a multi-label problem: False
2023-04-20 23:54:51,005 DEV : loss 0.331737756729126 - f1-score (micro avg)  0.3062
2023-04-20 23:54:51,071 BAD EPOCHS (no improvement): 2
2023-04-20 23:54:51,077 ----------------------------------------------------------------------------------------------------
2023-04-20 23:54:55,319 epoch 47 - iter 11/114 - loss 0.29421333 - time (sec): 4.24 - samples/sec: 2982.69 - lr: 0.006250
2023-04-20 23:55:00,452 epoch 47 - iter 22/114 - loss 0.30397809 - time (sec): 9.37 - samples/sec: 2757.92 - lr: 0.006250
2023-04-20 23:55:05,709 epoch 47 - iter 33/114 - loss 0.30696844 - time (sec): 14.63 - samples/sec: 2572.68 - lr: 0.006250
2023-04-20 23:55:09,868 epoch 47 - iter 44/114 - loss 0.31287289 - time (sec): 18.79 - samples/sec: 2655.05 - lr: 0.006250
2023-04-20 23:55:14,147 epoch 47 - iter 55/114 - loss 0.31129297 - time (sec): 23.07 - samples/sec: 2685.76 - lr: 0.006250
2023-04-20 23:55:18,630 epoch 47 - iter 66/114 - loss 0.30715277 - time (sec): 27.55 - samples/sec: 2694.63 - lr: 0.006250


100%|██████████| 13/13 [00:03<00:00,  3.34it/s]

2023-04-20 23:55:43,217 Evaluating as a multi-label problem: False
2023-04-20 23:55:43,239 DEV : loss 0.3312007188796997 - f1-score (micro avg)  0.3074
2023-04-20 23:55:43,311 BAD EPOCHS (no improvement): 3
2023-04-20 23:55:43,317 ----------------------------------------------------------------------------------------------------
2023-04-20 23:55:46,951 epoch 48 - iter 11/114 - loss 0.30568000 - time (sec): 3.63 - samples/sec: 3343.34 - lr: 0.006250
2023-04-20 23:55:51,940 epoch 48 - iter 22/114 - loss 0.30997265 - time (sec): 8.62 - samples/sec: 2746.91 - lr: 0.006250
2023-04-20 23:55:56,217 epoch 48 - iter 33/114 - loss 0.32130281 - time (sec): 12.90 - samples/sec: 2799.12 - lr: 0.006250
2023-04-20 23:56:00,267 epoch 48 - iter 44/114 - loss 0.32143386 - time (sec): 16.95 - samples/sec: 2866.39 - lr: 0.006250
2023-04-20 23:56:06,103 epoch 48 - iter 55/114 - loss 0.31856807 - time (sec): 22.78 - samples/sec: 2688.77 - lr: 0.006250
2023-04-20 23:56:10,096 epoch 48 - iter 66/114 - loss 0.31511132 - time (sec): 26.78 - samples/sec: 2759.03 - lr: 0.006250
2023-04-20 23:56:14,427 epoch 48 - iter 77/114 - loss 0.31025866 - time (sec): 31.11

100%|██████████| 13/13 [00:06<00:00,  1.94it/s]

2023-04-20 23:56:37,597 Evaluating as a multi-label problem: False
2023-04-20 23:56:37,630 DEV : loss 0.3308182954788208 - f1-score (micro avg)  0.3062
2023-04-20 23:56:37,742 Epoch    48: reducing learning rate of group 0 to 3.1250e-03.
2023-04-20 23:56:37,746 BAD EPOCHS (no improvement): 4
2023-04-20 23:56:37,753 ----------------------------------------------------------------------------------------------------
2023-04-20 23:56:41,609 epoch 49 - iter 11/114 - loss 0.29178160 - time (sec): 3.85 - samples/sec: 3213.39 - lr: 0.003125
2023-04-20 23:56:45,607 epoch 49 - iter 22/114 - loss 0.30162041 - time (sec): 7.85 - samples/sec: 3136.71 - lr: 0.003125
2023-04-20 23:56:50,217 epoch 49 - iter 33/114 - loss 0.31266581 - time (sec): 12.46 - samples/sec: 2940.68 - lr: 0.003125
2023-04-20 23:56:54,923 epoch 49 - iter 44/114 - loss 0.31547920 - time (sec): 17.17 - samples/sec: 2850.04 - lr: 0.003125
2023-04-20 23:56:58,739 epoch 49 - iter 55/114 - loss 0.31527288 - time (sec): 20.98 - samples/sec: 2897.68 - lr: 0.003125
2023-04-20 23:57:03,786 epoch 49 - iter 66/114 - loss 0.31465687 - time (sec): 26.03 - samples/sec: 2832.39 - lr: 0.003125
2023-04-20 23:57:09,186 epoch 49 - iter 77/114 - loss 0.31237204 - time (sec): 31.43 - samples/sec: 2742.41 - lr: 0.003125
2023-04-20 23:57:13,428 epoch 49 - iter 88/114 - loss 0.31383180 - time (sec): 35.67 - samples/sec: 2767.07 - lr: 0.003125
2023-04-20 23:57:1

100%|██████████| 13/13 [00:03<00:00,  3.29it/s]

2023-04-20 23:57:28,690 Evaluating as a multi-label problem: False
2023-04-20 23:57:28,711 DEV : loss 0.3307628929615021 - f1-score (micro avg)  0.3107
2023-04-20 23:57:28,779 BAD EPOCHS (no improvement): 0
2023-04-20 23:57:28,785 saving best model
2023-04-20 23:57:30,815 ----------------------------------------------------------------------------------------------------
2023-04-20 23:57:34,394 epoch 50 - iter 11/114 - loss 0.30579720 - time (sec): 3.58 - samples/sec: 3401.40 - lr: 0.003125
2023-04-20 23:57:39,680 epoch 50 - iter 22/114 - loss 0.30101394 - time (sec): 8.86 - samples/sec: 2743.31 - lr: 0.003125
2023-04-20 23:57:44,822 epoch 50 - iter 33/114 - loss 0.31023947 - time (sec): 14.01 - samples/sec: 2635.48 - lr: 0.003125
2023-04-20 23:57:49,078 epoch 50 - iter 44/114 - loss 0.31377232 - time (sec): 18.26 - samples/sec: 2710.55 - lr: 0.003125
2023-04-20 23:57:54,223 epoch 50 - iter 55/114 - loss 0.31565610 - time (sec): 23.41 - samples/sec: 2649.33 - lr: 0.003125
2023-04-20 23:57:58,812 epoch 50 - iter 66/114 - loss 0.31390509 - time (sec): 27.99 - samples/sec: 2664.28 - lr: 0.003125
2023-04-20 23:58:03,207 epoch 50 - iter 77

100%|██████████| 13/13 [00:06<00:00,  1.96it/s]

2023-04-20 23:58:25,411 Evaluating as a multi-label problem: False
2023-04-20 23:58:25,431 DEV : loss 0.33104801177978516 - f1-score (micro avg)  0.3091
2023-04-20 23:58:25,504 BAD EPOCHS (no improvement): 1
2023-04-20 23:58:25,511 ----------------------------------------------------------------------------------------------------
2023-04-20 23:58:29,162 epoch 51 - iter 11/114 - loss 0.31923972 - time (sec): 3.64 - samples/sec: 3344.88 - lr: 0.003125
2023-04-20 23:58:32,743 epoch 51 - iter 22/114 - loss 0.29589809 - time (sec): 7.22 - samples/sec: 3316.05 - lr: 0.003125
2023-04-20 23:58:37,396 epoch 51 - iter 33/114 - loss 0.29585723 - time (sec): 11.87 - samples/sec: 3018.61 - lr: 0.003125
2023-04-20 23:58:41,708 epoch 51 - iter 44/114 - loss 0.30909543 - time (sec): 16.19 - samples/sec: 2953.87 - lr: 0.003125
2023-04-20 23:58:46,867 epoch 51 - iter 55/114 - loss 0.30892720 - time (sec): 21.35 - samples/sec: 2859.35 - lr: 0.003125
2023-04-20 23:58:51,066 epoch 51 - iter 66/114 - loss 0.30643153 - time (sec): 25.54 - samples/sec: 2884.86 - lr: 0.003125
2023-04-20 23:58:55,708 epoch 51 - iter 77/114 - loss 0.31017198 - time (sec): 30.19

100%|██████████| 13/13 [00:03<00:00,  3.36it/s]


2023-04-20 23:59:16,095 Evaluating as a multi-label problem: False
2023-04-20 23:59:16,114 DEV : loss 0.330510675907135 - f1-score (micro avg)  0.3115
2023-04-20 23:59:16,183 BAD EPOCHS (no improvement): 0
2023-04-20 23:59:16,187 saving best model
2023-04-20 23:59:18,238 ----------------------------------------------------------------------------------------------------
2023-04-20 23:59:22,683 epoch 52 - iter 11/114 - loss 0.29813392 - time (sec): 4.44 - samples/sec: 2752.41 - lr: 0.003125
2023-04-20 23:59:28,414 epoch 52 - iter 22/114 - loss 0.31044593 - time (sec): 10.17 - samples/sec: 2394.42 - lr: 0.003125
2023-04-20 23:59:33,599 epoch 52 - iter 33/114 - loss 0.32093492 - time (sec): 15.36 - samples/sec: 2425.28 - lr: 0.003125
2023-04-20 23:59:37,536 epoch 52 - iter 44/114 - loss 0.32285350 - time (sec): 19.30 - samples/sec: 2565.28 - lr: 0.003125
2023-04-20 23:59:41,925 epoch 52 - iter 55/114 - loss 0.31475377 - time (sec): 23.68 - samples/sec: 2599.05 - lr: 0.003125
2023-04-20 23

100%|██████████| 13/13 [00:05<00:00,  2.22it/s]

2023-04-21 00:00:12,586 Evaluating as a multi-label problem: False
2023-04-21 00:00:12,611 DEV : loss 0.33113041520118713 - f1-score (micro avg)  0.3083
2023-04-21 00:00:12,674 BAD EPOCHS (no improvement): 1
2023-04-21 00:00:12,680 ----------------------------------------------------------------------------------------------------
2023-04-21 00:00:17,041 epoch 53 - iter 11/114 - loss 0.31530394 - time (sec): 4.36 - samples/sec: 2976.66 - lr: 0.003125
2023-04-21 00:00:21,684 epoch 53 - iter 22/114 - loss 0.31696210 - time (sec): 9.00 - samples/sec: 2838.69 - lr: 0.003125
2023-04-21 00:00:26,538 epoch 53 - iter 33/114 - loss 0.31307083 - time (sec): 13.85 - samples/sec: 2713.96 - lr: 0.003125
2023-04-21 00:00:31,072 epoch 53 - iter 44/114 - loss 0.30985512 - time (sec): 18.39 - samples/sec: 2722.57 - lr: 0.003125
2023-04-21 00:00:35,146 epoch 53 - iter 55/114 - loss 0.31298489 - time (sec): 22.46 - samples/sec: 2769.27 - lr: 0.003125
2023-04-21 00:00:39,743 epoch 53 - iter 66/114 - loss 0.30837557 - time (sec): 27.06 - samples/sec: 2764.67 - lr: 0.00312

100%|██████████| 13/13 [00:03<00:00,  3.29it/s]

2023-04-21 00:01:04,798 Evaluating as a multi-label problem: False
2023-04-21 00:01:04,823 DEV : loss 0.33041509985923767 - f1-score (micro avg)  0.3084
2023-04-21 00:01:04,895 BAD EPOCHS (no improvement): 2
2023-04-21 00:01:04,900 ----------------------------------------------------------------------------------------------------
2023-04-21 00:01:08,553 epoch 54 - iter 11/114 - loss 0.31964077 - time (sec): 3.65 - samples/sec: 3411.56 - lr: 0.003125
2023-04-21 00:01:14,134 epoch 54 - iter 22/114 - loss 0.33947442 - time (sec): 9.23 - samples/sec: 2705.17 - lr: 0.003125
2023-04-21 00:01:18,453 epoch 54 - iter 33/114 - loss 0.32876561 - time (sec): 13.55 - samples/sec: 2724.83 - lr: 0.003125
2023-04-21 00:01:22,468 epoch 54 - iter 44/114 - loss 0.32220291 - time (sec): 17.57 - samples/sec: 2800.82 - lr: 0.003125
2023-04-21 00:01:27,181 epoch 54 - iter 55/114 - loss 0.32048807 - time (sec): 22.28 - samples/sec: 2757.12 - lr: 0.003125
2023-04-21 00:01:31,927 epoch 54 - iter 66/114 - loss 0.31894281 - time (sec): 27.03 - samples/sec: 2723.34 - lr: 0.00312

100%|██████████| 13/13 [00:04<00:00,  2.62it/s]

2023-04-21 00:01:57,303 Evaluating as a multi-label problem: False
2023-04-21 00:01:57,332 DEV : loss 0.3305472731590271 - f1-score (micro avg)  0.3118
2023-04-21 00:01:57,446 BAD EPOCHS (no improvement): 0
2023-04-21 00:01:57,455 saving best model
2023-04-21 00:01:59,844 ----------------------------------------------------------------------------------------------------
2023-04-21 00:02:04,047 epoch 55 - iter 11/114 - loss 0.32930895 - time (sec): 4.20 - samples/sec: 2993.85 - lr: 0.003125
2023-04-21 00:02:08,429 epoch 55 - iter 22/114 - loss 0.32221539 - time (sec): 8.58 - samples/sec: 2910.93 - lr: 0.003125
2023-04-21 00:02:14,569 epoch 55 - iter 33/114 - loss 0.31016245 - time (sec): 14.72 - samples/sec: 2549.30 - lr: 0.003125
2023-04-21 00:02:19,169 epoch 55 - iter 44/114 - loss 0.31324308 - time (sec): 19.32 - samples/sec: 2589.49 - lr: 0.003125
2023-04-21 00:02:23,492 epoch 55 - iter 55/114 - loss 0.30916860 - time (sec): 23.65 - samples/sec: 2648.25 - lr: 0.003125
2023-04-21 00:02:27,492 epoch 55 - iter 66/114 - loss 0.31036889 - time (sec): 27.64 - samples/sec: 2705.16 - lr: 0.003125
2023-04-21 00:02:32,243 epoch 55 - iter 77/114 - loss 0.30937458 - time (sec): 32.40 - samples/sec: 2697.62 - lr: 0.003125
2023-04-21 00:02

100%|██████████| 13/13 [00:04<00:00,  2.72it/s]

2023-04-21 00:02:53,166 Evaluating as a multi-label problem: False
2023-04-21 00:02:53,203 DEV : loss 0.3301137685775757 - f1-score (micro avg)  0.3081
2023-04-21 00:02:53,285 BAD EPOCHS (no improvement): 1
2023-04-21 00:02:53,295 ----------------------------------------------------------------------------------------------------
2023-04-21 00:02:56,787 epoch 56 - iter 11/114 - loss 0.30581390 - time (sec): 3.49 - samples/sec: 3411.20 - lr: 0.003125
2023-04-21 00:03:01,935 epoch 56 - iter 22/114 - loss 0.31481231 - time (sec): 8.63 - samples/sec: 2892.33 - lr: 0.003125
2023-04-21 00:03:05,752 epoch 56 - iter 33/114 - loss 0.30841994 - time (sec): 12.45 - samples/sec: 2968.11 - lr: 0.003125
2023-04-21 00:03:09,393 epoch 56 - iter 44/114 - loss 0.31301956 - time (sec): 16.09 - samples/sec: 3062.12 - lr: 0.003125
2023-04-21 00:03:14,200 epoch 56 - iter 55/114 - loss 0.30794078 - time (sec): 20.90 - samples/sec: 2952.71 - lr: 0.003125
2023-04-21 00:03:18,789 epoch 56 - iter 66/114 - loss 0.30722193 - time (sec): 25.49 - samples/sec: 2892.41 - lr: 0.003125

100%|██████████| 13/13 [00:05<00:00,  2.44it/s]

2023-04-21 00:03:45,781 Evaluating as a multi-label problem: False
2023-04-21 00:03:45,813 DEV : loss 0.3299809992313385 - f1-score (micro avg)  0.3079
2023-04-21 00:03:45,925 BAD EPOCHS (no improvement): 2
2023-04-21 00:03:45,932 ----------------------------------------------------------------------------------------------------
2023-04-21 00:03:49,723 epoch 57 - iter 11/114 - loss 0.29284645 - time (sec): 3.79 - samples/sec: 3345.40 - lr: 0.003125
2023-04-21 00:03:53,477 epoch 57 - iter 22/114 - loss 0.31023880 - time (sec): 7.54 - samples/sec: 3207.97 - lr: 0.003125
2023-04-21 00:03:57,753 epoch 57 - iter 33/114 - loss 0.30524362 - time (sec): 11.82 - samples/sec: 3129.81 - lr: 0.003125
2023-04-21 00:04:03,924 epoch 57 - iter 44/114 - loss 0.30846457 - time (sec): 17.99 - samples/sec: 2769.38 - lr: 0.003125
2023-04-21 00:04:07,897 epoch 57 - iter 55/114 - loss 0.30362336 - time (sec): 21.96 - samples/sec: 2799.90 - lr: 0.003125
2023-04-21 00:04:12,142 epoch 57 - iter 66/114 - loss 0.30897319 - time (sec): 26.20 - samples/sec: 2821.58 - lr: 0.003125
2023-04-21 00:04:17,609 epoch 57 - iter 77/114 - loss 0.30650035 - time (sec): 31.67 - samples/sec: 2736.24 - lr: 0.003125
2023-04-21 00:04:22,410 epoch 57 - iter 88/114 - loss 0.30966181 - time (sec): 36.47 - samples/sec: 2731.23 - lr: 0.003125
2023-04-21 00:04:2

100%|██████████| 13/13 [00:03<00:00,  3.39it/s]

2023-04-21 00:04:36,953 Evaluating as a multi-label problem: False
2023-04-21 00:04:36,978 DEV : loss 0.330718994140625 - f1-score (micro avg)  0.306
2023-04-21 00:04:37,043 BAD EPOCHS (no improvement): 3
2023-04-21 00:04:37,052 ----------------------------------------------------------------------------------------------------
2023-04-21 00:04:40,870 epoch 58 - iter 11/114 - loss 0.29545291 - time (sec): 3.81 - samples/sec: 3347.57 - lr: 0.003125
2023-04-21 00:04:45,428 epoch 58 - iter 22/114 - loss 0.30055959 - time (sec): 8.37 - samples/sec: 2978.72 - lr: 0.003125
2023-04-21 00:04:50,751 epoch 58 - iter 33/114 - loss 0.30408147 - time (sec): 13.70 - samples/sec: 2724.29 - lr: 0.003125
2023-04-21 00:04:55,688 epoch 58 - iter 44/114 - loss 0.30699975 - time (sec): 18.63 - samples/sec: 2685.29 - lr: 0.003125
2023-04-21 00:04:59,755 epoch 58 - iter 55/114 - loss 0.30770310 - time (sec): 22.70 - samples/sec: 2710.91 - lr: 0.003125
2023-04-21 00:05:04,773 epoch 58 - iter 66/114 - loss 0.30969270 - time (sec): 27.72 - samples/sec: 2681.85 - lr: 0.003125
2

100%|██████████| 13/13 [00:04<00:00,  3.00it/s]

2023-04-21 00:05:29,933 Evaluating as a multi-label problem: False
2023-04-21 00:05:29,961 DEV : loss 0.33007436990737915 - f1-score (micro avg)  0.3109
2023-04-21 00:05:30,083 Epoch    58: reducing learning rate of group 0 to 1.5625e-03.
2023-04-21 00:05:30,087 BAD EPOCHS (no improvement): 4
2023-04-21 00:05:30,096 ----------------------------------------------------------------------------------------------------
2023-04-21 00:05:34,337 epoch 59 - iter 11/114 - loss 0.32454240 - time (sec): 4.24 - samples/sec: 2947.83 - lr: 0.001563
2023-04-21 00:05:38,264 epoch 59 - iter 22/114 - loss 0.30210493 - time (sec): 8.17 - samples/sec: 3017.74 - lr: 0.001563
2023-04-21 00:05:42,609 epoch 59 - iter 33/114 - loss 0.29955029 - time (sec): 12.51 - samples/sec: 2977.79 - lr: 0.001563
2023-04-21 00:05:46,825 epoch 59 - iter 44/114 - loss 0.30385125 - time (sec): 16.73 - samples/sec: 2940.44 - lr: 0.001563
2023-04-21 00:05:51,499 epoch 59 - iter 55/114 - loss 0.30951146 - time (sec): 21.40 - samples/sec: 2871.67 - lr: 0.001563
2023-04-21 00:05:56,729 epoch 59 - iter 66/114 - loss 0.31009125 - time (sec): 26.63 - samples/sec: 2770.07 - lr: 0.001563
2023-04-21 00:06:01,779 epoch 59 - iter 77/114 - loss 0.30970470 - time (sec): 31.68 - samples/sec: 2733.35 - lr: 0.001563
2023-04-21 00:06:07,119 epoch 59 - iter 88/114 - loss 0.30988101 - time (sec): 37.02 - samples/sec: 2684.72 - lr: 0.001563
2023-04-21 00:06:1

100%|██████████| 13/13 [00:04<00:00,  2.80it/s]


2023-04-21 00:06:22,758 Evaluating as a multi-label problem: False
2023-04-21 00:06:22,779 DEV : loss 0.3300893306732178 - f1-score (micro avg)  0.3088
2023-04-21 00:06:22,849 BAD EPOCHS (no improvement): 1
2023-04-21 00:06:22,856 ----------------------------------------------------------------------------------------------------
2023-04-21 00:06:27,124 epoch 60 - iter 11/114 - loss 0.29664058 - time (sec): 4.27 - samples/sec: 2923.99 - lr: 0.001563
2023-04-21 00:06:31,597 epoch 60 - iter 22/114 - loss 0.29524015 - time (sec): 8.74 - samples/sec: 2826.94 - lr: 0.001563
2023-04-21 00:06:36,089 epoch 60 - iter 33/114 - loss 0.30716897 - time (sec): 13.23 - samples/sec: 2764.83 - lr: 0.001563
2023-04-21 00:06:40,394 epoch 60 - iter 44/114 - loss 0.30365968 - time (sec): 17.54 - samples/sec: 2804.82 - lr: 0.001563
2023-04-21 00:06:44,691 epoch 60 - iter 55/114 - loss 0.30825879 - time (sec): 21.83 - samples/sec: 2821.46 - lr: 0.001563
2023-04-21 00:06:49,286 epoch 60 - iter 66/114 - loss 0

100%|██████████| 13/13 [00:04<00:00,  2.66it/s]

2023-04-21 00:07:14,948 Evaluating as a multi-label problem: False
2023-04-21 00:07:14,970 DEV : loss 0.3301243484020233 - f1-score (micro avg)  0.3075
2023-04-21 00:07:15,035 BAD EPOCHS (no improvement): 2
2023-04-21 00:07:15,048 ----------------------------------------------------------------------------------------------------
2023-04-21 00:07:19,164 epoch 61 - iter 11/114 - loss 0.32655949 - time (sec): 4.11 - samples/sec: 2984.57 - lr: 0.001563
2023-04-21 00:07:24,791 epoch 61 - iter 22/114 - loss 0.31557691 - time (sec): 9.74 - samples/sec: 2525.43 - lr: 0.001563
2023-04-21 00:07:30,013 epoch 61 - iter 33/114 - loss 0.31440105 - time (sec): 14.96 - samples/sec: 2520.51 - lr: 0.001563
2023-04-21 00:07:34,341 epoch 61 - iter 44/114 - loss 0.31572984 - time (sec): 19.29 - samples/sec: 2580.38 - lr: 0.001563
2023-04-21 00:07:38,855 epoch 61 - iter 55/114 - loss 0.31773261 - time (sec): 23.80 - samples/sec: 2590.92 - lr: 0.001563
2023-04-21 00:07:43,009 epoch 61 - iter 66/114 - loss 0.31804968 - time (sec): 27.95 - samples/sec: 2659.46 - lr: 0.001563
2023-04-21 00:07:48,088 epoch 61 - iter 77/114 - loss 0.31610954 - time (sec): 33.03 - samples/sec: 2639.45 - lr: 0.001563
2023-04-21 00:07:52,718 epoch 61 - iter 88/114 - loss 0.31286846 - time (sec): 37.66 - samples/sec: 2639.07 - lr: 0.001563
2023-04-21 00:07:5

100%|██████████| 13/13 [00:05<00:00,  2.45it/s]

2023-04-21 00:08:08,133 Evaluating as a multi-label problem: False
2023-04-21 00:08:08,154 DEV : loss 0.33031314611434937 - f1-score (micro avg)  0.3083
2023-04-21 00:08:08,224 BAD EPOCHS (no improvement): 3
2023-04-21 00:08:08,233 ----------------------------------------------------------------------------------------------------
2023-04-21 00:08:11,625 epoch 62 - iter 11/114 - loss 0.32744010 - time (sec): 3.39 - samples/sec: 3549.54 - lr: 0.001563
2023-04-21 00:08:15,289 epoch 62 - iter 22/114 - loss 0.32356040 - time (sec): 7.05 - samples/sec: 3399.09 - lr: 0.001563
2023-04-21 00:08:19,670 epoch 62 - iter 33/114 - loss 0.31243523 - time (sec): 11.43 - samples/sec: 3179.90 - lr: 0.001563
2023-04-21 00:08:24,240 epoch 62 - iter 44/114 - loss 0.31568073 - time (sec): 16.01 - samples/sec: 3028.47 - lr: 0.001563
2023-04-21 00:08:29,291 epoch 62 - iter 55/114 - loss 0.31818148 - time (sec): 21.06 - samples/sec: 2882.18 - lr: 0.001563
2023-04-21 00:08:34,031 epoch 62 - iter 66/114 - loss 0.31165314 - time (sec): 25.80 - samples/sec: 2840.29 - lr: 0.001563
2023-04-21 00:08:39,489 epoch 62 - iter 77/114 - loss 0.31242883 - time (sec): 31.25

100%|██████████| 13/13 [00:03<00:00,  3.34it/s]

2023-04-21 00:08:59,462 Evaluating as a multi-label problem: False
2023-04-21 00:08:59,494 DEV : loss 0.33039921522140503 - f1-score (micro avg)  0.3092
2023-04-21 00:08:59,564 Epoch    62: reducing learning rate of group 0 to 7.8125e-04.
2023-04-21 00:08:59,567 BAD EPOCHS (no improvement): 4
2023-04-21 00:08:59,576 ----------------------------------------------------------------------------------------------------
2023-04-21 00:09:03,291 epoch 63 - iter 11/114 - loss 0.28162012 - time (sec): 3.71 - samples/sec: 3292.76 - lr: 0.000781
2023-04-21 00:09:08,300 epoch 63 - iter 22/114 - loss 0.30721660 - time (sec): 8.72 - samples/sec: 2821.86 - lr: 0.000781
2023-04-21 00:09:12,507 epoch 63 - iter 33/114 - loss 0.30197182 - time (sec): 12.93 - samples/sec: 2853.84 - lr: 0.000781
2023-04-21 00:09:17,591 epoch 63 - iter 44/114 - loss 0.29800282 - time (sec): 18.01 - samples/sec: 2726.65 - lr: 0.000781
2023-04-21 00:09:22,701 epoch 63 - iter 55/114 - loss 0.30501735 - time (sec): 23.12 - samples/sec: 2662.49 - lr: 0.000781
2023-04-21 00:09:27,008 epoch 63 - 

100%|██████████| 13/13 [00:05<00:00,  2.23it/s]


2023-04-21 00:09:53,552 Evaluating as a multi-label problem: False
2023-04-21 00:09:53,576 DEV : loss 0.3301905691623688 - f1-score (micro avg)  0.3099
2023-04-21 00:09:53,640 BAD EPOCHS (no improvement): 1
2023-04-21 00:09:53,645 ----------------------------------------------------------------------------------------------------
2023-04-21 00:09:57,123 epoch 64 - iter 11/114 - loss 0.28264580 - time (sec): 3.48 - samples/sec: 3556.41 - lr: 0.000781
2023-04-21 00:10:01,638 epoch 64 - iter 22/114 - loss 0.29973847 - time (sec): 7.99 - samples/sec: 3100.73 - lr: 0.000781
2023-04-21 00:10:06,639 epoch 64 - iter 33/114 - loss 0.30212063 - time (sec): 12.99 - samples/sec: 2881.02 - lr: 0.000781
2023-04-21 00:10:10,965 epoch 64 - iter 44/114 - loss 0.30864323 - time (sec): 17.32 - samples/sec: 2865.24 - lr: 0.000781
2023-04-21 00:10:15,775 epoch 64 - iter 55/114 - loss 0.30932723 - time (sec): 22.13 - samples/sec: 2790.29 - lr: 0.000781
2023-04-21 00:10:19,767 epoch 64 - iter 66/114 - loss 0

100%|██████████| 13/13 [00:03<00:00,  3.36it/s]

2023-04-21 00:10:45,277 Evaluating as a multi-label problem: False
2023-04-21 00:10:45,302 DEV : loss 0.3299003541469574 - f1-score (micro avg)  0.3091
2023-04-21 00:10:45,375 BAD EPOCHS (no improvement): 2
2023-04-21 00:10:45,379 ----------------------------------------------------------------------------------------------------
2023-04-21 00:10:49,625 epoch 65 - iter 11/114 - loss 0.31818568 - time (sec): 4.24 - samples/sec: 2789.40 - lr: 0.000781
2023-04-21 00:10:55,035 epoch 65 - iter 22/114 - loss 0.31348162 - time (sec): 9.65 - samples/sec: 2561.49 - lr: 0.000781
2023-04-21 00:11:00,147 epoch 65 - iter 33/114 - loss 0.31958633 - time (sec): 14.76 - samples/sec: 2548.30 - lr: 0.000781
2023-04-21 00:11:04,488 epoch 65 - iter 44/114 - loss 0.31160611 - time (sec): 19.11 - samples/sec: 2625.00 - lr: 0.000781
2023-04-21 00:11:09,273 epoch 65 - iter 55/114 - loss 0.30994164 - time (sec): 23.89 - samples/sec: 2603.11 - lr: 0.000781
2023-04-21 00:11:13,938 epoch 65 - iter 66/114 - loss 0.31042270 - time (sec): 28.55 - samples/sec: 2615.15 - lr: 0.000781

100%|██████████| 13/13 [00:06<00:00,  2.13it/s]

2023-04-21 00:11:38,814 Evaluating as a multi-label problem: False
2023-04-21 00:11:38,843 DEV : loss 0.32997873425483704 - f1-score (micro avg)  0.3081
2023-04-21 00:11:38,962 BAD EPOCHS (no improvement): 3
2023-04-21 00:11:38,971 ----------------------------------------------------------------------------------------------------
2023-04-21 00:11:42,428 epoch 66 - iter 11/114 - loss 0.30895373 - time (sec): 3.46 - samples/sec: 3360.47 - lr: 0.000781
2023-04-21 00:11:47,270 epoch 66 - iter 22/114 - loss 0.30420337 - time (sec): 8.30 - samples/sec: 2947.69 - lr: 0.000781
2023-04-21 00:11:51,632 epoch 66 - iter 33/114 - loss 0.30295671 - time (sec): 12.66 - samples/sec: 2907.32 - lr: 0.000781
2023-04-21 00:11:55,935 epoch 66 - iter 44/114 - loss 0.30066385 - time (sec): 16.96 - samples/sec: 2849.93 - lr: 0.000781
2023-04-21 00:12:00,725 epoch 66 - iter 55/114 - loss 0.30799355 - time (sec): 21.75 - samples/sec: 2818.12 - lr: 0.000781
2023-04-21 00:12:05,938 epoch 66 - iter 66/114 - loss 0.30810483 - time (sec): 26.96 - samples/sec: 2759.66 - lr: 0.000781
2023-04-21 00:12:10,966 epoch 66 - iter 77/114 - loss 0.30556702 - time (sec): 31.99 - samples/sec: 2708.87 - lr: 0.000781
2023-04-21 00:12:15,039 epoch 66 - iter 88/114 - loss 0.30378177 - time (sec): 36.07 - samples/sec: 2747.68 - lr: 0.000781
2023-04-21 00:12:2

100%|██████████| 13/13 [00:03<00:00,  3.37it/s]

2023-04-21 00:12:29,991 Evaluating as a multi-label problem: False
2023-04-21 00:12:30,015 DEV : loss 0.32998013496398926 - f1-score (micro avg)  0.3097
2023-04-21 00:12:30,120 Epoch    66: reducing learning rate of group 0 to 3.9063e-04.
2023-04-21 00:12:30,126 BAD EPOCHS (no improvement): 4
2023-04-21 00:12:30,132 ----------------------------------------------------------------------------------------------------
2023-04-21 00:12:34,358 epoch 67 - iter 11/114 - loss 0.30012634 - time (sec): 4.22 - samples/sec: 3008.70 - lr: 0.000391
2023-04-21 00:12:38,721 epoch 67 - iter 22/114 - loss 0.30329314 - time (sec): 8.59 - samples/sec: 2824.44 - lr: 0.000391
2023-04-21 00:12:43,154 epoch 67 - iter 33/114 - loss 0.30591795 - time (sec): 13.02 - samples/sec: 2812.74 - lr: 0.000391
2023-04-21 00:12:47,188 epoch 67 - iter 44/114 - loss 0.30439039 - time (sec): 17.05 - samples/sec: 2868.59 - lr: 0.000391
2023-04-21 00:12:51,946 epoch 67 - iter 55/114 - loss 0.30698959 - time (sec): 21.81 - samples/sec: 2835.18 - lr: 0.000391
2023-04-21 00:12:57,343 epoch 67 - 

100%|██████████| 13/13 [00:03<00:00,  3.37it/s]

2023-04-21 00:13:21,138 Evaluating as a multi-label problem: False
2023-04-21 00:13:21,168 DEV : loss 0.32997235655784607 - f1-score (micro avg)  0.3095
2023-04-21 00:13:21,233 BAD EPOCHS (no improvement): 1
2023-04-21 00:13:21,240 ----------------------------------------------------------------------------------------------------
2023-04-21 00:13:26,477 epoch 68 - iter 11/114 - loss 0.28629709 - time (sec): 5.23 - samples/sec: 2339.42 - lr: 0.000391
2023-04-21 00:13:30,610 epoch 68 - iter 22/114 - loss 0.31107726 - time (sec): 9.36 - samples/sec: 2577.37 - lr: 0.000391
2023-04-21 00:13:35,034 epoch 68 - iter 33/114 - loss 0.31870036 - time (sec): 13.79 - samples/sec: 2692.64 - lr: 0.000391
2023-04-21 00:13:39,773 epoch 68 - iter 44/114 - loss 0.32104595 - time (sec): 18.53 - samples/sec: 2671.93 - lr: 0.000391
2023-04-21 00:13:44,638 epoch 68 - iter 55/114 - loss 0.32099315 - time (sec): 23.39 - samples/sec: 2633.54 - lr: 0.000391
2023-04-21 00:13:49,026 epoch 68 - iter 66/114 - loss 0.31842246 - time (sec): 27.78 - samples/sec: 2647.97 - lr: 0.00039

100%|██████████| 13/13 [00:05<00:00,  2.52it/s]

2023-04-21 00:14:14,979 Evaluating as a multi-label problem: False





2023-04-21 00:14:15,004 DEV : loss 0.3300705850124359 - f1-score (micro avg)  0.3097
2023-04-21 00:14:15,070 BAD EPOCHS (no improvement): 2
2023-04-21 00:14:15,076 ----------------------------------------------------------------------------------------------------
2023-04-21 00:14:19,027 epoch 69 - iter 11/114 - loss 0.32569372 - time (sec): 3.95 - samples/sec: 3050.22 - lr: 0.000391
2023-04-21 00:14:23,307 epoch 69 - iter 22/114 - loss 0.31764259 - time (sec): 8.23 - samples/sec: 2906.68 - lr: 0.000391
2023-04-21 00:14:28,921 epoch 69 - iter 33/114 - loss 0.31184960 - time (sec): 13.84 - samples/sec: 2650.94 - lr: 0.000391
2023-04-21 00:14:34,310 epoch 69 - iter 44/114 - loss 0.31401225 - time (sec): 19.23 - samples/sec: 2603.96 - lr: 0.000391
2023-04-21 00:14:38,185 epoch 69 - iter 55/114 - loss 0.31141003 - time (sec): 23.11 - samples/sec: 2692.96 - lr: 0.000391
2023-04-21 00:14:43,661 epoch 69 - iter 66/114 - loss 0.30982963 - time (sec): 28.58 - samples/sec: 2622.07 - lr: 0.000391

100%|██████████| 13/13 [00:03<00:00,  3.34it/s]

2023-04-21 00:15:06,911 Evaluating as a multi-label problem: False
2023-04-21 00:15:06,931 DEV : loss 0.32999199628829956 - f1-score (micro avg)  0.3097





2023-04-21 00:15:07,003 BAD EPOCHS (no improvement): 3
2023-04-21 00:15:07,008 ----------------------------------------------------------------------------------------------------
2023-04-21 00:15:11,354 epoch 70 - iter 11/114 - loss 0.30826409 - time (sec): 4.34 - samples/sec: 2918.88 - lr: 0.000391
2023-04-21 00:15:16,366 epoch 70 - iter 22/114 - loss 0.30848108 - time (sec): 9.35 - samples/sec: 2692.08 - lr: 0.000391
2023-04-21 00:15:20,767 epoch 70 - iter 33/114 - loss 0.31544903 - time (sec): 13.76 - samples/sec: 2699.10 - lr: 0.000391
2023-04-21 00:15:25,207 epoch 70 - iter 44/114 - loss 0.31576184 - time (sec): 18.20 - samples/sec: 2710.99 - lr: 0.000391
2023-04-21 00:15:29,448 epoch 70 - iter 55/114 - loss 0.31255636 - time (sec): 22.44 - samples/sec: 2726.52 - lr: 0.000391
2023-04-21 00:15:34,312 epoch 70 - iter 66/114 - loss 0.31192237 - time (sec): 27.30 - samples/sec: 2701.16 - lr: 0.000391
2023-04-21 00:15:39,036 epoch 70 - iter 77/114 - loss 0.31102915 - time (sec): 32.03

100%|██████████| 13/13 [00:06<00:00,  2.04it/s]

2023-04-21 00:16:01,480 Evaluating as a multi-label problem: False





2023-04-21 00:16:01,509 DEV : loss 0.330016165971756 - f1-score (micro avg)  0.3102
2023-04-21 00:16:01,573 Epoch    70: reducing learning rate of group 0 to 1.9531e-04.
2023-04-21 00:16:01,574 BAD EPOCHS (no improvement): 4
2023-04-21 00:16:01,582 ----------------------------------------------------------------------------------------------------
2023-04-21 00:16:04,867 epoch 71 - iter 11/114 - loss 0.29896310 - time (sec): 3.28 - samples/sec: 3781.77 - lr: 0.000195
2023-04-21 00:16:09,053 epoch 71 - iter 22/114 - loss 0.29844822 - time (sec): 7.47 - samples/sec: 3223.40 - lr: 0.000195
2023-04-21 00:16:14,597 epoch 71 - iter 33/114 - loss 0.30699607 - time (sec): 13.01 - samples/sec: 2822.70 - lr: 0.000195
2023-04-21 00:16:19,797 epoch 71 - iter 44/114 - loss 0.30455119 - time (sec): 18.21 - samples/sec: 2717.97 - lr: 0.000195
2023-04-21 00:16:24,061 epoch 71 - iter 55/114 - loss 0.30635774 - time (sec): 22.48 - samples/sec: 2755.85 - lr: 0.000195
2023-04-21 00:16:28,259 epoch 71 - it

100%|██████████| 13/13 [00:03<00:00,  3.28it/s]

2023-04-21 00:16:52,821 Evaluating as a multi-label problem: False
2023-04-21 00:16:52,842 DEV : loss 0.33000126481056213 - f1-score (micro avg)  0.3102





2023-04-21 00:16:52,910 BAD EPOCHS (no improvement): 1
2023-04-21 00:16:52,915 ----------------------------------------------------------------------------------------------------
2023-04-21 00:16:56,806 epoch 72 - iter 11/114 - loss 0.29998160 - time (sec): 3.89 - samples/sec: 3133.54 - lr: 0.000195
2023-04-21 00:17:01,617 epoch 72 - iter 22/114 - loss 0.30810177 - time (sec): 8.70 - samples/sec: 2807.04 - lr: 0.000195
2023-04-21 00:17:06,446 epoch 72 - iter 33/114 - loss 0.30288303 - time (sec): 13.53 - samples/sec: 2754.47 - lr: 0.000195
2023-04-21 00:17:10,950 epoch 72 - iter 44/114 - loss 0.30873930 - time (sec): 18.03 - samples/sec: 2762.50 - lr: 0.000195
2023-04-21 00:17:15,859 epoch 72 - iter 55/114 - loss 0.30846328 - time (sec): 22.94 - samples/sec: 2724.94 - lr: 0.000195
2023-04-21 00:17:20,387 epoch 72 - iter 66/114 - loss 0.30768216 - time (sec): 27.47 - samples/sec: 2727.18 - lr: 0.000195
2023-04-21 00:17:24,612 epoch 72 - iter 77/114 - loss 0.30899799 - time (sec): 31.70

100%|██████████| 13/13 [00:05<00:00,  2.38it/s]

2023-04-21 00:17:45,692 Evaluating as a multi-label problem: False
2023-04-21 00:17:45,724 DEV : loss 0.3299783170223236 - f1-score (micro avg)  0.3097
2023-04-21 00:17:45,836 BAD EPOCHS (no improvement): 2
2023-04-21 00:17:45,843 ----------------------------------------------------------------------------------------------------





2023-04-21 00:17:50,895 epoch 73 - iter 11/114 - loss 0.34261858 - time (sec): 5.05 - samples/sec: 2435.42 - lr: 0.000195
2023-04-21 00:17:56,393 epoch 73 - iter 22/114 - loss 0.32547579 - time (sec): 10.55 - samples/sec: 2379.93 - lr: 0.000195
2023-04-21 00:18:01,566 epoch 73 - iter 33/114 - loss 0.33217123 - time (sec): 15.72 - samples/sec: 2382.36 - lr: 0.000195
2023-04-21 00:18:05,468 epoch 73 - iter 44/114 - loss 0.32406878 - time (sec): 19.62 - samples/sec: 2537.03 - lr: 0.000195
2023-04-21 00:18:09,280 epoch 73 - iter 55/114 - loss 0.31648516 - time (sec): 23.44 - samples/sec: 2639.49 - lr: 0.000195
2023-04-21 00:18:13,923 epoch 73 - iter 66/114 - loss 0.31406913 - time (sec): 28.08 - samples/sec: 2648.26 - lr: 0.000195
2023-04-21 00:18:19,366 epoch 73 - iter 77/114 - loss 0.31172756 - time (sec): 33.52 - samples/sec: 2596.32 - lr: 0.000195
2023-04-21 00:18:23,850 epoch 73 - iter 88/114 - loss 0.31117668 - time (sec): 38.01 - samples/sec: 2602.65 - lr: 0.000195
2023-04-21 00:18:

100%|██████████| 13/13 [00:03<00:00,  3.35it/s]

2023-04-21 00:18:38,561 Evaluating as a multi-label problem: False





2023-04-21 00:18:38,584 DEV : loss 0.3299439549446106 - f1-score (micro avg)  0.3095
2023-04-21 00:18:38,651 BAD EPOCHS (no improvement): 3
2023-04-21 00:18:38,656 ----------------------------------------------------------------------------------------------------
2023-04-21 00:18:41,758 epoch 74 - iter 11/114 - loss 0.28243586 - time (sec): 3.10 - samples/sec: 3806.36 - lr: 0.000195
2023-04-21 00:18:46,373 epoch 74 - iter 22/114 - loss 0.29956792 - time (sec): 7.71 - samples/sec: 3079.77 - lr: 0.000195
2023-04-21 00:18:51,757 epoch 74 - iter 33/114 - loss 0.30795583 - time (sec): 13.10 - samples/sec: 2800.57 - lr: 0.000195
2023-04-21 00:18:56,554 epoch 74 - iter 44/114 - loss 0.31063776 - time (sec): 17.90 - samples/sec: 2755.71 - lr: 0.000195
2023-04-21 00:19:00,960 epoch 74 - iter 55/114 - loss 0.30672073 - time (sec): 22.30 - samples/sec: 2770.54 - lr: 0.000195
2023-04-21 00:19:05,626 epoch 74 - iter 66/114 - loss 0.30301300 - time (sec): 26.97 - samples/sec: 2740.15 - lr: 0.000195

100%|██████████| 13/13 [00:04<00:00,  2.89it/s]

2023-04-21 00:19:30,145 Evaluating as a multi-label problem: False
2023-04-21 00:19:30,178 DEV : loss 0.329958438873291 - f1-score (micro avg)  0.3095
2023-04-21 00:19:30,306 Epoch    74: reducing learning rate of group 0 to 9.7656e-05.
2023-04-21 00:19:30,307 BAD EPOCHS (no improvement): 4
2023-04-21 00:19:30,318 ----------------------------------------------------------------------------------------------------
2023-04-21 00:19:30,319 ----------------------------------------------------------------------------------------------------
2023-04-21 00:19:30,321 learning rate too small - quitting training!
2023-04-21 00:19:30,322 ----------------------------------------------------------------------------------------------------





2023-04-21 00:19:32,723 ----------------------------------------------------------------------------------------------------
2023-04-21 00:19:35,491 SequenceTagger predicts: Dictionary with 79 tags: O, S-LOCATION, B-LOCATION, E-LOCATION, I-LOCATION, S-ORGANIZATION, B-ORGANIZATION, E-ORGANIZATION, I-ORGANIZATION, S-DATE, B-DATE, E-DATE, I-DATE, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-NUMBER, B-NUMBER, E-NUMBER, I-NUMBER, S-ARTIFACT, B-ARTIFACT, E-ARTIFACT, I-ARTIFACT, S-LOC, B-LOC, E-LOC, I-LOC, S-OTHER, B-OTHER, E-OTHER, I-OTHER, S-DAT, B-DAT, E-DAT, I-DAT, S-EVENT, B-EVENT, E-EVENT, I-EVENT, S-ORG, B-ORG, E-ORG, I-ORG, S-PERCENT, B-PERCENT, E-PERCENT, I-PERCENT, S-PSN


100%|██████████| 14/14 [00:07<00:00,  1.93it/s]

2023-04-21 00:19:43,168 Evaluating as a multi-label problem: False
2023-04-21 00:19:43,190 0.5169	0.2257	0.3142	0.1988
2023-04-21 00:19:43,193 
Results:
- F-score (micro) 0.3142
- F-score (macro) 0.2015
- Accuracy 0.1988

By class:
              precision    recall  f1-score   support

    LOCATION     0.3134    0.0727    0.1180       289
        DATE     0.7586    0.7051    0.7309       156
      NUMBER     0.4135    0.5729    0.4803        96
ORGANIZATION     0.6111    0.1183    0.1982       186
      PERSON     0.1667    0.0968    0.1224        93
    ARTIFACT     0.6818    0.1705    0.2727        88
         DAT     0.4500    0.3000    0.3600        30
     PERCENT     0.7778    0.6087    0.6829        23
       OTHER     0.0000    0.0000    0.0000        41
         LOC     0.0000    0.0000    0.0000        37
       EVENT     0.5000    0.0294    0.0556        34
         ORG     0.7500    0.1034    0.1818        29
         PSN     0.0000    0.0000    0.0000        26
         AR




{'test_score': 0.31419939577039274,
 'dev_score_history': [0.006304176516942475,
  0.11226944667201282,
  0.16962220508866613,
  0.18269230769230768,
  0.21076573161485973,
  0.16555183946488294,
  0.22580645161290325,
  0.24250871080139372,
  0.21319018404907975,
  0.23054331864904548,
  0.20823620823620823,
  0.27114093959731544,
  0.26436781609195403,
  0.2540381791483113,
  0.24426350851221315,
  0.2764331210191083,
  0.26695526695526695,
  0.24906226556639158,
  0.24624624624624622,
  0.274932614555256,
  0.2886740331491713,
  0.28857715430861725,
  0.27002967359050445,
  0.29366736256089077,
  0.29473684210526313,
  0.2985468956406869,
  0.2899786780383795,
  0.29482071713147406,
  0.2701908957415565,
  0.2923816060398078,
  0.30127298444130124,
  0.29766123316796594,
  0.30617977528089885,
  0.30506058446186735,
  0.30006882312456984,
  0.3091525423728813,
  0.3003508771929824,
  0.3,
  0.3062200956937799,
  0.3071479122434536,
  0.3071528751753156,
  0.3072625698324022,
  0.306

## Examples of use

We define a helper function that loads a Flair tagger and runs it on an example sentence for each language. For Spanish and German, a tagger is selected by name through `tagger_name`; for Basque and Japanese we load the model saved at `model_path` instead, since we could not find any embeddings for these languages.

In [19]:
from flair.data import Sentence
from flair.models import SequenceTagger

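def predict_ner_flair(model_path, text):
    """
    Run NER on a text with a Flair SequenceTagger.

    Note: besides its arguments, this function reads the notebook-level
    variable `tagger_name`, which must be defined before the call. If it is
    None, the model saved at `model_path` is loaded; otherwise the tagger
    identified by `tagger_name` is loaded.

    Args:
        model_path (str): Local path of the SequenceTagger model to load.
        text (str): Text to process.

    Returns:
        sentence (Sentence): Sentence object with the recognized entities and their labels.
    """
    # `tagger_name` is a global set in the calling cell, not a parameter.
    # If it is None, fall back to the locally saved model.
    if tagger_name is None:
        tagger = SequenceTagger.load(model_path)
    else:
        # Load the tagger identified by tagger_name
        tagger = SequenceTagger.load(tagger_name)

    # Create a Sentence object with the text
    sentence = Sentence(text)

    # Pass the sentence to the SequenceTagger model to perform NER prediction
    tagger.predict(sentence)

    # Return the Sentence object with the recognized entities and their tags
    return sentence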
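
The helper above depends on `tagger_name` being defined in the notebook's global scope. A minimal sketch of a variant that takes it as an explicit argument could look like the following. The names `resolve_model_source` and `predict_ner_flair_explicit` are hypothetical and not part of the original notebook, and `flair` is imported inside the function so that the selection logic can be checked without loading a model.

```python
def resolve_model_source(model_path, tagger_name=None):
    """Return the identifier to pass to SequenceTagger.load().

    Hypothetical helper: prefers the named tagger when one is given and
    falls back to the locally saved model otherwise.
    """
    return model_path if tagger_name is None else tagger_name


def predict_ner_flair_explicit(model_path, text, tagger_name=None):
    """Same behaviour as predict_ner_flair, without relying on a global."""
    # Imported lazily so resolve_model_source can be used without flair installed
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(resolve_model_source(model_path, tagger_name))
    sentence = Sentence(text)
    tagger.predict(sentence)
    return sentence
```

With this variant, the Spanish cell would pass `tagger_name='es-ner-large'` as a third argument, while the Basque and Japanese cells would simply omit it.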

In [10]:
#For Spanish
tagger_name='es-ner-large'
model_path='/content/drive/My Drive/ColabNotebooks/flairmodels/spanish/final-model-spanish.pt'
text= 'Juan trabaja en una empresa llamada Acme en Madrid.'
sentence=predict_ner_flair(model_path,text)

print(sentence.to_tagged_string('ner'))

Downloading pytorch_model.bin:   0%|          | 0.00/2.24G [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/616 [00:00<?, ?B/s]

Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/9.10M [00:00<?, ?B/s]

2023-04-21 21:00:49,585 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-LOC, S-ORG, B-PER, I-PER, E-PER, S-MISC, B-ORG, E-ORG, S-PER, I-ORG, B-LOC, E-LOC, B-MISC, E-MISC, I-MISC, I-LOC, <START>, <STOP>
Sentence[10]: "Juan trabaja en una empresa llamada Acme en Madrid." → ["Juan"/PER, "Acme"/ORG, "Madrid"/LOC]


In [11]:
#For German
tagger_name='de-ner-large'
model_path='/content/drive/My Drive/ColabNotebooks/flairmodels/german/final-model-german.pt'
text= 'Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet.'
sentence=predict_ner_flair(model_path,text)

print(sentence.to_tagged_string('ner'))

Downloading pytorch_model.bin:   0%|          | 0.00/2.24G [00:00<?, ?B/s]

2023-04-21 21:04:40,201 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, B-PER, E-PER, S-LOC, B-MISC, I-MISC, E-MISC, S-PER, B-ORG, E-ORG, S-ORG, I-ORG, B-LOC, E-LOC, S-MISC, I-PER, I-LOC, <START>, <STOP>
Sentence[14]: "Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet." → ["ABC GmbH"/ORG, "Berlin"/LOC]


In [20]:
#For Basque
tagger_name=None
model_path='/content/drive/My Drive/ColabNotebooks/flairmodels/basque/final-model-basque.pt'
text= 'Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak.'
sentence=predict_ner_flair(model_path,text)

print(sentence.to_tagged_string('ner'))

2023-04-21 21:11:18,815 SequenceTagger predicts: Dictionary with 19 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-OTH, B-OTH, E-OTH, I-OTH, <START>, <STOP>
Sentence[15]: "Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak." → ["Maite Perez"/PER, "Jon Anderk"/PER, "Eusko Jaurlaritzako Osasun Sailaren"/ORG]


In [21]:
#For Japanese
tagger_name=None
model_path='/content/drive/My Drive/ColabNotebooks/flairmodels/japanese/final-model-japanese.pt'
text= '株式会社XYZは2023年4月20日に設立されました。' #XYZ Corporation was founded on April 20, 2023.
sentence=predict_ner_flair(model_path,text)

print(sentence.to_tagged_string('ner'))

2023-04-21 21:12:01,433 SequenceTagger predicts: Dictionary with 79 tags: O, S-LOCATION, B-LOCATION, E-LOCATION, I-LOCATION, S-ORGANIZATION, B-ORGANIZATION, E-ORGANIZATION, I-ORGANIZATION, S-DATE, B-DATE, E-DATE, I-DATE, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-NUMBER, B-NUMBER, E-NUMBER, I-NUMBER, S-ARTIFACT, B-ARTIFACT, E-ARTIFACT, I-ARTIFACT, S-LOC, B-LOC, E-LOC, I-LOC, S-OTHER, B-OTHER, E-OTHER, I-OTHER, S-DAT, B-DAT, E-DAT, I-DAT, S-EVENT, B-EVENT, E-EVENT, I-EVENT, S-ORG, B-ORG, E-ORG, I-ORG, S-PERCENT, B-PERCENT, E-PERCENT, I-PERCENT, S-PSN
Sentence[10]: "株式会社XYZは2023年4月20日に設立されました。" → ["XYZ"/ORGANIZATION, "2023年4月20日に設立されました"/DATE]
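
The tag dictionaries printed by the taggers follow the BIOES scheme: `S-` marks a single-token entity, `B-`, `I-` and `E-` mark the beginning, inside and end of a multi-token entity, and `O` marks tokens outside any entity. As a rough illustration of how such tags become the entity spans shown in the outputs, here is a minimal sketch. The function `bioes_to_spans` is hypothetical (Flair does this internally), and the tag sequence is hand-written to match the German output rather than taken from the model.

```python
def bioes_to_spans(tokens, tags):
    """Merge per-token BIOES tags into (entity text, label) pairs."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity
            spans.append((token, label))
            current_tokens, current_label = [], None
        elif prefix == "B":
            # Start of a multi-token entity
            current_tokens, current_label = [token], label
        elif prefix in ("I", "E") and current_label == label:
            current_tokens.append(token)
            if prefix == "E":
                # End of the entity: emit it (tokens joined with spaces)
                spans.append((" ".join(current_tokens), label))
                current_tokens, current_label = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished entity
            current_tokens, current_label = [], None
    return spans


tokens = ["Die", "Firma", "ABC", "GmbH", "mit", "Sitz", "in", "Berlin"]
tags = ["O", "O", "B-ORG", "E-ORG", "O", "O", "O", "S-LOC"]
german_spans = bioes_to_spans(tokens, tags)
print(german_spans)
```

Joining tokens with a space is a simplification that suits the European languages here; Japanese text has no spaces between tokens, so a real implementation would use character offsets instead.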


## Evaluation

We compare the performance of the different models as before:
- For the Flair model for Spanish:


```
                precision    recall  f1-score   support

         ORG     0.7857    0.8919    0.8354       111
         LOC     0.7381    0.6667    0.7006        93
         PER     0.9697    0.9697    0.9697        66
        MISC     0.5000    0.1765    0.2609        17

   micro avg     0.8085    0.7944    0.8014       287
   macro avg     0.7484    0.6762    0.6916       287
weighted avg     0.7957    0.7944    0.7886       287
```
The model performs well: the micro-averaged precision, recall and F1 are all around 0.80. The macro averages are somewhat lower (F1 0.69) but still well above 0.5. `MISC` is the weakest label (F1 0.26), possibly because it appears far less often than the others (support 17).

- For the Flair model for German:


```
               precision    recall  f1-score   support

         TAX     0.7385    0.6957    0.7164        69
         TME     0.7778    0.6562    0.7119        32
       OTHER     0.3500    0.2188    0.2692        32
         LOC     0.5714    0.4000    0.4706        20
         PER     0.5882    0.6250    0.6061        16
         ORG     0.0000    0.0000    0.0000         2

   micro avg     0.6528    0.5497    0.5968       171
   macro avg     0.5043    0.4326    0.4624       171
weighted avg     0.6309    0.5497    0.5844       171
```
The `ORG` label is not recognized at all, most likely because the test set contains almost no organizations (support 2). This is somewhat surprising, but the same thing happens when training with TARS. Otherwise the model leaves room for improvement: the macro averages lie between 0.43 and 0.50, and the micro averages between 0.55 and 0.65.

- For the Flair model for Basque:


```
                precision    recall  f1-score   support

         LOC     0.5468    0.7048    0.6158       315
         PER     0.7519    0.6724    0.7099       293
         ORG     0.6888    0.5666    0.6217       293
         OTH     0.0000    0.0000    0.0000        30

   micro avg     0.6422    0.6284    0.6352       931
   macro avg     0.4969    0.4859    0.4869       931
weighted avg     0.6384    0.6284    0.6274       931
```
The model performs reasonably well (micro F1 0.64), even though no Basque embeddings were used.

- For the Flair model for Japanese:


```
                precision    recall  f1-score   support

    LOCATION     0.3134    0.0727    0.1180       289
        DATE     0.7586    0.7051    0.7309       156
      NUMBER     0.4135    0.5729    0.4803        96
ORGANIZATION     0.6111    0.1183    0.1982       186
      PERSON     0.1667    0.0968    0.1224        93
    ARTIFACT     0.6818    0.1705    0.2727        88
         DAT     0.4500    0.3000    0.3600        30
     PERCENT     0.7778    0.6087    0.6829        23
       OTHER     0.0000    0.0000    0.0000        41
         LOC     0.0000    0.0000    0.0000        37
       EVENT     0.5000    0.0294    0.0556        34
         ORG     0.7500    0.1034    0.1818        29
         PSN     0.0000    0.0000    0.0000        26
         ART     1.0000    0.1250    0.2222         8
         TIM     0.0000    0.0000    0.0000         9
        TIME     0.0000    0.0000    0.0000         7
       MONEY     0.0000    0.0000    0.0000         0
    
   micro avg     0.5169    0.2257    0.3142      1152
   macro avg     0.3778    0.1708    0.2015      1152
weighted avg     0.4479    0.2257    0.2621      1152
```
The model performs poorly (micro F1 0.31), which is expected given that no Japanese embeddings were used during training, so it likely lacks the information needed to interpret Japanese text. It does, however, recognize some labels reasonably well, notably `PERCENT` and `DATE`. These entities are often written with digits and symbols that are shared with the Indo-European languages, which suggests that the model may be transferring its knowledge of these labels from other languages despite the absence of Japanese embeddings. The label set also contains near-duplicate tags (`LOCATION`/`LOC`, `ORGANIZATION`/`ORG`, `DATE`/`DAT`, `PERSON`/`PSN`), which splits the examples of one entity type across two labels and may further lower the per-class scores.
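
The three average rows in these reports weight the classes differently, which is why rare labels pull the macro average down much more than the micro average. As a minimal sketch, the averages for the Spanish report can be recomputed from the rounded per-class values: the macro F1 is the unweighted mean of the per-class F1 scores, the weighted F1 weights each class by its support, and the micro F1 is the harmonic mean of the pooled precision and recall. The function names are illustrative, and the results should agree with the report only up to rounding.

```python
# Per-class (F1, support) copied from the Spanish report
spanish = {
    "ORG": (0.8354, 111),
    "LOC": (0.7006, 93),
    "PER": (0.9697, 66),
    "MISC": (0.2609, 17),
}


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for f1, _ in report.values()) / len(report)


def weighted_f1(report):
    """Mean of the per-class F1 scores weighted by class support."""
    total = sum(support for _, support in report.values())
    return sum(f1 * support for f1, support in report.values()) / total


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


macro = macro_f1(spanish)
weighted = weighted_f1(spanish)
# Micro-averaged precision and recall from the same report
micro = f1_from_precision_recall(0.8085, 0.7944)
print(f"macro {macro:.4f}  weighted {weighted:.4f}  micro {micro:.4f}")
```

Because `MISC` counts as much as `ORG` in the macro average despite having 17 examples instead of 111, its low F1 alone accounts for most of the gap between the macro and micro scores.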


# Comparing the performance

In this section, we compare the TARS and Flair models language by language, using the predictions each one produced on the same example sentence, and look at where their outputs differ.

- Spanish examples:


```
TARS: "Juan trabaja en una empresa llamada Acme en Madrid." → ["Juan"/<unk>, "trabaja"/<unk>, "en"/<unk>, "una"/<unk>, "empresa"/<unk>, "llamada"/<unk>, "Acme"/Beginning of an organization name, "en"/<unk>, "Madrid"/Beginning of a location name, "."/<unk>]

Flair: "Juan trabaja en una empresa llamada Acme en Madrid." → ["Juan"/PER, "Acme"/ORG, "Madrid"/LOC]
```

- German examples:

```
TARS: "Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet." → ["2005"/Time]

Flair: "Die Firma ABC GmbH mit Sitz in Berlin wurde im Jahr 2005 gegründet." → ["ABC GmbH"/ORG, "Berlin"/LOC]
```
- Basque examples:


```
TARS: "Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak." → ["Maite Perez"/person entity, "Jon Anderk"/person entity, "Eusko Jaurlaritzako Osasun Sailaren"/organization entity]

Flair: "Maite Perez eta Jon Anderk Eusko Jaurlaritzako Osasun Sailaren esku jarri dute beraien kontuak." → ["Maite Perez"/PER, "Jon Anderk"/PER, "Eusko Jaurlaritzako Osasun Sailaren"/ORG]
```
- Japanese examples:


```
TARS: "株式会社XYZは2023年4月20日に設立されました。" → ["株式会社XYZ"/organization entity, "2023年4月"/date label]

Flair: "株式会社XYZは2023年4月20日に設立されました。" → ["XYZ"/ORGANIZATION, "2023年4月20日に設立されました"/DATE]
```

We chose example sentences containing the labels that each system handles best, in order to check whether the models had been trained properly. On other sentences performance varies considerably, since some labels have very low or even zero scores.
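
Since TARS reports verbose label descriptions while Flair uses short tags, a side-by-side comparison is easier once both are mapped to a common label set. The sketch below does this for the Basque and German examples above. The `LABEL_MAP` dictionary and the `compare` function are assumptions made for illustration and are not part of the original notebook.

```python
# Hypothetical mapping from the TARS label descriptions to short tags
LABEL_MAP = {
    "Beginning of an organization name": "ORG",
    "Beginning of a location name": "LOC",
    "person entity": "PER",
    "organization entity": "ORG",
    "Time": "TIME",
}


def normalize(entities):
    """Map labels to short tags and return a set of (text, label) pairs."""
    return {(text, LABEL_MAP.get(label, label)) for text, label in entities}


def compare(tars_entities, flair_entities):
    """Split the predictions into shared and system-specific entities."""
    tars, flair = normalize(tars_entities), normalize(flair_entities)
    return {
        "both": tars & flair,
        "tars_only": tars - flair,
        "flair_only": flair - tars,
    }


basque = compare(
    [("Maite Perez", "person entity"), ("Jon Anderk", "person entity"),
     ("Eusko Jaurlaritzako Osasun Sailaren", "organization entity")],
    [("Maite Perez", "PER"), ("Jon Anderk", "PER"),
     ("Eusko Jaurlaritzako Osasun Sailaren", "ORG")],
)
german = compare(
    [("2005", "Time")],
    [("ABC GmbH", "ORG"), ("Berlin", "LOC")],
)
print(len(basque["both"]), len(german["both"]))  # prints: 3 0
```

On these two sentences the systems agree completely for Basque and not at all for German, where TARS finds only the year and Flair finds only the organization and the location.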