# Inference

This notebook contains all the commands for model inference used in Fig. 2 of the following paper:

```
Hosseini, Nanni and Coll Ardanuy (2020), DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching, EMNLP: System Demonstrations.
```

Refer to the `Fig2_EMNLP_train` notebook where we train and fine-tune the models.

---

In this notebook:

* skyline1: trained on *OCR* dataset
* skyline2: trained on *WG:en+OCR* dataset
* baseline: trained on *WG:en* dataset

---

* model A: both embedding and recurrent units are frozen (i.e., their parameters are not updated during fine-tuning).
* model B: only the embedding layer is frozen. 
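As a minimal, framework-free sketch of what freezing means here, the toy update step below skips any parameter group marked as frozen. The group names (`embedding`, `recurrent`, `classifier`), the scalar parameters and the `sgd_step` helper are illustrative assumptions, not DeezyMatch's actual implementation.

```python
# Hypothetical parameter groups; real models hold tensors, not single floats.
FROZEN_GROUPS = {
    "A": {"embedding", "recurrent"},  # model A: embedding and recurrent units frozen
    "B": {"embedding"},               # model B: only the embedding layer frozen
}


def sgd_step(params, grads, frozen, lr=0.1):
    """Return updated parameters, leaving frozen groups unchanged."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


params = {"embedding": 1.0, "recurrent": 1.0, "classifier": 1.0}
grads = {"embedding": 0.5, "recurrent": 0.5, "classifier": 0.5}

updated_a = sgd_step(params, grads, FROZEN_GROUPS["A"])
updated_b = sgd_step(params, grads, FROZEN_GROUPS["B"])
print("model A updates:", [k for k in params if updated_a[k] != params[k]])
print("model B updates:", [k for k in params if updated_b[k] != params[k]])
```

In this toy setting, model A only updates the `classifier` group, while model B also updates `recurrent`.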

---

To show how fine-tuning and the choice of architecture affect model performance, we start from the baseline model and fine-tune it with progressively more instances from the *OCR* training set.

The performance of these models is then assessed on the *OCR* test set. 

Refer to the paper for more information.
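Each inference call in this notebook logs a one-line summary of the test metrics (loss, accuracy, precision, recall and F1 scores). As a sketch, assuming the summary keeps the `name: value` layout seen in these logs, the metrics can be pulled into a dictionary with a regular expression. The `parse_metrics` helper and `METRIC_PATTERN` are hypothetical names, not part of DeezyMatch.

```python
import re

# Matches "name: value" pairs such as "loss: 0.137" or "macrof1: 0.956".
METRIC_PATTERN = re.compile(
    r"\b(loss|acc|precision|recall|macrof1|weightedf1): ([0-9]+\.[0-9]+)"
)


def parse_metrics(log_line):
    """Extract metric names and float values from one summary log line."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(log_line)}


# Summary line copied from the first skyline1 run in this notebook.
line = (
    "Epoch: 0/0; Test; loss: 0.137; acc: 0.956; precision: 0.949, "
    "recall: 0.963, macrof1: 0.956, weightedf1: 0.956"
)
metrics = parse_metrics(line)
print(metrics)
```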

## skyline1

In [1]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/ocr_001/ocr_001.model", 
             pretrained_vocab_path="./models/ocr_001/ocr_001.vocab")


2020-09-10 18:17:55 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:17:56 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:17:58 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:17:58 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:17:58 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:18:02 lwm-embeddings [INFO] 09/10/2020_18:18:02 -- Epoch: 0/0; Test; loss: 0.137; acc: 0.956; precision: 0.949, recall: 0.963, macrof1: 0.956, weightedf1: 0.956
--- 6.686788320541382 seconds ---


In [2]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/ocr_001/ocr_001.model", 
             pretrained_vocab_path="./models/ocr_001/ocr_001.vocab")


2020-09-10 18:19:01 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:19:01 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:19:01 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:19:02 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:19:03 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:19:45 lwm-embeddings [INFO] 09/10/2020_18:19:45 -- Epoch: 0/0; Test; loss: 3.691; acc: 0.642; precision: 0.718, recall: 0.470, macrof1: 0.631, weightedf1: 0.631
--- 43.22682738304138 seconds ---
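
Each model in this notebook is evaluated on both the *OCR* and *WG:en* test sets with otherwise identical arguments. The sketch below shows one way to factor out that repetition. `evaluate_on_test_sets` is a hypothetical helper, it assumes the `./models/<name>/<name>.model` and `.vocab` layout used in the cells here, and a stub stands in for `dm_inference` so the block runs without DeezyMatch installed.

```python
def evaluate_on_test_sets(run_inference, input_file, model_name,
                          test_sets=("ocr_test", "wikigaz_en_test")):
    """Call run_inference once per test set, using the notebook's keyword arguments."""
    model_stem = f"./models/{model_name}/{model_name}"
    calls = []
    for test_set in test_sets:
        kwargs = {
            "input_file_path": input_file,
            "dataset_path": f"./dataset/{test_set}.txt",
            "pretrained_model_path": f"{model_stem}.model",
            "pretrained_vocab_path": f"{model_stem}.vocab",
        }
        run_inference(**kwargs)
        calls.append(kwargs)
    return calls


def stub_inference(**kwargs):
    """Stand-in for dm_inference that only reports which dataset would be used."""
    print("would evaluate on", kwargs["dataset_path"])


calls = evaluate_on_test_sets(stub_inference, "./inputs/input_dfm.yaml", "ocr_001")
```

With DeezyMatch available, passing `dm_inference` instead of `stub_inference` would issue the same two calls as the skyline1 cells.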


## skyline1b

In [23]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_b.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/ocr_001b/ocr_001b.model", 
             pretrained_vocab_path="./models/ocr_001b/ocr_001b.vocab")


2020-09-10 22:17:04 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_b.yaml
2020-09-10 22:17:04 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:17:04 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 22:17:04 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 22:17:04 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:17:07 lwm-embeddings [INFO] 09/10/2020_22:17:07 -- Epoch: 0/0; Test; loss: 0.120; acc: 0.964; precision: 0.970, recall: 0.958, macrof1: 0.964, weightedf1: 0.964
--- 3.20565128326416 seconds ---


In [24]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_b.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/ocr_001b/ocr_001b.model", 
             pretrained_vocab_path="./models/ocr_001b/ocr_001b.vocab")


2020-09-10 22:17:07 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_b.yaml
2020-09-10 22:17:07 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:17:07 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:17:08 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:17:09 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:17:57 lwm-embeddings [INFO] 09/10/2020_22:17:57 -- Epoch: 0/0; Test; loss: 4.925; acc: 0.650; precision: 0.750, recall: 0.450, macrof1: 0.635, weightedf1: 0.635
--- 49.29198336601257 seconds ---


## skyline2

In [3]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/wikigaz_en_ocr_gru_001/wikigaz_en_ocr_gru_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_ocr_gru_001/wikigaz_en_ocr_gru_001.vocab")



2020-09-10 18:20:04 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:20:04 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:20:05 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:20:05 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:20:05 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:20:09 lwm-embeddings [INFO] 09/10/2020_18:20:09 -- Epoch: 0/0; Test; loss: 0.285; acc: 0.881; precision: 0.860, recall: 0.910, macrof1: 0.881, weightedf1: 0.881
--- 4.605390548706055 seconds ---


In [4]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/wikigaz_en_ocr_gru_001/wikigaz_en_ocr_gru_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_ocr_gru_001/wikigaz_en_ocr_gru_001.vocab")



2020-09-10 18:20:56 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:20:56 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:20:56 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:20:56 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:20:57 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:21:42 lwm-embeddings [INFO] 09/10/2020_18:21:42 -- Epoch: 0/0; Test; loss: 0.178; acc: 0.925; precision: 0.915, recall: 0.938, macrof1: 0.925, weightedf1: 0.925
--- 46.3150749206543 seconds ---


## skyline2b

In [27]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_b.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/wikigaz_en_ocr_gru_001b/wikigaz_en_ocr_gru_001b.model", 
             pretrained_vocab_path="./models/wikigaz_en_ocr_gru_001b/wikigaz_en_ocr_gru_001b.vocab")



                                                   

2020-09-10 23:01:20 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_b.yaml
2020-09-10 23:01:20 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 23:01:20 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 23:01:20 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 23:01:20 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 23:01:23 lwm-embeddings [INFO] 09/10/2020_23:01:23 -- Epoch: 0/0; Test; loss: 0.256; acc: 0.895; precision: 0.871, recall: 0.926, macrof1: 0.895, weightedf1: 0.895
--- 3.2803962230682373 seconds ---


In [28]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_b.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/wikigaz_en_ocr_gru_001b/wikigaz_en_ocr_gru_001b.model", 
             pretrained_vocab_path="./models/wikigaz_en_ocr_gru_001b/wikigaz_en_ocr_gru_001b.vocab")



2020-09-10 23:01:23 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_b.yaml
2020-09-10 23:01:23 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 23:01:23 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 23:01:24 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 23:01:24 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 23:01:53 lwm-embeddings [INFO] 09/10/2020_23:01:53 -- Epoch: 0/0; Test; loss: 0.176; acc: 0.926; precision: 0.908, recall: 0.949, macrof1: 0.926, weightedf1: 0.926
--- 29.86833095550537 seconds ---


## baseline1_gru

In [5]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/wikigaz_en_gru_001/wikigaz_en_gru_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_gru_001/wikigaz_en_gru_001.vocab")



2020-09-10 18:22:44 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:22:44 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:22:44 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:22:44 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:22:44 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:22:49 lwm-embeddings [INFO] 09/10/2020_18:22:49 -- Epoch: 0/0; Test; loss: 1.741; acc: 0.455; precision: 0.451, recall: 0.413, macrof1: 0.454, weightedf1: 0.454
--- 5.1771557331085205 seconds ---


In [6]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/wikigaz_en_gru_001/wikigaz_en_gru_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_gru_001/wikigaz_en_gru_001.vocab")


2020-09-10 18:23:09 lwm-embeddings [INFO] read input file: ./inputs/input_dfm.yaml
2020-09-10 18:23:09 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:23:09 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:23:09 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:23:10 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:23:56 lwm-embeddings [INFO] 09/10/2020_18:23:56 -- Epoch: 0/0; Test; loss: 0.157; acc: 0.938; precision: 0.937, recall: 0.939, macrof1: 0.938, weightedf1: 0.938
--- 46.95348644256592 seconds ---


## baseline1_lstm

In [7]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_lstm.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/wikigaz_en_lstm_001/wikigaz_en_lstm_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_lstm_001/wikigaz_en_lstm_001.vocab")




2020-09-10 18:25:00 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm.yaml
2020-09-10 18:25:00 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:25:00 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:25:00 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:25:00 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:25:05 lwm-embeddings [INFO] 09/10/2020_18:25:05 -- Epoch: 0/0; Test; loss: 1.824; acc: 0.452; precision: 0.454, recall: 0.473, macrof1: 0.452, weightedf1: 0.452
--- 5.7508580684661865 seconds ---


In [8]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_lstm.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/wikigaz_en_lstm_001/wikigaz_en_lstm_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_lstm_001/wikigaz_en_lstm_001.vocab")


2020-09-10 18:25:45 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm.yaml
2020-09-10 18:25:45 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:25:45 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:25:45 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:25:46 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:26:35 lwm-embeddings [INFO] 09/10/2020_18:26:35 -- Epoch: 0/0; Test; loss: 0.158; acc: 0.937; precision: 0.936, recall: 0.937, macrof1: 0.937, weightedf1: 0.937
--- 50.00007247924805 seconds ---


## baseline1_rnn

In [9]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_rnn.yaml",
             dataset_path="./dataset/ocr_test.txt", 
             pretrained_model_path="./models/wikigaz_en_rnn_001/wikigaz_en_rnn_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_rnn_001/wikigaz_en_rnn_001.vocab")




2020-09-10 18:27:12 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn.yaml
2020-09-10 18:27:12 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:27:12 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:27:12 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:27:12 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:27:17 lwm-embeddings [INFO] 09/10/2020_18:27:17 -- Epoch: 0/0; Test; loss: 1.243; acc: 0.484; precision: 0.484, recall: 0.505, macrof1: 0.483, weightedf1: 0.483
--- 4.953282833099365 seconds ---


In [10]:
# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_rnn.yaml",
             dataset_path="./dataset/wikigaz_en_test.txt", 
             pretrained_model_path="./models/wikigaz_en_rnn_001/wikigaz_en_rnn_001.model", 
             pretrained_vocab_path="./models/wikigaz_en_rnn_001/wikigaz_en_rnn_001.vocab")


2020-09-10 18:27:42 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn.yaml
2020-09-10 18:27:42 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:27:42 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:27:42 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:27:43 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:28:28 lwm-embeddings [INFO] 09/10/2020_18:28:28 -- Epoch: 0/0; Test; loss: 0.219; acc: 0.909; precision: 0.920, recall: 0.897, macrof1: 0.909, weightedf1: 0.909
--- 46.63703799247742 seconds ---


## Fine-Tuned, model A, GRU

In [11]:
from DeezyMatch import inference as dm_inference

for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_gru_model_A.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}.vocab")
    
    

--------- 250
2020-09-10 18:31:05 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:05 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:06 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:06 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:06 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:10 lwm-embeddings [INFO] 09/10/2020_18:31:10 -- Epoch: 0/0; Test; loss: 0.905; acc: 0.602; precision: 0.590, recall: 0.669, macrof1: 0.600, weightedf1: 0.600
--- 4.97077751159668 seconds ---
--------- 500
2020-09-10 18:31:10 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:10 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:10 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:11 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:11 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:15 lwm-embeddings [INFO] 09/10/2020_18:31:15 -- Epoch: 0/0; Test; loss: 0.715; acc: 0.677; precision: 0.659, recall: 0.732, macrof1: 0.676, weightedf1: 0.676
--- 4.442528486251831 seconds ---
--------- 1000
2020-09-10 18:31:15 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:15 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:15 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:15 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:15 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:19 lwm-embeddings [INFO] 09/10/2020_18:31:19 -- Epoch: 0/0; Test; loss: 0.585; acc: 0.743; precision: 0.745, recall: 0.740, macrof1: 0.743, weightedf1: 0.743
--- 4.261778354644775 seconds ---
--------- 2000
2020-09-10 18:31:19 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:19 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:19 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:19 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:19 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:23 lwm-embeddings [INFO] 09/10/2020_18:31:23 -- Epoch: 0/0; Test; loss: 0.495; acc: 0.787; precision: 0.774, recall: 0.811, macrof1: 0.787, weightedf1: 0.787
--- 4.284550905227661 seconds ---
--------- 4000
2020-09-10 18:31:23 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:23 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:24 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:24 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:24 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:28 lwm-embeddings [INFO] 09/10/2020_18:31:28 -- Epoch: 0/0; Test; loss: 0.423; acc: 0.824; precision: 0.818, recall: 0.834, macrof1: 0.824, weightedf1: 0.824
--- 4.089308261871338 seconds ---
--------- 8000
2020-09-10 18:31:28 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:28 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:28 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:28 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:28 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:32 lwm-embeddings [INFO] 09/10/2020_18:31:32 -- Epoch: 0/0; Test; loss: 0.362; acc: 0.851; precision: 0.839, recall: 0.869, macrof1: 0.851, weightedf1: 0.851
--- 4.261324405670166 seconds ---
--------- 16000
2020-09-10 18:31:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:32 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:32 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:32 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:36 lwm-embeddings [INFO] 09/10/2020_18:31:36 -- Epoch: 0/0; Test; loss: 0.306; acc: 0.878; precision: 0.861, recall: 0.902, macrof1: 0.878, weightedf1: 0.878
--- 4.272459030151367 seconds ---
--------- 32000
2020-09-10 18:31:36 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:36 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:36 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:36 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:36 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:40 lwm-embeddings [INFO] 09/10/2020_18:31:40 -- Epoch: 0/0; Test; loss: 0.265; acc: 0.896; precision: 0.898, recall: 0.894, macrof1: 0.896, weightedf1: 0.896
--- 4.092085123062134 seconds ---
--------- 64000
2020-09-10 18:31:40 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:31:40 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:31:40 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:31:40 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:31:40 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:31:44 lwm-embeddings [INFO] 09/10/2020_18:31:44 -- Epoch: 0/0; Test; loss: 0.230; acc: 0.915; precision: 0.909, recall: 0.922, macrof1: 0.915, weightedf1: 0.915
--- 4.144334077835083 seconds ---
--------- 84000
[92m2020-09-10 18:31:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_A.yaml[0m
[92m2020-09-10 18:31:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 18:31:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 18:31:45[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 18:31:45[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 18:31:49[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_18:31:49 -- Epoch: 0/0; Test; loss: 0.211; acc: 0.924; precision: 0.914, recall: 0.936, macrof1: 0.924, weightedf1: 0.924[0m
--- 4.116542816162109 seconds ---
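The evaluation loops in this notebook build the model and vocabulary paths from one naming pattern. As a minimal sketch, that pattern can be wrapped in a small helper. `finetuned_paths` and its `models_dir` argument are hypothetical names introduced here for illustration and are not part of DeezyMatch.

```python
def finetuned_paths(arch, n_ft_examples, models_dir="./models"):
    """Return (model_path, vocab_path) for one fine-tuned model.

    Hypothetical helper: it only mirrors the directory layout used in
    this notebook, e.g. ./models/<name>/<name>.model.
    """
    name = f"wikigaz_en_ft_ocr_{arch}_v001_n{n_ft_examples}"
    model_path = f"{models_dir}/{name}/{name}.model"
    vocab_path = f"{models_dir}/{name}/{name}.vocab"
    return model_path, vocab_path


model_path, vocab_path = finetuned_paths("gru", 250)
print(model_path)
print(vocab_path)
```

The two returned strings could then be passed as `pretrained_model_path` and `pretrained_vocab_path` in the cells of this notebook.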


In [12]:
from DeezyMatch import inference as dm_inference

for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # Inference with each fine-tuned GRU model (model A), one per n_ft_examples value,
    # on ./dataset/wikigaz_en_test.txt, using its stored model and vocabulary files
    dm_inference(input_file_path="./inputs/input_dfm_gru_model_A.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v001_n{n_ft_examples}.vocab")

--------- 250
2020-09-10 18:33:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:33:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:33:32 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:33:32 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:33:33 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:34:14 lwm-embeddings [INFO] 09/10/2020_18:34:14 -- Epoch: 0/0; Test; loss: 0.241; acc: 0.905; precision: 0.889, recall: 0.927, macrof1: 0.905, weightedf1: 0.905
--- 41.99560475349426 seconds ---
--------- 500
2020-09-10 18:34:14 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:34:14 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:34:14 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:34:14 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:34:15 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:34:56 lwm-embeddings [INFO] 09/10/2020_18:34:56 -- Epoch: 0/0; Test; loss: 0.320; acc: 0.874; precision: 0.857, recall: 0.897, macrof1: 0.874, weightedf1: 0.874
--- 41.614906311035156 seconds ---
--------- 1000
2020-09-10 18:34:56 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:34:56 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:34:56 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:34:56 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:34:57 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:35:41 lwm-embeddings [INFO] 09/10/2020_18:35:41 -- Epoch: 0/0; Test; loss: 0.457; acc: 0.818; precision: 0.841, recall: 0.785, macrof1: 0.818, weightedf1: 0.818
--- 44.90234637260437 seconds ---
--------- 2000
2020-09-10 18:35:41 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:35:41 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:35:41 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:35:41 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:35:42 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:36:25 lwm-embeddings [INFO] 09/10/2020_18:36:25 -- Epoch: 0/0; Test; loss: 0.682; acc: 0.770; precision: 0.787, recall: 0.739, macrof1: 0.769, weightedf1: 0.769
--- 44.764214277267456 seconds ---
--------- 4000
2020-09-10 18:36:25 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:36:25 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:36:25 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:36:26 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:36:27 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:37:09 lwm-embeddings [INFO] 09/10/2020_18:37:09 -- Epoch: 0/0; Test; loss: 0.853; acc: 0.748; precision: 0.777, recall: 0.695, macrof1: 0.747, weightedf1: 0.747
--- 44.099066734313965 seconds ---
--------- 8000
2020-09-10 18:37:09 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:37:09 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:37:09 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:37:10 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:37:11 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:37:58 lwm-embeddings [INFO] 09/10/2020_18:37:58 -- Epoch: 0/0; Test; loss: 1.304; acc: 0.709; precision: 0.754, recall: 0.621, macrof1: 0.707, weightedf1: 0.707
--- 48.10148477554321 seconds ---
--------- 16000
2020-09-10 18:37:58 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:37:58 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:37:58 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:37:58 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:37:59 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:38:45 lwm-embeddings [INFO] 09/10/2020_18:38:45 -- Epoch: 0/0; Test; loss: 1.234; acc: 0.707; precision: 0.737, recall: 0.644, macrof1: 0.706, weightedf1: 0.706
--- 47.47420120239258 seconds ---
--------- 32000
2020-09-10 18:38:45 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:38:45 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:38:45 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:38:46 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:38:46 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:39:30 lwm-embeddings [INFO] 09/10/2020_18:39:30 -- Epoch: 0/0; Test; loss: 1.532; acc: 0.690; precision: 0.734, recall: 0.598, macrof1: 0.688, weightedf1: 0.688
--- 45.29329752922058 seconds ---
--------- 64000
2020-09-10 18:39:30 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:39:30 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:39:30 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:39:31 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:39:31 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:40:18 lwm-embeddings [INFO] 09/10/2020_18:40:18 -- Epoch: 0/0; Test; loss: 1.673; acc: 0.690; precision: 0.729, recall: 0.607, macrof1: 0.688, weightedf1: 0.688
--- 48.14600992202759 seconds ---
--------- 84000
2020-09-10 18:40:18 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_gru_model_A.yaml
2020-09-10 18:40:18 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:40:19 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:40:19 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:40:20 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:41:07 lwm-embeddings [INFO] 09/10/2020_18:41:07 -- Epoch: 0/0; Test; loss: 1.866; acc: 0.685; precision: 0.718, recall: 0.608, macrof1: 0.683, weightedf1: 0.683
--- 48.287535429000854 seconds ---
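Each run ends with a summary line that carries the test metrics (`loss`, `acc`, `precision`, `recall`, `macrof1`, `weightedf1`). The following is a minimal sketch, assuming the line format shown in the output above, of how those values could be collected into a dictionary for plotting or tabulation. `parse_test_line` is a hypothetical helper written for this notebook and is not part of DeezyMatch.

```python
import re

# Matches "name: value" pairs such as "loss: 1.866" or "macrof1: 0.683".
METRIC_RE = re.compile(
    r"(loss|acc|precision|recall|macrof1|weightedf1):\s*([0-9]*\.?[0-9]+)"
)


def parse_test_line(line):
    """Return the metrics found in one 'Test;' summary line as floats."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


# Summary line copied from the n_ft_examples = 84000 run above.
example = (
    "09/10/2020_18:41:07 -- Epoch: 0/0; Test; loss: 1.866; acc: 0.685; "
    "precision: 0.718, recall: 0.608, macrof1: 0.683, weightedf1: 0.683"
)
metrics = parse_test_line(example)
print(metrics)
```

Applying the helper to every summary line of a loop gives one dictionary per `n_ft_examples` value.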


## Fine-Tuned, model A, LSTM

In [13]:
from DeezyMatch import inference as dm_inference

for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # Inference with each fine-tuned LSTM model (model A), one per n_ft_examples value,
    # on ./dataset/ocr_test.txt, using its stored model and vocabulary files
    dm_inference(input_file_path="./inputs/input_dfm_lstm_model_A.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}.vocab")

--------- 250
2020-09-10 18:45:24 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:24 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:24 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:24 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:24 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:45:28 lwm-embeddings [INFO] 09/10/2020_18:45:28 -- Epoch: 0/0; Test; loss: 1.055; acc: 0.605; precision: 0.593, recall: 0.672, macrof1: 0.603, weightedf1: 0.603
--- 4.489645957946777 seconds ---
--------- 500
2020-09-10 18:45:28 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:28 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:28 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:28 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:28 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:45:35 lwm-embeddings [INFO] 09/10/2020_18:45:35 -- Epoch: 0/0; Test; loss: 0.791; acc: 0.691; precision: 0.680, recall: 0.722, macrof1: 0.691, weightedf1: 0.691
--- 6.434430360794067 seconds ---
--------- 1000
2020-09-10 18:45:35 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:35 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:35 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:35 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:35 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:45:41 lwm-embeddings [INFO] 09/10/2020_18:45:41 -- Epoch: 0/0; Test; loss: 0.588; acc: 0.763; precision: 0.757, recall: 0.774, macrof1: 0.763, weightedf1: 0.763
--- 6.529709815979004 seconds ---
--------- 2000
2020-09-10 18:45:41 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:41 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:41 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:41 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:41 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:45:48 lwm-embeddings [INFO] 09/10/2020_18:45:48 -- Epoch: 0/0; Test; loss: 0.457; acc: 0.812; precision: 0.806, recall: 0.824, macrof1: 0.812, weightedf1: 0.812
--- 6.572016477584839 seconds ---
--------- 4000
2020-09-10 18:45:48 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:48 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:48 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:48 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:48 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:45:54 lwm-embeddings [INFO] 09/10/2020_18:45:54 -- Epoch: 0/0; Test; loss: 0.400; acc: 0.835; precision: 0.831, recall: 0.840, macrof1: 0.835, weightedf1: 0.835
--- 6.511359691619873 seconds ---
--------- 8000
2020-09-10 18:45:54 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:45:54 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:45:54 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:45:54 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:45:54 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:46:01 lwm-embeddings [INFO] 09/10/2020_18:46:01 -- Epoch: 0/0; Test; loss: 0.339; acc: 0.863; precision: 0.860, recall: 0.867, macrof1: 0.863, weightedf1: 0.863
--- 6.453651428222656 seconds ---
--------- 16000
2020-09-10 18:46:01 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:46:01 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:46:01 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:46:01 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:46:01 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:46:07 lwm-embeddings [INFO] 09/10/2020_18:46:07 -- Epoch: 0/0; Test; loss: 0.281; acc: 0.892; precision: 0.884, recall: 0.904, macrof1: 0.892, weightedf1: 0.892
--- 6.4285571575164795 seconds ---
--------- 32000
2020-09-10 18:46:07 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:46:07 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:46:07 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:46:07 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:46:07 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:46:14 lwm-embeddings [INFO] 09/10/2020_18:46:14 -- Epoch: 0/0; Test; loss: 0.246; acc: 0.906; precision: 0.901, recall: 0.911, macrof1: 0.906, weightedf1: 0.906
--- 6.5677735805511475 seconds ---
--------- 64000
2020-09-10 18:46:14 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:46:14 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:46:14 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:46:14 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:46:14 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:46:20 lwm-embeddings [INFO] 09/10/2020_18:46:20 -- Epoch: 0/0; Test; loss: 0.202; acc: 0.923; precision: 0.916, recall: 0.931, macrof1: 0.923, weightedf1: 0.923
--- 6.509921312332153 seconds ---
--------- 84000
2020-09-10 18:46:20 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:46:20 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:46:20 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:46:20 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:46:20 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:46:27 lwm-embeddings [INFO] 09/10/2020_18:46:27 -- Epoch: 0/0; Test; loss: 0.187; acc: 0.931; precision: 0.928, recall: 0.935, macrof1: 0.931, weightedf1: 0.931
--- 6.3850836753845215 seconds ---


In [14]:
from DeezyMatch import inference as dm_inference

for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # Inference with each fine-tuned LSTM model (model A), one per n_ft_examples value,
    # on ./dataset/wikigaz_en_test.txt, using its stored model and vocabulary files
    dm_inference(input_file_path="./inputs/input_dfm_lstm_model_A.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v001_n{n_ft_examples}.vocab")

--------- 250
2020-09-10 18:46:27 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:46:27 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:46:27 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:46:27 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:46:28 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:47:24 lwm-embeddings [INFO] 09/10/2020_18:47:24 -- Epoch: 0/0; Test; loss: 0.240; acc: 0.907; precision: 0.909, recall: 0.905, macrof1: 0.907, weightedf1: 0.907
--- 57.79513621330261 seconds ---
--------- 500
2020-09-10 18:47:24 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:47:24 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:47:24 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:47:25 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:47:26 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:48:27 lwm-embeddings [INFO] 09/10/2020_18:48:27 -- Epoch: 0/0; Test; loss: 0.333; acc: 0.876; precision: 0.883, recall: 0.867, macrof1: 0.876, weightedf1: 0.876
--- 62.41485595703125 seconds ---
--------- 1000
2020-09-10 18:48:27 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:48:27 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:48:27 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:48:27 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:48:28 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:49:25 lwm-embeddings [INFO] 09/10/2020_18:49:25 -- Epoch: 0/0; Test; loss: 0.504; acc: 0.826; precision: 0.854, recall: 0.788, macrof1: 0.826, weightedf1: 0.826
--- 58.05944013595581 seconds ---
--------- 2000
2020-09-10 18:49:25 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:49:25 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:49:25 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:49:25 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:49:26 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:50:27 lwm-embeddings [INFO] 09/10/2020_18:50:27 -- Epoch: 0/0; Test; loss: 0.689; acc: 0.787; precision: 0.825, recall: 0.730, macrof1: 0.787, weightedf1: 0.787
--- 62.48609185218811 seconds ---
--------- 4000
2020-09-10 18:50:27 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:50:27 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:50:27 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:50:28 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:50:29 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:51:25 lwm-embeddings [INFO] 09/10/2020_18:51:25 -- Epoch: 0/0; Test; loss: 0.725; acc: 0.775; precision: 0.812, recall: 0.718, macrof1: 0.775, weightedf1: 0.775
--- 57.90281414985657 seconds ---
--------- 8000
2020-09-10 18:51:25 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:51:25 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:51:25 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:51:26 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:51:26 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:52:25 lwm-embeddings [INFO] 09/10/2020_18:52:25 -- Epoch: 0/0; Test; loss: 0.871; acc: 0.746; precision: 0.776, recall: 0.690, macrof1: 0.745, weightedf1: 0.745
--- 59.80358624458313 seconds ---
--------- 16000
2020-09-10 18:52:25 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:52:25 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:52:25 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:52:26 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:52:26 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:53:27 lwm-embeddings [INFO] 09/10/2020_18:53:27 -- Epoch: 0/0; Test; loss: 1.314; acc: 0.709; precision: 0.737, recall: 0.652, macrof1: 0.708, weightedf1: 0.708
--- 61.88062405586243 seconds ---
--------- 32000
2020-09-10 18:53:27 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:53:27 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:53:27 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:53:27 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:53:28 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:54:24 lwm-embeddings [INFO] 09/10/2020_18:54:24 -- Epoch: 0/0; Test; loss: 1.387; acc: 0.713; precision: 0.753, recall: 0.634, macrof1: 0.711, weightedf1: 0.711
--- 57.35245490074158 seconds ---
--------- 64000
2020-09-10 18:54:24 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_A.yaml
2020-09-10 18:54:24 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:54:24 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:54:25 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:54:26 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:55:27 lwm-embeddings [INFO] 09/10/2020_18:55:27 -- Epoch: 0/0; Test; loss: 1.594; acc: 0.708; precision: 0.757, recall: 0.613, macrof1: 0.706, weightedf1: 0.706
--- 62.685957193374634 seconds ---
--------- 84000
[92m2020-09-10 18:55:27[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_A.yaml[0m
[92m2020-09-10 18:55:27[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 18:55:27[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 18:55:28[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


length s2:   0%|          | 0/66938 [00:00<?, ?it/s]

[92m2020-09-10 18:55:28[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

HBox(children=(FloatProgress(value=0.0, max=1046.0), HTML(value='')))

[92m2020-09-10 18:56:24[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_18:56:24 -- Epoch: 0/0; Test; loss: 1.829; acc: 0.705; precision: 0.758, recall: 0.603, macrof1: 0.702, weightedf1: 0.702[0m
--- 57.40061807632446 seconds ---
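
Each evaluation above ends with a log line of the form `Epoch: 0/0; Test; loss: ...; acc: ...`. The following is a minimal sketch of how those metrics could be extracted from captured output, assuming the log has been saved as plain text. `parse_test_metrics` and both regular expressions are hypothetical helpers written for this illustration; they are not part of DeezyMatch.

```python
import re

# Matches ANSI colour codes, with or without the leading ESC byte
# (the ESC byte is often lost when notebook output is exported to text).
ANSI_RE = re.compile(r"\x1b?\[[0-9;]+m")
# Matches "name: value" pairs such as "loss: 1.829" or "macrof1: 0.702".
METRIC_RE = re.compile(r"(loss|acc|precision|recall|macrof1|weightedf1): ([0-9.]+)")


def parse_test_metrics(line):
    """Return the metrics reported on a 'Test;' log line as a dict of floats."""
    clean = ANSI_RE.sub("", line)
    if "Test;" not in clean:
        return {}
    return {name: float(value) for name, value in METRIC_RE.findall(clean)}


# Sample line copied from the 84000-example run above.
sample = ("[92m2020-09-10 18:56:24[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m "
          "[1;31m09/10/2020_18:56:24 -- Epoch: 0/0; Test; loss: 1.829; acc: 0.705; "
          "precision: 0.758, recall: 0.603, macrof1: 0.702, weightedf1: 0.702[0m")
metrics = parse_test_metrics(sample)
print(metrics)
```

Lines that do not contain `Test;` (for example the `read input file` messages) yield an empty dict, so the helper can be applied to every line of a log and the empty results filtered out.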


## Fine-Tuned, model A, RNN

In [15]:
from DeezyMatch import inference as dm_inference

# Evaluate each fine-tuned RNN model (model A) on the OCR test set.
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    model_name = f"wikigaz_en_ft_ocr_rnn_v001_n{n_ft_examples}"
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_rnn_model_A.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/{model_name}/{model_name}.model",
                 pretrained_vocab_path=f"./models/{model_name}/{model_name}.vocab")

--------- 250
2020-09-10 18:56:24 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:24 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:25 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:25 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:25 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:30 lwm-embeddings [INFO] 09/10/2020_18:56:30 -- Epoch: 0/0; Test; loss: 0.759; acc: 0.593; precision: 0.587, recall: 0.626, macrof1: 0.593, weightedf1: 0.593
--- 5.469081401824951 seconds ---
--------- 500
2020-09-10 18:56:30 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:30 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:30 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:30 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:30 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:35 lwm-embeddings [INFO] 09/10/2020_18:56:35 -- Epoch: 0/0; Test; loss: 0.636; acc: 0.691; precision: 0.691, recall: 0.691, macrof1: 0.691, weightedf1: 0.691
--- 5.20325493812561 seconds ---
--------- 1000
2020-09-10 18:56:35 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:35 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:35 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:35 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:35 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:40 lwm-embeddings [INFO] 09/10/2020_18:56:40 -- Epoch: 0/0; Test; loss: 0.583; acc: 0.727; precision: 0.716, recall: 0.752, macrof1: 0.727, weightedf1: 0.727
--- 5.0874879360198975 seconds ---
--------- 2000
2020-09-10 18:56:40 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:40 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:40 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:40 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:40 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:46 lwm-embeddings [INFO] 09/10/2020_18:56:46 -- Epoch: 0/0; Test; loss: 0.538; acc: 0.752; precision: 0.725, recall: 0.813, macrof1: 0.751, weightedf1: 0.751
--- 5.983654022216797 seconds ---
--------- 4000
2020-09-10 18:56:46 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:46 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:46 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:47 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:47 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:53 lwm-embeddings [INFO] 09/10/2020_18:56:53 -- Epoch: 0/0; Test; loss: 0.470; acc: 0.783; precision: 0.776, recall: 0.795, macrof1: 0.783, weightedf1: 0.783
--- 6.274932622909546 seconds ---
--------- 8000
2020-09-10 18:56:53 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:53 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:53 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:53 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:53 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:56:59 lwm-embeddings [INFO] 09/10/2020_18:56:59 -- Epoch: 0/0; Test; loss: 0.415; acc: 0.816; precision: 0.796, recall: 0.850, macrof1: 0.816, weightedf1: 0.816
--- 6.120548963546753 seconds ---
--------- 16000
2020-09-10 18:56:59 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:56:59 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:56:59 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:56:59 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:56:59 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:57:05 lwm-embeddings [INFO] 09/10/2020_18:57:05 -- Epoch: 0/0; Test; loss: 0.377; acc: 0.838; precision: 0.802, recall: 0.897, macrof1: 0.837, weightedf1: 0.837
--- 6.093186855316162 seconds ---
--------- 32000
2020-09-10 18:57:05 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:57:05 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:57:05 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:57:05 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:57:05 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:57:11 lwm-embeddings [INFO] 09/10/2020_18:57:11 -- Epoch: 0/0; Test; loss: 0.346; acc: 0.851; precision: 0.845, recall: 0.860, macrof1: 0.851, weightedf1: 0.851
--- 6.3465282917022705 seconds ---
--------- 64000
2020-09-10 18:57:11 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:57:11 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:57:11 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:57:11 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:57:11 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:57:17 lwm-embeddings [INFO] 09/10/2020_18:57:17 -- Epoch: 0/0; Test; loss: 0.325; acc: 0.865; precision: 0.844, recall: 0.894, macrof1: 0.864, weightedf1: 0.864
--- 6.091903209686279 seconds ---
--------- 84000
2020-09-10 18:57:17 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:57:17 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:57:17 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 18:57:17 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 18:57:17 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:57:23 lwm-embeddings [INFO] 09/10/2020_18:57:23 -- Epoch: 0/0; Test; loss: 0.306; acc: 0.873; precision: 0.861, recall: 0.889, macrof1: 0.873, weightedf1: 0.873
--- 6.133760690689087 seconds ---
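
The macro-F1 scores logged above for the RNN model A on the OCR test set can be collected into a small table to see how performance changes with the number of fine-tuning examples. This is only a sketch: the values are copied by hand from the log lines in this section, and the variable names (`ocr_macro_f1`, `total_gain`, and so on) are chosen for this illustration.

```python
# Macro-F1 on ./dataset/ocr_test.txt for RNN model A, copied from the logs above.
# Keys are the number of fine-tuning examples (n_ft_examples).
ocr_macro_f1 = {
    250: 0.593, 500: 0.691, 1000: 0.727, 2000: 0.751, 4000: 0.783,
    8000: 0.816, 16000: 0.837, 32000: 0.851, 64000: 0.864, 84000: 0.873,
}

sizes = sorted(ocr_macro_f1)

# Print one row per training-set size, with the gain over the previous size.
print(f"{'n_ft_examples':>14} {'macro-F1':>9} {'gain':>7}")
previous = None
for n in sizes:
    score = ocr_macro_f1[n]
    gain = "" if previous is None else f"{score - previous:+.3f}"
    print(f"{n:>14} {score:>9.3f} {gain:>7}")
    previous = score

# True if every larger fine-tuning set gives a strictly higher macro-F1.
is_monotonic = all(ocr_macro_f1[a] < ocr_macro_f1[b] for a, b in zip(sizes, sizes[1:]))
# Difference between the largest and smallest fine-tuning set.
total_gain = round(ocr_macro_f1[sizes[-1]] - ocr_macro_f1[sizes[0]], 3)
print("strictly increasing:", is_monotonic, "| total gain:", total_gain)
```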


In [16]:
from DeezyMatch import inference as dm_inference

# Evaluate each fine-tuned RNN model (model A) on the WG:en test set.
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    model_name = f"wikigaz_en_ft_ocr_rnn_v001_n{n_ft_examples}"
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_rnn_model_A.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/{model_name}/{model_name}.model",
                 pretrained_vocab_path=f"./models/{model_name}/{model_name}.vocab")

--------- 250
2020-09-10 18:57:23 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:57:23 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:57:23 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:57:24 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:57:25 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:58:10 lwm-embeddings [INFO] 09/10/2020_18:58:10 -- Epoch: 0/0; Test; loss: 0.320; acc: 0.858; precision: 0.863, recall: 0.851, macrof1: 0.858, weightedf1: 0.858
--- 46.1121187210083 seconds ---
--------- 500
2020-09-10 18:58:10 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:58:10 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:58:10 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:58:10 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:58:11 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:59:04 lwm-embeddings [INFO] 09/10/2020_18:59:04 -- Epoch: 0/0; Test; loss: 0.550; acc: 0.776; precision: 0.805, recall: 0.727, macrof1: 0.775, weightedf1: 0.775
--- 54.828999280929565 seconds ---
--------- 1000
2020-09-10 18:59:04 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:59:04 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:59:04 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 18:59:05 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 18:59:06 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 18:59:59 lwm-embeddings [INFO] 09/10/2020_18:59:59 -- Epoch: 0/0; Test; loss: 0.557; acc: 0.765; precision: 0.784, recall: 0.731, macrof1: 0.764, weightedf1: 0.764
--- 54.77786135673523 seconds ---
--------- 2000
2020-09-10 18:59:59 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 18:59:59 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 18:59:59 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:00:00 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:00:00 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:00:55 lwm-embeddings [INFO] 09/10/2020_19:00:55 -- Epoch: 0/0; Test; loss: 0.583; acc: 0.754; precision: 0.771, recall: 0.723, macrof1: 0.754, weightedf1: 0.754
--- 55.344016551971436 seconds ---
--------- 4000
2020-09-10 19:00:55 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:00:55 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:00:55 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:00:55 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:00:56 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:01:49 lwm-embeddings [INFO] 09/10/2020_19:01:49 -- Epoch: 0/0; Test; loss: 0.758; acc: 0.726; precision: 0.767, recall: 0.650, macrof1: 0.724, weightedf1: 0.724
--- 54.62463927268982 seconds ---
--------- 8000
2020-09-10 19:01:49 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:01:49 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:01:49 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:01:50 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:01:51 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:02:45 lwm-embeddings [INFO] 09/10/2020_19:02:45 -- Epoch: 0/0; Test; loss: 1.096; acc: 0.681; precision: 0.721, recall: 0.592, macrof1: 0.679, weightedf1: 0.679
--- 55.372541189193726 seconds ---
--------- 16000
2020-09-10 19:02:45 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:02:45 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:02:45 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:02:45 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:02:46 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:03:39 lwm-embeddings [INFO] 09/10/2020_19:03:39 -- Epoch: 0/0; Test; loss: 1.301; acc: 0.669; precision: 0.702, recall: 0.587, macrof1: 0.667, weightedf1: 0.667
--- 54.78247356414795 seconds ---
--------- 32000
2020-09-10 19:03:39 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:03:39 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:03:39 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:03:40 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:03:41 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:04:34 lwm-embeddings [INFO] 09/10/2020_19:04:34 -- Epoch: 0/0; Test; loss: 1.453; acc: 0.637; precision: 0.685, recall: 0.507, macrof1: 0.631, weightedf1: 0.631
--- 54.39713954925537 seconds ---
--------- 64000
2020-09-10 19:04:34 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:04:34 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:04:34 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:04:34 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:04:35 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:05:29 lwm-embeddings [INFO] 09/10/2020_19:05:29 -- Epoch: 0/0; Test; loss: 1.530; acc: 0.662; precision: 0.702, recall: 0.564, macrof1: 0.659, weightedf1: 0.659
--- 55.21286368370056 seconds ---
--------- 84000
2020-09-10 19:05:29 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_A.yaml
2020-09-10 19:05:29 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 19:05:29 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 19:05:30 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 19:05:30 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 19:06:24 lwm-embeddings [INFO] 09/10/2020_19:06:24 -- Epoch: 0/0; Test; loss: 1.523; acc: 0.656; precision: 0.702, recall: 0.543, macrof1: 0.652, weightedf1: 0.652
--- 54.83734583854675 seconds ---
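
The two runs in this section evaluate the same fine-tuned RNN models (model A) on the OCR test set and on the WG:en test set. Placing the accuracies side by side shows the trade-off: accuracy on OCR rises with more OCR fine-tuning examples, while accuracy on WG:en falls. The sketch below uses values copied by hand from the logs above; `acc_ocr`, `acc_wg` and `crossover` are names chosen for this illustration, and the comparison is descriptive only.

```python
# Accuracy of RNN model A, copied from the logs in this section.
# Keys are the number of OCR fine-tuning examples (n_ft_examples).
acc_ocr = {250: 0.593, 500: 0.691, 1000: 0.727, 2000: 0.752, 4000: 0.783,
           8000: 0.816, 16000: 0.838, 32000: 0.851, 64000: 0.865, 84000: 0.873}
acc_wg = {250: 0.858, 500: 0.776, 1000: 0.765, 2000: 0.754, 4000: 0.726,
          8000: 0.681, 16000: 0.669, 32000: 0.637, 64000: 0.662, 84000: 0.656}

# Smallest fine-tuning set for which OCR accuracy exceeds WG:en accuracy.
crossover = next((n for n in sorted(acc_ocr) if acc_ocr[n] > acc_wg[n]), None)

for n in sorted(acc_ocr):
    diff = acc_ocr[n] - acc_wg[n]
    print(f"{n:>6}  OCR: {acc_ocr[n]:.3f}  WG:en: {acc_wg[n]:.3f}  diff: {diff:+.3f}")
print("first size where OCR accuracy exceeds WG:en accuracy:", crossover)

# Drop in WG:en accuracy between the smallest and largest fine-tuning set.
wg_drop = round(acc_wg[250] - acc_wg[84000], 3)
print("WG:en accuracy drop from n=250 to n=84000:", wg_drop)
```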


## Fine-Tuned, model B, GRU

In [17]:
from DeezyMatch import inference as dm_inference

# Evaluate each fine-tuned GRU model (model B) on the OCR test set.
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    model_name = f"wikigaz_en_ft_ocr_gru_v002_n{n_ft_examples}"
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_gru_model_B.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/{model_name}/{model_name}.model",
                 pretrained_vocab_path=f"./models/{model_name}/{model_name}.vocab")

s1 padding:   0%|          | 0/8508 [00:00<?, ?it/s]

--------- 250
[92m2020-09-10 21:45:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:45:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:45:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:45:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m
[92m2020-09-10 21:45:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                    

HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:45:50[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:45:50 -- Epoch: 0/0; Test; loss: 0.953; acc: 0.578; precision: 0.567, recall: 0.661, macrof1: 0.575, weightedf1: 0.575[0m
--- 3.2879772186279297 seconds ---
--------- 500
[92m2020-09-10 21:45:50[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:45:50[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:45:50[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:45:50[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:45:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:45:54 -- Epoch: 0/0; Test; loss: 0.753; acc: 0.637; precision: 0.621, recall: 0.701, macrof1: 0.635, weightedf1: 0.635[0m
--- 3.240983009338379 seconds ---
--------- 1000
[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:45:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:45:57 -- Epoch: 0/0; Test; loss: 0.615; acc: 0.730; precision: 0.715, recall: 0.764, macrof1: 0.730, weightedf1: 0.730[0m
--- 3.3224997520446777 seconds ---
--------- 2000
[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:45:57[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:46:00[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:00 -- Epoch: 0/0; Test; loss: 0.506; acc: 0.792; precision: 0.788, recall: 0.799, macrof1: 0.792, weightedf1: 0.792[0m
--- 3.4533793926239014 seconds ---
--------- 4000
[92m2020-09-10 21:46:00[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:00[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:00[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:01[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:01[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:04 -- Epoch: 0/0; Test; loss: 0.397; acc: 0.846; precision: 0.844, recall: 0.849, macrof1: 0.846, weightedf1: 0.846[0m
--- 3.291226863861084 seconds ---
--------- 8000
[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:07 -- Epoch: 0/0; Test; loss: 0.299; acc: 0.886; precision: 0.880, recall: 0.894, macrof1: 0.886, weightedf1: 0.886[0m
--- 3.325597047805786 seconds ---
--------- 16000
[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:07[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




HBox(children=(FloatProgress(value=0.0, max=133.0), HTML(value='')))

[92m2020-09-10 21:46:10[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:10 -- Epoch: 0/0; Test; loss: 0.230; acc: 0.915; precision: 0.907, recall: 0.925, macrof1: 0.915, weightedf1: 0.915[0m
--- 3.320563554763794 seconds ---
--------- 32000
[92m2020-09-10 21:46:10[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:10[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:10[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:10[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:14 -- Epoch: 0/0; Test; loss: 0.175; acc: 0.935; precision: 0.936, recall: 0.934, macrof1: 0.935, weightedf1: 0.935[0m
--- 3.4649336338043213 seconds ---
--------- 64000
[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:14[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:17 -- Epoch: 0/0; Test; loss: 0.146; acc: 0.951; precision: 0.963, recall: 0.939, macrof1: 0.951, weightedf1: 0.951[0m
--- 3.277069330215454 seconds ---
--------- 84000
[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:46:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:46:20[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:20 -- Epoch: 0/0; Test; loss: 0.122; acc: 0.959; precision: 0.951, recall: 0.967, macrof1: 0.959, weightedf1: 0.959[0m
--- 3.261852979660034 seconds ---
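
Each run above ends with one test-summary line. To collect those numbers (for example, to plot them), the metrics can be extracted from the log text with a regular expression. The sketch below is a minimal, hypothetical helper: `parse_metrics` and `METRIC_RE` are not part of DeezyMatch, and the pattern assumes the log format shown above stays unchanged.

```python
import re

# Assumed pattern, based on the test-summary lines printed above:
# "loss: 0.122; acc: 0.959; precision: 0.951, recall: 0.967, ..."
METRIC_RE = re.compile(r"(loss|acc|precision|recall|macrof1|weightedf1): (\d+\.\d+)")


def parse_metrics(log_line):
    """Return a dict mapping metric name to float for one test-summary line."""
    return {name: float(value) for name, value in METRIC_RE.findall(log_line)}


# Summary line copied from the n=84000 run above (colour codes omitted)
line = ("09/10/2020_21:46:20 -- Epoch: 0/0; Test; loss: 0.122; acc: 0.959; "
        "precision: 0.951, recall: 0.967, macrof1: 0.959, weightedf1: 0.959")
metrics = parse_metrics(line)
print(metrics)
```

The helper returns plain floats, so the results of several runs can be stored in a list or dictionary keyed by the number of fine-tuning examples.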


In [18]:
from DeezyMatch import inference as dm_inference

# evaluate GRU model B, fine-tuned on n_ft_examples OCR instances, on the WG:en test set
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_gru_model_B.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_gru_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v002_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_gru_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_gru_v002_n{n_ft_examples}.vocab")

--------- 250
[92m2020-09-10 21:46:20[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:20[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:20[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:46:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:46:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:46:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:46:51 -- Epoch: 0/0; Test; loss: 0.212; acc: 0.913; precision: 0.887, recall: 0.948, macrof1: 0.913, weightedf1: 0.913[0m
--- 30.192748546600342 seconds ---
--------- 500
[92m2020-09-10 21:46:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:46:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:46:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:46:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:46:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:47:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:47:21 -- Epoch: 0/0; Test; loss: 0.247; acc: 0.899; precision: 0.868, recall: 0.941, macrof1: 0.899, weightedf1: 0.899[0m
--- 30.098008632659912 seconds ---
--------- 1000
[92m2020-09-10 21:47:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:47:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:47:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:47:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:47:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:47:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:47:51 -- Epoch: 0/0; Test; loss: 0.337; acc: 0.865; precision: 0.854, recall: 0.881, macrof1: 0.865, weightedf1: 0.865[0m
--- 30.384669303894043 seconds ---
--------- 2000
[92m2020-09-10 21:47:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:47:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:47:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:47:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:47:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:48:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:48:21 -- Epoch: 0/0; Test; loss: 0.454; acc: 0.823; precision: 0.827, recall: 0.816, macrof1: 0.823, weightedf1: 0.823[0m
--- 30.343998193740845 seconds ---
--------- 4000
[92m2020-09-10 21:48:21[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:48:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:48:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:48:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:48:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:48:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:48:52 -- Epoch: 0/0; Test; loss: 0.685; acc: 0.779; precision: 0.800, recall: 0.744, macrof1: 0.778, weightedf1: 0.778[0m
--- 30.623249769210815 seconds ---
--------- 8000
[92m2020-09-10 21:48:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:48:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:48:52[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:48:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:48:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:49:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:49:22 -- Epoch: 0/0; Test; loss: 0.966; acc: 0.735; precision: 0.768, recall: 0.674, macrof1: 0.734, weightedf1: 0.734[0m
--- 30.20074987411499 seconds ---
--------- 16000
[92m2020-09-10 21:49:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:49:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:49:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:49:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:49:24[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:49:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:49:53 -- Epoch: 0/0; Test; loss: 1.193; acc: 0.712; precision: 0.751, recall: 0.633, macrof1: 0.710, weightedf1: 0.710[0m
--- 30.544854164123535 seconds ---
--------- 32000
[92m2020-09-10 21:49:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:49:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:49:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:49:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:49:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:50:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:50:23 -- Epoch: 0/0; Test; loss: 1.727; acc: 0.703; precision: 0.767, recall: 0.584, macrof1: 0.699, weightedf1: 0.699[0m
--- 29.904095888137817 seconds ---
--------- 64000
[92m2020-09-10 21:50:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:50:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:50:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:50:24[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:50:24[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:50:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:50:53 -- Epoch: 0/0; Test; loss: 1.546; acc: 0.722; precision: 0.801, recall: 0.590, macrof1: 0.717, weightedf1: 0.717[0m
--- 30.084099531173706 seconds ---
--------- 84000
[92m2020-09-10 21:50:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_gru_model_B.yaml[0m
[92m2020-09-10 21:50:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:50:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:50:53[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:50:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:51:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:22 -- Epoch: 0/0; Test; loss: 1.654; acc: 0.709; precision: 0.769, recall: 0.598, macrof1: 0.705, weightedf1: 0.705[0m
--- 29.485949993133545 seconds ---
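
The accuracies logged in this cell can be gathered in one place to see how performance on the *WG:en* test set changes with the number of *OCR* fine-tuning examples. The values below are copied by hand from the test-summary lines above (GRU, model B); the variable names are illustrative only.

```python
# Accuracy on the WG:en test set, copied from the log lines above (GRU, model B).
# Keys are the number of OCR examples used for fine-tuning.
wg_en_acc = {
    250: 0.913, 500: 0.899, 1000: 0.865, 2000: 0.823, 4000: 0.779,
    8000: 0.735, 16000: 0.712, 32000: 0.703, 64000: 0.722, 84000: 0.709,
}

# Fine-tuning sizes with the highest and lowest logged accuracy
best_n = max(wg_en_acc, key=wg_en_acc.get)
worst_n = min(wg_en_acc, key=wg_en_acc.get)
drop = round(wg_en_acc[best_n] - wg_en_acc[worst_n], 3)

for n, acc in wg_en_acc.items():
    print(f"{n:>6}  {acc:.3f}")
print("highest:", best_n, "lowest:", worst_n, "difference:", drop)
```

In these logs, accuracy on *WG:en* is highest for the model fine-tuned on the fewest *OCR* examples and falls as more are added, the opposite of the trend seen on the *OCR* test set.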


## Fine-Tuned, model B, LSTM

In [19]:
from DeezyMatch import inference as dm_inference

# evaluate LSTM model B, fine-tuned on n_ft_examples OCR instances, on the OCR test set
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_lstm_model_B.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}.vocab")

--------- 250
[92m2020-09-10 21:51:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:22[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:23[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:26 -- Epoch: 0/0; Test; loss: 1.083; acc: 0.594; precision: 0.581, recall: 0.675, macrof1: 0.591, weightedf1: 0.591[0m
--- 3.68133282661438 seconds ---
--------- 500
[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:26[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:30 -- Epoch: 0/0; Test; loss: 0.882; acc: 0.646; precision: 0.631, recall: 0.703, macrof1: 0.645, weightedf1: 0.645[0m
--- 3.484748601913452 seconds ---
--------- 1000
[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:30[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:33 -- Epoch: 0/0; Test; loss: 0.651; acc: 0.735; precision: 0.728, recall: 0.750, macrof1: 0.735, weightedf1: 0.735[0m
--- 3.5071098804473877 seconds ---
--------- 2000
[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:33[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:37 -- Epoch: 0/0; Test; loss: 0.493; acc: 0.806; precision: 0.799, recall: 0.818, macrof1: 0.806, weightedf1: 0.806[0m
--- 3.550307273864746 seconds ---
--------- 4000
[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m


                                                    

[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m
[92m2020-09-10 21:51:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:40 -- Epoch: 0/0; Test; loss: 0.371; acc: 0.856; precision: 0.864, recall: 0.845, macrof1: 0.856, weightedf1: 0.856[0m
--- 3.6087193489074707 seconds ---
--------- 8000
[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:40[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:44 -- Epoch: 0/0; Test; loss: 0.263; acc: 0.902; precision: 0.886, recall: 0.921, macrof1: 0.901, weightedf1: 0.901[0m
--- 3.484403371810913 seconds ---
--------- 16000
[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:47 -- Epoch: 0/0; Test; loss: 0.205; acc: 0.925; precision: 0.921, recall: 0.929, macrof1: 0.925, weightedf1: 0.925[0m
--- 3.4361681938171387 seconds ---
--------- 32000
[92m2020-09-10 21:51:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m


                                                    

[92m2020-09-10 21:51:47[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m
[92m2020-09-10 21:51:48[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:51 -- Epoch: 0/0; Test; loss: 0.164; acc: 0.944; precision: 0.933, recall: 0.957, macrof1: 0.944, weightedf1: 0.944[0m
--- 3.6248276233673096 seconds ---
--------- 64000
[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:51[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:54 -- Epoch: 0/0; Test; loss: 0.136; acc: 0.956; precision: 0.939, recall: 0.975, macrof1: 0.956, weightedf1: 0.956[0m
--- 3.5730488300323486 seconds ---
--------- 84000
[92m2020-09-10 21:51:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:54[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:55[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/ocr_test.txt[0m
[92m2020-09-10 21:51:55[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 4254 and False: 4254[0m


                                                    

[92m2020-09-10 21:51:55[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m




[92m2020-09-10 21:51:58[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:51:58 -- Epoch: 0/0; Test; loss: 0.115; acc: 0.962; precision: 0.954, recall: 0.970, macrof1: 0.962, weightedf1: 0.962[0m
--- 3.5030529499053955 seconds ---
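
The model and vocabulary paths in these cells follow one naming pattern: a run prefix, the suffix `_n<number of examples>`, and the same name reused for the directory and the file stem. A small helper can build both paths and avoid repeating the long f-strings. This is only a sketch; `model_paths` is a hypothetical function, not part of DeezyMatch, and it assumes the directory layout used in this notebook.

```python
def model_paths(run_prefix, n_ft_examples, models_dir="./models"):
    """Build the (.model, .vocab) paths for one fine-tuned run.

    Assumes the layout used in this notebook:
    <models_dir>/<run_prefix>_n<N>/<run_prefix>_n<N>.{model,vocab}
    """
    run_name = f"{run_prefix}_n{n_ft_examples}"
    base = f"{models_dir}/{run_name}/{run_name}"
    return f"{base}.model", f"{base}.vocab"


model_path, vocab_path = model_paths("wikigaz_en_ft_ocr_lstm_v002", 250)
print(model_path)
print(vocab_path)
```

The two returned strings can then be passed as `pretrained_model_path` and `pretrained_vocab_path` in the inference call.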


In [20]:
from DeezyMatch import inference as dm_inference

# evaluate LSTM model B, fine-tuned on n_ft_examples OCR instances, on the WG:en test set
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_lstm_model_B.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_lstm_v002_n{n_ft_examples}.vocab")

--------- 250
[92m2020-09-10 21:51:58[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:51:58[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:51:58[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:51:59[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:51:59[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:52:31[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:52:31 -- Epoch: 0/0; Test; loss: 0.210; acc: 0.914; precision: 0.903, recall: 0.928, macrof1: 0.914, weightedf1: 0.914[0m
--- 33.10648441314697 seconds ---
--------- 500
[92m2020-09-10 21:52:31[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:52:31[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:52:31[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:52:32[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:52:32[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:53:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:53:04 -- Epoch: 0/0; Test; loss: 0.239; acc: 0.903; precision: 0.894, recall: 0.915, macrof1: 0.903, weightedf1: 0.903[0m
--- 32.95087957382202 seconds ---
--------- 1000
[92m2020-09-10 21:53:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:53:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:53:04[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:53:05[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:53:05[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:53:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:53:37 -- Epoch: 0/0; Test; loss: 0.310; acc: 0.879; precision: 0.888, recall: 0.867, macrof1: 0.879, weightedf1: 0.879[0m
--- 33.19973707199097 seconds ---
--------- 2000
[92m2020-09-10 21:53:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:53:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:53:37[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:53:38[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:53:38[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:54:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:54:11 -- Epoch: 0/0; Test; loss: 0.445; acc: 0.839; precision: 0.855, recall: 0.815, macrof1: 0.838, weightedf1: 0.838[0m
--- 33.24075388908386 seconds ---
--------- 4000
[92m2020-09-10 21:54:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:54:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:54:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:54:11[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:54:12[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:54:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:54:44 -- Epoch: 0/0; Test; loss: 0.633; acc: 0.804; precision: 0.845, recall: 0.744, macrof1: 0.803, weightedf1: 0.803[0m
--- 33.05677127838135 seconds ---
--------- 8000
[92m2020-09-10 21:54:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread input file: ./inputs/input_dfm_lstm_model_B.yaml[0m
[92m2020-09-10 21:54:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mpytorch will use: cuda:1[0m
[92m2020-09-10 21:54:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mread CSV file: ./dataset/wikigaz_en_test.txt[0m
[92m2020-09-10 21:54:44[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;32mnumber of labels, True: 33469 and False: 33469[0m


[92m2020-09-10 21:54:45[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [2;32mskipping 0 lines[0m


                                                                     

[92m2020-09-10 21:55:17[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:55:17 -- Epoch: 0/0; Test; loss: 0.861; acc: 0.770; precision: 0.794, recall: 0.728, macrof1: 0.769, weightedf1: 0.769[0m
--- 32.96851706504822 seconds ---
--------- 16000
2020-09-10 21:55:17 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_B.yaml
2020-09-10 21:55:17 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:55:17 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:55:17 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:55:18 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:55:50 lwm-embeddings [INFO] 09/10/2020_21:55:50 -- Epoch: 0/0; Test; loss: 1.204; acc: 0.743; precision: 0.781, recall: 0.676, macrof1: 0.742, weightedf1: 0.742
--- 33.30704116821289 seconds ---
--------- 32000
2020-09-10 21:55:50 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_B.yaml
2020-09-10 21:55:50 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:55:50 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:55:50 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:55:51 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:56:23 lwm-embeddings [INFO] 09/10/2020_21:56:23 -- Epoch: 0/0; Test; loss: 1.205; acc: 0.735; precision: 0.779, recall: 0.656, macrof1: 0.733, weightedf1: 0.733
--- 32.63356304168701 seconds ---
--------- 64000
2020-09-10 21:56:23 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_B.yaml
2020-09-10 21:56:23 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:56:23 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:56:23 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:56:24 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:56:56 lwm-embeddings [INFO] 09/10/2020_21:56:56 -- Epoch: 0/0; Test; loss: 1.579; acc: 0.716; precision: 0.766, recall: 0.623, macrof1: 0.714, weightedf1: 0.714
--- 33.02063274383545 seconds ---
--------- 84000
2020-09-10 21:56:56 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_lstm_model_B.yaml
2020-09-10 21:56:56 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:56:56 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:56:56 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:56:57 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:29 lwm-embeddings [INFO] 09/10/2020_21:57:29 -- Epoch: 0/0; Test; loss: 1.594; acc: 0.730; precision: 0.787, recall: 0.631, macrof1: 0.727, weightedf1: 0.727
--- 32.9434278011322 seconds ---


## Fine-Tuned, model B, RNN

In [21]:
from DeezyMatch import inference as dm_inference

# Evaluate each fine-tuned RNN model (model B) on the OCR test set.
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_rnn_model_B.yaml",
                 dataset_path="./dataset/ocr_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}.vocab")

--------- 250
2020-09-10 21:57:29 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:29 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:29 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:29 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:29 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:32 lwm-embeddings [INFO] 09/10/2020_21:57:32 -- Epoch: 0/0; Test; loss: 0.840; acc: 0.555; precision: 0.546, recall: 0.652, macrof1: 0.550, weightedf1: 0.550
--- 3.487182378768921 seconds ---
--------- 500
2020-09-10 21:57:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:32 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:32 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:32 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:35 lwm-embeddings [INFO] 09/10/2020_21:57:35 -- Epoch: 0/0; Test; loss: 0.696; acc: 0.590; precision: 0.577, recall: 0.675, macrof1: 0.587, weightedf1: 0.587
--- 3.2782790660858154 seconds ---
--------- 1000
2020-09-10 21:57:35 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:35 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:35 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:35 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:35 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:39 lwm-embeddings [INFO] 09/10/2020_21:57:39 -- Epoch: 0/0; Test; loss: 0.632; acc: 0.693; precision: 0.699, recall: 0.677, macrof1: 0.693, weightedf1: 0.693
--- 3.2303292751312256 seconds ---
--------- 2000
2020-09-10 21:57:39 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:39 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:39 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:39 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:39 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:42 lwm-embeddings [INFO] 09/10/2020_21:57:42 -- Epoch: 0/0; Test; loss: 0.535; acc: 0.767; precision: 0.762, recall: 0.778, macrof1: 0.767, weightedf1: 0.767
--- 3.4540255069732666 seconds ---
--------- 4000
2020-09-10 21:57:42 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:42 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:42 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:42 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:42 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:45 lwm-embeddings [INFO] 09/10/2020_21:57:45 -- Epoch: 0/0; Test; loss: 0.447; acc: 0.803; precision: 0.803, recall: 0.804, macrof1: 0.803, weightedf1: 0.803
--- 3.2438724040985107 seconds ---
--------- 8000
2020-09-10 21:57:45 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:45 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:45 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:45 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:45 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:49 lwm-embeddings [INFO] 09/10/2020_21:57:49 -- Epoch: 0/0; Test; loss: 0.367; acc: 0.845; precision: 0.826, recall: 0.875, macrof1: 0.845, weightedf1: 0.845
--- 3.2652595043182373 seconds ---
--------- 16000
2020-09-10 21:57:49 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:49 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:49 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:49 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:49 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:52 lwm-embeddings [INFO] 09/10/2020_21:57:52 -- Epoch: 0/0; Test; loss: 0.304; acc: 0.882; precision: 0.880, recall: 0.885, macrof1: 0.882, weightedf1: 0.882
--- 3.279560089111328 seconds ---
--------- 32000
2020-09-10 21:57:52 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:52 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:52 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:52 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:52 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:55 lwm-embeddings [INFO] 09/10/2020_21:57:55 -- Epoch: 0/0; Test; loss: 0.236; acc: 0.906; precision: 0.900, recall: 0.913, macrof1: 0.906, weightedf1: 0.906
--- 3.399199962615967 seconds ---
--------- 64000
2020-09-10 21:57:55 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:55 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:55 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:55 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:55 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:57:59 lwm-embeddings [INFO] 09/10/2020_21:57:59 -- Epoch: 0/0; Test; loss: 0.199; acc: 0.923; precision: 0.911, recall: 0.937, macrof1: 0.923, weightedf1: 0.923
--- 3.2592015266418457 seconds ---
--------- 84000
2020-09-10 21:57:59 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:57:59 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:57:59 lwm-embeddings [INFO] read CSV file: ./dataset/ocr_test.txt
2020-09-10 21:57:59 lwm-embeddings [INFO] number of labels, True: 4254 and False: 4254
2020-09-10 21:57:59 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:58:02 lwm-embeddings [INFO] 09/10/2020_21:58:02 -- Epoch: 0/0; Test; loss: 0.169; acc: 0.937; precision: 0.929, recall: 0.947, macrof1: 0.937, weightedf1: 0.937
--- 3.2660605907440186 seconds ---
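
Each run above ends with a one-line metric summary. The sketch below shows one way to collect those summaries into a dictionary keyed by the number of fine-tuning examples, assuming the log format is exactly as printed in this notebook. The names `parse_metrics` and `collect_runs` are hypothetical helpers, not part of DeezyMatch.

```python
import re

# Metric fields as they appear in the summary line of each run.
METRIC_RE = re.compile(r"(loss|acc|precision|recall|macrof1|weightedf1): (\d+\.\d+)")
# Colour codes such as "[92m" or "[0m", with or without the leading ESC byte.
ANSI_RE = re.compile(r"(?:\x1b)?\[[0-9;]*m")
# Marker printed by the loop before each run, e.g. "--------- 250".
MARKER_RE = re.compile(r"-+ (\d+)")


def parse_metrics(line):
    """Return {metric_name: value} for one log line (empty dict if none found)."""
    clean = ANSI_RE.sub("", line)
    return {name: float(value) for name, value in METRIC_RE.findall(clean)}


def collect_runs(log_text):
    """Map each n_ft_examples marker to the metrics reported for that run."""
    results = {}
    current = None
    for line in log_text.splitlines():
        marker = MARKER_RE.fullmatch(line.strip())
        if marker:
            current = int(marker.group(1))
            continue
        metrics = parse_metrics(line)
        if metrics and current is not None:
            results[current] = metrics
    return results


# Two summary lines copied from the output above.
sample = """--------- 250
[92m2020-09-10 21:57:32[0m [95mlwm-embeddings[0m [1m[90m[INFO][0m [1;31m09/10/2020_21:57:32 -- Epoch: 0/0; Test; loss: 0.840; acc: 0.555; precision: 0.546, recall: 0.652, macrof1: 0.550, weightedf1: 0.550[0m
--- 3.487182378768921 seconds ---
--------- 84000
2020-09-10 21:58:02 lwm-embeddings [INFO] 09/10/2020_21:58:02 -- Epoch: 0/0; Test; loss: 0.169; acc: 0.937; precision: 0.929, recall: 0.947, macrof1: 0.937, weightedf1: 0.937
--- 3.2660605907440186 seconds ---
"""
runs = collect_runs(sample)
print(runs[250]["acc"], runs[84000]["acc"])
```

The timing lines (`--- ... seconds ---`) are ignored because they do not match the marker pattern, which expects only dashes followed by an integer.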


In [22]:
from DeezyMatch import inference as dm_inference

# Evaluate the same fine-tuned RNN models (model B) on the WG:en test set.
for n_ft_examples in [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]:
    print("---------", n_ft_examples)
    # model inference using a model stored at pretrained_model_path and pretrained_vocab_path
    dm_inference(input_file_path="./inputs/input_dfm_rnn_model_B.yaml",
                 dataset_path="./dataset/wikigaz_en_test.txt",
                 pretrained_model_path=f"./models/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}.model",
                 pretrained_vocab_path=f"./models/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}/wikigaz_en_ft_ocr_rnn_v002_n{n_ft_examples}.vocab")

--------- 250
2020-09-10 21:58:02 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:58:02 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:58:02 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:58:02 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:58:03 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:58:32 lwm-embeddings [INFO] 09/10/2020_21:58:32 -- Epoch: 0/0; Test; loss: 0.264; acc: 0.891; precision: 0.877, recall: 0.910, macrof1: 0.891, weightedf1: 0.891
--- 30.13415813446045 seconds ---
--------- 500
2020-09-10 21:58:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:58:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:58:32 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:58:32 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:58:33 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:59:02 lwm-embeddings [INFO] 09/10/2020_21:59:02 -- Epoch: 0/0; Test; loss: 0.308; acc: 0.877; precision: 0.871, recall: 0.885, macrof1: 0.877, weightedf1: 0.877
--- 30.00017547607422 seconds ---
--------- 1000
2020-09-10 21:59:02 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:59:02 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:59:02 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:59:03 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:59:03 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 21:59:32 lwm-embeddings [INFO] 09/10/2020_21:59:32 -- Epoch: 0/0; Test; loss: 0.373; acc: 0.822; precision: 0.865, recall: 0.762, macrof1: 0.821, weightedf1: 0.821
--- 30.187791109085083 seconds ---
--------- 2000
2020-09-10 21:59:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 21:59:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 21:59:32 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 21:59:33 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 21:59:33 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:00:02 lwm-embeddings [INFO] 09/10/2020_22:00:02 -- Epoch: 0/0; Test; loss: 0.595; acc: 0.763; precision: 0.806, recall: 0.692, macrof1: 0.762, weightedf1: 0.762
--- 29.946187496185303 seconds ---
--------- 4000
2020-09-10 22:00:02 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:00:02 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:00:02 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:00:03 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:00:04 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:00:32 lwm-embeddings [INFO] 09/10/2020_22:00:32 -- Epoch: 0/0; Test; loss: 0.660; acc: 0.752; precision: 0.791, recall: 0.684, macrof1: 0.751, weightedf1: 0.751
--- 30.19029951095581 seconds ---
--------- 8000
2020-09-10 22:00:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:00:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:00:32 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:00:33 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:00:34 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:01:02 lwm-embeddings [INFO] 09/10/2020_22:01:02 -- Epoch: 0/0; Test; loss: 0.844; acc: 0.728; precision: 0.757, recall: 0.670, macrof1: 0.727, weightedf1: 0.727
--- 29.614692211151123 seconds ---
--------- 16000
2020-09-10 22:01:02 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:01:02 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:01:02 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:01:03 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:01:03 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:01:32 lwm-embeddings [INFO] 09/10/2020_22:01:32 -- Epoch: 0/0; Test; loss: 1.408; acc: 0.698; precision: 0.747, recall: 0.598, macrof1: 0.695, weightedf1: 0.695
--- 29.56263279914856 seconds ---
--------- 32000
2020-09-10 22:01:32 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:01:32 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:01:32 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:01:32 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:01:33 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:02:01 lwm-embeddings [INFO] 09/10/2020_22:02:01 -- Epoch: 0/0; Test; loss: 1.185; acc: 0.699; precision: 0.744, recall: 0.605, macrof1: 0.696, weightedf1: 0.696
--- 29.859045267105103 seconds ---
--------- 64000
2020-09-10 22:02:01 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:02:01 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:02:01 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:02:02 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:02:03 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:02:31 lwm-embeddings [INFO] 09/10/2020_22:02:31 -- Epoch: 0/0; Test; loss: 1.259; acc: 0.697; precision: 0.733, recall: 0.621, macrof1: 0.695, weightedf1: 0.695
--- 30.01087999343872 seconds ---
--------- 84000
2020-09-10 22:02:31 lwm-embeddings [INFO] read input file: ./inputs/input_dfm_rnn_model_B.yaml
2020-09-10 22:02:31 lwm-embeddings [INFO] pytorch will use: cuda:1
2020-09-10 22:02:31 lwm-embeddings [INFO] read CSV file: ./dataset/wikigaz_en_test.txt
2020-09-10 22:02:32 lwm-embeddings [INFO] number of labels, True: 33469 and False: 33469
2020-09-10 22:02:33 lwm-embeddings [INFO] skipping 0 lines
2020-09-10 22:03:01 lwm-embeddings [INFO] 09/10/2020_22:03:01 -- Epoch: 0/0; Test; loss: 1.408; acc: 0.695; precision: 0.756, recall: 0.575, macrof1: 0.690, weightedf1: 0.690
--- 29.845705032348633 seconds ---
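
The two cells above evaluate the same fine-tuned RNN models (model B) on the *OCR* and *WG:en* test sets. As a rough aid for reading the two series side by side, the sketch below tabulates the accuracies copied by hand from the log output above. The helper name `summarise` is hypothetical, and the script only rearranges the reported numbers.

```python
# Accuracy values copied from the log output above (model B, RNN).
n_ft_examples = [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000, 84000]
acc_ocr = [0.555, 0.590, 0.693, 0.767, 0.803, 0.845, 0.882, 0.906, 0.923, 0.937]
acc_wg_en = [0.891, 0.877, 0.822, 0.763, 0.752, 0.728, 0.698, 0.699, 0.697, 0.695]


def summarise(ns, ocr, wg):
    """Return (n, acc on OCR, acc on WG:en, OCR minus WG:en) for each model."""
    return [(n, a_ocr, a_wg, round(a_ocr - a_wg, 3))
            for n, a_ocr, a_wg in zip(ns, ocr, wg)]


rows = summarise(n_ft_examples, acc_ocr, acc_wg_en)

print(f"{'n_ft':>6} {'OCR':>6} {'WG:en':>6} {'diff':>7}")
for n, a_ocr, a_wg, diff in rows:
    print(f"{n:>6} {a_ocr:>6.3f} {a_wg:>6.3f} {diff:>+7.3f}")

# Smallest fine-tuning set for which accuracy on OCR exceeds accuracy on WG:en.
crossover = next(n for n, _, _, diff in rows if diff > 0)
print("crossover at n_ft_examples =", crossover)
```

In these runs, accuracy on the *OCR* test set rises from 0.555 to 0.937 as more fine-tuning examples are used, while accuracy on the *WG:en* test set falls from 0.891 to 0.695.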
