# 1. Getting the Data

First we will scrape policies from the gov.ie website.

In your command line, ``cd`` into this repository.

``cd`` into the ``policy_scraping`` task directory, then ``cd`` again into the ``policy_scraping`` scrapy environment.

In [1]:
import os
cwd = os.getcwd() # should be base directory of repository
os.chdir(cwd+"/policy_scraping/policy_scraping")

Run ``scrapy crawl goviefor -O ../outputs/goviefor.json`` (change the ``-O`` argument if you would prefer a different output file).

This command generates a JSON file containing the metadata for all the policies, and downloads all the files to the same outputs directory under ``forestry/full``.

In [None]:
!! scrapy crawl goviefor -O ../outputs/goviefor.json

Next we will consolidate the metadata and text of the policy PDFs into one dictionary.

In [2]:
os.chdir(cwd) # back to base directory
import json
from populate_corpora.pdfs_to_jsons import scrp_itm_to_fulltxt
FILE_DIR= cwd+"/policy_scraping/policy_scraping/outputs" # or whatever output directory you gave the scraper for its output json

In [None]:
with open(cwd+"/policy_scraping/outputs/goviefor.json","r", encoding="utf-8") as f:
    metad = json.load(f)
pdf_dict = scrp_itm_to_fulltxt(metad, FILE_DIR+"/forestry/full")

If you have your own collection of PDFs to process and no metadata file, you can run this next function on the file directory alone.

In [None]:
from populate_corpora.pdfs_to_jsons import pdfs_to_txt_dct
pdf_dict = pdfs_to_txt_dct(FILE_DIR+"/forestry/full") # or whatever your policy directory is

For the purposes of this project, we only need the text of the PDFs as cleaned sentences. So we'll extract and clean those sentences, then load them into the dictionary format that doccano (our labeling platform) uses. Finally, if we want, we can use a simple keyword search to prelabel some of the sentences with an "incentive class mention" label.

In [None]:
import nltk
from populate_corpora.data_cleaning import get_clean_text_sents, format_sents_for_doccano, prelabeling
EN_TOKENIZER = nltk.data.load("tokenizers/punkt/english.pickle") # need tokenizer for our text cleaning
clean_sents= get_clean_text_sents(pdf_dict, EN_TOKENIZER)
doccano_dict = format_sents_for_doccano(clean_sents)
prelab_doccano_dict = prelabeling(doccano_dict)
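
The prelabeling step can be pictured as a plain keyword match. The sketch below is not the repository's `prelabeling` function: the keyword list, the helper name `prelabel_by_keyword` and the example sentences are illustrative assumptions, and the records only mimic the `text`/`label` layout used for doccano later in this notebook.

```python
import re

# Hypothetical keyword list; the project's real keywords are not shown in this notebook.
KEYWORDS = ["grant", "payment", "tax relief", "penalty"]
LABEL = "incentive class mention"


def prelabel_by_keyword(records, keywords=KEYWORDS, label=LABEL):
    """Return copies of doccano-style records, tagging those that mention a keyword."""
    # One case-insensitive pattern with word boundaries, so "grant" does not match "granted".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    labeled = []
    for rec in records:
        labels = list(rec.get("label", []))
        if pattern.search(rec["text"]) and label not in labels:
            labels.append(label)
        labeled.append({"text": rec["text"], "label": labels})
    return labeled


example = [
    {"text": "A grant of up to 500 euro per hectare is available.", "label": []},
    {"text": "The forest covers most of the county.", "label": []},
]
prelabeled = prelabel_by_keyword(example)
for rec in prelabeled:
    print(rec["label"], rec["text"])
```

A keyword match like this is only a starting point for the human labeler, since it tags mentions rather than confirmed incentives.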

Now we can save this dictionary as a JSON file to import into our doccano instance for labeling.

In [None]:
with open(cwd+"/populate_corpora/outputs/ready_to_label.json", 'w', encoding="utf-8") as outfile:
    json.dump(prelab_doccano_dict, outfile, ensure_ascii=False, indent=4)

# 2. Labeling the Data

## Augmentation via Sentence Similarity Search

We also need to build a new human-in-the-loop dataset by running sentence similarity searches with predefined queries. We have five queries for each label.

In [None]:
with open(cwd+"/populate_corpora/outputs/ready_to_label.json","r", encoding="utf-8") as f:
    prelab_doccano_dict = json.load(f)

In [None]:
from populate_corpora.query_augment import run_embedder, run_queries, QUERIES_DCT
from populate_corpora.data_cleaning import dcno_to_only_sents

# loading all sentences, not just the labeled ones
# or reload cwd+"/populate_corpora/outputs/ready_to_label.json"
all_sents = dcno_to_only_sents(prelab_doccano_dict) 
embs, s_sentences, model = run_embedder(sample=False, dev='cuda', data=all_sents, unique=True)
# uses our queries dictionary, but you can supply your own
qry_dct = run_queries(embs, s_sentences, model, qry_dct=QUERIES_DCT, dev='cuda', sim_thresh=0.5, res_lim=1000)
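
To make the roles of `sim_thresh` and `res_lim` concrete, here is a minimal sketch of a thresholded cosine-similarity search. It is an assumption about the general technique, not the code behind `run_queries`: the three-dimensional toy vectors stand in for real sentence embeddings, and the helper names `cosine` and `search` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, sent_vecs, sentences, sim_thresh=0.5, res_lim=1000):
    """Return up to res_lim (score, sentence) pairs scoring at least sim_thresh, best first."""
    scored = [(cosine(query_vec, vec), sent) for vec, sent in zip(sent_vecs, sentences)]
    hits = [(score, sent) for score, sent in scored if score >= sim_thresh]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:res_lim]


# Toy data: in the real pipeline the vectors would come from a sentence encoder.
toy_sentences = ["grants for planting", "fines for illegal felling", "annual report"]
toy_vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
toy_query = [1.0, 0.0, 0.0]

results = search(toy_query, toy_vectors, toy_sentences)
for score, sent in results:
    print(f"{score:.3f}  {sent}")
```

Whether the real threshold comparison is inclusive is not stated in this notebook; the sketch uses `>=`.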


Now we'll parse the results and create a dataset of sentences labeled by the query process. First, though, we filter the results to keep only sentences found by at least 4 of the 5 queries for each label.

In [None]:
from populate_corpora.query_augment import consolidate_sents, crossref_sents
lbl_qry_dct = consolidate_sents(qry_dct, QUERIES_DCT)
filt_qry_dct = crossref_sents(lbl_qry_dct, 4)
qry_rs_dataset = [{'text': sent, 'label': lbl} for lbl in list(filt_qry_dct) for sent in filt_qry_dct[lbl]]
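
The "at least 4 of 5 queries" filter amounts to counting, per label, how many query result lists contain each sentence. The sketch below illustrates that idea under assumptions: `keep_if_found_by` is a hypothetical helper, not `crossref_sents`, and the input is assumed to be one list of hits per query.

```python
from collections import Counter


def keep_if_found_by(query_hits, min_queries=4):
    """Keep sentences that appear in at least min_queries of the per-query hit lists.

    query_hits: a list with one list of sentences per query, all for ONE label.
    """
    counts = Counter()
    for hits in query_hits:
        counts.update(set(hits))  # count each sentence at most once per query
    return sorted(sent for sent, n in counts.items() if n >= min_queries)


# Toy results for the five queries of a single label.
hits_for_label = [
    ["s1", "s2", "s3"],
    ["s1", "s2"],
    ["s1", "s3"],
    ["s1", "s2"],
    ["s1", "s2", "s4"],
]
kept = keep_if_found_by(hits_for_label, min_queries=4)
print(kept)
```

Raising `min_queries` trades recall for precision: fewer sentences survive, but each one was retrieved by more independent phrasings of the label.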

In [None]:
with open(cwd+"/populate_corpora/outputs/augmented_to_label.json", 'w', encoding="utf-8") as outfile:
    json.dump(qry_rs_dataset, outfile, ensure_ascii=False, indent=4)

## External Annotation

We used a doccano instance for our labeling, but we also had to do some data validation with an external annotator. This section generates a subset for a labeler from the hand-labeled dataset.

In [5]:
from populate_corpora.annotators import resample_forannot
from populate_corpora.data_cleaning import dcno_to_sentlab, remove_duplicates, group_duplicates
with open(cwd+"/inputs/19Jan25_firstdatarev.json","r", encoding="utf-8") as f:
    dcno_json = json.load(f)
with open(cwd+"/inputs/27Jan25_query_checked.json","r", encoding="utf-8") as f:
    qry_json = json.load(f)
sents1, labels1 = dcno_to_sentlab(dcno_json)
sents2, labels2 = dcno_to_sentlab(qry_json)
sents3 = sents1+sents2
labels3 = labels1+labels2
all_sents, all_labs = remove_duplicates(group_duplicates(sents3,labels3,thresh=90))

1419 groups found with a threshold of 90
Sanity check: 1419 sentences and 1419 labels
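
The grouping step with `thresh=90` can be read as fuzzy deduplication: sentences whose similarity score is 90 or more on a 0-100 scale land in one group, and one sentence per group is kept. The sketch below shows that idea with the standard library's `difflib`. The similarity measure, the greedy grouping and the majority-label rule are all assumptions; the repository's `group_duplicates` and `remove_duplicates` may work differently.

```python
from collections import Counter
from difflib import SequenceMatcher


def group_near_duplicates(sentences, labels, thresh=90):
    """Greedily group sentences whose similarity to a group's first member is >= thresh (0-100)."""
    groups = []  # each group is a list of (sentence, label) pairs
    for sent, lab in zip(sentences, labels):
        for group in groups:
            rep = group[0][0]
            score = SequenceMatcher(None, sent.lower(), rep.lower()).ratio() * 100
            if score >= thresh:
                group.append((sent, lab))
                break
        else:  # no existing group was close enough
            groups.append([(sent, lab)])
    return groups


def one_per_group(groups):
    """Keep the first sentence of each group together with the group's most common label."""
    sents, labs = [], []
    for group in groups:
        sents.append(group[0][0])
        labs.append(Counter(lab for _, lab in group).most_common(1)[0][0])
    return sents, labs


sents = [
    "Grants are available for new planting.",
    "Grants are available for new planting",
    "Felling without a licence is an offence.",
]
labs = ["Direct_payment", "Direct_payment", "Fine"]
unique_sents, unique_labs = one_per_group(group_near_duplicates(sents, labs))
print(len(unique_sents), "sentences and", len(unique_labs), "labels")
```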


In [10]:
ann_sents, ann_labels = resample_forannot(all_sents, all_labs, 0.3, 0.5)
print(round(len(ann_sents)/len(all_sents), 3))

Counter({'Non-Incentive': 1150, 'Supplies': 81, 'Technical_assistance': 75, 'Direct_payment': 62, 'Fine': 23, 'Credit': 19, 'Tax_deduction': 9})
Counter({'Supplies': 81, 'Technical_assistance': 75, 'Direct_payment': 62, 'Fine': 23, 'Credit': 19, 'Tax_deduction': 9})
Counter({'Supplies': 24, 'Technical_assistance': 22, 'Direct_payment': 19, 'Fine': 7, 'Credit': 6, 'Tax_deduction': 3})
Counter({'Non-Incentive': 81, 'Supplies': 24, 'Technical_assistance': 22, 'Direct_payment': 19, 'Fine': 7, 'Credit': 6, 'Tax_deduction': 3})
Should be true: True
162 162
0.114
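
The counters printed above are consistent with drawing about 30% of each incentive class (81 sentences in total) and then adding an equal number of non-incentive sentences, so that they make up half of the 162-sentence sample (162/1419 ≈ 0.114). A minimal sketch of that kind of resampling follows. It is an interpretation of the output, not the body of `resample_forannot`; the function name, the seed handling and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def subsample_for_annotation(sentences, labels, frac=0.3, neg_share=0.5,
                             neg_label="Non-Incentive", seed=0):
    """Sample `frac` of every positive class, then add negatives up to `neg_share` of the total.

    Assumes 0 <= neg_share < 1.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sent, lab in zip(sentences, labels):
        by_label[lab].append(sent)

    picked = []
    for lab, sents in by_label.items():
        if lab == neg_label:
            continue
        k = round(len(sents) * frac)
        picked += [(s, lab) for s in rng.sample(sents, k)]

    # Number of negatives needed so that they form `neg_share` of the final sample.
    n_neg = round(len(picked) * neg_share / (1 - neg_share))
    n_neg = min(n_neg, len(by_label[neg_label]))
    picked += [(s, neg_label) for s in rng.sample(by_label[neg_label], n_neg)]

    rng.shuffle(picked)
    return [s for s, _ in picked], [lab for _, lab in picked]


# Toy data, not the project's dataset.
toy_sents = [f"sentence {i}" for i in range(130)]
toy_labels = ["Supplies"] * 10 + ["Fine"] * 20 + ["Non-Incentive"] * 100
ann_s, ann_l = subsample_for_annotation(toy_sents, toy_labels, frac=0.5, neg_share=0.5)
print(len(ann_s), len(ann_l))
```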


In [13]:
import random
ann_frame = [{'text':ann_sents[i], 'label':[]} for i in range(len(ann_labels))]
random.shuffle(ann_frame)
with open(cwd+"/inputs/subsample.json", 'w', encoding="utf-8") as outfile:
    json.dump(ann_frame, outfile, ensure_ascii=False, indent=4)
val_frame = [{'text':ann_sents[i], 'label':ann_labels[i]} for i in range(len(ann_labels))]
with open(cwd+"/inputs/subsample_key.json", 'w', encoding="utf-8") as outfile:
    json.dump(val_frame, outfile, ensure_ascii=False, indent=4)

Now let's check the inter-annotator agreement.

In [None]:
with open(cwd+"/inputs/annotation_odon.json","r", encoding="utf-8") as f: # the external annotator's labels
    ann_json = json.load(f)

sents_a, labels_a = dcno_to_sentlab(ann_json)
# map the annotator's label spellings onto the label names used in our dataset
swap_labs = {'non-incentive':'Non-Incentive', 'fine':'Fine', 'tax deduction':'Tax_deduction', 'credit':'Credit', 'direct payment':'Direct_payment', 'supplies':'Supplies', 'technical assistance':'Technical_assistance'}
sents_a2, labels_a2 = [], []
for sent, lab in zip(sents_a, labels_a):
    try:
        new_lab = swap_labs[lab]
    except (KeyError, TypeError):  # unknown or unhashable label: skip this sentence
        continue
    labels_a2.append(new_lab)
    sents_a2.append(sent)

In [None]:
from populate_corpora.annotators import get_common_sentlabs, all_to_bin, all_to_sharedmc
from sklearn.metrics import cohen_kappa_score

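# sents_d and labels_d are the hand-labeled sentences and labels
# (the same data as sents1 and labels1 above; they are assigned in the consolidation cell below)
s_sents, labels_sc, labels_sa = get_common_sentlabs(sents_d, labels_d, sents_a2, labels_a2)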
print(f"All: {cohen_kappa_score(labels_sc, labels_sa)} for {len(labels_sc)} entries")

labs_binc, labs_bina = all_to_bin(labels_sc), all_to_bin(labels_sa)
print(f"Binary: {cohen_kappa_score(labs_binc, labs_bina)} for {len(labs_binc)} entries")

mclabsc, mclaba = all_to_sharedmc(labels_sc, labels_sa, labs_binc, labs_bina)
print(f"Multiclass: {cohen_kappa_score(mclabsc, mclaba)} for {len(mclabsc)} entries")

All: 0.7707100591715976 for 62 entries
Binary: 0.7114788004136505 for 62 entries
Multiclass: 0.9534883720930233 for 26 entries
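
Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from their label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it by hand on toy labels so the numbers above are easier to read. It is a plain-Python illustration and not a replacement for `cohen_kappa_score`; the helper `to_binary` is a hypothetical stand-in for what `all_to_bin` presumably does.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return float("nan")  # both annotators used a single identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def to_binary(labels, neg_label="Non-Incentive"):
    """Collapse every incentive class into one 'Incentive' label."""
    return [neg_label if lab == neg_label else "Incentive" for lab in labels]


# Toy annotations, not the project's data.
ours = ["Fine", "Fine", "Credit", "Non-Incentive"]
theirs = ["Fine", "Credit", "Credit", "Non-Incentive"]

kappa_all = cohen_kappa(ours, theirs)
kappa_bin = cohen_kappa(to_binary(ours), to_binary(theirs))
print(f"All: {kappa_all:.3f}  Binary: {kappa_bin:.3f}")
```

In the toy data the annotators disagree only about which incentive class applies, so the binary kappa is higher than the all-label kappa.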


## Consolidation into Final Dataset

We now have all of our data labeled, so it is time to create a final dataset broken into training and testing sets.

In [3]:
from populate_corpora.data_cleaning import dcno_to_sentlab
from classifier.run_classifiers import group_duplicates, remove_duplicates

with open(cwd+"/inputs/19Jan25_firstdatarev.json","r", encoding="utf-8") as f: #our hand-labeled dataset
    dcno_json = json.load(f)
with open(cwd+"/inputs/27Jan25_query_checked.json","r", encoding="utf-8") as f: #our human-in-the-loop dataset
    aug_json = json.load(f)

sents_d, labels_d = dcno_to_sentlab(dcno_json)
sents_a, labels_a = dcno_to_sentlab(aug_json)

all_sents = sents_d+sents_a
all_labs = labels_d+labels_a
sentences, labels = remove_duplicates(group_duplicates(all_sents,all_labs,thresh=90))

1419 groups found with a threshold of 90
Sanity check: 1419 sentences and 1419 labels


# 3. Fine-Tuning Our Models

First we will construct and save a few different splits of our data into DatasetDicts containing Training, Testing, and Holdout sets.

In [4]:
from classifier.finetune import load_labelintdcts, create_dsdict
int2label_dct, label2int_dct = load_labelintdcts()
sims = [0,3,6,9]
create_dsdict(sentences, labels, label2int_dct, amt=sims, save=True, output_dir=f"{cwd}/outputs/models")

Sanity Check: 269 incentive sentences and 1150 non-incentive sentences
Incentives: 0.18957011980267793; Non-Incentives: 0.8104298801973221
Sanity Check: 269 incentive sentences and 269 incentive labels

Round 0



Saving the dataset (0/1 shards):   0%|          | 0/851 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/284 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/284 [00:00<?, ? examples/s]

Saved ds_0_bn


Saving the dataset (0/1 shards):   0%|          | 0/161 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/54 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/54 [00:00<?, ? examples/s]

Saved ds_0_mc

Round 3



Saved ds_3_bn


Saved ds_3_mc

Round 6



Saved ds_6_bn


Saved ds_6_mc

Round 9



Saved ds_9_bn


Saved ds_9_mc
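
The split sizes in the output above (851/284/284 for the binary data and 161/54/54 for the multiclass data) match a 60/20/20 partition of 1419 and 269 sentences. The following sketch reproduces those sizes with a seeded shuffle. It is an assumption about the general approach, not the code of `create_dsdict`: the real function builds `DatasetDict` objects and may stratify by label, which this sketch does not.

```python
import random


def three_way_split(items, test_frac=0.2, holdout_frac=0.2, seed=0):
    """Shuffle with a fixed seed and cut into train / test / holdout parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_hold = round(len(shuffled) * holdout_frac)
    return {
        "train": shuffled[n_test + n_hold:],
        "test": shuffled[:n_test],
        "holdout": shuffled[n_test:n_test + n_hold],
    }


# One split per seed, mirroring sims = [0, 3, 6, 9]; integers stand in for sentences.
splits = {seed: three_way_split(range(1419), seed=seed) for seed in [0, 3, 6, 9]}
sizes = {name: len(part) for name, part in splits[0].items()}
print(sizes)
```

Using a different seed for each round gives several independent partitions of the same data, so later results can be averaged across rounds instead of resting on a single split.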


In [3]:
import torch
from classifier.finetune import finetune_automodel
from datasets import DatasetDict

Now we run the fine-tuning, updating the whole model (``only_head=False``).

In [6]:
for e in sims:
    bn_ds = DatasetDict.load_from_disk(f"{cwd}/outputs/models/ds_{e}_bn")
    mc_ds = DatasetDict.load_from_disk(f"{cwd}/outputs/models/ds_{e}_mc")
    for model in ["sentence-transformers/paraphrase-xlm-r-multilingual-v1"]:
        torch.cuda.empty_cache()
        finetune_automodel(bn_ds, int2label_dct["bn"], label2int_dct["bn"], "bn", model_name=model, dev='cuda', rstate=e, output_dir=f"{cwd}/outputs/models", only_head=False)
        print(f"\nSaved {model} binary model.")
        torch.cuda.empty_cache()
        finetune_automodel(mc_ds, int2label_dct["mc"], label2int_dct["mc"], "mc", model_name=model, dev='cuda', rstate=e, output_dir=f"{cwd}/outputs/models", only_head=False)
        print(f"\nSaved {model} multiclass model.")


Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Map:   0%|          | 0/851 [00:00<?, ? examples/s]

Map:   0%|          | 0/284 [00:00<?, ? examples/s]

Map:   0%|          | 0/284 [00:00<?, ? examples/s]

Loading model


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/paraphrase-xlm-r-multilingual-v1 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.343813,0.809859,0.894942
2,No log,0.283199,0.852113,0.903226
3,No log,0.305607,0.883803,0.928105
4,No log,0.404171,0.897887,0.937634
5,No log,0.486685,0.897887,0.938689
6,No log,0.456393,0.90493,0.941935
7,No log,0.496907,0.90493,0.942431
8,No log,0.492746,0.90493,0.942184
9,No log,0.521452,0.90493,0.942431
10,0.100400,0.52729,0.90493,0.942431


Saving


{'accuracy': 0.778169014084507, 'f1': 0.8496420047732696}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r0.

Done in 8.78 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.574677,0.462963,0.411354
2,No log,1.402695,0.555556,0.491266
3,No log,1.245372,0.685185,0.612842
4,No log,1.061631,0.703704,0.64986
5,No log,0.896235,0.722222,0.676935
6,No log,0.793943,0.777778,0.754321
7,No log,0.722086,0.777778,0.751488
8,No log,0.683001,0.814815,0.795944
9,No log,0.659451,0.814815,0.796035
10,No log,0.653398,0.814815,0.796035


Saving


{'accuracy': 0.8333333333333334, 'f1': 0.819518257597605}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r0.

Done in 4.34 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.368379,0.81338,0.896686
2,No log,0.273428,0.901408,0.939655
3,No log,0.31361,0.873239,0.917808
4,No log,0.328862,0.911972,0.946921
5,No log,0.370678,0.911972,0.946004
6,No log,0.428981,0.908451,0.942982
7,No log,0.443349,0.897887,0.936819
8,No log,0.432408,0.90493,0.941432
9,No log,0.446615,0.90493,0.941176
10,0.116600,0.447195,0.90493,0.941176


Saving


{'accuracy': 0.8767605633802817, 'f1': 0.9240780911062906}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r3.

Done in 11.2 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.53572,0.407407,0.297855
2,No log,1.372852,0.574074,0.496311
3,No log,1.196437,0.759259,0.711751
4,No log,1.013258,0.796296,0.749774
5,No log,0.853096,0.833333,0.78566
6,No log,0.738578,0.833333,0.78566
7,No log,0.670138,0.87037,0.840965
8,No log,0.612666,0.87037,0.839929
9,No log,0.581006,0.87037,0.839929
10,No log,0.567984,0.87037,0.839929


Saving


{'accuracy': 0.8333333333333334, 'f1': 0.8053380810707397}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r3.

Done in 4.72 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.363769,0.820423,0.899804
2,No log,0.306291,0.876761,0.926316
3,No log,0.321357,0.887324,0.930736
4,No log,0.354119,0.901408,0.939914
5,No log,0.476546,0.873239,0.919283
6,No log,0.521471,0.869718,0.917226
7,No log,0.521908,0.866197,0.915179
8,No log,0.520706,0.880282,0.924779
9,No log,0.533615,0.876761,0.922395
10,0.106700,0.542574,0.876761,0.922395


Saving


{'accuracy': 0.8767605633802817, 'f1': 0.9269311064718162}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r6.

Done in 8.87 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.52292,0.333333,0.204793
2,No log,1.35363,0.574074,0.493772
3,No log,1.159755,0.685185,0.614969
4,No log,0.976095,0.685185,0.634619
5,No log,0.860402,0.685185,0.635558
6,No log,0.789794,0.722222,0.67898
7,No log,0.735101,0.740741,0.696979
8,No log,0.688434,0.759259,0.71761
9,No log,0.660684,0.777778,0.753469
10,No log,0.650929,0.796296,0.770208


Saving


{'accuracy': 0.9074074074074074, 'f1': 0.8886530208369289}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r6.

Done in 2.86 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.358076,0.809859,0.894942
2,No log,0.300066,0.883803,0.929336
3,No log,0.43377,0.887324,0.932203
4,No log,0.456302,0.901408,0.940171
5,No log,0.590793,0.873239,0.921397
6,No log,0.616393,0.876761,0.924406
7,No log,0.61776,0.883803,0.929032
8,No log,0.666555,0.880282,0.925764
9,No log,0.660422,0.876761,0.924078
10,0.094800,0.657234,0.880282,0.926407


Saving


{'accuracy': 0.8450704225352113, 'f1': 0.905579399141631}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r9.

Done in 9.43 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.517122,0.555556,0.486888
2,No log,1.272874,0.685185,0.610093
3,No log,1.064645,0.722222,0.666769
4,No log,0.842551,0.740741,0.696347
5,No log,0.689356,0.777778,0.732655
6,No log,0.593993,0.814815,0.769427
7,No log,0.520832,0.851852,0.825037
8,No log,0.468244,0.87037,0.851534
9,No log,0.443813,0.888889,0.872761
10,No log,0.433919,0.888889,0.872761


Saving


{'accuracy': 0.8703703703703703, 'f1': 0.839360929557008}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r9.

Done in 4.42 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.
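
Since each configuration was trained on four different splits, it helps to summarize the final metrics across seeds. The sketch below copies the values from the dictionaries printed after each `Saving` step above and reports the mean and sample standard deviation. The notebook does not say which split those dictionaries were computed on, so the name `final_eval` is an assumption.

```python
from statistics import mean, stdev

# Final metrics copied from the full fine-tuning logs above, in seed order 0, 3, 6, 9.
# "bn" = binary models, "mc" = multiclass models.
final_eval = {
    "bn": {
        "accuracy": [0.778169014084507, 0.8767605633802817, 0.8767605633802817, 0.8450704225352113],
        "f1": [0.8496420047732696, 0.9240780911062906, 0.9269311064718162, 0.905579399141631],
    },
    "mc": {
        "accuracy": [0.8333333333333334, 0.8333333333333334, 0.9074074074074074, 0.8703703703703703],
        "f1": [0.819518257597605, 0.8053380810707397, 0.8886530208369289, 0.839360929557008],
    },
}


def summarize(values):
    """Mean and sample standard deviation of one metric across seeds."""
    return mean(values), stdev(values)


summary = {
    task: {metric: summarize(values) for metric, values in metrics.items()}
    for task, metrics in final_eval.items()
}

for task, metrics in summary.items():
    for metric, (avg, sd) in metrics.items():
        print(f"{task} {metric}: {avg:.3f} +/- {sd:.3f}")
```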


Next we'll train models again, this time training only the classification head (``only_head=True``).

In [7]:
from classifier.finetune import load_labelintdcts
int2label_dct, label2int_dct = load_labelintdcts()
sims=[0,3,6,9]
for e in sims:
    bn_ds = DatasetDict.load_from_disk(f"{cwd}/outputs/models/ds_{e}_bn")
    mc_ds = DatasetDict.load_from_disk(f"{cwd}/outputs/models/ds_{e}_mc")
    for model in ["sentence-transformers/paraphrase-xlm-r-multilingual-v1"]:
        torch.cuda.empty_cache()
        finetune_automodel(bn_ds, int2label_dct["bn"], label2int_dct["bn"], "bn", model_name=model, dev='cuda', rstate=e, output_dir=f"{cwd}/outputs/models", only_head=True)
        print(f"\nSaved {model} binary model.")
        torch.cuda.empty_cache()
        finetune_automodel(mc_ds, int2label_dct["mc"], label2int_dct["mc"], "mc", model_name=model, dev='cuda', rstate=e, output_dir=f"{cwd}/outputs/models", only_head=True)
        print(f"\nSaved {model} multiclass model.")


Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.523799,0.809859,0.894942
2,No log,0.478252,0.809859,0.894942
3,No log,0.464461,0.809859,0.894942
4,No log,0.456022,0.809859,0.894942
5,No log,0.44928,0.809859,0.894942
6,No log,0.444067,0.809859,0.894942
7,No log,0.439994,0.809859,0.894942
8,No log,0.437136,0.809859,0.894942
9,No log,0.435478,0.809859,0.894942
10,0.467200,0.434909,0.809859,0.894942


Saving


{'accuracy': 0.8098591549295775, 'f1': 0.8949416342412452}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r0onlyhead.

Done in 2.96 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.759354,0.296296,0.203114
2,No log,1.731851,0.296296,0.204727
3,No log,1.709877,0.314815,0.259039
4,No log,1.691832,0.37037,0.299275
5,No log,1.677372,0.425926,0.355907
6,No log,1.665834,0.444444,0.38309
7,No log,1.656386,0.462963,0.396825
8,No log,1.650141,0.462963,0.396825
9,No log,1.646526,0.462963,0.396825
10,No log,1.64527,0.481481,0.424784


Saving


{'accuracy': 0.5370370370370371, 'f1': 0.4718294051627386}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r0onlyhead.

Done in 2.06 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.51847,0.809859,0.894942
2,No log,0.475412,0.809859,0.894942
3,No log,0.462726,0.809859,0.894942
4,No log,0.454434,0.809859,0.894942
5,No log,0.448233,0.809859,0.894942
6,No log,0.443016,0.809859,0.894942
7,No log,0.439712,0.809859,0.894942
8,No log,0.437161,0.809859,0.894942
9,No log,0.435615,0.809859,0.894942
10,0.468900,0.435145,0.809859,0.894942


Saving


{'accuracy': 0.8098591549295775, 'f1': 0.8949416342412452}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r3onlyhead.

Done in 3.17 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.790219,0.277778,0.202778
2,No log,1.75962,0.296296,0.196296
3,No log,1.734415,0.314815,0.229333
4,No log,1.713307,0.351852,0.284394
5,No log,1.696309,0.333333,0.244092
6,No log,1.683246,0.351852,0.264534
7,No log,1.673312,0.351852,0.264534
8,No log,1.666089,0.37037,0.269632
9,No log,1.661973,0.388889,0.290647
10,No log,1.660497,0.388889,0.290647


Saving


{'accuracy': 0.46296296296296297, 'f1': 0.366358024691358}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r3onlyhead.

Done in 0.58 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.52227,0.809859,0.894942
2,No log,0.47661,0.809859,0.894942
3,No log,0.462423,0.809859,0.894942
4,No log,0.453162,0.809859,0.894942
5,No log,0.44593,0.809859,0.894942
6,No log,0.440581,0.809859,0.894942
7,No log,0.436258,0.809859,0.894942
8,No log,0.433482,0.809859,0.894942
9,No log,0.431747,0.809859,0.894942
10,0.470200,0.431208,0.809859,0.894942


Saving


{'accuracy': 0.8098591549295775, 'f1': 0.8949416342412452}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r6onlyhead.

Done in 3.72 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.758882,0.259259,0.212654
2,No log,1.730268,0.277778,0.198098
3,No log,1.707085,0.296296,0.209796
4,No log,1.687642,0.277778,0.184334
5,No log,1.671468,0.314815,0.231629
6,No log,1.659152,0.333333,0.24286
7,No log,1.649816,0.333333,0.216851
8,No log,1.643117,0.351852,0.253518
9,No log,1.639138,0.351852,0.253518
10,No log,1.63781,0.351852,0.253518


Saving


{'accuracy': 0.37037037037037035, 'f1': 0.2729148246389626}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r6onlyhead.

Done in 1.2 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Loading model


Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.515835,0.809859,0.894942
2,No log,0.480461,0.809859,0.894942
3,No log,0.467373,0.809859,0.894942
4,No log,0.458463,0.809859,0.894942
5,No log,0.451104,0.809859,0.894942
6,No log,0.445389,0.809859,0.894942
7,No log,0.441515,0.809859,0.894942
8,No log,0.438504,0.809859,0.894942
9,No log,0.436738,0.809859,0.894942
10,0.464000,0.436163,0.809859,0.894942


Saving


{'accuracy': 0.8098591549295775, 'f1': 0.8949416342412452}

Saved paraphrase-xlm-r-multilingual-v1_bn_e10_r9onlyhead.

Done in 3.47 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 binary model.

Loading model sentence-transformers/paraphrase-xlm-r-multilingual-v1

Tokenizing


Map:   0%|          | 0/54 [00:00<?, ? examples/s]

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/paraphrase-xlm-r-multilingual-v1 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Loading model
Training


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.764858,0.407407,0.321818
2,No log,1.733339,0.388889,0.277995
3,No log,1.707193,0.407407,0.297313
4,No log,1.685422,0.518519,0.38406
5,No log,1.668736,0.5,0.366613
6,No log,1.655198,0.518519,0.378122
7,No log,1.644625,0.5,0.364918
8,No log,1.637535,0.5,0.366575
9,No log,1.63326,0.5,0.369929
10,No log,1.631814,0.5,0.369929


Saving


{'accuracy': 0.5370370370370371, 'f1': 0.4158277936055714}

Saved paraphrase-xlm-r-multilingual-v1_mc_e10_r9onlyhead.

Done in 0.99 min

Saved sentence-transformers/paraphrase-xlm-r-multilingual-v1 multiclass model.


Now we'll evaluate our models in two ways: with each model's own classification head, and with SVM classifiers trained on the sentence embeddings the model produces.
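Before comparing models, it helps to sanity-check the numbers in the training logs above. The `paraphrase-xlm-r-multilingual-v1_bn_e10_r9onlyhead` run reports accuracy 0.809859 and F1 0.894942 at every epoch. On 284 test sentences, those values are what you would get from a classifier that labels every sentence as the positive class. The sketch below reproduces them in plain Python. The helper names (`accuracy`, `binary_f1`) and the 230/54 class split are assumptions made for illustration; they are not taken from the repository code.

```python
def accuracy(real, pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(r == p for r, p in zip(real, pred)) / len(real)


def binary_f1(real, pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(real, pred))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed split: 230 positive and 54 negative test sentences (284 in total).
real = [1] * 230 + [0] * 54
# A degenerate classifier that always predicts the positive class.
pred = [1] * len(real)

acc = accuracy(real, pred)
f1 = binary_f1(real, pred)
print(round(acc, 6), round(f1, 6))  # 0.809859 0.894942
```

If a model's metrics match this baseline exactly, its classification head may not have learned anything beyond the class prior, so the `real`/prediction lists in the results dictionaries are worth inspecting directly.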

In [3]:
from classifier.ft_classification import run_experiments, modelpred_dsdct_clsf, svm_dsdct_clsf
from classifier.run_classifiers import res_dct_to_cls_rpt, cls_rpt_to_exp_rpt
from classifier.finetune import load_labelintdcts

# Label <-> integer mappings
int2label_dct, label2int_dct = load_labelintdcts()
outfn = "26Mar25"  # tag used in the output file names
models_dir = f"{cwd}/outputs/models"

# Evaluate each saved model with its classification head and with an SVM on its embeddings
model_results_dict, svm_results_dict = run_experiments(int2label_dct, label2int_dct, models_dir, models_dir, cuda=True)

# Save the raw results
with open(f"{models_dir}/randp_{outfn}_model.json", 'w', encoding="utf-8") as outfile:
    json.dump(model_results_dict, outfile, ensure_ascii=False, indent=4)
with open(f"{models_dir}/randp_{outfn}_svm.json", 'w', encoding="utf-8") as outfile:
    json.dump(svm_results_dict, outfile, ensure_ascii=False, indent=4)

# Convert the raw results into classification reports, then experiment reports, and save those
mdl_cls_rpt = res_dct_to_cls_rpt(model_results_dict, int2label_dct)
mdl_exp_rpt = cls_rpt_to_exp_rpt(mdl_cls_rpt)
with open(f"{models_dir}/exprpt_{outfn}_mdl.json", 'w', encoding="utf-8") as outfile:
    json.dump(mdl_exp_rpt, outfile, ensure_ascii=False, indent=4)
svm_cls_rpt = res_dct_to_cls_rpt(svm_results_dict, int2label_dct)
svm_exp_rpt = cls_rpt_to_exp_rpt(svm_cls_rpt)
with open(f"{models_dir}/exprpt_{outfn}_svm.json", 'w', encoding="utf-8") as outfile:
    json.dump(svm_exp_rpt, outfile, ensure_ascii=False, indent=4)

# TODO: add a parameter for the model / DatasetDict location in ft_classification (see the dataset_dict.json errors in the output below)
# TODO: add a parameter in run_classifiers for checking whether a label is int or str when processing results


['bert_bn_e10_r0', 'bert_mc_e10_r0', 'bert_bn_e10_r0_oh', 'bert_mc_e10_r0_oh', 'bert_bn_e10_r3', 'bert_mc_e10_r3', 'bert_bn_e10_r3_oh', 'bert_mc_e10_r3_oh', 'bert_bn_e10_r6', 'bert_mc_e10_r6', 'bert_bn_e10_r6_oh', 'bert_mc_e10_r6_oh', 'bert_bn_e10_r9', 'bert_mc_e10_r9', 'bert_bn_e10_r9_oh', 'bert_mc_e10_r9_oh']

Running model bert_bn_e10_r0
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.96it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r0.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r0.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:15<00:00, 75.30it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 76.15it/s]


bert_bn_e10_r0 run completed in in 27.28s

Running model bert_mc_e10_r0
Loading tokenizer
Loading model
Running model


100%|██████████| 2/2 [00:00<00:00,  3.11it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r0.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r0.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 215/215 [00:02<00:00, 80.15it/s]


Encoding test sentences.


100%|██████████| 54/54 [00:00<00:00, 73.71it/s]


bert_mc_e10_r0 run completed in in 7.03s

Running model bert_bn_e10_r0_oh
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.78it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r0onlyhead.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r0onlyhead.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:14<00:00, 79.12it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 71.12it/s]


bert_bn_e10_r0_oh run completed in in 25.75s

Running model bert_mc_e10_r0_oh

Error in bert_mc_e10_r0_oh: No such file: 'c:/Users/allie/Documents/GitHub/policy-classifier/outputs/models/ds_h_mc/dataset_dict.json'. Expected to load a `DatasetDict` object, but provided path is not a `DatasetDict`.

bert_mc_e10_r0_oh run completed in in 0.0s

Running model bert_bn_e10_r3
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.47it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r3.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r3.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:15<00:00, 71.81it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 72.51it/s]


bert_bn_e10_r3 run completed in in 26.39s

Running model bert_mc_e10_r3
Loading tokenizer
Loading model
Running model


100%|██████████| 2/2 [00:00<00:00,  3.42it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r3.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r3.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 215/215 [00:02<00:00, 79.31it/s]


Encoding test sentences.


100%|██████████| 54/54 [00:00<00:00, 81.68it/s]


bert_mc_e10_r3 run completed in in 6.92s

Running model bert_bn_e10_r3_oh
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.47it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r3onlyhead.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r3onlyhead.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:15<00:00, 75.54it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 75.82it/s]


bert_bn_e10_r3_oh run completed in in 26.6s

Running model bert_mc_e10_r3_oh

Error in bert_mc_e10_r3_oh: No such file: 'c:/Users/allie/Documents/GitHub/policy-classifier/outputs/models/ds_h_mc/dataset_dict.json'. Expected to load a `DatasetDict` object, but provided path is not a `DatasetDict`.

bert_mc_e10_r3_oh run completed in in 0.0s

Running model bert_bn_e10_r6
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:02<00:00,  3.28it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r6.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r6.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:14<00:00, 79.23it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 81.31it/s]


bert_bn_e10_r6 run completed in in 23.56s

Running model bert_mc_e10_r6
Loading tokenizer
Loading model
Running model


100%|██████████| 2/2 [00:00<00:00,  3.01it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r6.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r6.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 215/215 [00:02<00:00, 81.93it/s]


Encoding test sentences.


100%|██████████| 54/54 [00:00<00:00, 77.87it/s]


bert_mc_e10_r6 run completed in in 6.88s

Running model bert_bn_e10_r6_oh
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:02<00:00,  3.12it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r6onlyhead.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r6onlyhead.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:15<00:00, 73.69it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:04<00:00, 70.42it/s]


bert_bn_e10_r6_oh run completed in in 26.49s

Running model bert_mc_e10_r6_oh

Error in bert_mc_e10_r6_oh: No such file: 'c:/Users/allie/Documents/GitHub/policy-classifier/outputs/models/ds_h_mc/dataset_dict.json'. Expected to load a `DatasetDict` object, but provided path is not a `DatasetDict`.

bert_mc_e10_r6_oh run completed in in 0.0s

Running model bert_bn_e10_r9
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.89it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r9.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r9.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:15<00:00, 73.55it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 75.59it/s]


bert_bn_e10_r9 run completed in in 25.24s

Running model bert_mc_e10_r9
Loading tokenizer
Loading model
Running model


100%|██████████| 2/2 [00:00<00:00,  2.88it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r9.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_mc_e10_r9.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 215/215 [00:02<00:00, 78.90it/s]


Encoding test sentences.


100%|██████████| 54/54 [00:00<00:00, 73.81it/s]


bert_mc_e10_r9 run completed in in 7.03s

Running model bert_bn_e10_r9_oh
Loading tokenizer
Loading model
Running model


100%|██████████| 9/9 [00:03<00:00,  2.63it/s]


Freeing memory


No sentence-transformers model found with name c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r9onlyhead.pt. Creating a new one with mean pooling.
Some weights of XLMRobertaModel were not initialized from the model checkpoint at c:\Users\allie\Documents\GitHub\policy-classifier/outputs/models/paraphrase-xlm-r-multilingual-v1_bn_e10_r9onlyhead.pt and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 1135/1135 [00:14<00:00, 80.64it/s]


Encoding test sentences.


100%|██████████| 284/284 [00:03<00:00, 80.97it/s]


bert_bn_e10_r9_oh run completed in in 25.05s

Running model bert_mc_e10_r9_oh

Error in bert_mc_e10_r9_oh: No such file: 'c:/Users/allie/Documents/GitHub/policy-classifier/outputs/models/ds_h_mc/dataset_dict.json'. Expected to load a `DatasetDict` object, but provided path is not a `DatasetDict`.

bert_mc_e10_r9_oh run completed in in 0.0s
Time elapsed total: 3.0 min and 57 sec
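
All four `bert_mc_*_oh` runs above fail immediately with the same error: the `ds_h_mc` directory under `outputs/models` has no `dataset_dict.json`, so it cannot be loaded as a `DatasetDict`. A cheap pre-flight check can surface this before the full experiment loop starts. The following is a minimal sketch using only the standard library. The function name `missing_dataset_dicts` and the `ds_ok` directory are hypothetical, and the demo runs against a temporary directory rather than the real `outputs/models` folder.

```python
import tempfile
from pathlib import Path


def missing_dataset_dicts(models_dir, dataset_dirs):
    """Return the names in dataset_dirs that lack a dataset_dict.json file."""
    base = Path(models_dir)
    return [
        name for name in dataset_dirs
        if not (base / name / "dataset_dict.json").is_file()
    ]


# Demo on a throwaway directory: one complete dataset folder, one incomplete.
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "ds_ok"  # hypothetical, well-formed dataset folder
    complete.mkdir()
    (complete / "dataset_dict.json").write_text("{}", encoding="utf-8")
    (Path(tmp) / "ds_h_mc").mkdir()  # exists, but has no dataset_dict.json
    missing = missing_dataset_dicts(tmp, ["ds_ok", "ds_h_mc"])

print(missing)  # ['ds_h_mc']
```

In the notebook, the same function could be pointed at `cwd + "/outputs/models"` with the dataset directory names the experiments expect. A non-empty result would mean the corresponding dataset needs to be re-saved before those runs can succeed.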


  _warn_prf(average, modifier, msg_start, len(result))

In [6]:
svm_results_dict

{'bn': {'bert_bn_e10_r0': {'real': [0,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    0,
    1,
    0,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    0,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    0,
    1,
    0,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    0,
    1,
    1,
    1,
    1,
    1,
    0,
    0,
    1,
    0,
    1,
    1,
    1,
    0,
    0,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    1,
    1,
    1,
  