## Select the best word embeddings

This notebook selects the word embeddings with the highest Pearson and Spearman correlations for each evaluation resource and copies them to a folder that will be published for distribution. It reads the CSV file containing the results of all the experiments described in the paper cited below.
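
As a rough illustration of the selection step, the sketch below keeps, for each (resource, annotator, sg) group, the row with the highest value of a chosen correlation column. It uses plain Python dictionaries on made-up rows, whereas the notebook works with pandas on the real results file. The function name `best_per_group` and the toy values are assumptions introduced only for this example.

```python
def best_per_group(rows, keys, metric):
    """Return, for each combination of `keys`, the row with the highest `metric`."""
    best = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        # keep the row if the group is new or the row beats the current best
        if group not in best or row[metric] > best[group][metric]:
            best[group] = row
    return best


# toy rows (made-up values, not taken from the results file)
toy_rows = [
    {"filename": "A.csv", "annotator": "coders", "sg": 0, "r": 0.41, "model_name": "m1"},
    {"filename": "A.csv", "annotator": "coders", "sg": 0, "r": 0.52, "model_name": "m2"},
    {"filename": "A.csv", "annotator": "coders", "sg": 1, "r": 0.47, "model_name": "m3"},
]

best = best_per_group(toy_rows, keys=("filename", "annotator", "sg"), metric="r")
for group, row in best.items():
    print(group, row["model_name"], row["r"])
```

Calling the same function with `metric="spearman"` (on rows that carry that column) would give the Spearman-based selection.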

A separate notebook evaluates these embeddings. Its evaluation follows the same procedure that produced the CSV file read by this notebook.
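
The two correlation measures can be sketched in a few lines. The example below assumes that the score computed for a term pair is the cosine similarity of its two vectors, and it correlates those scores with human ratings. This is a standard-library illustration only: the helper names and toy numbers are hypothetical, and the actual evaluation notebook may differ (for instance by relying on a statistics library).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# toy data: human ratings and vector pairs (made-up values)
human_scores = [1.0, 2.0, 3.0, 4.0]
vector_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [2.0, 1.0]),
    ([1.0, 0.0], [3.0, 0.0]),
]
computed = [cosine(u, v) for u, v in vector_pairs]
print("Pearson:", pearson(human_scores, computed))
print("Spearman:", spearman(human_scores, computed))
```

Pearson measures linear agreement, while Spearman only looks at the ordering of the pairs, which is why the two criteria can select different embeddings.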


Authors: 
- F.A. Cardillo, francoalberto.cardillo@cnr.it
- F. Debole, franca.debole@isti.cnr.it

Date: 22 March 2024



__If you use this notebook or the resources it builds, please cite:__

__"Italian Word Embeddings for the Medical Domain", F.A. Cardillo, F. Debole. Proc. of the 2024 Joint Int. Conf. on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy, May 20-25, 2024.__

<hr>

In [3]:
import pandas as pd
from posixpath import join
import re

# filename patterns: full models (*.model) and per-epoch KeyedVectors checkpoints (*.wv)
# in the .wv pattern, "epoch" is the checkpoint epoch taken from the __e<N> suffix
regexp_model = r"(?P<model>\w+)-dim_(?P<vector_size>\d+)-e_(?P<epoch>\d+)-seed_(\d+)-sg_(?P<sg>\d)-w_(?P<window>\d+)-n(?P<negative>\d+)-proc_?-?(?P<n_cores>\d+)\.model"
regexp_wv = r"(?P<model>\w+)-dim_(?P<vector_size>\d+)-e_(\d+)-seed_(\d+)-sg_(?P<sg>\d)-w_(?P<window>\d+)-n(?P<negative>\d+)-proc_?-?(?P<n_cores>\d+)__e(?P<epoch>\d+)\.wv"

# recover the experiment parameters from a model filename
def split_filename(filename):
    # try the full-model pattern first, then the checkpoint pattern
    m = re.search(regexp_model, filename) or re.search(regexp_wv, filename)
    if m is None:
        raise ValueError(f"could not split filename: {filename}")
    res = m.groupdict()

    out = res["model"], int(res["vector_size"]), int(res["epoch"]), int(res["sg"]), int(res["window"]), int(res["negative"]), int(res["n_cores"])
    return out
#<


results = pd.read_csv(join("out", "results", "results_lrec24.csv"))
display(results.head(2))
results = results.drop(columns=["cui1","cui2","term1", "term2", "term1_it", "term2_it","v1", "v2", "score", "computed_score"])
res = results.model_name.apply(split_filename)
results["model_type"], results["vector_size"], results["epoch"], results["sg"], results["window"], results["negative"], results["n_cores"] = zip(*res)

best_pearson = results.groupby(["filename", "annotator", "sg"]).apply(lambda g: g.sort_values(by=["r"], ascending=False).head(1))
print("LREC24, all exps on Pearson")
display(best_pearson[["filename", "annotator", "sg", "r", "stat_significant", "path", "model_name"]])

best_spearman = results.groupby(["filename", "annotator", "sg"]).apply(lambda g: g.sort_values(by=["spearman"], ascending=False).head(1))
print("LREC24, all exps on Spearman")
display(best_spearman[["filename", "annotator", "sg", "spearman", "spear_stat_significant", "path", "model_name"]])


Unnamed: 0,filename,cui1,cui2,annotator,term1,term2,score,term1_it,term2_it,v1,...,valid,n_valid,r,p,spearman,spearman_p,path,model_name,stat_significant,spear_stat_significant
0,MayoSRS_it.csv,C0311394,C0231685,coders,difficulty walking,antalgic gait,6.69,difficoltà di deambulazione,andatura antalgica,[ 1.6700364e+00 2.3651786e+00 -1.5965857e-01 ...,...,True,99,0.311056,0.001726,0.310554,0.001757,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_5-n1-proc_4__e1.wv,True,True
1,MayoSRS_it.csv,C0035450,C0034079,coders,rheumatoid nodule,lung nodule,2.38,nodulo reumatoide,nodulo polmonare,[-0.15526366 1.2724409 -0.38742244 0.472983...,...,True,99,0.311056,0.001726,0.310554,0.001757,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_5-n1-proc_4__e1.wv,True,True


LREC24, all exps on Pearson


filename,annotator,sg,,filename,annotator,sg,r,stat_significant,path,model_name
MayoSRS_it.csv,coders,0,194227,MayoSRS_it.csv,coders,0,0.492332,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4__e...
MayoSRS_it.csv,coders,1,408033,MayoSRS_it.csv,coders,1,0.571283,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
MiniMayoSRS_it.csv,coders,0,158864,MiniMayoSRS_it.csv,coders,0,0.822083,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4__e...
MiniMayoSRS_it.csv,coders,1,408148,MiniMayoSRS_it.csv,coders,1,0.779522,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
MiniMayoSRS_it.csv,physicians,0,223184,MiniMayoSRS_it.csv,physicians,0,0.748259,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4.model
MiniMayoSRS_it.csv,physicians,1,408166,MiniMayoSRS_it.csv,physicians,1,0.779303,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
UMNSRS_relatedness_it.csv,umnrs,0,181215,UMNSRS_relatedness_it.csv,umnrs,0,0.475774,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e...
UMNSRS_relatedness_it.csv,umnrs,1,343176,UMNSRS_relatedness_it.csv,umnrs,1,0.488323,True,w2v_sg1_50_25e,w2v-dim_50-e_25-seed_1-sg_1-w_5-n5-proc_4.model
UMNSRS_similarity_it.csv,umnrs,0,211230,UMNSRS_similarity_it.csv,umnrs,0,0.59294,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e...
UMNSRS_similarity_it.csv,umnrs,1,405123,UMNSRS_similarity_it.csv,umnrs,1,0.595121,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_10-n1-proc_4.model


LREC24, all exps on Spearman


filename,annotator,sg,,filename,annotator,sg,spearman,spear_stat_significant,path,model_name
MayoSRS_it.csv,coders,0,166674,MayoSRS_it.csv,coders,0,0.527977,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n1-proc_4__e...
MayoSRS_it.csv,coders,1,408033,MayoSRS_it.csv,coders,1,0.578107,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
MiniMayoSRS_it.csv,coders,0,2739,MiniMayoSRS_it.csv,coders,0,0.804123,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e1.wv
MiniMayoSRS_it.csv,coders,1,408148,MiniMayoSRS_it.csv,coders,1,0.835905,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
MiniMayoSRS_it.csv,physicians,0,13251,MiniMayoSRS_it.csv,physicians,0,0.740491,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e2.wv
MiniMayoSRS_it.csv,physicians,1,408189,MiniMayoSRS_it.csv,physicians,1,0.803554,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
UMNSRS_relatedness_it.csv,umnrs,0,181361,UMNSRS_relatedness_it.csv,umnrs,0,0.506851,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e...
UMNSRS_relatedness_it.csv,umnrs,1,404547,UMNSRS_relatedness_it.csv,umnrs,1,0.500779,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_10-n1-proc_4.model
UMNSRS_similarity_it.csv,umnrs,0,189675,UMNSRS_similarity_it.csv,umnrs,0,0.602876,True,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e...
UMNSRS_similarity_it.csv,umnrs,1,406157,UMNSRS_similarity_it.csv,umnrs,1,0.595249,True,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_15-n1-proc_4.model
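
To make the filename convention concrete, the snippet below applies the full-model pattern to one of the filenames listed in the tables above. It is only an illustration: the pattern is copied from `regexp_model`, and the names `pattern`, `example` and `params` are introduced here.

```python
import re

# same pattern as regexp_model in the cell above
pattern = r"(?P<model>\w+)-dim_(?P<vector_size>\d+)-e_(?P<epoch>\d+)-seed_(\d+)-sg_(?P<sg>\d)-w_(?P<window>\d+)-n(?P<negative>\d+)-proc_?-?(?P<n_cores>\d+)\.model"

example = "w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model"
params = re.search(pattern, example).groupdict()
print(params)
```

The seed is matched by an unnamed group, so it does not appear in the dictionary. All values are strings until they are converted with `int()`, as `split_filename` does.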


In [50]:
from gensim.models import Word2Vec, FastText, KeyedVectors
import os
from posixpath import join


def copy_files(row, fld):
    path = join(exp_base_path, row.path, row.model_name)
    is_model = row.model_name.endswith(".model")
    if is_model:
        print("COPYING MODEL", row.model_name)
        os.system(f"cp {path} {fld}")
        os.system(f"cp {path + '.syn1neg.npy'} {fld}")
        os.system(f"cp {path + '.wv.vectors.npy'} {fld}")
        # check if loading is fine
        model = Word2Vec.load(join(fld, row.model_name ))
        model = model.wv
    else:
        print("LOADING KEYED VECTORS. Path:", row.path, " file:", row.model_name)
        os.system(f"cp {path} {fld}")
        npy_exists = os.path.exists(path + ".vectors.npy")
        if npy_exists:
            os.system(f"cp {path + '.vectors.npy'} {fld}")
        model = KeyedVectors.load(join(fld, row.model_name ))
        if not npy_exists:
            print("No npy file, but keyed vectors could be loaded")
#<

we_fld = "word-embeddings"
exp_base_path = os.path.expanduser("~/datasets/embeddings/trained/")

# pearson
to_fld = join(we_fld, "pearson")
os.makedirs(to_fld, exist_ok=True)
best_pearson.apply(lambda x: copy_files(x, to_fld), axis=1)

# spearman
to_fld = join(we_fld, "spearman")
os.makedirs(to_fld, exist_ok=True)
best_spearman.apply(lambda x: copy_files(x, to_fld), axis=1)

LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4__e10.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4__e1.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4.model
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e5.wv
COPYING MODEL w2v-dim_50-e_25-seed_1-sg_1-w_5-n5-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e25.wv
COPYING MODEL w2

filename                   annotator   sg        
MayoSRS_it.csv             coders      0   166674    None
                                       1   408033    None
MiniMayoSRS_it.csv         coders      0   2739      None
                                       1   408148    None
                           physicians  0   13251     None
                                       1   408189    None
UMNSRS_relatedness_it.csv  umnrs       0   181361    None
                                       1   404547    None
UMNSRS_similarity_it.csv   umnrs       0   189675    None
                                       1   406157    None
dtype: object
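
The copy step in `copy_files` shells out to `cp` through `os.system`, which does not raise an error when a source file is missing. One possible alternative is `shutil.copy`, which raises an exception in that case. The sketch below is not the notebook's code: `copy_with_companions` is a hypothetical helper, its suffix list mirrors the companion files named in `copy_files`, and, unlike the original, it copies only the companion files that exist.

```python
import os
import shutil
import tempfile


def copy_with_companions(src, dst_dir,
                         suffixes=(".syn1neg.npy", ".wv.vectors.npy", ".vectors.npy")):
    """Copy src and every existing companion file (src + suffix) into dst_dir.

    Returns the names of the copied files.
    Raises FileNotFoundError if src itself does not exist.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = [os.path.basename(shutil.copy(src, dst_dir))]
    for suffix in suffixes:
        companion = src + suffix
        if os.path.exists(companion):
            copied.append(os.path.basename(shutil.copy(companion, dst_dir)))
    return copied


# demo on throwaway placeholder files (not real gensim models)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "toy.model")
    for name in (src, src + ".syn1neg.npy"):
        with open(name, "wb") as fh:
            fh.write(b"placeholder")
    copied = copy_with_companions(src, os.path.join(tmp, "out"))
    print(copied)
```

Avoiding the shell also sidesteps quoting problems with paths that contain spaces or special characters.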

In [44]:
# THIS CELL CAN BE RUN ONLY ON THE REMOTE SERVER HOSTING THE RESULTS OF THE EXPS
import os
from gensim.models import Word2Vec, FastText, KeyedVectors

# select the best word embeddings for each evaluation resource according to Pearson and Spearman
#     select path + filename, copy the file to the public repository
we_fld = "word-embeddings"
exp_base_path = os.path.expanduser("~/datasets/embeddings/trained/")

col2fld = {"r": "pearson", "spearman": "spearman"}

def copy_files(row, fld):
    path = join(exp_base_path, row.path, row.model_name)
    is_model = row.model_name.endswith(".model")
    if is_model:
        print("COPYING MODEL", row.model_name)
        os.system(f"cp {path} {join(we_fld, fld)}")
        os.system(f"cp {path + '.syn1neg.npy'} {join(we_fld, fld)}")
        os.system(f"cp {path + '.wv.vectors.npy'} {join(we_fld, fld)}")
        # check if loading is fine
        model = Word2Vec.load(join(we_fld, fld, row.model_name ))
        model = model.wv
    else:
        print("LOADING KEYED VECTORS. Path:", row.path, " file:", row.model_name)
        os.system(f"cp {path} {join(we_fld, fld)}")
        npy_exists = os.path.exists(path + ".vectors.npy")
        if npy_exists:
            os.system(f"cp {path + '.vectors.npy'} {join(we_fld, fld)}")
        model = KeyedVectors.load(join(we_fld, fld,row.model_name ))
        if not npy_exists:
            print("No npy file, but keyed vectors could be loaded")
        
for c, fld in col2fld.items():
    os.makedirs(join(we_fld, fld), exist_ok=True)
    best = results.groupby(["filename", "annotator", "sg"]).apply(lambda g: g.sort_values(by=[c], ascending=False).head(1))
    print()
    print(c)
    display(best)
    # apply copy_files to each row
    best.apply(lambda x: copy_files(x, fld), axis=1)    
    
    # paths = set([join(f,fn) for f, fn in zip(best.path, best.model_name)])
    # print(f"{fld}, saving {len(paths)} embeddings")
    # for p in paths:
    #     full_path = join(exp_base_path, p)
        
    #     if full_path.endswith(".model"):
    #         # print(f"copying MODEL {p} to {join(we_fld, fld)}")
    #         os.system(f"cp {full_path} {join(we_fld, fld)}")
    #         os.system(f"cp {full_path + '.syn1neg.npy'} {join(we_fld, fld)}")
    #         os.system(f"cp {full_path + '.wv.vectors.npy'} {join(we_fld, fld)}")
    #     else:
    #         print(f"word vectors:", full_path)
    #         npy_fn = full_path + ".vectors.npy"
    #         print(os.path.exists(npy_fn))
    #         # os.system(f"cp {npy_fn} {join(we_fld, fld)}")
    #         pass
print("all done")

r


filename,annotator,sg,,filename,annotator,size,valid,n_valid,r,p,spearman,spearman_p,path,model_name,stat_significant,spear_stat_significant,model_type,vector_size,epoch,sg,window,negative,n_cores
MayoSRS_it.csv,coders,0,194227,MayoSRS_it.csv,coders,101,True,99,0.492332,2.267198e-07,0.523935,2.617783e-08,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4__e...,True,True,w2v,100,10,0,30,5,4
MayoSRS_it.csv,coders,1,408033,MayoSRS_it.csv,coders,101,True,99,0.571283,6.615811e-10,0.578107,3.706919e-10,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
MiniMayoSRS_it.csv,coders,0,158864,MiniMayoSRS_it.csv,coders,29,True,29,0.822083,4.529133e-08,0.712909,1.426805e-05,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4__e...,True,True,w2v,100,1,0,15,1,4
MiniMayoSRS_it.csv,coders,1,408148,MiniMayoSRS_it.csv,coders,29,True,29,0.779522,6.24782e-07,0.835905,1.658045e-08,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
MiniMayoSRS_it.csv,physicians,0,223184,MiniMayoSRS_it.csv,physicians,29,True,29,0.748259,3.055797e-06,0.726728,8.033334e-06,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4.model,True,True,w2v,100,30,0,15,5,4
MiniMayoSRS_it.csv,physicians,1,408166,MiniMayoSRS_it.csv,physicians,29,True,29,0.779303,6.323217e-07,0.803554,1.534094e-07,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
UMNSRS_relatedness_it.csv,umnrs,0,181215,UMNSRS_relatedness_it.csv,umnrs,587,True,544,0.475774,4.451703e-32,0.506851,7.616163e-37,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e...,True,True,w2v,100,5,0,15,5,4
UMNSRS_relatedness_it.csv,umnrs,1,343176,UMNSRS_relatedness_it.csv,umnrs,587,True,544,0.488323,6.058466e-34,0.491591,1.920916e-34,w2v_sg1_50_25e,w2v-dim_50-e_25-seed_1-sg_1-w_5-n5-proc_4.model,True,True,w2v,50,25,1,5,5,4
UMNSRS_similarity_it.csv,umnrs,0,211230,UMNSRS_similarity_it.csv,umnrs,566,True,531,0.59294,1.004279e-51,0.602035,1.131666e-53,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e...,True,True,w2v,100,25,0,10,5,4
UMNSRS_similarity_it.csv,umnrs,1,405123,UMNSRS_similarity_it.csv,umnrs,566,True,531,0.595121,3.470086e-52,0.594976,3.7256620000000005e-52,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_10-n1-proc_4.model,True,True,w2v,100,30,1,10,1,4


LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_30-n5-proc_4__e10.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_15-n1-proc_4__e1.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4.model
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e5.wv
COPYING MODEL w2v-dim_50-e_25-seed_1-sg_1-w_5-n5-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e25.wv
COPYING MODEL w2

filename,annotator,sg,,filename,annotator,size,valid,n_valid,r,p,spearman,spearman_p,path,model_name,stat_significant,spear_stat_significant,model_type,vector_size,epoch,sg,window,negative,n_cores
MayoSRS_it.csv,coders,0,166674,MayoSRS_it.csv,coders,101,True,99,0.488935,2.823265e-07,0.527977,1.954268e-08,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n1-proc_4__e...,True,True,w2v,100,2,0,10,1,4
MayoSRS_it.csv,coders,1,408033,MayoSRS_it.csv,coders,101,True,99,0.571283,6.615811e-10,0.578107,3.706919e-10,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
MiniMayoSRS_it.csv,coders,0,2739,MiniMayoSRS_it.csv,coders,29,True,29,0.67168,6.621419e-05,0.804123,1.480616e-07,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e1.wv,True,True,w2v,25,1,0,10,1,4
MiniMayoSRS_it.csv,coders,1,408148,MiniMayoSRS_it.csv,coders,29,True,29,0.779522,6.24782e-07,0.835905,1.658045e-08,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
MiniMayoSRS_it.csv,physicians,0,13251,MiniMayoSRS_it.csv,physicians,29,True,29,0.678465,5.22937e-05,0.740491,4.377501e-06,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-...,w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e2.wv,True,True,w2v,25,2,0,10,1,4
MiniMayoSRS_it.csv,physicians,1,408189,MiniMayoSRS_it.csv,physicians,29,True,29,0.779303,6.323217e-07,0.803554,1.534094e-07,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model,True,True,w2v,100,30,1,30,1,4
UMNSRS_relatedness_it.csv,umnrs,0,181361,UMNSRS_relatedness_it.csv,umnrs,587,True,544,0.475774,4.451703e-32,0.506851,7.616163e-37,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e...,True,True,w2v,100,5,0,15,5,4
UMNSRS_relatedness_it.csv,umnrs,1,404547,UMNSRS_relatedness_it.csv,umnrs,587,True,544,0.48755,7.936215e-34,0.500779,7.109907e-36,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_10-n1-proc_4.model,True,True,w2v,100,30,1,10,1,4
UMNSRS_similarity_it.csv,umnrs,0,189675,UMNSRS_similarity_it.csv,umnrs,566,True,531,0.592787,1.081598e-51,0.602876,7.420906e-54,w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100...,w2v-dim_100-e_30-seed_2-sg_0-w_10-n5-proc_4__e...,True,True,w2v,100,10,0,10,5,4
UMNSRS_similarity_it.csv,umnrs,1,406157,UMNSRS_similarity_it.csv,umnrs,566,True,531,0.593554,7.452591000000001e-52,0.595249,3.2598580000000004e-52,w2v_mult_sg2_25-100_30e-win_4_neg3,w2v-dim_100-e_30-seed_2-sg_1-w_15-n1-proc_4.model,True,True,w2v,100,30,1,15,1,4


LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_10-n1-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_10-n1-proc_4__e2.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4  file: w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e1.wv
No npy file, but keyed vectors could be loaded
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4  file: w2v-dim_25-e_30-seed_2-sg_0-w_10-n1-proc_4__e2.wv
No npy file, but keyed vectors could be loaded
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_30-n1-proc_4.model
LOADING KEYED VECTORS. Path: w2v_mult_sg2_25-100_30e-win_4_neg3/w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4  file: w2v-dim_100-e_30-seed_2-sg_0-w_15-n5-proc_4__e5.wv
COPYING MODEL w2v-dim_100-e_30-seed_2-sg_1-w_10-n1-pro