Builds vectors of ***ingroup*** and ***outgroup***.

**Note:** for the Sentence-BERT approach, any *A-strategy* other than **really naive** arguably does not make sense. Consider exclusion due to overlap: excluding the replacement "kriminella invandrare" because "kriminella" occurs in replacements of both the ingroup and the outgroup is ***too strong***, *especially for the outgroup in this case* ("kriminella gäng" is the most common outgroup replacement of *förortsgäng*). Still, it may be worth testing the other A-strategies as a form of "noise reduction".

For the SGNS approach, A-strategies are arguably important if we are not going to build "sentence vectors" by concatenating word vectors. However, this may point to the limitations of SGNS for this task, as it is not very good at representing phrases.

Way forward: build in the possibility to use *A-selection* for SBERT, but focus on results for **really naive**. 
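
To make the overlap problem concrete, here is a minimal sketch of token-level overlap exclusion applied to whole replacement phrases. The helper `drop_overlapping` is hypothetical and not part of this notebook's pipeline, and the phrase lists are toy data: only "kriminella invandrare" and "kriminella gäng" come from the note above, the other two phrases are made up for illustration.

```python
def tokens(phrases):
    """Return the set of whitespace-separated tokens found in a list of phrases."""
    return {tok for phrase in phrases for tok in phrase.split()}


def drop_overlapping(ingroup, outgroup):
    """Drop every phrase that contains a token used by both groups (hypothetical helper)."""
    shared = tokens(ingroup) & tokens(outgroup)
    kept_in = [p for p in ingroup if not shared & set(p.split())]
    kept_out = [p for p in outgroup if not shared & set(p.split())]
    return kept_in, kept_out, shared


# Toy data: the second phrase in each list is invented for this example.
ingroup = ["kriminella invandrare", "invandrargäng"]
outgroup = ["kriminella gäng", "gäng i förorten"]

kept_in, kept_out, shared = drop_overlapping(ingroup, outgroup)
print(shared)    # tokens that occur in both groups
print(kept_in)   # ingroup phrases that survive
print(kept_out)  # outgroup phrases that survive
```

In this toy case the single shared token "kriminella" removes one phrase from each group, including the outgroup phrase that the note identifies as the most common replacement.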

#### Procedure (pseudocode)
```
DWEs = {förortsgäng, återvandring, globalist, berika}
selection_strategies = {really_naive, naive_no_overlap, top1, top3, ...}
models = {sbert_kb, ...}

for DWE in DWEs:
    for strategy in selection_strategies:
        for model in models:

            IN_vectors, OUT_vectors = select(replacement_vectors_of_dwe, strategy)
            avg_IN_vec = mean(IN_vectors)
            avg_OUT_vec = mean(OUT_vectors)

            for year in years:
                dwe_vect_at_year = get_vec(DWE, year, model)

                IN_dimension_mean = cosine_similarity(avg_IN_vec, dwe_vect_at_year)
                IN_dimension_pairwise_mean = mean(cosine_similarity(IN_vectors, dwe_vect_at_year))

                OUT_dimension_mean = cosine_similarity(avg_OUT_vec, dwe_vect_at_year)
                OUT_dimension_pairwise_mean = mean(cosine_similarity(OUT_vectors, dwe_vect_at_year))  

                norm_dimension_mean = normalizer(IN_dimension_mean, OUT_dimension_mean)
                norm_dimension_pairwise_mean = normalizer(IN_dimension_pairwise_mean, OUT_dimension_pairwise_mean)

                # e.g. softmax or simply normalize(x, y) = x / (x+y)
```
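
The two normalizers mentioned in the last comment of the pseudocode behave quite differently on cosine similarities. The sketch below compares them on made-up similarity values; the function names `ratio_normalize` and `softmax_normalize` are assumptions for illustration, not names used elsewhere in this notebook.

```python
import numpy as np


def ratio_normalize(in_sim, out_sim):
    """Simple normalization x / (x + y); only meaningful when both values are positive."""
    return in_sim / (in_sim + out_sim)


def softmax_normalize(in_sim, out_sim):
    """Two-way softmax; returns the share assigned to the ingroup similarity."""
    z = np.array([in_sim, out_sim], dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return float(e[0] / e.sum())


# Made-up similarities, for illustration only.
print(ratio_normalize(0.6, 0.2))     # ratio of the ingroup similarity to the total
print(softmax_normalize(0.6, 0.2))   # closer to 0.5 than the ratio
print(ratio_normalize(0.2, -0.1))    # leaves the [0, 1] range when a similarity is negative
```

Since cosine similarities lie in [-1, 1], the softmax output always stays strictly between 0 and 1 but compresses differences, whereas the ratio is more sensitive and can leave the unit interval (or divide by zero) when a similarity is negative or the sum is close to zero.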

#### This yields something like the following (for each model)

|DWE            |Selection strategy|Method Dimension|Year<sub>1</sub>|...|Year<sub>*n*</sub>|
|---------------|------------------|----------------|----------------|---|------------------|
|DWE<sub>1</sub>|Really naive      |Mean            |...             |...|...               |
|DWE<sub>1</sub>|Really naive      |Pairwise mean   |...             |...|...               |
|DWE<sub>1</sub>|Really naive      |Normalized      |...             |...|...               |
|DWE<sub>1</sub>|Naive no overlap  |Mean            |...             |...|...               |
|DWE<sub>1</sub>|Naive no overlap  |Pairwise mean   |...             |...|...               |
|DWE<sub>1</sub>|Naive no overlap  |Normalized      |...             |...|...               |
|DWE<sub>1</sub>|Top1              |...             |...             |...|...               |
|...            |...               |...             |...             |...|...               |
|DWE<sub>2</sub>|...               |...             |...             |...|...               |

#### Additional considerations
* Finetuning
* Correlation with semantic change rates
    * Spearman, Pearson
    * Naive, Rectified
* Correlation with (normalized) frequency
    * Spearman, Pearson
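
As a rough sketch of how the correlations listed above could be computed for one row of yearly scores, the example below pairs a score series with a reference series (for instance a change rate or a normalized frequency), skips years where either value is `None`, and reports Pearson and Spearman coefficients. The function names and the numbers are assumptions made for illustration; Spearman is computed here as Pearson on average ranks using NumPy only.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def correlate(scores, reference):
    """Pearson and Spearman correlation over the years where both series have a value."""
    pairs = [(s, r) for s, r in zip(scores, reference) if s is not None and r is not None]
    if len(pairs) < 3:  # too few points for a meaningful coefficient
        return None
    s, r = (np.array(col, dtype=float) for col in zip(*pairs))
    return {
        "pearson": float(np.corrcoef(s, r)[0, 1]),
        "spearman": float(np.corrcoef(average_ranks(s), average_ranks(r))[0, 1]),
        "n": len(pairs),
    }


# Made-up yearly values; None marks a year without a vector for the DWE.
scores = [0.1, None, 0.3, 0.4, 0.8]
reference = [1.0, 2.0, 3.0, 4.0, 16.0]
result = correlate(scores, reference)
print(result)
```

Reporting `n` alongside the coefficients matters here because the number of usable years can differ between DWEs and strategies. A constant series yields `nan`, which would need separate handling.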

In [1]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from difflib import SequenceMatcher
from pathlib import Path
from sklearn.utils.extmath import softmax
from collections import Counter
import json
from gensim.models import KeyedVectors
import time
import logging

In [2]:
import unimorph

In [3]:
import stanza
nlp = stanza.Pipeline(lang='sv', processors='tokenize,pos,lemma')

2023-11-28 08:55:16 INFO: Loading these models for language: sv (Swedish):
| Processor | Package   |
-------------------------
| tokenize  | talbanken |
| pos       | talbanken |
| lemma     | talbanken |

2023-11-28 08:55:16 INFO: Use device: cpu
2023-11-28 08:55:16 INFO: Loading: tokenize
2023-11-28 08:55:16 INFO: Loading: pos
2023-11-28 08:55:18 INFO: Loading: lemma
2023-11-28 08:55:18 INFO: Done loading processors!


In [4]:
def keyness(trg, ref, min_frq = 3, verbose = True): # Consider metric
    
    d = dict()
    
#     trg_tot = sum(trg.values())
#     ref_tot = sum(ref.values())
    trg_tot = len(trg)
    ref_tot = len(ref)
    
    for w in trg.keys():
        if trg[w] < min_frq:
            continue
        if w in ref:
            d[w] = (trg[w] / trg_tot) / (ref[w] / ref_tot) # ratio of relative frequencies; used as the "OR" keyness score
        else:
            d[w] = np.inf
    
    if verbose:
        for word, trg_freq, keyness  in sorted([(w, trg[w], k) for w, k in d.items()], key = lambda x: x[1], reverse = True)[:20]:
            if word in ref:
                ref_freq = ref[word]
            else:
                ref_freq = 0
            print(f"{word:<20}{trg_freq:<4}{(trg_freq/trg_tot):<6.3f}{ref_freq:<4}{(ref_freq/ref_tot):<6.3f}{keyness:.4}")        
    
    return d
    

In [5]:
def inspect(
    df,            # Replacement Dataframe
    dwe,           # Dog Whistle Expression
    meaning,       # 1 for ingroup, 2 for outgroup
    phase,         # 1 for first phase of data collection, 2 for second phase
    sw = None,     # stopwords
    punct = None,  # remove punctuations
    verbose = True,
    multi = False, # Keep the multi-word units of the replacements
    rel_freq = False # use relative frequencies: freq / no. of documents
):
    
    counter = Counter()
    
    if isinstance(df, pd.DataFrame):
        column = df.loc[df[f"{dwe}_w{phase}_C"] == meaning, f"{dwe}_text_w{phase}"]
    else:
        column = df
    
    for x in column:
        if punct != None:
            for p in punct:
                x = x.replace(p, "")
        x = x.split()
        if sw != None:
            x = [w for w in x if w not in sw]
        
        if multi:
            x = ["_".join(x)]
        
        counter.update(set(x)) # Obs. terms are only counted once per "document"
    
    if rel_freq:
        counter = Counter({w: c/len(column) for w,c in counter.items()})
        
    if verbose:
        for w, f in sorted(counter.items(), key = lambda x: x[1], reverse = True)[:15]:
            print(f"{w:<30}{f}")
        print("-----------------------")
        print("Total no. of types:", len(counter))
    
    

    return counter

In [6]:
def select_A(
    df,             # Replacement Dataframe
    dwe,            # Dog Whistle Expression
    phase = "both", # 1 for first phase of data collection, 2 for second phase, "both" for both
    sw = None,      # stopwords
    punct = None,   # remove punctuations
    k = None,
    min_freq = None,
    min_OR = None,
    empty_intersect = False
):
    
    if type(k) == tuple:
        k_in, k_out = k
    else:
        k_in  = k
        k_out = k
    if type(min_freq) == tuple:
        min_freq_in, min_freq_out = min_freq
    else:
        min_freq_in  = min_freq
        min_freq_out = min_freq
    if type(min_OR) == tuple:
        min_OR_in, min_OR_out = min_OR
    else:
        min_OR_in  = min_OR
        min_OR_out = min_OR
    
    if phase == "both":
        x = pd.concat([
            df.loc[df[f"{dwe}_w{1}_C"] == 1, f"{dwe}_text_w{1}"],
            df.loc[df[f"{dwe}_w{2}_C"] == 1, f"{dwe}_text_w{2}"]
        ]).to_list()
                
        y = pd.concat([
            df.loc[df[f"{dwe}_w{1}_C"] == 2, f"{dwe}_text_w{1}"],
            df.loc[df[f"{dwe}_w{2}_C"] == 2, f"{dwe}_text_w{2}"]
        ]).to_list()

        ingroup = inspect(x, dwe, None, None, sw, punct, verbose = False, rel_freq = True)
        outgroup = inspect(y, dwe, None, None, sw, punct, verbose = False, rel_freq = True)
#         _ = inspect(x, dwe, None, None, sw, punct, multi = True)   
#         _ = inspect(y, dwe, None, None, sw, punct, multi = True)   

        keyness_in2out = keyness(ingroup, outgroup, verbose = False, min_frq = -1)
        keyness_out2in = keyness(outgroup, ingroup, verbose = False, min_frq = -1)
        
    else:    
    
        ingroup = inspect(df, dwe, 1, phase, sw, punct, verbose = False, rel_freq = True)
        outgroup = inspect(df, dwe, 2, phase, sw, punct, verbose = False, rel_freq = True)
#         _ = inspect(df, dwe, 1, phase, sw, punct, multi = True)
#         _ = inspect(df, dwe, 2, phase, sw, punct, multi = True)
        keyness_in2out = keyness(ingroup, outgroup, verbose = False, min_frq = -1)
        keyness_out2in = keyness(outgroup, ingroup, verbose = False, min_frq = -1)
    
    A_in  = [w for w in ingroup.keys()]
    A_out = [w for w in outgroup.keys()]
    #print(A_out)
    
    if empty_intersect:
        A_in  = [w for w in A_in if w not in outgroup.keys()]
        A_out = [w for w in A_out if w not in ingroup.keys()]
        
    if min_freq != None:
        A_in  = [w for w in A_in if ingroup[w] >= min_freq_in]
        A_out = [w for w in A_out if outgroup[w] >= min_freq_out]
    
    #print(A_out)
    
    if min_OR != None:
        A_in  = [w for w in A_in if keyness_in2out[w] >= min_OR_in]
        A_out = [w for w in A_out if keyness_out2in[w] >= min_OR_out] # too strict to have the same threshold for both
        
    #print(A_out)    
    
    if k != None:
        A_in  = [w for w,_ in sorted(ingroup.items(), key = lambda x: x[1], reverse = True) if w in A_in][:k_in]
        A_out = [w for w,_ in sorted(outgroup.items(), key = lambda x: x[1], reverse = True) if w in A_out][:k_out]
    
    
    return A_in, A_out

In [7]:
def matcher(string, A_list, punct = [",", "?", ".", "!", ";"]): 
    # add more sophistication?
    # punct should match the one passed to select_A()
    
    match = False

    for p in punct:
        string = string.replace(p, "")
    for w in string.split(" "):
        if w in A_list:
            match = True
            
    return match

In [8]:
def load_replacements(dwe, meaning, rnd, model, data_path):
    
    with open(data_path / dwe / meaning / rnd / "replacements.txt") as f:
        idx_t, text = zip(*[tuple(line.strip("\n").split("\t")) for line in f.readlines()[1:]]) # Obs! skip first line
    with open(data_path / dwe / meaning / rnd / "vectors" / model / "vecs.txt") as f:
        lines = [tuple(line.strip("\n").split("\t")) for line in f.readlines()]
        lines = [(idx, [float(v) for v in vec.split()]) for idx, vec in lines]
        #print(lines[:2])
        idx_v, vectors = zip(*lines)
    
    assert idx_t == idx_v, "Vectors and text are not aligned."
    
#     for t, v in zip(idx_t, idx_v):
#         if t != v:
#             print(t,v)

    #print("MUUUU", vectors[:2])
    
    return [(text, vec) for text, vec in zip(text, vectors)]

In [9]:
def collect_vec(data_path, dwe, Aigt, Aogt, model, rounds = ["first_round", "second_round"]):
    """
    Based on A for the ingroup and the outgroup, collects vectors of the replacements that map to A. 
    Mapping between A and vectors of replacements uses `matcher()`.
    """
    
    igt_vectors = []
    ogt_vectors = []
    
    for meaning in ["ingroup", "outgroup"]:
        for rnd in rounds: # ["first_round", "second_round"] or just one of them
            for replacement, vector in load_replacements(dwe, meaning, rnd, model, data_path):
                if meaning == "ingroup":
                    if matcher(replacement, Aigt): # punctuation
                        igt_vectors.append(vector)
                else:
                    if matcher(replacement, Aogt): # punctuation
                        ogt_vectors.append(vector)
    
    return np.array(igt_vectors), np.array(ogt_vectors)

In [10]:
#     df,            # Replacement Dataframe
#     dwe,           # Dog Whistle Expression
#     phase,         # 1 for first phase of data collection, 2 for second phase, "both" for both
#     sw = None,     # stopwords
#     punct = None,  # remove punctuations
#     k = None,
#     min_freq = None,
#     min_OR = None,
#     empty_intersect = False

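def strat2select(mode, dwe, wh_rnds, path_dfA, stopwords, punct, verbose = True, model = None, data_path = None):
    """
    Based on a strategy, i.e. `mode`, selects A.
    For mode "rn", returns the ingroup and outgroup replacement vectors directly
    (requires `model` and `data_path`); otherwise returns the terms (Aigt, Aogt)
    selected by `select_A()`.
    """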
    
    #print(f"\tA-Strategy{mode}")

    if mode == "rn":    # Really naive; probably the most sensible for SBERT
        
        igt_vectors = []
        for rnd in wh_rnds:
            _, vecs = zip(*load_replacements(dwe, "ingroup", rnd, model, data_path))
            
            #print(vecs[1]. len(vecs[1]))
            
            igt_vectors.extend(vecs)
        
        ogt_vectors = []
        for rnd in wh_rnds:
            _, vecs = zip(*load_replacements(dwe, "outgroup", rnd, model, data_path))
            ogt_vectors.extend(vecs)  
            
#         print(ogt_vectors[:2])

        return np.array(igt_vectors), np.array(ogt_vectors)
    
    else:
        
        dfA = pd.read_csv(path_dfA, sep="\t") # check parameters
        dfA = dfA.applymap(lambda s: s.lower() if type(s) == str else s)
        
        if wh_rnds == ["first_round"]:
            PHASE = 1
        elif wh_rnds == ["second_round"]: # `elif`, so that PHASE = 1 is not overwritten by "both"
            PHASE = 2
        else:
            PHASE = "both"
        
        if mode == "nno":   # Naive No Overlap
            Aigt, Aogt = select_A(
                df = dfA, 
                dwe = dwe, 
                phase = PHASE, 
                sw = stopwords, 
                punct = punct, 
                empty_intersect = True)

        if mode == "top1":  # Top 1 (no overlap)
            Aigt, Aogt = select_A(
                df = dfA, 
                dwe = dwe,
                phase = PHASE,
                sw = stopwords,
                punct = punct,
                k = 1,
                empty_intersect = True
            )

        if mode == "top3":  # Top 3 (no overlap)
            Aigt, Aogt = select_A(
                df = dfA, 
                dwe = dwe,
                phase = PHASE,
                sw = stopwords,
                punct = punct,
                k = 3,
                empty_intersect = True
            )

        if mode == "ms1":    # Multiple Selection; threshold ... 
            Aigt, Aogt = select_A(
                df = dfA, 
                dwe = dwe,
                phase = PHASE,
                sw = stopwords,
                punct = punct,
                k = 3,
                min_OR = 2.0,
                empty_intersect = False
            )
        
        if verbose:
            if len(Aigt) < 4:
#                 print("\t", mode)
                logging.info(f"Aigt: {', '.join(Aigt)}")
                logging.info(f"Aogt: {', '.join(Aogt)}")
        
        
        # `wh_rnds` is already a list of round names; only needed if collect_vec() below is re-enabled
        rounds = list(wh_rnds)
        
        return Aigt, Aogt

        #return collect_vec(data_path, dwe, Aigt, Aogt, model, rounds)
        #dwe, Aigt, Aogt, model, rounds = ["first_round", "second_round"], data_path



In [11]:
def PairwiseMeanSimilarity(v, v_list):
    
    pairwise = cosine_similarity(v, v_list)
    pairwise_mean = pairwise.mean()
    
    return pairwise_mean
    

In [12]:
def similar_string(a, b):
    return SequenceMatcher(None, a, b).ratio()

In [13]:
def repl_dwe(dwe, rule = None, verbose = True):
    
    if rule != None:
        return rule[dwe]
    else: # infer!
        potential_dwes = ["forortsgang", "aterinvandring", "berikar", "globalister"]
        
        dwe = dwe.split("_")[-1]
        
        best_score = 0
        best_guess = None
        
        for candidate in potential_dwes:
            score = similar_string(dwe, candidate)
            if score > best_score:
                best_score = score
                best_guess = candidate
        
        if verbose:
            logging.info(f"Inference for {dwe}: {best_guess} (score = {best_score:.2f}).")
        
        return best_guess


In [14]:
class Config:
    def __init__(self):
        pass  # attributes are set after instantiation, e.g. config.dwes = [...]
        
#         self.dwes = dwes
#         self.wh_rounds = rounds
#         self.dfA_path = dfA_path
#         self.strategies = strategies
#         self.years = years
#         self_measures = 
#         self.add_correlations
#         self.model
        

In [15]:
def get_keyed_vec(term, keyed_vecs):
    
    if term in keyed_vecs:
        vec = keyed_vecs[term]
    else:
        vec = None
    
    return vec

In [16]:
def load_vocab(filename): 
    vocab = {}
    with open(filename) as f:
        for line in f.readlines():
            w,frq = line.rstrip('\n').split()
            vocab[w] = int(frq)
    return vocab 

In [17]:
def lemmatize(A):
    """ 
    Lemmatizes a list of words. 
    If no lemma is found, the original term is kept as the lemma.
    """
    
    A_mod     = {}
    for w in A:
        doc = nlp(w)
        lemma = doc.sentences[0].words[0].lemma # it seems stanza always returns something
        
        if lemma in A_mod:
            A_mod[lemma].add(w)
        else:
            A_mod[lemma] = set()
            A_mod[lemma].add(w)
            

#     A_mod = {}
#     for lex in B.keys():
#         if len(B[lex]) > 1:
#             A_mod[lex] = B_mod[lex]
    
    return list(A_mod.keys()), A_mod


In [18]:
def wf_expand(B, A_mod, use_saldo, saldo, verbose = True):
    """ 
    Expands a lemma (lexeme) to its word forms. 
    If no expansion is found, the lemma form + the original word form(s) are kept as the only word forms.
    """
    
    exp_B = []
    
    for lemma in B:
        # Try first unimorph
        wfs = [line.split("\t")[1] for line in unimorph.inflect_word(lemma, lang="swe").split("\n") if line != ""]
        if wfs != []:
            exp_B.append(set(wfs))
        else: # wfs == []
            if use_saldo:
                # Try Saldo
                if lemma in saldo:
                    exp_B.append(set(saldo[lemma]))
                else:
                    if verbose:
                        logging.info(f"For {lemma}, neither `unimorph` nor `saldo` found anything.")
                    logging.info(f"Amod to rescue...{set(A_mod[lemma])}")
                    exp_B.append(set(A_mod[lemma]))
            else:
                exp_B.append(set(A_mod[lemma]))
        
            
    return exp_B


In [19]:
def a2b2vec(A, strategy, wv, vocab, use_saldo, saldo):
    
    #print(f"\tB-Strategy{strategy}")    

    vecs = []             
    
    if strategy == "lazy": # do nothing; take them as they are
        
        for w in [w for w in A if w in wv]:
            vecs.append(wv[w])

    else:
        
        B, A_mod = lemmatize(A)
        B        = wf_expand(B, A_mod, use_saldo, saldo) # implemented as a list of sets in order to pick the most common forms of a lemma
        B        = [[w for w in lexeme if w in wv and w in vocab] for lexeme in B]
        
        
        if strategy == "greedy": # hungry
            logging.info(f"B: {B}")
            for lexeme in B:
                for w in lexeme:
                    vecs.append(wv[w])
                    
        else:
            #print(B)
            lemmatized_voc_B   = {lexeme[0].upper(): {w: vocab[w] for w in lexeme if w in vocab} for lexeme in B if not lexeme == []}
            flattened_voc_B    = {wf: vocab[wf] for lexeme in B for wf in lexeme}
            proportional_voc_B = {lexeme: {w: (lemmatized_voc_B[lexeme][w]/sum(lemmatized_voc_B[lexeme].values())) for w in lemmatized_voc_B[lexeme]} for lexeme in lemmatized_voc_B}
            flatt_prop_voc_B   = dict()
            for p_dict in proportional_voc_B.values():
                for w, prop in p_dict.items():
                    flatt_prop_voc_B[w] = prop
            
            if strategy.startswith("top"): # e.g. top1, top3, etc.
                k = int(strategy.replace("top", ""))
                T = []

                for lexeme in B: # lexeme is a list
                    
                    VOC = {w:f for w,f in flattened_voc_B.items() if w in lexeme}
                    
                    ranked = sorted(VOC.items(), key = lambda x: x[1], reverse=True)[:k]
                    for w, _ in ranked:
                        T.append(w)

                logging.info(f"B: {', '.join(T)}")
                for w in T:
                    vecs.append(wv[w])
            
            if strategy.startswith("min"): # e.g. min0.1 
                threshold = float(strategy.replace("min", ""))
                T = []
            
                for lexeme in B:
                    for w in lexeme:
                        if flatt_prop_voc_B[w] >= threshold:
                            T.append(w)
                            
                logging.info(f"B: {', '.join(T)}")                
                for w in T:
                    vecs.append(wv[w])
    
    if vecs == []:
        return None
    else:
        
        return np.array(vecs)

In [20]:
def sgns_builder(config):
    
    t0 = time.time()
    
    for handler in logging.root.handlers[:]:
        logging.root.removeHandler(handler)
    logging.basicConfig(
        level=logging.INFO, 
        handlers=[
            logging.FileHandler(f"{config.log_prefix}_sgns_builder.log", mode= "w"),
            logging.StreamHandler()
        ])
    
    logging.info(vars(config))    
    results = []
    methods = ["I-cnt", "O-cnt", "cnt-ssc", "cnt-smx", "I-pwn", "O-pwn", "pwn-ssc", "pwn-smx"]
    # alternative: make this a config attribute
    years = [str(year) for year in range(config.first_year, config.last_year+1)] # Obs! need to add 1
    years.sort()
    
    if config.use_saldo:
        with open(config.saldo_path) as f:
            saldo = json.loads(f.read())
    else:
        saldo = None
    
    with open(config.stopwords) as f:
        stopwords = [w.strip("\n") for w in f.readlines()]
    
    for progress, dwe in enumerate(config.dwes, start = 1):
        t = time.time() - t0
        logging.info(f"PROCESSING {progress} OF {len(config.dwes)}: '{dwe}'; {int(t/60)} m. {int(t%60)} s.")
        dwe_in_replacement_test = repl_dwe(dwe)
        for a_strategy in config.Astrategies:
            logging.info(f"A-Strategy: {a_strategy}")
            
            Aigt, Aogt = strat2select(
                mode      = a_strategy, 
                dwe       = dwe_in_replacement_test, 
                wh_rnds   = config.wh_rounds, 
                #model     = config.model, 
                path_dfA  = config.dfA_path, 
                stopwords = stopwords, 
                punct     = config.punct, 
                #data_path = config.data_path
            )

            d = {b: {method: [] for method in methods} for b in config.Bstrategies}

            for year in years:
                
                wv = KeyedVectors.load_word2vec_format(config.sgns_path / f"{year}.w2v")
                vocab = load_vocab(config.vocab_path / f"{year}.txt")

                dwe_vector = get_keyed_vec(dwe, wv)
#                 dwe_vector = wv[dwe] 
                
                for b_strategy in config.Bstrategies:
                    logging.info(f"{year} :: B-strategy: {b_strategy}")
                
                    if type(dwe_vector) != np.ndarray:
                        d[b_strategy]["I-cnt"].append(None)
                        d[b_strategy]["O-cnt"].append(None)
                        d[b_strategy]["cnt-ssc"].append(None)
                        d[b_strategy]["cnt-smx"].append(None)
                        d[b_strategy]["I-pwn"].append(None)
                        d[b_strategy]["O-pwn"].append(None)
                        d[b_strategy]["pwn-ssc"].append(None)
                        d[b_strategy]["pwn-smx"].append(None)                    

                    else:                
                        logging.info("In-group")
                        INGROUPvec  = a2b2vec(Aigt, b_strategy, wv, vocab, config.use_saldo, saldo)
                        logging.info("Out-group")
                        OUTGROUPvec = a2b2vec(Aogt, b_strategy, wv, vocab, config.use_saldo, saldo)
                        
                        if type(INGROUPvec) == np.ndarray:
                            
                            #print(INGROUPvec)
                            
                            ING_centroid  = INGROUPvec.mean(axis=0)
                            i_cnt = cosine_similarity(dwe_vector.reshape(1,-1), ING_centroid.reshape(1,-1))[0][0]
                            i_pwn = PairwiseMeanSimilarity(dwe_vector.reshape(1, -1), INGROUPvec)
                            
                            d[b_strategy]["I-cnt"].append(i_cnt)
                            d[b_strategy]["I-pwn"].append(i_pwn)
                            
                            if type(OUTGROUPvec) == np.ndarray:
                        
                                OUTG_centroid = OUTGROUPvec.mean(axis=0)
                                o_cnt = cosine_similarity(dwe_vector.reshape(1,-1), OUTG_centroid.reshape(1,-1))[0][0]
                                o_pwn = PairwiseMeanSimilarity(dwe_vector.reshape(1, -1), OUTGROUPvec)

                                d[b_strategy]["O-cnt"].append(o_cnt)
                                d[b_strategy]["O-pwn"].append(o_pwn)
                                 
                                d[b_strategy]["cnt-ssc"].append(i_cnt / (i_cnt + o_cnt))
                                d[b_strategy]["cnt-smx"].append(softmax([[i_cnt, o_cnt]])[0][0])
                                d[b_strategy]["pwn-ssc"].append(i_pwn / (i_pwn + o_pwn))
                                d[b_strategy]["pwn-smx"].append(softmax([[i_pwn, o_pwn]])[0][0])
                            
                            else:
                                d[b_strategy]["O-cnt"].append(None)
                                d[b_strategy]["O-pwn"].append(None)
                                
                                d[b_strategy]["cnt-ssc"].append(None)
                                d[b_strategy]["cnt-smx"].append(None)
                                d[b_strategy]["pwn-ssc"].append(None)
                                d[b_strategy]["pwn-smx"].append(None)
                        else:
                            d[b_strategy]["I-cnt"].append(None)
                            d[b_strategy]["I-pwn"].append(None)      
                            d[b_strategy]["cnt-ssc"].append(None)
                            d[b_strategy]["cnt-smx"].append(None)
                            d[b_strategy]["pwn-ssc"].append(None)
                            d[b_strategy]["pwn-smx"].append(None)
                            if type(OUTGROUPvec) == np.ndarray:
                                OUTG_centroid = OUTGROUPvec.mean(axis=0)
                                o_cnt = cosine_similarity(dwe_vector.reshape(1,-1), OUTG_centroid.reshape(1,-1))[0][0]
                                o_pwn = PairwiseMeanSimilarity(dwe_vector.reshape(1, -1), OUTGROUPvec)

                                d[b_strategy]["O-cnt"].append(o_cnt)
                                d[b_strategy]["O-pwn"].append(o_pwn)                                

            if config.results_format == "long":
                
                for b_strategy in d.keys():
                    for method in d[b_strategy].keys():
                        line = [dwe, a_strategy, b_strategy, method]
                        line.extend(d[b_strategy][method])
    #                     if config.add_correlations:
    #                         r_naive
    #                         r_rect
    #                         rho_naive
    #                         rho_rect
    #                         r_fpm
    #                         rho_fpm
    #                         N                    
                        results.append(line)

            else: # if results_format == "wide"
                for b_strategy in d.keys():
                    line = [dwe, a_strategy, b_strategy]
                    for method in d[b_strategy].keys():
                        line.extend(d[b_strategy][method])
                    results.append(line) # one row per B-strategy
    
    if config.results_format == "long":
        features = ["DWE", "A-Strategy", "B-Strategy", "Method"] + years
        if config.add_correlations:
            additional_headings = ["r_naive", "r_rect", ...]
            features.extend(additional_headings)
        
    
    else: # if wide
        features = ["DWE", "A-Strategy", "B-Strategy"]
        for method in methods:
            m = [f"{method}_{year}" for year in years]
            features.extend(m)
    
#     print(features)
#     print(results[0])
    
    df = pd.DataFrame(results, columns = features)
    
    df.to_csv(config.results_path)
    
    t = time.time() - t0
    logging.info(f"Done! {int(t/60)} m. {int(t%60)} s.")    
                    

## Flashback

In [None]:
config = Config()

In [None]:
config.log_prefix = "fb"
config.first_year = 2000
config.last_year  = 2022
#config.last_year  = 2005
config.dwes       = [
                    "V1_berika",
                    "N1_berikare",
                    "V1_kulturberika",
                    "N1_kulturberikare",
                    "N1_globalist",
                    "A1_globalistisk",
                    "N1_återvandring",
                    "V1_återvandra",
                    #"V1_hjälpa_på_plats",
                    "N1_förortsgäng"
                    ]
#config.Astrategies = ["top1", "top3", "ms1"] # add "rn", "nno" but need to fix code in function
config.Astrategies = ["top3", "ms1"] # add "rn", "nno" but need to fix code in function
#config.Bstrategies = ["lazy", "greedy", "top1", "top3", "min0.5", "min0.2"]
config.Bstrategies = ["lazy", "greedy", "top3", "min0.2"]
config.wh_rounds  = ["first_round", "second_round"]
config.dfA_path   = Path("/home/max/Documents/research/replacement_data/panel_wide_onlyreplace.csv")
config.stopwords  = Path("../../data/utils/stopwords-sv.txt")
config.punct      = [",", "?", ".", "!", ";", "”", '"', ")", ")", "&", "=", "'"]
config.data_path   = Path("/home/max/Results/replacements/data")
config.results_format = "long"
config.add_correlations = False
config.results_path = Path("/home/max/Desktop/sgns_results.csv")
config.sgns_path = Path("/home/max/Results/fb_pol-yearly-rad3/models")
config.use_saldo = True
config.saldo_path = Path("/home/max/Datasets/saldom.json")
config.vocab_path = Path("/home/max/Corpora/flashback-pol-time/yearly/fb-pt-radical3/vocab")

In [None]:
sgns_builder(config)

## Familjeliv

In [24]:
config = Config()

In [25]:
config.log_prefix = "fl"
config.first_year = 2003
config.last_year  = 2022
#config.last_year  = 2005
config.dwes       = [
                    "V1_berika",
                    "N1_berikare",
                    "V1_kulturberika",
                    "N1_kulturberikare",
                    "N1_globalist",
                    "A1_globalistisk",
                    "N1_återvandring",
                    "V1_återvandra",
                    #"V1_hjälpa_på_plats",
                    "N1_förortsgäng"
                    ]
#config.Astrategies = ["top1", "top3", "ms1"] # add "rn", "nno" but need to fix code in function
config.Astrategies = ["top3", "ms1"] # add "rn", "nno" but need to fix code in function
#config.Bstrategies = ["lazy", "greedy", "top1", "top3", "min0.5", "min0.2"]
config.Bstrategies = ["lazy", "greedy", "top3", "min0.2"]
config.wh_rounds  = ["first_round", "second_round"]
config.dfA_path   = Path("/home/max/Documents/research/replacement_data/panel_wide_onlyreplace.csv")
config.stopwords  = Path("../../data/utils/stopwords-sv.txt")
config.punct      = [",", "?", ".", "!", ";", "”", '"', ")", ")", "&", "=", "'"]
config.data_path   = Path("/home/max/Results/replacements/data")
config.results_format = "long"
config.add_correlations = False
config.results_path = Path("/home/max/Desktop/fm_sgns_results.csv")
config.sgns_path = Path("/home/max/Results/fm_smh-yearly-rad3/models")
config.use_saldo = True
config.saldo_path = Path("/home/max/Datasets/saldom.json")
config.vocab_path = Path("/home/max/Corpora/familjeliv-smh-time/yearly/fm-sh-radical3/vocab")
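
The entries in `config.dwes` appear to follow a `<letter><digit>_<lemma>` pattern (`V1_berika`, `N1_förortsgäng`), and the log messages refer to the bare lemma ("Inference for kulturberika"). Below is a minimal sketch of splitting such a key. It assumes the letter is a part-of-speech tag and the digit a variant index; `parse_dwe` and `DWE` are hypothetical names, not part of the code used in this notebook.

```python
import re
from typing import NamedTuple


class DWE(NamedTuple):
    tag: str      # assumed part-of-speech tag: "V", "N", "A"
    index: int    # assumed variant/sense index
    lemma: str    # bare lemma, may itself contain underscores


# One capital letter, one or more digits, an underscore, then the lemma.
_DWE_KEY = re.compile(r"^([A-Z])(\d+)_(.+)$")


def parse_dwe(key: str) -> DWE:
    """Split a key such as 'V1_berika' into (tag, index, lemma)."""
    m = _DWE_KEY.match(key)
    if m is None:
        raise ValueError(f"unexpected DWE key: {key!r}")
    tag, index, lemma = m.groups()
    return DWE(tag, int(index), lemma)


print(parse_dwe("V1_berika"))
print(parse_dwe("N1_förortsgäng").lemma)
```

Because the lemma group is greedy, multi-word keys such as the commented-out `V1_hjälpa_på_plats` keep their internal underscores.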

In [26]:
sgns_builder(config)

INFO:root:{'log_prefix': 'fl', 'first_year': 2003, 'last_year': 2022, 'dwes': ['V1_berika', 'N1_berikare', 'V1_kulturberika', 'N1_kulturberikare', 'N1_globalist', 'A1_globalistisk', 'N1_återvandring', 'V1_återvandra', 'N1_förortsgäng'], 'Astrategies': ['top3', 'ms1'], 'Bstrategies': ['lazy', 'greedy', 'top3', 'min0.2'], 'wh_rounds': ['first_round', 'second_round'], 'dfA_path': PosixPath('/home/max/Documents/research/replacement_data/panel_wide_onlyreplace.csv'), 'stopwords': PosixPath('../../data/utils/stopwords-sv.txt'), 'punct': [',', '?', '.', '!', ';', '”', '"', ')', ')', '&', '=', "'"], 'data_path': PosixPath('/home/max/Results/replacements/data'), 'results_format': 'long', 'add_correlations': False, 'results_path': PosixPath('/home/max/Desktop/fm_sgns_results.csv'), 'sgns_path': PosixPath('/home/max/Results/fm_smh-yearly-rad3/models'), 'use_saldo': True, 'saldo_path': PosixPath('/home/max/Datasets/saldom.json'), 'vocab_path': PosixPath('/home/max/Corpora/familjeliv-smh-time/yearl

INFO:root:2007 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ger, ge, gynna, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52660, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:56:35.520999', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2008 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2008 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstöras', 'förstöra', 'förstörde

INFO:root:2012 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjas, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ge, ger, gav, gynnar, gynna, gynnas
INFO:root:2012 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ger, ge, gynna, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2013.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (34372, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2013.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:57:18.483001', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}

INFO:root:B: positivt, positiv, positiva, ge, ger, gav, gynnar, gynna, gynnas
INFO:root:2017 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt, negativ
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ger, ge, gynna, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12331, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:57:45.663678', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2018 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2018 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstört', 'förstör'

INFO:root:2003 :: B-strategy: lazy
INFO:root:2003 :: B-strategy: greedy
INFO:root:2003 :: B-strategy: top3
INFO:root:2003 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2004.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (13386, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2004.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:58:06.875188', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2004 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2004 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstört', 'förstörde', 'förstör', 'förstörd', 'förstöra'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjande', 'utnyttjad', 'utnyttjar'], ['negativa', 'negativ

INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt, negativ
INFO:root:Out-group
INFO:root:B: tillföra, tillför, bidrar, bidrag, förbättra
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2009.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56321, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2009.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:58:53.146808', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2009 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2009 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstörande', 'förstöras', 'förstöra', 'förstördes'], ['utnyttja', 'u

INFO:root:2013 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjad, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: tillför, tillföra, tillfört, bidrag, bidrar, bidra, förbättra, förbättras, förbättrar
INFO:root:2013 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt, negativ
INFO:root:Out-group
INFO:root:B: tillföra, tillför, bidrag, förbättra
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2014.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (32282, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2014.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:59:35.309977', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_wo

INFO:root:Out-group
INFO:root:B: tillföra, tillför, bidrar, bidrag, bidra, förbättras, förbättra
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12084, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T08:59:59.873087', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2019 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2019 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstört', 'förstörde', 'förstör', 'förstörd', 'förstöra'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjad', 'utnyttjar'], ['negativa', 'negativt', 'negativ']]
INFO:root:Out-group
INFO:root:B: [['tillföra',

INFO:root:2005 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52876, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:00:29.513887', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2006 :: B-strategy: lazy
INFO:root:2006 :: B-strategy: greedy
INFO:root:2006 :: B-strategy: top3
INFO:root:2006 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2007.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56556, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2007.w2v', 'binary':

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12331, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:01:30.159950', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2018 :: B-strategy: lazy
INFO:root:2018 :: B-strategy: greedy
INFO:root:2018 :: B-strategy: top3
INFO:root:2018 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12084, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:01:31.820425', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]',

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (47476, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2010.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:02:16.157460', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2010 :: B-strategy: lazy
INFO:root:2010 :: B-strategy: greedy
INFO:root:2010 :: B-strategy: top3
INFO:root:2010 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (48757, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:02:22.504273', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]',

INFO:root:2022 :: B-strategy: lazy
INFO:root:2022 :: B-strategy: greedy
INFO:root:2022 :: B-strategy: top3
INFO:root:2022 :: B-strategy: min0.2
INFO:root:PROCESSING 3 OF 9: 'V1_kulturberika'; 6 m. 58 s.
INFO:root:Inference for kulturberika: berikar (score = 0.63).
INFO:root:A-Strategy: top3
INFO:root:Aigt: förstör, utnyttjar, negativt
INFO:root:Aogt: positivt, ger, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2003.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (80, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2003.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:02:51.692406', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2003 :: B-strategy: lazy
INFO:root:2003 :: B-strategy: greedy
INFO:root:2003 :: B-st
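
The excerpt above suggests a nesting order: one DWE at a time ("PROCESSING 3 OF 9"), then an A-strategy, then one model per year, then every B-strategy for that year. The sketch below reproduces only that ordering. `iter_runs` is a hypothetical helper, and where `wh_rounds` sits in the nesting is not visible in the log, so it is left out.

```python
from itertools import product


def iter_runs(dwes, a_strategies, first_year, last_year, b_strategies):
    """Yield (dwe, a_strategy, year, b_strategy) in the order the log suggests."""
    years = range(first_year, last_year + 1)
    # product() varies the right-most argument fastest,
    # so the B-strategy is the innermost loop.
    yield from product(dwes, a_strategies, years, b_strategies)


runs = list(
    iter_runs(
        ["V1_kulturberika"],
        ["top3", "ms1"],
        2003,
        2022,
        ["lazy", "greedy", "top3", "min0.2"],
    )
)
print(len(runs))  # 1 DWE * 2 A-strategies * 20 years * 4 B-strategies = 160
print(runs[:2])
```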

INFO:root:2011 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2011 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstöras', 'förstöra', 'förstördes'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjande', 'utnyttjad', 'utnyttjats', 'utnyttjades', 'utnyttjar'], ['negativa', 'negative', 'negativt', 'negativ']]
INFO:root:Out-group
INFO:root:B: [['positivt', 'positive', 'positiva', 'positiv'], ['gav', 'gett', 'givits', 'giver', 'ger', 'getts', 'givande', 'ges', 'gives', 'gavs', 'ge', 'giva', 'givit', 'giv', 'given', 'give'], ['gynnat', 'gynna', 'gynnar', 'gynnade', 'gynnats', 'gynnas']]
INFO:root:2011 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjas, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ge, ger, gav, gynnar, gynna, gynnas
INFO:root:2011 :: B-strategy: min0.
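
The B-lists in the log differ by strategy: `lazy` prints no B-list, `greedy` prints every inflected form per A-term, `top3` prints three forms per term, and `min0.2` prints a shorter subset. One reading that is consistent with this output is sketched below. The semantics are assumptions, not taken from the actual implementation: forms are ranked by corpus frequency, and `minX` keeps forms whose share of the paradigm's total frequency is at least X. The counts in the example are invented for illustration.

```python
from typing import Dict, List


def select_forms(freqs: Dict[str, int], strategy: str) -> List[str]:
    """Pick inflected forms of one A-term under a B-strategy (assumed semantics)."""
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    if strategy == "lazy":
        return []  # assumption: no expansion beyond the A-term itself
    if strategy == "greedy":
        return ranked  # every form present in that year's vocabulary
    if strategy.startswith("top"):
        return ranked[: int(strategy[3:])]
    if strategy.startswith("min"):
        threshold = float(strategy[3:])
        total = sum(freqs.values())
        if total == 0:
            return []
        return [w for w in ranked if freqs[w] / total >= threshold]
    raise ValueError(f"unknown B-strategy: {strategy!r}")


# Invented counts for one paradigm.
counts = {"negativt": 50, "negativa": 40, "negativ": 9, "negative": 1}
for name in ["lazy", "greedy", "top3", "min0.2"]:
    print(name, select_forms(counts, name))
```

If the real `min0.2` threshold is a similarity score rather than a frequency share, only the `min` branch would need to change.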

INFO:root:2019 :: B-strategy: lazy
INFO:root:2019 :: B-strategy: greedy
INFO:root:2019 :: B-strategy: top3
INFO:root:2019 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2020.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (8943, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2020.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:04:17.376812', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2020 :: B-strategy: lazy
INFO:root:2020 :: B-strategy: greedy
INFO:root:2020 :: B-strategy: top3
INFO:root:2020 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2021.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (47476, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2010.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:05:03.052214', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2010 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2010 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstöras', 'förstöra', 'förstördes'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjande', 'utnyttjad', 'utnyttjats', 'utnyttjades', 'utnyttjar'], ['negativa', 'negativt', 'negativ']]
INFO:root:Out-group
INFO:root:B: [['tillförs', 'tillfört', 'tillföra', 'tillförde', 'tillför', 'tillföras'], ['bidraga', 'bidrar', 'bidrog', 'bidrag', 'bidragit', '

INFO:root:2016 :: B-strategy: lazy
INFO:root:2016 :: B-strategy: greedy
INFO:root:2016 :: B-strategy: top3
INFO:root:2016 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2017.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (14979, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2017.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:05:41.639991', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2017 :: B-strategy: lazy
INFO:root:2017 :: B-strategy: greedy
INFO:root:2017 :: B-strategy: top3
INFO:root:2017 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:root:2007 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjas, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ge, ger, gav, gynnar, gynna, gynnas
INFO:root:2007 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ger, ge, gynna, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52660, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:06:18.641761', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
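
The same yearly model is read from disk more than once during the run: `2008.w2v` is loaded at 08:56:35 and again at 09:06:18. A small cache around the loader would avoid the repeated parsing. The sketch below uses a stand-in loader so that it runs without gensim; in the notebook the wrapped function would presumably be `KeyedVectors.load_word2vec_format(path, binary=False)`. `make_cached_loader` is a hypothetical helper.

```python
from functools import lru_cache
from pathlib import Path


def make_cached_loader(load_fn, maxsize=32):
    """Wrap a loader so that each distinct path is loaded at most once."""

    @lru_cache(maxsize=maxsize)
    def cached(path: Path):
        return load_fn(path)

    return cached


# Stand-in loader that records how often it is actually called.
calls = []


def fake_load(path):
    calls.append(path)
    return {"path": path}


load = make_cached_loader(fake_load)
models_dir = Path("models")  # placeholder directory for the example
for _ in range(3):
    model = load(models_dir / "2008.w2v")
load(models_dir / "2009.w2v")

print(len(calls))  # 2: one real load per distinct file
```

With 20 yearly models of at most 56,556 x 200 float32 values each (about 45 MB for the raw matrix), keeping all of them cached should be feasible on most workstations, though the actual memory use of the loaded objects will be somewhat higher.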

INFO:root:Out-group
INFO:root:B: [['positivt', 'positive', 'positiva', 'positiv'], ['gav', 'gett', 'givits', 'ger', 'getts', 'givande', 'ges', 'gives', 'gavs', 'ge', 'giva', 'givit', 'giv', 'given', 'give'], ['gynnat', 'gynna', 'gynnar', 'gynnade', 'gynnande', 'gynnas']]
INFO:root:2012 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjas, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ge, ger, gav, gynnar, gynna, gynnas
INFO:root:2012 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa, negativt
INFO:root:Out-group
INFO:root:B: positivt, positiva, positiv, ger, ge, gynna, gynnar
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2013.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (34372, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/mod

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (9693, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2021.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:07:23.752042', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2021 :: B-strategy: lazy
INFO:root:2021 :: B-strategy: greedy
INFO:root:2021 :: B-strategy: top3
INFO:root:2021 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2022.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (9620, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2022.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:07:25.101123', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', '

INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstörande', 'förstöras', 'förstöra', 'förstördes'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjande', 'utnyttjad', 'utnyttjats', 'utnyttjades', 'utnyttjar'], ['negativa', 'negative', 'negativt', 'negativ']]
INFO:root:Out-group
INFO:root:B: [['tillförs', 'tillfört', 'tillföra', 'tillförde', 'tillför'], ['bidraga', 'bidrar', 'bidrog', 'bidrag', 'bidragit', 'bidra', 'bidragande'], ['förbättrar', 'förbättre', 'förbättrats', 'förbättras', 'förbättrade', 'förbättrad', 'förbättrat', 'förbättra']]
INFO:root:2009 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttja, utnyttjar, utnyttjande, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: tillföra, tillför, tillförde, bidrag, bidrar, bidra, förbättra, förbättras, förbättrar
INFO:root:2009 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja, utnyttjar, negativa,

INFO:root:2014 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2014 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['förstörs', 'förstört', 'förstörde', 'förstör', 'förstörts', 'förstörd', 'förstöras', 'förstöra'], ['utnyttja', 'utnyttjas', 'utnyttjade', 'utnyttjat', 'utnyttjande', 'utnyttjad', 'utnyttjats', 'utnyttjar'], ['negativa', 'negativt', 'negativ']]
INFO:root:Out-group
INFO:root:B: [['tillförs', 'tillföra', 'tillförde', 'tillför'], ['bidrar', 'bidrog', 'bidrag', 'bidragit', 'bidra', 'bidragande'], ['förbättrar', 'förbättras', 'förbättrade', 'förbättrad', 'förbättrat', 'förbättra']]
INFO:root:2014 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: förstör, förstöra, förstört, utnyttjar, utnyttja, utnyttjas, negativt, negativa, negativ
INFO:root:Out-group
INFO:root:B: tillför, tillföra, tillförde, bidrag, bidrar, bidra, förbättra, förbättras, förbättrar
INFO:root:2014 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: förstör, förstöra, utnyttja,

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (29186, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2005.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:09:08.264700', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2005 :: B-strategy: lazy
INFO:root:2005 :: B-strategy: greedy
INFO:root:2005 :: B-strategy: top3
INFO:root:2005 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52876, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:09:15.058315', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]',
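
The gensim lifecycle lines report the shape of each loaded matrix, so the per-year vocabulary size can be recovered from the log itself. In this excerpt the sizes range from 80 rows for 2003 to 56,556 for 2007, which is worth keeping in mind when comparing years. A minimal sketch follows; `vocab_sizes` is a hypothetical helper, and the sample lines are shortened copies of lines from the log.

```python
import re

# Matches e.g. "loaded (52660, 200) matrix of type float32 from /.../2008.w2v"
_LOADED = re.compile(
    r"loaded \((\d+), (\d+)\) matrix of type \w+ from \S*?/(\d{4})\.w2v"
)


def vocab_sizes(log_lines):
    """Map year -> number of rows (vocabulary size) found in gensim load messages."""
    sizes = {}
    for line in log_lines:
        m = _LOADED.search(line)
        if m:  # truncated or unrelated lines are skipped
            rows, _dim, year = m.groups()
            sizes[int(year)] = int(rows)
    return dict(sorted(sizes.items()))


sample = [
    "INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52660, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v'",
    "INFO:root:2008 :: B-strategy: lazy",
    "INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (80, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2003.w2v'",
    "INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (34372, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/mod",
]
print(vocab_sizes(sample))  # {2003: 80, 2008: 52660}
```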

INFO:root:2015 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2015 :: B-strategy: greedy
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'soros'}
INFO:root:For elitist, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: [['elit', 'eliten'], [], []]
INFO:root:Out-group
INFO:root:B: [[], ['världsmedborgare'], ['internationellt', 'internationell', 'internationella']]
INFO:root:2015 :: B-strategy: top3
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'soros'}
INFO:root:For elitist, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: elit, eliten
INFO:root:Out-group
INFO:root:B: världsmedborgare, internationella, internationellt, internationell
INFO:root:2015 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `s
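
The messages above indicate a fallback: when neither `unimorph` nor `saldo` returns inflected forms for a term (`soros`, `elitist`), a third source labelled "Amod" supplies a replacement set (`{'soros'}`, `{'elitister'}`). The sketch below shows one way such a chain could look. It assumes the sources are tried in the order named in the message and that each behaves like a mapping from lemma to forms; `inflect` and the three dictionaries are hypothetical stand-ins, not the notebook's actual data structures.

```python
from typing import Dict, Iterable, Set


def inflect(
    lemma: str,
    unimorph: Dict[str, Iterable[str]],
    saldo: Dict[str, Iterable[str]],
    amod: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return forms from the first source that knows the lemma, else the Amod fallback."""
    for source in (unimorph, saldo):
        forms = source.get(lemma)
        if forms:
            return set(forms)
    # Neither lexicon had an entry: fall back to the manually supplied forms.
    return set(amod.get(lemma, ()))


# Toy lexicons for illustration only.
unimorph = {}
saldo = {"elit": ["elit", "eliten"]}
amod = {"soros": ["soros"], "elitist": ["elitister"]}

for lemma in ["elit", "soros", "elitist", "okänd"]:
    print(lemma, sorted(inflect(lemma, unimorph, saldo, amod)))
```

Returning an empty set for unknown lemmas mirrors the empty `[]` entries seen in some greedy B-lists, although those may equally arise from forms missing in a given year's vocabulary.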

INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: [['elit', 'eliten'], [], []]
INFO:root:Out-group
INFO:root:B: [[], [], ['internationellt', 'internationell', 'internationella']]
INFO:root:2020 :: B-strategy: top3
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'soros'}
INFO:root:For elitist, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: eliten, elit
INFO:root:Out-group
INFO:root:B: internationellt, internationella, internationell
INFO:root:2020 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'soros'}
INFO:root:For elitist, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: elit, eliten
INFO:root:Out-group
INFO:root:B: internationellt, internationell, internationella
INFO:gensim.models.keyedvectors:loading projection weights from

INFO:root:2011 :: B-strategy: lazy
INFO:root:2011 :: B-strategy: greedy
INFO:root:2011 :: B-strategy: top3
INFO:root:2011 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2012.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (42636, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2012.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:11:27.090854', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2012 :: B-strategy: lazy
INFO:root:2012 :: B-strategy: greedy
INFO:root:2012 :: B-strategy: top3
INFO:root:2012 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2013.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12084, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:11:56.134523', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2019 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2019 :: B-strategy: greedy
INFO:root:In-group
INFO:root:For jud, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'judar'}
INFO:root:B: [['judar'], ['elit', 'eliten']]
INFO:root:Out-group
INFO:root:B: [[], [], ['internationellt', 'internationell', 'internationella']]
INFO:root:2019 :: B-strategy: top3
INFO:root:In-group
INFO:root:For jud, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'judar'}
INFO:root:B: judar, eliten, elit
INFO:root:Out-gro

INFO:root:2008 :: B-strategy: lazy
INFO:root:2008 :: B-strategy: greedy
INFO:root:2008 :: B-strategy: top3
INFO:root:2008 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2009.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56321, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2009.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:12:39.083325', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2009 :: B-strategy: lazy
INFO:root:2009 :: B-strategy: greedy
INFO:root:2009 :: B-strategy: top3
INFO:root:2009 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2010.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:root:B: eliten, elit, soros
INFO:root:Out-group
INFO:root:B: internationella, internationell, internationellt
INFO:root:2019 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:For soros, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'soros'}
INFO:root:For elitist, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'elitister'}
INFO:root:B: elit, eliten, soros
INFO:root:Out-group
INFO:root:B: internationellt, internationell, internationella
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2020.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (8943, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2020.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:13:20.507477', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (48757, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:14:09.860677', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2011 :: B-strategy: lazy
INFO:root:2011 :: B-strategy: greedy
INFO:root:2011 :: B-strategy: top3
INFO:root:2011 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2012.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (42636, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2012.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:14:15.417996', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]',

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (9620, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2022.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:14:41.219242', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2022 :: B-strategy: lazy
INFO:root:2022 :: B-strategy: greedy
INFO:root:2022 :: B-strategy: top3
INFO:root:2022 :: B-strategy: min0.2
INFO:root:PROCESSING 7 OF 9: 'N1_återvandring'; 18 m. 47 s.
INFO:root:Inference for återvandring: aterinvandring (score = 0.85).
INFO:root:A-Strategy: top3
INFO:root:Aigt: deportering, utvisningar, deportation
INFO:root:Aogt: möjlighet, hjälpa, hemvändning
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2003.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (80

INFO:root:2009 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2009 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [[], ['utvisning'], []]
INFO:root:Out-group
INFO:root:For hemvändning, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'hemvändning'}
INFO:root:B: [['möjligheter', 'möjligheten', 'möjlighet', 'möjligheterna'], ['hjälptes', 'hjälp', 'hjälpas', 'hjälps', 'hjälpe', 'hjälpts', 'hjälpt', 'hjälpte', 'hjälper', 'hjälpande', 'hjälpa'], []]
INFO:root:2009 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: utvisning
INFO:root:Out-group
INFO:root:For hemvändning, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'hemvändning'}
INFO:root:B: möjlighet, möjligheter, möjligheten, hjälp, hjälpa, hjälper
INFO:root:2009 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: utvisning
INFO:root:Out-group
INFO:root:For hemvändning, neither `unimorph` nor `saldo` found nothing.
INFO:root:Amod to rescue...{'hemvändnin

INFO:root:2016 :: B-strategy: lazy
INFO:root:2016 :: B-strategy: greedy
INFO:root:2016 :: B-strategy: top3
INFO:root:2016 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2017.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (14979, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2017.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:16:08.127042', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2017 :: B-strategy: lazy
INFO:root:2017 :: B-strategy: greedy
INFO:root:2017 :: B-strategy: top3
INFO:root:2017 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:root:2003 :: B-strategy: lazy
INFO:root:2003 :: B-strategy: greedy
INFO:root:2003 :: B-strategy: top3
INFO:root:2003 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2004.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (13386, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2004.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:16:28.856562', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2004 :: B-strategy: lazy
INFO:root:2004 :: B-strategy: greedy
INFO:root:2004 :: B-strategy: top3
INFO:root:2004 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2005.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:root:2010 :: B-strategy: top3
INFO:root:In-group
INFO:root:B: utvisning, skicka, skickar, skickade
INFO:root:Out-group
INFO:root:B: flytta, flyttar, flyttade, återvända, återvänder, återvändande, hemland, hemlandet, hemländer
INFO:root:2010 :: B-strategy: min0.2
INFO:root:In-group
INFO:root:B: utvisning, skickar, skicka
INFO:root:Out-group
INFO:root:B: flytta, återvända, hemländer, hemland, hemlandet
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (48757, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:17:27.407185', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2011 :: B-strategy: lazy
INFO:root:In-group

INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12084, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:17:59.748355', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2019 :: B-strategy: lazy
INFO:root:In-group
INFO:root:Out-group
INFO:root:2019 :: B-strategy: greedy
INFO:root:In-group
INFO:root:B: [['utvisningar', 'utvisning'], ['skickar', 'skickade', 'skicka', 'skickas', 'skickat'], []]
INFO:root:Out-group
INFO:root:B: [['flyttade', 'flyttar', 'flytta', 'flyttas', 'flyttat'], ['återvändande', 'återvända', 'återvänder'], ['hemländer', 'hemland', 'hemlandet']]
INFO:root:2019 :: B-strategy: top3
INFO:root:In-group

INFO:root:2007 :: B-strategy: greedy
INFO:root:2007 :: B-strategy: top3
INFO:root:2007 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52660, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:18:38.047317', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2008 :: B-strategy: lazy
INFO:root:2008 :: B-strategy: greedy
INFO:root:2008 :: B-strategy: top3
INFO:root:2008 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2009.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56321, 200) matrix of type float

INFO:root:2014 :: B-strategy: lazy
INFO:root:2014 :: B-strategy: greedy
INFO:root:2014 :: B-strategy: top3
INFO:root:2014 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2015.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (24688, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2015.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:19:25.974938', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2015 :: B-strategy: lazy
INFO:root:2015 :: B-strategy: greedy
INFO:root:2015 :: B-strategy: top3
INFO:root:2015 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2016.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:root:2006 :: B-strategy: lazy
INFO:root:2006 :: B-strategy: greedy
INFO:root:2006 :: B-strategy: top3
INFO:root:2006 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2007.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56556, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2007.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:19:57.286453', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2007 :: B-strategy: lazy
INFO:root:2007 :: B-strategy: greedy
INFO:root:2007 :: B-strategy: top3
INFO:root:2007 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2008.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loade

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (32282, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2014.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:20:48.323733', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2014 :: B-strategy: lazy
INFO:root:2014 :: B-strategy: greedy
INFO:root:2014 :: B-strategy: top3
INFO:root:2014 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2015.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (24688, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2015.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:20:51.527954', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]',

INFO:root:2005 :: B-strategy: top3
INFO:root:2005 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (52876, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2006.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:21:15.235052', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2006 :: B-strategy: lazy
INFO:root:2006 :: B-strategy: greedy
INFO:root:2006 :: B-strategy: top3
INFO:root:2006 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2007.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (56556, 200) matrix of type float32 from /home/max/Results/fm_smh-year

INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12331, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2018.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:22:12.268116', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2018 :: B-strategy: lazy
INFO:root:2018 :: B-strategy: greedy
INFO:root:2018 :: B-strategy: top3
INFO:root:2018 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (12084, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2019.w2v', 'binary': False, 'encoding': 'utf8', 'datetime

INFO:root:2009 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2010.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (47476, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2010.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:22:57.865343', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2010 :: B-strategy: lazy
INFO:root:2010 :: B-strategy: greedy
INFO:root:2010 :: B-strategy: top3
INFO:root:2010 :: B-strategy: min0.2
INFO:gensim.models.keyedvectors:loading projection weights from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (48757, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2011.w2v', 'binary':

INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (9620, 200) matrix of type float32 from /home/max/Results/fm_smh-yearly-rad3/models/2022.w2v', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-11-28T09:23:32.666992', 'gensim': '4.3.1', 'python': '3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]', 'platform': 'Linux-5.4.0-166-generic-x86_64-with-glibc2.10', 'event': 'load_word2vec_format'}
INFO:root:2022 :: B-strategy: lazy
INFO:root:2022 :: B-strategy: greedy
INFO:root:2022 :: B-strategy: top3
INFO:root:2022 :: B-strategy: min0.2
INFO:root:Done! 27 m. 39 s.
