In [None]:
from collections import defaultdict

import json

import numpy as np

import pandas as pd

from scipy.stats import spearmanr 
from scipy.spatial.distance import cosine


In this notebook we go through a simplified version of the algorithm proposed in the paper [Grammatical Profiling for Semantic Change Detection](https://aclanthology.org/2021.conll-1.33/). The complete code can be found in the associated git repository [https://github.com/glnmario/semchange-profiling](https://github.com/glnmario/semchange-profiling).

# Loading data

In this session we will work with the __[Latin SemEval dataset](https://zenodo.org/record/3734089)__. We preprocessed the corpora with UDPipe and extracted *profiles*, i.e. counts for each target-word form in each corpus. Profiles for other languages can be found in the git repository above.


In [None]:
with open("features/latin_corpus1_morph.json", "r") as f:
    properties_1 = json.load(f)
with open("features/latin_corpus2_morph.json", "r") as f:
    properties_2 = json.load(f)

For example, in the first corpus the word `imperator` was used 80 times in the nominative case with masculine gender and plural number, while in the second corpus it was used in that specific form 3522 times.

In [None]:
properties_1['imperator']

In [None]:
properties_2['imperator']

# Cosine similarity

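We can use each category combination as a feature and measure semantic shift as the cosine distance (one minus the cosine similarity) between the two count vectors in this feature space. A larger distance means a larger change in the word's grammatical profile.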
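To make the measure concrete, here is a minimal sketch that computes the cosine distance by hand on two toy profiles. The profiles and the helper `cosine_distance` are hypothetical and written only for illustration; on non-zero vectors the helper should agree with `scipy.spatial.distance.cosine`, which the notebook uses.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

# Hypothetical toy profiles (not taken from the Latin data)
toy_profile_1 = {"Case=Nom|Number=Sing": 8, "Case=Acc|Number=Sing": 2}
toy_profile_2 = {"Case=Nom|Number=Sing": 2, "Case=Nom|Number=Plur": 8}

# Shared feature space: the sorted union of keys from both periods
toy_features = sorted(set(toy_profile_1) | set(toy_profile_2))

# A feature that is missing from a profile gets a count of 0
toy_vec_1 = [toy_profile_1.get(f, 0) for f in toy_features]
toy_vec_2 = [toy_profile_2.get(f, 0) for f in toy_features]

toy_distance = cosine_distance(toy_vec_1, toy_vec_2)        # about 0.765
identical_distance = cosine_distance(toy_vec_1, toy_vec_1)  # about 0.0
```

The two toy profiles share only one form, so the distance is high; a profile compared with itself gives a distance of (approximately) zero.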

In [None]:
def get_properties(word):
    return properties_1[word], properties_2[word]

In [None]:
def score(prop1, prop2, thr=0):

    # the feature space is the union of the keys from the two periods,
    # each mapped to its total count over both corpora
    features = {k: prop1.get(k, 0) + prop2.get(k, 0) for k in set(prop1) | set(prop2)}

    # Filtering: keep only features whose total count exceeds thr
    # (the default thr=0 keeps every observed feature)
    features = {k: v for k, v in features.items() if v > thr}

    # a feature that is missing from one period gets a count of 0
    counts1 = [prop1.get(f, 0) for f in features]
    counts2 = [prop2.get(f, 0) for f in features]

    # scipy's `cosine` returns the cosine distance (1 - cosine similarity)
    return cosine(counts1, counts2)

In [None]:
score(*get_properties('imperator'))

Now let's load the ground truth, compute the score for every target word, and evaluate the method using the Spearman rank correlation between the scores and the ground truth.
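As a reminder of what the Spearman rank correlation measures, the sketch below computes it by hand on toy numbers. The helper names and the values are hypothetical, and the closed-form formula used here is only valid when there are no ties; `scipy.stats.spearmanr`, used in the notebook, handles ties as well.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical gold change scores and predicted scores for four words
toy_truth = [0.1, 0.4, 0.2, 0.9]
toy_scores = [0.05, 0.30, 0.35, 0.80]

toy_rho = spearman_no_ties(toy_truth, toy_scores)  # about 0.8
```

Only the ordering of the words matters: the predicted scores swap the second and third words relative to the gold ranking, which lowers the correlation from 1.0 to about 0.8.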

In [None]:
# Load target words with ground truth
graded = pd.read_csv('targets/latin/graded.txt', sep="\t", header=None, names=['word', 'truth'])

In [None]:
# Compute score for each word in the list
graded["score"] = graded.apply(lambda row: score(*get_properties(row.word)), axis = 1)
graded

In [None]:
# Evaluate using Spearman Rank Correlation
spearmanr(graded.truth, graded.score)

**Your turn** Try adding filtering, i.e. removing features that appear fewer than 5 times in total across both corpora (use the `thr` parameter). What happens to the correlation score? Why? Try different values for the threshold.

# Split features

In the section above, the feature space was constructed from word forms: case, number and other grammatical properties were used in combination. However, we can split them and count each property separately. For example, we can count how many times a word has been used in a plural form, regardless of case.
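As a small illustration of what splitting does to the counts, consider the toy profile below. The profile is hypothetical and not taken from the Latin data; it only assumes that feature combinations are joined with `|`, as in the profiles loaded above.

```python
from collections import defaultdict

# Hypothetical toy profile: full form -> count
toy_profile = {
    "Case=Nom|Number=Plur": 3,
    "Case=Acc|Number=Plur": 2,
    "Case=Nom|Number=Sing": 5,
}

# Each form contributes its count to every property it contains
toy_split = defaultdict(int)
for form, count in toy_profile.items():
    for feature in form.split("|"):
        toy_split[feature] += count

# Number=Plur is now counted regardless of case: 3 + 2 = 5
```

Note that the split counts no longer sum to the number of occurrences of the word, because every occurrence is counted once per property.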

In [None]:
# a function that splits properties
def split_props(properties):
    splt = defaultdict(int)
    for p,count in properties.items():
        for f in p.split("|"):
            splt[f] += count
    return splt

In [None]:
split_props(properties_1['imperator'])

In [None]:
def get_split_props(word):
    return split_props(properties_1[word]), split_props(properties_2[word])

In [None]:
# Evaluation
graded["split_score"] = graded.apply(lambda row: score(*get_split_props(row.word)), axis = 1)
graded

In [None]:
spearmanr(graded.truth, graded.split_score)

**Your turn** Check the effect of various filtering thresholds on this method.

# Two-step

In the previous section, we used all morphological properties together. `Case=Nom`, `Case=Acc` and `Number=Sing` were all treated equally, even though the first two are mutually exclusive, while either of them can be combined with the third. It may make more sense to compute a distance for each morphological category separately, e.g. a number distance, a case distance, and so forth.
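The sketch below shows why per-category distances can be informative. The counts and the helper `cosine_distance` are hypothetical: the toy word changes its case distribution between the two periods while its number distribution stays the same.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

# Hypothetical per-category counts for one word in two periods
toy_period_1 = {"Case": {"Nom": 9, "Acc": 1}, "Number": {"Sing": 5, "Plur": 5}}
toy_period_2 = {"Case": {"Nom": 1, "Acc": 9}, "Number": {"Sing": 5, "Plur": 5}}

toy_category_scores = {}
for category in toy_period_1:
    # Compare only the values that belong to the same category
    values = sorted(set(toy_period_1[category]) | set(toy_period_2[category]))
    u = [toy_period_1[category].get(v, 0) for v in values]
    w = [toy_period_2[category].get(v, 0) for v in values]
    toy_category_scores[category] = cosine_distance(u, w)

# Case distance is about 0.78, Number distance is about 0.0
```

Keeping the categories apart makes it visible that the change is located in case usage only, which a single distance over all properties would blur.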

In [None]:
# a function that splits properties by morphological category
def two_step_split(properties):
    splt = defaultdict(lambda: defaultdict(int))
    for p,count in properties.items():
        for f in p.split("|"):
            try:
                category, value = f.split("=")
            except ValueError:  # skip features that are not of the form category=value
                continue
            splt[category][value] += count 
    return splt

In [None]:
two_step_split(properties_1['imperator'])

Now for each word we get a *set* of scores, one for each morphological category.

In [None]:
def two_step_score(word, thr=0):
    prop1 = two_step_split(properties_1[word])
    prop2 = two_step_split(properties_2[word])
    
    categories = {k:sum(prop1[k].values())+sum(prop2[k].values()) 
                  for k in set(prop1.keys()).union(set(prop2.keys()))}
    
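    # Filtering: keep only categories whose total count exceeds thr.
    # `total` is not used here; it is kept for the exercise below.
    total = sum(categories.values())
    categories = {k: v for k, v in categories.items() if v > thr}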
    
    scores = {cat:score(prop1[cat], prop2[cat]) for cat in categories}
    
    return scores


In [None]:
two_step_score('imperator', thr=5)

We can compute the overall change score by *averaging* the scores over all morphological categories.

In [None]:
def aggregated_score(scores):
    return np.mean(list(scores.values()))

In [None]:
# Evaluation
graded["two_step_score"] = graded.apply(lambda row: aggregated_score(two_step_score(row.word)), axis = 1)
graded

In [None]:
spearmanr(graded.truth, graded.two_step_score)

**Your turn:** 
<br>
1. Try using the maximum of the scores instead of the average. How does this affect the results? Why?
<br>
2. The threshold we are using here is the same for all words (e.g. 5), while in the paper we used a variable threshold: 5% of the total word count. Implement this and see how it affects the results. Hint: use `total`, defined inside the `two_step_score` function.