# Generating RDMs with LM data
Jacob Matthews, 10/28/2022

## Outline
This notebook demonstrates how to use rsatoolbox to build representational dissimilarity matrices (RDMs) from word vectors produced by a language model (here, word2vec). It then demonstrates how an RDM can be converted into networkx graphs.

## Imports and preprocessing

In [None]:
import rsatoolbox          # Dataset, calc_rdm, show_rdm
import pandas as pd
import numpy as np
import networkx as nx
import gensim.downloader   # fetches pretrained embedding models


## w2v and words to probe

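Our simple model: pretrained word2vec vectors (300 dimensions, trained on Google News), loaded with gensim's downloader. The first call downloads the model, which is large, so it can take a while.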

In [None]:
# Download (first run only) and load the pretrained vectors as a gensim KeyedVectors object
w2v_model = gensim.downloader.load('word2vec-google-news-300')

Create a list of words to construct the RDM:

In [None]:
words = [
    # add the words to probe here; each must be in the w2v vocabulary
        ]
num_words = len(words)
num_words
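Looking up a word that is not in the model's vocabulary raises a `KeyError`, so it can help to filter the list first. Below is a minimal sketch, assuming only that the vocabulary supports Python's `in` operator (gensim's `KeyedVectors` does). A plain `dict` stands in for the model so the example runs without the download, and `split_by_vocab` is a hypothetical helper, not part of gensim or rsatoolbox.

```python
from typing import Container, Iterable, List, Tuple


def split_by_vocab(words: Iterable[str], vocab: Container[str]) -> Tuple[List[str], List[str]]:
    """Split words into (known, missing), preserving the input order.

    Hypothetical helper: `vocab` can be anything supporting `word in vocab`,
    e.g. a gensim KeyedVectors model or a dict.
    """
    known, missing = [], []
    for word in words:
        (known if word in vocab else missing).append(word)
    return known, missing


# Toy stand-in for the word2vec vocabulary (made up, for illustration only).
toy_vocab = {"king": [0.1, 0.2], "queen": [0.1, 0.3], "apple": [0.9, 0.0]}

known, missing = split_by_vocab(["king", "queen", "qwertyuiop", "apple"], toy_vocab)
print(known)    # ['king', 'queen', 'apple']
print(missing)  # ['qwertyuiop']
```

In the notebook this could be applied as `words, missing = split_by_vocab(words, w2v_model)`, followed by recomputing `num_words`.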

Look up each word's embedding and collect them in a DataFrame with one row per word and one column per embedding dimension.

In [None]:
embeddings = {word: w2v_model[word].tolist() 
              for word in words}
model_df = pd.DataFrame.from_dict(embeddings, orient='index')
model_df

## Generating RDMs

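Wrap the embeddings DataFrame in an rsatoolbox `Dataset`, using the words as observation descriptors and recording the year and model as dataset-level descriptors.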

In [None]:
ds = rsatoolbox.data.Dataset.from_df(model_df)
ds.obs_descriptors = {'words': model_df.index.to_list()}
ds.descriptors = {'year': 2022, 'model': 'w2v'}

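Calculate the RDM from Euclidean distances between the word vectors (the dataset labelled `year=2022`), then plot it with the words as row and column labels.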

In [None]:
rdm = rsatoolbox.rdm.calc_rdm(ds, method='euclidean', descriptor='words')
rsatoolbox.vis.show_rdm(rdm, 
                        rdm_descriptor='year', 
                        figsize=[num_words, num_words],
                        show_colorbar='figure',
                        pattern_descriptor='words'
                        ) 
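For intuition about what `calc_rdm` computes, here is a plain NumPy sketch of a pairwise Euclidean distance matrix for a stack of vectors (one row per word). It is an illustration, not rsatoolbox's implementation: rsatoolbox's `'euclidean'` method may square or rescale the distances (see the `calc_rdm` documentation for its exact definition), so the values need not match `dist_matrix`. `pairwise_euclidean` is a hypothetical helper and the vectors are made up.

```python
import numpy as np


def pairwise_euclidean(X: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of Euclidean distances between the rows of X."""
    # Broadcast (n, 1, d) - (1, n, d) -> (n, n, d), then take the norm over d.
    diffs = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diffs, axis=-1)


# Three toy 2-D "embeddings" (made up for illustration).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = pairwise_euclidean(X)
print(D)
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

Broadcasting builds an n × n × d intermediate array, which is fine for a few dozen 300-dimensional word vectors but wasteful for very large vocabularies.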


## RDMs to graphs

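Extract the RDM as a square NumPy array. `get_matrices()` returns one matrix per RDM in the object, so `[0]` selects the only one here.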

In [None]:
dist_matrix = rdm.get_matrices()[0]
print(dist_matrix)
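An RDM is symmetric with a zero diagonal, so all of its information sits in the upper triangle; rsatoolbox also exposes a vector form through `rdm.get_vectors()`. The NumPy sketch below shows the round trip between the square and vector forms. The helper names `to_vector` and `to_matrix` are hypothetical, and the row-major upper-triangle ordering used here (as in `np.triu_indices`) is an assumption about how such vectors are usually laid out, not a statement about rsatoolbox's internals.

```python
import numpy as np


def to_vector(matrix: np.ndarray) -> np.ndarray:
    """Flatten the strict upper triangle of a square matrix, row by row."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def to_matrix(vector: np.ndarray, n: int) -> np.ndarray:
    """Rebuild a symmetric, zero-diagonal (n, n) matrix from its upper triangle."""
    matrix = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)
    matrix[rows, cols] = vector
    matrix[cols, rows] = vector  # mirror into the lower triangle
    return matrix


# Made-up 3 x 3 distance matrix.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
v = to_vector(D)
print(v)                                   # [1. 2. 3.]
print(np.array_equal(to_matrix(v, 3), D))  # True
```

An n-word RDM has n(n-1)/2 distinct dissimilarities, which is the length of the vector form.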

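Generate masks for a range of thresholds, which I am calling epsilon ($\epsilon$). For each $\epsilon$, `masker` keeps every entry larger than $\epsilon$ and sets the rest to 0:

$$ m^{\epsilon}_{ij} = \begin{cases} m_{ij} & \text{if } m_{ij} > \epsilon \\ 0 & \text{otherwise} \end{cases}, \qquad \epsilon = k \cdot 10^{n}, \quad k = 1, \dots, k_{\max} $$

Here $m_{ij}$ is the distance between words $i$ and $j$, $n$ is the `n` argument, and $k_{\max}$ is `max_m`. Because only distances above $\epsilon$ survive, raising $\epsilon$ removes the closest (most similar) word pairs first.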
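A small worked example of the thresholding rule on a made-up 3 × 3 distance matrix with $k = 5$, $n = -2$ (so $\epsilon = 0.05$). `np.where` expresses the rule directly; the matrix values are invented for illustration.

```python
import numpy as np

# Made-up symmetric distance matrix (not real word2vec distances).
m = np.array([[0.00, 0.03, 0.08],
              [0.03, 0.00, 0.06],
              [0.08, 0.06, 0.00]])

k, n = 5, -2
eps = k * 10.0 ** n  # epsilon = 5 * 10^-2 = 0.05

# Keep m_ij where m_ij > eps, otherwise 0: only 0.08 and 0.06 survive.
m_eps = np.where(m > eps, m, 0.0)
print(m_eps)
```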

In [None]:
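def masker(matrix, max_m=10, n=-5):
    """Return {k: masked matrix} for k = 1..max_m, zeroing entries <= eps = k * 10**n."""
    masks = {}
    for k in range(1, max_m + 1):
        eps = k * 10 ** n
        # keep distances strictly greater than eps, set the rest to 0
        masks[k] = np.where(matrix > eps, matrix, 0.0)
    return masks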

In [None]:
masks = masker(dist_matrix, n=-2)
# masker returns a dict keyed by the multiplier k, so masks[5] uses eps = 5 * 10**-2 = 0.05
print(masks[5])
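Since a larger $\epsilon$ zeroes out more entries, each successive mask yields a sparser graph. One way to decide which masks are worth drawing is to count, for each $\epsilon$, how many word pairs survive; each surviving upper-triangle entry becomes one undirected edge. A NumPy sketch, where `edges_per_epsilon` is a hypothetical helper that mirrors `masker`'s $\epsilon = k \cdot 10^n$ grid and the distances are made up:

```python
from typing import Dict

import numpy as np


def edges_per_epsilon(matrix: np.ndarray, max_m: int = 10, n: int = -5) -> Dict[int, int]:
    """Map each multiplier k (eps = k * 10**n) to the number of pairs with distance > eps.

    Only the strict upper triangle is counted, so each unordered word pair
    (one undirected edge) is counted once and the zero diagonal is ignored.
    """
    upper = matrix[np.triu_indices(matrix.shape[0], k=1)]
    return {k: int(np.sum(upper > k * 10.0 ** n)) for k in range(1, max_m + 1)}


# Made-up distances for four words.
D = np.array([[0.000, 0.025, 0.055, 0.095],
              [0.025, 0.000, 0.075, 0.045],
              [0.055, 0.075, 0.000, 0.035],
              [0.095, 0.045, 0.035, 0.000]])
counts = edges_per_epsilon(D, max_m=10, n=-2)
print(counts)
```

Applied as `edges_per_epsilon(dist_matrix, n=-2)`, the count for multiplier k should equal the number of edges in the graph built from `masks[k]`.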

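Make one graph per mask. Each nonzero entry becomes an edge weighted by its distance, and the integer node indices are relabeled with the words.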

In [None]:
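# take the labels from the RDM so they follow its row/column order
labels = dict(enumerate(rdm.pattern_descriptors['words']))
# G[i] holds the graph for multiplier k = i + 1
G = [nx.from_numpy_array(masks[k]) for k in sorted(masks)]
G = [nx.relabel_nodes(g, labels) for g in G]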
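Building an undirected networkx graph from an array treats every nonzero entry as an edge and stores the entry as the edge's `weight` attribute. The dependency-free sketch below lists the same weighted edges, which can help when checking which word pairs a mask keeps. `mask_to_edges` is a hypothetical helper, and the words and distances are made up.

```python
from typing import List, Sequence, Tuple

import numpy as np


def mask_to_edges(mask: np.ndarray, labels: Sequence[str]) -> List[Tuple[str, str, float]]:
    """List (word_a, word_b, weight) for each nonzero entry in the upper triangle.

    For a symmetric mask this gives one weighted edge per nonzero word pair;
    the diagonal is skipped, so no self-loops appear.
    """
    rows, cols = np.triu_indices(mask.shape[0], k=1)
    return [(labels[i], labels[j], float(mask[i, j]))
            for i, j in zip(rows, cols) if mask[i, j] != 0]


toy_words = ["cat", "dog", "car"]
toy_mask = np.array([[0.0, 0.0, 0.08],
                     [0.0, 0.0, 0.06],
                     [0.08, 0.06, 0.0]])
for edge in mask_to_edges(toy_mask, toy_words):
    print(edge)  # ('cat', 'car', 0.08) then ('dog', 'car', 0.06)
```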

Draw the graphs for $\epsilon = 0.02$ (`G[1]`) and $\epsilon = 0.05$ (`G[4]`).

In [None]:
nx.draw_networkx(G[1])

In [None]:
nx.draw_networkx(G[4])