# Characterize recommendation nondeterminism
We've noticed that the scores for our candidates are nondeterministic, and we'd like to characterize where in our algorithmic implementation that nondeterminism comes from.

In [7]:
import networkx as nx
import pandas as pd
import jsonlines
from collections import defaultdict
from os import listdir
from os.path import splitext

## Read in the comparative runs

In [13]:
datapath = '../data/conference_rec_output'
prefixes = ['16May2024', '20May2024']
runs = defaultdict(dict)
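for run in prefixes:
    for f in listdir(datapath):
        # Only pick up files belonging to this run (the date prefix is in the filename)
        if run in f:
            # Drop the leading prefix and the extension to get a short key,
            # e.g. '<prefix>_composite_scores.csv' -> 'composite_scores'
            fname = splitext('_'.join(f.split('_')[1:]))[0]
            # Load each file according to its extension
            if splitext(f)[1] == '.graphml':
                runs[run][fname] = nx.read_graphml(f'{datapath}/{f}')
            elif splitext(f)[1] == '.jsonl':
                with jsonlines.open(f'{datapath}/{f}') as reader:
                    runs[run][fname] = [obj for obj in reader]
            elif splitext(f)[1] == '.csv':
                runs[run][fname] = pd.read_csv(f'{datapath}/{f}', index_col=0)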

## Compare composite scores

In [23]:
all_composite = runs['16May2024']['composite_scores'].merge(runs['20May2024']['composite_scores'], left_index=True, right_index=True)

In [24]:
all_composite['difference'] = all_composite['composite_score_x'] - all_composite['composite_score_y']

In [26]:
print(f'The mean difference between scores across runs is {all_composite.difference.mean()}')

The mean difference between scores across runs is -1.25727481885282e-05


It's a small difference, but I believe it's larger than can be attributed to floating-point rounding error, so let's hunt down the source.
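One caveat about the number above: a signed mean lets positive and negative differences cancel, so it can understate how much the runs disagree. Here is a minimal sketch of a fuller summary, using a small made-up frame in place of `all_composite` (the values are invented for illustration, not taken from the runs):

```python
import pandas as pd

# Toy stand-in for all_composite; these scores are invented for illustration.
toy = pd.DataFrame(
    {
        "composite_score_x": [0.5, 0.25, 0.0, 0.75],
        "composite_score_y": [0.25, 0.5, 0.0, 0.75],
    },
    index=["a", "b", "c", "d"],
)
toy["difference"] = toy["composite_score_x"] - toy["composite_score_y"]

signed_mean = toy["difference"].mean()           # +0.25 and -0.25 cancel out
mean_abs = toy["difference"].abs().mean()        # average size of the disagreement
n_changed = int((toy["difference"] != 0).sum())  # number of rows that differ at all

print(signed_mean, mean_abs, n_changed)
```

In the toy frame the signed mean is zero even though half of the rows differ, which is why the absolute mean and the count of changed rows are worth reporting alongside it.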

## Compare individual scores
Since Python dictionaries preserve insertion order (guaranteed from 3.7), and given how I iterated to produce these two lists in the algorithm, their entries line up, so we can compare them element by element.

In [29]:
score_diffs = defaultdict(list)
for x, y in zip(runs['16May2024']['individual_component_scores'], runs['20May2024']['individual_component_scores']):
    for score_type in ['co_citation', 'co_author', 'topic', 'geography']:
        d = x[score_type] - y[score_type]
        score_diffs[score_type].append(d)
score_mean_diffs = {k: sum(v)/len(v) for k, v in score_diffs.items()}
score_mean_diffs

{'co_citation': -4.275195028600522e-05,
 'co_author': -1.00389446458504e-05,
 'topic': 0.0,
 'geography': 0.0}

Neither the topic nor the geography scores differ. For topic modeling that's good news, because it means we correctly set the random seeds in our model instantiations. The randomness is coming from the co-networks, which means it's either the Louvain clustering or the generation of the networks themselves.
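The element-wise comparison above is only meaningful if both runs list their records in the same order. A minimal sketch of a guard for that assumption follows. The `candidate` key is hypothetical (the actual record fields aren't shown in this notebook), and the records are invented for illustration:

```python
from collections import defaultdict

def component_diffs(run_a, run_b, score_types, key="candidate"):
    """Per-component score differences between two aligned lists of records.

    `key` is a hypothetical field identifying each record; adjust it to the real schema.
    """
    if len(run_a) != len(run_b):
        raise ValueError(f"Runs differ in length: {len(run_a)} vs {len(run_b)}")
    diffs = defaultdict(list)
    for x, y in zip(run_a, run_b):
        # Refuse to subtract scores that belong to different records
        if x[key] != y[key]:
            raise ValueError(f"Records out of order: {x[key]!r} vs {y[key]!r}")
        for score_type in score_types:
            diffs[score_type].append(x[score_type] - y[score_type])
    return dict(diffs)

# Invented records, for illustration only
run_a = [{"candidate": "a", "topic": 0.5, "co_author": 0.25},
         {"candidate": "b", "topic": 0.0, "co_author": 0.75}]
run_b = [{"candidate": "a", "topic": 0.5, "co_author": 0.5},
         {"candidate": "b", "topic": 0.0, "co_author": 0.75}]

diffs = component_diffs(run_a, run_b, ["topic", "co_author"])
print(diffs)
```

If the two lists ever fall out of step, this raises instead of silently producing differences between unrelated records.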

## Compare the co-citation and co-author networks
First, let's see if the networks themselves are the same:
### Co-citation

In [30]:
nx.is_isomorphic(runs['16May2024']['co_citation_network'], runs['20May2024']['co_citation_network'])

True

In [31]:
nx.utils.graphs_equal(runs['16May2024']['co_citation_network'], runs['20May2024']['co_citation_network'])

False

In [34]:
nx.utils.nodes_equal(runs['16May2024']['co_citation_network'].nodes(data=True), runs['20May2024']['co_citation_network'].nodes(data=True))

True

In [35]:
nx.utils.edges_equal(runs['16May2024']['co_citation_network'].edges(data=True), runs['20May2024']['co_citation_network'].edges(data=True))

False

In [41]:
nx.utils.edges_equal(runs['16May2024']['co_citation_network'].edges, runs['20May2024']['co_citation_network'].edges)

True

The nodes and the set of edges match, so something about the edge attributes must differ between the two graphs.
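To see why these checks can disagree, here is a toy sketch with two small made-up graphs that share nodes and edges but differ in one weight. By default `nx.is_isomorphic` looks only at structure, while `graphs_equal` and `edges_equal` with `data=True` also compare attributes:

```python
import networkx as nx

# Two toy graphs: same nodes and edges, one differing edge weight
g1 = nx.Graph()
g1.add_edge("a", "b", weight=1)
g1.add_edge("b", "c", weight=3)

g2 = nx.Graph()
g2.add_edge("a", "b", weight=2)
g2.add_edge("b", "c", weight=3)

same_structure = nx.is_isomorphic(g1, g2)    # structure only, attributes ignored
same_graph = nx.utils.graphs_equal(g1, g2)   # adjacency, including edge attributes
same_edges = nx.utils.edges_equal(g1.edges, g2.edges)
same_edges_with_data = nx.utils.edges_equal(g1.edges(data=True), g2.edges(data=True))

print(same_structure, same_graph, same_edges, same_edges_with_data)
```

This toy pair reproduces the pattern of results seen in the cells above, with only an edge attribute differing.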

In [50]:
for e1, e2, attrs in runs['16May2024']['co_citation_network'].edges(data=True):
    matching_edge_attrs = runs['20May2024']['co_citation_network'].get_edge_data(e1, e2)
    if attrs != matching_edge_attrs:
        print('Edge attrs non-matching! For edge ', e1, e2)
        print(attrs, matching_edge_attrs)

Edge attrs non-matching! For edge  yang, l qiu, bs
{'weight': 1} {'weight': 2}
Edge attrs non-matching! For edge  sun, wq leopold, ac
{'weight': 6} {'weight': 49}
Edge attrs non-matching! For edge  sun, wq farrant, jm
{'weight': 5} {'weight': 16}
Edge attrs non-matching! For edge  sun, wq crowe, lm
{'weight': 7} {'weight': 5}
Edge attrs non-matching! For edge  sun, wq tsan, fy
{'weight': 6} {'weight': 9}
Edge attrs non-matching! For edge  come, d horbowicz, m
{'weight': 10} {'weight': 4}
Edge attrs non-matching! For edge  come, d pammenter, nw
{'weight': 5} {'weight': 11}
Edge attrs non-matching! For edge  come, d farrant, jm
{'weight': 15} {'weight': 5}
Edge attrs non-matching! For edge  come, d golovina, ea
{'weight': 6} {'weight': 2}
Edge attrs non-matching! For edge  come, d koster, kl
{'weight': 1} {'weight': 5}
Edge attrs non-matching! For edge  black, m hong, td
{'weight': 8} {'weight': 1}
Edge attrs non-matching! For edge  black, m bochicchio, a
{'weight': 1} {'weight': 5}
Edge
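Printing every mismatch gets unwieldy when the output is long. As a sketch of an alternative, the mismatches could be collected so they can be counted and inspected. The helper name `weight_diffs` and the toy graphs below are illustrative assumptions, not part of the pipeline:

```python
import networkx as nx

def weight_diffs(g_a, g_b, attr="weight"):
    """Return (u, v, value_a, value_b) for edges whose `attr` differs between graphs."""
    rows = []
    for u, v, value_a in g_a.edges(data=attr):
        data_b = g_b.get_edge_data(u, v)  # None if the edge is absent from g_b
        value_b = None if data_b is None else data_b.get(attr)
        if value_a != value_b:
            rows.append((u, v, value_a, value_b))
    return rows

# Toy graphs with invented weights
g_a = nx.Graph()
g_a.add_weighted_edges_from([("a", "b", 1), ("b", "c", 6), ("c", "d", 5)])
g_b = nx.Graph()
g_b.add_weighted_edges_from([("a", "b", 2), ("b", "c", 49), ("c", "d", 5)])

diffs = weight_diffs(g_a, g_b)
print(f"{len(diffs)} of {g_a.number_of_edges()} edges differ")
for u, v, w_a, w_b in diffs:
    print(u, v, w_a, w_b, w_b - w_a)
```

Knowing what fraction of edges differ, and by how much, would help judge whether the weight discrepancies can plausibly account for the score differences.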

### Co-author

In [37]:
nx.is_isomorphic(runs['16May2024']['co_author_network'], runs['20May2024']['co_author_network'])

True

In [38]:
nx.utils.graphs_equal(runs['16May2024']['co_author_network'], runs['20May2024']['co_author_network'])

False

In [39]:
nx.utils.nodes_equal(runs['16May2024']['co_author_network'].nodes(data=True), runs['20May2024']['co_author_network'].nodes(data=True))

True

In [40]:
nx.utils.edges_equal(runs['16May2024']['co_author_network'].edges(data=True), runs['20May2024']['co_author_network'].edges(data=True))

False

In [42]:
nx.utils.edges_equal(runs['16May2024']['co_author_network'].edges, runs['20May2024']['co_author_network'].edges)

True

The same is true of the co-author network: the nodes and the set of edges match, but the edge attributes differ.

In [51]:
for e1, e2, attrs in runs['16May2024']['co_author_network'].edges(data=True):
    matching_edge_attrs = runs['20May2024']['co_author_network'].get_edge_data(e1, e2)
    if attrs != matching_edge_attrs:
        print('Edge attrs non-matching! For edge ', e1, e2)
        print(attrs, matching_edge_attrs)

Edge attrs non-matching! For edge  csintalan, z tuba, z
{'weight': 7} {'weight': 15}
Edge attrs non-matching! For edge  berjak, p walters, c
{'weight': 2} {'weight': 6}
Edge attrs non-matching! For edge  pammenter, nw walters, c
{'weight': 6} {'weight': 2}
Edge attrs non-matching! For edge  bartels, d phillips, j
{'weight': 7} {'weight': 4}
Edge attrs non-matching! For edge  karsten, u holzinger, a
{'weight': 5} {'weight': 11}
Edge attrs non-matching! For edge  pritchard, hw daws, mi
{'weight': 1} {'weight': 8}
Edge attrs non-matching! For edge  wang, xf tan, yh
{'weight': 4} {'weight': 2}
Edge attrs non-matching! For edge  farrant, jm hilhorst, hwm
{'weight': 9} {'weight': 4}
Edge attrs non-matching! For edge  farrant, jm moore, jp
{'weight': 12} {'weight': 2}
Edge attrs non-matching! For edge  perera-castro, av laza, jm
{'weight': 1} {'weight': 2}
Edge attrs non-matching! For edge  baskin, cc baskin, jm
{'weight': 4} {'weight': 13}
Edge attrs non-matching! For edge  oliver, mj ligter

This is starting to look more like a bug than simple nondeterminism: the same edge carries very different weights in the two runs (6 vs. 49, for example). However, I would not expect differences of this kind to affect every non-zero score in the dataset, which they seem to be doing. So we need to track down the source of these differences and, if it's a bug, fix it and then check once again for nondeterminism.
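Once the edge weights agree across runs, the Louvain step can be checked in isolation. A minimal sketch, assuming the clustering uses NetworkX's `louvain_communities` (this notebook doesn't show which Louvain implementation the pipeline calls); Zachary's karate club graph stands in for the real networks:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def partition(graph, seed):
    """Louvain partition as a set of frozensets, so community order doesn't matter."""
    communities = louvain_communities(graph, weight="weight", seed=seed)
    return {frozenset(c) for c in communities}

G = nx.karate_club_graph()  # stand-in graph, not one of the co-networks

first = partition(G, seed=42)
second = partition(G, seed=42)

# Same graph, same seed, same process: the partitions should match
same_seed_match = first == second
print(same_seed_match)
```

A fixed seed only helps if the nodes are also visited in the same order each time. With string node labels, anything that iterates over a Python `set` can change order between processes unless `PYTHONHASHSEED` is fixed, so that is worth ruling out as well.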