# Quantifying Centrality of a Commentator

The initial question of this notebook was: Is it true that Faversham shares doctrines with a lot of other commentators? Is he in the middle of a net of shared material? Does he draw on the whole tradition? This was based on the observation that there seemed to be a conspicuous overlap between the doctrines used by Anonymus Bazan I and those used by Faversham. This overlap appeared to be uni-directional: most or all of Anonymus Bazan I's doctrines were found in Faversham, but not the other way around, since Faversham has more doctrines than Anonymus Bazan I.

This led me to ask: how can this observation be investigated further? This notebook shows one such investigation of uni-directional shared material, and reflects on what such investigations can show us.

This leads to an analysis of a relatively naive measure of centrality for the corpus of commentators.

# Procedure

## Utility functions and setup

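```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

# `neo4j.v1` is the namespace of the legacy 1.x Neo4j Python driver.
from neo4j.v1 import GraphDatabase

# Placeholder connection details (assumed): adjust URI and credentials to your database.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

sns.set_context("paper", font_scale=1.2)
```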

```python
# Type hinting
from typing import Callable, Dict, List
from neo4j.v1.result import BoltStatementResult
```

```python
def run_query(string: str) -> BoltStatementResult:
    """Run a Cypher query in its own transaction and return the result."""
    with driver.session() as session:
        with session.begin_transaction() as tx:
            return tx.run(string)
```

```python
def build_matrix(authors: List[str],
                 calculator: Callable[[str], Dict[str, float]]
                 ) -> Dict[str, List[float]]:
    """Map each author to the calculator's values for every author, in order.

    Pairs missing from the calculator's result (for example, an author
    compared with himself) are filled with 0.
    """
    matrix = {}
    for author in authors:
        values = calculator(author)
        matrix[author] = [values.get(name, 0.0) for name in authors]
    return matrix
```

## List of authors

Here we include all the authors mentioned in the database except Aristotle.

```python
q: str = ('MATCH (n:Author) WHERE not(n.name = "Aristotle") '
          'RETURN n.name, id(n) ORDER BY n.name asc')
res = run_query(q)
author_table = {i[0]: i[1] for i in res.values()}
df = pd.DataFrame(author_table, index=['ID'])
df.T
```

| Author | ID |
|---|---|
| Anonymus Bazan I | 192 |
| Anonymus Bernardini | 218 |
| Anonymus Giele | 5 |
| Anonymus Mertoniensis 275 | 7 |
| Anonymus Oriel 33 | 4 |
| Anonymus Steenberghen | 23 |
| Anonymus Vat. Lat. 2170 I | 6 |
| Anonymus Vat. Lat. 2170 II | 21 |
| Anonymus Vennebusch | 187 |
| Henric de la Wyle | 1 |


```python
# A list keeps the alphabetical order and can be reused as a DataFrame index.
author_names = list(author_table)
```

## One-sided commonality

Now let's look at the data from a more one-sided perspective: how big a proportion of the material of one commentator is also found in each of the others (regardless of the combined size of their material). This is the uni-directional, non-symmetric relation.

The hypothesis is that this can be used to investigate which commentators have material that is also present in a high proportion of the other commentators (regardless of how much unique material they might have).
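In set notation, writing $D_X$ for the set of doctrines of commentator $X$ (notation introduced here for illustration only), the one-sided measure is the share of $X$'s doctrines that are also found in $Y$:

$$
s(X \to Y) = \frac{|D_X \cap D_Y|}{|D_X|}, \qquad s(X \to Y) \neq s(Y \to X) \ \text{in general.}
$$

Both directions have the same numerator $|D_X \cap D_Y|$ and differ only in the denominator, so the commentator with fewer doctrines gets the higher value. In the outputs below, $s(\text{Bazan I} \to \text{Faversham}) \approx 0.714$, while $s(\text{Faversham} \to \text{Bazan I}) \approx 0.385$.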

```python
def unidirectional_shared_material(name: str) -> Dict[str, float]:
    """
    Return a dictionary giving, for each other commentator, the
    proportion of the searched commentator's doctrines that he shares.
    Commentators sharing no doctrine with him are not included.
    """
    # Note: `name` is interpolated into the query string, so a name
    # containing a single quote would break the query.
    q = ("MATCH (a1:Author)--(:Text)--(q1:Question)--(d:Doctrine) "
         "WHERE a1.name = '%s' "
         "MATCH (d)--(:Question)--(:Text)--(a2:Author) "
         "WHERE (a1) <> (a2) "
         "MATCH (q1)--(d1:Doctrine) "
         "RETURN a2.name, collect(distinct d), "
         "toFloat(count(distinct d)) / count(distinct d1) as CNT " % (name))
    # Each row is [a2.name, shared doctrines, proportion]: keep name -> proportion.
    return {r[0]: r[2] for r in run_query(q).values()}


# Test it with an example
unidir_single = unidirectional_shared_material('Simon of Faversham')
unidir_single
```

```
{'Anonymus Bazan I': 0.38461538461538464,
 'Anonymus Bernardini': 0.3076923076923077,
 'Anonymus Giele': 0.46153846153846156,
 'Anonymus Mertoniensis 275': 0.46153846153846156,
 'Anonymus Oriel 33': 0.38461538461538464,
 'Anonymus Steenberghen': 0.15384615384615385,
 'Anonymus Vat. Lat. 2170 I': 0.5384615384615384,
 'Anonymus Vat. Lat. 2170 II': 0.5384615384615384,
 'Anonymus Vennebusch': 0.23076923076923078,
 'Henric de la Wyle': 0.3076923076923077,
 'John Dinsdale': 0.38461538461538464,
 'John of Jandun': 0.6153846153846154,
 'Radulphus Brito': 0.6153846153846154}
```
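To see what the query computes without a running Neo4j instance, here is a minimal in-memory sketch. It assumes each commentator can be reduced to a plain set of doctrine identifiers; the commentators `A`, `B`, `C`, their doctrine ids and the helper `unidirectional_share` are hypothetical. Like the Cypher `MATCH`, it leaves out commentators who share nothing, which is why `build_matrix` has to fill in zeros.

```python
from typing import Dict, Set

# Hypothetical toy data: commentator -> set of doctrine identifiers.
DOCTRINES: Dict[str, Set[str]] = {
    "A": {"d1", "d2"},
    "B": {"d1", "d2", "d3", "d4"},
    "C": {"d5"},
}


def unidirectional_share(name: str,
                         doctrines: Dict[str, Set[str]]) -> Dict[str, float]:
    """Proportion of `name`'s doctrines that each other commentator also has.

    Commentators sharing no doctrine with `name` are left out, mirroring
    the MATCH in the Cypher query.
    """
    own = doctrines[name]
    result: Dict[str, float] = {}
    for other, theirs in doctrines.items():
        shared = own & theirs
        if other != name and shared:
            result[other] = len(shared) / len(own)
    return result


print(unidirectional_share("A", DOCTRINES))  # {'B': 1.0}
print(unidirectional_share("B", DOCTRINES))  # {'A': 0.5}
```

All of A's doctrines are in B, but only half of B's are in A: the kind of asymmetry observed between Anonymus Bazan I and Faversham.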

### Uni-directional overlap in full matrix

This shows what proportion of the doctrines of one commentator is also found in each of the others. This is not a symmetrical relation.

Each value shows how much of the material of the column commentator (X) is present in the row commentator (Y). It therefore does not show how close the two are to each other, but only how much of X is contained in Y.

Now let's build a matrix of those values for all the commentators. Because pandas turns the keys of the dictionary returned by `build_matrix` into columns, each column (X) contains the results of the function above for that commentator, with 0 where nothing is shared (including the commentator himself).
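A minimal sketch of the resulting layout, assuming toy doctrine sets (the commentators `A`, `B`, `C` and the helper `share` are hypothetical): the dictionary keys become the columns (X), the `index` supplies the rows (Y), and each cell holds the share of X's doctrines found in Y, with 0 on the diagonal as in the notebook.

```python
import pandas as pd

# Hypothetical toy data: commentator -> set of doctrine identifiers.
doctrines = {
    "A": {"d1", "d2"},
    "B": {"d1", "d2", "d3", "d4"},
    "C": {"d3", "d5"},
}


def share(x: str, y: str) -> float:
    """Share of x's doctrines also found in y; 0 for x == y, as in the notebook."""
    if x == y:
        return 0.0
    return len(doctrines[x] & doctrines[y]) / len(doctrines[x])


names = sorted(doctrines)
# Dictionary keys become columns (X); the index gives the rows (Y).
matrix = pd.DataFrame({x: [share(x, y) for y in names] for x in names},
                      index=names)
print(matrix)
```

Reading down a column gives the proportions that `unidirectional_shared_material` returns for that commentator; reading along a row shows how much of everyone else's material that row's commentator contains.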


```python
uni_rel = build_matrix(author_names, calculator=unidirectional_shared_material)
matrix = pd.DataFrame(uni_rel, index=author_names)
matrix
```

|  | Anonymus Bazan I | Anonymus Bernardini | Anonymus Giele | Anonymus Mertoniensis 275 | Anonymus Oriel 33 | Anonymus Steenberghen | Anonymus Vat. Lat. 2170 I | Anonymus Vat. Lat. 2170 II | Anonymus Vennebusch | Henric de la Wyle | John Dinsdale | John of Jandun | Radulphus Brito | Simon of Faversham |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anonymus Bazan I | 0.0 | 0.230769 | 0.5 | 0.157895 | 0.214286 | 0.285714 | 0.333333 | 0.571429 | 0.25 | 0.176471 | 0.230769 | 0.25 | 0.5 | 0.384615 |
| Anonymus Bernardini | 0.428571 | 0.0 | 0.5 | 0.578947 | 0.5 | 0.142857 | 0.333333 | 0.285714 | 0.625 | 0.235294 | 0.538462 | 0.1875 | 0.333333 | 0.307692 |
| Anonymus Giele | 0.428571 | 0.230769 | 0.0 | 0.157895 | 0.214286 | 0.142857 | 0.416667 | 0.428571 | 0.375 | 0.117647 | 0.230769 | 0.375 | 0.333333 | 0.461538 |
| Anonymus Mertoniensis 275 | 0.428571 | 0.846154 | 0.5 | 0.0 | 0.714286 | 0.142857 | 0.416667 | 0.428571 | 0.75 | 0.352941 | 0.692308 | 0.25 | 0.416667 | 0.461538 |
| Anonymus Oriel 33 | 0.428571 | 0.538462 | 0.5 | 0.526316 | 0.0 | 0.142857 | 0.333333 | 0.428571 | 0.5 | 0.529412 | 1.0 | 0.375 | 0.416667 | 0.384615 |
| Anonymus Steenberghen | 0.285714 | 0.076923 | 0.166667 | 0.052632 | 0.071429 | 0.0 | 0.25 | 0.285714 | 0.125 | 0.058824 | 0.076923 | 0.125 | 0.25 | 0.153846 |
| Anonymus Vat. Lat. 2170 I | 0.571429 | 0.307692 | 0.833333 | 0.263158 | 0.285714 | 0.428571 | 0.0 | 0.571429 | 0.375 | 0.235294 | 0.307692 | 0.5 | 0.5 | 0.538462 |
| Anonymus Vat. Lat. 2170 II | 0.571429 | 0.153846 | 0.5 | 0.157895 | 0.214286 | 0.285714 | 0.333333 | 0.0 | 0.25 | 0.176471 | 0.230769 | 0.3125 | 0.5 | 0.538462 |
| Anonymus Vennebusch | 0.285714 | 0.384615 | 0.5 | 0.315789 | 0.285714 | 0.142857 | 0.25 | 0.285714 | 0.0 | 0.117647 | 0.307692 | 0.1875 | 0.333333 | 0.230769 |
| Henric de la Wyle | 0.428571 | 0.307692 | 0.333333 | 0.315789 | 0.642857 | 0.142857 | 0.333333 | 0.428571 | 0.25 | 0.0 | 0.692308 | 0.3125 | 0.333333 | 0.307692 |


#### Single author proportions

With this matrix we can select any column to see, for that commentator, what proportion of his doctrines is found in each of the others.

```python
matrix['Anonymus Bazan I'].sort_values(ascending=False)
```

```
Radulphus Brito               0.857143
Simon of Faversham            0.714286
John of Jandun                0.571429
Anonymus Vat. Lat. 2170 II    0.571429
Anonymus Vat. Lat. 2170 I     0.571429
John Dinsdale                 0.428571
Henric de la Wyle             0.428571
Anonymus Oriel 33             0.428571
Anonymus Mertoniensis 275     0.428571
Anonymus Giele                0.428571
Anonymus Bernardini           0.428571
Anonymus Vennebusch           0.285714
Anonymus Steenberghen         0.285714
Anonymus Bazan I              0.000000
Name: Anonymus Bazan I, dtype: float64
```

#### Centrality factor
We can transpose the matrix (`matrix.T`) so that rows become columns. Each column then belongs to one commentator (Y) and shows the inverse relation: how big a proportion of the doctrines of each other commentator (X) is contained in Y.

If we calculate the mean of each of those columns, we get a single value for how strongly the doctrines of the other authors tend to also be present in a given author. It is not a commonality measure, but a centrality measure.
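Written out, the centrality of a commentator Y is the average, over all N commentators X, of the share of X's doctrines found in Y. Since `DataFrame.mean()` averages each column, `matrix.T.mean()` equals the row means `matrix.mean(axis=1)`. The sketch below, on a hypothetical 3x3 matrix, also shows that the zero on the diagonal makes the mean divide by N rather than N - 1; this scales every value by (N - 1)/N and leaves the ranking unchanged.

```python
import numpy as np
import pandas as pd

names = ["A", "B", "C"]
# Hypothetical values: column X, row Y = share of X's doctrines found in Y.
matrix = pd.DataFrame([[0.0, 0.5, 0.25],
                       [1.0, 0.0, 0.75],
                       [0.5, 0.5, 0.0]],
                      index=names, columns=names)

# Centrality as in the notebook: mean of each column of the transposed matrix,
# i.e. the mean of each row of the original matrix (diagonal zero included).
centrality = matrix.T.mean()
assert np.allclose(centrality, matrix.mean(axis=1))

# Variant that leaves out the self-comparison: divide by N - 1 instead of N.
n = len(names)
centrality_excl_self = matrix.sum(axis=1) / (n - 1)

print(centrality.sort_values(ascending=False))
print(centrality_excl_self.sort_values(ascending=False))
```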

```python
centrality = matrix.T.mean().sort_values(ascending=False).round(3)
```

```python
centrality
```

```
Simon of Faversham            0.480
John of Jandun                0.460
Radulphus Brito               0.458
Anonymus Mertoniensis 275     0.457
Anonymus Oriel 33             0.436
John Dinsdale                 0.427
Anonymus Vat. Lat. 2170 I     0.408
Anonymus Bernardini           0.357
Henric de la Wyle             0.345
Anonymus Vat. Lat. 2170 II    0.302
Anonymus Bazan I              0.292
Anonymus Giele                0.279
Anonymus Vennebusch           0.259
Anonymus Steenberghen         0.141
dtype: float64
```

```python
centrality.describe()
```

```
count    14.000000
mean      0.364357
std       0.099484
min       0.141000
25%       0.294500
50%       0.382500
75%       0.451750
max       0.480000
dtype: float64
```

Show the top quartile. The four commentators in it are the ones with whom the other commentators, on average, share the largest proportion of their doctrines.

```python
centrality[centrality>centrality.quantile(0.75)]
```

```
Simon of Faversham           0.480
John of Jandun               0.460
Radulphus Brito              0.458
Anonymus Mertoniensis 275    0.457
dtype: float64
```

They are very close at this end of the spectrum, as the low standard deviation (compared with 0.099 for all fourteen commentators) shows:

```python
centrality[centrality>centrality.quantile(0.75)].std()
```

```
0.01090489186863704
```

### Who has a high outgoing overlap?

We can also take the maximum value of each column in the transposed matrix (`matrix.T`) to reveal whether some commentators have a very high outgoing overlap, i.e. whether there is some other commentator who has a very high proportion of his material present in that commentator.
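`max()` gives the highest value but not whose material produces it. A minimal sketch on a hypothetical 3x3 matrix (same column-X, row-Y layout) pairs `matrix.T.max()` with `matrix.T.idxmax()`, which returns the row label (X) at which each column's maximum occurs; note that `idxmax` reports only the first label when there are ties.

```python
import pandas as pd

names = ["A", "B", "C"]
# Hypothetical values: column X, row Y = share of X's doctrines found in Y.
matrix = pd.DataFrame([[0.0, 0.5, 0.25],
                       [1.0, 0.0, 0.75],
                       [0.25, 0.5, 0.0]],
                      index=names, columns=names)

# For each commentator Y: the largest share of any single X's doctrines
# found in Y, and which X that is (the first one, in case of ties).
summary = pd.DataFrame({
    "max_share": matrix.T.max(),
    "from": matrix.T.idxmax(),
})
print(summary.sort_values("max_share", ascending=False))
```

Here all of A's doctrines are found in B, so B's maximum is 1.0 and its `from` entry is A.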

```python
max_values = matrix.T.max().sort_values(ascending=False)
max_values
```

```
Simon of Faversham            1.000000
John of Jandun                1.000000
Anonymus Oriel 33             1.000000
John Dinsdale                 0.928571
Radulphus Brito               0.857143
Anonymus Mertoniensis 275     0.846154
Anonymus Vat. Lat. 2170 I     0.833333
Henric de la Wyle             0.692308
Anonymus Bernardini           0.625000
Anonymus Vat. Lat. 2170 II    0.571429
Anonymus Bazan I              0.571429
Anonymus Vennebusch           0.500000
Anonymus Giele                0.461538
Anonymus Steenberghen         0.285714
dtype: float64
```

```python
max_values.describe()
```

```
count    14.000000
mean      0.726616
std       0.229280
min       0.285714
25%       0.571429
50%       0.762821
75%       0.910714
max       1.000000
dtype: float64
```

### Commentators most contained in each of the high-max commentators

```python
matrix.T['Simon of Faversham'].sort_values(ascending=False).head(4)
```

```
Anonymus Vat. Lat. 2170 II    1.000000
Anonymus Giele                1.000000
Anonymus Bazan I              0.714286
Radulphus Brito               0.666667
Name: Simon of Faversham, dtype: float64
```

```python
matrix.T['Radulphus Brito'].sort_values(ascending=False).head(4)
```

```
Anonymus Vat. Lat. 2170 II    0.857143
Anonymus Bazan I              0.857143
Anonymus Giele                0.666667
Simon of Faversham            0.615385
Name: Radulphus Brito, dtype: float64
```

```python
matrix.T['John of Jandun'].sort_values(ascending=False).head(4)
```

```
Anonymus Giele                1.000000
Anonymus Vat. Lat. 2170 II    0.833333
Simon of Faversham            0.666667
Anonymus Vat. Lat. 2170 I     0.666667
Name: John of Jandun, dtype: float64
```

## Observations

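Faversham is the commentator with the highest degree of shared material with the others, in the sense of the centrality measure above. Along with Brito and Jandun he seems to constitute a group of central and representative commentators (cross-correlating the mean and max listings).

Some commentators share all or almost all of their doctrines with one or more of these three: all of the doctrines of Anonymus Giele are found in both Faversham and Jandun, and all of those of Anonymus Vat. Lat. 2170 II in Faversham.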
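The cross-reading of the mean and max listings can also be condensed into one number. A minimal sketch, assuming a rank correlation is a suitable summary (the notebook itself does not compute one): it re-enters the two printed series and computes Spearman's rho as the Pearson correlation of their ranks, with average ranks for ties. Going through `rank()` keeps the sketch to plain pandas.

```python
import pandas as pd

# Values copied from the `centrality` and `max_values` outputs above.
centrality = pd.Series({
    "Simon of Faversham": 0.480, "John of Jandun": 0.460,
    "Radulphus Brito": 0.458, "Anonymus Mertoniensis 275": 0.457,
    "Anonymus Oriel 33": 0.436, "John Dinsdale": 0.427,
    "Anonymus Vat. Lat. 2170 I": 0.408, "Anonymus Bernardini": 0.357,
    "Henric de la Wyle": 0.345, "Anonymus Vat. Lat. 2170 II": 0.302,
    "Anonymus Bazan I": 0.292, "Anonymus Giele": 0.279,
    "Anonymus Vennebusch": 0.259, "Anonymus Steenberghen": 0.141,
})
max_values = pd.Series({
    "Simon of Faversham": 1.0, "John of Jandun": 1.0,
    "Anonymus Oriel 33": 1.0, "John Dinsdale": 0.928571,
    "Radulphus Brito": 0.857143, "Anonymus Mertoniensis 275": 0.846154,
    "Anonymus Vat. Lat. 2170 I": 0.833333, "Henric de la Wyle": 0.692308,
    "Anonymus Bernardini": 0.625, "Anonymus Vat. Lat. 2170 II": 0.571429,
    "Anonymus Bazan I": 0.571429, "Anonymus Vennebusch": 0.5,
    "Anonymus Giele": 0.461538, "Anonymus Steenberghen": 0.285714,
})

# Align both listings on commentator name.
listings = pd.DataFrame({"mean": centrality, "max": max_values})

# Spearman's rho = Pearson correlation of the ranks (ties get average ranks).
ranks = listings.rank()
rho = ranks["mean"].corr(ranks["max"])
print(round(rho, 2))  # 0.94
```

A value near 1 means the two listings order the commentators in much the same way.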