In [14]:
### Loading Credentials from local file; 
### this cell is meant to be deleted before publishing
import yaml

with open("../creds.yml", 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

uri = cfg["sonar_creds"]["uri"]
user = cfg["sonar_creds"]["user"]
password = cfg["sonar_creds"]["pass"]

<font size = "20"> SoNAR (IDH) - HNA Curriculum </font>

<font size = "5">Notebook 4:  Example Case - History of Physiology</font>

# Defining the Physiology Graph

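We search for the substring "hysiolog" to retrieve every topic term that contains it. This covers "Physiologie" itself as well as compounds such as "Neurophysiologie" and adjectival forms such as "Physiologische Chemie".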
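The leading "P" is left out because Cypher's `CONTAINS` operator is case-sensitive: compounds such as "Neurophysiologie" contain a lower-case "p". The following minimal sketch mimics that matching in plain Python. The sample terms are hand-picked for illustration (the entry "Biologie" is an assumed negative example, not query output), and `contains_case_sensitive` is a hypothetical helper.

```python
# Illustrative sample terms; "Biologie" is an assumed negative example.
sample_terms = ["Physiologie", "Neurophysiologie", "Physiologische Chemie", "Biologie"]


def contains_case_sensitive(term, needle):
    """Mimic a case-sensitive substring test with Python's `in` operator."""
    return needle in term


# Searching without the leading letter catches both spellings.
matches_short = [t for t in sample_terms if contains_case_sensitive(t, "hysiolog")]

# Searching with a capital "P" misses the compounds.
matches_capital = [t for t in sample_terms if contains_case_sensitive(t, "Physiolog")]

print(matches_short)
print(matches_capital)
```

With the capitalised search string, "Neurophysiologie" would be missed, which is why the shorter substring is used in the query.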


In [15]:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(uri, auth=(user, password))

query = """
MATCH (t:TopicTerm)
WHERE t.Name CONTAINS "hysiolog"
RETURN DISTINCT(t.Name)
"""

with driver.session() as session:
    all_physiology_terms = session.run(query).data()
    
all_physiology_terms

[{'(t.Name)': 'Arbeitsphysiologie'},
 {'(t.Name)': 'Neurophysiologie'},
 {'(t.Name)': 'Pathophysiologie'},
 {'(t.Name)': 'Pflanzenphysiologie'},
 {'(t.Name)': 'Physiologie'},
 {'(t.Name)': 'Sinnesphysiologie'},
 {'(t.Name)': 'Tierphysiologie'},
 {'(t.Name)': 'Physiologische Chemie'},
 {'(t.Name)': 'Physiologische Psychologie'},
 {'(t.Name)': 'Sprachphysiologie'},
 {'(t.Name)': 'Sportphysiologie'},
 {'(t.Name)': 'Leistungsphysiologie'},
 {'(t.Name)': 'Physiologische Psychiatrie'},
 {'(t.Name)': 'Elektrophysiologie'},
 {'(t.Name)': 'Altersphysiologie'},
 {'(t.Name)': 'Bewegungsphysiologie'},
 {'(t.Name)': 'Entwicklungsphysiologie'},
 {'(t.Name)': 'Ernährungsphysiologie'},
 {'(t.Name)': 'Ertragsphysiologie'},
 {'(t.Name)': 'Histophysiologie'},
 {'(t.Name)': 'Höhenphysiologie'},
 {'(t.Name)': 'Nacherntephysiologie'},
 {'(t.Name)': 'Physiologische Optik'},
 {'(t.Name)': 'Physiologische Uhr'},
 {'(t.Name)': 'Psychophysiologische Diagnostik'},
 {'(t.Name)': 'Stoffwechselphysiologie'},
 {'(t

With this in mind, we can create a network that contains every person connected to any physiology-related topic term. Relationships among these persons can additionally be retrieved with the APOC function `apoc.algo.cover(n)`, which returns every relationship between the nodes of a given set.

In [16]:
from helper_functions.helper_fun import to_nx_graph

query = """
MATCH (t:TopicTerm)-[r]-(n:PerName)
WHERE t.Name CONTAINS "hysiolog"
RETURN *
"""

driver = GraphDatabase.driver(uri, auth=(user, password))

G = to_nx_graph(neo4j_driver = driver, 
                query = query)

Next, we check which topic terms are not present in the query result, i.e. which terms have no direct relationship to any person node:

In [17]:
import numpy as np

relevant_topics = []
for node in list(G.nodes):
    if G.nodes[node]["type"] == "TopicTerm":
        relevant_topics.append((G.nodes[node]["label"]))

np.setdiff1d([d["(t.Name)"] for d in all_physiology_terms], relevant_topics)

array(['Altersphysiologie', 'Bewegungsphysiologie',
       'Elektrophysiologische Untersuchung', 'Ertragsphysiologie',
       'Experimentelle Physiologie', 'Histophysiologie',
       'Ignaz-L.-Lieben-Preis für Physik, Chemie und Physiologie',
       'Muskelphysiologie', 'Physiologische Optik', 'Physiologische Uhr',
       'Psychophysiologische Diagnostik', 'Reizphysiologie',
       'Sprachphysiologie', 'Tauchphysiologie', 'Umweltphysiologie',
       'Vergleichende Neurophysiologie', 'Vergleichende Physiologie',
       'Zellphysiologie'], dtype='<U57')

@todo add colors to network below by node type

In [6]:
from helper_functions.helper_fun import to_nx_graph
from pyvis.network import Network

nt = Network('750px', '100%', notebook=True, directed = True)
nt.from_nx(G)
nt.set_edge_smooth("dynamic")
#nt.show('./html_networks/physiological_net.html')

## Retrieving the Network

In [18]:
%%time

from helper_functions.helper_fun import to_nx_graph

query = """
MATCH (t:TopicTerm)--(n:PerName)
WHERE t.Name CONTAINS "hysiolog"
WITH DISTINCT [x in collect(t)+collect(n)|id(x)] as collectedIds 
MATCH (n)-[rel1:RelationToPerName*0..1]-(n2)
WHERE id(n) in collectedIds
RETURN n, n2, rel1
"""


driver = GraphDatabase.driver(uri, auth=(user, password))

G = to_nx_graph(neo4j_driver = driver, 
                query = query)

CPU times: user 14.5 s, sys: 526 ms, total: 15 s
Wall time: 15.4 s


## Descriptive Metrics

In [19]:
person_nodes = [x for x,y in G.nodes(data=True) if y['type']=="PerName"]
resources_nodes = [x for x,y in G.nodes(data=True) if y['type']=="Resource"]
topicterm_nodes = [x for x,y in G.nodes(data=True) if y['type']=="TopicTerm"]

In [7]:
G.number_of_nodes()

36187

In [20]:
print(len(person_nodes))
print(len(resources_nodes))
print(len(topicterm_nodes))

1856
34261
27


# Investigating the Persons

## Centrality Investigation

First, we investigate the centrality of the nodes in the physiology network. Centrality measures quantify how important a node is within the network.

 

### Betweenness Centrality

Betweenness is one of many ways to measure the centrality of nodes. The betweenness of a node $n$ sums, over all pairs of other nodes $u$ and $v$, the fraction of shortest paths between $u$ and $v$ that pass through $n$:

\begin{align}
c_B(n) & = \sum_{u \neq n \neq v} \frac{\sigma(u, v | n)}{\sigma(u, v)}
\end{align}


$\sigma(u, v)$ is the number of shortest paths between the nodes $u$ and $v$, and $\sigma(u, v | n)$ is the number of those paths passing through $n$ (Scifo, 2020).
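To make the measure concrete, here is a minimal brute-force sketch in plain Python. For every pair of nodes it enumerates all shortest paths and adds, for each intermediate node, the fraction of those paths that pass through it. The functions `shortest_paths` and `betweenness` and the toy graph are assumptions made for illustration; they are not part of the notebook's helper functions, and `networkx` uses a much faster algorithm internally.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, source, target):
    """Return every shortest path from source to target in an unweighted graph."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                preds[nb] = [node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:
                preds[nb].append(node)
    if target not in dist:
        return []
    # Walk the predecessor lists back from the target to the source.
    paths = [[target]]
    while paths[0][-1] != source:
        paths = [p + [q] for p in paths for q in preds[p[-1]]]
    return [p[::-1] for p in paths]


def betweenness(adj):
    """Unnormalised betweenness of an undirected graph, each node pair counted once."""
    scores = {n: 0.0 for n in adj}
    for u, v in combinations(adj, 2):
        paths = shortest_paths(adj, u, v)
        if not paths:
            continue
        for n in adj:
            if n in (u, v):
                continue
            through = sum(1 for p in paths if n in p)
            scores[n] += through / len(paths)
    return scores


# Toy graph: a simple chain a - b - c - d - e.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
chain_scores = betweenness(chain)
print(chain_scores)
```

In the chain, the middle node `c` lies on the most shortest paths and therefore receives the highest score, while the two end nodes receive a score of zero.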

This measure is especially useful for identifying critical or highly important nodes in a network. High betweenness can be interpreted as power in a social network or as the amount of social capital a node owns. 

Calculating betweenness centrality is computationally expensive, though. For the full physiology network, the calculation can take several hours or even days, depending on the machine.

More details on the usage of the algorithm with the `networkx` library can be found [here](https://networkx.org/documentation/networkx-1.10/reference/generated/networkx.algorithms.centrality.betweenness_centrality.html#:~:text=Betweenness%20centrality%20of%20a%20node,through%20some%20node%20other%20than%20.).

Because of these computational costs, we use the `betweenness_centrality_subset()` function of the `networkx` library. It restricts the calculation to shortest paths between a given set of source nodes and a given set of target nodes. In the example below, both sets consist of the person nodes, so only shortest paths between persons are counted.
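The effect of the restriction can be seen on a small toy graph. The sketch below assumes a five-node path graph (not the physiology network) and compares the full betweenness with the subset variant for a single source and a single target.

```python
import networkx as nx

# Toy path graph 0 - 1 - 2 - 3 - 4 (illustrative only).
G_toy = nx.path_graph(5)

# Full betweenness: every pair of nodes contributes.
full_bc = nx.betweenness_centrality(G_toy, normalized=False)

# Subset betweenness: only shortest paths from node 0 to node 2 contribute.
subset_bc = nx.betweenness_centrality_subset(G_toy, sources=[0], targets=[2])

print(full_bc)
print(subset_bc)
```

Node 3 has a positive score in the full calculation but a score of zero in the subset calculation, because it does not lie on the path from node 0 to node 2. Note that the returned dictionary still contains an entry for every node of the graph, which is why the results are filtered for person nodes afterwards.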

In [33]:
%%time
import networkx as nx

# Only count shortest paths that start and end at person nodes
betweenness = nx.betweenness_centrality_subset(G, sources=person_nodes, targets=person_nodes)

CPU times: user 3min 45s, sys: 44.7 ms, total: 3min 45s
Wall time: 3min 45s


In [34]:
from operator import itemgetter

betweenness_sorted = sorted(betweenness.items(), key = itemgetter(1), reverse = True)
betweenness_filtered = [item for item in betweenness_sorted if item[0] in person_nodes]
top_betweenness = betweenness_filtered[:10]

In [35]:
# Print the label and score of the ten persons with the highest betweenness
for node_id, score in top_betweenness:
    print("Name:", G.nodes[node_id]["label"], "| Betweenness Centrality:", score)

Name: Wundt, Wilhelm | Betweenness Centrality: 98834.3937482409
Name: Eccles, John C. | Betweenness Centrality: 34730.79224155618
Name: Sherrington, Charles Scott | Betweenness Centrality: 28845.307688601402
Name: Pirson, André | Betweenness Centrality: 24827.09301547723
Name: Asher, Leon | Betweenness Centrality: 21577.10243170289
Name: Oken, Lorenz | Betweenness Centrality: 20279.19793097923
Name: Mothes, Kurt | Betweenness Centrality: 14311.347611623598
Name: Baer, Karl Ernst von | Betweenness Centrality: 13285.350541327418
Name: Helmholtz, Hermann von | Betweenness Centrality: 12032.185088709262
Name: Engelmann, Wilhelm | Betweenness Centrality: 11876.72544075516


### Eigenvector Centrality

In [70]:
eigenvectors = nx.eigenvector_centrality_numpy(G)

In [72]:
from operator import itemgetter

eigenvectors_sorted = sorted(eigenvectors.items(), key = itemgetter(1), reverse = True)
eigenvectors_filtered = [item for item in eigenvectors_sorted if item[0] in person_nodes]
top_eigenvectors = eigenvectors_filtered[:10]

In [73]:
for i in top_eigenvectors: 
    degree = eigenvectors[i[0]] 
    print("Name:", G.nodes(data = True)[i[0]]["label"], "| Eigenvector Centrality:", i[1])

Name: Wundt, Wilhelm | Name: PerName | Eigenvector Centrality: 0.7070494505966637
Name: Wundt, Marie Friederike | Name: PerName | Eigenvector Centrality: 0.009074726145276534
Name: Arnold, Ida Eberhardina | Name: PerName | Eigenvector Centrality: 0.009074726145276532
Name: Wundt, Wilhelm | Name: PerName | Eigenvector Centrality: 0.009062579963126092
Name: Wundt, Ludwig | Name: PerName | Eigenvector Centrality: 0.009062579963126087
Name: Wundt, Eleonore | Name: PerName | Eigenvector Centrality: 0.009062579963126087
Name: Wundt, Max | Name: PerName | Eigenvector Centrality: 0.009062579963126087
Name: Wundt, Maximilian | Name: PerName | Eigenvector Centrality: 0.009062579963126087
Name: Wundt, Sophie | Name: PerName | Eigenvector Centrality: 0.009062579963126084
Name: Wundt, Magdalena | Name: PerName | Eigenvector Centrality: 0.009062579963126082


### PageRank

In [74]:
pageranks = nx.pagerank(G)

In [75]:
from operator import itemgetter

pageranks_sorted = sorted(pageranks.items(), key = itemgetter(1), reverse = True)
pageranks_filtered = [item for item in pageranks_sorted if item[0] in person_nodes]
top_pageranks = pageranks_filtered[:10]

In [76]:
for i in top_pageranks: 
    degree = pageranks[i[0]] 
    print("Name:", G.nodes(data = True)[i[0]]["label"], "| PageRank Centrality:", i[1])

Name: Wundt, Wilhelm | Name: PerName | PageRank Centrality: 0.07749802724851465
Name: Oken, Lorenz | Name: PerName | PageRank Centrality: 0.028146978668306927
Name: Baer, Karl Ernst von | Name: PerName | PageRank Centrality: 0.019040751791885017
Name: Cohn, Jonas | Name: PerName | PageRank Centrality: 0.01496615179743833
Name: Pirson, André | Name: PerName | PageRank Centrality: 0.014152577093836416
Name: Eccles, John C. | Name: PerName | PageRank Centrality: 0.013947577661537536
Name: Haller, Albrecht von | Name: PerName | PageRank Centrality: 0.013138540797033306
Name: Du Bois- Reymond, Emil Heinrich | Name: PerName | PageRank Centrality: 0.010103140633417208
Name: Sömmerring, Samuel Thomas von | Name: PerName | PageRank Centrality: 0.009926079219744307
Name: Euler, Leonhard | Name: PerName | PageRank Centrality: 0.008790405208168936


## Centrality of Women

We can apply another filter to our 

# Investigating a Predefined Set of Physiologists

* Gustav Fritsch (DE-588)115568808
* Eduard Hitzig (DE-588)116917423
* Hermann Munk (DE-588)117185930
* Nathan Zuntz (DE-588)118896202
* Friedrich Goltz (DE-588)116764694
* Adolf Fick (DE-588)118800000
* Jacques Loeb (DE-588)119133628

In [None]:
physiologists = ["(DE-588)115568808", "(DE-588)116917423", "(DE-588)117185930", "(DE-588)118896202", 
                 "(DE-588)116764694", "(DE-588)118800000", "(DE-588)119133628"]

# Bibliography

Scifo, E. (2020). Hands-On Graph Analytics with Neo4j: Perform graph processing and visualization techniques using connected data across your enterprise. Birmingham, England: Packt Publishing.