## Network measures

### Local structures

**Indegree**
Indegree is mostly a function of how Wikipedians revised each page, so it should be largely uniform across pages. The large values are likely pages that consist of 'lists' of links.

**Outdegree**
This is a first-order measure of an idea's influence.
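
A minimal sketch of the two degree measures on a hypothetical toy graph. It assumes that an edge `u -> v` means idea `u` influenced page `v`; the node names and that direction convention are illustrative assumptions, not taken from the notebook's data.

```python
import networkx as nx

# Hypothetical toy graph: an edge u -> v means idea u influenced page v.
g = nx.DiGraph([("calculus", "mechanics"),
                ("calculus", "optics"),
                ("geometry", "optics")])

# Same access pattern as the `measures` dictionary used later in this notebook.
indegree = dict(g.in_degree)
outdegree = dict(g.out_degree)

print(outdegree["calculus"])  # number of pages directly influenced by "calculus"
print(indegree["optics"])     # number of direct sources of "optics"
```

Under this convention, outdegree counts only direct (first-order) influence; influence that travels along longer paths is not captured.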

### Mesoscale structures

**Clustering**
The topics look about equally clustered.

**Centrality**
This reveals the distribution of sources of ideas within a field.
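
As a rough illustration of how betweenness centrality can expose the sources of ideas, the sketch below uses a hypothetical toy field in which every path runs through one hub. The graph and node names are assumptions for illustration only.

```python
import networkx as nx

# Hypothetical toy field: two early ideas feed one hub idea,
# which in turn feeds two later ideas.
g = nx.DiGraph([("a", "hub"), ("b", "hub"), ("hub", "c"), ("hub", "d")])

# Default call (normalized betweenness), as in the notebook's `measures` dictionary.
centrality = nx.betweenness_centrality(g)

# A field that relies on few sources shows a skewed distribution:
# here one node carries every shortest path and the rest carry none.
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

A field with many independent sources would instead show centrality spread more evenly across nodes.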

**Path lengths**

**Rich-club coefficient**

**Modularity**

**Controllability**
This is an nth-order measure of influence.

**Observability**
This is an nth-order measure of the inverse of influence.

**Coreness**
It seems that the more focused a topic is on a single subtopic, the stronger its coreness. For example, genetics is heavily focused on DNA, so it has high coreness. In economics, by contrast, the concept of "economics" has high degree, yet the field has low coreness because it is heterogeneous, with major subfields such as "macroeconomics" and "microeconomics".

**Characteristic path length**
I'm not sure what path length reveals. Perhaps it is a measure of heterogeneity in research? It describes how far one idea is from another, topologically. Cognitive science and earth science have ideas that are far apart.
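
One common reading of characteristic path length is the mean shortest-path length over all ordered pairs of distinct nodes that are joined by a directed path. The sketch below computes that quantity on a hypothetical toy graph with networkx; it is an illustration under that definition, not the `bct` call used later in this notebook.

```python
import networkx as nx

# Hypothetical toy graph of ideas.
g = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Collect shortest-path lengths for every reachable ordered pair,
# skipping the zero-length path from a node to itself.
lengths = [dist
           for source, targets in dict(nx.shortest_path_length(g)).items()
           for target, dist in targets.items()
           if source != target]

# Mean over reachable pairs only; unreachable pairs are ignored.
char_path = sum(lengths) / len(lengths)
print(char_path)
```

Ignoring unreachable pairs is a design choice: in a directed graph many pairs have no path, and counting them as infinitely far apart would make the mean infinite.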

In [None]:
%reload_ext autoreload
%autoreload 2
import os,sys
sys.path.insert(1, os.path.join(sys.path[0], '..', 'module'))
import wiki
import numpy as np
import pandas as pd
import networkx as nx

In [None]:
path_analysis = '/Users/harangju/Developer/data/wiki/analysis/'
path_networks = '/Users/harangju/Developer/data/wiki/graphs/'

## Load networks

In [None]:
topics = ['anatomy', 'biochemistry', 'cognitive science', 'evolutionary biology',
          'genetics', 'immunology', 'molecular biology', 'chemistry', 'biophysics',
          'energy', 'optics', 'earth science', 'geology', 'meteorology',
          'philosophy of language', 'philosophy of law', 'philosophy of mind',
          'philosophy of science', 'economics', 'accounting', 'education',
          'linguistics', 'law', 'psychology', 'sociology', 'electronics',
          'software engineering', 'robotics',
          'calculus', 'geometry', 'abstract algebra',
          'Boolean algebra', 'commutative algebra', 'group theory', 'linear algebra',
          'number theory', 'dynamical systems and differential equations']

In [None]:
networks = {}
for topic in topics:
    print(topic, end=' ')
    networks[topic] = wiki.Net()
    networks[topic].load_graph(path_networks+'dated/'+topic+'.pickle')

In [None]:
num_nulls = 10
null_targets = {}
for topic in topics:
    print(topic, end=' ')
    null_targets[topic] = []
    for i in range(num_nulls):
        network = wiki.Net()
        network.load_graph(path_networks+'null-target/'+topic+'-null-'+str(i)+'.pickle')
        null_targets[topic].append(network)

## Run analysis

**NOTE:** Skip this section if loading saved stats.

In [None]:
import bct
import pickle
from networkx.algorithms.cluster import clustering
from networkx.algorithms import betweenness_centrality
from networkx.convert_matrix import to_numpy_array

In [None]:
measures = {'indegree': lambda g: [x[1] for x in g.in_degree],
            'outdegree': lambda g: [x[1] for x in g.out_degree],
            'clustering': lambda g: list(clustering(g).values()),
            'centrality': lambda g: list(betweenness_centrality(g).values()),
#             'path-length': lambda g: [y for x in list(nx.shortest_path_length(g))
#                                       for y in list(x[1].values())],
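            # bct.charpath expects a distance matrix, so convert the adjacency matrix first
            'char-path-length': lambda g: bct.charpath(bct.distance_bin(to_numpy_array(g)),
                                                       include_diagonal=False,
                                                       include_infinite=False)[0],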
            'modularity': lambda g: g.graph['modularity'],
            'coreness': lambda g: g.graph['coreness_be']}

In [None]:
df = pd.DataFrame(columns=['topic','measure','value'])
for topic, network in networks.items():
    print(topic, end=' ')
    df = pd.concat([df] +
                   [pd.DataFrame([[topic, measure, func(network.graph)]],
                                 columns=['topic','measure','value'])
                    for measure, func in measures.items()],
                   ignore_index=True)

In [None]:
for topic, null_networks in null_targets.items():
    print(topic, end=' ')
    for network in null_networks:
        df = pd.concat([df] +
                       [pd.DataFrame([[topic, measure+'-null-target', func(network.graph)]],
                                     columns=['topic','measure','value'])
                        for measure, func in measures.items()],
                       ignore_index=True)

In [None]:
df

## Save analysis

In [None]:
with open(path_analysis + 'stats.pickle', 'wb') as f:
    pickle.dump(df, f)

In [None]:
df.topic = df.topic.astype('category')
df.measure = df.measure.astype('category')
df.dtypes

In [None]:
df_expand = df.value\
              .apply(pd.Series)\
              .merge(df, left_index=True, right_index=True)\
              .drop('value', axis=1)\
              .melt(id_vars=['topic','measure'])\
              .drop('variable', axis=1)\
              .dropna()
df_expand

In [None]:
with open(path_analysis + 'stats_expand.pickle', 'wb') as f:
    pickle.dump(df_expand, f)

## Load analysis

In [None]:
import pickle
with open(path_analysis+'stats.pickle', 'rb') as f:
    df = pickle.load(f)
with open(path_analysis+'stats_expand.pickle', 'rb') as f:
    df_expand = pickle.load(f)

In [None]:
df

In [None]:
df.topic = df.topic.astype('object')
df.measure = df.measure.astype('object')
df.dtypes

In [None]:
df_expand.topic = df_expand.topic.astype('object')
df_expand.measure = df_expand.measure.astype('object')
df_expand.dtypes

In [None]:
pd.unique(df.topic)

In [None]:
pd.unique(df.measure)

In [None]:
df.dtypes

## Plot

* nice plots [seaborn](https://seaborn.pydata.org/examples/index.html)
* interactive [Bokeh](https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery)

In [None]:
from ipywidgets import interact, widgets, Layout
import plotly
import plotly.express as px
import plotly.graph_objs as go
import plotly.figure_factory as ff
from IPython.display import display
plotly.offline.init_notebook_mode(connected=True)

### Static measures

In [None]:
import os

os.makedirs("static_measures", exist_ok=True)

In [None]:
for stat in ['indegree', 'outdegree', 'clustering', 'centrality']:#, 'path-length']:
    fig = px.box(df_expand[(df_expand.measure==stat) | (df_expand.measure==stat+'-null-target')],
                 x='topic', y='value', color='measure')
    fig.update_layout(template='plotly_white',
                      yaxis_title=stat)
    fig.show()
    fig.write_image(f"static_measures/{stat}.pdf")

In [None]:
for measure in ['coreness', 'modularity', 'char-path-length']:
    fig = px.scatter(df_expand[(df_expand.measure==measure) |
                               (df_expand.measure==measure+'-null-target')],
                     x='topic', y='value', color='measure')
    fig.update_layout(template='plotly_white',
                      yaxis_title=measure)
    fig.show()
    fig.write_image(f"static_measures/{measure}.pdf")

In [None]:
data = df_expand\
        .groupby(['topic', 'measure'], as_index=False)\
        .mean()\
        .pivot(index='topic', columns='measure', values='value')\
        .reset_index()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=data['coreness-null-target'],
                         y=data['coreness'],
                         mode='markers',
                         name='coreness',
                         hovertext=data['topic']))
fig.add_trace(go.Scatter(x=data['modularity-null-target'],
                         y=data['modularity'],
                         mode='markers',
                         name='modularity',
                         hovertext=data['topic']))
fig.add_trace(go.Scatter(x=[0,1], y=[0,1],
                         mode='lines',
                         line=dict(dash='dash'),
                         name='1:1'))
fig.update_layout(template='plotly_white',
                  width=500, height=500,
                  xaxis=dict(title='null',
                             range=[0,1]),
                  yaxis=dict(title='real',
                             range=[0,1],
                             scaleanchor='x',
                             scaleratio=1))
fig.show()
fig.write_image('static_measures/coreness_modularity.pdf')

### Measures in growing networks

In [None]:
comm_t = pd.DataFrame()
for topic, network in networks.items():
    print(topic, end=' ')
    comm_t = pd.concat([comm_t] +
                       [pd.DataFrame([[topic,
                                       node,
                                       network.graph.nodes[node]['year'],
                                       network.graph.nodes[node]['community'],
                                       network.graph.nodes[node]['core_be'],
                                       network.graph.nodes[node]['core_rb'],
                                       1]],
                                     columns=['topic','node','year',
                                              'comm','core_be','core_rb',
                                              'count'])
                        for node in network.graph.nodes],
                       ignore_index=True)
comm_t = comm_t.merge(comm_t.groupby(['topic','comm'])['count'].sum(),
                      on=['topic','comm'],
                      suffixes=('','_topic_comm'))\
               .merge(comm_t.groupby(['topic','core_be'])['count'].sum(),
                      on=['topic','core_be'],
                      suffixes=('','_topic_core_be'))\
               .sort_values(by=['topic','year'])\
               .reset_index(drop=True)
comm_t['comm_count'] = comm_t.groupby(['topic','comm'])['count']\
                             .transform(pd.Series.cumsum)
comm_t['core_be_count'] = comm_t.groupby(['topic','core_be'])['count']\
                                .transform(pd.Series.cumsum)
comm_t['comm_frac'] = comm_t['comm_count']/comm_t['count_topic_comm']
comm_t['core_be_frac'] = comm_t['core_be_count']/comm_t['count_topic_core_be']
comm_t = comm_t.drop(['count','count_topic_comm','count_topic_core_be'], axis=1)

In [None]:
comm_t

### Growth in core-periphery & modules

In [None]:
for topic in ['anatomy']:
    fig = go.Figure()
    x = comm_t[comm_t.topic==topic]
    # placeholder trace; plotly expects array-like x and y, not scalars
    fig.add_trace(go.Scatter(x=[0],
                             y=[0]))
    fig.update_layout(template='plotly_white',
                      title_text=topic,
                      width=500, height=500,
                      xaxis={'range': [0,2020]})
    fig.show()

In [None]:
import os

os.makedirs("core_growth", exist_ok=True)

In [None]:
for topic in networks.keys():
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=comm_t[(comm_t.topic==topic) &\
                                      (comm_t.core_be==0)]['year'],
                             y=comm_t[(comm_t.topic==topic) &\
                                      (comm_t.core_be==0)]['core_be_count'],
                             name='periphery'))
    fig.add_trace(go.Scatter(x=comm_t[(comm_t.topic==topic) &\
                                      (comm_t.core_be==1)]['year'],
                             y=comm_t[(comm_t.topic==topic) &\
                                      (comm_t.core_be==1)]['core_be_count'],
                             name='core'))
    fig.update_layout(template='plotly_white',
                      title_text=topic,
                      xaxis={'range': [0,2020]})
    fig.show()
    fig.write_image(f"core_growth/{topic}.pdf")

In [None]:
for topic in networks.keys():
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=comm_t[comm_t.topic==topic].year,
                             y=comm_t[comm_t.topic==topic].core_rb,
                             mode='markers',
                             marker={'size': 2}))
    fig.update_layout(template='plotly_white',
                      title_text=topic,
                      width=500, height=500,
                      xaxis={'title': 'year',
                             'range': [1000,2020]},
                      yaxis={'title': 'coreness'})#,
#                              'scaleanchor': 'x',
#                              'scaleratio': 1})
    fig.show()

In [None]:
for topic in networks.keys():
    fig = px.line(comm_t[comm_t.topic==topic],
                  x='year', y='comm_count', color='comm')
    fig.update_layout(template='plotly_white',
                      title_text=topic,
                      xaxis={'range': [0,2000]})
    fig.show()

### Birth: core vs. periphery

In [None]:
birth = pd.concat([pd.DataFrame([[comm_t.iloc[i].topic,
                                  comm_t.iloc[i].node,
                                  comm_t.iloc[i].year, 
                                  [c for c in 
                                   list(networks[comm_t.iloc[i].topic]\
                                        .graph.successors(comm_t.iloc[i].node)) + 
                                   list(networks[comm_t.iloc[i].topic]\
                                        .graph.predecessors(comm_t.iloc[i].node))
                                   if networks[comm_t.iloc[i].topic].graph.nodes[c]['core_be']]
                                 ]],
                                columns=['topic','periphery','year','cores'])
                   for i in range(len(comm_t.index))
                   if not comm_t.iloc[i].core_be],
                  ignore_index=True)
birth

In [None]:
birth_exp = birth.cores.apply(pd.Series)\
                 .merge(birth, left_index=True, right_index=True)\
                 .drop(['cores'], axis=1)\
                 .melt(id_vars=['topic','periphery','year'], value_name='core')\
                 .drop('variable', axis=1)\
                 .dropna()\
                 .sort_values(by=['topic','year', 'periphery'])\
                 .reset_index(drop=True)
birth_exp['core_year'] = [networks[birth_exp.iloc[i].topic].graph\
                          .nodes[birth_exp.iloc[i].core]['year']
                          for i in range(len(birth_exp.index))]
birth_exp['centrality'] = [df[(df.topic==birth_exp.iloc[i].topic) &
                              (df.measure=='centrality')].value.values[0]\
                           [networks[birth_exp.iloc[i].topic].nodes.index(birth_exp.iloc[i].core)]
                           for i in range(len(birth_exp.index))]
birth_exp['indegree'] = [df[(df.topic==birth_exp.iloc[i].topic) &
                            (df.measure=='indegree')].value.values[0]\
                         [networks[birth_exp.iloc[i].topic].nodes.index(birth_exp.iloc[i].core)]
                         for i in range(len(birth_exp.index))]
birth_exp['outdegree'] = [df[(df.topic==birth_exp.iloc[i].topic) &
                             (df.measure=='outdegree')].value.values[0]\
                          [networks[birth_exp.iloc[i].topic].nodes.index(birth_exp.iloc[i].core)]
                          for i in range(len(birth_exp.index))]

In [None]:
birth_exp

In [None]:
import os

os.makedirs("periphery_v_core", exist_ok=True)

In [None]:
for topic in networks.keys():
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=birth_exp[birth_exp.topic==topic].year,
                             y=birth_exp[birth_exp.topic==topic].core_year,
                             mode='markers',
                             marker={'size': 2}))
#     fig = px.scatter(birth_exp[birth_exp.topic==topic],
#                      x='year', y='core_year', color='outdegree')
    fig.update_layout(template='plotly_white',
                      title_text=topic,
                      width=500, height=500,
                      xaxis={'title': 'year (periphery)',
                             'range': [np.min(birth_exp.year), np.max(birth_exp.year)]},
                      yaxis={'title': 'year (neighboring core)',
                             'range': [np.min(birth_exp.year), np.max(birth_exp.year)],
                             'scaleanchor': 'x',
                             'scaleratio': 1})
    fig.show()
    fig.write_image(f"periphery_v_core/{topic}.pdf")

### Cores in communities

In [None]:
comm_core = pd.concat([pd.DataFrame([[topic,
                                      node,
                                      network.graph.nodes[node]['year'],
                                      network.graph.nodes[node]['community'],
                                      network.graph.nodes[node]['community_core_be'],
                                      1 if network.graph.nodes[node]['community_core_be']==0 else 0,
                                      network.graph.graph['community_coreness_be']\
                                          [network.graph.nodes[node]['community']],
                                      1
                                     ]],
                                    columns=['topic','node','year','community','community_core',
                                             'community_peri','community_coreness','count'])
                       for topic, network in networks.items()
                       for node in network.graph.nodes],
                      ignore_index=True)\
              .sort_values(by='year')
# Assuming community labels are assigned independently within each topic,
# counts are accumulated per (topic, community) rather than per community label alone.
comm_core = comm_core\
              .merge(comm_core.groupby(['topic','community'])['count'].sum(),
                     on=['topic','community'],
                     suffixes=('','_sum'))
comm_core['core_count'] = comm_core.groupby(['topic','community'])['community_core']\
                                             .transform(pd.Series.cumsum)
comm_core['peri_count'] = comm_core.groupby(['topic','community'])['community_peri']\
                                             .transform(pd.Series.cumsum)
comm_core = comm_core.drop(['count', 'count_sum', 'community_core', 'community_peri'], axis=1)
comm_core

In [None]:
topic = 'anatomy'  # example topic for this preview; any entry of `topics` works
for i in range(4):#range(max([graph.nodes[n]['community'] for n in graph.nodes]) + 1):
    fig = go.Figure()
    data = comm_core[(comm_core.topic==topic) & (comm_core.community==i)]
    fig.add_trace(go.Scatter(x=data['year'],
                             y=data['core_count'],
                             mode='lines',
                             name='# cores'))
    fig.add_trace(go.Scatter(x=data['year'],
                             y=data['peri_count'],
                             mode='lines',
                             name='# periphery'))
    fig.update_layout(template='plotly_white',
                      title_text=f"{topic}: community {i+1}",
                      xaxis={'range': [1500,2030]},
                      height=400)
    fig.show()

In [None]:
import os

os.makedirs("core_over_periphery", exist_ok=True)

In [None]:
for topic, network in networks.items():
    fig = go.Figure()
    for i in range(5):
        data = comm_core[(comm_core.topic==topic) & (comm_core.community==i)]
        fig.add_trace(go.Scatter(x=data['year'],
                                 y=(data['core_count']/np.max(data['core_count']))\
                                     /(data['peri_count']/np.max(data['peri_count'])),
                                 mode='lines',
                                 name=f"community {i}"))
    fig.add_trace(go.Scatter(x=[np.min(comm_core.year),np.max(comm_core.year)],
                             y=[1,1],
                             mode='lines',
                             line={'dash': 'dash'},
                             name='equal'))
    fig.update_layout(template='plotly_white',
                      title_text=f"topic: {topic}",
                      xaxis={'range': [1500,2030]},
                      yaxis={'title': '% cores/% periphery'},
                      height=400)
    fig.show()
    fig.write_image(f"core_over_periphery/{topic}.pdf")

**Note**: We're more explorers than formulists.