# 5. Network Construction and Visualization

graphs and networks

options in python and beyond

why networkx ... open a browser window https://networkx.org/documentation/stable/reference/index.html to ...

Four construction types

Network drawing: goal to visualize much information (Krempel book)

...

**Python textbooks using networkx**

Caldarelli & Chessa 2016. Data Science and Complex Networks: Real Case Studies with Python. https://doi.org/10.1093/acprof:oso/9780199639601.001.0001. Code: https://github.com/datascienceandcomplexnetworks.

Platt 2019. Network Science with Python and NetworkX Quick Start Guide. https://www.packtpub.com/product/network-science-with-python-and-networkx-quick-start-guide/9781789955316. Code: https://github.com/PacktPublishing/Network-Science-with-Python-and-NetworkX-Quick-Start-Guide.

Menczer et al. 2020. A First Course in Network Science. https://doi.org/10.1017/9781108653947. Code: https://cambridgeuniversitypress.github.io/FirstCourseNetworkScience/.

Ma & Seth 2022. Network Analysis Made Simple. https://leanpub.com/nams. Code: https://ericmjl.github.io/Network-Analysis-Made-Simple/.

...

**Other textbooks**

Newman FOR ERRORS

Barabasi

Wasserman & Faust

Zweig

Borgatti & Everett

...

In [None]:
## nodelists and edgelists
### attributes
#nx.from_pandas_edgelist()
#g.nodes()
#g.edges()
#g.number_of_nodes()
#g.number_of_edges()
#
## copenhagen network data
#
## 1 static
#
## plotting
## spring embedding
### connected components
#nx.is_connected()
#
## 2 discrete dynamic
### slice by timestamp (sekara paper)
### slice by degree
#
## 3 multilayer
### from variable
### from data via clustering (node2vec?)
#nx.MultiGraph()
#
## 4 multimodal
### tweetskb
### 2-partite
### k-partite
### pandas to scipy
#
## import/export

TODO: LINK CLASSES TO NETWORKX WEBSITE

## 5.1. Static networks

...

### 5.1.1. Constructing from scratch

In [None]:
import networkx as nx

In [None]:
# Nodes can be added either node by node ...
G = nx.Graph()
G.add_node('Peter')
G.add_node('Paul')
G.add_node('Mary')
G.nodes()

In [None]:
# or they can be added at once
G = nx.Graph()
G.add_nodes_from(['Peter', 'Paul', 'Mary'])
G.nodes()

In [None]:
# Similarly, edges can be added edge by edge ...
G.add_edge('Peter', 'Paul')
G.add_edge('Peter', 'Mary')
G.add_edge('Paul', 'Mary')
G.edges()

In [None]:
# or all at once
G.add_edges_from([['Peter', 'Paul'], ['Peter', 'Mary'], ['Paul', 'Mary']])
G.edges()

In [None]:
#nx.Graph({'Peter': ('Paul', 'Mary'), 'Paul': 'Mary'})
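A graph can also be built in one step from a dictionary that maps each node to an iterable of its neighbors. The following is a minimal sketch with the toy names used above (the variable name `G_dict` is only illustrative). Each value should be a list or tuple, because a bare string such as `'Mary'` would be iterated character by character.

```python
import networkx as nx

# Each key is a node, each value is a list of that node's neighbors
G_dict = nx.Graph({'Peter': ['Paul', 'Mary'], 'Paul': ['Mary']})
print(sorted(G_dict.nodes()))
print(G_dict.number_of_edges())
```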

### 5.1.2. Constructing from Pandas dataframes

In [None]:
import pandas as pd

In [None]:
# Suppose we have a list of nodes with attributes
nodelist = pd.DataFrame([['Peter', 100, 'red'], ['Paul', 300, 'green'], ['Mary', 500, 'blue']], columns=['name', 'size', 'color'])
#nodelist = pd.DataFrame(['Peter', 'Paul', 'Mary'], columns=['name'])
nodelist

In [None]:
# Add nodes from list
G = nx.Graph()
G.add_nodes_from(nodelist['name'].tolist())
#G.add_nodes_from(nodelist.index.tolist())
G.nodes()

In [None]:
# Suppose we also have a list of edges with attributes
edgelist = pd.DataFrame([['Peter', 'Paul', 1, 'red', 'solid'], ['Peter', 'Mary', 2, 'red', 'solid'], ['Paul', 'Mary', 3, 'green', 'dashed']], columns=['source', 'target', 'weight', 'color', 'style'])
edgelist

In [None]:
# Add edges from list
#G = nx.from_pandas_edgelist(df=edgelist, source='source', target='target', edge_attr=['width', 'color'])
G = nx.from_pandas_edgelist(df=edgelist, source='source', target='target')
G.edges()

In [None]:
# Some users may store their networks as adjacency matrices
matrix = pd.DataFrame([[0, 1, 2], [0, 0, 3], [0, 0, 0]], index=['Peter', 'Paul', 'Mary'], columns=['Peter', 'Paul', 'Mary'])
matrix

In [None]:
# Add edges from adjacency matrix
G = nx.Graph()
G = nx.from_pandas_adjacency(matrix)
G.edges(data=True)

Note that `from_pandas_adjacency()` directly adds edge weights stored in the cells of the adjacency matrix. To utilize them for network drawing, they must be transformed into a list:

In [None]:
edge_weight = list(nx.get_edge_attributes(G, 'weight').values())

This list is then called in the `draw()` function:

In [None]:
nx.draw(G=G, with_labels=True, width=edge_weight)

### 5.1.3. Internalizing attributes
While it is convenient that edge weights can be stored in the adjacency matrix and easily added to the graph, the matrix format wastes memory because, unlike the edgelist, it also stores zeros. The matrix format can also store only one edge attribute, whereas the edgelist can store as many as you want. In fact, edge weights do not have to be internalized in the graph object. Here, we take all node and edge attributes directly from the node and edge lists, which works as long as their rows are in the same order as the nodes and edges of the graph:

In [None]:
nx.draw(
    G=G, 
    with_labels=True, 
    node_size=nodelist['size'], 
    node_color=nodelist['color'], 
    width=edgelist['weight'], 
    edge_color=edgelist['color'], 
    style=edgelist['style']
)

Still, it might be a good idea to internalize attributes, that is, to store them in the graph object. For example, you may prefer to keep track of just one object and not an additional object for each attribute. Or you may want to store the graph and its attributes in a pickle file to transfer a preprocessed graph to another notebook.
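As a rough sketch of the pickle use case, the following block stores a small graph with internalized attributes in a file and reads it back. The graph `G_toy`, the file name `toy.pickle`, and the temporary folder are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

import networkx as nx

# Toy graph with a graph attribute and edge attributes
G_toy = nx.Graph(name='Toy example')
G_toy.add_edge('Peter', 'Paul', weight=1, color='red')
G_toy.add_edge('Paul', 'Mary', weight=3, color='green')

# Write the graph to a pickle file in a temporary folder ...
path = os.path.join(tempfile.mkdtemp(), 'toy.pickle')
with open(path, 'wb') as f:
    pickle.dump(G_toy, f)

# ... and load it again, e.g., in another notebook
with open(path, 'rb') as f:
    G_loaded = pickle.load(f)
print(G_loaded.edges(data=True))
```

The loaded object is a full NetworkX graph, so all attributes are available without re-reading the node and edge lists.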

In [None]:
# Add edges with attributes from edgelist
G = nx.Graph()
G = nx.from_pandas_edgelist(df=edgelist, source='source', target='target', edge_attr=['weight', 'color'])
G.edges(data=True)

Once you have read an edge list and internalized the edge attributes, you can still add more attributes. However, this is relatively complicated and requires some extra steps that are best put into a function:

In [None]:
# Add additional edge attribute (style)
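def get_edge_attr_values(df, edge_attr, source='source', target='target'):
    '''
        Return a dictionary that maps each edge (source, target) in the 
        edgelist `df` to a dictionary {edge_attr: value}, which is the 
        format expected by nx.set_edge_attributes().
    '''
    # Use the dataframe passed to the function, not the global edgelist
    return {(u, v): {edge_attr: value} for u, v, value in zip(df[source], df[target], df[edge_attr])}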
style = get_edge_attr_values(edgelist, 'style')
style

The additional edge attributes created with the `get_edge_attr_values()` function can then be internalized:

In [None]:
nx.set_edge_attributes(G, style)

In [None]:
G.edges(data=True)

For node attributes there is no function that directly adds them from the nodelist. We have to write our own little function `get_node_attr_values()` to take node attributes from the nodelist and store them in a way that NetworkX needs. The function works attribute by attribute:

In [None]:
# Internalize node attributes
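def get_node_attr_values(df, node, node_attr):
    '''
        Return a dictionary that maps each node identifier in column `node` 
        of the nodelist `df` to its value in column `node_attr`.
    '''
    return dict(zip(df[node], df[node_attr]))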
size = get_node_attr_values(nodelist, 'name', 'size')
color = get_node_attr_values(nodelist, 'name', 'color')

In [None]:
nx.set_node_attributes(G, size, 'size')
nx.set_node_attributes(G, color, 'color')

In [None]:
G.nodes(data=True)
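As an alternative sketch, all node attributes can be internalized in one call, because `nx.set_node_attributes()` also accepts a dictionary of dictionaries when no attribute name is given. The toy nodelist is re-created here so that the block runs on its own; the names `nodelist_toy`, `G_toy`, and `node_attr` are illustrative.

```python
import networkx as nx
import pandas as pd

nodelist_toy = pd.DataFrame(
    [['Peter', 100, 'red'], ['Paul', 300, 'green'], ['Mary', 500, 'blue']],
    columns=['name', 'size', 'color']
)

G_toy = nx.Graph()
G_toy.add_nodes_from(nodelist_toy['name'])

# Build {'Peter': {'size': 100, 'color': 'red'}, ...} from the nodelist
node_attr = nodelist_toy.set_index('name').to_dict('index')

# Internalize all attributes at once
nx.set_node_attributes(G_toy, node_attr)
print(G_toy.nodes(data=True))
```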

Finally, we can also add an attribute to the network itself, usually just a description:

In [None]:
G.graph['name'] = 'Toy example'
G.graph

### 5.1.4. Drawing networks with internal attributes
To draw the network with all the internalized attributes (all but the graph attribute), we must transform them to lists:

In [None]:
# Transform internal attributes back into lists that can be used for network drawing
node_size = list(nx.get_node_attributes(G, 'size').values())
node_color = list(nx.get_node_attributes(G, 'color').values())
edge_weight = list(nx.get_edge_attributes(G, 'weight').values())
edge_color = list(nx.get_edge_attributes(G, 'color').values())
edge_style = list(nx.get_edge_attributes(G, 'style').values())

In [None]:
nx.draw(
    G=G, 
    with_labels=True, 
    node_size=node_size, 
    node_color=node_color, 
    width=edge_weight, 
    edge_color=edge_color, 
    style=edge_style
)

To repeatedly draw the same graph with the same node positions, it is useful to also internalize those:

In [None]:
pos = nx.circular_layout(G)

In [None]:
nx.set_node_attributes(G, pos, 'pos')

In [None]:
node_pos = nx.get_node_attributes(G, 'pos')

Note that the input for the `pos` parameter in the `draw()` function must be a dictionary, not a list as for most other parameters:

In [None]:
nx.draw(
    G=G, 
    pos=node_pos, 
    with_labels=True, 
    node_size=node_size, 
    node_color=node_color, 
    width=edge_weight, 
    edge_color=edge_color, 
    style=edge_style
)

### 5.1.5. Directed networks

...

<div class='alert-info'>
<big><b>Nodes as labels or integers?</b></big>

NetworkX can handle any definition of nodes, whether they are strings, integers, or even both. This makes NetworkX easier to use. But you may still want to use integers $\{0, 1, ..., N-1\}$ to specify your nodes in a more formal way, where $N$ is the number of nodes. If you want that, you can use the index of your nodelist as node identifiers and use those integers in the edgelist. More information is [here](https://networkx.org/documentation/stable/tutorial.html#what-to-use-as-nodes-and-edges).
</div>
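If a graph has already been built with string identifiers, one possible way to switch to integers is `nx.convert_node_labels_to_integers()`. The sketch below uses a toy graph (`G_named` and `G_int` are illustrative names) and keeps the original names as a node attribute.

```python
import networkx as nx

G_named = nx.Graph([('Peter', 'Paul'), ('Peter', 'Mary'), ('Paul', 'Mary')])

# Relabel nodes as 0, 1, ..., N-1 and store the old identifiers in the attribute 'name'
G_int = nx.convert_node_labels_to_integers(G_named, first_label=0, label_attribute='name')
print(G_int.nodes(data=True))
```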

### 5.1.6. The Copenhagen Networks Study interaction data
On the data ...

https://doi.org/10.1038/s41597-019-0325-x

https://doi.org/10.6084/m9.figshare.11283407

Load data from figshare into the `data/cns/` folder.

The Copenhagen_Networks_Study_Notebook demonstrated how to load the data and do some visualizations...

In [None]:
nodelist_cns = pd.read_csv('data/cns/genders.csv')
nodelist_cns

The nodelist contains 787 rows, and the largest user number is 847, so we use user numbers as labels in the standard NetworkX way:

In [None]:
G_cns = nx.Graph()
G_cns.add_nodes_from(nodelist_cns['# user'].tolist())
female = get_node_attr_values(nodelist_cns, '# user', 'female')
nx.set_node_attributes(G_cns, female, 'female')
G_cns.nodes(data=True)

We start with the Facebook friendship relationships because the resulting graph is simple (i.e., undirected and unweighted):

In [None]:
edgelist_cns_fb = pd.read_csv('data/cns/fb_friends.csv')
edgelist_cns_fb

In [None]:
print(open('data/cns/fb_friends.README', 'r').read())

In [None]:
# TALK ABOUT INHERITANCE OF CHANGES
G_cns_fb = G_cns.copy()
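The reason for calling `copy()` can be illustrated with a toy graph: a plain assignment only creates a second name for the same object, so changes show up under both names, whereas a copy is independent. This is a minimal sketch with illustrative variable names.

```python
import networkx as nx

G_orig = nx.Graph()
G_orig.add_nodes_from([0, 1, 2])

G_alias = G_orig        # A second name for the same graph object
G_copy = G_orig.copy()  # An independent graph with the same nodes

G_alias.add_edge(0, 1)  # Also changes G_orig
G_copy.add_edge(1, 2)   # Leaves G_orig untouched

print(G_orig.edges())
print(G_copy.edges())
```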

In [None]:
# Add the friendship edges to the copy so that the nodes and their attributes are kept
G_cns_fb.add_edges_from(zip(edgelist_cns_fb['# user_a'], edgelist_cns_fb['user_b']))
G_cns_fb.edges()

### 5.1.7. Layouting networks

...

adjust window size using matplotlib

CHECK IF NODES AND FEMALE ATTRIBUTES MATCH

...

FIX VERSION CONFLICT:

In [None]:
#nx.draw(G_cns_fb)

## 5.2. Network snapshots from link streams

Link streams are ... can originate in ...

Exogenous time vs. endogenous time ...

...

### 5.2.1. Aggregating edges by clock

Sekara paper: https://doi.org/10.1073/pnas.1602803113

...

In [None]:
edgelist_cns_bt = pd.read_csv('data/cns/bt_symmetric.csv')
edgelist_cns_bt

In [None]:
print(open('data/cns/bt_symmetric.README', 'r').read())

How Bluetooth works ... scans ... experiment devices ... RSSI ...

The Sekara paper reports (OR DOES IT REPORT THAT THESE ARE SECONDS IN 5-MINUTE INTERVALS?) that Bluetooth searches were performed every five minutes. By transforming `# timestamp` into a categorical variable, we see that time is measured in seconds. ...

4 weeks consist of 8,064 intervals of 5 minutes.

In [None]:
# '# timestamp'
edgelist_cns_bt['# timestamp'].astype('category').cat.categories

In [None]:
# Transform `# timestamp` into an interval index between 0 and 8063
edgelist_cns_bt['# timestamp'] = edgelist_cns_bt['# timestamp'].astype('category').cat.codes
edgelist_cns_bt.rename(columns={'# timestamp': 'time'}, inplace=True)
edgelist_cns_bt['time'].max()
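One caveat of category codes is worth checking: they number the *observed* timestamps consecutively, so they only equal the interval index if every 5-minute interval occurs in the data. The toy series below (hypothetical values, with the interval at 600 seconds missing) contrasts category codes with integer division by 300 seconds.

```python
import pandas as pd

# Toy timestamps in seconds; the scan at 600 seconds is missing
timestamps = pd.Series([0, 300, 300, 900])

# Category codes rank the observed values: the gap disappears
codes = timestamps.astype('category').cat.codes.tolist()

# Integer division by the interval length keeps the gap
intervals = (timestamps // 300).tolist()

print(codes)
print(intervals)

# Four weeks of 5-minute intervals
print(28 * 24 * 12)
```

If the maximum code in the real data is 8063, both approaches agree, because then all 8,064 intervals are present.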

In [None]:
# Keep only scans of experiment devices (user_b >= 0) and drop the four erroneous records (rssi >= 0)
edgelist_cns_bt = edgelist_cns_bt[(edgelist_cns_bt['user_b'] >= 0) & (edgelist_cns_bt['rssi'] < 0)].reset_index(drop=True)

In [None]:
# Transform `rssi` into signal strength between 0 and 92
edgelist_cns_bt['rssi'] = edgelist_cns_bt['rssi']+100
edgelist_cns_bt.rename(columns={'rssi': 'strength'}, inplace=True)
edgelist_cns_bt['strength'].max()

In [None]:
edgelist_cns_bt

In [None]:
edgelist_cns_bt['strength'].hist()

In [None]:
# Construct networks just from edgelist (i.e., isolated persons will be missing)
def aggregate_edges(df, time, source, target, weight, time_zero, time_window, fun, directed=False):
    '''
        Aggregate the edges of a link stream that fall into the time window 
        [time_zero, time_zero+time_window-1] and return both the aggregated 
        edgelist and the corresponding graph. `fun` is 'max' or 'sum'. If 
        `weight` is None, each record counts as 1.
    '''
    if fun not in ('max', 'sum'):
        raise ValueError("fun must be 'max' or 'sum'")
    # Select the records inside the time window (copy to leave the input untouched)
    df_window = df[df[time].between(time_zero, time_zero+time_window-1)].copy()
    if weight is None:
        weight = 'weight'
        df_window[weight] = 1
    # Aggregate the weights of parallel records per node pair
    df_agg = df_window.groupby([source, target])[weight].agg(fun).reset_index()
    graph_type = nx.DiGraph if directed else nx.Graph
    G_agg = nx.from_pandas_edgelist(
        df=df_agg, 
        source=source, 
        target=target, 
        edge_attr=weight, 
        create_using=graph_type
    )
    return df_agg, G_agg
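The windowing and aggregation logic can be checked on a toy link stream. The block below is a sketch with made-up values and uses only pandas; the column names mirror those of the Bluetooth data after renaming.

```python
import pandas as pd

# Toy link stream: one row per observed contact
stream = pd.DataFrame({
    'time':     [0, 0, 1, 2, 5],
    'user_a':   [1, 2, 1, 1, 2],
    'user_b':   [2, 3, 2, 3, 3],
    'strength': [10, 20, 30, 40, 50]
})

# Window of three intervals starting at time 0, i.e., times 0, 1, and 2
time_zero, time_window = 0, 3
in_window = stream['time'].between(time_zero, time_zero + time_window - 1)

# Keep the maximum strength per node pair inside the window
agg = stream[in_window].groupby(['user_a', 'user_b'])['strength'].max().reset_index()
print(agg)
```

The contact at time 5 lies outside the window and is ignored, and the two contacts between users 1 and 2 collapse into one edge with the larger strength.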

In [None]:
import matplotlib.pyplot as plt

In [None]:
time_zero = 0
time_windows = [3, 36, 288]
window_labels = ['15 minutes', '3 hours', '1 day']

fig, axs = plt.subplots(1, len(time_windows), figsize=(15, 5))
for i in range(len(time_windows)):
    _, G_agg = aggregate_edges(
        df=edgelist_cns_bt, 
        time='time', 
        source='user_a', 
        target='user_b', 
        weight='strength', 
        time_zero=time_zero, 
        time_window=time_windows[i], 
        fun='max'
    )
    axs[i].set_title('Edges aggregated over '+window_labels[i])
    nx.draw(
        G=G_agg, 
        ax=axs[i], 
        node_size=5, 
        width=[strength/20. for strength in list(nx.get_edge_attributes(G_agg, 'strength').values())]
    )

INTERPRET ABOVE RESULT IN LIGHT OF SEKARA PAPER

28 1-day snapshots:

In [None]:
#time_zero = 0
#time_window = 288
#l_df_cns_bt = []
#l_G_cns_bt = []
#
##fig, axs = plt.subplots(4, 7, figsize=(14, 8))
#for i in range(28):
#    df_agg, G_agg = aggregate_edges(
#        df=edgelist_cns_bt, 
#        time='time', 
#        source='user_a', 
#        target='user_b', 
#        weight='strength', 
#        time_zero=time_zero, 
#        time_window=time_window, 
#        fun='max'
#    )
#    l_df_cns_bt.append(df_agg)
#    l_G_cns_bt.append(G_agg)
#    time_zero += time_window
#    #axs[i].set_title('Edges aggregated for day '+str(i))
#    nx.draw(
#        G=G_agg, 
#        #ax=axs[i], 
#        node_size=5, 
#        width=[strength/20. for strength in list(nx.get_edge_attributes(G_agg, 'strength').values())]
#    )

### 5.2.2. Aggregating edges by connectivity

...

In [None]:
#k_threshold = 10.
#time_zero = 0
#l_df_cns_bt = []
#l_G_cns_bt = []
#
#while time_zero < edgelist_cns_bt['time'].max():
#    print(time_zero)
#    k_mean = 0.
#    time_window = 1
#    while k_mean < k_threshold:
#        #print(time_window)
#        df_agg, G_agg = aggregate_edges(
#            df=edgelist_cns_bt, 
#            time='time', 
#            source='user_a', 
#            target='user_b', 
#            weight='strength', 
#            time_zero=time_zero, 
#            time_window=time_window, 
#            fun='max'
#        )
#        k_mean = sum(dict(dict(G_agg.degree())).values())/G_agg.number_of_nodes()
#        time_window += 1
#        if time_zero + time_window > edgelist_cns_bt['time'].max(): break
#    l_df_cns_bt.append(df_agg)
#    l_G_cns_bt.append(G_agg)
#    time_zero += time_window

Giant connected components in the first day:

In [None]:
time_zero = 0
l_df_cns_bt = []
l_G_cns_bt = []

while time_zero < 288: # edgelist_cns_bt['time'].max()
    #print(time_zero)
    P = 0.
    time_window = 0
    while P < .5:
        # Widen the window by one interval until the largest component holds at least half of the nodes
        time_window += 1
        #print(time_window)
        df_agg, G_agg = aggregate_edges(
            df=edgelist_cns_bt, 
            time='time', 
            source='user_a', 
            target='user_b', 
            weight='strength', 
            time_zero=time_zero, 
            time_window=time_window, 
            fun='max'
        )
        if G_agg.number_of_nodes() > 0: # A window without edges has no components
            P = len(max(nx.connected_components(G_agg), key=len))/G_agg.number_of_nodes()
        if time_zero + time_window > edgelist_cns_bt['time'].max(): break
    l_df_cns_bt.append(df_agg)
    l_G_cns_bt.append(G_agg)
    time_zero += time_window # Start the next snapshot right after the current window
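The stopping criterion of the loop is the share `P` of nodes that belong to the largest connected component. A minimal sketch on a toy graph (illustrative names and edges) shows how this share is computed:

```python
import networkx as nx

# Toy graph with components of sizes 4, 2, and 1
G_toy = nx.Graph([(1, 2), (2, 3), (3, 4), (5, 6)])
G_toy.add_node(7)

# Node set of the largest connected component
largest = max(nx.connected_components(G_toy), key=len)

# Share of nodes in the largest component
P = len(largest) / G_toy.number_of_nodes()
print(largest, P)
```

Here four of seven nodes are in the largest component, so the threshold of .5 used above would be met.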

In [None]:
len(l_G_cns_bt)

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
for i in range(4):
    axs[i].set_title('Snapshot '+str(i))
    nx.draw(
        G=l_G_cns_bt[i], 
        ax=axs[i], 
        node_size=5, 
        width=[strength/20. for strength in list(nx.get_edge_attributes(l_G_cns_bt[i], 'strength').values())]
    )

## 5.3. Multilayer networks

Meaning of multiple layers ... literature

In NetworkX, this is realized by creating parallel edges ...

...

Two layers from CNS: calls and SMS ... directed edges

Add an edge attribute for the layer and combine both edge lists into one.

...

1st week

In [None]:
edgelist_cns_calls = pd.read_csv('data/cns/calls.csv')
edgelist_cns_calls

In [None]:
print(open('data/cns/calls.README', 'r').read())

In [None]:
# Remove missed calls
edgelist_cns_calls = edgelist_cns_calls[edgelist_cns_calls['duration'] > 0]

In [None]:
edgelist_cns_sms = pd.read_csv('data/cns/sms.csv')
edgelist_cns_sms

In [None]:
print(open('data/cns/sms.README', 'r').read())

Timestamps are seconds after beginning of experiment ... First week ends after $60*60*24*7=604800$ seconds. ... As edge weights we want to use the summed durations and numbers of short messages, respectively. That means, we use the `sum` function when aggregating edges:

In [None]:
edgelist_cns_calls_week1, _ = aggregate_edges(
    df=edgelist_cns_calls, 
    time='timestamp', 
    source='caller', 
    target='callee', 
    weight='duration', 
    time_zero=0, 
    time_window=604800, 
    fun='sum', 
    directed=True
)

In [None]:
edgelist_cns_calls_week1

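The SMS data contain no weight column. We therefore set `weight=None`, in which case the function assigns a weight of 1 to every message, so that the `sum` function counts the messages per edge: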

In [None]:
edgelist_cns_sms_week1, _ = aggregate_edges(
    df=edgelist_cns_sms, 
    time='timestamp', 
    source='sender', 
    target='recipient', 
    weight=None, 
    time_zero=0, 
    time_window=604800, 
    fun='sum', 
    directed=True
)

In [None]:
edgelist_cns_sms_week1

Edge weights for calls and short messages differ by an order of magnitude:

In [None]:
edgelist_cns_calls_week1['duration'].max()

In [None]:
edgelist_cns_sms_week1['weight'].max()

Therefore, we take the natural logarithm of both scores and add 1, so that an edge with a raw weight of 1 does not end up with a weight of 0. For this, we need the NumPy scientific computing package:

In [None]:
import numpy as np

In [None]:
edgelist_cns_calls_week1['duration'] = round(np.log(edgelist_cns_calls_week1['duration']) + 1, 2)
edgelist_cns_sms_week1['weight'] = round(np.log(edgelist_cns_sms_week1['weight']) + 1, 2)
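The effect of this transformation can be seen on a few made-up raw weights. In this sketch, a raw weight of 1 stays at 1, while weights that differ by factors of ten end up only a few units apart.

```python
import numpy as np
import pandas as pd

# Hypothetical raw weights spanning three orders of magnitude
raw = pd.Series([1, 10, 100, 1000])

# Natural logarithm plus 1, rounded to two decimals
scaled = (np.log(raw) + 1).round(2)
print(scaled.tolist())
```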

Before combining both edgelists into one, we harmonize their column names:

In [None]:
edgelist_cns_calls_week1.columns = ['source', 'target', 'weight']
edgelist_cns_sms_week1.columns = ['source', 'target', 'weight']

Finally, we add layer attributes:

In [None]:
edgelist_cns_calls_week1['layer'] = 0
edgelist_cns_sms_week1['layer'] = 1

Now that the two edge lists are ready, we concatenate them, resetting the index and dropping the old indices:

In [None]:
edgelist_cns_mobile_week1 = pd.concat([edgelist_cns_calls_week1, edgelist_cns_sms_week1]).reset_index(drop=True)
edgelist_cns_mobile_week1

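We use two additional parameters of the `from_pandas_edgelist()` function: `create_using=nx.MultiDiGraph` allows parallel directed edges, and `edge_key='layer'` uses the layer attribute as the key that distinguishes parallel edges: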

In [None]:
G_cns_mobile_week1 = nx.from_pandas_edgelist(
    df=edgelist_cns_mobile_week1, 
    source='source', 
    target='target', 
    edge_attr='weight', 
    create_using=nx.MultiDiGraph, 
    edge_key='layer'
)

Inspect the edges with `keys=True`. The third element of each edge tuple is the edge key, i.e., the layer attribute:

In [None]:
G_cns_mobile_week1.edges(data=True, keys=True)

Alternatively:

In [None]:
G_cns_mobile_week1.edges.keys()
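How edge keys separate the layers can be followed on a toy edgelist. The sketch below uses made-up nodes `'a'` and `'b'` with two parallel edges from `'a'` to `'b'` (one per layer) and one reciprocal edge in layer 1.

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({
    'source': ['a', 'a', 'b'],
    'target': ['b', 'b', 'a'],
    'weight': [2.5, 1.0, 3.0],
    'layer':  [0, 1, 1]
})

# The layer column becomes the key of each parallel edge
G_multi = nx.from_pandas_edgelist(
    df=edges,
    source='source',
    target='target',
    edge_attr='weight',
    create_using=nx.MultiDiGraph,
    edge_key='layer'
)

# Each edge is reported as (source, target, key, weight)
print(list(G_multi.edges(keys=True, data='weight')))

# Parallel edges are addressed as G[source][target][key]
print(G_multi['a']['b'][1]['weight'])
```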

Again we draw the network using the `nx.draw()` function:

In [None]:
plt.figure(figsize=(8, 8))
nx.draw(
    G=G_cns_mobile_week1, 
    #pos=pos_cns_mobile_week1, 
    node_size=20, 
    width=[strength/5. for strength in list(nx.get_edge_attributes(G_cns_mobile_week1, 'weight').values())], 
    edge_color=[key for (u, v, key) in G_cns_mobile_week1.edges.keys()]
)

## 5.4. Sophisticated network drawing

The `draw()` function is not convincing for all purposes (e.g., multilayer networks). `draw()` is a so-called wrapper that calls multiple functions to draw nodes, edges, and labels step by step. But the wrapper does not unlock the full potential of NetworkX. We will go beyond what the wrapper can do in the following steps. First, we want to influence the way nodes are positioned. There are a few layouting algorithms, and we start with **spring embedding**. From the NetworkX [documentation](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.spring_layout.html):

> The algorithm simulates a force-directed representation of the network treating edges as springs holding nodes close, while treating nodes as repelling objects, sometimes called an anti-gravity force. Simulation continues until the positions are close to an equilibrium.

The result of such an algorithm is the placing of nodes in a usually 2-dimensional space where axes have no interpretable meaning. The spring embedder used in NetworkX by default was developed by Fruchterman and Reingold:

In [None]:
# https://stackoverflow.com/questions/14943439/how-to-draw-multigraph-in-networkx-using-matplotlib-or-graphviz

In [None]:
pos_cns_mobile_week1 = nx.spring_layout(G_cns_mobile_week1) # Same as nx.fruchterman_reingold_layout(G_cns_mobile_week1)

In [None]:
nx.draw(
    G=G_cns_mobile_week1, 
    pos=pos_cns_mobile_week1, 
    node_size=20
)

To move on with our discussion, we introduce the graph-theoretical concept of the [connected component](https://en.wikipedia.org/wiki/Component_(graph_theory)), a maximal subgraph in which every node can be reached from every other node via edges. Since we are dealing with a directed graph, there are two kinds of components. In a strongly connected component, all nodes are mutually reachable when edge directions are taken into account. In a weakly connected component, they are reachable when edge directions are ignored. NetworkX provides functions for [strong](https://networkx.org/documentation/stable/reference/algorithms/component.html#strong-connectivity) and [weak](https://networkx.org/documentation/stable/reference/algorithms/component.html#weak-connectivity) connectivity.
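The difference between the two kinds of components can be seen on a toy directed graph (made-up nodes `'a'`, `'b'`, and `'c'`): ignoring directions, all three nodes hang together, but only `'b'` and `'c'` can reach each other along the edge directions.

```python
import networkx as nx

# 'a' points to 'b'; 'b' and 'c' point to each other
D = nx.DiGraph([('a', 'b'), ('b', 'c'), ('c', 'b')])

weak = list(nx.weakly_connected_components(D))
strong = list(nx.strongly_connected_components(D))

print('Weak:', weak)
print('Strong:', strong)
```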

Since nodes repel each other, the spring embedder has a desirable result: Components are not drawn on top of each other. We see that the largest component is depicted in the center of the figure, and smaller components are drawn to the periphery. However, there are also many very long edges, and it is not clear to which component they belong. Hence, our first step is to only draw the largest component: the large weakly connected component in the center of the figure.

The whole graph is not weakly connected:

In [None]:
nx.is_weakly_connected(G_cns_mobile_week1)

The number of weakly connected components is:

In [None]:
nx.number_weakly_connected_components(G_cns_mobile_week1)

This is the ordered list of node sets that make up those components (the largest comes first):

In [None]:
l_wcc = sorted(nx.weakly_connected_components(G_cns_mobile_week1), key=len, reverse=True)
l_wcc

We can extract the largest weakly connected component from the graph by using the `subgraph()` method on the original graph, extracting the first node set in the list:

In [None]:
G_cns_mobile_week1_lwcc = G_cns_mobile_week1.subgraph(l_wcc[0])

In [None]:
nx.draw(
    G=G_cns_mobile_week1_lwcc, 
    pos=pos_cns_mobile_week1, 
    node_size=20
)

The [Fruchterman-Reingold algorithm](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.spring_layout.html) has a few parameters that we must know when layouting a graph. First of all, the `weight` of edges is assumed to be stored as an edge attribute called 'weight'. Since this is true in our case, we do not have to specify it manually (if you do not want to use edge weights, set `weight=None`). Parameter `k` can be changed to influence the distance between nodes. The number of `iterations` can be increased when a layout has not yet converged to an equilibrium, which can be the case when graphs are large. Finally, layouting can be initialized with an existing layout, specified by the `pos` parameter, to have visual continuity.
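One further parameter not discussed above is `seed`. Because the spring embedder starts from random positions, two runs usually give different pictures; fixing the seed makes a layout reproducible. The following sketch uses a toy path graph and arbitrary parameter values chosen only for illustration.

```python
import networkx as nx
import numpy as np

G_toy = nx.path_graph(6)

# Two runs with identical parameters and the same seed
pos_a = nx.spring_layout(G_toy, k=0.5, iterations=100, seed=42)
pos_b = nx.spring_layout(G_toy, k=0.5, iterations=100, seed=42)

# The layout is a dictionary that maps each node to a 2-dimensional position
same = all(np.allclose(pos_a[n], pos_b[n]) for n in G_toy)
print(same)
```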

Now experiment with changing parameter settings:

In [None]:
pos_cns_mobile_week1_lwcc = nx.spring_layout(
    G=G_cns_mobile_week1_lwcc, 
    k=None, # (default=None)
    pos=pos_cns_mobile_week1, # (default=None)
    iterations=50, # (default=50)
    weight='weight' # (default='weight')
)

nx.draw(
    G=G_cns_mobile_week1_lwcc, 
    pos=pos_cns_mobile_week1_lwcc, 
    node_size=20
)

Are you also not quite happy with the result? Nodes in clusters tend to be placed on top of each other, and there are some very long edges that confuse the whole picture. Let us try another standard layout algorithm. The [algorithm by Kamada and Kawai](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.kamada_kawai_layout.html) discards edge directions and places nodes far away from each other if they are connected by long sequences of edges, but it lays out the graph component by component and stacks all components on top of each other:

In [None]:
nx.draw(
    G=G_cns_mobile_week1, 
    pos=nx.kamada_kawai_layout(G_cns_mobile_week1), 
    node_size=20
)

Hence, we use the algorithm on the largest component. The algorithm hardly requires parameter tuning, although there are [options](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.kamada_kawai_layout.html).

In [None]:
pos_cns_mobile_week1_lwcc = nx.kamada_kawai_layout(
    G=G_cns_mobile_week1_lwcc, 
    #pos=pos_cns_mobile_week1, # (default=None)
    weight='weight' # (default='weight')
)

nx.draw(
    G=G_cns_mobile_week1_lwcc, 
    pos=pos_cns_mobile_week1_lwcc, 
    node_size=20
)

The layout uncovers that the largest component has a very stringy nature and that there are hardly any densely connected groups.

IMPROVE THE FOLLOWING LAYOUT -- PARALLEL EDGES SHOULD NOT BE PLACED ON TOP OF EACH OTHER, USE SPECIFIABLE COLORS FOR DIFFERENT EDGE KEYS, DISPLAY EDGE WIDTH BY WEIGHT, NOTE THE URL(S) THAT GAVE THE ANSWER(S):

https://stackoverflow.com/questions/60067022/multidigraph-edges-from-networkx-draw-with-connectionstyle

https://stackoverflow.com/questions/22785849/drawing-multiple-edges-between-two-nodes-with-networkx

https://stackoverflow.com/questions/14943439/how-to-draw-multigraph-in-networkx-using-matplotlib-or-graphviz

...

In [None]:
plt.figure(figsize=(8, 8))
# Draw nodes
nx.draw_networkx_nodes(
    G=G_cns_mobile_week1_lwcc, 
    pos=pos_cns_mobile_week1_lwcc, 
    node_size=20, 
    #node_color='black', 
    #node_shape='o'
)
## Draw edges
#nx.draw_networkx_edges(
#    G=G_cns_mobile_week1_lwcc, 
#    pos=pos_cns_mobile_week1_lwcc, 
#    width=[strength/5. for strength in list(nx.get_edge_attributes(G_cns_mobile_week1, 'weight').values())], 
#    edge_color=[key for (u, v, key) in G_cns_mobile_week1.edges.keys()], 
#    connectionstyle='arc3, rad=.1'
#)
ax = plt.gca()
for u, v, key in G_cns_mobile_week1_lwcc.edges:
    ax.annotate(
        '', 
        xy=pos_cns_mobile_week1_lwcc[v], # Arrow head at the target node
        #xycoords='data', 
        xytext=pos_cns_mobile_week1_lwcc[u], # Arrow tail at the source node
        #textcoords='data', 
        arrowprops=dict(
            arrowstyle='->', 
            #color='gray', 
            #shrinkA=5, 
            #shrinkB=5, 
            patchA=None, 
            patchB=None, 
            connectionstyle=f'arc3, rad={0.3*key}' # Straight edges for layer 0, curved edges for layer 1
        )
    )
## Label nodes
#nx.draw_networkx_labels(
#    G=G_cns_mobile_week1_lwcc, 
#    pos=pos_cns_mobile_week1_lwcc, 
#    font_color='black'
#)
plt.axis('off') # Toggle off box around figure
plt.show() # ...

USE DEGREE AS NODE AND FONT SIZE

https://stackoverflow.com/questions/62649745/is-it-possible-to-change-font-sizes-according-to-node-sizes

## 5.5. Multimodal networks

Multimodal ... rich data ...

Matrix multiplication

### 5.5.1. Bipartite networks and their projections

https://doi.org/10.1093/sf/53.2.181

...

In [None]:
matrix_davis = pd.DataFrame(
    data=[
        [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0], 
        [0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0], 
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], 
        [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0], 
        [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 
        [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 
        [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0], 
        [0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0], 
        [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0], 
        [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], 
        [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1], 
        [0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1], 
        [0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0], 
        [0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0], 
        [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0], 
        [0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0], 
        [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0], 
        [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    ], 
    index=['Eleanor', 'Brenda', 'Dorothy', 'Verne', 'Flora', 'Olivia', 'Laura', 'Evelyn', 'Pearl', 'Ruth', 'Sylvia', 'Katherine', 'Myrna', 'Theresa', 'Charlotte', 'Frances', 'Helen', 'Nora'], 
    columns=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
)

The `from_pandas_adjacency()` function is made for adjacency matrices, not for bipartite matrices. We must transform the matrix into an edgelist first, then use the `from_pandas_edgelist()` function:

In [None]:
#matrix_davis = matrix_davis/matrix_davis.sum(axis=0) # Do column normalization
edgelist_davis = matrix_davis.stack().reset_index() # transform to edgelist
edgelist_davis = edgelist_davis[edgelist_davis[0] > 0].reset_index(drop=True) # remove zero relations
#edgelist_davis.drop(labels=0, axis=1, inplace=True) # drop edge weight column
edgelist_davis.columns = ['woman', 'event', 'weight'] # rename columns
edgelist_davis

In [None]:
G_davis = nx.from_pandas_edgelist(df=edgelist_davis, source='woman', target='event')

In [None]:
nx.is_bipartite(G_davis)

To distinguish between the two node types and to project the bipartite network onto either of its two one-mode networks, we must identify the two node sets:

In [None]:
nodes_women = list(nx.bipartite.sets(G_davis)[0])
print('Women:', nodes_women)
nodes_events = list(nx.bipartite.sets(G_davis)[1])
print('Events:', nodes_events)

When the bipartite network is not connected, the two modes cannot be identified unambiguously from the graph structure alone. In that case, we can work with the row and column labels of the matrix. Lists in matrix order also keep the nodes aligned with the row and column sums that we use as node sizes:

In [None]:
nodes_women = matrix_davis.index.tolist()
print('Women:', nodes_women)
nodes_events = matrix_davis.columns.tolist()
print('Events:', nodes_events)
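To see why the node sets cannot be inferred for a disconnected bipartite graph, consider the toy graph below (made-up nodes): in each component, either side could be the "women" side. In this situation, `bipartite.sets()` raises an `AmbiguousSolution` error unless one node set is passed via `top_nodes`. The sketch imports the `bipartite` module explicitly.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Two separate woman-event pairs, i.e., two components
B = nx.Graph([('w1', 'e1'), ('w2', 'e2')])

try:
    bipartite.sets(B)
    ambiguous = False
except nx.AmbiguousSolution:
    ambiguous = True
print('Ambiguous:', ambiguous)

# Passing one node set resolves the ambiguity
women, events = bipartite.sets(B, top_nodes=['w1', 'w2'])
print(women, events)
```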

Draw the bipartite network:

In [None]:
pos_bipartite = nx.bipartite_layout(G_davis, nodes_women)

In [None]:
nx.draw(G=G_davis, pos=pos_bipartite, with_labels=True)

The result is not quite satisfying. For example, we want to give the nodes of the two modes different colors, shapes, and font colors. To do this, we must again interact with Matplotlib directly.

As node size, we want to display the number of events per woman and the number of women per event, respectively. To obtain these numbers, we can simply use the row and column sums of the matrix:

In [None]:
number_of_events = matrix_davis.sum(axis=1).tolist()
print('Women:', number_of_events)
number_of_women = matrix_davis.sum(axis=0).tolist()
print('Events:', number_of_women)

To use mode-specific font colors, we must prepare two dictionaries that map node identifiers (dictionary keys) to node labels (dictionary values). Recall that, by default, NetworkX uses the node identifiers as labels:

In [None]:
labels_women = {woman: woman for woman in nodes_women}
print('Women:', labels_women)
labels_event = {event: event for event in nodes_events}
print('Events:', labels_event)

Now we have everything in place to draw a network that transports more information:

In [None]:
pos_bipartite = nx.fruchterman_reingold_layout(G_davis)

In [None]:
plt.figure(figsize=(8, 8))
# Draw woman nodes as red squares with the number of events as node size
nx.draw_networkx_nodes(
    G=G_davis, 
    pos=pos_bipartite, 
    nodelist=nodes_women, 
    node_size=[100*size for size in number_of_events], # Using list comprehension to increase node size
    node_color='red', 
    node_shape='s'
)
# Draw event nodes as blue circles with the number of women as node size
nx.draw_networkx_nodes(
    G=G_davis, 
    pos=pos_bipartite, 
    nodelist=nodes_events, 
    node_size=[100*size for size in number_of_women], 
    node_color='blue', 
    node_shape='o'
)
# Draw edges in gray
nx.draw_networkx_edges(
    G=G_davis, 
    pos=pos_bipartite, 
    edge_color='gray'
)
# Label women nodes in black
nx.draw_networkx_labels(
    G=G_davis, 
    pos=pos_bipartite, 
    labels=labels_women, 
    font_color='black'
)
# Label event nodes in white
nx.draw_networkx_labels(
    G=G_davis, 
    pos=pos_bipartite, 
    labels=labels_event, 
    font_color='white'
)
plt.axis('off') # Toggle off box around figure
plt.show() # ...

To also change font size: https://stackoverflow.com/questions/62649745/is-it-possible-to-change-font-sizes-according-to-node-sizes

#### Projection

In [None]:
G_davis_women = nx.bipartite.weighted_projected_graph(G_davis, nodes_women)

In [None]:
plt.figure(figsize=(8, 8))
nx.draw(
    G=G_davis_women, 
    pos=pos_bipartite, 
    with_labels=True, 
    node_size=[100*size for size in number_of_events], # Using list comprehension to increase node size
    node_color='red', 
    node_shape='s', 
    width=list(nx.get_edge_attributes(G_davis_women, 'weight').values()), 
    edge_color='gray', 
    font_color='black'
)

In [None]:
nx.to_pandas_adjacency(G_davis_women)

ADD SLIDER:

In [None]:
G_davis_women_geq4 = nx.Graph()
G_davis_women_geq4.add_nodes_from(nodes_women)
G_davis_women_geq4.add_edges_from([(u, v, edge_attr) for u, v, edge_attr in G_davis_women.edges(data=True) if edge_attr['weight'] >= 4])

In [None]:
plt.figure(figsize=(8, 8))
nx.draw(
    G=G_davis_women_geq4, 
    pos=pos_bipartite, 
    with_labels=True, 
    node_size=[100*size for size in number_of_events], # Using list comprehension to increase node size
    node_color='red', 
    node_shape='s', 
    width=list(nx.get_edge_attributes(G_davis_women_geq4, 'weight').values()), 
    edge_color='gray', 
    font_color='black'
)

Projection to the other side:

In [None]:
G_davis_events = nx.bipartite.weighted_projected_graph(G_davis, nodes_events)

In [None]:
plt.figure(figsize=(8, 8))
nx.draw(
    G=G_davis_events, 
    pos=pos_bipartite, 
    with_labels=True, 
    node_size=[100*size for size in number_of_women], # Using list comprehension to increase node size
    node_color='blue', 
    node_shape='o', 
    width=list(nx.get_edge_attributes(G_davis_events, 'weight').values()), 
    edge_color='gray', 
    font_color='white'
)

In [None]:
nx.to_pandas_adjacency(G_davis_events)

### 5.5.2. Matrix multiplication

NetworkX is implemented in pure Python and becomes slow when networks are large. It is then useful to handle matrices in SciPy before loading them into NetworkX.
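As a first sketch of this idea, the projection of a bipartite network onto one mode can be written as the product of the bipartite matrix with its transpose. The tiny matrix below (three hypothetical women, two events) is an assumption for illustration; the same steps should apply to a sparse version of the Davis matrix.

```python
import numpy as np
from scipy import sparse

# Rows are women, columns are events
B = sparse.csr_matrix(np.array([
    [1, 1],
    [1, 0],
    [0, 1]
]))

# Entry (i, j) of B B^T counts the events that women i and j share
W = (B @ B.T).toarray()

# The diagonal holds the number of events per woman
events_per_woman = W.diagonal().copy()

# Remove the diagonal to obtain the weighted adjacency matrix of the projection
np.fill_diagonal(W, 0)

print(events_per_woman)
print(W)
```

The off-diagonal entries correspond to the edge weights that `weighted_projected_graph()` assigns, i.e., the number of shared neighbors.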

...

REPLICATE ABOVE RESULTS USING SciPy

...

More complicated algebraic operations

...

Such as closing triangles

...

https://doi.org/10.1016/j.poetic.2018.01.001

...