# Networks from Data

This notebook shows several examples of constructing networks from raw datasets in various formats. We will use *Pandas* to load and represent the original data, and *NetworkX* to create the networks.

In [None]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

### Network from CSV Data

We will read a CSV file containing route data from Irish Rail.

In [None]:
df = pd.read_csv("irishrail.csv")
df

Create a network with an edge for each route. Since rail routes naturally run in both directions, an undirected network is appropriate here.
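
To illustrate why the choice between an undirected and a directed network matters, here is a minimal sketch comparing `nx.Graph` and `nx.DiGraph`. The node names `"A"` and `"B"` are placeholders and do not come from the rail dataset.

```python
import networkx as nx

# Placeholder node names, used purely for illustration
undirected = nx.Graph()
undirected.add_edge("A", "B")

directed = nx.DiGraph()
directed.add_edge("A", "B")

# In an undirected graph the edge can be looked up from either end
both_ways = undirected.has_edge("A", "B") and undirected.has_edge("B", "A")

# In a directed graph only the direction that was added exists
reverse_exists = directed.has_edge("B", "A")

print(both_ways, reverse_exists)
```

With an undirected graph, a single row in the CSV file is enough to represent travel in both directions.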

In [None]:
g = nx.Graph()

In [None]:
for i, row in df.iterrows():
    # add an edge, with "kind" as an attribute
    g.add_edge( row["start"], row["end"], kind = row["kind"] )
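
As an alternative to the row-by-row loop, NetworkX can build the graph directly from a DataFrame with `nx.from_pandas_edgelist`. The sketch below assumes a small made-up DataFrame with the same column names used above (`start`, `end`, `kind`); the station names and the `kind` values are hypothetical.

```python
import networkx as nx
import pandas as pd

# Hypothetical rows that mimic the columns used above
toy_df = pd.DataFrame({
    "start": ["A", "B"],
    "end": ["B", "C"],
    "kind": ["intercity", "commuter"],
})

# Build an undirected graph, copying "kind" onto each edge as an attribute
toy_g = nx.from_pandas_edgelist(toy_df, source="start", target="end", edge_attr="kind")

print(toy_g.number_of_nodes(), toy_g.number_of_edges())
print(toy_g["A"]["B"]["kind"])
```

The explicit loop is still useful when each row needs extra processing before it becomes an edge.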

Check the size of the network:

In [None]:
g.number_of_nodes(), g.number_of_edges()

Iterate over all of the edges:

In [None]:
for e in g.edges(data=True):
    print(e)

Draw the rail network:

In [None]:
plt.figure(figsize=(12,10))
nx.draw_networkx( g, with_labels=True, node_size=800, node_color="lightblue" )
plt.axis("off")
plt.show()

We can find all stations with a direct route to a particular destination:

In [None]:
list( g.neighbors("Cork") )

In [None]:
list( g.neighbors("Limerick") )
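
A related question is which stations have the most direct connections. The number of neighbors of a node is its degree, so one possible approach is to sort the nodes by degree. The sketch below uses a small made-up graph; the node names are placeholders, not real stations.

```python
import networkx as nx

# Made-up graph: "Hub" is connected to three other nodes
toy_g = nx.Graph()
toy_g.add_edges_from([("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("A", "B")])

# toy_g.degree() yields (node, degree) pairs; sort them from highest to lowest degree
ranked = sorted(toy_g.degree(), key=lambda pair: pair[1], reverse=True)

for node, degree in ranked:
    print(node, degree)
```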

### Network from JSON Data

In this case we will read a JSON file describing character interactions for the movie *Star Wars: Episode IV*.

First, read in the JSON data:

In [None]:
import json
# use a context manager so the file is closed even if parsing fails
with open("starwars-episode-4.json", "r") as json_file:
    data = json.load( json_file )

In [None]:
data

Since interactions are naturally reciprocal, we will again use an undirected network to represent the data. Because we have frequency data for the interactions (i.e. the number of times two characters interacted), it makes sense for this to be a weighted undirected network.

Create an empty network and add the nodes. Note that we will also create a mapping from the ID numbers in the file to the character names, as we want to use names for node identifiers:

In [None]:
g = nx.Graph()
name_map = {}
for char in data["characters"]:
    name_map[ char["id"] ] = char["name"]
    g.add_node( char["name"] )

In [None]:
g.number_of_nodes()

Now add the edges based on the interactions in the JSON data:

In [None]:
for interaction in data["interactions"]:
    name1 = name_map[ interaction["id1"] ]
    name2 = name_map[ interaction["id2"] ]
    g.add_edge( name1, name2, weight=interaction["frequency"] )
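
The whole ID-to-name workflow can be tried out on a tiny inline example. The sketch below assumes the same JSON layout as the file (`characters` with `id` and `name`, `interactions` with `id1`, `id2` and `frequency`), but the character names and frequencies are invented for illustration.

```python
import json
import networkx as nx

# Invented data that follows the same layout as the JSON file
raw = """
{
  "characters": [
    {"id": 0, "name": "Alpha"},
    {"id": 1, "name": "Beta"},
    {"id": 2, "name": "Gamma"}
  ],
  "interactions": [
    {"id1": 0, "id2": 1, "frequency": 5},
    {"id1": 1, "id2": 2, "frequency": 1}
  ]
}
"""
toy_data = json.loads(raw)

# Map numeric IDs to names so that names can serve as node identifiers
toy_names = {char["id"]: char["name"] for char in toy_data["characters"]}

toy_g = nx.Graph()
toy_g.add_nodes_from(toy_names.values())

# Each (node, node, weight) triple becomes an edge with a "weight" attribute
toy_g.add_weighted_edges_from(
    (toy_names[i["id1"]], toy_names[i["id2"]], i["frequency"])
    for i in toy_data["interactions"]
)

print(toy_g["Alpha"]["Beta"]["weight"])
```

Adding the nodes before the edges ensures that any character with no recorded interactions still appears in the network.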

In [None]:
g.number_of_edges()

Draw the resulting network:

In [None]:
plt.figure(figsize=(12,10))
nx.draw_networkx( g, with_labels=True, node_size=800, node_color="lightblue" )
plt.axis("off")
plt.show()

We might want to look at the most frequent interactions in the network:

In [None]:
# convert the edges in the network to a Pandas DataFrame
df = nx.to_pandas_edgelist(g)
df.head(10)

In [None]:
# sort the rows by weight (i.e. interaction frequency)
df.sort_values(by="weight", ascending=False).head(10)
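
The same ranking can also be obtained without converting to a DataFrame, by sorting the edges of the graph directly. This is a minimal sketch on an invented weighted graph; the names and weights are placeholders.

```python
import networkx as nx

# Invented weighted graph
toy_g = nx.Graph()
toy_g.add_weighted_edges_from([
    ("Alpha", "Beta", 5),
    ("Beta", "Gamma", 1),
    ("Alpha", "Gamma", 3),
])

# edges(data="weight") yields (node1, node2, weight) triples
top_edges = sorted(toy_g.edges(data="weight"), key=lambda e: e[2], reverse=True)

for name1, name2, weight in top_edges:
    print(name1, name2, weight)
```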

We could also plot a histogram showing the distribution of edge weights. As we can see, in most cases the interactions are "once off" events.

In [None]:
# select the "weight" column explicitly, since the other columns are not numeric
ax = df["weight"].plot.hist(figsize=(8,6), fontsize=14, color="darkred")
ax.set_ylabel("Number of Edges", fontsize=14)
ax.set_xlabel("Weight", fontsize=14);
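
A visual impression from a histogram can be backed up with a quick count. The sketch below computes the share of edges with weight 1 in an edge list shaped like the output of `nx.to_pandas_edgelist`; the rows are made up, so the resulting figure says nothing about the Star Wars data itself.

```python
import pandas as pd

# Made-up edge list with the same columns as nx.to_pandas_edgelist output
toy_edges = pd.DataFrame({
    "source": ["Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "target": ["Beta", "Gamma", "Gamma", "Delta", "Delta"],
    "weight": [1, 1, 1, 4, 9],
})

# The mean of a boolean column is the fraction of rows where it is True
share_once = (toy_edges["weight"] == 1).mean()

# Count how many edges have each weight, ordered by weight
weight_counts = toy_edges["weight"].value_counts().sort_index()

print(share_once)
print(weight_counts)
```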