# Introduction to NetworkX

## Contents
1. Setup
2. Network structure and types
3. Creating a graph object, adding and removing nodes and edges
     - Adding and removing nodes
     - Adding edges
     - Removing edges
4. Weighted edges
5. Attributes of nodes and edges

## 1. Setup

Older releases of `networkx` supported Python 2.7, but current releases require Python 3.

Check that we are running Python version 3.

In [6]:
import sys
print(sys.version)

If `networkx` is not installed, install it using `pip3`. Before running the command, remove `%md ` from the beginning of the next cell.

%sh /databricks/python3/bin/pip3 install networkx
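
Before installing, it can help to check whether the package is already importable. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is hypothetical and is not part of `networkx` or Databricks.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


print("Python 3 or later:", sys.version_info >= (3,))
print("networkx installed:", is_installed("networkx"))
```

If the second line prints `False`, run the install command above.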

Load the necessary packages.

In [10]:
import networkx          as nx
import pandas            as pd
import numpy             as np
import matplotlib.pyplot as plt

## 2. Network structure and types

`networkx` is a package designed for network analysis in the Python environment. It provides functions which:
- Load and store networks
- Generate random and classic networks
- Analyze network structure
- Build network models
- Draw networks

A network consists of 2 types of elements:
1. A __node (vertex)__ represents a single thing (and so is similar to a record in a table) and is often drawn as a circle
1. An __edge__ is a connection or relationship between two nodes

A graph is a collection of nodes and edges.

For example, the following graph shows the friendship network of 6 people. In the graph, the nodes represent people, and the edges represent friendships between the people represented by the nodes they connect. This graph has 6 nodes and 8 edges.

In [15]:
H=nx.Graph()
H.add_nodes_from(["Jane","Joe","Martin","Harris","Calvin","Rob"])
H.add_edges_from([("Jane","Joe"),("Joe","Martin"),("Martin","Harris"),("Harris","Calvin"), 
                  ("Jane","Rob"),("Harris","Rob"),("Rob","Martin"),("Jane","Martin")])
BLUE = "#05006d"
nx.draw(H, node_color=BLUE, with_labels=True, node_size=4200, font_size=18, font_weight='bold', font_color='white')
display(plt.show())
plt.clf()
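
The node and edge counts quoted for the friendship graph can be checked without drawing it. This sketch rebuilds the same graph and uses `number_of_nodes()`, `number_of_edges()` and `neighbors()`; the variable names are only illustrative.

```python
import networkx as nx

H = nx.Graph()
H.add_nodes_from(["Jane", "Joe", "Martin", "Harris", "Calvin", "Rob"])
H.add_edges_from([
    ("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"), ("Harris", "Calvin"),
    ("Jane", "Rob"), ("Harris", "Rob"), ("Rob", "Martin"), ("Jane", "Martin"),
])

n_nodes = H.number_of_nodes()
n_edges = H.number_of_edges()
print(n_nodes, n_edges)  # 6 8

# The friends of one person are the neighbors of that person's node.
jane_friends = sorted(H.neighbors("Jane"))
print(jane_friends)  # ['Joe', 'Martin', 'Rob']
```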

Networks can be directed or undirected. The edges of a __directed__ network have direction; they are directed _from_ one node _to_ another node. The edges of an __undirected__ network indicate only a relationship _between_ two nodes. 

For example, being friends on Facebook would be represented by an undirected edge. 

On the other hand, the edge _from_ one Facebook user _to_ the Facebook user they follow is directed. 

Another example of a directed network is the graph of Twitter users where edges are directed from the follower to the followee. 

The graph below is directed.

In [17]:
G=nx.DiGraph()
G.add_nodes_from(["Jane","Joe","Martin","Harris","Calvin","Rob"])
G.add_edges_from([("Jane","Joe"), ("Joe","Martin"), ("Martin","Harris"), ("Harris","Calvin"), 
                  ("Jane","Rob"), ("Harris","Rob"),("Rob","Martin"),("Jane","Martin")])
BLUE = "#05006d"
nx.draw(G, node_color=BLUE,
               with_labels=True, 
               node_size=4200, 
               font_size=18, 
               font_weight='bold',
               font_color='white',
               arrows=True,
               arrowsize=40)
display(plt.show())
plt.clf()

An arrowhead indicates the direction of the relationship, where the arrowhead points to the _to_ node. 

Assuming that the directed graph above is a Twitter follower graph, Joe, Jane and Rob follow Martin, and Martin follows Harris.  
In this case, Martin is an _important_ node in the graph because he has the highest number of edges to or from him. This count is called the _degree_ of the node and will be discussed in a later notebook.
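
The follower relationships and edge counts described above can be read directly from the graph object. This is a small sketch on the same directed graph, using `degree()`, `in_degree()`, `out_degree()`, `predecessors()` and `successors()`; the variable names are only illustrative.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"), ("Harris", "Calvin"),
    ("Jane", "Rob"), ("Harris", "Rob"), ("Rob", "Martin"), ("Jane", "Martin"),
])

degree = dict(G.degree())          # edges to or from each node
followers = dict(G.in_degree())    # edges pointing to each node
following = dict(G.out_degree())   # edges leaving each node

most_connected = max(degree, key=degree.get)
print(most_connected, degree[most_connected])  # Martin 4

martin_followers = sorted(G.predecessors("Martin"))
martin_follows = list(G.successors("Martin"))
print(martin_followers)  # ['Jane', 'Joe', 'Rob']
print(martin_follows)    # ['Harris']
```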

## 3. Creating a graph object, adding and removing nodes and edges

When graphs are created in `networkx` they must be specified as one of the four graph types listed below.

In [21]:
def_graph =[{"Networkx class":"nx.Graph()", "Graph type":"Undirected"},
           {"Networkx class":"nx.DiGraph()", "Graph type":"Directed"},
           {"Networkx class": "nx.MultiGraph()", "Graph type":"Multiple Undirected edges"},
           {"Networkx class": "nx.MultiDiGraph()", "Graph type": "Multiple Directed edges"}]
pd.DataFrame(def_graph)

The first two graph types allow at most one edge between a pair of nodes (in a `DiGraph`, at most one edge in each direction).
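
Each of the four classes reports its own type through the `is_directed()` and `is_multigraph()` methods. A short sketch that tabulates both flags (the `properties` dictionary is only an illustrative name):

```python
import networkx as nx

properties = {}
for graph_class in (nx.Graph, nx.DiGraph, nx.MultiGraph, nx.MultiDiGraph):
    g = graph_class()
    # (is it directed?, does it allow multiple edges per pair of nodes?)
    properties[graph_class.__name__] = (g.is_directed(), g.is_multigraph())

for name, flags in properties.items():
    print(name, flags)
```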

Create a graph object called `G` as a directed graph (with at most one edge in each direction between a pair of nodes).

In [24]:
G=nx.DiGraph()

This graph `G` is used in the examples below.

#### 3.1 Adding and removing nodes

A single node is added to a graph using the `add_node()` function. To view the nodes of a graph object, use the `nodes()` function.

In [28]:
G.add_node("Jane")
G.nodes()

The `add_node()` function adds only one node at a time. 
Use the `add_nodes_from()` function to add multiple nodes.

In [30]:
G.add_nodes_from(["Joe","Martin","Harris"])
G.nodes()

The `add_nodes_from()` function also adds nodes from another graph object.

In [32]:
H=nx.Graph()
H.add_nodes_from(["Calvin","Rob","Bob", "Harris"])

In [33]:
G.add_nodes_from(H)
G.nodes()

Notice that there are no duplicate node names in a graph object. 
If we add a node that already exists in a graph object, no second copy of the node is created.
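
The handling of repeated node names can be seen in a small sketch. The `role` attribute is a hypothetical example and does not appear elsewhere in this notebook.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("Jane")
G.add_node("Jane")                  # already present: no second node is created
G.add_nodes_from(["Jane", "Joe"])   # only "Joe" is new
node_list = list(G.nodes())
print(node_list)  # ['Jane', 'Joe']

# Re-adding an existing node with keyword arguments updates its attributes.
G.add_node("Jane", role="analyst")
jane_attributes = dict(G.nodes["Jane"])
print(jane_attributes)  # {'role': 'analyst'}
```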

To remove a node from a graph object, use the `remove_node()` or `remove_nodes_from()` functions.

In [36]:
G.remove_node("Bob")
G.nodes()
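
The two removal functions differ in how they treat a node that is not in the graph, and removing a node also removes the edges attached to it. A minimal sketch, with "Nobody" as a made-up node name:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("Jane", "Joe"), ("Joe", "Martin")])

G.remove_node("Joe")  # also removes both edges that touch Joe
nodes_left = list(G.nodes())
edges_left = list(G.edges())
print(nodes_left, edges_left)  # ['Jane', 'Martin'] []

G.remove_nodes_from(["Martin", "Nobody"])  # "Nobody" is absent and is skipped
remaining = list(G.nodes())

try:
    G.remove_node("Nobody")  # remove_node() raises for a missing node
    raised = False
except nx.NetworkXError:
    raised = True
print(remaining, raised)  # ['Jane'] True
```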

#### 3.2 Adding edges

So far, `G` has no edges because we have only added nodes to it. 
To list the edges of a graph, use the `edges()` function.

In [39]:
G.edges()

The `add_edge()` function adds one edge at a time, and the `add_edges_from()` function adds multiple edges. 
The edge to add is indicated by listing the names of the nodes it connects:
- in a directed graph the _from_ node is listed first and the _to_ node is listed second
- in an undirected graph the order in which the nodes are listed does not matter

With `add_edges_from()`, each edge is given as a tuple of the two nodes to connect.
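
The effect of node order can be checked with `has_edge()`. This sketch adds the same edge to a directed and an undirected graph and queries it in both orders; the graph names are only illustrative.

```python
import networkx as nx

directed = nx.DiGraph()
directed.add_edge("Jane", "Joe")   # from Jane to Joe
edge = ("Joe", "Martin")
directed.add_edge(*edge)           # a tuple can be unpacked into the two nodes

undirected = nx.Graph()
undirected.add_edge("Jane", "Joe")

directed_checks = (directed.has_edge("Jane", "Joe"), directed.has_edge("Joe", "Jane"))
undirected_checks = (undirected.has_edge("Jane", "Joe"), undirected.has_edge("Joe", "Jane"))
print(directed_checks)    # (True, False)
print(undirected_checks)  # (True, True)
```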

In [41]:
G.add_edge("Jane","Joe")
G.edges()

In [42]:
G.add_edges_from([("Joe","Martin"), ("Martin","Harris"), ("Harris","Calvin"), ("Harris","Calvin")])
G.edges()

Notice that in graphs that allow only a single edge between nodes (`nx.Graph()`, `nx.DiGraph()`), a repeated edge is stored only once. This matters when working with multi-edge graphs (`nx.MultiGraph()`, `nx.MultiDiGraph()`), in which each repetition is stored as a separate parallel edge.
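
The difference shows up when the same repeated edge list is added to a `DiGraph` and to a `MultiDiGraph`. A minimal sketch (the names `simple` and `multi` are only illustrative):

```python
import networkx as nx

repeated = [("Harris", "Calvin"), ("Harris", "Calvin")]

simple = nx.DiGraph()
simple.add_edges_from(repeated)

multi = nx.MultiDiGraph()
multi.add_edges_from(repeated)

print(simple.number_of_edges())  # 1
print(multi.number_of_edges())   # 2

# In a multigraph, each parallel edge gets its own key (0, 1, ...).
multi_edges = list(multi.edges(keys=True))
print(multi_edges)  # [('Harris', 'Calvin', 0), ('Harris', 'Calvin', 1)]
```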

In [44]:
H.add_edges_from([("Jane","Rob"), ("Harris","Rob"),("Rob","Martin"),("Jane","Martin"),("Martin", "Calvin")])
H.edges()

Edges can also be added from other graph objects.

In [46]:
G.add_edges_from(H.edges())
G.edges()
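
When edges from an undirected graph are added to a directed graph, each edge is added in one direction only, following the order in which the undirected graph reports its nodes. The sketch below shows this and one possible way to add both directions instead, using `to_directed()`; the graph names are only illustrative.

```python
import networkx as nx

H = nx.Graph()
H.add_edge("Jane", "Rob")

one_way = nx.DiGraph()
one_way.add_edges_from(H.edges())  # adds Jane -> Rob only
one_way_checks = (one_way.has_edge("Jane", "Rob"), one_way.has_edge("Rob", "Jane"))
print(one_way_checks)  # (True, False)

# to_directed() returns a directed copy with an edge in each direction.
both_ways = nx.DiGraph()
both_ways.add_edges_from(H.to_directed().edges())
print(both_ways.number_of_edges())  # 2
```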

#### 3.3 Removing edges

In [48]:
G.remove_edge('Jane', 'Joe')
G.edges()
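
Removing an edge leaves its two nodes in the graph. As with nodes, there is a bulk version, `remove_edges_from()`, which skips edges that do not exist, whereas `remove_edge()` raises an error. A minimal sketch on a small made-up graph:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris")])

G.remove_edge("Jane", "Joe")
nodes_after = sorted(G.nodes())  # Jane and Joe are still nodes
print(nodes_after)  # ['Harris', 'Jane', 'Joe', 'Martin']

G.remove_edges_from([("Joe", "Martin"), ("Rob", "Bob")])  # missing edge is skipped
edges_after = list(G.edges())
print(edges_after)  # [('Martin', 'Harris')]

try:
    G.remove_edge("Jane", "Joe")  # this edge was already removed
    raised = False
except nx.NetworkXError:
    raised = True
print(raised)  # True
```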

In [49]:
BLUE = "#05006d"
nx.draw(G, node_color=BLUE,
               with_labels=True, 
               node_size=4200, 
               font_size=18, 
               font_weight='bold',
               font_color='white',
               arrows=True,
               arrowsize=50)
display(plt.show())
plt.clf()

## 4. Weighted edges

An edge in a graph may have a _weight_, which is a number. 

For example, in a transportation network, the weight could be the distance between two cities (the nodes).

In [52]:
G = nx.Graph()
G.add_edges_from([("BOS","ORD"), ("BOS","DEN"), ("ORD","LAX"), ("DEN","LAX")])
G.edges()

Weights can be added to edges in several ways.

First, weights can be specified when the edge is created with `add_edge()`.

In [55]:
G.add_edge("ORD","DEN", weight=888)
G.edges.data("weight")

Notice that `None` indicates there is no weight for that edge.
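
Edges without a weight can be listed, or given a placeholder value when reading, through the `default` argument of `edges.data()`. A short sketch on a reduced version of the graph above; the placeholder value `0` is an arbitrary choice for illustration.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("BOS", "ORD"), ("BOS", "DEN")])
G.add_edge("ORD", "DEN", weight=888)

# Edges whose "weight" attribute is missing are reported as None.
unweighted = [(u, v) for u, v, w in G.edges.data("weight") if w is None]
print(unweighted)

# `default` replaces None in the output only; the graph itself is not changed.
with_default = list(G.edges.data("weight", default=0))
print(with_default)
```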

Second, weighted edges can be added with the `add_weighted_edges_from()` function.

In [58]:
weighted_edges=[("BOS", "ORD", 986), ("BOS", "DEN", 1770)]
G.add_weighted_edges_from(weighted_edges)
G.edges.data("weight")

Third, weights can be added to existing edges using dictionary syntax, with either of the two forms below.

In [60]:
G["DEN"]["ORD"]["weight"]= 888
G["BOS"]["ORD"]["weight"]= 986
G.edges["BOS","DEN"]["weight"]=1770
G.edges["LAX","ORD"]["weight"]=2016
G.edges["LAX","DEN"]["weight"]=1017
G.edges.data("weight")

These weight values can be retrieved in at least two different ways. 

First, as above using the `data()` method of the `edges` attribute of `G`.

Second, as key-value pairs in the dictionary associated with each edge.

In [63]:
for node_from, node_to, edge_data in G.edges.data():
    print(node_from, "-", node_to, edge_data)
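
A few other ways to read weights may be useful. This sketch rebuilds the weighted graph and uses `nx.get_edge_attributes()`, `get_edge_data()` and `size()`; these are offered as further options, not as the notebook's preferred approach.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("BOS", "ORD", 986), ("BOS", "DEN", 1770), ("ORD", "DEN", 888),
    ("ORD", "LAX", 2016), ("DEN", "LAX", 1017),
])

weights = nx.get_edge_attributes(G, "weight")  # {(node, node): weight}
single = G.get_edge_data("LAX", "DEN")         # attribute dictionary of one edge
total = G.size(weight="weight")                # sum of all edge weights

print(weights[("BOS", "ORD")])  # 986
print(single)                   # {'weight': 1017}
print(total)
```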

## 5. Attributes of Nodes and Edges

The weight of an edge is an example of an edge _attribute_. Nodes can also have attributes. 

These attributes (for nodes and edges in `networkx`) are Python dictionaries.

In [66]:
G.nodes["DEN"]

In [67]:
G.nodes["DEN"]["altitude"] = 5280
G.nodes["DEN"]

In [68]:
G.edges["BOS","ORD"]

In [69]:
G.edges["BOS","DEN"]['duration'] = '4h 40m'
G.edges["BOS","DEN"]

Add a "Passengers" attribute to each node.

In [71]:
pas={"LAX":84557968, "ORD":79828183, "DEN":61379396, "BOS":28866313}
nx.set_node_attributes(G, pas, "Passengers")
G.nodes(data=True)
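
Node attributes can also be set when a node is created and read back one attribute at a time. A minimal sketch using the same airport data, with `nx.get_node_attributes()` and `G.nodes(data=...)`; the variable names are only illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_node("DEN", altitude=5280)  # attribute set when the node is created
G.add_nodes_from(["BOS", "ORD", "LAX"])

passengers = {"LAX": 84557968, "ORD": 79828183, "DEN": 61379396, "BOS": 28866313}
nx.set_node_attributes(G, passengers, "Passengers")

by_node = nx.get_node_attributes(G, "Passengers")  # {node: value}
altitudes = dict(G.nodes(data="altitude"))         # None where the attribute is missing
busiest = max(by_node, key=by_node.get)

print(busiest)         # LAX
print(altitudes)
print(G.nodes["DEN"])
```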

#### 5.1 Centrality

Let's add a `betweenness` attribute to each edge. Betweenness is discussed in detail in the next chapter. To set an edge attribute from a dictionary, the dictionary keys must be the edges, each given as a tuple of its two nodes.

In [74]:
# edge_betweenness_centrality() returns a dictionary keyed by (node, node) tuples
edge_bc = nx.edge_betweenness_centrality(G)
edge_bc

In [75]:
nx.set_edge_attributes(G, name="betweenness", values=edge_bc)
G.edges(data=True)
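
The same dictionary format works for values that are not computed by `networkx`. The sketch below uses hypothetical flight durations and assumes the behavior of current `networkx` releases, where `set_edge_attributes()` skips dictionary keys that do not match an edge in the graph.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("BOS", "ORD"), ("BOS", "DEN")])

# Hypothetical flight durations, keyed by the two nodes of each edge.
durations = {
    ("BOS", "ORD"): "2h 50m",
    ("DEN", "BOS"): "4h 40m",  # reversed order still matches in an undirected graph
    ("BOS", "LAX"): "6h 30m",  # no such edge in G, so this entry is skipped
}
nx.set_edge_attributes(G, durations, "duration")

print(G.edges["BOS", "ORD"])
print(G.edges["BOS", "DEN"])
print(G.number_of_edges())  # still 2: no new edge was created
```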

Each edge now has at least two attributes, `weight` and `betweenness`. To read one attribute across all edges, pass its name to the `.edges.data()` method.

In [77]:
G.edges.data("betweenness")

__The End__