<div class="frontmatter text-center">
<h1>Introduction to Data Science and Programming</h1>
<h2>Exercise 19: Introduction to network science</h2>
<h3>IT University of Copenhagen, Fall 2020</h3>
Notebook adapted from: A network science class by Sean Cornelius and Emma Thompson
</div>
<hr>

## Importing required modules

In [1]:
import numpy as np
import networkx as nx
from matplotlib import pyplot as plt
import matplotlib as mpl
%matplotlib inline

## Basic data types in NetworkX
NetworkX provides the following classes that represent network-related data,
as well as network analysis algorithms that operate on these objects:

**Graph**        - Undirected graph; self loops are allowed

**DiGraph**      - Directed graph; self loops are allowed

**MultiGraph**   - Undirected graph; self loops and multiple edges between the same node pair are allowed

**MultiDiGraph** - Directed graph; self loops and multiple edges between the same node pair are allowed
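
To make the difference between the four classes concrete, the sketch below adds the same four edges to one instance of each class and counts what is stored. The node names `"a"` and `"b"` and the variable names are made up for illustration.

```python
import networkx as nx

# One instance of each graph class (variable names are illustrative)
graphs = [nx.Graph(), nx.DiGraph(), nx.MultiGraph(), nx.MultiDiGraph()]

for graph in graphs:
    graph.add_edge("a", "b")
    graph.add_edge("a", "b")  # the same pair again
    graph.add_edge("b", "a")  # the same pair, reversed
    graph.add_edge("a", "a")  # a self loop

# Map each class name to the number of edges it ended up storing
edge_counts = {type(g).__name__: g.number_of_edges() for g in graphs}
print(edge_counts)
# {'Graph': 2, 'DiGraph': 3, 'MultiGraph': 4, 'MultiDiGraph': 4}
```

`Graph` collapses all three a-b edges into one, `DiGraph` keeps one edge per direction, and the two multigraph classes keep every parallel edge.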

## Getting started
Create an empty undirected graph and call it G

In [2]:
G = nx.Graph()

## Nodes
A node can be pretty much any hashable Python object, including numbers, strings, GPS coordinates (as tuples), even another graph.
Nodes can be added one at a time using `G.add_node(n)` (the `G` being the graph and `n` being the node to add).

- Add 7 nodes one at a time to G. Make sure there are no more than two of a type.
- Print the added nodes using `G.nodes`

In [3]:
G

<networkx.classes.graph.Graph at 0x7f97ac4a0e20>

In [4]:
G.add_node(6)        # add an integer node
n = 10               # a variable can hold a node value (it is not added to G here)
G.add_node("string") # add a string node
G.add_node(0.5)      # add a float node
G.add_node(7.0)      # add another float node
G.add_node("8")      # add another string node (not the integer 8)
G.add_node(9)        # add another integer node
G.add_node((0,9))    # add a tuple node

We can also add many nodes at a time using `G.add_nodes_from(lst)` as below. We've added one list for you.
- Make your own list of nodes and add them too:

In [5]:
G.add_nodes_from([1, 2, 3]) # our list
G.add_nodes_from([4,5]) # your list goes here

In [6]:
G.nodes

NodeView((6, 'string', 0.5, 7.0, '8', 9, (0, 9), 1, 2, 3, 4, 5))
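
A few details about node identity are easy to miss, so here is a minimal sketch on a separate toy graph `H` (a hypothetical name, so that `G` above is left untouched). It assumes only standard Python hashing rules: nodes that compare equal and hash equally are the same node, and unhashable objects such as lists cannot be nodes.

```python
import networkx as nx

H = nx.Graph()
H.add_nodes_from([1, 2, 3])
H.add_nodes_from([3, 4])  # 3 already exists, so only 4 is new
H.add_node(1.0)           # 1.0 == 1 and both hash equally: no new node
H.add_node("1")           # the string "1" is a different node from the integer 1

node_list = list(H.nodes)
print(node_list)  # [1, 2, 3, 4, '1']

# Lists are unhashable, so they cannot be used as nodes
try:
    H.add_node([5, 6])
    list_was_accepted = True
except TypeError:
    list_was_accepted = False
print(list_was_accepted)  # False
```

If you need a pair of values as a node, use a tuple such as `(5, 6)` instead of a list.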

## Node attributes
Nodes can have arbitrary attributes associated with them, stored in a dictionary keyed by attribute name (a string).
You can add attributes at the time of node creation using keyword arguments as follows:

In [7]:
G.add_node("Noah", eye_color='brown', height=193)

You can also add attributes to an already existing node:

In [11]:
G.add_node("Natalie")
# add an attribute "books" with value 500 to Natalie
G.nodes["Natalie"]["books"] = 500
G.nodes["Natalie"]
G.nodes

NodeView((6, 'string', 0.5, 7.0, '8', 9, (0, 9), 1, 2, 3, 4, 5, 'Noah', 'Natalie'))

- Make two new nodes with attributes
- Add 2 attributes to at least 4 of your pre-existing nodes

In [16]:
G.add_node("Simon", age=21, job="TA")
G.add_node("ITU", open_to_public=True, wear_masks=True)

In [20]:
G.nodes["Simon"]["age"]

21

`G.nodes[n]` gives a dictionary containing all the `attribute: value` pairs associated with node n:

In [None]:
print("Noah's eyes are ", G.nodes["Noah"]["eye_color"], " and he is ", G.nodes['Noah']['height'], " cm tall.")
print("Natalie has ", G.nodes["Natalie"]["books"], " books.")
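
Because `G.nodes[n]` behaves like an ordinary dictionary, the usual dictionary tools apply. The sketch below uses a separate toy graph `P`, and the `city` values are invented purely for illustration. It also shows the helper functions `nx.set_node_attributes` and `nx.get_node_attributes`, which handle many nodes at once.

```python
import networkx as nx

P = nx.Graph()
P.add_node("Noah", eye_color="brown", height=193)
P.add_node("Natalie", books=500)

# .get() avoids a KeyError when a node lacks the attribute
natalie_height = P.nodes["Natalie"].get("height", "unknown")

# Set one attribute on several nodes at once with a {node: value} mapping
# (the city values are made up for this example)
nx.set_node_attributes(P, {"Noah": "Copenhagen", "Natalie": "Aarhus"}, name="city")

# Collect one attribute across all nodes that have it
heights = nx.get_node_attributes(P, "height")

print(natalie_height)       # unknown
print(heights)              # {'Noah': 193}
print(P.nodes["Natalie"])   # {'books': 500, 'city': 'Aarhus'}
```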

- Make a sentence (as we did above) using node attributes. If you're cooler than us, use `.format()` or an f-string (`f"..."`)

In [None]:
# your code here

## Edges

An edge between node `node1` and node `node2` is represented by the tuple (`node1`, `node2`).

In [12]:
# edges can be added one at a time, here between node 0 and node 1
G.add_edge(0, 1)
# or several at once from a list of tuples
G.add_edges_from([ (2, 1), ("Michael", "Natalie"), (3, 4) ])


In [13]:
G.edges()

EdgeView([(1, 0), (1, 2), (3, 4), ('Natalie', 'Michael')])

- Add 4 edges however you want, between whatever nodes you want.
- Print a list of nodes using `print(list(G.nodes))`
- Print a list of edges

In [None]:
# your code here

**Note:** When adding an edge, nodes will be automatically created if they don't already exist.
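
A small sketch of this behaviour, on a separate toy graph `E` with made-up node names: every endpoint that is missing when the edge is added shows up as a node afterwards, with an empty attribute dictionary.

```python
import networkx as nx

E = nx.Graph()
E.add_node("a")
before = set(E.nodes)

E.add_edge("a", "b")            # "b" does not exist yet
E.add_edges_from([("c", "d")])  # neither "c" nor "d" exists yet

# Nodes that were created implicitly by the two calls above
created = set(E.nodes) - before
print(sorted(created))  # ['b', 'c', 'd']
print(E.nodes["b"])     # {}
```

This is convenient, but it also means that a typo in a node name silently creates a new node instead of raising an error.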

## Edge attributes

Like nodes, edges can also have attributes. An important and special attribute (for many algorithms) is the edge weight, conventionally stored under the key `weight`.

The syntax for adding and accessing edge attributes is similar to that for nodes:

In [None]:
G.add_edge("Michael", "Natalie", weight=10)
G["Michael"]["Natalie"]

- Add attributes to three of your edges

In [None]:
# your code

`G[node1][node2]` is a dictionary containing all attribute:value pairs associated with the edge from node1 to node2.

You can also get edge attributes using `G.get_edge_data(node1, node2)` and **set** attributes using
`nx.set_edge_attributes(G, attributes)` like so:

In [None]:
G.add_edge("Copenhagen", "Aarhus")
attrs = {("Copenhagen", "Aarhus"): {'distance': 186.7}} # key is edge, value is attribute
nx.set_edge_attributes(G, attrs)

print(list(G.edges))
print(G["Copenhagen"]["Aarhus"]['distance'])
print(G.get_edge_data("Copenhagen", "Aarhus"))
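
The different access paths above all lead to the same attribute dictionary. The sketch below, on a separate toy graph `R`, illustrates this, and also shows that in an undirected graph both orientations of an edge share their attributes. The node `"Simon"` is deliberately not connected to anything here.

```python
import networkx as nx

R = nx.Graph()
R.add_edge("Michael", "Natalie", weight=10)

# Three ways to reach the attributes of the same edge
via_adjacency = R["Michael"]["Natalie"]
via_edges = R.edges["Michael", "Natalie"]
via_method = R.get_edge_data("Michael", "Natalie")

# Undirected graph: updating one orientation updates the other
R["Natalie"]["Michael"]["weight"] = 12
print(via_adjacency["weight"], via_edges["weight"], via_method["weight"])  # 12 12 12

# get_edge_data returns a default (None) for a missing edge instead of raising
missing = R.get_edge_data("Michael", "Simon")
print(missing)  # None
```

`R["Michael"]["Simon"]` would raise a `KeyError` in the same situation, so `get_edge_data` is the safer choice when an edge may not exist.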

## Size of the network

- Use `G.number_of_nodes()` or `len(G)` to determine the node count of the network.
- Use `G.number_of_edges()` or `G.size()` to determine the edge count of the network.
- Print a sentence stating the number of nodes and the number of edges (use `.format()` or an f-string)

In [25]:
print(G.number_of_nodes())
print(len(G))

19
19


In [27]:
print(G.number_of_edges())
print(G.size())

4
4


In [30]:
print(f'The Network has {len(G)} nodes and {G.size()} edges.')

The Network has 19 nodes and 4 edges.


## Testing to see whether nodes or edges exist
- Use `G.has_node(n)` to see if `"Michael"` is a node in `G` (node names are case-sensitive):
- Do the same by using `"Michael" in G`

In [14]:
'Michael' in G

True

For edges, use `G.has_edge(node1, node2)`. The expression `edge in G` does not work for edges, because `in G` tests for node membership.
- Check for the existence of an edge in the network, and an edge **not** in the network.

In [33]:
G.has_edge('Simon', 'Michael')

False
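
The sketch below contrasts the different membership tests on a separate toy graph `K`. Besides `has_edge`, it uses the fact that the edge view `K.edges` supports the `in` operator for `(node1, node2)` tuples.

```python
import networkx as nx

K = nx.Graph()
K.add_edge("Michael", "Natalie")

forward = K.has_edge("Michael", "Natalie")
backward = K.has_edge("Natalie", "Michael")       # order is irrelevant in an undirected graph
in_edge_view = ("Michael", "Natalie") in K.edges  # membership test on the edge view
in_graph = ("Michael", "Natalie") in K            # asks whether the tuple is a *node*

print(forward, backward, in_edge_view, in_graph)  # True True True False
```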

## Finding neighbors of a node

In [15]:
list(G.neighbors(1))

[0, 2]

* In `DiGraph` objects, `G.neighbors(node)` gives the successors of `node`, as does `G.successors(node)`  
* Predecessors of `node` can be obtained with `G.predecessors(node)`
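
A minimal sketch of these three methods on a small directed graph `D` (the node names and edges are made up for illustration):

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([("a", "b"), ("c", "b"), ("b", "d")])

successors = list(D.successors("b"))      # nodes that "b" points to
neighbors = list(D.neighbors("b"))        # same as successors in a DiGraph
predecessors = sorted(D.predecessors("b"))  # nodes that point to "b"

print(successors)    # ['d']
print(neighbors)     # ['d']
print(predecessors)  # ['a', 'c']
```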

## Iterating over nodes and edges
Nodes and edges can be iterated over with `G.nodes()` and `G.edges()`, respectively.

In [None]:
for node, data in list(G.nodes(data=True)): # data=True includes node attributes as dictionaries
    print("Node {0}\t\t\t: {1}".format(node, data))

In [None]:
for n1, n2, data in list(G.edges(data=True)):
    print("{0} <----> {1}: {2}".format(n1, n2, data))

- Use a for loop like the one above to only print nodes that are strings
- Use a for loop to only print edges whose nodes are integers

In [None]:
# your code

## Calculating degrees

- Get the degree of a node you like using `G.degree(node)`
- Get a dictionary of the form `node: degree` for your network using `dict(G.degree())`
- Make a list of only degrees (without the corresponding nodes)

In [None]:
# your code

As you know, in directed graphs (of class `DiGraph`) there are two types of degree. Things work just as you would expect:
* `G.in_degree(node)`
* `G.out_degree(node)`
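
As a quick check of how the two degree types relate, here is a sketch on a small made-up directed graph `D`. For a `DiGraph`, `D.degree(node)` is the sum of the in-degree and the out-degree.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([("a", "b"), ("c", "b"), ("b", "d")])

k_in = D.in_degree("b")    # edges pointing to "b"
k_out = D.out_degree("b")  # edges leaving "b"
k_total = D.degree("b")    # in-degree plus out-degree

# Calling the degree methods without a node gives (node, degree) pairs
in_degrees = dict(D.in_degree())

print(k_in, k_out, k_total)  # 2 1 3
print(in_degrees)            # {'a': 0, 'b': 2, 'c': 0, 'd': 1}
```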


### Other operations with `nx`

* ***`subgraph(G, nodes)` or `G.subgraph(nodes)`***
subgraph of `G` induced by the nodes in `nodes`

* ***`reverse(G)`***       
DiGraph with all edges reversed

* ***`union(G1, G2)`***      
graph union    

* ***`disjoint_union(G1, G2)`***     
same, but treats nodes of G1, G2 as different 

* ***`intersection(G1, G2)`***      
graph with only the edges in common between G1, G2

* ***`difference(G1, G2)`***      
graph with only the edges of G1 that aren't in G2

* ***`G.copy()`***  
copy of G

* ***`complement(G)`***  
the complement graph of G

* ***`G.to_undirected()`***  
undirected version of G (a Graph or MultiGraph)

* ***`G.to_directed()`***  
directed version of G (a DiGraph or MultiDiGraph)

* ***`adjacency_matrix(G)`***      
adjacency matrix A of G (in sparse matrix format; to get full matrix, use A.toarray() )
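
A few of these operations can be tried on two small generated graphs. The sketch below assumes a path and a cycle on the same four nodes (`nx.path_graph` and `nx.cycle_graph`), because `intersection` and `difference` are simplest when both graphs share one node set. The helper `edge_set` is a hypothetical convenience for comparing undirected edges regardless of orientation.

```python
import networkx as nx

G1 = nx.path_graph(4)   # edges 0-1, 1-2, 2-3
G2 = nx.cycle_graph(4)  # edges 0-1, 1-2, 2-3, 3-0 (same node set as G1)

def edge_set(graph):
    """Return the edges as a set of frozensets, so (0, 3) and (3, 0) compare equal."""
    return {frozenset(edge) for edge in graph.edges}

sub = G1.subgraph([0, 1, 2])      # induced subgraph on nodes 0, 1, 2
common = nx.intersection(G1, G2)  # edges present in both graphs
only_G2 = nx.difference(G2, G1)   # edges of G2 that are not in G1
comp = nx.complement(G1)          # node pairs that are NOT edges of G1
both = nx.disjoint_union(G1, G2)  # nodes are relabeled so the two graphs do not overlap

print(sorted(sub.edges))                # [(0, 1), (1, 2)]
print(len(common.edges))                # 3
print(edge_set(only_G2))                # {frozenset({0, 3})}
print(len(comp.edges))                  # 3
print(len(both), both.number_of_edges())  # 8 7
```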

# Graph I/O

Usually you will not be building a network from scratch one node/link at a time. Instead, you will
want to read it in from an appropriate data file. NetworkX can understand the following common graph
formats:

* edge lists
* adjacency lists
* GML
* GEXF
* Python 'pickle'
* GraphML
* Pajek
* LEDA
* YAML

# Reading in an edge list

Read in the file `test.txt` with the following options
* lines starting with `#` are treated as comments and ignored  
* use a `Graph` object to hold the data (i.e., network is undirected)  
* data are separated by a single space (`' '`)
* nodes should be treated as integers (`int`)
* encoding of the text file containing the edge list is utf-8

In [None]:
# read in an edge list from the file 'test.txt'
G = nx.read_edgelist('files/test.txt', comments='#',
                     create_using=nx.Graph(), 
                     delimiter=' ', 
                     nodetype=int, 
                     encoding='utf-8')
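
If you do not have `test.txt` at hand, the same call can be tried on a throwaway file. The sketch below writes a tiny, made-up edge list to a temporary directory (the file name `toy_edges.txt` and its contents are assumptions for illustration) and reads it back with the options listed above.

```python
import os
import tempfile

import networkx as nx

# A tiny made-up edge list: one comment line and a triangle
content = "# toy edge list\n1 2\n2 3\n3 1\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_edges.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)

    toy = nx.read_edgelist(path, comments='#',
                           create_using=nx.Graph(),
                           delimiter=' ',
                           nodetype=int,
                           encoding='utf-8')

print(sorted(toy.nodes))      # [1, 2, 3]
print(toy.number_of_edges())  # 3
```

Without `nodetype=int` the nodes would be the strings `'1'`, `'2'` and `'3'`.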

### Allowed formats
* Node pairs with no data  
`1 2`
* Node pairs with a Python dictionary of edge attributes  
`1 2 {'weight':7, 'color':'green'}`
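
Both line formats can be tested without a file by passing the lines directly to `nx.parse_edgelist`, which applies the same parsing rules to a list of strings. The lines below are made up for illustration. Note that the dictionary must be a valid Python literal, so the keys are quoted.

```python
import networkx as nx

lines = [
    "# a comment line",
    "1 2",
    "2 3 {'weight': 7, 'color': 'green'}",
]

parsed = nx.parse_edgelist(lines, comments="#", nodetype=int,
                           create_using=nx.Graph())

print(list(parsed.edges(data=True)))
# [(1, 2, {}), (2, 3, {'weight': 7, 'color': 'green'})]
```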

# Basic analysis example
A large number of basic analyses can be done in one line using NetworkX together with numpy or built-in Python functions like `min`, `max`, etc.

In [None]:
N = len(G)
L = G.size()
degrees = [d for _, d in G.degree()]
kmin = min(degrees)
kmax = max(degrees)

In [None]:
print("Number of nodes: ", N)
print("Number of edges: ", L)
print()
print("Average degree: ", 2*L/N)
print("Average degree (alternate calculation)", np.mean(degrees))
print()
print("Minimum degree: ", kmin)
print("Maximum degree: ", kmax)
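
The same analysis can be tried without any data file on a graph that ships with NetworkX. The sketch below uses Zachary's karate club (`nx.karate_club_graph()`) as a stand-in for your own network, and plain Python instead of numpy for the mean. The two ways of computing the average degree agree because every edge contributes to the degree of exactly two nodes.

```python
import networkx as nx

club = nx.karate_club_graph()  # built-in example network, used here as a stand-in

N = len(club)
L = club.size()
degrees = [d for _, d in club.degree()]

avg_from_counts = 2 * L / N                   # every edge adds 1 to two degrees
avg_from_list = sum(degrees) / len(degrees)   # plain mean of the degree list

print("Number of nodes: ", N)  # 34
print("Number of edges: ", L)  # 78
print("Average degree: ", avg_from_counts)
print("Minimum degree: ", min(degrees))
print("Maximum degree: ", max(degrees))
```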

# The gnutella network

- Read in the provided Gnutella network and perform a basic network analysis like the one above.

In [None]:
# your network analysis code here

# Drawing the network
* NetworkX can draw networks using a large number of layout algorithms
* The results are not as pretty as Gephi's, but NetworkX is better for a quick-and-dirty visualization and
gives you finer-grained control over the layout.

In [None]:
# using the force-based or "spring" layout algorithm
fig = plt.figure(figsize=(8,8))
nx.draw_spring(G, node_size=30)

In [None]:
# using the circular layout algorithm
fig = plt.figure(figsize=(8,8))
nx.draw_circular(G, node_size=30)
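
The finer-grained control mentioned above comes from computing the node positions yourself. As a sketch, assuming a small made-up cycle graph `toy`, the layout functions return a dictionary that maps each node to an (x, y) position. Passing `seed` to `nx.spring_layout` makes the otherwise random result reproducible.

```python
import networkx as nx

toy = nx.cycle_graph(6)  # small example graph, chosen only for illustration

# Each layout function returns {node: array([x, y])}
pos_circle = nx.circular_layout(toy)
pos_spring = nx.spring_layout(toy, seed=42)  # fixed seed: same layout on every run

for node, (x, y) in pos_circle.items():
    print(f"node {node}: x={x:.2f}, y={y:.2f}")
```

A position dictionary like this can then be handed to the drawing function, for example `nx.draw(toy, pos=pos_spring, node_size=30)`, and individual entries can be edited first if you want to move single nodes by hand.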