# Exercise goals
In this exercise, we will see how to use Python to build a network graph.

We will start with a demo that builds a network graph from Stack Overflow tag data. You will then use the demo as a guide to build a graph of your own from a small sample dataset.

# Network graph demo
We will use data from Stack Overflow's Developer
Stories to create a network graph. Follow the instructions to download the
data.


## Data
This demo uses Stack Overflow tags. Two data files, one for nodes and one for edges, are provided with this notebook. (You don't need to do anything to get them.)

For more information, see:
https://www.kaggle.com/stackoverflow/stack-overflow-tag-network


## Peek at the data
Developers put different tags in their Developer Stories on
Stack Overflow. Stack Overflow has collected some data about these tags and provided it to
the public for analysis. Note that:
- `nodesize` is proportional to how many people use a specific tag.
- Tags are grouped together based on similarity. Each `group` is calculated with a cluster walktrap.
- `value` is the correlation between the pair of tags, multiplied by 100.
- The greater the `value`, the closer the pair will be in the graph.

## Goal
The goal of the demo is to create a network graph of this data showing the strength of connections between nodes,
how nodes are grouped together, and the size of each node. We have all the data necessary to accomplish this.

# Code documentation
Run the following cell to see docs for the code we will use to position nodes on the graph. Change the line that is not commented out (by `#`) to change the doc that opens when you run the cell.

In [None]:
import networkx as nx
#?nx.Graph
#?nx.Graph.add_node
#?nx.Graph.add_weighted_edges_from
#?nx.draw
?nx.spring_layout
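
The `?` syntax only works inside IPython or Jupyter. As a minimal sketch of an equivalent that also works in a plain Python script, the standard library's `inspect.getdoc` (or the built-in `help`) returns the same documentation text. The variable names `doc` and `first_line` are only for illustration.

```python
import inspect

import networkx as nx

# Fetch the docstring that `?nx.spring_layout` would display.
doc = inspect.getdoc(nx.spring_layout)

# Print only the one-line summary; help(nx.spring_layout) prints the full text.
first_line = doc.splitlines()[0]
print(first_line)
```

Swap `nx.spring_layout` for any of the other functions listed in the cell above to read their documentation the same way.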

# Read in the data
Run the following cell to read in the Stack Overflow data. Because of the large amount of data, this cell could take up to 30 seconds before you see results.

In [None]:
# networkx is a library for making network graphs
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import warnings

%matplotlib inline
# we're turning off some warnings
warnings.filterwarnings('ignore')

# make sure you've moved these files from your Downloads folder, or provide the full file path to them in 
# the Downloads folder
df_nodes = pd.read_csv('stack_network_nodes.csv')
df_edges = pd.read_csv('stack_network_links.csv')
print("Head of Nodes:",'\n', df_nodes.head(), '\n')
print("Head of Edges:",'\n', df_edges.head())
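
Before building the graph, it can help to check that the two tables agree with each other. The sketch below is one possible sanity check, not part of the original demo: `summarize` is a hypothetical helper, and the two small frames are made-up stand-ins with the same column names as the CSV files so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in rows with the same columns as the two CSV files.
sample_nodes = pd.DataFrame(
    [["html", 1, 10.0], ["css", 1, 20.0], ["python", 2, 30.0]],
    columns=["name", "group", "nodesize"],
)
sample_edges = pd.DataFrame(
    [["html", "css", 50.0], ["css", "html", 50.0], ["python", "sql", 20.0]],
    columns=["source", "target", "value"],
)


def summarize(nodes, edges):
    """Count nodes, groups, and edges, and list edge endpoints missing from nodes."""
    known = set(nodes["name"])
    endpoints = set(edges["source"]) | set(edges["target"])
    return {
        "n_nodes": len(nodes),
        "n_groups": nodes["group"].nunique(),
        "n_edges": len(edges),
        "unknown_endpoints": sorted(endpoints - known),
    }


summary = summarize(sample_nodes, sample_edges)
print(summary)
```

Calling `summarize(df_nodes, df_edges)` on the real data would show how many groups need a color and whether every edge refers to a tag that exists in the node table.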

# Create the network graph
Run the following cell to see the network graph.

In [None]:
# initialize the graph
G = nx.Graph(name="Stackoverflow_Tags")

# add the nodes
# iterrows takes each row and returns (index, row), similar to enumerate
for index, row in df_nodes.iterrows():
    G.add_node(row['name'], group=row['group'], nodesize=row['nodesize'])

# add the links between nodes (called edges)
for index, row in df_edges.iterrows():
    G.add_weighted_edges_from([(row['source'], row['target'], row['value'])])

# each of the 14 groups of nodes is assigned a color
# each group number is mapped to a hex color code (groups 2 and 7 share the same code here)
color_map = {1:'#f09494', 2:'#eebcbc', 3:'#72bbd0', 4:'#91f0a1', 5:'#629fff', 6:'#bcc2f2',  
             7:'#eebcbc', 8:'#f1f0c0', 9:'#d2ffe7', 10:'#caf3a6', 11:'#ffdf55', 12:'#ef77aa', 
             13:'#d6dcff', 14:'#d2f5f0'}

# When you draw the graph, this controls the size of the window/picture frame it comes in.
plt.figure(figsize=(25,25))

# maps each node's group to the color assigned in color_map
# G.nodes[node] is the dictionary of attributes stored for that node
# (older networkx versions also accepted G.node, which has since been removed)
# this is a list comprehension. It's equivalent to 
# colors=[]
# for node in G:
#    colors.append(color_map[G.nodes[node]['group']])
colors = [color_map[G.nodes[node]['group']] for node in G]

# a list of nodesizes. Modify 10 to change how big the resulting circles are in the graph
sizes = [G.nodes[node]['nodesize']*10 for node in G]

# command to plot the graph
# G is the networkx graph we've constructed
# pos controls the position of the nodes
# spring_layout uses a particular algorithm to calculate node position; see more via ?nx.spring_layout
# k sets the optimal distance between nodes. Increase it to move the circles farther apart
# with_labels writes each node's name (the tag) inside its circle
nx.draw(G, node_color=colors, node_size=sizes, 
        pos=nx.spring_layout(G, k=0.5, iterations=50), with_labels=True)

# gca stands for "get current axes"
ax = plt.gca()
# sets the outline color of the circles
ax.collections[0].set_edgecolor("#555555")
plt.show()
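
`nx.spring_layout` starts from random positions, so the picture changes each time the cell runs. The sketch below shows one way to make the layout repeatable, assuming a networkx version whose `spring_layout` accepts a `seed` argument. The graph `G_demo` and its weights are made up for illustration and are not the Stack Overflow data.

```python
import networkx as nx
import numpy as np

# A tiny illustrative graph (hypothetical tags and weights, not the real data).
G_demo = nx.Graph()
G_demo.add_weighted_edges_from([
    ("html", "css", 90),
    ("css", "javascript", 80),
    ("python", "javascript", 10),
])

# spring_layout returns a dict that maps each node to an array of [x, y].
# Passing the same seed twice gives the same positions twice.
pos_a = nx.spring_layout(G_demo, k=0.5, iterations=50, seed=42)
pos_b = nx.spring_layout(G_demo, k=0.5, iterations=50, seed=42)

same_layout = all(np.allclose(pos_a[n], pos_b[n]) for n in G_demo)
print(same_layout)
print(pos_a["html"])
```

The resulting dictionary can be passed to `nx.draw` through its `pos` argument, exactly as the demo does with the unseeded call.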

# Network graph exercise
Your task is to use the data provided to create a network graph using the demo as a guide.

Run the cell below to generate sample nodes and edges to use in the network graph.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

nodes = [['python',1,100],
         ['javascript',2,120],
         ['c++',3,75],
         ['bash',4,65]]

edges = [['python','c++',.33],
         ['javascript','c++',.11],
         ['python','bash',.24],
         ['bash','c++', .15],
         ['javascript','c++', .1]]

nodes = pd.DataFrame(nodes, columns=['name','group','nodesize'])
edges = pd.DataFrame(edges, columns=['source','target','value'])
print(nodes.head())
print(edges.head())
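
One detail of the sample data: the pair `javascript` and `c++` appears twice in `edges`, with values .11 and .1. An undirected `nx.Graph` keeps at most one edge per pair of nodes, so adding the pair again updates the existing edge instead of creating a second one. A minimal sketch of that behavior, using a throwaway graph named `G_dup`:

```python
import networkx as nx

G_dup = nx.Graph()
# Add the same pair twice with different weights.
G_dup.add_weighted_edges_from([("javascript", "c++", 0.11)])
G_dup.add_weighted_edges_from([("javascript", "c++", 0.1)])

# Only one edge exists, and it carries the weight that was added last.
edge_count = G_dup.number_of_edges()
final_weight = G_dup["javascript"]["c++"]["weight"]
print(edge_count, final_weight)  # prints: 1 0.1
```

For the exercise, this means the graph built from the five sample rows ends up with four edges.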


# Create your graph
Now let's use that data to create a new graph.

You must:
- Initialize the graph
- Add the nodes to the graph
- Add the edges to the graph

You will need to use the following functions from `networkx`:
- `nx.Graph`
- `nx.Graph.add_node` 
- `nx.Graph.add_weighted_edges_from`

These functions are used in the cell below.

Run the cell below to see the results.

In [None]:
# Network Graph Exercise


#---------------------Your code here------------------------#
# initialize the graph
G = nx.Graph()

# we add each node to the graph
for i, row in nodes.iterrows():
    G.add_node(row['name'], group = row['group'], nodesize=row['nodesize'])
# add each edge/connection to the graph
for i, row in edges.iterrows():
    G.add_weighted_edges_from([(row['source'], row['target'], row['value'])])
#-----------------------------------------------------------#

plt.figure(figsize=(25,25))

# a list of nodesizes. Modify 70 to change how big the resulting circles are in the graph
sizes = [size*70 for size in nodes['nodesize']]
color_map = {1:'#f09494', 2:'#eebcbc', 3:'#72bbd0', 4:'#91f0a1'}
# G.nodes[node] is the dictionary of attributes stored for that node
colors = [color_map[G.nodes[node]['group']] for node in G]
# this plots the graph
# k sets the optimal distance between nodes. Increase it to move the circles farther apart
nx.draw(G, node_color=colors, node_size=sizes, pos=nx.spring_layout(G, k=0.5, iterations=50), 
        with_labels=True)

# gca stands for "get current axes"
ax = plt.gca()
# sets the outline color of the circles
ax.collections[0].set_edgecolor("#555555")
plt.show()
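
The loops over `iterrows` add one node or edge at a time. As an alternative sketch, both `add_nodes_from` and `add_weighted_edges_from` accept a whole iterable, so the rows can be passed in bulk. The names `nodes_df`, `edges_df`, and `G_bulk` are chosen for this example only; the rows are copied from the sample data above so the block runs on its own.

```python
import networkx as nx
import pandas as pd

# Same sample rows as the exercise, repeated so this block is self-contained.
nodes_df = pd.DataFrame(
    [["python", 1, 100], ["javascript", 2, 120], ["c++", 3, 75], ["bash", 4, 65]],
    columns=["name", "group", "nodesize"],
)
edges_df = pd.DataFrame(
    [["python", "c++", 0.33], ["javascript", "c++", 0.11], ["python", "bash", 0.24],
     ["bash", "c++", 0.15], ["javascript", "c++", 0.1]],
    columns=["source", "target", "value"],
)

G_bulk = nx.Graph()

# add_nodes_from accepts (node, attribute_dict) pairs.
G_bulk.add_nodes_from(
    (name, {"group": group, "nodesize": nodesize})
    for name, group, nodesize in nodes_df.itertuples(index=False, name=None)
)

# add_weighted_edges_from accepts (source, target, weight) triples.
G_bulk.add_weighted_edges_from(edges_df.itertuples(index=False, name=None))

print(G_bulk.number_of_nodes(), G_bulk.number_of_edges())
print(G_bulk.nodes["python"])
```

The resulting graph should match the one built with the loops, and it can be drawn with the same `nx.draw` call.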