# Homework 2.1 - Assignment: Graph Visualization
###### Stefano Biguzzi, Ian Costello, Dennis Pong

## Assignment Description
This week's assignment is to:
1. Load a graph database of your choosing from a text file or other source. If you take a large network dataset from the web (such as from https://snap.stanford.edu/data/), please feel free at this point to load just a small subset of the nodes and edges.
2. Create basic analysis on the graph, including the graph’s diameter, and at least one other metric of your choosing. You may either code the functions by hand (to build your intuition and insight), or use functions in an existing package.
3. Use a visualization tool of your choice (Neo4j, Gephi, etc.) to display information.
4. Please record a short video (~ 5 minutes), and submit a link to the video as part of your homework submission.

## Assignment Steps
### Loading graph
First, import the libraries needed for this assignment: `networkx` as `nx` and `network` from `pyvis` as `net`. We also import `pandas` as `pd` and `matplotlib.pyplot` as `plt`.

In [85]:
import networkx as nx
from pyvis import network as net
import pandas as pd
import matplotlib.pyplot as plt

Next, we load the Bitcoin OTC transaction data from GitHub. It contains the user rating data from the [Bitcoin OTC trust weighted signed network](https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html). The original dataset has 5,881 nodes, but was cut down to 500 nodes for the purposes of this assignment; the code below additionally keeps only the first 100 rows (edges). The dataset's website describes the data as follows:  
"_This is who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members in a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research._"

In [86]:
url = 'https://raw.githubusercontent.com/sbiguzzi/data620/main/Assignments/soc-sign-bitcoinotc.csv'
bitcoin_data = pd.read_csv(url,header=0)[:100]

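Next, we instantiate an empty graph with `nx.Graph()`. Because `nx.Graph` is undirected, the direction of each rating is dropped, and reciprocal ratings between the same two users collapse into a single edge.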

In [91]:
# Create an empty undirected graph (the pyvis Network is created later, in the visualization step)
bit_g = nx.Graph()

We then have to add nodes and edges to our graph, `bit_g`. To do this we:  
1. Separate the source node, the target node, and the rating into separate variables

In [92]:
source = bitcoin_data['Source']
target = bitcoin_data['Target']
rating = bitcoin_data['Rating']

2. Zip the data into an iterator of `(source, target, rating)` tuples

In [93]:
edge_data = zip(source, target, rating)

3. Use the function `add_node()` to add each source node in the tuple to the graph
4. Use the function `add_edge()` to connect each source and target node, using the rating as the weight of the edge (`add_edge()` also creates any endpoint that is not yet in the graph)

In [94]:
for src, tgt, rat in edge_data:
    bit_g.add_node(src)                   # add the source node
    bit_g.add_edge(src, tgt, weight=rat)  # connect source and target, rating stored as weight
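
The same graph-building steps can also be expressed with the edge-list loader that ships with `networkx`. The following is a minimal sketch, not the method used in this assignment: the DataFrame `toy` and its values are hypothetical, and only the column names (`Source`, `Target`, `Rating`) are taken from the dataset.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data with the same column names as the Bitcoin OTC file.
toy = pd.DataFrame({
    "Source": [1, 1, 2],
    "Target": [2, 3, 3],
    "Rating": [4, -1, 10],
})

# Undirected graph; each rating is stored under the "weight" edge attribute.
toy_g = nx.from_pandas_edgelist(
    toy.rename(columns={"Rating": "weight"}),
    source="Source",
    target="Target",
    edge_attr="weight",
)

# Passing create_using=nx.DiGraph keeps the direction of each rating instead.
toy_dg = nx.from_pandas_edgelist(
    toy,
    source="Source",
    target="Target",
    edge_attr="Rating",
    create_using=nx.DiGraph,
)

print(toy_g.number_of_nodes(), toy_g.number_of_edges())
print(toy_g[1][2]["weight"])
```

The directed variant may be worth considering here because the ratings are directed (who rated whom), whereas `nx.Graph` treats every edge as symmetric.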

### Graph metrics
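We find the diameter of the graph, that is, the longest of all shortest paths between pairs of nodes, by using the `diameter()` function from the `networkx` library. By default the function counts hops and ignores the edge weights, and it requires the graph to be connected.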

In [95]:
print("The diameter of the graph is:",nx.diameter(bit_g))

The diameter of the graph is: 6
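
To build some intuition for what `nx.diameter()` reports, the diameter can also be computed by hand. The sketch below assumes an unweighted, undirected graph stored as a plain adjacency dictionary; the helper names (`bfs_eccentricity`, `diameter`) and the toy graph are hypothetical and not part of the assignment code.

```python
from collections import deque


def bfs_eccentricity(adj, start):
    """Return the largest shortest-path distance (in hops) from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if len(dist) != len(adj):
        # Some node was never reached, so no finite diameter exists.
        raise ValueError("graph is not connected, so the diameter is undefined")
    return max(dist.values())


def diameter(adj):
    """Diameter = the maximum eccentricity over all nodes."""
    return max(bfs_eccentricity(adj, node) for node in adj)


# Hypothetical path graph 1 - 2 - 3 - 4, stored as an adjacency dict.
toy_adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(diameter(toy_adj))  # → 3
```

Running one breadth-first search per node is fine for a graph of this size; for much larger graphs a library routine is usually the better choice.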


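We compute the degree centrality score of each node to decide which node is the most important based on the number of connections it has. We use the `degree_centrality()` function, which returns a dictionary with the nodes as keys and their degree centrality scores as values. Below we print the scores, formatted as strings, for those nodes numbered 1 to 10 that are present in the graph.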

In [96]:
deg_centrality = nx.degree_centrality(bit_g)
print('Example dictionary: \n',{str(i)+': '+str(deg_centrality[i]) for i in range(1,11) if i in deg_centrality.keys()})

Example dictionary: 
 {'6: 0.1891891891891892', '7: 0.2702702702702703', '5: 0.08108108108108109', '8: 0.08108108108108109', '1: 0.3783783783783784', '4: 0.13513513513513514', '10: 0.21621621621621623', '2: 0.13513513513513514', '3: 0.13513513513513514'}
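
Degree centrality is a node's degree divided by the largest degree it could have, n - 1, where n is the number of nodes. For example, the score 0.378... printed for node 1 is consistent with 14 neighbors in a 38-node graph (14 / 37). The sketch below recomputes the measure by hand on a hypothetical star graph stored as an adjacency dictionary; the function name is made up for illustration and does not replace the `networkx` call.

```python
def degree_centrality_by_hand(adj):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical star graph: node 0 is connected to nodes 1, 2, and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = degree_centrality_by_hand(star)
print(scores[0])  # → 1.0
```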


We then find the node with the highest degree centrality by converting the dictionary's keys and values to lists, locating the index of the maximum value, and looking up the key at that same index. We thought this was the most relevant measure because each edge represents one user rating their experience with another, so a user with more connections has been involved in more ratings and is presumably more trusted, although degree alone does not capture whether those ratings are positive.

In [97]:
deg_centrality_vals = list(deg_centrality.values())
deg_centrality_keys = list(deg_centrality.keys())
print("The node with the highest degree centrality is:",
      deg_centrality_keys[deg_centrality_vals.index(max(deg_centrality.values()))])

The node with the highest degree centrality is: 1
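
A more compact way to get the same result is to pass the dictionary's `get` method as the `key` argument of `max()`. This is a small sketch on a hypothetical dictionary `toy_scores`; like the list-based lookup, it returns the first node that reaches the maximum, so ties are listed separately.

```python
# Hypothetical centrality scores; "b" and "c" are tied for the top score.
toy_scores = {"a": 0.2, "b": 0.5, "c": 0.5, "d": 0.1}

# max() iterates over the keys and compares them by their dictionary value.
top_node = max(toy_scores, key=toy_scores.get)
print(top_node)  # → b

# Collect every node that shares the top score, in case of ties.
top_score = toy_scores[top_node]
tied = [node for node, score in toy_scores.items() if score == top_score]
print(tied)  # → ['b', 'c']
```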


### Visualizing graph
We use the `Network` class from the `pyvis` package to instantiate a new network.

In [98]:
bit_vis_g = net.Network(height='1000px', width='100%', bgcolor='#222222', font_color='white', notebook=True)

We build the node collection from all the source and target nodes in the dataset, using a set to drop duplicates.

In [99]:
nodes = set(pd.concat([source, target]))  # Series.append() was removed in pandas 2.0
bit_vis_g.add_nodes(nodes)

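We recreate `edge_data` with `zip()`, because a zip iterator can only be consumed once and the earlier one was used up by the loop that built `bit_g`. We then convert it to a list and add the edges to the network. For three-element tuples, `add_edges()` treats the third value (here the rating) as the edge width.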

In [100]:
edge_data = zip(source, target, rating)
bit_vis_g.add_edges(list(edge_data))

Finally, we call the `show()` function to render the graph.

In [101]:
bit_vis_g.show("Bitcoin_OTC_Graph.html")