# Network Analysis Lab

Complete the following exercises to help solidify your understanding of network analysis.

In [51]:
import networkx as nx
import nxviz
import community
import pandas as pd
import matplotlib.pyplot as plt

## U.S. Men's Basketball Data Set

In the `us_mens_basketball.csv` data set, each row represents a single basketball player's participation in a single event at a single Olympics.

In [25]:
basketball = pd.read_csv('./data/us_mens_basketball.csv')

In [26]:
basketball.head(15)

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,351,Julius Shareef Abdur-Rahim,M,23.0,202.0,104.0,United States,USA,2000 Summer,2000,Summer,Sydney,Basketball,Basketball Men's Basketball,Gold
1,2636,"Stephen Todd ""Steve"" Alford",M,19.0,185.0,74.0,United States,USA,1984 Summer,1984,Summer,Los Angeles,Basketball,Basketball Men's Basketball,Gold
2,2863,Walter Ray Allen,M,25.0,192.0,93.0,United States,USA,2000 Summer,2000,Summer,Sydney,Basketball,Basketball Men's Basketball,Gold
3,3874,"William Lloyd ""Willie"" Anderson, Jr.",M,21.0,200.0,86.0,United States,USA,1988 Summer,1988,Summer,Seoul,Basketball,Basketball Men's Basketball,Bronze
4,4505,Carmelo Kyan Anthony,M,20.0,203.0,109.0,United States,USA,2004 Summer,2004,Summer,Athina,Basketball,Basketball Men's Basketball,Bronze
5,4505,Carmelo Kyan Anthony,M,24.0,203.0,109.0,United States,USA,2008 Summer,2008,Summer,Beijing,Basketball,Basketball Men's Basketball,Gold
6,4505,Carmelo Kyan Anthony,M,28.0,203.0,109.0,United States,USA,2012 Summer,2012,Summer,London,Basketball,Basketball Men's Basketball,Gold
7,4505,Carmelo Kyan Anthony,M,32.0,203.0,109.0,United States,USA,2016 Summer,2016,Summer,Rio de Janeiro,Basketball,Basketball Men's Basketball,Gold
8,5173,"Michel Taylor ""Tate"" Armstrong",M,20.0,190.0,77.0,United States,USA,1976 Summer,1976,Summer,Montreal,Basketball,Basketball Men's Basketball,Gold
9,5212,Jay Joseph Hoyland Arnette,M,21.0,188.0,79.0,United States,USA,1960 Summer,1960,Summer,Roma,Basketball,Basketball Men's Basketball,Gold


## 1. Transform this data set into one that can be turned into a graph where the entities are represented by the Name field and the relationships are represented by whether the players played in the same Olympics together (Games field).

Sort descending by the number of pairwise interactions. Which pair of players have competed in the most Olympics together?
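
As a rough illustration of what "pairwise interactions" means here, the sketch below counts shared Olympics for every pair of players in plain Python. The records are hypothetical toy data (the names `Player A`, `Player B`, `Player C` are not from the data set), and each pair is counted once in alphabetical order.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: one (player, games) pair per appearance.
records = [
    ("Player A", "2008 Summer"),
    ("Player B", "2008 Summer"),
    ("Player C", "2008 Summer"),
    ("Player A", "2012 Summer"),
    ("Player B", "2012 Summer"),
]

# Group the players by the Games they appeared in.
rosters = defaultdict(set)
for player, games in records:
    rosters[games].add(player)

# Every unordered pair on the same roster shares one Olympics.
pair_counts = Counter()
for roster in rosters.values():
    for pair in combinations(sorted(roster), 2):
        pair_counts[pair] += 1

# Sort descending by the number of shared Olympics.
ranked = pair_counts.most_common()
print(ranked[0])  # (('Player A', 'Player B'), 2)
```

The same idea applies to the real data: the pair at the top of the ranking is the one that has competed in the most Olympics together.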

In [27]:
# Create the graph
g = nx.Graph()

In [28]:
# Add nodes: each name (row) has to become a node, which we do with a for loop

# To do that, first extract the Name column
names = basketball.Name
names

0                Julius Shareef Abdur-Rahim
1               Stephen Todd "Steve" Alford
2                          Walter Ray Allen
3      William Lloyd "Willie" Anderson, Jr.
4                      Carmelo Kyan Anthony
                       ...                 
217                  Deron Michael Williams
218                  Deron Michael Williams
219            Howard Earl "Howie" Williams
220                    George "Jiff" Wilson
221                     Osie Leon Wood, III
Name: Name, Length: 222, dtype: object

In [29]:
# Collect the unique player names to use as nodes
nodes = list(names.unique())

In [30]:
# Add each name to the graph; add_node is a method of the graph g, not of a list
for name in nodes:
    g.add_node(name)

In [31]:
# A for loop can add the nodes, but it does not create any edges between them.
# The edges come from a helper function in the README notes, used in the next cell.

In [32]:
# Start from the function in the notes:

# entity: name of the column that will become the nodes
# edge: name of the column that will act as the connection

def df_to_graph(df, entity, edge):

    # Self-join the data frame on the edge column so every pair of rows sharing an edge value is matched
    df2 = df.copy()
    graph_df = pd.merge(df, df2, how='inner', on=edge)

    # Count how many times each (entity_x, entity_y) pair occurs
    graph_df = graph_df.groupby([entity + '_x', entity + '_y']).count().reset_index()

    # Drop self-loops (an entity paired with itself)
    graph_df = graph_df[graph_df[entity + '_x'] != graph_df[entity + '_y']]

    # Keep only the two node columns and the edge count column(s)
    if isinstance(edge, list):
        graph_df = graph_df[[entity + '_x', entity + '_y'] + edge]
    else:
        graph_df = graph_df[[entity + '_x', entity + '_y', edge]]

    return graph_df

In [33]:
# Build the edge list: two node columns plus Games, the number of Olympics each pair shared
# (each pair appears twice, once per direction)

data = df_to_graph(basketball, 'Name', 'Games')
data.sort_values(['Games'], ascending=False).head(10)

Unnamed: 0,Name_x,Name_y,Games
1557,LeBron Raymone James,Carmelo Kyan Anthony,3
282,Carmelo Kyan Anthony,LeBron Raymone James,3
1347,Karl Malone,Charles Wade Barkley,2
1487,Kobe Bean Bryant,LeBron Raymone James,2
429,"Christopher Paul ""Chris"" Mullin",Michael Jeffrey Jordan,2
617,Deron Michael Williams,Kobe Bean Bryant,2
618,Deron Michael Williams,LeBron Raymone James,2
1206,John Houston Stockton,Charles Wade Barkley,2
244,"Carlos Austin Boozer, Jr.","Dwyane Tyrone Wade, Jr.",2
1744,"Mitchell James ""Mitch"" Richmond, III",David Maurice Robinson,2


## 2. Use the `from_pandas_edgelist` method to turn the data frame into a graph.

In [34]:
# Create the graph from the data frame

G = nx.from_pandas_edgelist(data, source='Name_x', target='Name_y', edge_attr=True)

In [35]:
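# I don't understand why it won't let me plot it.
# The ImportError below comes from matplotlib itself, so it looks like an environment or version problem rather than a problem with the graph.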

nx.draw(G)

ImportError: cannot import name '_png' from 'matplotlib' (/Users/blancurri/opt/anaconda3/lib/python3.7/site-packages/matplotlib/__init__.py)

<Figure size 432x288 with 1 Axes>

## 3. Compute and print the following graph statistics for the graph:

- Number of nodes
- Number of edges
- Average degree
- Density

In [36]:
# Number of nodes:
G.number_of_nodes()

196

In [38]:
# Number of edges:
G.number_of_edges()

1232

In [41]:
# Degree of each node. The average degree is 2 * edges / nodes = 2 * 1232 / 196, about 12.57
G.degree()

DegreeView({'Adrian Delano Dantley': 11, 'Ernest "Ernie" Grunfeld': 11, 'Kenneth Alan "Kenny" Carr': 11, 'Michel Taylor "Tate" Armstrong': 11, 'Mitchell William "Mitch" Kupchak': 11, 'Philip Jackson "Phil" Ford, Jr.': 11, 'Phillip Gregory "Phil" Hubbard': 11, 'Scott Glenn May': 11, 'Steven Bernard "Steve" Sheppard': 11, 'Thomas Joseph "Tom" LaGarde': 11, 'Walter Paul Davis': 11, 'William Quinn Buckner': 11, 'Adrian Howard Smith': 11, 'Burdette Eliele "Burdie" Haldorson': 22, 'Darrall Tucker Imhoff': 11, 'Earl Allen Kelley': 11, 'Jay Joseph Hoyland Arnette': 11, 'Jerome Alan "Jerry" West': 11, 'Jerry Ray Lucas': 11, 'Lester Everett "Les" Lane': 11, 'Oscar Palmer Robertson': 11, 'Robert Lewis "Bob" Boozer': 11, 'Terence Gilbert "Terry" Dischinger': 11, 'Walter Jones "Walt" Bellamy, Jr.': 11, 'Alexander John "Alex" Groza': 13, 'Clifford Eugene "Cliff" Barker': 13, 'Donald Argee "Don" Barksdale': 13, 'Gordon C. Carpenter': 13, 'Jesse Banard Renick': 13, 'Kenneth Herman "Kenny" Rollins': 13

In [40]:
# Graph density:
nx.density(G)

0.06446886446886448
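
The node and edge counts reported above are enough to work out the average degree and to cross-check the density by hand. This is a minimal sketch of that arithmetic for an undirected graph; the variable names are illustrative.

```python
# Counts taken from the outputs above.
n_nodes = 196
n_edges = 1232

# Each edge contributes to the degree of two nodes.
average_degree = 2 * n_edges / n_nodes

# Density: actual edges divided by the n * (n - 1) / 2 possible edges.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(round(average_degree, 2))  # 12.57
print(round(density, 4))         # 0.0645
```

With the graph object at hand, the same average can be computed as `sum(d for _, d in G.degree()) / G.number_of_nodes()`.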

## 4. Compute betweenness centrality for the graph and print the top 5 nodes with the highest centrality.

In [46]:
# 'edge' is not an edge attribute of G (the only one is 'Games'), so no weight is passed
nx.betweenness_centrality(G)

{'Adrian Delano Dantley': 0.0,
 'Ernest "Ernie" Grunfeld': 0.0,
 'Kenneth Alan "Kenny" Carr': 0.0,
 'Michel Taylor "Tate" Armstrong': 0.0,
 'Mitchell William "Mitch" Kupchak': 0.0,
 'Philip Jackson "Phil" Ford, Jr.': 0.0,
 'Phillip Gregory "Phil" Hubbard': 0.0,
 'Scott Glenn May': 0.0,
 'Steven Bernard "Steve" Sheppard': 0.0,
 'Thomas Joseph "Tom" LaGarde': 0.0,
 'Walter Paul Davis': 0.0,
 'William Quinn Buckner': 0.0,
 'Adrian Howard Smith': 0.0,
 'Burdette Eliele "Burdie" Haldorson': 0.021517314300819455,
 'Darrall Tucker Imhoff': 0.0,
 'Earl Allen Kelley': 0.0,
 'Jay Joseph Hoyland Arnette': 0.0,
 'Jerome Alan "Jerry" West': 0.0,
 'Jerry Ray Lucas': 0.0,
 'Lester Everett "Les" Lane': 0.0,
 'Oscar Palmer Robertson': 0.0,
 'Robert Lewis "Bob" Boozer': 0.0,
 'Terence Gilbert "Terry" Dischinger': 0.0,
 'Walter Jones "Walt" Bellamy, Jr.': 0.0,
 'Alexander John "Alex" Groza': 0.0,
 'Clifford Eugene "Cliff" Barker': 0.0,
 'Donald Argee "Don" Barksdale': 0.0,
 'Gordon C. Carpenter': 0.0,
 '

In [45]:
# I can't figure out how to pick the top five, and believe me, I tried ;)

KeyError: (0, 1)

In [47]:
# I don't understand how Lola managed to work out or write this code... I don't understand any of it...

# Obtain the betweenness centrality
bc = nx.betweenness_centrality(G)

# Sort the values
bc_sorted = {k: v for k, v in sorted(bc.items(), key=lambda item: item[1], reverse=True)}

# Obtain the top 5
list(bc_sorted.items())[:5]

[('Gary Dwayne Payton', 0.09193761564895586),
 ('Jason Frederick Kidd', 0.09135606661379858),
 ('Carmelo Kyan Anthony', 0.04742268041237115),
 ('David Maurice Robinson', 0.03266190853819722),
 ('William Marion "Bill" Hougland', 0.030240549828178694)]
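
The sort-and-slice step above can be unpacked on a small dictionary. This sketch uses hypothetical scores in place of a real centrality result; `heapq.nlargest` is shown as one alternative that avoids sorting the whole dictionary.

```python
import heapq

# Hypothetical scores standing in for the dictionary a centrality function returns.
scores = {"a": 0.10, "b": 0.40, "c": 0.05, "d": 0.30, "e": 0.20, "f": 0.00}

# .items() yields (name, score) pairs; key picks the score (item[1]),
# reverse=True puts the largest first, and [:5] keeps the first five.
top_sorted = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:5]

# heapq.nlargest returns the same five pairs without a full sort.
top_heap = heapq.nlargest(5, scores.items(), key=lambda item: item[1])

print(top_sorted)
# [('b', 0.4), ('d', 0.3), ('e', 0.2), ('a', 0.1), ('c', 0.05)]
```

Rebuilding a sorted dictionary first, as in the cell above, gives the same top five; slicing the sorted list directly just skips one step.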

## 5. Compute Eigenvector centrality for the graph and print the top 5 nodes with the highest centrality.

In [48]:
ec = nx.eigenvector_centrality_numpy(G)

# Sort the values
ec_sorted = {k: v for k, v in sorted(ec.items(), key=lambda item: item[1], reverse=True)}

# Obtain the top 5
list(ec_sorted.items())[:5]

[('Carmelo Kyan Anthony', 0.34185005667190693),
 ('LeBron Raymone James', 0.2884535214315888),
 ('Christopher Emmanuel "Chris" Paul', 0.22431681558531258),
 ('Kobe Bean Bryant', 0.2243168155853125),
 ('Deron Michael Williams', 0.22431681558531244)]
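
To give some intuition for these scores, here is a sketch of the power-iteration idea behind eigenvector centrality on a hypothetical four-node graph: a node scores highly when its neighbours score highly. It is an illustration under those assumptions, not a reproduction of what `nx.eigenvector_centrality_numpy` does internally.

```python
import math

# Hypothetical toy graph as an adjacency dict: a triangle A-B-C with D attached to A.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def eigenvector_centrality(adj, iterations=200):
    """Power iteration: each score becomes itself plus the sum of its neighbours' scores."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Adding the node's own score keeps the iteration from oscillating.
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adj[node])
            for node in adj
        }
        # Rescale so the scores have unit Euclidean length.
        norm = math.sqrt(sum(v * v for v in updated.values()))
        scores = {node: v / norm for node, v in updated.items()}
    return scores

ec_toy = eigenvector_centrality(adjacency)
print(max(ec_toy, key=ec_toy.get))  # A
```

In the toy graph, `D` has only one neighbour but that neighbour is the best-connected node, so it still receives a non-trivial score.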

## 6. Compute degree centrality for the graph and print the top 5 nodes with the highest centrality.

In [49]:
dc = nx.degree_centrality(G)

# Sort the values
dc_sorted = {k: v for k, v in sorted(dc.items(), key=lambda item: item[1], reverse=True)}

# Obtain the top 5
list(dc_sorted.items())[:5]

[('Carmelo Kyan Anthony', 0.18461538461538463),
 ('David Maurice Robinson', 0.14358974358974358),
 ('Robert Albert "Bob" Kurland', 0.13333333333333333),
 ('LeBron Raymone James', 0.13333333333333333),
 ('William Marion "Bill" Hougland', 0.12307692307692308)]
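
Degree centrality is the simplest of the three measures: the number of neighbours a node has, divided by the number of other nodes it could be connected to. The sketch below applies that definition to a hypothetical toy graph.

```python
# Hypothetical toy graph as an adjacency dict: a triangle A-B-C with D attached to A.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree_centrality(adj):
    """Fraction of the other n - 1 nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}

dc_toy = degree_centrality(adjacency)
print(dc_toy["A"])  # 1.0, because A touches all three other nodes
```

Read the other way round, the top score above of about 0.1846 in a graph of 196 nodes corresponds to 36 distinct teammates, since 36 / 195 is roughly 0.1846.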

## 7. Generate a network visualization for the entire graph using a Kamada-Kawai force-directed layout.

# From this point on I couldn't get the network plots to render, but I'm keeping reference code that should work for when I have time to look into it :(

In [52]:
plt.figure(figsize=(14,8))

nx.draw_kamada_kawai(G, node_size=80, alpha=0.75, width=0.75)

ImportError: cannot import name '_png' from 'matplotlib' (/Users/blancurri/opt/anaconda3/lib/python3.7/site-packages/matplotlib/__init__.py)

<Figure size 1008x576 with 1 Axes>

## 8. Create and visualize an ego graph for the player with the highest betweenness centrality.
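
A radius-1 ego graph contains one central node, its direct neighbours, and the edges among those nodes. The following sketch builds one by hand from a hypothetical adjacency dict, only to show what the result contains.

```python
# Hypothetical toy graph as an adjacency dict.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def ego_network(adj, center):
    """Radius-1 ego network: the centre, its neighbours, and the edges among them."""
    members = {center} | adj[center]
    # Keep only the edges whose two endpoints are both in the ego network.
    return {node: adj[node] & members for node in members}

ego_toy = ego_network(adjacency, "A")
print(sorted(ego_toy))  # ['A', 'B', 'C']
```

Node `D` is two steps away from `A`, so it is left out, along with the edge between `C` and `D`.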

In [53]:
plt.figure(figsize=(14,8))

ego = nx.ego_graph(G, 'Gary Dwayne Payton', radius=1)

# Draw the graph
nx.draw_kamada_kawai(ego, node_size=80, with_labels=True, alpha=0.8)

ImportError: cannot import name '_png' from 'matplotlib' (/Users/blancurri/opt/anaconda3/lib/python3.7/site-packages/matplotlib/__init__.py)

<Figure size 1008x576 with 1 Axes>

## 9. Identify the communities within the entire graph and produce another visualization of it with the nodes color-coded by the community they belong to.

In [55]:
# Create a dictionary mapping each node to the community it has been grouped into.
# The Louvain implementation is the `community` module imported at the top of the notebook.
partition = community.best_partition(G)

# Extract the community of each node, in node order, to pass as node_color
values = [partition[node] for node in G.nodes()]

In [56]:
plt.figure(figsize=(14,8))

# Draw the graph with the nodes colored by community
nx.draw_kamada_kawai(G, node_size=100, alpha=0.8, width=0.5, node_color=values, cmap=plt.cm.jet)

## Bonus: Hierarchical Graphs

Thus far, we have analyzed graphs where the nodes represented individual players and the edges represented Olympic games that they have competed in together. We can also analyze the data at a higher level, stripping out the players as entities and analyzing the data at the Games level. To do this, we would need to reconstruct the graph so that the *Games* field represents the entities and then use the player names as the edge criteria, so that there is an edge between two Olympic games if a player played in both of them. You already have the tools in your toolbox to be able to do this, so give it a try.

### Create a graph with Games as the entities and then print out the graph statistics.
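
One possible way to approach this, sketched in plain Python on hypothetical toy records: swap the roles, so that Games become the nodes and a shared player creates an edge. The names and appearances below are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: one (player, games) pair per appearance.
records = [
    ("Player A", "2004 Summer"),
    ("Player A", "2008 Summer"),
    ("Player A", "2012 Summer"),
    ("Player B", "2008 Summer"),
    ("Player B", "2012 Summer"),
    ("Player C", "2000 Summer"),
]

# Group the Games by the player who appeared in them.
games_by_player = defaultdict(set)
for player, games in records:
    games_by_player[player].add(games)

# Two Games are linked once for every player who appeared in both.
edge_weights = Counter()
for games in games_by_player.values():
    for pair in combinations(sorted(games), 2):
        edge_weights[pair] += 1

# Graph statistics for the Games-level graph.
n_nodes = len({games for _, games in records})
n_edges = len(edge_weights)
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(n_nodes, n_edges, density)  # 4 3 0.5
```

With the notebook's own helper, the equivalent would presumably be `df_to_graph(basketball, 'Games', 'Name')`, passed to `nx.from_pandas_edgelist` with `source='Games_x'` and `target='Games_y'`.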

### Generate a network visualization of this graph using the layout of your choice.