## Graph Analysis of the "Marvel Social Network" Dataset

This notebook analyzes the co-occurrence of Marvel heroes in comics. Each row of the data describes one instance in which *hero1* (column 1) appears in the same comic as *hero2* (column 2).

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import networkx as nx
from IPython.display import display
from PIL import Image
from tqdm import tqdm
import chart_studio.plotly
import plotly.graph_objs as go
from chart_studio.plotly import plot, iplot
import igraph as ig

In [None]:
edges = pd.read_csv('./data/edges.csv') 
hero_network = pd.read_csv('./data/hero-network.csv') 
nodes = pd.read_csv('./data/nodes.csv') 
#print(hero_network.head())

In [None]:
hero_graph = nx.from_pandas_edgelist(hero_network, 
                            source='hero1',
                            target='hero2',
                            )
#print(hero_graph)  # nx.info() was removed in NetworkX 3.0; printing the graph reports its node and edge counts

**Given this data, the resulting graph has a total of 6426 nodes and 167219 edges.**
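
The edge count reported above refers to distinct hero pairs: an undirected `nx.Graph` stores each pair once, so rows that repeat a pair (in either order) collapse into a single edge. The following is a minimal sketch on a made-up toy table (the hero names `A`, `B`, `C` and the `weight` attribute are illustrative assumptions, not part of the dataset) showing that collapse and one possible way to keep the number of repeats as an edge weight.

```python
from collections import Counter

import networkx as nx
import pandas as pd

# Toy stand-in for hero-network.csv: the pair A-B appears twice, once reversed.
toy = pd.DataFrame({
    "hero1": ["A", "B", "B", "A"],
    "hero2": ["B", "A", "C", "C"],
})

# Same call as in the notebook: repeated pairs become one undirected edge.
simple = nx.from_pandas_edgelist(toy, source="hero1", target="hero2")
print(simple.number_of_nodes(), simple.number_of_edges())

# Count how often each unordered pair occurs, then store the count as a weight.
pair_counts = Counter(
    tuple(sorted(pair)) for pair in zip(toy["hero1"], toy["hero2"])
)
weighted = nx.Graph()
weighted.add_weighted_edges_from((a, b, n) for (a, b), n in pair_counts.items())
print(weighted["A"]["B"]["weight"])
```

With weights available, `weighted.degree(weight="weight")` would rank heroes by total co-appearances rather than by the number of distinct partners.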

In [None]:
## Graph of total data network
nx.draw_networkx(hero_graph)

![alt text](Figure_1.png "Graph Obtained")

**Next, the importance of each node in the network is measured by its degree centrality, that is, by how many neighbors it has.**

In [None]:
## Calculating the degree centrality
degree_centrality_heros = nx.degree_centrality(hero_graph)

## Sorting the heroes by degree centrality (descending) and printing the top 10
sorted_degree_centrality_heros = sorted(degree_centrality_heros.items(),  key=lambda x: -x[1])
print(sorted_degree_centrality_heros[0:10])
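
`nx.degree_centrality` does not return raw neighbor counts: it divides each node's degree by `n - 1`, the largest degree a node could have in a simple graph with `n` nodes. A small sketch on a toy star graph (not the Marvel data) that reproduces the values by hand:

```python
import networkx as nx

# Star graph: node 0 is linked to nodes 1..4, which have no other links.
star = nx.star_graph(4)

centrality = nx.degree_centrality(star)

# Manual computation: degree divided by the maximum possible degree (n - 1).
n = star.number_of_nodes()
manual = {node: degree / (n - 1) for node, degree in star.degree()}

# Same ranking step as in the notebook, written with reverse=True.
top = sorted(centrality.items(), key=lambda item: item[1], reverse=True)[:3]
print(top)
```

The hub reaches every other node, so its centrality is 1.0, while each leaf has a single neighbor out of four possible ones and gets 0.25.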


In [None]:
## Number of nodes
node_numbers = hero_graph.number_of_nodes()
## Number of edges
edge_list = hero_graph.number_of_edges()
## Graph objects
edges_name = list(hero_graph.edges())  # edges as (hero name, hero name) pairs

# Relabel the nodes as integers 0..n-1 (same order as hero_graph.nodes()) for igraph
nodes_renamed_by_numbers = nx.convert_node_labels_to_integers(hero_graph)
edges_renamed_by_numbers = list(nodes_renamed_by_numbers.edges())  # edges between the integer-labelled nodes

## igraph graph built from the integer-labelled edges
graph_numbered = ig.Graph(edges_renamed_by_numbers, directed = False)

## 3D node positions from the Kamada-Kawai ('kk') layout
layout = graph_numbered.layout('kk',dim = 3)

## x,y,z Position of nodes
X_nodes=[layout[i][0] for i in range(node_numbers)]# x-coordinates of the enumerated nodes
Y_nodes=[layout[i][1] for i in range(node_numbers)]# y-coordinates
Z_nodes=[layout[i][2] for i in range(node_numbers)]# z-coordinates

X_edges=[]
Y_edges=[]
Z_edges=[]

## Grouping Coordinates
for edge in edges_renamed_by_numbers:
    X_edges+=[layout[edge[0]][0],layout[edge[1]][0], None]# x-coordinates of edge ends
    Y_edges+=[layout[edge[0]][1],layout[edge[1]][1], None]
    Z_edges+=[layout[edge[0]][2],layout[edge[1]][2], None]
    
## Node names and color groups (one entry per node, in the same order as the integer labels)
labels = list(hero_graph.nodes())

# Arbitrary color bands by node index, trimmed to the number of nodes
group = []
group.extend(np.repeat(1,2000))
group.extend(np.repeat(2,2000))
group.extend(np.repeat(3,3000))
group.extend(np.repeat(4,1000))
group.extend(np.repeat(5,2000))
group = group[:node_numbers]

trace1=go.Scatter3d(x=X_edges,y=Y_edges,z=Z_edges,mode='lines',
                    line=dict(color='rgb(125,125,125)', width=1),hoverinfo='none')

trace2=go.Scatter3d(x=X_nodes,y=Y_nodes,z=Z_nodes,mode='markers',name='Superhero',
                    marker=dict(symbol='circle',size=4,color=group,colorscale='Viridis',line=dict(color='rgb(50,50,50)', width=0.5)),
                    text=labels,hoverinfo='text')

axis=dict(showbackground=False,showline=False,zeroline=False,
          showgrid=False,showticklabels=False,title='')
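
The edge trace relies on the Plotly convention that a `None` in the coordinate lists breaks the line, which lets every edge be drawn in a single `Scatter3d` trace instead of one trace per edge. A minimal sketch of that flattening step, using a hypothetical helper `edge_segments` and made-up coordinates in place of the igraph layout:

```python
def edge_segments(coords, edges, axis):
    """Flatten edges into [start, end, None, start, end, None, ...] along one axis."""
    values = []
    for start, end in edges:
        # The trailing None tells Plotly to lift the pen before the next edge.
        values += [coords[start][axis], coords[end][axis], None]
    return values


# Made-up 3D positions for three nodes and two edges between them.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, 1.0)]
edges = [(0, 1), (1, 2)]

x_edges = edge_segments(coords, edges, axis=0)
y_edges = edge_segments(coords, edges, axis=1)
z_edges = edge_segments(coords, edges, axis=2)
print(x_edges)  # [0.0, 1.0, None, 1.0, 0.0, None]
```

Each edge contributes three entries per axis, so the lists passed to the edge trace have length `3 * number_of_edges`.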
    



In [None]:
# layout = go.Layout(
#          title="Graph Hero Network Marvel",
#     autosize=True,
#          width=1000,
#          height=1000,
#          showlegend=False,
#          scene=dict(
#              xaxis=dict(axis),
#              yaxis=dict(axis),
#              zaxis=dict(axis),
#         ),
#      margin=dict(
#         t=100
#     ),
#     hovermode='closest',
#     annotations=[
#            dict(
#            showarrow=False,
#             text="Graph Marvel Network",
#             xref='paper',
#             yref='paper',
#             x=0,
#             y=0.1,
#             xanchor='left',
#             yanchor='bottom',
#             font=dict(
#             size=14
#             )
#             )
#         ],    )

# data=[trace1, trace2]
# fig=go.Figure(data=data, layout=layout)
# #fig.show()

## Conclusions

 - The graph has a total of 6426 nodes and 167219 edges.
 - The characters with the highest degree centrality (those who share comics with the most other characters) are:
     1. Captain America
     2. Spiderman/Peter Parker
     3. Iron Man/Tony Stark
     4. Thing/Benjamin J. GR
     5. Mr. Fantastic/Reed R.
     6. Wolverine/Logan
     7. Human Torch/Johnny S.
     8. Scarlet Witch/Wanda
     9. Thor/Dr. Donald Blake
     10. Beast/Henry P.
 - Although most nodes are not direct neighbors of one another, most of them can be reached from any source node in a relatively small number of hops, so the resulting graph shows "small-world" behavior.
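
The small-world observation can be checked numerically. A minimal sketch, assuming the usual two indicators (a short average shortest-path length together with a high average clustering coefficient) computed on the largest connected component. The helper name `small_world_summary` is hypothetical, and the karate club graph bundled with NetworkX stands in for `hero_graph`, since exact shortest paths over all pairs may be slow on a graph of this size.

```python
import networkx as nx


def small_world_summary(graph):
    """Path-length and clustering indicators on the largest connected component."""
    largest = max(nx.connected_components(graph), key=len)
    core = graph.subgraph(largest)
    return {
        "nodes_in_largest_component": core.number_of_nodes(),
        # Mean number of hops between all pairs of nodes (unweighted).
        "average_shortest_path": nx.average_shortest_path_length(core),
        # Mean fraction of a node's neighbors that are also linked to each other.
        "average_clustering": nx.average_clustering(core),
    }


# Stand-in graph; replace with hero_graph to run the check on the Marvel network.
toy = nx.karate_club_graph()
summary = small_world_summary(toy)
print(summary)
```

A short average path combined with clustering well above that of a random graph with the same number of nodes and edges would support the small-world reading.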