# Working with External Data

## Loading CSV files
We use the Marvel dataset, "The Marvel Universe Social Network" (https://www.kaggle.com/csanhueza/the-marvel-universe-social-network).

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import networkx as nx
import time

In [None]:
h = pd.read_csv('../data/hero-network.csv')

We inspect the DataFrame's summary information.

In [None]:
h.info()

We convert the DataFrame into a directed graph.

In [None]:
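G = nx.from_pandas_edgelist(h, source="hero1", target="hero2", create_using=nx.DiGraph())
# nx.info() was removed in NetworkX 3.0; report the node and edge counts directly
print(f"Nodes: {G.number_of_nodes()}, edges: {G.number_of_edges()}")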
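To make the conversion step concrete, here is a minimal sketch on a toy edge list. The DataFrame `toy` and its values are hypothetical; only the column names `hero1` and `hero2` come from the dataset. It illustrates that a `DiGraph` keeps at most one edge per ordered pair, so if the CSV repeats a pair, the graph will have fewer edges than the DataFrame has rows.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy edge list using the same column names as hero-network.csv
toy = pd.DataFrame({
    "hero1": ["A", "A", "B", "A"],
    "hero2": ["B", "C", "C", "B"],  # the A -> B row appears twice
})

toy_graph = nx.from_pandas_edgelist(
    toy, source="hero1", target="hero2", create_using=nx.DiGraph()
)

# The repeated A -> B row collapses into a single directed edge
print("rows:", len(toy), "edges:", toy_graph.number_of_edges())

# Direction matters: A -> B exists, B -> A does not
print(toy_graph.has_edge("A", "B"), toy_graph.has_edge("B", "A"))
```

If repeated rows should count, one option would be `nx.MultiDiGraph()` as `create_using`, at the cost of some algorithms not supporting multigraphs.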

Define the function get_top_nodes, which returns the entries with the highest values in a dictionary.

In [None]:
def get_top_nodes(cdict, num=5):
    # Sort the (node, value) pairs by value, descending, and keep the first `num`
    return dict(sorted(cdict.items(), key=lambda x: x[1], reverse=True)[:num])
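As a possible alternative for large dictionaries, the same "top N by value" selection can be sketched with `heapq.nlargest`, which avoids sorting every entry. The function name `top_nodes_heap` and the `scores` dictionary are hypothetical, introduced only for this illustration.

```python
import heapq

def top_nodes_heap(cdict, num=5):
    """Return the `num` highest-valued entries of `cdict` as a dict."""
    # nlargest returns the pairs already ordered from highest to lowest value
    return dict(heapq.nlargest(num, cdict.items(), key=lambda item: item[1]))

# Hypothetical scores, just to show the call
scores = {"A": 3, "B": 10, "C": 7, "D": 1}
print(top_nodes_heap(scores, num=2))  # {'B': 10, 'C': 7}
```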

#### Degree

Store the degree of each node in a dictionary.

In [None]:
gdeg = dict(G.degree())

In [None]:
get_top_nodes(dict(gdeg))

In [None]:
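# nx.info() was removed in NetworkX 3.0; print the node's degrees directly
node = 'CAPTAIN AMERICA'
print(f"{node}: degree={G.degree(node)}, in-degree={G.in_degree(node)}, out-degree={G.out_degree(node)}")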

#### In-Degree

In [None]:
indeg = G.in_degree()
get_top_nodes(dict(indeg))

#### Out-Degree

In [None]:
outdeg = G.out_degree()
get_top_nodes(dict(outdeg))
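The relationship between the three degree measures can be checked on a small graph. This is a sketch on a hypothetical three-node graph, not on the Marvel data: in a directed graph, the total degree of a node is its in-degree plus its out-degree.

```python
import networkx as nx

# Hypothetical toy graph: A -> B, A -> C, B -> C
toy = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C")])

in_deg = dict(toy.in_degree())    # edges arriving at each node
out_deg = dict(toy.out_degree())  # edges leaving each node
total = dict(toy.degree())        # in-degree + out-degree

for node in toy.nodes:
    print(node, in_deg[node], out_deg[node], total[node])
```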

#### Degree Centrality

In [None]:
degree_centrality = nx.degree_centrality(G)
nx.set_node_attributes(G, degree_centrality, 'dc')
get_top_nodes(degree_centrality)
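Degree centrality in NetworkX is the node's degree divided by n - 1, where n is the number of nodes. The sketch below, on a hypothetical toy graph, recomputes it by hand. Because a directed graph counts both incoming and outgoing edges, values above 1 are possible.

```python
import math
import networkx as nx

# Hypothetical toy graph with a reciprocal pair: A <-> B, plus A -> C
toy = nx.DiGraph([("A", "B"), ("B", "A"), ("A", "C")])

dc = nx.degree_centrality(toy)

# Manual version: total degree (in + out) divided by (n - 1)
n = toy.number_of_nodes()
manual = {node: deg / (n - 1) for node, deg in toy.degree()}

for node in toy.nodes:
    print(node, dc[node], manual[node])
```

For separate directed variants, `nx.in_degree_centrality` and `nx.out_degree_centrality` normalize the in-degree and out-degree in the same way.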

#### Betweenness

In [None]:
t0 = time.process_time()

betweenness_centrality = nx.betweenness_centrality(G)
nx.set_node_attributes(G, betweenness_centrality, 'bc')

t1 = time.process_time() - t0
print("Time elapsed: ", t1)

In [None]:
get_top_nodes(betweenness_centrality)
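Exact betweenness is usually the slowest metric in this notebook because it runs a shortest-path search from every node. If the running time is a problem, one option is the sampled approximation that `nx.betweenness_centrality` offers through its `k` and `seed` parameters. The sketch below uses a random graph as a hypothetical stand-in for the hero network; the values of `k` and the graph size are arbitrary choices for illustration.

```python
import networkx as nx

# Hypothetical stand-in for the hero network: a random directed graph
demo = nx.gnp_random_graph(200, 0.05, seed=1, directed=True)

# Exact: one shortest-path search per node
exact = nx.betweenness_centrality(demo)

# Approximate: only k sampled source nodes; seed makes the sample repeatable
approx = nx.betweenness_centrality(demo, k=50, seed=42)

top_exact = sorted(exact, key=exact.get, reverse=True)[:5]
top_approx = sorted(approx, key=approx.get, reverse=True)[:5]
print("exact top 5: ", top_exact)
print("approx top 5:", top_approx)
```

The approximate ranking may differ from the exact one, so it is worth comparing both on a sample before relying on it.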

#### Closeness

In [None]:
t0 = time.process_time()

closeness_centrality = nx.closeness_centrality(G)
nx.set_node_attributes(G, closeness_centrality, 'cc')

t1 = time.process_time() - t0
print("Time elapsed: ", t1)

In [None]:
get_top_nodes(closeness_centrality)
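On a directed graph, `nx.closeness_centrality` measures distances arriving at each node (inward distance). To score nodes by how quickly they reach others, the function can be applied to `G.reverse()`. A minimal sketch on a hypothetical three-node path:

```python
import math
import networkx as nx

# Hypothetical toy path: A -> B -> C
toy = nx.DiGraph([("A", "B"), ("B", "C")])

inward = nx.closeness_centrality(toy)             # how easily others reach the node
outward = nx.closeness_centrality(toy.reverse())  # how easily the node reaches others

for node in toy.nodes:
    print(node, round(inward[node], 3), round(outward[node], 3))
```

Here `C` scores highest inward (both other nodes can reach it) while `A` scores highest outward (it can reach both other nodes).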

#### Eigenvector Centrality

In [None]:
t0 = time.process_time()

eigenvector_centrality = nx.eigenvector_centrality(G)
nx.set_node_attributes(G, eigenvector_centrality,'ec')

t1 = time.process_time() - t0
print("Time elapsed: ", t1)

In [None]:
get_top_nodes(eigenvector_centrality)
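`nx.eigenvector_centrality` uses power iteration and raises `nx.PowerIterationFailedConvergence` if it does not converge within `max_iter` iterations. Should that happen on a large directed graph, one possible approach is to raise `max_iter` and catch the exception. The helper name `eigenvector_or_none` and the parameter values are assumptions made for this sketch.

```python
import networkx as nx

def eigenvector_or_none(graph, max_iter=1000, tol=1e-6):
    """Hypothetical helper: eigenvector centrality, or None if it fails to converge."""
    try:
        return nx.eigenvector_centrality(graph, max_iter=max_iter, tol=tol)
    except nx.PowerIterationFailedConvergence:
        return None

# Hypothetical toy graph: every node points to every other node
demo = nx.complete_graph(4, create_using=nx.DiGraph)
scores = eigenvector_or_none(demo)
print(scores)
```

In a fully symmetric graph like this one, every node receives the same score.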

#### PageRank Centrality

In [None]:
t0 = time.process_time()

pagerank_centrality = nx.pagerank(G)
nx.set_node_attributes(G, pagerank_centrality, 'pr')

t1 = time.process_time() - t0
print("Time elapsed: ", t1)

In [None]:
get_top_nodes(pagerank_centrality)
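PageRank scores form a probability distribution, so they sum to 1, and a node gains score from the nodes that point to it. The sketch below checks both properties on a hypothetical four-edge graph; `alpha=0.85` is the damping factor that `nx.pagerank` uses by default, written out here for visibility.

```python
import networkx as nx

# Hypothetical toy graph: C receives links from A and B, and links back to A
toy = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")])

pr = nx.pagerank(toy, alpha=0.85)

print("sum of scores:", round(sum(pr.values()), 6))
print("highest-ranked node:", max(pr, key=pr.get))
```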

## Graph Metrics

#### All Shortest Paths

In [None]:
list(nx.all_shortest_paths(G,'SPIDER-MAN/PETER PAR','QUILL'))

In [None]:
nx.shortest_path_length(G,'SPIDER-MAN/PETER PAR','QUILL')
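In a directed graph a path may exist in one direction only, and a misspelled node name is not found at all. In those cases NetworkX raises `nx.NetworkXNoPath` or `nx.NodeNotFound`. A minimal sketch of handling both, where the helper name `safe_path_length` and the toy graph are hypothetical:

```python
import networkx as nx

def safe_path_length(graph, source, target):
    """Hypothetical helper: shortest-path length, or None when it cannot be computed."""
    try:
        return nx.shortest_path_length(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None

# Hypothetical toy path: A -> B -> C
toy = nx.DiGraph([("A", "B"), ("B", "C")])

print(safe_path_length(toy, "A", "C"))  # 2: the path A -> B -> C exists
print(safe_path_length(toy, "C", "A"))  # None: the edges point the other way
print(safe_path_length(toy, "A", "Z"))  # None: "Z" is not in the graph
```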

#### Density

In [None]:
nx.density(G)
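Density is the number of edges divided by the number of possible edges. For a directed graph with n nodes and m edges this is m / (n(n - 1)); for an undirected graph it is 2m / (n(n - 1)), since each pair can hold only one edge. The sketch below checks both formulas on a hypothetical toy graph.

```python
import math
import networkx as nx

# Hypothetical toy graph: A -> B, A -> C, B -> C
toy = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C")])
n = toy.number_of_nodes()
m = toy.number_of_edges()

directed_manual = m / (n * (n - 1))        # 3 of 6 possible directed edges
print(nx.density(toy), directed_manual)

# The same edges without direction fill every possible undirected pair
undirected = toy.to_undirected()
undirected_manual = 2 * undirected.number_of_edges() / (n * (n - 1))
print(nx.density(undirected), undirected_manual)
```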

#### Average Local Clustering Coefficient

In [None]:
nx.average_clustering(G)
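`nx.average_clustering` returns a single number: the mean of the per-node (local) clustering coefficients, which `nx.clustering` returns as a dictionary. A short sketch on a hypothetical toy graph shows the relationship:

```python
import math
import networkx as nx

# Hypothetical toy graph: a directed triangle A -> B -> C -> A, plus C -> D
toy = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

local = nx.clustering(toy)            # one coefficient per node
average = nx.average_clustering(toy)  # mean of the local coefficients

manual = sum(local.values()) / len(local)
print(local)
print(average, manual)
```

The per-node dictionary can also be passed to `get_top_nodes` to list the most tightly clustered nodes.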

## Question
What are the main differences in how metrics are computed for directed and undirected graphs? Which ones run faster? Are there any that cannot be computed for one of the two types? (Explain in no more than 300 words.)

Prepared by Luis Cajachahua under the MIT license (2021)