# Notebook 9 - Centralities of Actors in iEcosystems
### MIT Global Ecosystems Dynamics initiative

This notebook analyzes the studied ecosystems, focusing on understanding the value each organization brings to its respective network. From the measuring instrument of the study conducted by GED, information could be captured such as the

In particular, this notebook produces as complete an account as possible of the centralities of each organization in the iEcosystems. Analyzing these centralities alongside the information obtained from the study conducted by the collaborating organization PRODEM might help in understanding notions such as the resilience and equilibrium of economic systems, and of iEcosystems in particular.


### Part 1: Importing Packages

In [1]:
import pandas as pd
import numpy as np 
import networkx as nx
import scipy.stats as stats

import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

### Part 2: Reading Data from GED

We read the data from each of our ecosystems. This data comes from a series of studies conducted by GED. 

In [2]:
#----------------
# CSVs: nodes 
#----------------
csvs_nodes = dict()

csvs_nodes['Aguascalientes'] =    pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi AGS Stats.csv')             
csvs_nodes['CABA'] =              pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi CABA Stats.csv')           #capital
csvs_nodes['CDMX'] =              pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi CDMX Stats.csv')           #capital
csvs_nodes['Guadalajara'] =       pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi GDL Stats.csv') 
csvs_nodes['Pachuca'] =           pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Hidalgo Stats.csv')
csvs_nodes['Madrid'] =            pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Madrid Stats.csv')          #capital
csvs_nodes['Montevideo'] =        pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Montevideo Stats.csv')      #capital
csvs_nodes['Oaxaca'] =            pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Oaxaca Stats.csv')
csvs_nodes['Sao Paulo'] =         pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Sao Paulo Stats.csv')       #capital
csvs_nodes['Santiago de Chile'] = pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi SCL Stats.csv')             #capital
csvs_nodes['Valencia'] =          pd.read_csv('Ecosystems_from_GED/Gephi_stats/Gephi Valencia Stats.csv')             #capital


#----------------
# CSVs: edges
#----------------
csvs_edges = dict()

csvs_edges['Aguascalientes'] =     pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi AGS Edges.csv')
csvs_edges['CABA'] =               pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi CABA Edges.csv')           #capital
csvs_edges['CDMX'] =               pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi CDMX Edges.csv')           #capital
csvs_edges['Guadalajara'] =        pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi GDL Edges.csv')
csvs_edges['Pachuca'] =            pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Hidalgo Edges.csv')
csvs_edges['Madrid'] =             pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Madrid Edges.csv')          #capital
csvs_edges['Montevideo'] =         pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Montevideo Edges.csv')      #capital
csvs_edges['Oaxaca'] =             pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Oaxaca Edges.csv')
csvs_edges['Sao Paulo'] =          pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Sao Paulo Edges.csv')       #capital
csvs_edges['Santiago de Chile'] =  pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi SCL Edges.csv')             #capital
csvs_edges['Valencia'] =           pd.read_csv('Ecosystems_from_GED/Gephi_edges/Gephi Valencia Edges.csv')             #capital



#---------------------------------
# CSVs: evaluating organizations
#---------------------------------
csvs_evals = dict()

csvs_evals['Aguascalientes'] =     pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - AGS.csv')
csvs_evals['CABA'] =               pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - CABA.csv')
csvs_evals['CDMX'] =               pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - CDMX.csv')
csvs_evals['Guadalajara'] =        pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - GDL.csv')
csvs_evals['Pachuca'] =            pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - HGO.csv')
csvs_evals['Madrid'] =             pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - MAD.csv')
csvs_evals['Montevideo'] =         pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - MVD.csv')
csvs_evals['Oaxaca'] =             pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - OAX.csv')
csvs_evals['Sao Paulo'] =          pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - SAO.csv')
csvs_evals['Santiago de Chile'] =  pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - SCL.csv')
#csvs_evals['Valencia'] =           pd.read_csv('Ecosystems_from_GED/Gephi_evaluadores/Evaluadores ecosistemas - VAL.csv')


#This is, for example, the list of the names of the organizations that responded to the survey in Buenos Aires, ARG
#list(csvs_evals['CABA']['Label'])
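
The node and edge blocks above repeat the same `pd.read_csv` call for every city. A more compact variant, sketched below, derives the paths from one mapping of city names to the file codes used above. `CITY_CODES` and `gephi_paths` are hypothetical names introduced only for this illustration; the folder and file names are copied from the cell above. The evaluator files use different codes and are not covered by this sketch.

```python
# Hypothetical helper: city name -> code used in the Gephi file names above.
CITY_CODES = {
    'Aguascalientes': 'AGS',
    'CABA': 'CABA',
    'CDMX': 'CDMX',
    'Guadalajara': 'GDL',
    'Pachuca': 'Hidalgo',
    'Madrid': 'Madrid',
    'Montevideo': 'Montevideo',
    'Oaxaca': 'Oaxaca',
    'Sao Paulo': 'Sao Paulo',
    'Santiago de Chile': 'SCL',
    'Valencia': 'Valencia',
}

def gephi_paths(folder, kind):
    """Return {city: path} for the node ('Stats') or edge ('Edges') files."""
    return {city: f'Ecosystems_from_GED/{folder}/Gephi {code} {kind}.csv'
            for city, code in CITY_CODES.items()}

node_paths = gephi_paths('Gephi_stats', 'Stats')
edge_paths = gephi_paths('Gephi_edges', 'Edges')

# With the files in place, the dictionaries could then be loaded as:
# csvs_nodes = {city: pd.read_csv(path) for city, path in node_paths.items()}
# csvs_edges = {city: pd.read_csv(path) for city, path in edge_paths.items()}
```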

### Part 3: Building of GraphMLs

In this part of the code, the imported data is processed into NetworkX directed graphs.

In [3]:
def armar_grafo(nodes,edges,rol_str,weight_str):
    '''
    Function that builds a NetworkX graph from a list of nodes and connections.
    
    In:
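    - nodes       DataFrame with the node list (must contain an 'Id' column)
    - edges       DataFrame with the edge list ('Source', 'Target' and 'Weight' columns)
    - rol_str     name of the node column that describes an actor's role
    - weight_str  name of the node column that describes a node's weight
                  (edge weights are always read from the 'Weight' column of edges)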
    
    Out:
    G, a NetworkX graph.
    '''
    
    # DiGraph from edge list
    G=nx.from_pandas_edgelist(edges,'Source','Target',edge_attr=["Weight"],create_using=nx.DiGraph())
    
    # rol is a dictionary: node id -> role
    rol = {nid: nodes[nodes['Id']==nid][rol_str].values[0] for nid in nodes['Id']}
    nx.set_node_attributes(G,rol,'rol')
    
    # weight is a dictionary: node id -> weight
    weight = {nid: nodes[nodes['Id']==nid][weight_str].values[0] for nid in nodes['Id']}
    nx.set_node_attributes(G,weight,'weight')
    
    return G
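
To illustrate what `armar_grafo` produces, here is a minimal, self-contained sketch on made-up data (the three organizations and their values are invented). It builds the same kind of directed graph, but looks up the node attributes with `set_index(...).to_dict()` instead of filtering the DataFrame once per node. This is assumed to be equivalent only when the Ids are unique; `build_graph_sketch` is a hypothetical name.

```python
import pandas as pd
import networkx as nx

# Toy data (invented): three organizations and two weighted collaborations.
toy_nodes = pd.DataFrame({'Id': ['A', 'B', 'C'],
                          'Role': ['Promotor', 'Habilitador', 'Articulador'],
                          'Weight': [0.2, 0.4, 1.0]})
toy_edges = pd.DataFrame({'Source': ['A', 'B'],
                          'Target': ['B', 'C'],
                          'Weight': [1, 3]})

def build_graph_sketch(nodes, edges, rol_str, weight_str):
    """Directed graph with edge weights plus 'rol' and 'weight' node attributes."""
    G = nx.from_pandas_edgelist(edges, 'Source', 'Target',
                                edge_attr=['Weight'], create_using=nx.DiGraph())
    by_id = nodes.set_index('Id')
    # One dictionary lookup per attribute instead of one DataFrame filter per node.
    nx.set_node_attributes(G, by_id[rol_str].to_dict(), 'rol')
    nx.set_node_attributes(G, by_id[weight_str].to_dict(), 'weight')
    return G

toy_G = build_graph_sketch(toy_nodes, toy_edges, 'Role', 'Weight')
print(toy_G.nodes(data=True))
print(toy_G.edges(data=True))
```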

Now we build the graphs for all the studied ecosystems by using the function `armar_grafo` from above.

In [4]:
#-----------------------
# Uniform names for Role and Weight
#-----------------------

#small detail: the 'type' column of CDMX and the 'node size' column of Sao Paulo are empty, so we drop them
csvs_nodes['CDMX'] = csvs_nodes['CDMX'].drop('type', axis=1)
csvs_nodes['Sao Paulo'] = csvs_nodes['Sao Paulo'].drop('node size', axis=1)


for city in csvs_nodes.keys():
    role_variations = ['role', 'type', 'rol estimado', 'rol']
    weight_variations = ['weight', 'node size']
    
    for var in role_variations:
        if var in csvs_nodes[city].columns:
            csvs_nodes[city].rename(columns={var: "Role"}, inplace=True)
    for var in weight_variations:
        if var in csvs_nodes[city].columns:
            csvs_nodes[city].rename(columns={var: "Weight"}, inplace=True)
            
#csvs_nodes['Aguascalientes']
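
The renaming above can also be written with a single lookup table. The sketch below applies it to an invented one-row DataFrame; `COLUMN_ALIASES` and `unify_columns` are hypothetical names. It also raises an error when two variants of the same column are present, since renaming both would create duplicate column names (presumably the reason the empty `type` and `node size` columns are dropped first).

```python
import pandas as pd

# Hypothetical lookup table built from the variations listed above.
COLUMN_ALIASES = {'role': 'Role', 'type': 'Role', 'rol estimado': 'Role', 'rol': 'Role',
                  'weight': 'Weight', 'node size': 'Weight'}

def unify_columns(df):
    """Return a copy of df with the role/weight column variants renamed."""
    renamed = df.rename(columns=COLUMN_ALIASES)
    # Two variants in the same frame would collide after renaming.
    if renamed.columns.duplicated().any():
        raise ValueError('more than one role/weight variant present')
    return renamed

toy = pd.DataFrame({'Id': ['A'], 'rol estimado': ['Promotor'], 'node size': [0.2]})
print(unify_columns(toy).columns.tolist())   # ['Id', 'Role', 'Weight']
```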


In [5]:
#-----------------------
# Graph Construction
#-----------------------
graphs = dict()

for city in csvs_nodes.keys():
    graphs[city] = armar_grafo(csvs_nodes[city], csvs_edges[city], 'Role', 'Weight')
    
#graphs['Aguascalientes']

### Part 3: Polishing column names of the DataFrames

In this section, we polish the collection of columns which are being considered in each DataFrame. Some of the columns will be removed, some will be renamed. The objective is to end up with the same column names in all DataFrames. 

In [6]:
polished_cols = dict()
for city, datafr in csvs_nodes.items():
    #remove columns which we don't want to keep
    df = datafr.copy()
    to_remove = ['timeset', 'name', 'fullname', 'full name',
                 'strongcompnum', 'id dir', 'componentnumber', 
                 'url', 'part of a university', 'location', 'tags',
                 'id interno', '# gephi', 'description', 'region',
                 'organization type', 'participation', 'ego', 'evaluador',
                 'pageranks', 'Hub', 'Authority', 'mentions']
    for col in to_remove:
        if col in df.columns:
            df = df.drop(col, axis = 1)
    
    df = df.rename(columns={'indegree': 'In',
                      'outdegree': 'Out',
                      'closenesscentrality': 'Closeness Centrality',
                      'closnesscentrality': 'Closeness Centrality',
                      'harmonicclosnesscentrality': 'Harmonic Closeness Centrality',
                      'betweenesscentrality': 'Betweeness Centrality',
                      'modularity_class': 'Modularity Class',
                      'clustering': 'Clustering',
                      'weighted indegree': 'Weighted In',
                      'weighted outdegree': 'Weighted Out',
                      'eigencentrality': 'Eigencentrality',
                      'triangles': 'Triangles',
                      'avg strength': 'Average Strength',
                      'avg. strength': 'Average Strength'})
    
    polished_cols[city] = df
    
    
    
'''
columns = polished_cols['Sao Paulo'].columns.tolist()
for col in columns:
    print('')
    print(col)
    for city, datafr in polished_cols.items():
        if col not in datafr.columns:
            print(city)

print('***')
for city, datafr in polished_cols.items():
    print(city)
    print(len(datafr.columns))    
'''

polished_cols['Sao Paulo']

Unnamed: 0,Id,Label,Role,Average Strength,Weight,In,Out,Degree,Weighted In,Weighted Out,Weighted Degree,Eccentricity,Closeness Centrality,Harmonic Closeness Centrality,Betweeness Centrality,Modularity Class,Clustering,Triangles,Eigencentrality
0,AosFatos,AosFatos,Promotor,1.000000,0.2,1,0,1,1,0,1,8,0.197248,0.217846,0.000000,6,0.000000,0,0.018553
1,FACENS,FACENS,Habilitador,1.000000,0.2,2,0,2,2,0,2,8,0.218274,0.243272,0.000000,5,1.000000,1,0.120428
2,Anpecom,Anpecom,Habilitador,2.000000,0.4,1,0,1,2,0,2,8,0.180824,0.197757,0.000000,1,0.000000,0,0.015230
3,IFSP,IFSP,Habilitador,2.000000,0.4,0,1,1,0,2,2,8,0.218053,0.240947,0.000000,5,0.000000,0,0.064735
4,InfoAmazonia,InfoAmazonia,Promotor,2.000000,0.4,1,0,1,2,0,2,8,0.197248,0.217846,0.000000,6,0.000000,0,0.018553
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
211,MCTIC,MCTIC,Articulador,3.375000,5.4,8,0,8,27,0,27,6,0.313869,0.373023,1758.761382,9,0.107143,3,0.472154
212,CCTI Desenvolvimento Econômico SP,CCTI,Articulador,4.000000,6.4,11,9,20,35,27,62,5,0.337520,0.402171,2478.311235,8,0.104575,16,1.000000
213,USP,USP,Generador de Conocimiento,3.200000,6.4,11,0,11,33,0,33,6,0.313411,0.383023,3276.671903,8,0.072727,4,0.665529
214,Inovabra,Inovabra,Habilitador,3.777778,6.8,7,3,10,22,13,35,6,0.312500,0.366357,1771.946770,7,0.138889,5,0.573358


The objective of making the columns uniform across the studied DataFrames was nearly achieved. The only non-uniform column is `Average Strength` (originally `avg strength`), which is missing only in Aguascalientes and Guadalajara.
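
A statement like this can be checked by comparing the column sets of the DataFrames. The following is a minimal sketch on invented, empty frames standing in for `polished_cols`; `missing_columns` is a hypothetical helper.

```python
import pandas as pd

def missing_columns(frames):
    """Map each column to the sorted list of cities whose DataFrame lacks it."""
    all_cols = set().union(*(df.columns for df in frames.values()))
    report = {}
    for col in sorted(all_cols):
        absent = sorted(city for city, df in frames.items() if col not in df.columns)
        if absent:
            report[col] = absent
    return report

# Toy frames (invented) standing in for polished_cols.
toy_frames = {
    'City 1': pd.DataFrame(columns=['Id', 'Label', 'Average Strength']),
    'City 2': pd.DataFrame(columns=['Id', 'Label']),
}
print(missing_columns(toy_frames))   # {'Average Strength': ['City 2']}
```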

In the remainder of this part, we set Id to be the index of each of the DataFrames and compute the missing `Average Strength` values for these two cities.

In [7]:
for city, datafr in polished_cols.items():
    datafr.set_index('Id', inplace=True)

#polished_cols['Aguascalientes']

In [8]:
def avg_strength(v, G):
    # average weight of the edges between v and its neighbors, ignoring edge direction
    U = nx.to_undirected(G)
    total, count = 0, 0
    for u in U.neighbors(v):
        count += 1
        edgedata = U.get_edge_data(u, v)
        total += edgedata['Weight']
    return total/count
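
As a small worked example of this computation, the sketch below repeats it on an invented graph. It adds a guard for nodes without neighbors, returning NaN instead of dividing by zero; that guard is an addition for illustration, not part of the function above, and `avg_strength_sketch` is a hypothetical name. Because the graph is treated as undirected, each neighbor is counted once even if edges exist in both directions.

```python
import math
import networkx as nx

def avg_strength_sketch(v, G, weight='Weight'):
    """Mean edge weight over the undirected neighbors of v (NaN if it has none)."""
    U = G.to_undirected()
    weights = [U[v][u][weight] for u in U.neighbors(v)]
    return sum(weights) / len(weights) if weights else math.nan

# Toy directed graph (invented): A -> B with weight 2, C -> A with weight 4.
toy = nx.DiGraph()
toy.add_edge('A', 'B', Weight=2)
toy.add_edge('C', 'A', Weight=4)
toy.add_node('D')                      # isolated node

print(avg_strength_sketch('A', toy))   # 3.0
print(avg_strength_sketch('D', toy))   # nan
```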

In [9]:
for city in ['Aguascalientes', 'Guadalajara']:
    avg_strength_row = dict()
    for i, row in polished_cols[city].iterrows():
        avg_strength_row[i] = avg_strength(i,graphs[city])
    #avg_strength_row is keyed by Id (the index), so we align on the index rather than on Label
    polished_cols[city]['Average Strength'] = pd.Series(avg_strength_row)

In [10]:
polished_cols['Guadalajara']

Id,Label,Role,Weight,In,Out,Degree,Weighted In,Weighted Out,Weighted Degree,Eccentricity,Closeness Centrality,Harmonic Closeness Centrality,Betweeness Centrality,Modularity Class,Clustering,Triangles,Eigencentrality,Average Strength
inMateriis,inMateriis,Generador de Conocimiento,1.4,10,15,25,34,69,103,4,0.426606,0.478943,1693.202956,3,0.110526,21,0.472239,4.050000
Actum BSS,Actum BSS,Habilitador,1.0,1,0,1,5,0,5,5,0.299517,0.316487,0.000000,3,0.000000,0,0.034200,5.000000
CIATEJ,CIATEJ,Generador de Conocimiento,4.4,7,17,24,17,41,58,3,0.428571,0.485663,1833.776078,6,0.086957,22,0.381130,2.304348
AMC,AMC,Vinculador,0.2,1,0,1,1,0,1,4,0.300485,0.318548,0.000000,6,0.000000,0,0.028834,1.000000
CIE ITESM GDA,CIE ITESM GDA,Generador de Conocimiento,1.0,0,16,16,0,31,31,4,0.396588,0.438620,1055.693817,1,0.109890,10,0.355804,1.714286
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UNAM,UNAM,Generador de Conocimiento,0.2,1,0,1,1,0,1,4,0.300485,0.318548,0.000000,6,0.000000,0,0.028834,1.000000
UNIVA,UNIVA,Habilitador,1.0,1,0,1,5,0,5,5,0.257975,0.271953,0.000000,2,0.000000,0,0.008818,5.000000
UP,UP,Habilitador,0.2,1,0,1,1,0,1,4,0.283105,0.296595,0.000000,2,0.000000,0,0.017125,1.000000
WeWork,WeWork,Habilitador,0.4,1,0,1,2,0,2,5,0.287926,0.303047,0.000000,3,0.000000,0,0.021163,2.000000


We check that there are no null entries in the DataFrames.

In [11]:
print('Null entries per studied city')
print('')
for city, datafr in polished_cols.items():
    print(city + ': ' + str(datafr.isnull().sum().sum()))

Null entries per studied city

Aguascalientes: 0
CABA: 0
CDMX: 0
Guadalajara: 0
Pachuca: 0
Madrid: 0
Montevideo: 0
Oaxaca: 0
Sao Paulo: 0
Santiago de Chile: 0
Valencia: 0


### Part 4: Understanding the Centralities of each actor

Now that we have a uniform collection of DataFrames, we examine the centralities of each actor in the studied ecosystems:

  *  Degree
  *  Eccentricity
  *  Average Shortest Path
  *  Global Efficiency
  *  Local Efficiency
  *  Clustering Coefficient
  *  Betweenness Centrality
  *  Authority Centrality
  *  Hub Centrality
  *  Pagerank
  
Another metric from the series of studies conducted by the Global Ecosystems Dynamics initiative is `Average Strength`, which denotes the average intensity of the collaborations of a given organization with other organizations.
  
Most of these centralities are already included in the existing DataFrames, but some of them are computed in this section, in particular the following: 
  *  Average Shortest Path Length
  *  Global efficiency
  *  Local efficiency
  *  Clustering Coefficient Revision (*we found a discrepancy in an earlier notebook*)
  *  Authority Centrality
  *  Hub Centrality
  *  Pagerank
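
For the last three items of this list, NetworkX provides `nx.hits` and `nx.pagerank`. The following is a minimal sketch on an invented four-node graph, assuming the library defaults (for instance, a damping factor of 0.85 for PageRank). It is not necessarily how the values in this study were produced.

```python
import networkx as nx

# Toy directed graph (invented): three organizations point to C, and C points to A.
toy = nx.DiGraph([('A', 'C'), ('B', 'C'), ('D', 'C'), ('C', 'A')])

# nx.hits returns two dictionaries: (hub scores, authority scores).
hubs, authorities = nx.hits(toy)

# PageRank with NetworkX defaults; the scores sum to 1 over all nodes.
pagerank = nx.pagerank(toy)

for node in toy.nodes():
    print(node, round(hubs[node], 3), round(authorities[node], 3), round(pagerank[node], 3))
```

By default `nx.pagerank` looks for an edge attribute named `weight`; since the graphs in this notebook store edge weights under `Weight`, passing `weight='Weight'` would be needed to take collaboration intensity into account.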

In [18]:
def avg_short_path(v,city):
    G = graphs[city]
    U = nx.to_undirected(G)
    count, total = 0,0
    for u in G.nodes():
        if u != v:
            count += 1
            total += len(nx.shortest_path(U,u,v))-1
    return total/count

        
def glob_eff(v,city):
    G = graphs[city]
    U = nx.to_undirected(G)
    count, total = 0,0
    for u in G.nodes():
        if u != v:
            count += 1
            total += nx.efficiency(U,u,v)
    return total/count

def loc_eff(v,city):
    G = graphs[city]
    # note: for a DiGraph, G.neighbors(v) yields only the successors of v
    S = nx.to_undirected(G.subgraph(G.neighbors(v)))
    return nx.global_efficiency(S)
    
def clust(v,city):
    G = graphs[city]
    U = nx.to_undirected(G)
    if nx.degree(G,v) >= 2:
        return nx.clustering(U,v)
    else:
        return 0
    

# nx.hits returns a pair (hubs, authorities); hits_cities is defined at the end of this cell
def auth_cent(v,city):
    return hits_cities[city][1][v]
    
def hub_cent(v,city):
    return hits_cities[city][0][v]
    

centralities = {'Average Shortest Path Length': avg_short_path,
                'Global efficiency': glob_eff,
                'Local efficiency': loc_eff,
                'Clustering Coefficient': clust,
                'Authority Centrality': auth_cent,
                'Hub Centrality': hub_cent}

hits_cities = {city : nx.hits(G) for city,G in graphs.items()}
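
The helper functions above rebuild the undirected view for every node and run one shortest-path query per pair of nodes. A possible alternative, sketched here on an invented path graph, obtains all distances from a node with a single call to `nx.single_source_shortest_path_length`. `node_path_metrics` is a hypothetical name. The sketch assumes unweighted hop counts, counts unreachable nodes as contributing 0 to the efficiency, and averages the path length over reachable nodes only; that last choice is an assumption made for this example.

```python
import networkx as nx

def node_path_metrics(U, v):
    """Average shortest path length and mean efficiency of v in the undirected graph U."""
    dist = nx.single_source_shortest_path_length(U, v)    # hop counts, includes v itself
    reachable = [d for node, d in dist.items() if node != v]
    n_others = U.number_of_nodes() - 1
    if not reachable:
        return float('nan'), 0.0
    avg_len = sum(reachable) / len(reachable)              # reachable nodes only
    efficiency = sum(1 / d for d in reachable) / n_others  # unreachable nodes count as 0
    return avg_len, efficiency

toy = nx.path_graph(3)              # 0 - 1 - 2
print(node_path_metrics(toy, 0))    # (1.5, 0.75)
print(node_path_metrics(toy, 1))    # (1.0, 1.0)
```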

In [19]:
for city, datafr in polished_cols.items():
    rows_add = {'Average Shortest Path Length': dict(),
                'Global efficiency': dict(),
                'Local efficiency': dict(),
                'Clustering Coefficient': dict(),
                'Authority Centrality': dict(),
                'Hub Centrality': dict()}
    
    #we work on the dictionary for each column we are going to add
    for metric, dictio in rows_add.items():
        for i in datafr.index:
            dictio[i] = centralities[metric](i,city)
            
        #the dictionary is keyed by Id (the index), so we align on the index rather than on Label
        datafr[metric] = pd.Series(dictio)

polished_cols['CDMX']

Id,Label,Average Strength,Weight,In,Out,Degree,Weighted In,Weighted Out,Weighted Degree,Clustering,...,Betweeness Centrality,Modularity Class,Triangles,Eigencentrality,Average Shortest Path Length,Global efficiency,Local efficiency,Clustering Coefficient,Authority Centrality,Hub Centrality
Cenapyme,Cenapyme,1.000000,0.2,0,1,1,0,1,1,0.000000,...,0.000000,10,0,0.025775,4.046980,0.258725,0.000000,0.000000,0.000000,0.000507
Compartamos Banco,Compartamos Banco,1.000000,0.2,0,1,1,0,1,1,0.000000,...,0.000000,10,0,0.025775,4.046980,0.258725,0.000000,0.000000,0.000000,0.000507
SECTEI,SECTEI,1.000000,0.2,1,0,1,1,0,1,0.000000,...,0.000000,0,0,0.070557,3.587248,0.295414,0.000000,0.000000,0.004650,0.000000
Victoria147,Victoria147,1.000000,0.2,1,0,1,1,0,1,0.000000,...,0.000000,9,0,0.006758,4.996644,0.212832,0.000000,0.000000,0.000449,0.000000
Accenture,Accenture,2.000000,0.4,0,1,1,0,2,2,0.000000,...,0.000000,1,0,0.010384,4.654362,0.229195,0.000000,0.000000,0.000000,0.000087
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ITESM,ITESM,3.600000,7.2,10,32,42,36,92,128,0.022760,...,9315.721646,5,16,0.874093,2.516779,0.462528,0.009677,0.022760,0.013071,0.090674
Startup Weekend,Startup Weekend,4.625000,7.4,7,1,8,32,5,37,0.000000,...,639.622302,4,0,0.133290,3.557047,0.312808,0.000000,0.000000,0.005634,0.001021
BID,BID,4.300000,8.6,8,2,10,33,10,43,0.095238,...,3158.583185,1,2,0.272244,2.879195,0.377908,0.000000,0.095238,0.010922,0.003938
Santander,Santander,4.461538,11.6,9,5,14,41,22,63,0.090909,...,2307.844838,10,6,0.503906,2.825503,0.392506,0.100000,0.090909,0.013972,0.004271
