First step: import the libraries

In [35]:
import json
import pandas as pd
import numpy as np
import networkx as nx
from pprint import pprint
import matplotlib.pyplot as plt
from itertools import combinations

Second step: load the graph stored in GML format

In [42]:
G = nx.read_gml("data/dhdk/dhdk_coauthorship_network.gml")

Third step: apply measures

In [69]:

betweenness_centrality = nx.betweenness_centrality(G, weight="weight")
bc_data = pd.DataFrame.from_dict(betweenness_centrality, 
                                columns=["BetweennessCentrality"], 
                                orient="index")
bc_data.sort_values(by=["BetweennessCentrality"], ascending=False)

Unnamed: 0,BetweennessCentrality
"TOMASI, FRANCESCA",0.365687
"BARTOLINI, ILARIA",0.310429
"MILANO, MICHELA",0.157211
"DUCA, SILVIA",0.138348
"PRESUTTI, VALENTINA",0.123821
...,...
"COLUCCI, MARIACHIARA",0.000000
"MAMBELLI, FRANCESCA",0.000000
"DE SANTIS, CRISTIANA",0.000000
"LONGO, DANILA",0.000000


Betweenness Centrality:
High values suggest that the node plays a critical role in connecting other nodes in the network:
"TOMASI, FRANCESCA" has the highest betweenness centrality, indicating that this individual's presence is crucial for maintaining connectivity between other nodes.

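As a minimal sketch of what the measure captures, the toy graph below (hypothetical nodes a, b, c, not taken from the co-authorship data) places b on the only shortest path between a and c, so b gets the maximum normalized score and the endpoints get zero.

```python
import networkx as nx

# Toy path graph a - b - c (illustrative only, not the co-authorship network)
toy = nx.Graph()
toy.add_edges_from([("a", "b"), ("b", "c")])

# Normalized betweenness: share of shortest paths between other pairs
# that pass through each node
bc = nx.betweenness_centrality(toy)
print(bc)
```

Nodes with a score of 0.0 in the table above are in the same position as a and c here: no shortest path between two other authors runs through them.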

In [49]:
degree_centrality = nx.degree_centrality(G)
dc_data = pd.DataFrame.from_dict(degree_centrality, 
                                columns=["DegreeCentrality"], 
                                orient="index")
dc_data.sort_values(by=["DegreeCentrality"], ascending=False)

Unnamed: 0,DegreeCentrality
"TOMASI, FRANCESCA",0.214106
"PERONI, SILVIO",0.188917
"MILANO, MICHELA",0.166247
"VITALI, FABIO",0.138539
"BARTOLINI, ILARIA",0.123426
...,...
"OMICINI, ANDREA",0.002519
"DI TELLA, ALESSANDRA",0.002519
"ROSSI, FEDERICA",0.002519
"GODART, FREDERIC",0.002519


Degree Centrality:
Reflects the number of connections a node has.
"TOMASI, FRANCESCA" has the highest degree centrality, implying that this person has the most direct connections.

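networkx normalizes degree centrality by dividing each node's degree by n - 1, where n is the number of nodes. The sketch below uses a small star graph (an assumption for illustration) to show how to convert the scores back to raw neighbor counts.

```python
import networkx as nx

# Star graph: node 0 is the hub, nodes 1, 2, 3 are leaves
star = nx.star_graph(3)
n = star.number_of_nodes()

dc = nx.degree_centrality(star)  # degree / (n - 1)

# Recover the raw number of neighbors from the normalized score
raw_degree = {node: round(score * (n - 1)) for node, score in dc.items()}
print(dc)
print(raw_degree)
```

If the smallest value in the table above, 0.002519, corresponds to a single co-author, then n - 1 would be about 397, and the top score of 0.214106 would correspond to roughly 85 direct co-authors. This is an inference from the printed values, not something the notebook states.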
In [72]:
closeness_centrality = nx.closeness_centrality(G, distance="weight")
cc_data = pd.DataFrame.from_dict(closeness_centrality, 
                                columns=["ClosenessCentrality"], 
                                orient="index")
cc_data.sort_values(by=["ClosenessCentrality"], ascending=False)

Unnamed: 0,ClosenessCentrality
"TOMASI, FRANCESCA",0.184239
"DUCA, SILVIA",0.172527
"PRESUTTI, VALENTINA",0.168067
"VIALE, MATTEO",0.165659
"SOLMI, RICCARDO",0.165133
...,...
"LEGNANI ANNICHINI, ALESSIA",0.004030
"TRAPIN, LUCA",0.002519
"LILLO, FABRIZIO",0.002519
"CAPUCCINO, CARLOTTA",0.001259


Without Edge Weights:
TOMASI, FRANCESCA (0.420055): This node has the highest closeness centrality. It is, on average, closer to other nodes in the network compared to the rest.

PERONI, SILVIO (0.394565): The second-highest closeness centrality. Similar interpretation to TOMASI, FRANCESCA.

Low Closeness Centralities: Nodes like PICCININI, ALESSANDRO, IOVINE, GIULIO, CAPUCCINO, CARLOTTA, LILLO, FABRIZIO, TRAPIN, LUCA have relatively low closeness centrality. These nodes are less central in terms of proximity to other nodes.

With Edge Weights:
TOMASI, FRANCESCA (0.184239): Despite having the highest closeness centrality without weights, its closeness decreases when considering edge weights. The weights indicate that the paths to other nodes may be longer or have higher costs.

DUCA, SILVIA (0.172527): DUCA, SILVIA becomes more central when edge weights are considered. It suggests that, with weights, DUCA, SILVIA is closer to other nodes.

PRESUTTI, VALENTINA (0.168067): Similar to DUCA, SILVIA, PRESUTTI, VALENTINA becomes more central when considering edge weights.

Why it Changes:

Edge Weights Influence Path Selection:
With edge weights, a path with more edges may be preferred if its total weight is lower. This can lead to different paths being selected as the "shortest," altering closeness centrality.

Higher Edge Weights Increase Distance:
nx.closeness_centrality treats the attribute passed as distance as a length, so higher edge weights increase the distance between nodes. A high-weight edge contributes more to the overall distance in the weighted network.

Connection Strength Matters:
Under this reading, low-weight edges are the short, "strong" ones, and nodes reached through them score higher. If the weight instead counts something like co-authored papers, where a higher number means a stronger tie, the weights should be inverted (for example to 1/weight) before being used as distances; otherwise frequent collaborators are treated as far apart.

Impact on Spread of Information:
Closeness centrality measures how quickly information can spread from a node to the rest of the network. Weighted closeness reflects how quickly information can travel when each edge has its own traversal cost.

In essence, the inclusion of edge weights adjusts the notion of "closeness" to consider the strength or cost associated with traversing edges. This adjustment can lead to changes in the ranking of nodes' closeness centrality.
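The effect of weights on path selection can be sketched on a toy triangle. The node names, the weights, and the derived "distance" attribute are all assumptions made for illustration; the last variant shows what changes if the weight is read as tie strength and inverted before use.

```python
import networkx as nx

# Toy triangle: the direct edge a - c is "expensive" (weight 5)
toy = nx.Graph()
toy.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("a", "c", 5)])

# 1) Ignore weights: every node is one hop from every other node
unweighted = nx.closeness_centrality(toy)

# 2) Weights as distances: a reaches c via b (1 + 1 = 2) instead of directly (5)
weighted = nx.closeness_centrality(toy, distance="weight")

# 3) Weights as tie strength: invert them into a hypothetical "distance" attribute
for u, v, data in toy.edges(data=True):
    data["distance"] = 1 / data["weight"]
inverted = nx.closeness_centrality(toy, distance="distance")

print(unweighted)
print(weighted)
print(inverted)
```

With weights as distances, b comes out on top; with inverted weights, a and c overtake it. Which reading is appropriate depends on what the weight attribute in the GML file records.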

In [75]:
eigenvector_centrality = nx.eigenvector_centrality(G, weight="weight")
ec_data = pd.DataFrame.from_dict(eigenvector_centrality, 
                                columns=["EigenvectorCentrality"], 
                                orient="index")
ec_data.sort_values(by=["EigenvectorCentrality"], ascending=False)

Unnamed: 0,EigenvectorCentrality
"PERONI, SILVIO",5.393762e-01
"VITALI, FABIO",5.310575e-01
"DI IORIO, ANGELO",3.102252e-01
"TOMASI, FRANCESCA",2.799114e-01
"GANGEMI, ALDO",2.111199e-01
...,...
"BISI, SILVIA",1.278039e-30
"CAPUCCINO, CARLOTTA",4.771634e-34
"IOVINE, GIULIO",4.771634e-34
"TRAPIN, LUCA",7.264521e-37


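Eigenvector Centrality:
Considers both the number and the importance of a node's neighbors.
"PERONI, SILVIO" has the highest eigenvector centrality (0.539), closely followed by "VITALI, FABIO" (0.531), implying that the people connected to them are themselves well-connected. "TOMASI, FRANCESCA", who leads on betweenness, degree and closeness, ranks fourth here (0.280).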


Why it Changes:

Edge Weights Influence Strength:
In eigenvector centrality the weight is read as connection strength, so neighbors joined by higher-weight edges contribute more to a node's score. The weights shape how influence flows through the network.

Strong Ties to Central Nodes Matter:
If a node is tied by high-weight edges to nodes that are themselves central, its centrality increases.

Higher Eigenvector Centrality for Some Nodes:
Nodes like PERONI, SILVIO, VITALI, FABIO, and others have higher eigenvector centrality with weights. These nodes likely have strong connections to other central nodes.

Very Low Centrality for Others:
Nodes like PICCININI, ALESSANDRO, CAPUCCINO, CARLOTTA, IOVINE, GIULIO, TRAPIN, LUCA, LILLO, FABRIZIO have extremely low eigenvector centrality values with or without weights. Values as small as 1e-30 are effectively zero. This is probably because these nodes sit outside the main connected component: on a disconnected graph the measure concentrates almost all of the score in a single component.

Interpretation:
Eigenvector centrality with weights reflects not only the structure of the network but also the strength of connections.
Nodes with higher eigenvector centrality in the weighted network are influential not just due to their connectivity but also because of the strength of their connections.
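A small sketch of how weights can reorder eigenvector centrality. The graph, node names, and weights below are hypothetical; the point is only that a single heavy edge can lift its two endpoints above a node that has more neighbors.

```python
import networkx as nx

# Triangle a, b, c plus a pendant d attached to c; the a - b tie is very strong
toy = nx.Graph()
toy.add_weighted_edges_from(
    [("a", "b", 10), ("a", "c", 1), ("b", "c", 1), ("c", "d", 1)]
)

# Without weights, c (three neighbors) is the most central node
unweighted = nx.eigenvector_centrality(toy, max_iter=1000)

# With weights read as connection strength, a and b overtake c
weighted = nx.eigenvector_centrality(toy, max_iter=1000, weight="weight")

top_unweighted = max(unweighted, key=unweighted.get)
top_weighted = max(weighted, key=weighted.get)
print(top_unweighted, top_weighted)
```

Running both variants on G and comparing the two rankings side by side would show which authors gain or lose from the weighting.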

In [76]:
clustering = nx.clustering(G)
clustering_data = pd.DataFrame.from_dict(clustering, 
                                columns=["Clustering"],
                                orient="index")
clustering_data.sort_values(by=["Clustering"], ascending=False)

Unnamed: 0,Clustering
"MAMBELLI, FRANCESCA",0.075970
"SCOPECE, FIORA",0.074737
"DIONIGI, IVANO",0.061562
"CITTI, FRANCESCO",0.060264
"ZIOSI, ANTONIO",0.059982
...,...
"SOTTARA, DAVIDE",0.000000
"D'ANGELO, GABRIELE",0.000000
"CAPUCCINO, CARLOTTA",0.000000
"PERRETTI, FABRIZIO",0.000000


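Clustering Coefficient:
Measures the degree to which the neighbors of a node are connected to each other.
"MAMBELLI, FRANCESCA" has the highest clustering coefficient in the table above (0.075970), indicating that this person's neighbors are the most interconnected. Note that a maximum of 0.076 cannot be reconciled with the average clustering coefficient of 0.784 reported below, which is computed without weights; the table was probably produced with weight="weight", which scales each triangle by its edge weights.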

In [53]:
avg_cohesion = nx.average_clustering(G)
print("Cohesion: ", avg_cohesion)

Cohesion:  0.7838469280463082


Average Clustering Coefficient (ACC):
The value 0.7838 suggests that, on average, the nodes in the graph tend to form cohesive groups. This coefficient is a measure of the density of triangles in the graph, indicating how much nodes tend to cluster together. A high average clustering coefficient implies that nodes in the graph are well-connected to their neighbors, forming local clusters or communities.

In [54]:
num_connected_components = nx.number_connected_components(G)
print("Connectedness: ", num_connected_components)

Connectedness:  6


Number of Connected Components:
The value 6 indicates that the graph has 6 connected components. A connected component is a subgraph in which there is a path between any two nodes. Having multiple connected components means that there are isolated groups of nodes in the graph. 
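Beyond the count, it is often useful to know how large each component is and to isolate the largest one. A minimal sketch on a toy graph (hypothetical nodes) follows; the same calls could be applied to G.

```python
import networkx as nx

# Toy graph with three components: {a, b, c}, {d, e}, and the isolated node f
toy = nx.Graph([("a", "b"), ("b", "c"), ("d", "e")])
toy.add_node("f")

# Size of every component, largest first
sizes = sorted((len(c) for c in nx.connected_components(toy)), reverse=True)

# Subgraph induced by the largest component
largest_nodes = max(nx.connected_components(toy), key=len)
largest = toy.subgraph(largest_nodes).copy()

print(sizes)
print(largest.number_of_nodes())
```

Measures that rely on paths (closeness, compactness) are easier to interpret when computed on the largest component alone.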

In [56]:
def calc_compactness(graph):
    shortest_path_lengths = dict(nx.all_pairs_shortest_path_length(graph))

    total_compactness = 0
    total_pairs = 0

    for source, lengths in shortest_path_lengths.items():
        for target, distance in lengths.items():
            if source != target:
                total_compactness += 1 / distance
                total_pairs += 1

    if total_pairs == 0:
        return 0  # Avoid division by zero

    return total_compactness / total_pairs

compactness = calc_compactness(G)
print("Compactness: ", compactness)

Compactness:  0.334764335938241


Compactness:
The computed compactness of the graph is 0.3348. The function averages the reciprocal of the geodesic (shortest-path) distance, 1/d, over all ordered pairs of distinct nodes that can reach each other. Because the graph has 6 connected components, pairs in different components are left out of the average rather than counted as zero, so the value describes reachability within components.
The compactness value lies between 0 and 1. A value of 1 means every reachable pair is directly connected; values closer to 0 mean that paths are longer or more circuitous. A value of about 0.335 corresponds to a typical distance of roughly three steps between reachable pairs (the harmonic mean of the distances is 1/0.335 ≈ 3), which indicates a moderate level of compactness.
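On a disconnected graph the result depends on how unreachable pairs are handled. The sketch below contrasts two conventions on a toy graph: averaging only over reachable pairs, and averaging over all pairs with unreachable ones contributing zero. The function names are hypothetical; nx.global_efficiency is used as a cross-check for the second convention.

```python
import networkx as nx

def compactness_reachable(graph):
    """Mean of 1/d over ordered pairs that are connected by some path."""
    total, pairs = 0.0, 0
    for source, lengths in nx.all_pairs_shortest_path_length(graph):
        for target, distance in lengths.items():
            if source != target:
                total += 1 / distance
                pairs += 1
    return total / pairs if pairs else 0.0

def compactness_all_pairs(graph):
    """Mean of 1/d over all ordered pairs; unreachable pairs count as 0."""
    n = graph.number_of_nodes()
    total = 0.0
    for source, lengths in nx.all_pairs_shortest_path_length(graph):
        total += sum(1 / d for t, d in lengths.items() if t != source)
    return total / (n * (n - 1)) if n > 1 else 0.0

# Toy graph: path a - b - c plus a separate edge d - e
toy = nx.Graph([("a", "b"), ("b", "c"), ("d", "e")])

within = compactness_reachable(toy)   # 7 / 8
overall = compactness_all_pairs(toy)  # 7 / 20
print(within, overall, nx.global_efficiency(toy))
```

The second convention penalizes fragmentation, so it is the more conservative choice when a graph has several components.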

In [57]:
transitivity = nx.transitivity(G)
print("Transitivity: ", transitivity)

Transitivity:  0.6172535975124738


Transitivity:
A transitivity value of 0.617 suggests a relatively high level of clustering within the graph. Transitivity is the fraction of connected triples (a node linked to two others) that are closed into triangles. In simpler terms, if node A is connected to both node B and node C, there is roughly a 62% chance that B and C are also directly connected to each other. The value ranges from 0 (no triangles) to 1 (every connected triple is closed). It is lower than the average clustering coefficient (0.784) because transitivity weights every triple equally, so high-degree nodes, whose neighbors are less often all interconnected, count for more.
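Transitivity and the average clustering coefficient both measure triangle closure but average differently. A minimal sketch on a toy graph (a triangle with one pendant node, chosen for illustration) shows that the two can differ on the same graph.

```python
import networkx as nx

# Triangle a, b, c with a pendant node d attached to c
toy = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Local clustering: a = 1, b = 1, c = 1/3, d = 0
local = nx.clustering(toy)

# Mean of the local values: each node counts equally
avg_clustering = nx.average_clustering(toy)

# 3 * triangles / connected triples: each triple counts equally
transitivity = nx.transitivity(toy)

print(local)
print(avg_clustering, transitivity)
```

Here the graph has 1 triangle and 5 connected triples, so transitivity is 3/5 = 0.6, while the mean of the local coefficients is (1 + 1 + 1/3 + 0)/4 ≈ 0.583.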

In [58]:
core_number = nx.core_number(G)
k_data = pd.DataFrame.from_dict(core_number,
                                    columns=["KCore"],
                                    orient="index")
k_data.sort_values(by=["KCore"], ascending=False)

Unnamed: 0,KCore
"RENDA, GIULIA",30
"TOMASI, FRANCESCA",30
"BITELLI, GABRIELE",30
"TINI, MARIA ALESSANDRA",30
"GUALANDI, BIANCA",30
...,...
"BENIGNI, FEDERICA",1
"SOTTARA, DAVIDE",1
"VAN HENTENRYCK, PASCAL RENÉ M.",1
"FERRARIO, ROBERTA",1


High Core Numbers:
Nodes like "RENDA, GIULIA," "TOMASI, FRANCESCA," "BITELLI, GABRIELE," etc., have a core number of 30. They belong to the 30-core, the largest subgraph in which every node has at least 30 neighbors inside that same subgraph. These nodes form a densely interconnected central part of the network.

Uniform Core Numbers:
The fact that many nodes have the same core number (30) suggests a relatively homogeneous and densely connected region in the network. It could indicate a well-connected community or subgroup.

Low Core Numbers:
Nodes like "BENIGNI, FEDERICA," "SOTTARA, DAVIDE," "VAN HENTENRYCK, PASCAL RENÉ M.," etc., have a core number of 1. These nodes are less connected and likely part of the periphery of the network.

Network Structure:
The presence of both high and low core numbers suggests a hierarchical or modular structure in the network. The nodes with high core numbers form a cohesive core, while nodes with low core numbers are more on the outskirts or in less-connected regions.

Centrality and Importance:
Nodes with high core numbers are embedded in the densest part of the network. A high core number does not by itself mean that a node bridges different parts of the network; that is what betweenness centrality measures. Nodes with low core numbers have few collaborators or sit in small, separate groups.
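To make the core number concrete, here is a minimal sketch on a toy graph: a complete graph on four nodes with one extra node hanging off it. The graph is an illustrative assumption, not part of the co-authorship data.

```python
import networkx as nx

# Complete graph on nodes 0..3, plus pendant node 4 attached to node 3
toy = nx.complete_graph(4)
toy.add_edge(3, 4)

# Core number: the largest k such that the node belongs to the k-core
core = nx.core_number(toy)

# The 3-core keeps only nodes with at least 3 neighbors inside the subgraph
three_core = nx.k_core(toy, k=3)

print(core)
print(sorted(three_core.nodes()))
```

Node 3 has degree 4 but core number 3: its extra neighbor (node 4) drops out once low-degree nodes are peeled away, which is why core number and degree can disagree.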

In [60]:
communities = nx.algorithms.community.greedy_modularity_communities(G)
community_mapping = {}
for i, community in enumerate(communities):
    for node in community:
        community_mapping[node] = i
        

c_data = pd.DataFrame.from_dict(community_mapping,
                                    columns=["Communities"],
                                    orient="index")
c_data.sort_values(by=["Communities"], ascending=False)

Unnamed: 0,Communities
"IOVINE, GIULIO",13
"CAPUCCINO, CARLOTTA",13
"LILLO, FABRIZIO",12
"TRAPIN, LUCA",12
"PLAZZI, GIUSEPPE",11
...,...
"TURRINI, ELISA",0
"SARTOR, GIOVANNI",0
"PODDA, EMANUELA",0
"SACERDOTI COEN, CLAUDIO",0


Community Labels:
For each professor, the number next to their name is the label (index) of the community to which they belong, not its size. For example, "IOVINE, GIULIO" and "CAPUCCINO, CARLOTTA" both belong to community 13. Labels run from 0 to 13, so the Greedy Modularity algorithm identified 14 communities. greedy_modularity_communities returns the communities sorted from largest to smallest, so community 0, which includes "TURRINI, ELISA" and "SARTOR, GIOVANNI", is the largest and community 13 is the smallest. Every node is assigned to exactly one community.
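The number of members in each community can be read from the length of each set that the algorithm returns. The sketch below does this on a toy barbell graph (two 4-node cliques joined by one edge); the variable names are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two cliques of 4 nodes connected by a single bridge edge
toy = nx.barbell_graph(4, 0)

communities = greedy_modularity_communities(toy)

# Community index -> number of members
sizes = {i: len(members) for i, members in enumerate(communities)}

# Node -> community index (same mapping the notebook builds)
label_of = {node: i for i, members in enumerate(communities) for node in members}

print(sizes)
print(label_of)
```

Applied to G, a dictionary like sizes would give the member count behind each of the labels shown in the table above.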

In [62]:
def homophily(G):
    num_same_ties = 0
    num_diff_ties = 0
    for n1, n2 in G.edges():
        if G.nodes[n1]['affiliation'] == G.nodes[n2]['affiliation']:
            num_same_ties += 1
        else:
            num_diff_ties += 1
    return (num_same_ties / (num_same_ties + num_diff_ties))
homophily(G)

0.3298041291688724

A value of 0.3298 means that about 33% of the edges in the network connect two authors with the same affiliation, while the remaining 67% cross affiliations. The measure ranges from 0 to 1, where:
0 indicates that no edge connects two nodes with the same affiliation.
1 indicates that every edge connects two nodes with the same affiliation.
Whether 0.33 counts as homophily depends on the baseline: it should be compared with the share of same-affiliation ties expected if co-authors were chosen regardless of affiliation, which depends on how many affiliations there are and how large each one is.
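One way to account for the baseline is the attribute assortativity coefficient, which compares the observed share of same-attribute edges with the share expected under random mixing. The sketch below is an illustration on a toy graph with made-up affiliations "X" and "Y"; it is not the notebook's method.

```python
import networkx as nx

# Four authors in two hypothetical affiliations
toy = nx.Graph()
toy.add_nodes_from(["a", "b"], affiliation="X")
toy.add_nodes_from(["c", "d"], affiliation="Y")
toy.add_edges_from([("a", "b"), ("c", "d"), ("b", "c")])

# Raw share of edges joining two nodes with the same affiliation
same = sum(
    1
    for u, v in toy.edges()
    if toy.nodes[u]["affiliation"] == toy.nodes[v]["affiliation"]
)
same_share = same / toy.number_of_edges()

# Assortativity: positive = more same-affiliation ties than random mixing,
# 0 = no preference, negative = ties tend to cross affiliations
r = nx.attribute_assortativity_coefficient(toy, "affiliation")

print(same_share, r)
```

Calling nx.attribute_assortativity_coefficient(G, "affiliation") on the co-authorship graph would show whether a 33% same-affiliation share is above or below what random mixing would produce.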

In [66]:
prof_list = ['PERONI, SILVIO', 'TOMASI, FRANCESCA', 'VITALI, FABIO', 'PESCARIN, SOFIA', 'GANGEMI, ALDO', 'ITALIA, PAOLA MARIA CARMELA', 'TAMBURINI, FABIO', 'DAQUINO, MARILENA', 'GIALLORENZO, SAVERIO', 'ZUFFRANO, ANNAFELICIA', 'IOVINE, GIULIO', 'BARTOLINI, ILARIA', 'SPEDICATO, GIORGIO', 'PALMIRANI, MONICA', 'BASKAKOVA, EKATERINA', 'FERRIANI, SIMONE']
def affiliation_homophily(G, nodes):
    """For each node, count co-authors whose affiliation differs from its own."""
    data = []
    for node in nodes:
        affiliation = G.nodes[node]['affiliation']
        neighbors = list(G.neighbors(node))
        total_connections = len(neighbors)
        # Neighbors with a different affiliation than the node itself
        connections_outside_affiliation = sum(1 for neighbor in neighbors if G.nodes[neighbor]['affiliation'] != affiliation)

        # Share of ties that leave the node's affiliation (0 for isolated nodes)
        ratio = 0 if total_connections == 0 else round(connections_outside_affiliation / total_connections, 2)

        data.append([node, connections_outside_affiliation, affiliation, total_connections, ratio])

    df = pd.DataFrame(data, columns=['name', 'connections_outside_affiliation', 'affiliation', 'total_connections', 'ratio'])
    return df.sort_values(by=['ratio'], ascending=False)
print(affiliation_homophily(G, prof_list))


                           name  connections_outside_affiliation  \
3               PESCARIN, SOFIA                               33   
10               IOVINE, GIULIO                                1   
4                 GANGEMI, ALDO                               38   
0                PERONI, SILVIO                               61   
11            BARTOLINI, ILARIA                               37   
6              TAMBURINI, FABIO                               28   
7             DAQUINO, MARILENA                               32   
2                 VITALI, FABIO                               35   
13            PALMIRANI, MONICA                               25   
12           SPEDICATO, GIORGIO                                2   
1             TOMASI, FRANCESCA                               42   
8          GIALLORENZO, SAVERIO                               10   
9         ZUFFRANO, ANNAFELICIA                                2   
5   ITALIA, PAOLA MARIA CARMELA                 