In [1]:
import json
import pandas as pd
import numpy as np
import networkx as nx
from pprint import pprint
import matplotlib.pyplot as plt

In [2]:
target = "it"
G = nx.read_gml("../data/{0}/{0}_coauthorship_network.gml".format(target))
node_attributes = nx.get_node_attributes(G, "affiliation")


---

# Centrality measures

### Degree Centrality

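Degree Centrality:
Reflects the number of direct connections a node has. NetworkX normalizes the count by the maximum possible degree, n - 1, so values fall between 0 and 1.
"TOMASI, FRANCESCA" has the highest degree centrality (0.280528), meaning this person has co-authored with the largest number of distinct people in the network.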

In [3]:
degree_centrality = nx.degree_centrality(G)
dc_data = pd.DataFrame({"Name": list(degree_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in degree_centrality.keys()],
                        "DegreeCentrality": list(degree_centrality.values())
                        }).sort_values(by="DegreeCentrality", ascending=False).reset_index(drop=True)

dc_data.head(10)

Unnamed: 0,Name,Affiliation,DegreeCentrality
0,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.280528
1,"CHINES, LOREDANA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.148515
2,"DAQUINO, MARILENA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.135314
3,"TINTI, PAOLO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.128713
4,"PERONI, SILVIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.128713
5,"TAMBURINI, FABIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.128713
6,"ANSELMI, GIAN MARIO GIUSTO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.125413
7,"BARZAGHI, SEBASTIAN",DIPARTIMENTO DI BENI CULTURALI,0.108911
8,"MATTEUCCI, GIOVANNI",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.10231
9,"LUGLI, LUISA",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.10231
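The normalization can be checked on a small example. The sketch below uses a hypothetical four-node star graph (not the co-authorship data) to show that `nx.degree_centrality` is the raw degree divided by n - 1, and how to recover the raw neighbor count from it.

```python
import networkx as nx

# Toy star graph: node 0 is linked to nodes 1, 2 and 3 (illustrative data only).
toy = nx.star_graph(3)
n = toy.number_of_nodes()

dc = nx.degree_centrality(toy)

# Multiplying by (n - 1) gives back the raw number of neighbors.
raw_degree = {node: round(value * (n - 1)) for node, value in dc.items()}

print(dc)
print(raw_degree)
```

The same multiplication applied to the table above would turn each centrality score into a number of distinct co-authors.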


### Betweenness Centrality

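Betweenness Centrality:
High values indicate that a node lies on many shortest paths between other nodes and so acts as a bridge in the network.
"BAZZOCCHI, MARCO ANTONIO" has the highest weighted betweenness centrality (0.390251), closely followed by "TOMASI, FRANCESCA" (0.370793). Removing either of them would lengthen or break many shortest paths between other authors. Because weight="weight" is passed, NetworkX reads each edge weight as a distance when it searches for shortest paths.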


In [4]:
weighted_betweenness_centrality = nx.betweenness_centrality(G, weight="weight")

wbc_data = pd.DataFrame({"Name": list(weighted_betweenness_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_betweenness_centrality.keys()],
                        "WeightedBetweennessCentrality": list(weighted_betweenness_centrality.values())
                        }).sort_values(by="WeightedBetweennessCentrality", ascending=False).reset_index(drop=True)

wbc_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedBetweennessCentrality
0,"BAZZOCCHI, MARCO ANTONIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.390251
1,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.370793
2,"MATTEUCCI, GIOVANNI",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.327468
3,"SPAZIANTE, LUCIO",DIPARTIMENTO DELLE ARTI,0.310318
4,"PASQUINI, EMILIO",DIP. DI ITALIANISTICA,0.300467
5,"ITALIA, PAOLA MARIA CARMELA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.212102
6,"ANSELMI, GIAN MARIO GIUSTO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.189318
7,"SCOROLLI, CLAUDIA",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.177137
8,"VITALI, FABIO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.153469
9,"TAMBURINI, FABIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.126449
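A node can broker many paths without having many neighbors, which is why this ranking differs from the degree ranking. The sketch below is a toy illustration with made-up node names: two triangles share a single node, and only that node carries any betweenness.

```python
import networkx as nx

# Two triangles that share the node "hub" (hypothetical names, not real authors).
toy = nx.Graph()
toy.add_edges_from([
    ("a", "b"), ("a", "hub"), ("b", "hub"),
    ("c", "d"), ("c", "hub"), ("d", "hub"),
])

bc = nx.betweenness_centrality(toy)

# Every shortest path between {a, b} and {c, d} passes through "hub":
# 4 of the 6 possible pairs among the other nodes.
for node, value in sorted(bc.items(), key=lambda item: item[1], reverse=True):
    print(node, round(value, 3))
```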


### Closeness Centrality

Without Edge Weights:
TOMASI, FRANCESCA (0.420055): The highest unweighted closeness centrality. On average, this node is fewer hops away from the other nodes than anyone else in the network.

PERONI, SILVIO (0.394565): The second-highest closeness centrality, with the same interpretation.

Low Closeness Centralities: Nodes like PICCININI, ALESSANDRO; IOVINE, GIULIO; CAPUCCINO, CARLOTTA; LILLO, FABRIZIO; and TRAPIN, LUCA have relatively low closeness centrality. They sit far from most other nodes.

With Edge Weights:
ANSELMI, GIAN MARIO GIUSTO (0.190119): The highest weighted closeness centrality, followed by ITALIA, PAOLA MARIA CARMELA (0.188707) and PASQUINI, EMILIO (0.18694).

TOMASI, FRANCESCA (0.18019): Drops from first to fourth place once edge weights are counted as distances, which means the weighted paths from this node to the others are comparatively costly.

Why it Changes:
Edge Weights Influence Path Selection:

nx.closeness_centrality(G, distance="weight") sums the edge weights along each path. A path with more hops can therefore count as the "shortest" if its total weight is lower, which changes the closeness values.
Higher Edge Weights Increase Distance:

A high weight pushes two nodes apart: it adds more to the total length of every path that uses that edge.
Connection Strength Matters:

This reading only matches tie strength if a lower weight means a stronger connection. If the weights in this graph count co-authored publications, frequent collaborators end up farther apart, and the weights should be inverted (for example 1 / weight) before computing closeness.
Impact on Spread of Information:

Closeness centrality measures how quickly information can spread from a node. With weights interpreted as costs, weighted closeness reflects how cheaply a node can reach everyone else.
In essence, edge weights replace hop counts with traversal costs, and this can change the ranking of nodes by closeness centrality.
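The effect of reading a weight as a distance can be seen on three nodes. The sketch below assumes, purely for illustration, that the weight counts co-authored papers (the attribute names `papers` and `distance` are hypothetical); it compares closeness computed on the raw counts with closeness computed on their reciprocals.

```python
import networkx as nx

# Hypothetical toy data: A and B wrote 10 papers together, C wrote 1 with each.
toy = nx.Graph()
toy.add_edge("A", "B", papers=10)
toy.add_edge("A", "C", papers=1)
toy.add_edge("B", "C", papers=1)

# One possible conversion from tie strength to distance: the reciprocal.
for u, v, data in toy.edges(data=True):
    data["distance"] = 1 / data["papers"]

# Raw counts used as distances: the strong A-B tie looks like a long edge.
raw = nx.closeness_centrality(toy, distance="papers")

# Reciprocal counts used as distances: the strong A-B tie is a short edge.
inverted = nx.closeness_centrality(toy, distance="distance")

print("raw counts as distance:", raw)
print("1 / count as distance: ", inverted)
```

With raw counts the occasional collaborator C comes out as the most central node; with inverted counts A and B do. Which reading is right depends on what the `weight` attribute in the GML file actually encodes.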

In [5]:
weighted_closeness_centrality = nx.closeness_centrality(G, distance="weight")

wcc_data = pd.DataFrame({"Name": list(weighted_closeness_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_closeness_centrality.keys()],
                        "WeightedClosenessCentrality": list(weighted_closeness_centrality.values())
                        }).sort_values(by="WeightedClosenessCentrality", ascending=False).reset_index(drop=True)

wcc_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedClosenessCentrality
0,"ANSELMI, GIAN MARIO GIUSTO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.190119
1,"ITALIA, PAOLA MARIA CARMELA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.188707
2,"PASQUINI, EMILIO",DIP. DI ITALIANISTICA,0.18694
3,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.18019
4,"BAZZOCCHI, MARCO ANTONIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.178351
5,"VITALI, FABIO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.17767
6,"CHINES, LOREDANA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.176548
7,"STOPPELLI, PASQUALE",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.174563
8,"BORDALEJO, BARBARA",,0.174563
9,"RICO MANRIQUE, FRANCISCO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.174563


### Eigenvector centrality

Eigenvector Centrality:
Considers both the number and the importance of a node's neighbors.
"TOMASI, FRANCESCA" has the highest weighted eigenvector centrality (0.338949), implying that the people connected to this person are themselves well-connected. The next places go to CONDELLO, CITTI, PIERI, NERI, PASETTI and ZIOSI, whose nearly identical scores suggest that they belong to one tightly knit group.

Why it Changes:
Edge Weights Influence Strength:

nx.eigenvector_centrality reads the weight as connection strength, not as a distance: a neighbor linked by a higher weight passes on a larger share of its own centrality.
Weighted Ties Matter:

A node gains the most when it is connected to already central nodes through high-weight edges. Low-weight edges contribute little.
Higher Eigenvector Centrality for Some Nodes:

Nodes like PERONI, SILVIO and VITALI, FABIO have higher eigenvector centrality with weights than without, which points to strong connections to other central nodes, although neither reaches the weighted top ten.
Consistently Low Centrality:

Nodes like PICCININI, ALESSANDRO; CAPUCCINO, CARLOTTA; IOVINE, GIULIO; TRAPIN, LUCA; and LILLO, FABRIZIO have extremely low eigenvector centrality with or without weights. Their connections are few, weak, or attached to peripheral nodes.
Interpretation:
Eigenvector centrality with weights reflects not only the structure of the network but also the strength of connections.
Nodes with higher eigenvector centrality in the weighted network are influential not just due to their connectivity but also because of the strength of their connections.
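How a single heavy edge shifts the ranking can be sketched on four nodes. The graph and its weights below are invented for illustration; the only point is that `nx.eigenvector_centrality` rewards high weights.

```python
import networkx as nx

# Toy graph (hypothetical): a triangle A-B-C plus a pendant node D attached to C.
toy = nx.Graph()
toy.add_edge("A", "B", weight=5)  # one strong tie
toy.add_edge("A", "C", weight=1)
toy.add_edge("B", "C", weight=1)
toy.add_edge("C", "D", weight=1)

# Without weights, C wins because it has the most neighbors.
unweighted = nx.eigenvector_centrality(toy)

# With weights, the strong A-B tie lifts A and B above C.
weighted = nx.eigenvector_centrality(toy, weight="weight")

print("unweighted top node:", max(unweighted, key=unweighted.get))
print("weighted scores:", {k: round(v, 3) for k, v in weighted.items()})
```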

In [6]:
weighted_eigenvector_centrality = nx.eigenvector_centrality(G, weight="weight")

wec_data = pd.DataFrame({"Name": list(weighted_eigenvector_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_eigenvector_centrality.keys()],
                        "WeightedEigenvectorCentrality": list(weighted_eigenvector_centrality.values())
                        }).sort_values(by="WeightedEigenvectorCentrality", ascending=False).reset_index(drop=True)

wec_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedEigenvectorCentrality
0,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.338949
1,"CONDELLO, FEDERICO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.329643
2,"CITTI, FRANCESCO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.32656
3,"PIERI, BRUNA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.324067
4,"NERI, CAMILLO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.324067
5,"PASETTI, LUCIA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.324067
6,"ZIOSI, ANTONIO",DIPARTIMENTO DI BENI CULTURALI,0.324067
7,"DIONIGI, IVANO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.28168
8,"PELLACANI, DANIELE",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.187491
9,"DAL CHIELE, ELISA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.187491


---

# Other measures

### Clustering

In [7]:
clustering = nx.clustering(G, weight="weight")
clustering_data = pd.DataFrame.from_dict(clustering, 
                                columns=["Clustering"],
                                orient="index")
clustering_data.sort_values(by=["Clustering"], ascending=False)

Unnamed: 0,Clustering
"CITTI, FRANCESCO",0.261208
"PASETTI, LUCIA",0.260227
"ZIOSI, ANTONIO",0.260227
"NERI, CAMILLO",0.260227
"PIERI, BRUNA",0.260227
...,...
"BOOIJ, GEERT",0.000000
"DI TELLA, ALESSANDRA",0.000000
"MATTIOLA, SIMONE",0.000000
"SPADINI, ELENA",0.000000



Without Edge Weights:
REFORGIATO RECUPERO, DIEGO ANGELO: Clustering Coefficient = 1.0
STOPPELLI, PASQUALE: Clustering Coefficient = 1.0
GIACOMINI, FEDERICA: Clustering Coefficient = 1.0
MARTINEZ PANDIANI, DELFINA SOL: Clustering Coefficient = 1.0
BOLOGNESI, MARIANNA MARCELLA: Clustering Coefficient = 1.0
...
GODART, FREDERIC: Clustering Coefficient = 0.0
CAINI, CARLO: Clustering Coefficient = 0.0
MASTRONARDO, CLAUDIO: Clustering Coefficient = 0.0
GIULIANI, ANTONIO: Clustering Coefficient = 0.0
DE VIVO, MANUELA: Clustering Coefficient = 0.0
With Edge Weights:
CITTI, FRANCESCO: Clustering Coefficient = 0.261208
PASETTI, LUCIA: Clustering Coefficient = 0.260227
ZIOSI, ANTONIO: Clustering Coefficient = 0.260227
NERI, CAMILLO: Clustering Coefficient = 0.260227
PIERI, BRUNA: Clustering Coefficient = 0.260227
...
BOOIJ, GEERT: Clustering Coefficient = 0.0
DI TELLA, ALESSANDRA: Clustering Coefficient = 0.0
MATTIOLA, SIMONE: Clustering Coefficient = 0.0
SPADINI, ELENA: Clustering Coefficient = 0.0
Observations:
Node-Specific Clustering:

Without edge weights, certain nodes have a clustering coefficient of 1.0, indicating that all of their neighbors are connected to each other.
With edge weights, the highest value is 0.261208, and the coefficients are spread over a much narrower range.
Impact of Edge Weights:

Nodes like CITTI, FRANCESCO and PASETTI, LUCIA have the highest weighted coefficients: their neighbors are connected to each other, and the edges of those triangles carry comparatively high weights.
Nodes with Zero Clustering:

Nodes like BOOIJ, GEERT; DI TELLA, ALESSANDRA; MATTIOLA, SIMONE; and SPADINI, ELENA have a weighted clustering coefficient of zero. They take part in no triangle, either because none of their neighbors are connected to each other or because they have fewer than two neighbors.
Change in Magnitude:

The weighted coefficient can never exceed the unweighted one. NetworkX scales every weight by the largest weight in the graph and uses the geometric mean of the three scaled weights of each triangle, so a triangle counts fully only when all three of its edges carry the maximum weight.
In summary, edge weights change clustering coefficients because each triangle is discounted according to the strength of its edges. Nodes whose triangles are built from high-weight edges keep a high coefficient, while nodes whose triangles contain weak ties drop. The size of the effect depends on how the weights are distributed across the network.
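A single triangle is enough to see how the weighted coefficient is discounted. The sketch below uses invented weights; it relies only on `nx.clustering`, which (per the NetworkX documentation) normalizes weights by the maximum weight and takes the geometric mean over each triangle.

```python
import networkx as nx

# Toy triangle (hypothetical weights): one strong edge, two weak ones.
toy = nx.Graph()
toy.add_edge("A", "B", weight=4)
toy.add_edge("A", "C", weight=1)
toy.add_edge("B", "C", weight=1)

unweighted = nx.clustering(toy)
weighted = nx.clustering(toy, weight="weight")

# Scaled weights are 1, 0.25 and 0.25, so the geometric mean is
# (1 * 0.25 * 0.25) ** (1 / 3), well below the unweighted value of 1.0.
expected = (1 * 0.25 * 0.25) ** (1 / 3)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
print("expected:  ", expected)
```

If all three edges had the same weight, the weighted and unweighted coefficients would coincide.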

In [8]:
avg_cohesion = nx.average_clustering(G, weight="weight")
print("Cohesion: ", avg_cohesion)

Cohesion:  0.03668116978938404


The result avg_cohesion = 0.03668116978938404 is the average clustering coefficient of the graph G when edge weights are taken into account (weight="weight").
Clustering Coefficient:
The clustering coefficient measures the tendency of a node's neighbors to be connected to each other.

Average Clustering Coefficient:
The average clustering coefficient is the mean of the local clustering coefficients across all nodes in the graph.

It ranges from 0 to 1, where:
0 indicates no clustering (no node has two neighbors that are connected to each other).
1 indicates maximum clustering (all neighbors of every node are connected to each other).

Interpretation:
Here the average weighted clustering coefficient is approximately 0.0367.
This is a low value, but it does not mean that triangles are rare: the unweighted transitivity computed later in this notebook is about 0.726.
The weighted coefficient discounts each triangle by its edge weights relative to the largest weight in the graph, so the low average mostly reflects that most ties are much weaker than the strongest one.

Implications:
Local clusters exist, but few of them are held together by strong ties.
Comparing the weighted value with the unweighted average clustering coefficient separates the effect of structure from the effect of tie strength.

Considerations:
The interpretation depends on what the edge weights represent.
In a co-authorship network, a low weighted coefficient is expected when a few pairs of authors collaborate far more often than everyone else.

In [9]:
num_connected_components = nx.number_connected_components(G)
print("Connectedness: ", num_connected_components)

Connectedness:  4


Number of Connected Components:
The value 4 indicates that the graph has 4 connected components. A connected component is a maximal subgraph in which a path exists between any two nodes. Having multiple connected components means that some groups of authors have no co-authorship path to the rest of the network.
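The count alone does not say how large each component is. A minimal sketch on a hypothetical six-node graph shows how the component sizes and the largest component could be extracted with `nx.connected_components`.

```python
import networkx as nx

# Toy graph (illustrative): a path 1-2-3, a separate pair 4-5, and an isolated node 6.
toy = nx.Graph([(1, 2), (2, 3), (4, 5)])
toy.add_node(6)

# Sort components from largest to smallest.
components = sorted(nx.connected_components(toy), key=len, reverse=True)
sizes = [len(component) for component in components]

# Subgraph induced by the largest component, often the one worth analysing further.
giant = toy.subgraph(components[0]).copy()

print("component sizes:", sizes)
print("largest component nodes:", sorted(giant.nodes()))
```

Path-based measures such as closeness are easier to interpret when they are computed on the largest component only.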

In [10]:
def calc_compactness(graph):
    """Mean reciprocal shortest-path length over ordered pairs of distinct, mutually reachable nodes."""
    # Unweighted hop counts; only reachable targets appear in each inner dict.
    shortest_path_lengths = dict(nx.all_pairs_shortest_path_length(graph))

    total_compactness = 0
    total_pairs = 0

    for source, lengths in shortest_path_lengths.items():
        for target, distance in lengths.items():
            if source != target:
                total_compactness += 1 / distance
                total_pairs += 1

    if total_pairs == 0:
        return 0  # Avoid division by zero

    return total_compactness / total_pairs

compactness = calc_compactness(G)
print("Compactness: ", compactness)

Compactness:  0.28659051084077036


Compactness:
The computed compactness of the graph is 0.28659051084077036. Compactness is calculated here as the mean reciprocal of the geodesic distance over all pairs of distinct nodes that can reach each other. Pairs that lie in different connected components are skipped.
The compactness value lies between 0 and 1. Higher values (closer to 1) indicate that nodes are easily reachable from each other through short paths. Lower values (closer to 0) indicate longer or more circuitous paths. A value of approximately 0.287 corresponds to a harmonic mean distance of about 3.5 hops (1 / 0.287) between connected pairs, which is a moderate level of compactness.
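Because the graph is disconnected, the choice of denominator matters. The sketch below contrasts a reachable-pairs-only average (a compact restatement of the function above, under the hypothetical name `reachable_pair_compactness`) with `nx.global_efficiency`, which divides by all n(n - 1) pairs so that unreachable pairs count as zero.

```python
import networkx as nx

def reachable_pair_compactness(graph):
    """Mean of 1/d over ordered pairs of distinct nodes that can reach each other."""
    total, pairs = 0.0, 0
    for source, lengths in nx.all_pairs_shortest_path_length(graph):
        for target, distance in lengths.items():
            if source != target:
                total += 1 / distance
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy graph (illustrative): a path 0-1-2 and a separate edge 3-4.
toy = nx.Graph([(0, 1), (1, 2), (3, 4)])

reachable_only = reachable_pair_compactness(toy)
all_pairs = nx.global_efficiency(toy)

print("reachable pairs only:", reachable_only)
print("all pairs (global efficiency):", all_pairs)
```

On this toy graph the first value is much higher than the second, because the pairs that cannot reach each other are ignored rather than penalized.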

In [11]:
transitivity = nx.transitivity(G)
print("Transitivity: ", transitivity)

Transitivity:  0.7261015683345781


Transitivity:
A transitivity value of 0.726 indicates a high level of clustering: about 73% of the connected triples in the graph are closed into triangles. In simpler terms, if node A is connected to both node B and node C, there is a high likelihood that B and C are also directly connected to each other. The transitivity value ranges from 0 to 1, where 0 indicates no transitivity (no triangles), and 1 indicates maximum transitivity (every connected triple is a triangle). Note that transitivity ignores edge weights, which is why it is so much higher than the weighted average clustering coefficient reported above.
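Transitivity and the average clustering coefficient both measure triangle density but weight nodes differently. The sketch below uses a hypothetical four-node graph to show the two numbers side by side.

```python
import networkx as nx

# Toy graph (illustrative): a triangle A-B-C plus a pendant node D attached to C.
toy = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

# Transitivity: 3 * triangles / connected triples = 3 * 1 / 5.
transitivity = nx.transitivity(toy)

# Average clustering: mean of the per-node coefficients (1, 1, 1/3, 0).
average_clustering = nx.average_clustering(toy)

print("transitivity:", transitivity)
print("average clustering:", average_clustering)
```

Transitivity gives more influence to high-degree nodes, while the average clustering coefficient treats every node equally.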

In [12]:
core_number = nx.core_number(G)
k_data = pd.DataFrame.from_dict(core_number,
                                    columns=["KCore"],
                                    orient="index")
k_data.sort_values(by=["KCore"], ascending=False)

Unnamed: 0,KCore
"VITTUARI, LUCA",30
"PERONI, SILVIO",30
"FANTINI, FILIPPO",30
"FANINI, BRUNO",30
"COLITTI, SIMONA",30
...,...
"MIZZAU, MARINA",1
"GENSINI, NICCOLO'",1
"DI PIETRO, IRENE",1
"PIGOZZI, MARINELLA",1


High Core Numbers:
Nodes like "VITTUARI, LUCA," "PERONI, SILVIO," "FANTINI, FILIPPO," etc., have a core number of 30. They belong to the 30-core: a maximal subgraph in which every node has at least 30 neighbors inside the subgraph.

Uniform Core Numbers:
The fact that many nodes share the same core number (30) points to a densely connected region in the network. A 30-core needs at least 31 members, so this is a sizeable, well-connected subgroup.

Low Core Numbers:
Nodes like "MIZZAU, MARINA," "GENSINI, NICCOLO'," "DI PIETRO, IRENE," etc., have a core number of 1. These nodes are less connected and likely part of the periphery of the network.

Network Structure:
The presence of both high and low core numbers suggests a core-periphery structure. The nodes with high core numbers form a cohesive core, while nodes with low core numbers are on the outskirts or in less-connected regions.

Centrality and Importance:
Nodes with high core numbers are embedded in a dense group, which usually goes along with high degree. A high core number does not by itself mean that a node connects different parts of the network; that role is captured by betweenness centrality.
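The definition of a k-core can be illustrated on a small graph. The sketch below uses a hypothetical five-node example with `nx.core_number` and `nx.k_core`.

```python
import networkx as nx

# Toy graph (illustrative): a 4-clique on nodes 0..3 plus a pendant node 4 attached to 0.
toy = nx.complete_graph(4)
toy.add_edge(0, 4)

core = nx.core_number(toy)

# The 3-core keeps only nodes with at least 3 neighbors inside the subgraph.
three_core = nx.k_core(toy, k=3)

print("core numbers:", core)
print("3-core nodes:", sorted(three_core.nodes()))
```

Node 0 has the highest degree (4) but the same core number as the other clique members, because its extra neighbor is peripheral.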

In [16]:
communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
community_mapping = {}
for i, community in enumerate(communities):
    for node in community:
        community_mapping[node] = i
        

c_data = pd.DataFrame.from_dict(community_mapping,
                                    columns=["Communities"],
                                    orient="index")
c_data.sort_values(by=["Communities"], ascending=False)

Unnamed: 0,Communities
"CARMASSI, PATRIZIA",12
"VENTURA, IOLANDA",12
"FINTONI, LAURENT ANTOINE",11
"TRIPODI, SILVIA",11
"SABBA, FIAMMETTA",11
...,...
"TINTI, PAOLO",0
"CHINES, LOREDANA",0
"LARUCCIA, ROSAMARIA ISABELLA",0
"PAOLINI, LORENZO",0


Greedy Modularity Communities:

Community Assignment:
The algorithm groups nodes into communities by greedily maximizing modularity, a score of how good a partition is.
At each step it merges the pair of communities whose merge yields the largest increase in modularity.

Community Mapping:
The result above maps each node to the numeric label of its community. Labels run from 0 to 12, so 13 communities were found.
Interpreting the Results:

Community Labels:
Nodes with the same label (e.g., 11) belong to the same community.

Community Sizes:
Communities differ in size. greedy_modularity_communities returns them sorted from largest to smallest, so label 0 is the largest community and label 12 the smallest.
Modularity and Weighted Graphs:

Modularity Definition:
Modularity measures the quality of a network's division into communities. Higher modularity values indicate more edge weight inside communities than expected by chance.

Weighted Graphs:
When the graph is weighted, modularity takes into account both the presence and the strength of edges.

Why Weight Changes the Result:
Edge Strength Influence: A heavy edge counts more toward modularity than a light one, so it has more influence on which communities are merged.
Community Formation: Nodes that share high-weight edges are more likely to end up in the same community.
Optimization Objective: Each merge is chosen by its gain in weighted modularity, so the merge order, and with it the final partition, can differ from the unweighted case.

Greedy Modularity Algorithm:
Basic Idea:
The algorithm (Clauset-Newman-Moore) builds communities bottom-up by repeatedly merging them. It never splits a community or moves a single node.

Steps:
Start with each node in its own community.
Merge the pair of communities that increases modularity the most.
Repeat until no merge improves modularity.

Considerations:
The choice of community detection algorithm depends on the specific characteristics and goals of the network.
Because the procedure is greedy, the partition is a local optimum, and the community assignments should be checked against domain knowledge such as departmental affiliation.

In short, the nodes are grouped into 13 communities of varying size, and the edge weights shape which groups are merged during the modularity optimization.
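The ordering of the labels and the modularity of the result can be checked on a small graph. The sketch below uses an invented graph of two cliques joined by a single edge; it relies on `greedy_modularity_communities` and `modularity` from `networkx.algorithms.community`.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy graph (illustrative): a 5-clique on nodes 0..4, a triangle 5-6-7, and a bridge 4-5.
toy = nx.complete_graph(5)
toy.add_edges_from([(5, 6), (5, 7), (6, 7), (4, 5)])

communities = greedy_modularity_communities(toy)

# Map every node to the index of its community; index 0 is the largest community.
community_mapping = {
    node: label for label, community in enumerate(communities) for node in community
}

# Modularity of the partition that was found.
score = modularity(toy, communities)

print("sizes by label:", [len(community) for community in communities])
print("mapping:", community_mapping)
print("modularity:", round(score, 4))
```

Passing `weight="weight"` to both functions would give the weighted variant used in the cell above.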

In [14]:
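def homophily(G):
    """Share of edges whose two endpoints have the same 'affiliation' attribute."""
    num_same_ties = 0
    num_diff_ties = 0
    for n1, n2 in G.edges():
        if G.nodes[n1]['affiliation'] == G.nodes[n2]['affiliation']:
            num_same_ties += 1
        else:
            num_diff_ties += 1
    total_ties = num_same_ties + num_diff_ties
    return num_same_ties / total_ties if total_ties else 0  # Avoid division by zero
homophily(G)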

0.3519369665134603

A homophily value of 0.3519369665134603 means that about 35% of the edges connect two authors with the same affiliation. Homophily refers to the tendency of nodes with similar characteristics to be connected to each other in a network. The value computed here ranges from 0 to 1, where:
0 indicates that no edge connects two nodes with the same affiliation.
1 indicates that every edge connects two nodes with the same affiliation.
Whether 0.35 counts as strong homophily depends on how many departments there are and how large they are, because this raw share is not corrected for the number of same-affiliation ties expected by chance.
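One way to put the raw share into context is an assortativity coefficient, which compares the observed mixing with what random mixing would give. The sketch below uses a hypothetical four-node graph and `nx.attribute_assortativity_coefficient`; it is an alternative measure, not the one used in this notebook.

```python
import networkx as nx

# Toy graph (illustrative): two departments, X and Y, with two members each.
toy = nx.Graph()
toy.add_nodes_from(["a1", "a2"], affiliation="X")
toy.add_nodes_from(["b1", "b2"], affiliation="Y")
toy.add_edges_from([("a1", "a2"), ("b1", "b2"), ("a1", "b1")])

# Raw share of same-affiliation edges, as in homophily(G).
same = sum(
    1 for u, v in toy.edges()
    if toy.nodes[u]["affiliation"] == toy.nodes[v]["affiliation"]
)
share = same / toy.number_of_edges()

# Assortativity: 1 for perfectly assortative mixing, about 0 for random mixing.
assortativity = nx.attribute_assortativity_coefficient(toy, "affiliation")

print("same-affiliation share:", round(share, 3))
print("assortativity:", round(assortativity, 3))
```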

In [15]:
prof_list = ['PERONI, SILVIO', 'TOMASI, FRANCESCA', 'VITALI, FABIO', 'PESCARIN, SOFIA', 'GANGEMI, ALDO', 'ITALIA, PAOLA MARIA CARMELA', 'TAMBURINI, FABIO', 'DAQUINO, MARILENA', 'GIALLORENZO, SAVERIO', 'ZUFFRANO, ANNAFELICIA', 'IOVINE, GIULIO', 'BARTOLINI, ILARIA', 'SPEDICATO, GIORGIO', 'PALMIRANI, MONICA', 'BASKAKOVA, EKATERINA', 'FERRIANI, SIMONE']
def affiliation_homophily(G, nodes):
    """Per node: share of neighbors whose affiliation differs from the node's own."""
    data = []
    for node in nodes:
        # Names missing from the graph (e.g. 'GIALLORENZO, SAVERIO') raised a KeyError; skip them.
        if node not in G:
            continue
        affiliation = G.nodes[node]['affiliation']
        neighbors = list(G.neighbors(node))
        total_connections = len(neighbors)
        connections_outside_affiliation = sum(1 for neighbor in neighbors if G.nodes[neighbor]['affiliation'] != affiliation)

        ratio = 0 if total_connections == 0 else round(connections_outside_affiliation / total_connections, 2)

        data.append([node, connections_outside_affiliation, affiliation, total_connections, ratio])

    df = pd.DataFrame(data, columns=['name', 'connections_outside_affiliation', 'affiliation', 'total_connections', 'ratio'])
    return df.sort_values(by=['ratio'], ascending=False)
print(affiliation_homophily(G, prof_list))

Individual Professor Analysis:
The ratio column is the share of a professor's connections that fall outside their own department, so a higher ratio means less homophily.

Professors such as PESCARIN, SOFIA; GANGEMI, ALDO; PERONI, SILVIO; and BARTOLINI, ILARIA show comparatively low homophily.

PESCARIN, SOFIA, for instance, has a ratio of 0.29, indicating that about 29% of her connections are outside her own department.
GANGEMI, ALDO and PERONI, SILVIO likewise have a diverse set of connections across different departments.
Professors like BASKAKOVA, EKATERINA; FERRIANI, SIMONE; and SPEDICATO, GIORGIO have a ratio of 0.0, indicating that all their connections are within their own department.

These three professors connect exclusively with colleagues from the same department.

Departmental Affiliations:
Departmental affiliation shapes the ratios: professors with a low ratio draw most of their co-authors from their own department.
Homophily Variability:

The ratios vary from professor to professor, so homophily is not uniform across the network.

Impact of Affiliation:
Affiliation has a visible impact on the connectivity patterns within the network.
Professors from certain departments may be more likely to form connections within their own department.
Network Diversity:

The network exhibits a certain degree of diversity, with some professors connecting across departments and others connecting only within their own department.