In [20]:
import json
import pandas as pd
import numpy as np
import networkx as nx
from pprint import pprint
import matplotlib.pyplot as plt

In [21]:
target = "dhdk"
G = nx.read_gml("../data/{0}/{0}_coauthorship_network.gml".format(target))
node_attributes = nx.get_node_attributes(G, "affiliation")


---

# Centrality measures

### Degree Centrality

Degree Centrality:
Reflects the number of direct connections a node has. NetworkX normalizes the count by n - 1, the maximum possible degree, so values lie between 0 and 1.
"TOMASI, FRANCESCA" has the highest degree centrality (0.214106), meaning this author is directly connected to about a fifth of all other nodes in the network.

Observations:
Degree centrality identifies the nodes with the most direct connections.
It is a purely local measure: it counts a node's neighbors and says nothing about how those neighbors are connected to the rest of the network.
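As a small illustration of the normalization, the sketch below compares a manual computation with nx.degree_centrality on a toy graph. The graph and its node names are hypothetical and not taken from the dataset.

```python
import networkx as nx

# Toy co-authorship graph (hypothetical authors, not from the dataset)
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

n = toy.number_of_nodes()

# Manual computation: degree divided by the maximum possible degree, n - 1
manual = {node: degree / (n - 1) for node, degree in toy.degree()}
built_in = nx.degree_centrality(toy)

for node in sorted(toy):
    print(node, round(manual[node], 3), round(built_in[node], 3))
```

Node A is adjacent to all three other nodes, so its normalized degree is the maximum possible value.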

In [22]:
degree_centrality = nx.degree_centrality(G)
dc_data = pd.DataFrame({"Name": list(degree_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in degree_centrality.keys()],
                        "DegreeCentrality": list(degree_centrality.values())
                        }).sort_values(by="DegreeCentrality", ascending=False).reset_index(drop=True)

dc_data.head(10)

Unnamed: 0,Name,Affiliation,DegreeCentrality
0,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.214106
1,"PERONI, SILVIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.188917
2,"MILANO, MICHELA",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.166247
3,"VITALI, FABIO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.138539
4,"BARTOLINI, ILARIA",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.123426
5,"DAQUINO, MARILENA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.120907
6,"GANGEMI, ALDO",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.108312
7,"PALMIRANI, MONICA",DIPARTIMENTO DI SCIENZE GIURIDICHE,0.100756
8,"TAMBURINI, FABIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.098237
9,"PATELLA, MARCO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.085642


### Weighted Betweenness Centrality

Weighted Betweenness Centrality:
Measures how often a node lies on the shortest weighted paths between other pairs of nodes. With weight="weight", NetworkX treats the edge weight as a distance when searching for shortest paths, so low-weight edges are preferred.
"TOMASI, FRANCESCA" has the highest weighted betweenness centrality (0.365687), indicating that a large share of shortest weighted paths between other nodes pass through this node.

Observations:
Weighted betweenness centrality identifies nodes that act as bridges along the weighted shortest paths in the network.
The ranking differs from the degree ranking: "BARTOLINI, ILARIA" is second here (0.310429) but fifth by degree centrality, so bridging position and number of connections are distinct properties.
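The effect of treating weights as distances can be seen on a three-node toy graph. This is a minimal sketch with made-up nodes and weights, not a reproduction of the notebook's data.

```python
import networkx as nx

# Triangle in which the direct edge A-B is "long" (weight 5)
toy = nx.Graph()
toy.add_edge("A", "B", weight=5)
toy.add_edge("A", "C", weight=1)
toy.add_edge("C", "B", weight=1)

# Unweighted: A and B are adjacent, so no shortest path needs C
unweighted = nx.betweenness_centrality(toy)

# Weighted: the path A-C-B has total weight 2 < 5, so C lies between A and B
weighted = nx.betweenness_centrality(toy, weight="weight")

print("C unweighted:", unweighted["C"])
print("C weighted:  ", weighted["C"])
```

If the weight in the real graph encodes tie strength rather than cost, the same mechanism makes strong ties look like long detours, which is worth checking before interpreting the ranking.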

In [23]:
weighted_betweenness_centrality = nx.betweenness_centrality(G, weight="weight")

wbc_data = pd.DataFrame({"Name": list(weighted_betweenness_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_betweenness_centrality.keys()],
                        "WeightedBetweennessCentrality": list(weighted_betweenness_centrality.values())
                        }).sort_values(by="WeightedBetweennessCentrality", ascending=False).reset_index(drop=True)

wbc_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedBetweennessCentrality
0,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.365687
1,"BARTOLINI, ILARIA",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.310429
2,"MILANO, MICHELA",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.157211
3,"DUCA, SILVIA","APPC - AREA PIANIFICAZIONE, PROGRAMMAZIONE E C...",0.138348
4,"PRESUTTI, VALENTINA","DIPARTIMENTO DI LINGUE, LETTERATURE E CULTURE ...",0.123821
5,"VIALE, MATTEO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.111866
6,"VITALI, FABIO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.105919
7,"PERONI, SILVIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.097948
8,"NISSIM, MALVINA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.093904
9,"SOLMI, RICCARDO",,0.09294


### Closeness Centrality

Without Edge Weights:
TOMASI, FRANCESCA (0.420055): This node has the highest closeness centrality. It is, on average, fewer hops away from the other nodes than any other node.

PERONI, SILVIO (0.394565): The second-highest closeness centrality, with the same interpretation.

Low Closeness Centralities: Nodes like PICCININI, ALESSANDRO; IOVINE, GIULIO; CAPUCCINO, CARLOTTA; LILLO, FABRIZIO; and TRAPIN, LUCA have relatively low closeness centrality. They are far from, or cannot reach, most other nodes.

With Edge Weights:
TOMASI, FRANCESCA (0.184239): Still ranked first, but the value drops from 0.420055 because path lengths are now sums of edge weights instead of hop counts.

DUCA, SILVIA (0.172527): Rises to second place when edge weights are considered, which means its weighted paths to other nodes are comparatively short.

PRESUTTI, VALENTINA (0.168067): Like DUCA, SILVIA, ranks higher once edge weights are considered.

Why it Changes:
Edge Weights Influence Path Selection:
With distance="weight", NetworkX treats each edge weight as a distance. A path with more hops can be "shorter" than a direct edge if its weights sum to less, so different shortest paths are selected and closeness centrality changes.

Higher Edge Weights Increase Distance:
A high-weight edge contributes more to the total path length, so nodes joined by high-weight edges are treated as further apart.

Connection Strength Matters:
This matches the intended reading only if a low weight means a strong tie. If the weight instead counts joint publications, a high weight marks a strong tie and should be converted to a distance (for example 1 / weight) before computing closeness.

Impact on Spread of Information:
Closeness centrality is often read as how quickly information can reach the rest of the network from a node. The weighted version supports that reading only when the weights are valid distances or costs.

In essence, edge weights replace hop counts with a cost for traversing each edge, and this can change the ranking of nodes by closeness centrality.
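Because the distance argument is interpreted as a path length, it matters whether the stored weight is a cost or a strength. The sketch below assumes, purely for illustration, that weight counts co-authored works (the notebook does not state this) and derives a hypothetical "distance" attribute as 1 / weight.

```python
import networkx as nx

# Toy graph: A and B share 4 works, B and C share 1 (assumed meaning of weight)
toy = nx.Graph()
toy.add_edge("A", "B", weight=4)
toy.add_edge("B", "C", weight=1)

# Hypothetical conversion from tie strength to distance
for u, v, data in toy.edges(data=True):
    data["distance"] = 1 / data["weight"]

# Weight used directly as a distance: the strong tie A-B looks long
weight_as_distance = nx.closeness_centrality(toy, distance="weight")

# Inverse weight used as a distance: the strong tie A-B looks short
inverse_as_distance = nx.closeness_centrality(toy, distance="distance")

for node in sorted(toy):
    print(node,
          round(weight_as_distance[node], 3),
          round(inverse_as_distance[node], 3))
```

In this toy case the order of A and C flips between the two readings, which is why the meaning of the weight attribute should be settled before comparing weighted and unweighted rankings.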

In [24]:
weighted_closeness_centrality = nx.closeness_centrality(G, distance="weight")

wcc_data = pd.DataFrame({"Name": list(weighted_closeness_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_closeness_centrality.keys()],
                        "WeightedClosenessCentrality": list(weighted_closeness_centrality.values())
                        }).sort_values(by="WeightedClosenessCentrality", ascending=False).reset_index(drop=True)

wcc_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedClosenessCentrality
0,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.184239
1,"DUCA, SILVIA","APPC - AREA PIANIFICAZIONE, PROGRAMMAZIONE E C...",0.172527
2,"PRESUTTI, VALENTINA","DIPARTIMENTO DI LINGUE, LETTERATURE E CULTURE ...",0.168067
3,"VIALE, MATTEO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.165659
4,"SOLMI, RICCARDO",,0.165133
5,"NISSIM, MALVINA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.161543
6,"VAYRA, MARIO",,0.161459
7,"BARTOLINI, ILARIA",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.160134
8,"CAPACI, BRUNO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.155274
9,"MODESTI, MADDALENA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.155043


### Eigenvector Centrality

Eigenvector Centrality:
Considers both the number and the importance of a node's neighbors.
In the weighted results below, "PERONI, SILVIO" (0.539376) and "VITALI, FABIO" (0.531057) have the highest eigenvector centrality, implying that the people connected to them are themselves well connected. "TOMASI, FRANCESCA", who leads the degree, betweenness, and closeness rankings, is fourth (0.279911).

Why it Changes:
Edge Weights Act as Strength:
Unlike the shortest-path measures, eigenvector centrality treats the edge weight as connection strength. A neighbor reached through a higher-weight edge contributes more to a node's score.

Weighted Neighbors Matter:
A node tied by high-weight edges to nodes that are themselves central receives a higher score.

Higher Eigenvector Centrality for Some Nodes:
Nodes like PERONI, SILVIO, VITALI, FABIO, and others have higher eigenvector centrality with weights. These nodes likely have strong connections to other central nodes.

Very Low Centrality for Others:
Nodes like PICCININI, ALESSANDRO; CAPUCCINO, CARLOTTA; IOVINE, GIULIO; TRAPIN, LUCA; and LILLO, FABRIZIO have extremely low eigenvector centrality values with or without weights. These nodes are likely peripheral or connected only to other low-scoring nodes.

Interpretation:
Eigenvector centrality with weights reflects not only the structure of the network but also the strength of connections.
Nodes with higher eigenvector centrality in the weighted network are influential not just due to their connectivity but also because of the strength of their connections.
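To illustrate that eigenvector centrality reads the weight as connection strength, the following sketch compares weighted and unweighted scores on a toy triangle. Node names and weights are invented for the example.

```python
import networkx as nx

# Triangle with one heavy edge between A and B
toy = nx.Graph()
toy.add_edge("A", "B", weight=5)
toy.add_edge("A", "C", weight=1)
toy.add_edge("B", "C", weight=1)

# Without weights the triangle is symmetric, so all scores are equal
unweighted = nx.eigenvector_centrality(toy, max_iter=1000)

# With weights, the heavy A-B edge raises the scores of A and B relative to C
weighted = nx.eigenvector_centrality(toy, max_iter=1000, weight="weight")

for node in sorted(toy):
    print(node, round(unweighted[node], 3), round(weighted[node], 3))
```

A higher weight increases the score here, the opposite of its effect in the shortest-path measures, where it lengthens paths.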

In [25]:
weighted_eigenvector_centrality = nx.eigenvector_centrality(G, weight="weight")

wec_data = pd.DataFrame({"Name": list(weighted_eigenvector_centrality.keys()),
                        "Affiliation": [node_attributes[node] for node in weighted_eigenvector_centrality.keys()],
                        "WeightedEigenvectorCentrality": list(weighted_eigenvector_centrality.values())
                        }).sort_values(by="WeightedEigenvectorCentrality", ascending=False).reset_index(drop=True)

wec_data.head(10)

Unnamed: 0,Name,Affiliation,WeightedEigenvectorCentrality
0,"PERONI, SILVIO",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.539376
1,"VITALI, FABIO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.531057
2,"DI IORIO, ANGELO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.310225
3,"TOMASI, FRANCESCA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.279911
4,"GANGEMI, ALDO",DIPARTIMENTO DI FILOSOFIA E COMUNICAZIONE,0.21112
5,"NUZZOLESE, ANDREA GIOVANNI",ARAG - AREA FINANZA E PARTECIPATE,0.200304
6,"DAQUINO, MARILENA",DIPARTIMENTO DI FILOLOGIA CLASSICA E ITALIANIS...,0.186261
7,"POGGI, FRANCESCO",DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGN...,0.158731
8,"PRESUTTI, VALENTINA","DIPARTIMENTO DI LINGUE, LETTERATURE E CULTURE ...",0.140422
9,"PALMIRANI, MONICA",DIPARTIMENTO DI SCIENZE GIURIDICHE,0.136665


---

# Other measures

### Clustering

In [26]:
clustering = nx.clustering(G, weight="weight")
clustering_data = pd.DataFrame.from_dict(clustering, 
                                columns=["Clustering"],
                                orient="index")
clustering_data.sort_values(by=["Clustering"], ascending=False)

Unnamed: 0,Clustering
"MAMBELLI, FRANCESCA",0.075970
"SCOPECE, FIORA",0.074737
"DIONIGI, IVANO",0.061562
"CITTI, FRANCESCO",0.060264
"ZIOSI, ANTONIO",0.059982
...,...
"GIANNINONI, RICCARDO",0.000000
"CAINI, CARLO",0.000000
"DAVOLI, RENZO",0.000000
"FALCHETTI, DENISE",0.000000



Without Edge Weights:
REFORGIATO RECUPERO, DIEGO ANGELO: Clustering Coefficient = 1.0
STOPPELLI, PASQUALE: Clustering Coefficient = 1.0
GIACOMINI, FEDERICA: Clustering Coefficient = 1.0
MARTINEZ PANDIANI, DELFINA SOL: Clustering Coefficient = 1.0
BOLOGNESI, MARIANNA MARCELLA: Clustering Coefficient = 1.0
...
GODART, FREDERIC: Clustering Coefficient = 0.0
CAINI, CARLO: Clustering Coefficient = 0.0
MASTRONARDO, CLAUDIO: Clustering Coefficient = 0.0
GIULIANI, ANTONIO: Clustering Coefficient = 0.0
DE VIVO, MANUELA: Clustering Coefficient = 0.0


With Edge Weights:
MAMBELLI, FRANCESCA: Clustering Coefficient = 0.075970
SCOPECE, FIORA: Clustering Coefficient = 0.074737
DIONIGI, IVANO: Clustering Coefficient = 0.061562
CITTI, FRANCESCO: Clustering Coefficient = 0.060264
ZIOSI, ANTONIO: Clustering Coefficient = 0.059982
...
SOTTARA, DAVIDE: Clustering Coefficient = 0.0
D'ANGELO, GABRIELE: Clustering Coefficient = 0.0
CAPUCCINO, CARLOTTA: Clustering Coefficient = 0.0
PERRETTI, FABRIZIO: Clustering Coefficient = 0.0
DE VIVO, MANUELA: Clustering Coefficient = 0.0

Observations:
Node-Specific Clustering:
Without edge weights, certain nodes have a clustering coefficient of 1.0, indicating that all neighbors of these nodes are connected to each other.
With edge weights, the coefficients are much smaller (the maximum is 0.075970) because NetworkX scales each triangle by the geometric mean of its edge weights, normalized by the largest weight in the graph.

Impact of Edge Weights:
Nodes like MAMBELLI, FRANCESCA and SCOPECE, FIORA have the highest weighted coefficients, indicating that the triangles around them carry comparatively heavy edges.

Nodes with Zero Clustering:
Nodes like SOTTARA, DAVIDE; D'ANGELO, GABRIELE; CAPUCCINO, CARLOTTA; PERRETTI, FABRIZIO; and DE VIVO, MANUELA have zero clustering coefficients with and without edge weights. Either they have fewer than two neighbors or none of their neighbors are connected to each other.

Change in Magnitude:
Weighted and unweighted coefficients are not on the same scale, so their magnitudes should not be compared directly.

In summary, the weighted clustering coefficient combines two things: whether a node's neighbors are connected to each other, and how heavy the edges of the resulting triangles are relative to the heaviest edge in the graph. Nodes whose triangles consist of high-weight edges have higher weighted coefficients. The impact of weights therefore depends on their distribution and on what they represent in this network.
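The scaling by the largest weight can be checked on a single triangle. The sketch below uses made-up weights; it is meant only to show why weighted coefficients can be far below the unweighted ones for the same structure.

```python
import networkx as nx

# One closed triangle with very uneven weights
toy = nx.Graph()
toy.add_edge("A", "B", weight=8)
toy.add_edge("A", "C", weight=1)
toy.add_edge("B", "C", weight=1)

# Unweighted: every node's two neighbors are connected, so each value is 1.0
unweighted = nx.clustering(toy)

# Weighted: weights are divided by the maximum (8), giving 1, 1/8 and 1/8;
# the geometric mean of the triangle is (1 * 1/8 * 1/8) ** (1/3) = 0.25
weighted = nx.clustering(toy, weight="weight")

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

The structure is identical in both calls; only the single heavy edge pulls the weighted values down.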

In [27]:
avg_cohesion = nx.average_clustering(G, weight="weight")
print("Cohesion: ", avg_cohesion)

Cohesion:  0.014213699232770037


The result avg_cohesion = 0.014213699232770037 is the average clustering coefficient of the graph G when edge weights are taken into account (weight="weight").

Clustering Coefficient:
The clustering coefficient measures the extent to which the neighbors of a node are connected to each other. It describes the local connectivity structure.

Average Clustering Coefficient:
The average clustering coefficient is the mean of the local clustering coefficients across all nodes in the graph.
It ranges from 0 to 1, where:
0 indicates no clustering (no two neighbors of any node are connected).
1 indicates maximum clustering (all neighbors of every node are connected to each other).

Interpretation:
For this network, the weighted average clustering coefficient is approximately 0.0142.
Taken alone, this suggests a low level of clustering. It should not be read as an absence of triangles, however: the unweighted coefficients listed above reach 1.0 for some nodes, and the unweighted transitivity computed later in this notebook is 0.6173.
The weighted value is small because NetworkX scales each triangle by its edge weights relative to the largest weight in the graph.

Implications:
Uneven Tie Strength: The low weighted value indicates that most triangles consist of edges that are light compared with the heaviest edge in the network, not that local neighborhoods are sparse.

Few Strongly Tied Groups: Nodes do form closed groups, but groups held together entirely by heavy edges appear to be rare.

Network Dynamics: Depending on the context of the graph, this result could be indicative of specific patterns in relationships, collaborations, or interactions.

Considerations:
The interpretation depends on what the weights represent. For a collaboration network, a low weighted value combined with high unweighted clustering suggests many one-off joint works and a small number of intensive, repeated collaborations.

In [28]:
num_connected_components = nx.number_connected_components(G)
print("Connectedness: ", num_connected_components)

Connectedness:  6


The connectedness of a network is measured here as the number of connected components it contains.

Interpretation:
The network has a total of 6 connected components. A connected component is a maximal set of nodes in which every pair is linked by at least one path; the nodes need not all be directly adjacent. An isolated node counts as a component of size one.

Implications:
Isolated Subgroups:
The existence of multiple connected components indicates the presence of isolated subgroups within the network.
Nodes within a connected component can reach each other, but there is no path of any length between nodes in different components.

Potential Information Flow:
Information or influence is confined within each connected component.

Structural Heterogeneity:
The components may differ widely in size, so the size of each one should be inspected before drawing conclusions.
Understanding the nature of each connected component can provide insights into the network's organization and functionality.

Disconnected Nodes:
Every node belongs to exactly one connected component. A node with no edges forms a component by itself, so some of the 6 components may be single nodes.

Connected Components Breakdown:
Connectedness: 6
This implies that the network is divided into six separate groups of nodes, and each group forms a connected component.

By definition, no node bridges two different components. Within a component, nodes whose removal would split it (articulation points) are the ones to examine for overall connectivity.

In summary, the network's connectedness of 6 indicates the presence of distinct subgroups within the network, and exploring these subgroups can provide valuable insights into the network's organization and functionality.
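One way to explore the subgroups is to list the size of each component. The sketch below does this on a toy graph with hypothetical nodes; the same calls could be applied to G.

```python
import networkx as nx

# Toy graph: a chain of three nodes, a separate pair, and one isolated node
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("B", "C"), ("D", "E")])
toy.add_node("F")  # no edges: forms a component of size one

# connected_components yields one set of nodes per component
components = sorted(nx.connected_components(toy), key=len, reverse=True)
sizes = [len(component) for component in components]

print("Number of components:", nx.number_connected_components(toy))
print("Component sizes:", sizes)
print("Path between A and D?", nx.has_path(toy, "A", "D"))
```

A and C are in the same component although they are not adjacent, and the isolated node F is counted as a component of its own.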

In [29]:
def calc_compactness(graph):
    shortest_path_lengths = dict(nx.all_pairs_shortest_path_length(graph))

    total_compactness = 0
    total_pairs = 0

    for source, lengths in shortest_path_lengths.items():
        for target, distance in lengths.items():
            if source != target:
                total_compactness += 1 / distance
                total_pairs += 1

    if total_pairs == 0:
        return 0  # Avoid division by zero

    return total_compactness / total_pairs

compactness = calc_compactness(G)
print("Compactness: ", compactness)

Compactness:  0.33476433593827565


Compactness is computed here as the mean of 1 / d(u, v) over all ordered pairs of distinct nodes that can reach each other, where d is the unweighted shortest-path length. Pairs in different components are not returned by nx.all_pairs_shortest_path_length and are therefore excluded from the average.

Interpretation:
The calculated compactness for the network is 0.3348. A value of 1 would mean that every reachable pair is directly connected; 0.3348 corresponds to a harmonic mean distance of about 3 steps between reachable pairs.

Implications:
Efficient Information Flow:
Within a component, nodes are on average about three steps apart, so information or influence can spread in few steps.

Shorter Average Distances:
Higher compactness means shorter distances between reachable nodes.

Potential for Rapid Communication:
Information can reach distant nodes of the same component in a relatively small number of steps.

Network Cohesion:
The value describes cohesion inside components only. Because the network has 6 connected components, it overstates how well the network as a whole is connected.

Recommendations:
Comparison:
Compare compactness values with other networks or across different time periods to assess changes in the efficiency of information flow.

Identify Bottlenecks:
Explore whether there are nodes or edges that act as bottlenecks, limiting the overall compactness of the network.

Summary:
The network's compactness of 0.3348 indicates short distances between nodes that can reach each other. It is a useful indicator of how quickly information or influence can spread inside each component, but it does not account for the pairs of nodes that are not connected at all.
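Since the function above only averages over pairs returned by the shortest-path search, it can be instructive to compare it with nx.global_efficiency, which divides by all pairs and so counts unreachable pairs as zero. The sketch below is a compact restatement of the same idea under the hypothetical name reachable_compactness, run on a toy graph.

```python
import networkx as nx

def reachable_compactness(graph):
    """Mean of 1/d over ordered pairs of distinct nodes joined by a path."""
    inverse_distances = [
        1 / distance
        for source, lengths in nx.all_pairs_shortest_path_length(graph)
        for target, distance in lengths.items()
        if source != target
    ]
    if not inverse_distances:
        return 0  # no connected pairs at all
    return sum(inverse_distances) / len(inverse_distances)

# Toy graph with two disconnected pairs
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("C", "D")])

within = reachable_compactness(toy)   # only A-B and C-D are counted
overall = nx.global_efficiency(toy)   # all 6 unordered pairs are counted

print("Reachable pairs only:", within)
print("All pairs:           ", overall)
```

On a graph with several components the two values can differ a lot, so reporting both gives a fuller picture.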

In [30]:
transitivity = nx.transitivity(G)
print("Transitivity: ", transitivity)

Transitivity:  0.6172535975124738


Transitivity is the fraction of connected triples of nodes (two edges sharing a node) that are closed into triangles. It measures the tendency of nodes to form clusters or tightly connected groups. The value here is unweighted.

Interpretation:
The calculated transitivity for the network is 0.6173: about 62% of connected triples are closed.

Implications:

High Clustering Tendency:
A transitivity of 0.6173 indicates a strong tendency to form triangles: when two nodes share a neighbor, they are usually connected to each other as well.

Community Structure:
High transitivity is consistent with tightly knit groups, but it does not by itself identify communities.

Local Cohesion:
Nodes in the network are part of cohesive local neighborhoods, contributing to the overall transitivity.

Resilience to Isolation:
High transitivity implies redundant local ties: neighbors usually share other neighbors, so removing a single edge rarely separates them. It does not prevent isolated nodes or separate components.

Recommendations:

Community Detection:
Explore community detection algorithms to identify and analyze the specific clusters or groups within the network.

Node Influence:
Transitivity is a single value for the whole graph. To study individual nodes, use the local clustering coefficient (nx.clustering).

Dynamic Changes:
Monitor changes in transitivity over time to identify shifts in the network's structure and connectivity patterns.

Summary:
The network's transitivity of 0.6173 suggests a strong tendency for nodes to form interconnected clusters. Understanding transitivity provides valuable insights into the network's local cohesion, highlighting key features of its organizational pattern.
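The definition of transitivity as closed triples over all connected triples can be verified by hand on a small graph. The toy graph below is invented for the example.

```python
import networkx as nx

# A triangle A-B-C with one extra node D attached to C
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])

# nx.triangles counts each triangle once per member node, hence the division by 3
triangles = sum(nx.triangles(toy).values()) // 3

# A node of degree d is the center of d * (d - 1) / 2 connected triples
triples = sum(d * (d - 1) // 2 for _, d in toy.degree())

manual = 3 * triangles / triples
print("Manual transitivity:", manual)
print("nx.transitivity:    ", nx.transitivity(toy))
print("Average clustering: ", nx.average_clustering(toy))
```

Transitivity and the average clustering coefficient are related but not identical: the former weights high-degree nodes more heavily, while the latter gives every node the same weight.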

In [31]:
core_number = nx.core_number(G)
k_data = pd.DataFrame.from_dict(core_number,
                                    columns=["KCore"],
                                    orient="index")
k_data.sort_values(by=["KCore"], ascending=False)

Unnamed: 0,KCore
"BARZAGHI, SEBASTIAN",30
"COLITTI, SIMONA",30
"DAQUINO, MARILENA",30
"HEIBI, IVAN",30
"PESCARIN, SOFIA",30
...,...
"DE VIVO, MANUELA",1
"BATTISTI, TOMMASO",1
"FERRARIO, ROBERTA",1
"POLTRONIERI, ANDREA",1


The k-core decomposition is a method to identify cohesive substructures within a network. A k-core is a maximal subgraph in which every node has at least degree k within that subgraph. The core number of a node is the largest k for which the node belongs to a k-core.

Interpretation:
The decomposition assigns each node a core number. "BARZAGHI, SEBASTIAN" and the other nodes at the top of the table share the highest core number, 30, which places them in the densest subgraph.

Highly Connected Cores:
Nodes like "BARZAGHI, SEBASTIAN," "COLITTI, SIMONA," "DAQUINO, MARILENA," "HEIBI, IVAN," and "PESCARIN, SOFIA" are part of the highest core (core number 30).
Each of them is connected to at least 30 other members of that core, so the core contains at least 31 nodes. A core this dense may stem from a small number of works with many co-authors, which should be checked against the publication data.

Low Core Numbers:
Nodes like "DE VIVO, MANUELA," "BATTISTI, TOMMASO," "FERRARIO, ROBERTA," "POLTRONIERI, ANDREA," and "MOCKUS, MARTYNAS" have the lowest core number (1).
These nodes sit on the periphery: they do not belong to any subgraph in which every node has two or more neighbors.

Summary:
The k-core decomposition describes how the network is layered from a dense center to a sparse periphery. Nodes in higher cores are embedded in densely connected groups, while nodes in lower cores are attached more loosely. A high degree alone does not guarantee a high core number, since the neighbors must themselves be well connected.
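The difference between degree and core number can be seen on a toy graph: a complete graph on four hypothetical nodes with one extra node attached.

```python
import networkx as nx

# Complete graph on A-D, plus a pendant node E attached to D
toy = nx.complete_graph(["A", "B", "C", "D"])
toy.add_edge("D", "E")

core_number = nx.core_number(toy)

# With no k given, nx.k_core returns the main core (largest core number)
main_core = nx.k_core(toy)

print("Degrees:     ", dict(toy.degree()))
print("Core numbers:", core_number)
print("Main core:   ", sorted(main_core.nodes()))
```

Node D has degree 4 but core number 3, because its fourth neighbor E is not part of the dense group.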

In [32]:
communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
community_mapping = {}
for i, community in enumerate(communities):
    for node in community:
        community_mapping[node] = i
        

c_data = pd.DataFrame.from_dict(community_mapping,
                                    columns=["Communities"],
                                    orient="index")
c_data.sort_values(by=["Communities"], ascending=False)

Unnamed: 0,Communities
"TRAPIN, LUCA",14
"LILLO, FABRIZIO",14
"CAPUCCINO, CARLOTTA",13
"IOVINE, GIULIO",13
"SABBA, FIAMMETTA",12
...,...
"DE GIORGIS, STEFANO",0
"RUBINO, ROSSELLA",0
"BRIGHI, RAFFAELLA",0
"DAQUINO, MARILENA",0


Greedy Modularity Communities:

Community Assignment:
The algorithm groups nodes into communities by greedily maximizing modularity.
It starts with every node in its own community and repeatedly merges the pair of communities that yields the largest increase in modularity.

Community Mapping:
The result above is a mapping of nodes to their assigned communities, represented by numeric labels from 0 to 14, so 15 communities were found.
Each node belongs to exactly one community.

Interpreting the Results:
Community Labels: Nodes with the same label (e.g., 14) belong to the same community.

Community Sizes: NetworkX returns the communities sorted by size, largest first. Label 0 is therefore the largest community, and the highest labels (13, 14) are among the smallest.

Modularity and Weighted Graphs:
Modularity Definition: Modularity measures the quality of a network's division into communities by comparing the share of edges inside communities with the share expected if edges were placed at random. Higher modularity values indicate a better community structure.

Weighted Graphs: When the graph is weighted, the strength of connections between nodes is considered. Weighted modularity takes into account both the presence and strength of edges.

Why Weight Changes the Result:
Edge Strength Influence: In a weighted graph, a heavy edge counts for more than a light one when modularity is computed.

Community Formation: Nodes are more likely to be grouped together if they share strong weighted connections.

Optimization Objective: The algorithm optimizes the weighted modularity score, so the order and outcome of merges can differ from the unweighted case.

Greedy Modularity Algorithm:
Basic Idea: The greedy modularity algorithm (Clauset-Newman-Moore) only merges communities. It never splits a community or moves a single node between communities.

Steps:
Start with each node in its own community.
Merge the pair of communities whose union increases modularity the most.
Repeat until no merge improves modularity.

Considerations:
The choice of community detection algorithm depends on the specific characteristics and goals of the analysis. A greedy method can stop at a partition that is not the global optimum.

The interpretation of community assignments may be context-dependent, and the results should be analyzed in conjunction with domain knowledge.
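The size ordering of the returned communities and the modularity of the resulting partition can be inspected directly. The sketch below uses a toy graph made of two cliques of different size joined by one edge; it is an illustration, not the notebook's data.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Toy graph: a 5-clique (nodes 0-4) and a triangle (nodes 5-7) joined by one edge
toy = nx.complete_graph(5)
toy.add_edges_from([(5, 6), (6, 7), (5, 7)])
toy.add_edge(4, 5)

communities = greedy_modularity_communities(toy)

# Communities come back as frozensets, largest first, so label 0 is the largest
sizes = [len(community) for community in communities]
mapping = {node: label
           for label, community in enumerate(communities)
           for node in community}

print("Community sizes:", sizes)
print("Modularity:", round(modularity(toy, communities), 4))
```

Reporting the modularity score alongside the labels makes it easier to compare the weighted and unweighted partitions of the same graph.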

In [33]:
def homophily(G):
    num_same_ties = 0
    num_diff_ties = 0
    for n1, n2 in G.edges():
        if G.nodes[n1]['affiliation'] == G.nodes[n2]['affiliation']:
            num_same_ties += 1
        else:
            num_diff_ties += 1
    return (num_same_ties / (num_same_ties + num_diff_ties))
print("Homophily:", homophily(G))

0.3298041291688724

Homophily is the tendency of nodes with similar attributes to connect with each other. The function above measures it as the fraction of edges whose two endpoints have the same affiliation.

Interpretation:
The value is approximately 0.33: about one third of the ties join nodes with the same affiliation, and about two thirds cross affiliation boundaries. Whether this indicates homophily depends on the baseline. If affiliation played no role in tie formation, the expected share of same-affiliation ties would depend on the number and size of the affiliations; 0.33 shows homophily only if it exceeds that share.

Key Findings:
Mostly Cross-Affiliation Ties:
The value of 0.33 lies between 0 (no same-affiliation ties) and 1 (only same-affiliation ties). Most ties connect nodes with different affiliations.

Potential Affiliation Clusters:
If 0.33 is above the chance level, there may be clusters or groups of nodes in the network that share common affiliations.
Affiliation-based connections may play a role in shaping the network's structure.

Missing Affiliations:
Some nodes have an empty affiliation. If these are stored as the same empty value, an edge between two such nodes is counted as a same-affiliation tie.

Summary:
About 33% of ties are within the same affiliation. Comparing this share with a chance baseline, and relating it to the network's community structure, is needed before concluding how strongly affiliation shapes connections.
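One possible baseline-adjusted measure is the attribute assortativity coefficient, which NetworkX provides as nx.attribute_assortativity_coefficient. The sketch below compares it with the raw same-affiliation share on a toy graph; the nodes and the affiliation labels "X" and "Y" are made up.

```python
import networkx as nx

# Toy graph: two nodes per affiliation, two internal ties and one cross tie
toy = nx.Graph()
toy.add_nodes_from(["a", "b"], affiliation="X")
toy.add_nodes_from(["c", "d"], affiliation="Y")
toy.add_edges_from([("a", "b"), ("c", "d"), ("b", "c")])

# Raw share of ties whose endpoints have the same affiliation
same = sum(
    1 for u, v in toy.edges()
    if toy.nodes[u]["affiliation"] == toy.nodes[v]["affiliation"]
)
same_share = same / toy.number_of_edges()

# Assortativity: 0 means same-affiliation ties occur at the chance level,
# positive values indicate homophily, negative values the opposite
assortativity = nx.attribute_assortativity_coefficient(toy, "affiliation")

print("Same-affiliation share:", round(same_share, 3))
print("Assortativity:         ", round(assortativity, 3))
```

Unlike the raw share, the coefficient accounts for how many ties each affiliation has, so it can be compared across networks with different affiliation sizes.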

In [34]:
prof_list = ['PERONI, SILVIO', 'TOMASI, FRANCESCA', 'VITALI, FABIO', 'PESCARIN, SOFIA', 'GANGEMI, ALDO', 'ITALIA, PAOLA MARIA CARMELA', 'TAMBURINI, FABIO', 'DAQUINO, MARILENA', 'GIALLORENZO, SAVERIO', 'ZUFFRANO, ANNAFELICIA', 'IOVINE, GIULIO', 'BARTOLINI, ILARIA', 'SPEDICATO, GIORGIO', 'PALMIRANI, MONICA', 'BASKAKOVA, EKATERINA', 'FERRIANI, SIMONE']
def affiliation_homophily(G, nodes):
    """For each node, count the neighbors whose affiliation differs from its own."""
    data = []
    for node in nodes:
        affiliation = G.nodes[node]['affiliation']
        neighbors = list(G.neighbors(node))
        total_connections = len(neighbors)
        connections_outside_affiliation = sum(1 for neighbor in neighbors if G.nodes[neighbor]['affiliation'] != affiliation)

        # Share of connections OUTSIDE the node's affiliation; 0 for nodes without neighbors
        ratio = 0 if total_connections == 0 else round(connections_outside_affiliation / total_connections, 2)

        data.append([node, connections_outside_affiliation, affiliation, total_connections, ratio])

    df = pd.DataFrame(data, columns=['name', 'connections_outside_affiliation', 'affiliation', 'total_connections', 'ratio'])
    # Highest outside share (lowest homophily) first
    return df.sort_values(by=['ratio'], ascending=False)
print(affiliation_homophily(G, prof_list))


                           name  connections_outside_affiliation  \
3               PESCARIN, SOFIA                               33   
10               IOVINE, GIULIO                                1   
4                 GANGEMI, ALDO                               38   
0                PERONI, SILVIO                               61   
11            BARTOLINI, ILARIA                               37   
6              TAMBURINI, FABIO                               28   
7             DAQUINO, MARILENA                               32   
2                 VITALI, FABIO                               35   
13            PALMIRANI, MONICA                               25   
12           SPEDICATO, GIORGIO                                2   
1             TOMASI, FRANCESCA                               42   
8          GIALLORENZO, SAVERIO                               10   
9         ZUFFRANO, ANNAFELICIA                                2   
5   ITALIA, PAOLA MARIA CARMELA                 

Individual Professor Analysis:
The ratio column is the share of a professor's connections that fall outside their own affiliation, so a high ratio means low homophily. The table is sorted by this ratio in descending order, and the ratio column itself is cut off in the output above.

Professors such as PESCARIN, SOFIA; GANGEMI, ALDO; PERONI, SILVIO; and BARTOLINI, ILARIA are near the top of the table and therefore have the largest shares of connections outside their affiliation, suggesting a diverse set of connections across different departments.
Among the rows shown, PERONI, SILVIO has the most outside connections in absolute terms (61), followed by TOMASI, FRANCESCA (42) and GANGEMI, ALDO (38).

A ratio of 0.0 means that a professor has no connections outside their own affiliation. Because the function returns 0 for nodes without neighbors, it can also mean no connections at all. BASKAKOVA, EKATERINA and FERRIANI, SIMONE are reported with a ratio of 0.0. SPEDICATO, GIORGIO has 2 outside connections in the table, so his ratio is above zero.

Departmental Affiliations:
Affiliation is one factor in who connects with whom, but several of the listed professors have a large share of connections outside their own affiliation.

Homophily Variability:
There is variability in the ratios, indicating that different professors exhibit different degrees of homophily in their connections.

Impact of Affiliation:
The effect of affiliation on connectivity differs between professors.
Professors from certain departments may have a higher likelihood of forming connections within their own department.

Network Diversity:
The network exhibits a certain degree of diversity, with some professors connecting across departments and others primarily connecting within their own department.