<p style="font-family: Verdana; letter-spacing: 2px; color:#000000; font-size:300%; padding: 0px; text-align:center;">
    <b>Fundamentals of Social Network Analysis (SNA)</b>

Network analysis is a multifaceted discipline that delves into the structure, dynamics, and behaviors of networks, which are composed of interconnected nodes and links. In the realm of social media, platforms like Facebook and Twitter use network analysis to understand user interactions, track the spread of information, and identify influential users who can drive trends. In biology, it aids in mapping complex protein interactions and understanding the intricate web of metabolic pathways in cells. Similarly, in epidemiology, network analysis models the spread of diseases like COVID-19, helping to predict outbreaks and evaluate the effectiveness of intervention strategies. Financial institutions leverage it to analyze interbank transactions and assess systemic risks in the financial system. By employing mathematical and computational techniques, network analysis facilitates the understanding of how nodes interact within a network, the identification of influential nodes, the discovery of community structures, and the prediction of network evolution. This analytical approach not only illuminates the underlying patterns and properties of networks but also aids in solving real-world problems by providing insights into the resilience, efficiency, and functionality of interconnected systems.

![](https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png)

*A complex network by [Martin Grandjean](https://commons.wikimedia.org/wiki/User:SlvrKy) via Wikimedia Commons*
    
In this notebook I will give a quick overview of the most important concepts in SNA.
    
```python
if helpful:
    print('Please upvote ❤️')
```

In [1]:
!pip install pyvis
!pip install infomap


In [2]:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import random
from pyvis.network import Network
import community as community_louvain
import matplotlib.colors as mcolors
from sklearn.cluster import SpectralClustering
from networkx.algorithms.community import girvan_newman
from networkx.algorithms.community import label_propagation_communities
from infomap import Infomap

import warnings
warnings.filterwarnings('ignore')

# Intro to Network Graphs

Let's start with a very basic example. I will use `NetworkX` to create the network.

In [3]:
random.seed(1702)
np.random.seed(1702)

def create_graph():
    G = nx.Graph()
    nodes = range(40)
    G.add_nodes_from(nodes)

    for i in range(40):
        for j in range(i+1, 40):
            if random.random() < 0.2:
                weight = random.randint(1, 10)
                G.add_edge(i, j, weight=weight)
    return G

In [4]:
G = create_graph()

In [5]:
degrees = np.array([G.degree[node] for node in G.nodes()])
min_degree, max_degree = degrees.min(), degrees.max()

cmap = plt.get_cmap('coolwarm')  # plt.cm.get_cmap is no longer available in recent Matplotlib releases
node_colors = {
    node: mcolors.to_hex(
        cmap((G.degree[node] - min_degree) / (max_degree - min_degree))
    )
    for node in G.nodes()
}

net = Network(height="700px", width="800px", bgcolor="#ffffff", font_color="black", notebook=True, cdn_resources='in_line')
node_size = 15

for node in G.nodes():
    net.add_node(node, label=f"Node: {node}", size=node_size, color=node_colors[node])

for edge in G.edges():
    net.add_edge(edge[0], edge[1])

net.set_options("""
var options = {
  "physics": {
    "enabled": true
  }
}
""")

net.show('network.html')

network.html


A graph in this context is made up of vertices (also called nodes or points) that are connected by edges (also called arcs, links or lines). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. I created 40 nodes and connected each pair with a probability of 0.2, giving every edge a random integer weight between 1 and 10. You can see that the nodes have different colors. This is a visual representation of the importance of the node: the more connections a node has, the warmer (redder) its color. More on this in the course of the notebook.
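To make the undirected/directed distinction concrete, here is a minimal sketch with two tiny graphs. The node labels and the weight are made up for illustration and are not part of the 40-node example.

```python
import networkx as nx

# Undirected graph: the edge between 1 and 2 can be used in both directions.
undirected = nx.Graph()
undirected.add_edge(1, 2, weight=5)

# Directed graph: the edge only points from 1 to 2.
directed = nx.DiGraph()
directed.add_edge(1, 2, weight=5)

print(undirected.has_edge(2, 1))  # True
print(directed.has_edge(2, 1))    # False
```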

# Metrics
In Social Network Analysis (SNA), the two main metrics often focused on are centrality and communities. These metrics provide insights into the importance of nodes (individuals or entities) and the overall structure of the network.

## Centrality
Not all nodes are equally important. Some nodes play an outsized role in the network, e.g. if they have many connections. Centrality measures the importance or influence of a node within a network. 
Let's look at a simpler graph. In my example graph above, the importance of individual nodes cannot be determined with the naked eye. Not yet 😉

![](https://upload.wikimedia.org/wikipedia/commons/4/45/Srep17095-f1.jpg)

*By Ajalvare - Own work, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0), [Link](https://commons.wikimedia.org/w/index.php?curid=45967327)*

In the illustrated network, the green and red nodes are the most dissimilar because they share no neighbors. The green node therefore contributes more to the centrality of the red node than the gray ones do: the red node can reach the blue nodes only through the green one, whereas the gray nodes are redundant for the red node because it can already reach each of them directly without any intermediary.

There are several types of centrality. Let's go through each of them.

In [6]:
def create_network_with_centrality(G, centrality_measure):
    # Compute the centrality
    if centrality_measure == "degree":
        centrality = nx.degree_centrality(G)
    elif centrality_measure == "weighted_degree":
        centrality = {node: sum(data['weight'] for _, _, data in G.edges(node, data=True)) for node in G.nodes()}
    elif centrality_measure == "betweenness":
        centrality = nx.betweenness_centrality(G, weight='weight')
    elif centrality_measure == "closeness":
        centrality = nx.closeness_centrality(G)
    elif centrality_measure == "eigenvector":
        centrality = nx.eigenvector_centrality(G, max_iter=1000)
    elif centrality_measure == "pagerank":
        centrality = nx.pagerank(G, weight='weight')
    else:
        raise ValueError("Invalid centrality measure")

    min_cent = min(centrality.values())
    max_cent = max(centrality.values())

    cmap = plt.get_cmap('coolwarm')  # plt.cm.get_cmap is no longer available in recent Matplotlib releases
    node_colors = {
        node: mcolors.to_hex(
            cmap((centrality[node] - min_cent) / (max_cent - min_cent))
        )
        for node in G.nodes()
    }

    net = Network(height="700px", width="1000px", bgcolor="#ffffff", font_color="black", notebook=True, cdn_resources='in_line')
    node_size = 15
    for node in G.nodes():
        net.add_node(node, label=f"{centrality[node]:.2f}", size=node_size, color=node_colors[node])
    
    # Normalize edge thickness based on weight
    min_weight = min(nx.get_edge_attributes(G, 'weight').values())
    max_weight = max(nx.get_edge_attributes(G, 'weight').values())
    
    for edge in G.edges(data=True):
        u, v, data = edge
        weight = data['weight']
        thickness = ((weight - min_weight) / (max_weight - min_weight)) * 3 + 0.5
        net.add_edge(u, v, width=thickness)
        
    net.set_options("""
    var options = {
      "physics": {
        "enabled": true
      }
    }
    """)
    
    return net

### Degree Centrality

Degree centrality is a fundamental measure in network analysis that quantifies the importance of a node based on the number of direct connections it has. It is one of the simplest forms of centrality and can provide valuable insights into the structure and dynamics of a network.

Degree centrality for a node is essentially the count of its direct connections, also known as edges. For a given node, the degree centrality is defined as the number of edges that connect to it. In directed networks, degree centrality can be split into two types: in-degree, which counts the number of incoming edges, and out-degree, which counts the number of outgoing edges.
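One detail worth knowing: `nx.degree_centrality` does not return the raw count but the count divided by `n - 1`, the maximum possible degree. The sketch below shows both values on a small toy graph (a triangle with one extra pendant node, chosen only for illustration).

```python
import networkx as nx

# Toy graph (illustrative assumption): triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3)])

raw_degree = dict(toy.degree())          # plain neighbor counts
normalized = nx.degree_centrality(toy)   # counts divided by (n - 1)

n = toy.number_of_nodes()
for node in toy.nodes():
    # The last two columns should agree: NetworkX value vs. manual normalization.
    print(node, raw_degree[node], round(normalized[node], 2), round(raw_degree[node] / (n - 1), 2))
```

Node 0 touches every other node, so its normalized degree centrality is 1.0.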

In [7]:
degree_net = create_network_with_centrality(G, "degree")
degree_net.show("degree_centrality_graph.html")

degree_centrality_graph.html


### Weighted Degree Centrality
Weighted degree centrality extends the concept of degree centrality by considering the weights of edges. While degree centrality counts the number of direct connections a node has, weighted degree centrality sums the weights of those connections, providing a more nuanced measure of a node's importance. In networks where connections have varying strengths or capacities, weighted degree centrality gives a better indication of the total "influence" or "activity" of a node. This is particularly useful in networks where not all connections are equal, such as social networks where relationships have different levels of interaction or transportation networks where routes have different capacities.
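As a small illustration, NetworkX can sum the edge weights directly through `G.degree(weight=...)`. The three-node graph and its weights below are assumptions made up for this sketch.

```python
import networkx as nx

toy = nx.Graph()
toy.add_weighted_edges_from([(0, 1, 5), (0, 2, 1), (1, 2, 2)])

# Sum of the weights of all edges incident to each node (sometimes called "strength").
strength = dict(toy.degree(weight="weight"))
print(strength)  # {0: 6, 1: 7, 2: 3}
```

All three nodes have the same plain degree (2), yet node 1 clearly has the strongest ties.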

In [8]:
weighted_degree_net = create_network_with_centrality(G, "weighted_degree")
weighted_degree_net.show("weighted_degree_centrality_graph.html")

weighted_degree_centrality_graph.html


### Betweenness Centrality
Betweenness centrality is a measure that quantifies the importance of a node based on its role as a bridge in the network. It is defined as the number of times a node acts as a bridge along the shortest path between two other nodes. Nodes with high betweenness centrality often control the flow of information or resources in the network, as they lie on many of the shortest paths connecting different pairs of nodes. This measure is particularly useful for identifying influential nodes that facilitate communication or flow in networks such as transportation systems, social networks, and communication networks.
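A three-node path is the simplest case to check by hand: the only shortest path between the two end nodes runs through the middle node. Note that when a `weight` is passed to `nx.betweenness_centrality`, NetworkX interprets it as a distance, so heavier edges count as longer. The sketch below is unweighted.

```python
import networkx as nx

path = nx.path_graph(3)  # 0 - 1 - 2

bc_normalized = nx.betweenness_centrality(path)             # scaled to [0, 1]
bc_raw = nx.betweenness_centrality(path, normalized=False)  # number of node pairs bridged

print(bc_normalized)  # {0: 0.0, 1: 1.0, 2: 0.0}
print(bc_raw)         # {0: 0.0, 1: 1.0, 2: 0.0}
```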

In [9]:
betweenness_net = create_network_with_centrality(G, "betweenness")
betweenness_net.show("betweenness_centrality_graph.html")

betweenness_centrality_graph.html


### Closeness Centrality
Closeness centrality measures how quickly a node can access other nodes in a network. It is defined as the reciprocal of the average shortest path distance from a node to all other nodes in the network. A node with high closeness centrality can reach other nodes more quickly and efficiently than a node with lower closeness centrality. This centrality measure is useful in scenarios where the speed of reaching other nodes is crucial, such as in communication networks or organizational hierarchies where the efficiency of information dissemination is important.
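The definition can be verified by hand on a three-node path. This is a minimal sketch on a toy graph, not on the notebook's network.

```python
import networkx as nx

path = nx.path_graph(3)  # 0 - 1 - 2
closeness = nx.closeness_centrality(path)

# Manual check for node 0: its distances to the other nodes are 1 and 2.
manual_node0 = 1 / ((1 + 2) / 2)  # reciprocal of the average distance

print(round(closeness[0], 3), round(manual_node0, 3))
print(closeness[1])  # the middle node is one step from everyone
```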

In [10]:
closeness_net = create_network_with_centrality(G, "closeness")
closeness_net.show("closeness_centrality_graph.html")

closeness_centrality_graph.html


### Eigenvector Centrality
Eigenvector centrality extends the idea of degree centrality by considering not just the quantity but also the quality of connections. It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the node's score than equal connections to low-scoring nodes. Eigenvector centrality identifies nodes that are not only well-connected but are connected to other well-connected nodes. It is particularly useful in social networks where influence and prestige are key, as it helps identify nodes that are influential by association with other influential nodes.
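The name comes from the fact that the scores are the entries of the adjacency matrix's principal eigenvector. The sketch below, on an assumed toy graph (a triangle with a pendant node), compares a direct NumPy eigendecomposition with `nx.eigenvector_centrality`. Both should agree up to the solver's tolerance.

```python
import networkx as nx
import numpy as np

# Toy graph (illustrative assumption): triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3)])
A = nx.to_numpy_array(toy, nodelist=sorted(toy.nodes()))

# eigh returns eigenvalues in ascending order, so the last column belongs to the largest one.
eigenvalues, eigenvectors = np.linalg.eigh(A)
principal = np.abs(eigenvectors[:, -1])  # fix the arbitrary sign of the eigenvector

nx_scores = nx.eigenvector_centrality(toy, max_iter=1000)
for node in sorted(toy.nodes()):
    print(node, round(float(principal[node]), 3), round(nx_scores[node], 3))
```

Node 3 has only one connection, but it scores above zero because that connection goes to the best-connected node.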

In [11]:
eigenvector_net = create_network_with_centrality(G, "eigenvector")
eigenvector_net.show("eigenvector_centrality_graph.html")

eigenvector_centrality_graph.html


### Page Rank
PageRank is a variant of eigenvector centrality that was originally developed by Google to rank web pages in search results. It measures the importance of a node based on the structure of the network and the concept of link "voting" or "recommendation." A node receives a high PageRank if it is linked to by many nodes, especially if those nodes themselves have high PageRank. Unlike eigenvector centrality, PageRank also introduces a damping factor that allows the centrality to capture the probability of randomly arriving at a node by following links. This centrality measure is valuable in web networks, citation networks, and any system where the notion of "endorsed importance" is relevant.
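To show what the damping factor does, here is a minimal pure-Python power iteration. It is a sketch, not the implementation behind `nx.pagerank`: the function name `pagerank_sketch`, the tiny link dictionary, and the simplification that every node has at least one outgoing link are all assumptions for illustration.

```python
def pagerank_sketch(out_links, damping=0.85, iterations=100):
    """Power iteration on {node: [nodes it links to]}.

    Simplifying assumption: every node has at least one outgoing link.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # With probability (1 - damping) the random surfer jumps to a random node.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Otherwise the surfer follows one of the current node's links at random.
        for node, targets in out_links.items():
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_sketch(links)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

"C" is endorsed by two nodes and ranks highest. "B" is linked to by nobody, so it only receives the random-jump share of (1 - 0.85) / 3.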

In [12]:
pagerank_net = create_network_with_centrality(G, "pagerank")
pagerank_net.show("pagerank_centrality_graph.html")

pagerank_centrality_graph.html


## Centrality Summary
The various centrality definitions can be overwhelming. In fact, each of them can be summed up by a single question.

1. **Degree Centrality:** Do you have many connections?
2. **Weighted Degree Centrality:** Do you have many interactions?
3. **Betweenness Centrality:** Do you help to connect different parts of the network?
4. **Closeness Centrality:** How quickly can you reach all other nodes in the network?
5. **Eigenvector Centrality:** Do you have many connections to important people?
6. **PageRank:** Do you have many interactions with important people?

## Communities
Communities in networks are subsets of nodes that are more densely connected internally than with the rest of the network. This concept stems from the observation that in many real-world networks, such as social, biological, and technological networks, nodes tend to form tightly-knit groups with strong internal connections. These groups often represent functional or logical units within the network. For example, in a social network, communities might correspond to groups of friends or colleagues who interact frequently. In a biological network, they could represent protein complexes or metabolic pathways.

The existence of communities is a key feature of complex networks, distinguishing them from random networks. Communities are often characterized by a high clustering coefficient, which is a measure of the degree to which nodes in a graph tend to cluster together. This clustering is typically much higher in real-world networks than in random graphs of the same size and density.

The theoretical foundation of community detection is built upon the concept of modularity, introduced by Newman and Girvan. Modularity is a measure used to quantify the strength of division of a network into communities. It is defined as the fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random. A high modularity score indicates a strong community structure, where many edges fall within communities rather than between them.
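A small worked example helps here. The sketch below uses NetworkX's `modularity` function on an assumed toy graph of two triangles joined by a single edge, and compares a sensible partition with a deliberately poor one.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Toy graph (illustrative assumption): triangles 0-1-2 and 3-4-5 joined by the edge (2, 3).
two_triangles = nx.Graph([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])

good_split = [{0, 1, 2}, {3, 4, 5}]   # one community per triangle
bad_split = [{0, 3}, {1, 4}, {2, 5}]  # groups that contain no internal edges

q_good = modularity(two_triangles, good_split)
q_bad = modularity(two_triangles, bad_split)
print(round(q_good, 3), round(q_bad, 3))
```

For the good split, 6 of the 7 edges fall inside communities, while random placement would be expected to put only half of them there, which gives 6/7 - 1/2 = 5/14. The bad split scores below zero because it has fewer internal edges than chance would produce.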

In [13]:
def create_network_with_communities(G, communities):
    distinct_colors = ['#FF0000', '#0000FF', '#008000', '#000000', '#FFA500', 
                       '#800080', '#00FFFF', '#FFD700', '#FF00FF', '#C0C0C0']

    community_colors = {
        community: distinct_colors[i % len(distinct_colors)]
        for i, community in enumerate(set(communities.values()))
    }

    net = Network(height="700px", width="1000px", bgcolor="#ffffff", font_color="black", notebook=True, cdn_resources='in_line')

    node_size = 15
    for node in G.nodes():
        community = communities[node]
        net.add_node(node, label=str(node), size=node_size, color=community_colors[community])

    min_weight = min(nx.get_edge_attributes(G, 'weight').values())
    max_weight = max(nx.get_edge_attributes(G, 'weight').values())

    for edge in G.edges(data=True):
        u, v, data = edge
        weight = data['weight']
        thickness = ((weight - min_weight) / (max_weight - min_weight)) * 3 + 0.5
        net.add_edge(u, v, width=thickness)

    net.set_options("""
    var options = {
      "physics": {
        "enabled": true
      }
    }
    """)

    return net

### Louvain Method
The Louvain method is one of the most popular community detection algorithms due to its efficiency and scalability. It works by iteratively optimizing the modularity of the network. Initially, each node is assigned to its own community. The algorithm then moves individual nodes into neighboring communities whenever this increases modularity, collapses each resulting community into a single node, and repeats both steps until no further improvement can be made. The result is a hierarchy of communities, from fine to coarse granularity.

In [14]:
def louvain_communities(G):
    partition = community_louvain.best_partition(G, weight='weight')
    return partition

In [15]:
# Use a separate name for the result so the function is not overwritten and the cell can be re-run.
louvain_partition = louvain_communities(G)
louvain_net = create_network_with_communities(G, louvain_partition)
louvain_net.show("Louvain_communities_graph.html")

Louvain_communities_graph.html


### Girvan-Newman Algorithm
The Girvan-Newman algorithm focuses on edge betweenness centrality, which measures the number of shortest paths that pass through an edge. By iteratively removing edges with the highest betweenness centrality, the algorithm isolates communities by progressively breaking the network apart. This approach is computationally intensive and better suited for smaller networks.
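`girvan_newman` yields one partition per removal step, and taking only the first one always gives two communities. A common way to choose a level is to keep the partition with the highest modularity. The sketch below does this on an assumed toy graph (two 5-cliques joined by one edge) and inspects only the first four levels to stay cheap.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity

barbell = nx.barbell_graph(5, 0)  # two 5-cliques joined by the edge (4, 5)

best_partition, best_q = None, float("-inf")
for level in itertools.islice(girvan_newman(barbell), 4):
    q = modularity(barbell, level)
    if q > best_q:
        best_partition, best_q = [sorted(c) for c in level], q

print(best_partition, round(best_q, 3))
```

The connecting edge carries every shortest path between the two cliques, so it is removed first and the two cliques emerge as the best partition.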

In [16]:
def girvan_newman_communities(G):
    comp = girvan_newman(G)
    limited = tuple(sorted(c) for c in next(comp))
    community_map = {}
    for idx, community in enumerate(limited):
        for node in community:
            community_map[node] = idx
    return community_map

In [17]:
gn_communities = girvan_newman_communities(G)
gn_net = create_network_with_communities(G, gn_communities)
gn_net.show("Girvan-Newman_communities_graph.html")

Girvan-Newman_communities_graph.html


### Spectral Clustering
Spectral clustering leverages the eigenvectors of matrices associated with the graph, such as the Laplacian matrix. By examining the leading eigenvectors (for the Laplacian, those belonging to the smallest eigenvalues), the algorithm embeds the nodes in a low-dimensional space and partitions them into a chosen number of communities. This method is effective for identifying well-separated clusters but can be sensitive to noise and outliers.
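The simplest instance of this idea is a two-way split using the eigenvector of the second-smallest Laplacian eigenvalue (the Fiedler vector). The sketch below is a hand-rolled illustration on an assumed toy graph, not what scikit-learn's `SpectralClustering` does internally.

```python
import networkx as nx
import numpy as np

barbell = nx.barbell_graph(5, 0)  # two 5-cliques joined by the edge (4, 5)
nodes = sorted(barbell.nodes())

A = nx.to_numpy_array(barbell, nodelist=nodes)
laplacian = np.diag(A.sum(axis=1)) - A  # unnormalized Laplacian L = D - A

# eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
fiedler = eigenvectors[:, 1]

# Split the nodes by the sign of their Fiedler vector entry.
labels = (fiedler > 0).astype(int)
print(dict(zip(nodes, labels.tolist())))
```

Which clique gets label 0 and which gets label 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.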

In [18]:
def spectral_clustering_communities(G, n_clusters=4):
    adjacency_matrix = nx.to_numpy_array(G)
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed', random_state=42)
    labels = sc.fit_predict(adjacency_matrix)
    community_map = {node: labels[i] for i, node in enumerate(G.nodes())}
    return community_map

In [19]:
spectral_communities = spectral_clustering_communities(G)
spectral_net = create_network_with_communities(G, spectral_communities)
spectral_net.show("Spectral_Clustering_communities_graph.html")

Spectral_Clustering_communities_graph.html


### Label Propagation
Label propagation is a simple yet effective algorithm that assigns labels to nodes, which are then propagated through the network based on the labels of neighboring nodes. This process continues iteratively until a stable state is reached. Although fast and scalable, label propagation may yield different results on different runs because nodes are updated in random order and ties between labels are broken at random.
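The run-to-run variation can be explored with NetworkX's asynchronous variant, `asyn_lpa_communities`, which accepts a `seed`. This is a sketch on an assumed toy graph. It uses a different function from the one in the notebook's cell, and the partitions may or may not differ between seeds.

```python
import networkx as nx
from networkx.algorithms.community import asyn_lpa_communities

barbell = nx.barbell_graph(5, 0)  # two 5-cliques joined by one edge

for seed in (0, 1, 2):
    communities = [sorted(c) for c in asyn_lpa_communities(barbell, seed=seed)]
    print(f"seed={seed}: {communities}")
```

Fixing the seed makes a result reproducible, which is useful when comparing it with other community detection methods.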

In [20]:
def label_propagation_communities_map(G):
    communities = list(label_propagation_communities(G))
    community_map = {}
    for idx, community in enumerate(communities):
        for node in community:
            community_map[node] = idx
    return community_map

In [21]:
lp_communities = label_propagation_communities_map(G)
lp_net = create_network_with_communities(G, lp_communities)
lp_net.show("Label_Propagation_communities_graph.html")

Label_Propagation_communities_graph.html


### Infomap
Infomap uses an information-theoretic approach to community detection by modeling the network as a flow of random walks. It identifies communities by finding the best compression of this flow, which corresponds to densely connected regions. Infomap is particularly effective in networks where the flow of information is of interest, such as communication networks.

In [22]:
def infomap_communities(G):
    infomap = Infomap()
    for edge in G.edges(data=True):
        infomap.addLink(edge[0], edge[1], edge[2]['weight'])
    infomap.run()
    community_map = {node.node_id: node.module_id for node in infomap.iterLeafNodes()}
    return community_map

In [23]:
# Use a separate name for the result so the function is not overwritten and the cell can be re-run.
infomap_partition = infomap_communities(G)
infomap_net = create_network_with_communities(G, infomap_partition)
infomap_net.show("Infomap_communities_graph.html")

  Infomap v2.8.0 starts at 2024-08-07 11:53:42
  -> Input network: 
  -> No file output!
  OpenMP 201511 detected with 4 threads...
  -> Ordinary network input, using the Map Equation for first order network flows
Calculating global network flow using flow model 'undirected'... 
  -> Using undirected links.
  => Sum node flow: 1, sum link flow: 1
Build internal network with 40 nodes and 138 links...
  -> One-level codelength: 5.2137009

Trial 1/1 starting at 2024-08-07 11:53:42
Two-level compression: -1.9% 0.43% 
Infomap_communities_graph.html


# Additional Information
In addition to communities and centralities, there are several other important concepts and techniques in Social Network Analysis that can provide deeper insights into network structures and dynamics. Let's go through a few of them.

## Network Topology and Properties
### Network Density
Network density measures how close the network is to being complete. It is the ratio of the number of edges in the network to the number of possible edges. A dense network has many connections between nodes, whereas a sparse network has few. For example, our network has 40 nodes, and each node could connect to 39 others, so there are 40 × 39 / 2 = 780 possible undirected edges. A density of 1 (100%) would mean that all of them are present. In our case the density of 0.18 means that roughly 18% of the possible edges exist, which is close to the edge probability of 0.2 used to generate the graph. The network is therefore fairly sparse, with most potential relationships not realized.
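The density can be reproduced by hand. The sketch below uses the 40 nodes of the example graph and the 138 links reported in the Infomap log earlier in this notebook.

```python
n_nodes = 40
n_edges = 138  # link count taken from the Infomap log output in this notebook

# Every unordered pair of distinct nodes is a possible edge in an undirected simple graph.
possible_edges = n_nodes * (n_nodes - 1) // 2
density_by_hand = n_edges / possible_edges

print(possible_edges)            # 780
print(f"{density_by_hand:.2f}")  # 0.18
```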

In [24]:
density = nx.density(G)
print(f"Network Density: {density:.2f}")

Network Density: 0.18


### Diameter and Average Path Length
The diameter of a network is the longest shortest path between any two nodes in the network. A diameter of 3 means:
- Longest Shortest Path: The most distant pair of nodes in the network is only 3 edges apart. This is relatively short, indicating that even the most remote nodes are not very far from each other.
- Cohesion and Compactness: The network is fairly cohesive and compact, suggesting that it has a relatively efficient structure where information or resources can spread quickly from one node to any other in a few steps.

The average path length is the average number of steps along the shortest paths for all possible pairs of network nodes. An average path length of 2.06 indicates:
- Quick Connectivity: On average, any node is about 2 steps away from any other node. This further supports the idea of a tightly connected network.
- Efficiency in Communication or Flow: This short average path length implies that the network can facilitate quick and efficient communication or transfer of information, resources, or influence among its members.
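Both quantities are easy to verify on a graph small enough to check by hand. The four-node path below is an assumption chosen for illustration.

```python
import networkx as nx

path4 = nx.path_graph(4)  # 0 - 1 - 2 - 3

toy_diameter = nx.diameter(path4)                         # the end nodes are 3 steps apart
toy_avg_length = nx.average_shortest_path_length(path4)

# Manual check: the six node pairs have distances 1, 2, 3, 1, 2, 1.
manual_avg = (1 + 2 + 3 + 1 + 2 + 1) / 6

print(toy_diameter)                                  # 3
print(round(toy_avg_length, 3), round(manual_avg, 3))
```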

In [25]:
# Both metrics are only defined for a connected graph, so check once up front.
if nx.is_connected(G):
    diameter = nx.diameter(G)
    avg_path_length = nx.average_shortest_path_length(G)
    print(f"Diameter: {diameter}")
    print(f"Average Path Length: {avg_path_length:.2f}")
else:
    print("Graph is not connected; diameter and average path length are undefined.")

Diameter: 3
Average Path Length: 2.06


### Clustering Coefficient
This metric measures the degree to which nodes in a network tend to cluster together. A high clustering coefficient indicates a high likelihood of forming tightly knit groups.
A clustering coefficient of 0.17 suggests:
- Low to Moderate Clustering: On average, about 17% of the possible connections among a node's immediate neighbors are actually present.
- Local Connectivity: The relatively low value (closer to 0 than to 1) implies that tightly knit groups or cliques are not the norm across the network.
- Network Structure: The value is almost the same as the network density (0.18). That is what we would expect from a graph whose edges were drawn independently at random, as ours were, so the clustering here does not point to more group structure than chance would produce.
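The per-node values behind the average are easy to follow on a toy graph. The sketch below uses an assumed triangle with one pendant node.

```python
import networkx as nx

# Toy graph (illustrative assumption): triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3)])

local = nx.clustering(toy)            # share of realized links among each node's neighbors
average = nx.average_clustering(toy)  # mean of the local values

for node, value in local.items():
    print(node, round(value, 2))
print(round(average, 3))
```

Node 0 has three neighbors, but only one of the three possible links among them exists (1-2), so its coefficient is 1/3. Nodes 1 and 2 each have two neighbors that are linked, giving 1.0. Node 3 has a single neighbor, so its coefficient is 0.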

In [26]:
clustering_coefficient = nx.average_clustering(G)
print(f"Average Clustering Coefficient: {clustering_coefficient:.2f}")

Average Clustering Coefficient: 0.17


### Network Components
Identifying connected components can reveal isolated sub-networks within a larger network. This is useful for understanding disconnected groups.
In our network we only have one connected component in the graph. 

In [27]:
components = list(nx.connected_components(G))
print(f"Number of connected components: {len(components)}")

for i, component in enumerate(components):
    print(f"Component {i + 1}: {component}")

Number of connected components: 1
Component 1: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}


## Node-Level Metrics
### K-core Decomposition
A k-core is a subgraph in which each node has at least k connections within that subgraph. The core number of a node is the highest value k for which that node is part of a k-core. K-core decomposition can help identify central, tightly-knit regions of a network and the relative influence of central nodes within it.

All but two of our nodes (nodes 3 and 21, which have a core number of 3) have a core number of 4. This indicates that these nodes are part of a subgraph where each node has at least four connections to other nodes within this subgraph. It suggests that the majority of the network is well-connected.

This has a few implications:
- Network Cohesion: The high core numbers across most nodes suggest a strong cohesion and robustness in the network. Nodes within a high k-core are generally more central to the network and can influence it significantly.
- Resilience and Stability: Networks with high k-core values are generally more resilient to random failures or attacks. Because nodes in high k-cores are interconnected, the removal of one node often doesn't isolate others.
- Focus Areas for Analysis or Intervention: Nodes with lower core numbers might either represent peripheral areas of the network or potential points of vulnerability. These could be the focus for strategies aimed at increasing connectivity or safeguarding the network.
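To see how core numbers separate a cohesive core from the periphery, here is a minimal sketch on an assumed toy graph. It also uses `nx.k_core` to extract the subgraph itself.

```python
import networkx as nx

# Toy graph (illustrative assumption): triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3)])

toy_core_numbers = nx.core_number(toy)
two_core = nx.k_core(toy, k=2)  # subgraph where every node keeps at least 2 neighbors

print(toy_core_numbers)        # {0: 2, 1: 2, 2: 2, 3: 1}
print(sorted(two_core.nodes()))  # [0, 1, 2]
```

Node 0 has degree 3, but its core number is still 2: once the pendant node is peeled away, it only has two neighbors left inside the core.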

In [28]:
core_number = nx.core_number(G)
print("Core number for each node:")
for node, core in core_number.items():
    print(f"Node {node}: Core {core}")

Core number for each node:
Node 0: Core 4
Node 1: Core 4
Node 2: Core 4
Node 3: Core 3
Node 4: Core 4
Node 5: Core 4
Node 6: Core 4
Node 7: Core 4
Node 8: Core 4
Node 9: Core 4
Node 10: Core 4
Node 11: Core 4
Node 12: Core 4
Node 13: Core 4
Node 14: Core 4
Node 15: Core 4
Node 16: Core 4
Node 17: Core 4
Node 18: Core 4
Node 19: Core 4
Node 20: Core 4
Node 21: Core 3
Node 22: Core 4
Node 23: Core 4
Node 24: Core 4
Node 25: Core 4
Node 26: Core 4
Node 27: Core 4
Node 28: Core 4
Node 29: Core 4
Node 30: Core 4
Node 31: Core 4
Node 32: Core 4
Node 33: Core 4
Node 34: Core 4
Node 35: Core 4
Node 36: Core 4
Node 37: Core 4
Node 38: Core 4
Node 39: Core 4


### Bridges and Cut Points
Bridges are edges whose removal increases the number of connected components, while cut points (or articulation points) are nodes whose removal increases the number of components. Identifying these can highlight critical connections or nodes. We have no bridges and no cut points in our network graph. This implies that no single edge is critical to maintaining the network's connectivity. Since there are no cut points either, the network can withstand the removal of any single node without fragmenting into disconnected parts.
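Since the example network has neither, a graph that does contain them makes the concepts easier to see. The sketch below uses an assumed toy graph of two 5-cliques joined by a single edge.

```python
import networkx as nx

barbell = nx.barbell_graph(5, 0)  # two 5-cliques joined by the edge (4, 5)

toy_bridges = list(nx.bridges(barbell))
toy_cut_points = sorted(nx.articulation_points(barbell))

print(toy_bridges)
print(toy_cut_points)  # [4, 5]
```

The single connecting edge is the only bridge, and its two endpoints are the only cut points: removing either one separates the two cliques.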

In [29]:
bridges = list(nx.bridges(G))
print(f"Bridges: {bridges}")

cut_points = list(nx.articulation_points(G))
print(f"Cut Points: {cut_points}")

Bridges: []
Cut Points: []


## Edge-Level Metrics
### Edge Betweenness Centrality
Similar to node betweenness centrality, this metric measures the number of shortest paths that pass through an edge, indicating its importance in connecting different parts of the network.

Edges with higher betweenness centrality values are more frequently involved in the shortest paths between pairs of nodes. Conversely, edges with lower betweenness centrality scores are less critical in connecting distant parts of the network. Most of the edge centrality values range from 0.01 to 0.02. This suggests that while there is some variation in how critical different edges are, the network likely does not have extremely dominant pathways that significantly outstrip others in importance. This can be indicative of a network with a fairly distributed connectivity, where no single edge is overwhelmingly more critical than others.

The edges with the highest betweenness centrality (close to 0.02) are the most strategic for maintaining connectivity and should be prioritized in network maintenance or when planning improvements. 

In [30]:
edge_betweenness = nx.edge_betweenness_centrality(G)
print("Edge Betweenness Centrality:")
for edge, centrality in edge_betweenness.items():
    print(f"Edge {edge}: {centrality:.2f}")

Edge Betweenness Centrality:
Edge (0, 4): 0.02
Edge (0, 5): 0.02
Edge (0, 12): 0.01
Edge (0, 15): 0.01
Edge (0, 17): 0.02
Edge (0, 23): 0.01
Edge (0, 25): 0.02
Edge (0, 28): 0.01
Edge (0, 34): 0.02
Edge (1, 3): 0.01
Edge (1, 4): 0.02
Edge (1, 8): 0.01
Edge (1, 36): 0.02
Edge (1, 38): 0.01
Edge (2, 23): 0.01
Edge (2, 24): 0.02
Edge (2, 25): 0.02
Edge (2, 38): 0.01
Edge (3, 4): 0.02
Edge (3, 18): 0.02
Edge (4, 6): 0.02
Edge (4, 7): 0.02
Edge (4, 10): 0.02
Edge (4, 11): 0.02
Edge (4, 15): 0.01
Edge (4, 22): 0.01
Edge (4, 23): 0.02
Edge (4, 35): 0.02
Edge (4, 38): 0.01
Edge (5, 10): 0.01
Edge (5, 11): 0.02
Edge (5, 13): 0.01
Edge (5, 27): 0.01
Edge (5, 30): 0.01
Edge (5, 32): 0.02
Edge (5, 35): 0.01
Edge (6, 8): 0.01
Edge (6, 18): 0.01
Edge (6, 20): 0.01
Edge (6, 24): 0.01
Edge (6, 33): 0.02
Edge (6, 36): 0.01
Edge (7, 15): 0.01
Edge (7, 21): 0.02
Edge (7, 26): 0.01
Edge (7, 33): 0.02
Edge (8, 10): 0.01
Edge (8, 18): 0.01
Edge (8, 20): 0.01
Edge (8, 25): 0.02
Edge (8, 37): 0.01
Edge (9, 10

## Roles and Positions
### Structural Equivalence
Nodes are structurally equivalent if they have identical connections to other nodes. This does not necessarily mean that they are connected to each other, but rather that they share the same pattern of connection to other nodes. This concept helps in identifying nodes with similar roles or functions within the network. 

The code below finds no structurally equivalent pairs in this network. This tells us the following:
- Unique Connectivity Patterns: Every node in the network has a unique set of connections to other nodes. This uniqueness can suggest a diversity in the roles or positions that each node plays within the network. For example, in a social network, it could mean that no two individuals interact with exactly the same group of other people.
- Implications for Network Dynamics:
    - Different Influence or Roles: Since no two nodes are structurally equivalent, each node may exert influence or fulfill roles that are distinct from others within the network. This can have implications for the flow of information or resources, as different nodes might control access to different parts of the network.
    - Network Resilience and Redundancy: The absence of structurally equivalent nodes might mean that the network lacks redundancy in its connectivity patterns, which could affect its resilience. In some contexts, if each node has a unique connectivity pattern, the removal of any single node could potentially disrupt unique aspects of the network's functionality.
- Analytical Considerations:
    - Node Centrality and Clustering: Further analysis might focus on other aspects, such as centrality measures to understand which nodes are most critical in terms of their unique positions, or clustering analysis to identify how nodes group based on similar (though not identical) connection patterns.
    - Potential for Role Analysis: Without structural equivalence, role analysis could shift towards understanding similar but not identical roles, possibly using concepts like regular equivalence where nodes have similar patterns of connections but not necessarily identical ones.


In [31]:
def structurally_equivalent_pairs(G):
    eq_pairs = []
    nodes = list(G.nodes())
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if set(G.neighbors(nodes[i])) == set(G.neighbors(nodes[j])):
                eq_pairs.append((nodes[i], nodes[j]))
    return eq_pairs

eq_pairs = structurally_equivalent_pairs(G)
print("Structurally Equivalent Pairs:", eq_pairs)

Structurally Equivalent Pairs: []
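
One caveat about the strict comparison above: in a graph without self-loops, two nodes that are connected to each other can never have identical neighbour sets, because each set contains the other node but not the node itself. A common relaxation is to ignore the two nodes themselves when comparing. The sketch below illustrates this on a small hypothetical graph; the function name `equivalent_pairs_ignoring_self` and the adjacency-dict input are assumptions made for the example (for a NetworkX graph, `{n: set(G[n]) for n in G}` should produce the same format).

```python
from itertools import combinations

def equivalent_pairs_ignoring_self(adj):
    """Pairs whose neighbour sets match once the two nodes themselves are left out."""
    pairs = []
    for u, v in combinations(adj, 2):
        if set(adj[u]) - {v} == set(adj[v]) - {u}:
            pairs.append((u, v))
    return pairs

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

# Strict comparison, as in the cell above: exact neighbour sets must match.
strict = [(u, v) for u, v in combinations(toy, 2) if toy[u] == toy[v]]
relaxed = equivalent_pairs_ignoring_self(toy)

print("Strict:", strict)
print("Relaxed:", relaxed)
```

In the toy graph, nodes 0 and 1 are both connected to node 2 and to each other. The strict test misses this pair, while the relaxed test reports it.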


### Blockmodeling
This technique groups nodes into blocks or positions based on their structural similarity or roles, simplifying the analysis of large networks by focusing on the interactions between groups rather than individual nodes.

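For blockmodeling, we need to cluster nodes based on some structural similarity. NetworkX doesn't directly support blockmodeling, so here is a basic example that uses each node's clustering coefficient as a simple proxy feature and groups the nodes into three clusters with k-means (degree would be another possible proxy).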

In [32]:
from sklearn.cluster import KMeans

features = np.array([nx.clustering(G, node) for node in G.nodes()]).reshape(-1, 1)
kmeans = KMeans(n_clusters=3, random_state=42).fit(features)
labels = kmeans.labels_

net = Network(notebook=True, height="750px", width="100%", bgcolor="#ffffff", font_color="black", cdn_resources='in_line')

colors = ['red', 'blue', 'green', 'orange', 'purple', 'yellow']
# Pair each node with its cluster label; zip stays correct even if node IDs are not 0..n-1
for node, label in zip(G.nodes(), labels):
    net.add_node(node, label=str(node), color=colors[label])


for u, v in G.edges():
    net.add_edge(u, v, width=0.5) 

net.set_options("""
var options = {
  "physics": {
    "enabled": true
  }
}
""")
net.show("blockmodeling_interactive.html")

blockmodeling_interactive.html
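
Once nodes have been assigned to blocks, the interactions between groups can be summarised by the density of edges within and between blocks. The following is a minimal sketch under stated assumptions: the function name `block_densities`, the toy edge list, and the toy labels are hypothetical, and the graph is assumed to be undirected, without self-loops, with each edge listed once. For the graph in this notebook, `dict(zip(G.nodes(), labels))` and `G.edges()` could serve as inputs.

```python
def block_densities(edges, labels):
    """Fraction of possible edges that are present within and between blocks."""
    blocks = sorted(set(labels.values()))
    size = {b: sum(1 for lab in labels.values() if lab == b) for b in blocks}
    count = {(a, b): 0 for a in blocks for b in blocks}
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        count[(a, b)] += 1
    density = {}
    for i, a in enumerate(blocks):
        for b in blocks[i:]:
            # Within a block there are size*(size-1)/2 possible edges, between blocks size_a*size_b.
            possible = size[a] * (size[a] - 1) / 2 if a == b else size[a] * size[b]
            d = count[(a, b)] / possible if possible else 0.0
            density[(a, b)] = density[(b, a)] = d
    return density

# Hypothetical toy data: a triangle (block 0) linked by one edge to a connected pair (block 1).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
toy_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

density = block_densities(toy_edges, toy_labels)
for (a, b), d in sorted(density.items()):
    print(f"Block {a} -> Block {b}: {d:.2f}")
```

High values on the diagonal and low values off it would indicate cohesive blocks. Blocks derived from clustering coefficients need not show that pattern, since nodes with similar coefficients are not necessarily connected to each other.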


## Homophily and Assortativity
### Homophily
This principle states that similar nodes tend to connect with each other. Measuring homophily can help understand the extent to which attributes like age, gender, or interests influence network formation. If the network has node attributes, you can calculate homophily. Here, let's assume nodes have a binary attribute.

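The calculated homophily value is 0.49, meaning that 49% of the edges in the network connect nodes with the same attribute. Because the attribute was assigned at random with equal probability for 'A' and 'B', roughly half of all edges are expected to join same-attribute nodes purely by chance, so this value shows no homophily beyond the random baseline.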

In [33]:
# Assume nodes have a binary attribute
attributes = {node: random.choice(['A', 'B']) for node in G.nodes()}
nx.set_node_attributes(G, attributes, 'attribute')

# Calculate homophily
def calculate_homophily(G, attribute_name):
    same_attribute_edges = 0
    for u, v in G.edges():
        if G.nodes[u][attribute_name] == G.nodes[v][attribute_name]:
            same_attribute_edges += 1
    return same_attribute_edges / G.number_of_edges()

homophily = calculate_homophily(G, 'attribute')
print(f"Homophily: {homophily:.2f}")

Homophily: 0.49
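
A homophily value is easier to interpret next to a chance baseline. One simple baseline is the probability that two distinct nodes picked at random share the attribute, which is what one would expect for the same-attribute edge fraction if edges formed independently of the attribute. The sketch below computes it; the function name `chance_homophily` and the even 20/20 split are assumptions for illustration, since the actual split produced by the random assignment is not shown in this notebook.

```python
from collections import Counter

def chance_homophily(attribute_by_node):
    """Probability that two distinct, randomly chosen nodes share the attribute."""
    counts = Counter(attribute_by_node.values())
    n = sum(counts.values())
    if n < 2:
        return 0.0
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return same_pairs / (n * (n - 1))

# Hypothetical even split of 40 nodes into groups 'A' and 'B'.
example_attributes = {node: ('A' if node < 20 else 'B') for node in range(40)}
baseline = chance_homophily(example_attributes)
print(f"Chance baseline: {baseline:.2f}")
```

For the notebook's graph, passing the `attributes` dictionary from the cell above would give the baseline for the actual assignment. An observed homophily well above the baseline suggests that similar nodes attract each other, while a value close to it suggests no such effect.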


### Assortativity
Assortativity can be measured with respect to node degree or with respect to a node attribute. Both metrics provide insight into how the properties of connected nodes correlate across the network.

**Degree assortativity** measures whether connected nodes have similar degrees. A positive value indicates that nodes tend to connect with others of similar degree (assortative mixing), while a negative value indicates a tendency to connect with nodes of differing degree (disassortative mixing). A degree assortativity of -0.10 indicates slight disassortative mixing: high-degree nodes are somewhat more likely to connect with low-degree nodes, and vice versa. This pattern is commonly reported for technological and biological networks such as protein interaction networks, where hubs attach to many low-degree nodes; social networks, by contrast, often show positive degree assortativity.

**Attribute assortativity** measures how strongly nodes tend to connect with others that share the same attribute value. As with degree assortativity, a positive value indicates assortative mixing by the attribute, and a negative value indicates disassortative mixing. An attribute assortativity of -0.03 is a very slight tendency towards disassortative mixing on the binary attribute ('A' or 'B'), but the value is so close to zero that there is effectively no attribute-based mixing pattern. This is the expected result: the attributes were assigned at random, so they played no part in the formation of links.

In [34]:
# Degree assortativity
degree_assortativity = nx.degree_assortativity_coefficient(G)
print(f"Degree Assortativity: {degree_assortativity:.2f}")

# Attribute assortativity
attribute_assortativity = nx.attribute_assortativity_coefficient(G, 'attribute')
print(f"Attribute Assortativity: {attribute_assortativity:.2f}")

Degree Assortativity: -0.10
Attribute Assortativity: -0.03
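
Degree assortativity can be read as a Pearson correlation between the degrees found at the two ends of each edge. The sketch below spells out that arithmetic in plain Python for a simple undirected, unweighted graph. The function name `degree_assortativity_sketch` and the star-shaped toy graph are assumptions for illustration, and the result should agree with `nx.degree_assortativity_coefficient` up to floating-point error for graphs of this kind.

```python
def degree_assortativity_sketch(edges):
    """Pearson correlation of the degrees at both ends of every edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # List every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    # If every node has the same degree the correlation is undefined.
    return cov / var if var else float("nan")

# Hypothetical toy graph: a star with hub 0 and three leaves.
star_edges = [(0, 1), (0, 2), (0, 3)]
r = degree_assortativity_sketch(star_edges)
print(f"Degree assortativity of the star: {r:.2f}")
```

A star is the extreme disassortative case: every edge joins the highest-degree node to a degree-1 node, so the correlation is -1. The value of -0.10 reported above is far milder.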


## Influence and Power
### Influence Models
These models assess the potential impact of nodes based on their position and connections, helping identify key influencers in social networks.
The code will calculate the PageRank score for each node in the graph. 

Nodes with higher scores, such as nodes 4, 18, and 25 (0.042, 0.041, and 0.056 respectively), are likely central to the network. These nodes are potentially critical for the flow of information or resources because they act as hubs or important connectors. Nodes with lower scores, like nodes 2 and 21 (0.014 and 0.013), have fewer or less influential connections and are probably more peripheral. PageRank scores sum to 1, so the mean over the 40 nodes is 0.025; nodes such as 0, 5, 11, and 23 (around 0.030) sit slightly above that mean: they still play a significant role but are not as central as the highest-scoring nodes.


In [35]:
# Using PageRank as an influence measure
pagerank = nx.pagerank(G)
print("PageRank (Influence):")
for node, score in pagerank.items():
    print(f"Node {node}: {score:.3f}")

PageRank (Influence):
Node 0: 0.030
Node 1: 0.021
Node 2: 0.014
Node 3: 0.016
Node 4: 0.042
Node 5: 0.030
Node 6: 0.031
Node 7: 0.018
Node 8: 0.028
Node 9: 0.016
Node 10: 0.027
Node 11: 0.030
Node 12: 0.027
Node 13: 0.025
Node 14: 0.020
Node 15: 0.028
Node 16: 0.020
Node 17: 0.020
Node 18: 0.041
Node 19: 0.029
Node 20: 0.025
Node 21: 0.013
Node 22: 0.024
Node 23: 0.030
Node 24: 0.031
Node 25: 0.056
Node 26: 0.020
Node 27: 0.023
Node 28: 0.017
Node 29: 0.029
Node 30: 0.021
Node 31: 0.032
Node 32: 0.022
Node 33: 0.032
Node 34: 0.016
Node 35: 0.017
Node 36: 0.026
Node 37: 0.014
Node 38: 0.022
Node 39: 0.017
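
To show where such scores come from, here is a minimal sketch of PageRank computed by power iteration on a small undirected graph. It is a simplified illustration, not the NetworkX implementation: the function name `pagerank_sketch`, the adjacency-dict input, and the toy star graph are assumptions, and the damping factor of 0.85 is chosen to match the default `alpha` of `nx.pagerank`.

```python
def pagerank_sketch(adj, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration; adj maps each node to a list of its neighbours."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node receives the same small "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            if adj[u]:
                # A node passes its damped rank to its neighbours in equal parts.
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new_rank[v] += share
            else:
                # A node without neighbours spreads its rank over all nodes.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical toy graph: a star with hub 0 and three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
ranks = pagerank_sketch(star)
for node, score in ranks.items():
    print(f"Node {node}: {score:.3f}")
```

The hub collects rank from every leaf and therefore ends up with the largest score, which is the same mechanism that lifts nodes 25, 4, and 18 in the output above.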


### Power and Control
Understanding how power is distributed in a network can be crucial for analyzing hierarchies or dominance structures, especially in organizational or political networks.

In the following code we calculate the betweenness centrality score for each node in the network, a measure we already discussed earlier in the notebook. Let's interpret the scores:
- High Scores: Nodes like 4, 18, 25, and 33 with relatively high betweenness centrality scores (0.091, 0.074, 0.105, 0.070 respectively) indicate that these nodes play a critical role in facilitating the flow of information or resources across the network. They act as significant bridges or junctions within the network, connecting different parts or clusters.
- Low Scores: Nodes such as 2, 3, 21, 34, and 35 with low scores (0.004, 0.002, 0.004, 0.007, 0.006) are less involved in connecting different parts of the network. These nodes may be more peripheral or located within densely connected subregions where pathways don't need to pass through them to reach other nodes.
- Medium Scores: Nodes with moderate betweenness centrality values serve as intermediate connectors within the network, facilitating some connectivity but not to the extent of the highest centrality nodes.

In [36]:
# Betweenness centrality
betweenness_centrality = nx.betweenness_centrality(G)
print("Betweenness Centrality (Power):")
for node, centrality in betweenness_centrality.items():
    print(f"Node {node}: {centrality:.3f}")

Betweenness Centrality (Power):
Node 0: 0.051
Node 1: 0.012
Node 2: 0.004
Node 3: 0.002
Node 4: 0.091
Node 5: 0.032
Node 6: 0.019
Node 7: 0.018
Node 8: 0.021
Node 9: 0.013
Node 10: 0.027
Node 11: 0.037
Node 12: 0.036
Node 13: 0.022
Node 14: 0.008
Node 15: 0.024
Node 16: 0.013
Node 17: 0.011
Node 18: 0.074
Node 19: 0.018
Node 20: 0.020
Node 21: 0.004
Node 22: 0.025
Node 23: 0.028
Node 24: 0.044
Node 25: 0.105
Node 26: 0.023
Node 27: 0.032
Node 28: 0.011
Node 29: 0.069
Node 30: 0.019
Node 31: 0.019
Node 32: 0.025
Node 33: 0.070
Node 34: 0.007
Node 35: 0.006
Node 36: 0.042
Node 37: 0.006
Node 38: 0.013
Node 39: 0.016
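
PageRank and betweenness capture different notions of importance, so it can be instructive to compare their rankings. The sketch below does this for the highest-scoring nodes. The two dictionaries hold rounded scores copied from the printed outputs above for a subset of nodes (every node with a PageRank of at least 0.030 or a betweenness of at least 0.042); the helper name `top_k` is an assumption for this example.

```python
# Rounded scores copied from the outputs above (subset of nodes only).
pagerank_scores = {
    0: 0.030, 4: 0.042, 5: 0.030, 6: 0.031, 11: 0.030, 18: 0.041, 23: 0.030,
    24: 0.031, 25: 0.056, 29: 0.029, 31: 0.032, 33: 0.032, 36: 0.026,
}
betweenness_scores = {
    0: 0.051, 4: 0.091, 5: 0.032, 6: 0.019, 11: 0.037, 18: 0.074, 23: 0.028,
    24: 0.044, 25: 0.105, 29: 0.069, 31: 0.019, 33: 0.070, 36: 0.042,
}

def top_k(scores, k):
    """Return the k nodes with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [node for node, _ in ranked[:k]]

top_pagerank = top_k(pagerank_scores, 5)
top_betweenness = top_k(betweenness_scores, 5)

print("In both top-5 lists:", sorted(set(top_pagerank) & set(top_betweenness)))
print("PageRank only:", sorted(set(top_pagerank) - set(top_betweenness)))
print("Betweenness only:", sorted(set(top_betweenness) - set(top_pagerank)))
```

With these rounded scores, nodes 4, 18, 25, and 33 appear in both lists. Node 31 makes only the PageRank top five, and node 29 only the betweenness top five, which suggests that node 29 acts more as a bridge than its PageRank alone would indicate.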


# Final Words
Social network analysis (SNA) provides powerful tools to understand the structures and dynamics of networks, ranging from interpersonal relationships within organizations to global digital communication networks. By leveraging various measures and techniques such as centralities, communities, and assortativity, you can glean deep insights into how networks function and how information or influence flows within them.
By combining these analytical tools, social network analysis allows researchers, policymakers, and practitioners to not only map and measure but also to predict and enhance interactions within various types of networks. The insights garnered can drive more informed decisions, tailored interventions, and ultimately, more effective outcomes across numerous fields.

>Networks are not merely sets of isolated components, but systems of interdependent pieces whose complex interactions determine their dynamical behavior.

*Duncan J. Watts* 

This quote, from his work on the dynamics of networks, underscores the complexity and interconnectivity within networks. It highlights how understanding these interdependencies is crucial to comprehending how networks function and evolve. This perspective is foundational in social network analysis, where the focus is often on how individual nodes and their connections influence the broader network behavior and outcomes.