# Social Network Analysis - Analysis of the Network Structure
This notebook examines the structure of the provided social network in more depth. Its main goals are to analyze the network visually and to identify popular nodes.

# Approach
To analyze the network structure, several approaches are tested and evaluated in the following steps:

1. Loading the data
   1. Imports
   2. Loading Network
   3. Visual Impression
2. Calculating popularity metrics
   1. Calculation
   2. Visualization


# 1. Loading the data

## 1.1 Imports

In [1]:
import pandas as pd
import networkx as nx

from igraph import (
    Graph,
    plot,
)



## 1.2 Load Network

In [2]:
# Load network as Pandas DataFrame
df_network = pd.read_csv(
    "../data/graph.csv",
    delimiter=",",
)
df_nodes = pd.read_csv(
    "../data/nodes.csv",
    delimiter=",",
)

In [3]:
# Convert the network into a NetworkX graph object
x_network = nx.from_pandas_edgelist(
    df_network,
    "source",
    "target",
)
print(x_network)

Graph with 46849 nodes and 94884 edges
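`nx.from_pandas_edgelist` builds an undirected `nx.Graph` by default, so repeated rows and reversed pairs (A to B and B to A) collapse into a single edge. The edge count printed above can therefore be smaller than the number of rows in `graph.csv`. A minimal toy sketch, using made-up user IDs rather than the real data:

```python
import networkx as nx
import pandas as pd

# Toy edge list with made-up IDs: the pair (1, 2) appears twice, once reversed
df_toy = pd.DataFrame({"source": [1, 2, 2], "target": [2, 1, 3]})

# With the default create_using, an undirected nx.Graph stores each node pair only once
g_toy = nx.from_pandas_edgelist(df_toy, "source", "target")

print(len(df_toy), g_toy.number_of_edges())  # 3 2
```

Comparing `len(df_network)` with `x_network.number_of_edges()` would show whether this happens in the real data.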


In [4]:
# Convert the network into an igraph object
# NOTE: igraph is mainly used for computationally expensive tasks,
# since its core is implemented in C
igraph_graph = Graph.from_networkx(
    x_network
)
print(
    igraph_graph.summary()
)

IGRAPH U--- 46849 94884 -- 
+ attr: _nx_name (v)


## 1.3 Create a simple plot of the Network

Use igraph instead of NetworkX for plotting: here the plot renders in about 40 seconds instead of about 2 hours.

Note: igraph's plotting functionality requires PyCairo.

PyCairo in turn depends on local installations of pkg-config, cairo and CMake on your computer!

In [None]:
# Let IGraph determine the best fitting layout
layout = igraph_graph.layout(
    "auto"
)

plot(
    igraph_graph,
    layout=layout,
    vertex_size=5,
    edge_width=0.1,
    bbox=(500, 500),
)


**First Visual Impression**: The plot shows a large number of small communities with overlapping nodes, while most nodes have edges leading toward the center. This suggests that a large cluster has formed in the center.
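The impression of a large central cluster can be checked numerically by measuring how many nodes belong to the largest connected component. A minimal sketch, assuming an undirected NetworkX graph; `largest_component_share` is a hypothetical helper, not part of the original notebook:

```python
import networkx as nx


def largest_component_share(G):
    """Fraction of nodes in the largest connected component of an undirected graph."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / G.number_of_nodes()


# Toy graph: a path 0-1-2 plus a separate edge 3-4
toy = nx.Graph([(0, 1), (1, 2), (3, 4)])
print(largest_component_share(toy))  # 0.6
```

Calling `largest_component_share(x_network)` would quantify the visual impression; no result for the real network is reported here.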

# 2. Popularity Metrics
In this section, the following five popularity metrics from the lecture are calculated and visualized:

1. Degree Centrality
2. Eigenvector Centrality
3. PageRank Centrality
4. Betweenness Centrality
5. Closeness Centrality
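For reference, common textbook forms of the five metrics for an undirected, unweighted graph with $n$ nodes are given below. Here $d(v,u)$ is the shortest-path distance, $\sigma_{st}$ the number of shortest paths between $s$ and $t$, $\sigma_{st}(v)$ the number of those passing through $v$, $A$ the adjacency matrix with largest eigenvalue $\lambda$, $N(v)$ the neighbors of $v$, and $\alpha$ the damping factor (0.85 by default in `nx.pagerank`). The closeness form assumes a connected graph and the PageRank form assumes no isolated nodes.

$$
C_D(v) = \frac{\deg(v)}{n-1}
\qquad
C_C(v) = \frac{n-1}{\sum_{u \neq v} d(v,u)}
\qquad
C_B(v) = \sum_{\substack{s < t \\ s,t \neq v}} \frac{\sigma_{st}(v)}{\sigma_{st}}
$$

$$
x_v = \frac{1}{\lambda} \sum_{u} A_{vu}\, x_u
\qquad
PR(v) = \frac{1-\alpha}{n} + \alpha \sum_{u \in N(v)} \frac{PR(u)}{\deg(u)}
$$

The libraries normalize differently: `nx.degree_centrality` divides by $n-1$, igraph's `betweenness()` returns unnormalized values by default, and igraph's `closeness()` normalizes by default. Since only the top 10 rankings are used below, these constant scale factors do not change the visualizations.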

## 2.1 Calculation


In [5]:
# Calculation of the Degree Centrality
degree_centrality = nx.degree_centrality(
    x_network
)
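Degree centrality is simply each node's degree divided by $n-1$, the largest degree possible in a simple graph. A minimal sketch that reproduces `nx.degree_centrality` on a toy star graph; `degree_centrality_manual` is a hypothetical name used only for illustration:

```python
import networkx as nx


def degree_centrality_manual(G):
    """Degree of each node divided by n - 1 (mirrors NetworkX's convention for n <= 1)."""
    n = G.number_of_nodes()
    if n <= 1:
        return {v: 1.0 for v in G}
    return {v: d / (n - 1) for v, d in G.degree()}


star = nx.star_graph(3)  # center 0 connected to leaves 1, 2, 3
print(degree_centrality_manual(star))  # center 0 -> 1.0, each leaf -> 1/3
```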

In [6]:
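# Calculation of the Eigenvector Centrality
# NOTE: the graph was built without edge attributes, so weight="weight" falls back
# to a weight of 1 for every edge
eigenvector_centrality = nx.eigenvector_centrality(
    x_network,
    max_iter=1000,
    weight="weight",
)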
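Eigenvector centrality is the principal eigenvector of the adjacency matrix, usually found by power iteration. A minimal sketch for unweighted graphs, assuming we iterate with $A + I$ instead of $A$ (to our knowledge NetworkX does the same) so the iteration does not oscillate on bipartite graphs; `eigenvector_centrality_power` is a hypothetical helper:

```python
import math

import networkx as nx


def eigenvector_centrality_power(G, max_iter=1000, tol=1e-10):
    """Power iteration with (A + I), normalized to unit Euclidean length each step."""
    x = {v: 1.0 / len(G) for v in G}  # uniform starting vector
    for _ in range(max_iter):
        # One multiplication by (A + I): own value plus the sum over neighbors
        x_new = {v: x[v] + sum(x[u] for u in G[v]) for v in G}
        norm = math.sqrt(sum(val * val for val in x_new.values()))
        x_new = {v: val / norm for v, val in x_new.items()}
        if sum(abs(x_new[v] - x[v]) for v in G) < tol:
            return x_new
        x = x_new
    raise RuntimeError("power iteration did not converge")


path = nx.path_graph(5)
print(eigenvector_centrality_power(path))  # the middle node 2 gets the highest score
```

The `max_iter=1000` in the notebook cell above matters for the same reason: on a large, loosely connected graph the iteration can need many steps to converge.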

In [7]:
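# Calculation of the PageRank Centrality
# NOTE: the graph was built without edge attributes, so weight="weight" falls back
# to a weight of 1 for every edge
page_rank_centrality = nx.pagerank(
    x_network,
    weight="weight",
)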
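PageRank can be computed by repeatedly applying the update formula until the scores stop changing. A minimal sketch for simple, undirected, unweighted graphs, assuming isolated (dangling) nodes spread their score uniformly over all nodes; this should agree closely with `nx.pagerank(G, weight=None)`, but it is an illustration, not the NetworkX implementation. `pagerank_power` is a hypothetical helper:

```python
import networkx as nx


def pagerank_power(G, alpha=0.85, max_iter=1000, tol=1e-10):
    """PageRank by power iteration on a simple, undirected, unweighted graph."""
    n = G.number_of_nodes()
    pr = {v: 1.0 / n for v in G}
    for _ in range(max_iter):
        # Score held by nodes without neighbors is redistributed uniformly
        dangling = sum(pr[v] for v in G if G.degree(v) == 0)
        new = {}
        for v in G:
            # Each neighbor u passes on an equal share of its score over its deg(u) edges
            incoming = sum(pr[u] / G.degree(u) for u in G[v])
            new[v] = (1 - alpha) / n + alpha * (incoming + dangling / n)
        if sum(abs(new[v] - pr[v]) for v in G) < tol:
            return new
        pr = new
    raise RuntimeError("PageRank did not converge")


star = nx.star_graph(3)
print(pagerank_power(star))  # the center 0 scores higher than the three leaves
```

The scores always sum to 1, which makes PageRank values directly comparable across nodes of the same graph.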

In [8]:
# Calculation of the Betweenness Centrality
# Use igraph instead of NetworkX, since computations are handled in C
betweeness_values = igraph_graph.betweenness()

# Store the betweenness values in a dict keyed by the corresponding user_id
betweeness_centrality = dict(
    zip(igraph_graph.vs["_nx_name"], betweeness_values)
)
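To make explicit what `igraph_graph.betweenness()` returns (unnormalized values, each unordered pair of endpoints counted once for an undirected graph), here is a minimal sketch of Brandes' algorithm for unweighted, undirected graphs. It is an illustration on a NetworkX graph, not igraph's actual C implementation; `brandes_betweenness` is a hypothetical helper:

```python
from collections import deque

import networkx as nx


def brandes_betweenness(G):
    """Unnormalized betweenness of an unweighted, undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(G, 0.0)
    for s in G:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack = []
        preds = {v: [] for v in G}
        sigma = dict.fromkeys(G, 0)
        sigma[s] = 1
        dist = dict.fromkeys(G, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in G[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first
        delta = dict.fromkeys(G, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) was counted from both ends
    return {v: val / 2 for v, val in bc.items()}


print(brandes_betweenness(nx.path_graph(3)))  # {0: 0.0, 1: 1.0, 2: 0.0}
```

Each BFS costs $O(m)$ and one is run per node, so the total is $O(nm)$; for a graph of this size that is why the notebook delegates the computation to igraph.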

In [10]:
# Calculation of the Closeness Centrality
# Use igraph instead of NetworkX, since computations are handled in C
closeness_values = igraph_graph.closeness()

# Store the closeness values in a dict keyed by the corresponding user_id
closeness_centrality = dict(
    zip(igraph_graph.vs["_nx_name"], closeness_values)
)
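On a disconnected network, closeness has to be restricted to the nodes a vertex can actually reach. As far as we know, recent igraph versions (0.10 and later) do this by default, computing $(r-1)/\sum d$ over the $r$ reachable nodes, which corresponds to `nx.closeness_centrality(G, wf_improved=False)`. A minimal BFS sketch of that definition; `closeness_reachable` is a hypothetical helper, and returning 0.0 for isolated nodes follows the NetworkX convention:

```python
from collections import deque

import networkx as nx


def closeness_reachable(G, v):
    """Closeness of v over reachable nodes only: (r - 1) / sum of BFS distances."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    # An isolated node reaches nothing; 0.0 is a convention, not a measured value
    return (len(dist) - 1) / total if total > 0 else 0.0


# Toy graph with two components: path 0-1-2 and edge 3-4
toy = nx.Graph([(0, 1), (1, 2), (3, 4)])
print({v: closeness_reachable(toy, v) for v in toy})
```

Note that under this definition the nodes of a tiny component (3 and 4 above score 1.0) can outrank well-connected nodes of the large cluster, which is worth keeping in mind when reading the closeness top 10.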

In [15]:
# Store popularity metrics in a dataframe
# Each dict is keyed by user_id, so pandas aligns the values on that key
# instead of relying on all dicts sharing the same iteration order
df_popularity = (
    pd.DataFrame(
        {
            "degree_centrality": degree_centrality,
            "eigenvector_centrality": eigenvector_centrality,
            "page_rank_centrality": page_rank_centrality,
            "betweenness_centrality": betweeness_centrality,
            "closeness_centrality": closeness_centrality,
        }
    )
    .rename_axis("user_id")
    .reset_index()
)

# Merge with nodes.csv to retrieve the usernames as well
df_merged = pd.merge(
    df_nodes,
    df_popularity,
    on="user_id",
)

# Store CSV in data folder
# df_merged.to_csv("../data/nodes_popularity_metrics.csv", index=False, sep=",")

## 2.2 Visualization

In [28]:
# Visualize Top 10 Users with the highest Degree Centrality
top_10_users_degree = df_popularity.nlargest(10, "degree_centrality")["user_id"].tolist()

# Highlight the top 10 users in blue and enlarge them; all other nodes stay small and gray
node_colors = [
    "blue" if name in top_10_users_degree else "gray"
    for name in igraph_graph.vs["_nx_name"]
]
node_sizes = [
    20 if name in top_10_users_degree else 5
    for name in igraph_graph.vs["_nx_name"]
]

#plot(
#    igraph_graph,
#    layout=igraph_graph.layout(
#        "auto"
#    ),
#    vertex_color=node_colors,
#    vertex_size=node_sizes,
#    edge_width=0.1,
#    bbox=(100, 100),
#)
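The five visualization cells repeat the same color and size logic and differ only in the metric and the highlight color. A minimal sketch of a reusable helper; `highlight_top_k` is a hypothetical name, and the defaults simply copy the values used in the cells (size 20 vs. 5, gray background):

```python
def highlight_top_k(node_names, top_ids, color, top_size=20, base_size=5, base_color="gray"):
    """Return (colors, sizes) lists aligned with node_names.

    Nodes whose name is in top_ids get `color` and `top_size`;
    all other nodes get `base_color` and `base_size`.
    """
    top = set(top_ids)  # set lookup keeps the loop linear in the number of nodes
    colors = [color if name in top else base_color for name in node_names]
    sizes = [top_size if name in top else base_size for name in node_names]
    return colors, sizes


# Toy usage; in the notebook node_names would be igraph_graph.vs["_nx_name"]
colors, sizes = highlight_top_k(["a", "b", "c"], ["b"], "blue")
print(colors)  # ['gray', 'blue', 'gray']
print(sizes)  # [5, 20, 5]
```

With such a helper, each visualization cell reduces to one `nlargest` call and one `highlight_top_k` call, which makes it harder for the five cells to drift apart.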

In [29]:
# Visualize Top 10 Users with the highest Eigenvector Centrality
top_10_users_ev = df_popularity.nlargest(10, "eigenvector_centrality")["user_id"].tolist()

# Highlight the top 10 users in yellow and enlarge them; all other nodes stay small and gray
node_colors = [
    "yellow" if name in top_10_users_ev else "gray"
    for name in igraph_graph.vs["_nx_name"]
]
node_sizes = [
    20 if name in top_10_users_ev else 5
    for name in igraph_graph.vs["_nx_name"]
]

#plot(
#    igraph_graph,
#    layout=igraph_graph.layout(
#        "auto"
#    ),
#    vertex_color=node_colors,
#    vertex_size=node_sizes,
#    edge_width=0.1,
#    bbox=(100, 100),
#)

In [30]:
# Visualize Top 10 Users with the highest PageRank Centrality
top_10_users_pr = df_popularity.nlargest(10, "page_rank_centrality")["user_id"].tolist()

# Highlight the top 10 users in green and enlarge them; all other nodes stay small and gray
node_colors = [
    "green" if name in top_10_users_pr else "gray"
    for name in igraph_graph.vs["_nx_name"]
]
node_sizes = [
    20 if name in top_10_users_pr else 5
    for name in igraph_graph.vs["_nx_name"]
]

#plot(
#    igraph_graph,
#    layout=igraph_graph.layout(
#        "auto"
#    ),
#    vertex_color=node_colors,
#    vertex_size=node_sizes,
#    edge_width=0.1,
#    bbox=(100, 100),
#)

In [31]:
# Visualize Top 10 Users with the highest Betweenness Centrality
top_10_users_betw = df_popularity.nlargest(10, "betweenness_centrality")["user_id"].tolist()

# Highlight the top 10 users in orange and enlarge them; all other nodes stay small and gray
node_colors = [
    "orange" if name in top_10_users_betw else "gray"
    for name in igraph_graph.vs["_nx_name"]
]
node_sizes = [
    20 if name in top_10_users_betw else 5
    for name in igraph_graph.vs["_nx_name"]
]

#plot(
#    igraph_graph,
#    layout=igraph_graph.layout(
#        "auto"
#    ),
#    vertex_color=node_colors,
#    vertex_size=node_sizes,
#    edge_width=0.1,
#    bbox=(100, 100),
#)

In [32]:
# Visualize Top 10 Users with the highest Closeness Centrality
top_10_users_close = df_popularity.nlargest(10, "closeness_centrality")["user_id"].tolist()

# Highlight the top 10 users in red and enlarge them; all other nodes stay small and gray
node_colors = [
    "red" if name in top_10_users_close else "gray"
    for name in igraph_graph.vs["_nx_name"]
]
node_sizes = [
    20 if name in top_10_users_close else 5
    for name in igraph_graph.vs["_nx_name"]
]

#plot(
#    igraph_graph,
#    layout=igraph_graph.layout(
#        "auto"
#    ),
#    vertex_color=node_colors,
#    vertex_size=node_sizes,
#    edge_width=0.1,
#    bbox=(100, 100),
#)