# Social Network Analysis

Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defined as a graph in which nodes and/or edges have attributes (e.g. names).

Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
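
As a minimal illustration of nodes, ties, and attributes in NetworkX, here is a made-up four-person network. The names and the `city` and `relation` attributes are hypothetical and not part of the Twitter data used below:

```python
import networkx as nx

# Hypothetical actors (nodes) and ties (edges); names are illustrative only
G_toy = nx.Graph()
G_toy.add_node("ani", city="Jakarta")            # node attribute
G_toy.add_node("budi", city="Bandung")
G_toy.add_edge("ani", "budi", relation="reply")  # edge attribute
G_toy.add_edge("budi", "citra", relation="mention")  # 'citra' is created implicitly
G_toy.add_edge("citra", "dewi", relation="reply")

print(G_toy.number_of_nodes(), G_toy.number_of_edges())  # 4 3
print(G_toy.nodes["ani"]["city"])                        # Jakarta
print(G_toy.edges["budi", "citra"]["relation"])          # mention
print(sorted(G_toy.neighbors("budi")))                   # ['ani', 'citra']
```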

In this practice we will use NetworkX, a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. The full reference is available in the official NetworkX documentation.

**Install & Import Libraries**

In [0]:
# Import Libraries
# (community detection below uses NetworkX's built-in algorithms,
#  so no separate 'community' package is imported)
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
import seaborn as sns

## a. Network Construction

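Here we construct a social network from Twitter conversations about the 2017 Jakarta gubernatorial election ('Pemilihan umum Gubernur DKI Jakarta 2017'). Each of the three datasets corresponds to one paslon (pasangan calon, i.e. a candidate pair).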

**Import Data**

In [0]:
# Import Data
df_paslon1 = pd.read_csv('https://raw.githubusercontent.com/dianrdn/data/master/paslon_1_reduced.csv')
df_paslon2 = pd.read_csv('https://raw.githubusercontent.com/dianrdn/data/master/paslon_2_reduced.csv')
df_paslon3 = pd.read_csv('https://raw.githubusercontent.com/dianrdn/data/master/paslon_3_reduced.csv')

# Show Data
df_paslon1

**Construct an Edge List**

In [0]:
# Select Columns and Create an Edge List
df_network1 = df_paslon1[['source', 'target']]
df_network2 = df_paslon2[['source', 'target']]
df_network3 = df_paslon3[['source', 'target']]

# Show Edgelist
df_network1

In [0]:
# Drop Missing Target from the Edge List
df_network1 = df_network1.mask(df_network1.eq('None')).dropna()
df_network2 = df_network2.mask(df_network2.eq('None')).dropna()
df_network3 = df_network3.mask(df_network3.eq('None')).dropna()

# Show Edgelist
df_network1
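
The next cells rely on two behaviours of `nx.from_pandas_edgelist` that are easy to miss: the default graph type (`nx.Graph`) is undirected, so replies A to B and B to A become one edge, and repeated rows collapse into a single edge. This sketch uses made-up usernames to show both, plus how a `MultiDiGraph` would keep direction and repeats if that were wanted:

```python
import pandas as pd
import networkx as nx

# Hypothetical reply data: 'None' marks a tweet that was not a reply
df_toy = pd.DataFrame({
    "source": ["ani", "budi", "ani", "citra", "dewi"],
    "target": ["budi", "ani", "budi", "None", "ani"],
})

# Same filtering idiom as above: turn 'None' into NaN, then drop those rows
df_toy = df_toy.mask(df_toy.eq("None")).dropna()

G_toy = nx.from_pandas_edgelist(df_toy)      # default create_using=nx.Graph
print(len(df_toy), G_toy.number_of_edges())  # 4 2
print(sorted(G_toy.nodes))                   # ['ani', 'budi', 'dewi']

# Keep direction and count repeated replies with a multi-digraph instead
G_multi = nx.from_pandas_edgelist(df_toy, create_using=nx.MultiDiGraph)
print(G_multi.number_of_edges())             # 4
```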

**Visualize Network for:**

Paslon 1

In [0]:
# Construct a Network (undirected by default)
G1 = nx.from_pandas_edgelist(df_network1)

# Visualize the Network
# (arrowstyle/arrowsize only apply to directed graphs, so they are omitted)
plt.figure(figsize=(50, 50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G1))

Paslon 2

In [0]:
# Construct a Network (undirected by default)
G2 = nx.from_pandas_edgelist(df_network2)

# Visualize the Network
# (arrowstyle/arrowsize only apply to directed graphs, so they are omitted)
plt.figure(figsize=(50, 50))
nx.draw(G2, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G2))

Paslon 3

In [0]:
# Construct a Network (undirected by default)
G3 = nx.from_pandas_edgelist(df_network3)

# Visualize the Network
# (arrowstyle/arrowsize only apply to directed graphs, so they are omitted)
plt.figure(figsize=(50, 50))
nx.draw(G3, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G3))

## b. Network Metrics and Measurement

**Centrality Measurement**

In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super-spreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.
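
To see how the four measures used below behave, here is a sketch on a star graph (one hub linked to four leaves), a toy case chosen because the hub should come out on top under every measure:

```python
import networkx as nx

# Star graph: hub 0 connected to leaves 1..4
G_star = nx.star_graph(4)

measures = {
    "degree": nx.degree_centrality(G_star),
    "betweenness": nx.betweenness_centrality(G_star),
    "closeness": nx.closeness_centrality(G_star),
    "eigenvector": nx.eigenvector_centrality(G_star),
}

for name, scores in measures.items():
    top = max(scores, key=scores.get)   # node with the highest score
    print(f"{name:12s} hub={scores[0]:.3f} leaf={scores[1]:.3f} top={top}")
```

The measures differ on larger graphs: degree counts direct ties, betweenness counts how often a node sits on shortest paths between others, closeness rewards short distances to everyone, and eigenvector centrality rewards ties to other well-connected nodes.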

In [0]:
# Degree Centrality (degree divided by n - 1)
degree = nx.degree_centrality(G1)

# Top 10 Nodes, Sorted from the Highest
sorted(degree.items(), key=lambda x: x[1], reverse=True)[0:10]
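
Degree centrality is a node's degree divided by the largest possible degree, n - 1, so ranking by raw degree and ranking by degree centrality give the same order. A quick check, using Zachary's karate club graph that ships with NetworkX purely as a stand-in for the Twitter data:

```python
import networkx as nx

G_demo = nx.karate_club_graph()
n = G_demo.number_of_nodes()

# Degree centrality = degree / (n - 1)
manual = {v: d / (n - 1) for v, d in G_demo.degree()}
builtin = nx.degree_centrality(G_demo)
print(all(abs(manual[v] - builtin[v]) < 1e-12 for v in G_demo))  # True

# Ranking by raw degree or by degree centrality gives the same order
top_raw = sorted(G_demo.degree(), key=lambda x: x[1], reverse=True)[:5]
top_cent = sorted(builtin.items(), key=lambda x: x[1], reverse=True)[:5]
print([v for v, _ in top_raw] == [v for v, _ in top_cent])  # True
```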

In [0]:
# Betweenness Centrality (normalized to the 0-1 range)
betweenness = nx.betweenness_centrality(G1, normalized=True)

# Top 10 Nodes, Sorted from the Highest
sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[0:10]
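
Betweenness centrality is the fraction of shortest paths between every other pair of nodes that pass through a node. The brute-force sketch below computes it directly from that definition (slow, so only suitable for small graphs) and compares it with NetworkX on the karate club stand-in graph. `brute_force_betweenness` is a hypothetical helper written for illustration:

```python
import itertools
import networkx as nx

def brute_force_betweenness(G):
    """Normalized betweenness for an undirected graph by enumerating shortest paths."""
    n = G.number_of_nodes()
    score = {v: 0.0 for v in G}
    for s, t in itertools.combinations(G.nodes, 2):
        if not nx.has_path(G, s, t):
            continue
        paths = list(nx.all_shortest_paths(G, s, t))
        for v in G:
            if v in (s, t):
                continue
            # Fraction of s-t shortest paths that pass through v
            score[v] += sum(v in p for p in paths) / len(paths)
    # Normalize by the number of unordered pairs not involving v: (n-1)(n-2)/2
    norm = (n - 1) * (n - 2) / 2
    return {v: c / norm for v, c in score.items()}

G_demo = nx.karate_club_graph()
manual = brute_force_betweenness(G_demo)
builtin = nx.betweenness_centrality(G_demo, normalized=True)
print(max(abs(manual[v] - builtin[v]) for v in G_demo) < 1e-9)  # True
```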

In [0]:
# Closeness Centrality
closeness = nx.closeness_centrality(G1)

# Top 10 Nodes, Sorted from the Highest
sorted(closeness.items(), key=lambda x: x[1], reverse=True)[0:10]
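
On a connected graph, closeness centrality is (n - 1) divided by the sum of shortest-path distances from a node to all others. This sketch checks that formula on a five-node path graph; `closeness_manual` is a hypothetical helper. On disconnected graphs such as the Twitter networks here, NetworkX by default (`wf_improved=True`) also scales the score by the fraction of nodes the node can reach.

```python
import networkx as nx

# Path graph 0-1-2-3-4: the middle node is closest to everyone
G_path = nx.path_graph(5)

def closeness_manual(G, u):
    """(n - 1) / sum of shortest-path distances from u (connected graphs only)."""
    dist = nx.single_source_shortest_path_length(G, u)
    return (len(G) - 1) / sum(dist.values())

builtin = nx.closeness_centrality(G_path)
for u in G_path:
    print(u, round(closeness_manual(G_path, u), 3), round(builtin[u], 3))
```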

In [0]:
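# Eigenvector Centrality (max_iter raised because the default of 100
# iterations may not be enough to converge on larger graphs)
eigenvector = nx.eigenvector_centrality(G1, max_iter=1000)

# Top 10 Nodes, Sorted from the Highest
sorted(eigenvector.items(), key=lambda x: x[1], reverse=True)[0:10]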
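
Eigenvector centrality scores a node by the scores of its neighbours, which amounts to finding the leading eigenvector of the adjacency matrix. A minimal power-iteration sketch with NumPy, assuming a connected, unweighted graph (the karate club stand-in again); `eigenvector_power_iteration` is a hypothetical helper, and iterating with A + I mirrors the shift NetworkX's own implementation uses to avoid oscillation:

```python
import numpy as np
import networkx as nx

def eigenvector_power_iteration(G, max_iter=1000, tol=1e-12):
    """Power iteration on A + I, normalized to unit Euclidean length."""
    nodes = list(G.nodes)
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)  # ignore edge weights
    M = A + np.eye(len(nodes))
    x = np.ones(len(nodes)) / np.sqrt(len(nodes))
    for _ in range(max_iter):
        x_new = M @ x
        x_new /= np.linalg.norm(x_new)
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return dict(zip(nodes, x))

G_demo = nx.karate_club_graph()
manual = eigenvector_power_iteration(G_demo)
builtin = nx.eigenvector_centrality(G_demo, max_iter=1000, tol=1e-10)
print(max(abs(manual[v] - builtin[v]) for v in G_demo) < 1e-6)  # True
```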

Visualize Centrality Scores with a Scatter Plot

In [0]:
# Convert Centralities to Data Frame
df_degree = pd.Series(degree).to_frame('degree_centrality')
df_betweenness = pd.Series(betweenness).to_frame('betweenness_centrality')
df_closeness = pd.Series(closeness).to_frame('closeness_centrality')
df_eigenvector = pd.Series(eigenvector).to_frame('eigenvector_centrality')

# Join Centralities Data Frame
df_centrality = pd.concat([df_degree, df_betweenness, df_closeness, df_eigenvector], axis = 1)
df_centrality['username'] = df_centrality.index
df_centrality = df_centrality.reset_index(drop = True)
df_centrality = df_centrality.sort_values(by=['degree_centrality'], ascending = False)
df_centrality = df_centrality.melt('username', var_name='cols',  value_name='centrality')
df_centrality
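
The `melt` call reshapes the table from wide format (one column per measure) to long format (one row per user and measure), which is what seaborn's `hue` argument expects. A small sketch with made-up scores:

```python
import pandas as pd

# Hypothetical wide table: one row per user, one column per measure
wide = pd.DataFrame({
    "username": ["ani", "budi"],
    "degree_centrality": [0.5, 0.25],
    "betweenness_centrality": [0.4, 0.0],
})

# melt keeps 'username' and stacks the measure columns into (cols, centrality) pairs
long = wide.melt("username", var_name="cols", value_name="centrality")
print(long)
print(long.shape)  # (4, 3)
```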

In [0]:
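# Visualize Scatter Plot (one x position per user, one colour per measure)
plt.figure(figsize=(20, 9))
sns.scatterplot(x='username', y='centrality', hue='cols', data=df_centrality)
plt.xticks(rotation=90)  # rotate usernames so they stay readable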

Visualize Network based on Centrality Measurement

In [0]:
# Set Degree Dictionary (node -> degree centrality)
d = dict(degree)

# Visualize the Network, sizing each node by its degree centrality
# (G1 was already constructed above, so it is not rebuilt here)
plt.figure(figsize=(50, 50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', nodelist=list(d.keys()),
        node_size=[v * 100000 for v in d.values()],
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G1))
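
Multiplying scores by a fixed constant (100000 above) can make nodes vanish or swamp the plot on another dataset. One alternative, sketched here with a hypothetical helper `scale_sizes` and the karate club stand-in graph, is to map scores linearly onto a fixed size range; the sizes are built in `G.nodes` order so they line up with what `nx.draw` expects:

```python
import networkx as nx

def scale_sizes(values, min_size=100, max_size=5000):
    """Linearly map scores onto [min_size, max_size] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # all scores equal: use a single size
        return [min_size for _ in values]
    return [min_size + (v - lo) / (hi - lo) * (max_size - min_size) for v in values]

G_demo = nx.karate_club_graph()
centrality = nx.degree_centrality(G_demo)
sizes = scale_sizes([centrality[v] for v in G_demo.nodes])
print(min(sizes), max(sizes))  # 100.0 5000.0
```

It would then be passed as `node_size=scale_sizes([d[v] for v in G1.nodes])` without a `nodelist` argument.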

**Network Topology Measurement**

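The topology of a network is the way its nodes and links are arranged and connected to each other. It strongly shapes how the network behaves, for example how far and how fast information can travel through it. Below we summarise the topology of the Paslon 1 network with four measures: the number of nodes, the number of edges, the density (the share of possible edges that actually exist), and the number of connected components.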

In [0]:
# Show Number of Nodes
nx.number_of_nodes(G1)

In [0]:
# Show Number of Edges
nx.number_of_edges(G1)

In [0]:
# Show Graph Density
nx.density(G1)
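
For an undirected graph with n nodes and m edges, density is m divided by the n(n - 1)/2 possible edges, i.e. 2m / (n(n - 1)). A check of that formula against `nx.density`, using the karate club graph as a stand-in:

```python
import networkx as nx

G_demo = nx.karate_club_graph()
n, m = G_demo.number_of_nodes(), G_demo.number_of_edges()

# Undirected density: observed edges / possible edges = 2m / (n(n-1))
manual = 2 * m / (n * (n - 1))
print(n, m, round(manual, 4), round(nx.density(G_demo), 4))  # 34 78 0.139 0.139
```

Reply networks are usually very sparse, so densities close to 0 are expected.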

In [0]:
# Show Number of Connected Components
nx.number_connected_components(G1)
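
When there are many components, it often helps to look at their sizes and at the largest one on its own, for example before computing closeness or eigenvector centrality, which are defined most naturally on connected graphs. A sketch on a made-up graph with two separate conversations:

```python
import networkx as nx

# Two separate conversations: a triangle and a single reply pair (toy data)
G_toy = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")])

sizes = sorted((len(c) for c in nx.connected_components(G_toy)), reverse=True)
print(nx.number_connected_components(G_toy), sizes)  # 2 [3, 2]

# Largest connected component as an independent graph
largest = max(nx.connected_components(G_toy), key=len)
G_lcc = G_toy.subgraph(largest).copy()
print(sorted(G_lcc.nodes))  # ['a', 'b', 'c']
```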

## c. Community Detection

Community detection is a fundamental problem in social network analysis. Roughly speaking, it divides social actors (modelled as nodes in a social graph) into groups based on their social connections (modelled as edges), so that each group is densely connected internally and well separated from the other groups.

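**Modularity Community**

`greedy_modularity_communities` implements the Clauset-Newman-Moore greedy approach: every node starts in its own community, and the pair of communities whose merge most increases modularity is merged repeatedly until no further merge increases it.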

In [0]:
# Import Module
from networkx.algorithms.community import greedy_modularity_communities

# Modularity Community Detection
communities_m = sorted(greedy_modularity_communities(G1), key=len, reverse=True)
communities_m
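
Modularity compares the fraction of edges that fall inside communities with the fraction expected if edges were placed at random with the same degrees; higher is better, and putting every node in one group scores 0. A sketch that scores a greedy partition with `modularity` from the same NetworkX module, using the karate club stand-in graph (edge weights ignored to match the unweighted detection):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G_demo = nx.karate_club_graph()
parts = greedy_modularity_communities(G_demo)

# weight=None: score the unweighted graph, as used by the detection step
q_greedy = modularity(G_demo, parts, weight=None)
q_single = modularity(G_demo, [set(G_demo.nodes)], weight=None)  # one big group
print(len(parts), round(q_greedy, 3), round(q_single, 3))
```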

In [0]:
# Set Node Community Function
def set_node_community(G, communities):
    '''Add community to node attributes'''
    for c, v_c in enumerate(communities):
        for v in v_c:
            # Add 1 to save 0 for external edges
            G.nodes[v]['community'] = c + 1
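
The comment about saving 0 for external edges refers to a common follow-up step: labelling each edge with its community when both endpoints share one, and 0 when it bridges two communities. This sketch repeats the node function so it runs on its own and adds a hypothetical `set_edge_community` helper, on a toy graph of two triangles joined by one edge:

```python
import networkx as nx

def set_node_community(G, communities):
    """Store a 1-based community id on each node."""
    for c, members in enumerate(communities):
        for v in members:
            G.nodes[v]["community"] = c + 1

def set_edge_community(G):
    """Hypothetical helper: intra-community edges get the id, bridges get 0."""
    for u, v in G.edges:
        cu, cv = G.nodes[u]["community"], G.nodes[v]["community"]
        G.edges[u, v]["community"] = cu if cu == cv else 0

# Two obvious groups joined by the bridge edge c-x
G_toy = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"),
                  ("x", "y"), ("y", "z"), ("z", "x"), ("c", "x")])
set_node_community(G_toy, [{"a", "b", "c"}, {"x", "y", "z"}])
set_edge_community(G_toy)
print(nx.get_node_attributes(G_toy, "community"))
# {'a': 1, 'b': 1, 'c': 1, 'x': 2, 'y': 2, 'z': 2}
```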

In [0]:
# Set Colour Function
def get_color(i, r_off=1, g_off=1, b_off=1):
    '''Assign an RGB color to a vertex from its community index i.'''
    n = 16
    low, high = 0.1, 0.9  # keep each channel away from pure black/white
    span = high - low
    r = low + span * (((i + r_off) * 3) % n) / (n - 1)
    g = low + span * (((i + g_off) * 5) % n) / (n - 1)
    b = low + span * (((i + b_off) * 7) % n) / (n - 1)
    return (r, g, b)
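
An alternative to a hand-made colour function, not used in the original notebook, is to take colours from one of matplotlib's qualitative colormaps. A sketch assuming matplotlib 3.5 or newer (for `matplotlib.colormaps`); `community_color` is a hypothetical helper, and colours repeat once the community index exceeds the colormap's size:

```python
import matplotlib

def community_color(i, cmap_name="tab10"):
    """RGBA colour for community id i from a qualitative colormap (cycles after N colours)."""
    cmap = matplotlib.colormaps[cmap_name]
    return cmap(i % cmap.N)

print(community_color(1))
print(community_color(1) == community_color(11))  # True: tab10 has 10 colours
```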

In [0]:
# Set Node Communities (the function modifies G1 in place and returns None)
set_node_community(G1, communities_m)

# Set Node Color
node_color = [get_color(G1.nodes[v]['community']) for v in G1.nodes]

# Visualize the Network
# ('map' is not a valid nx.draw argument, and no colormap is needed for RGB colours)
plt.figure(figsize=(50, 50))
nx.draw(G1, with_labels=True,
        node_color=node_color, node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G1))



---


## Network Example on Information Spread: Coronavirus

---



**Import Data**

In [0]:
# Import Data
df_corona = pd.read_csv('https://raw.githubusercontent.com/dianrdn/data/master/corona_indo.csv', sep =';')

# Show Data
df_corona

**Construct an Edge List**

In [0]:
# Select Source and Target
df_corona_network = df_corona[['screen_name', 'reply_to_screen_name']]

# Show Source and Target
df_corona_network

In [0]:
# Rename the Column Name
df_corona_network = df_corona_network.rename(columns={"screen_name": "source", "reply_to_screen_name": "target"})

# Show Edgelist
df_corona_network

In [0]:
# Drop Missing Target from the Edge List
df_corona_network = df_corona_network.mask(df_corona_network['target'].eq('None')).dropna()

# Show Edgelist
df_corona_network
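
If some users reply to their own tweets (for example in threads), those rows become self-loops, and NetworkX counts a self-loop twice in a node's degree, which can inflate degree-based rankings. Whether to drop them is an analysis choice the original notebook does not make; this sketch, on made-up rows, shows how to detect and remove them, using boolean indexing as an equivalent of the `mask` idiom above:

```python
import pandas as pd
import networkx as nx

# Toy replies: 'ani' replying to her own tweet becomes a self-loop
df_toy = pd.DataFrame({"source": ["ani", "ani", "budi"],
                       "target": ["ani", "budi", "None"]})
df_toy = df_toy[df_toy["target"].ne("None")].dropna()

G_toy = nx.from_pandas_edgelist(df_toy)
print(nx.number_of_selfloops(G_toy))  # 1

# Materialize the list first so the graph is not modified while iterating
G_toy.remove_edges_from(list(nx.selfloop_edges(G_toy)))
print(nx.number_of_selfloops(G_toy), G_toy.number_of_edges())  # 0 1
```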

**Construct a Network**

In [0]:
# Construct a Network (undirected by default)
G_corona = nx.from_pandas_edgelist(df_corona_network)

# Visualize the Network
# (arrowstyle/arrowsize only apply to directed graphs, so they are omitted)
plt.figure(figsize=(100, 100))
nx.draw(G_corona, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G_corona))

**Identifying the Most Important People**

In [0]:
# Degree (raw number of neighbours; unlike degree_centrality, not divided by n - 1)
degree_corona = dict(nx.degree(G_corona))

# Top 10 Nodes, Sorted from the Highest
sorted(degree_corona.items(), key=lambda x: x[1], reverse=True)[0:10]

In [0]:
# Betweenness Centrality (normalized to the 0-1 range)
betweenness_corona = nx.betweenness_centrality(G_corona, normalized=True)

# Top 10 Nodes, Sorted from the Highest
sorted(betweenness_corona.items(), key=lambda x: x[1], reverse=True)[0:10]

In [0]:
# Closeness Centrality
closeness_corona = nx.closeness_centrality(G_corona)

# Top 10 Nodes, Sorted from the Highest
sorted(closeness_corona.items(), key=lambda x: x[1], reverse=True)[0:10]

In [0]:
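# Eigenvector Centrality (max_iter raised because the default of 100
# iterations may not be enough to converge on larger graphs)
eigenvector_corona = nx.eigenvector_centrality(G_corona, max_iter=1000)

# Top 10 Nodes, Sorted from the Highest
sorted(eigenvector_corona.items(), key=lambda x: x[1], reverse=True)[0:10]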

Visualize Network based on Important People

In [0]:
# Set Degree Dictionary (node -> raw degree)
d_corona = dict(degree_corona)

# Visualize the Network, sizing each node by its degree
# (G_corona was already constructed above, so it is not rebuilt here)
plt.figure(figsize=(50, 50))
nx.draw(G_corona, with_labels=True,
        node_color='skyblue', nodelist=list(d_corona.keys()),
        node_size=[v * 1000 for v in d_corona.values()],
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G_corona))

**Network Topology Measurement**

In [0]:
# Show Number of Nodes
nx.number_of_nodes(G_corona)

In [0]:
# Show Number of Edges
nx.number_of_edges(G_corona)

In [0]:
# Show Graph Density
nx.density(G_corona)

**Community Detection**

In [0]:
# Import Module
from networkx.algorithms.community import greedy_modularity_communities

# Modularity Community Detection
communities_corona = sorted(greedy_modularity_communities(G_corona), key=len, reverse=True)
communities_corona

In [0]:
# Set Node Community Function
def set_node_community(G, communities):
    '''Add community to node attributes'''
    # Use the 'communities' argument, not the global communities_corona
    for c, v_c in enumerate(communities):
        for v in v_c:
            # Add 1 to save 0 for external edges
            G.nodes[v]['community'] = c + 1

In [0]:
# Set Node Communities (the function modifies G_corona in place and returns None)
set_node_community(G_corona, communities_corona)

# Set Node Color
node_color = [get_color(G_corona.nodes[v]['community']) for v in G_corona.nodes]

# Set Degree Dictionary
d_corona = dict(degree_corona)

# Visualize the Network
# (node sizes follow G_corona.nodes order so they match node_color;
#  'map' is not a valid nx.draw argument, so it is omitted)
plt.figure(figsize=(50, 50))
nx.draw(G_corona, with_labels=True,
        node_color=node_color,
        node_size=[d_corona[v] * 1000 for v in G_corona.nodes],
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G_corona))