## Network analysis with `igraph` in R
This notebook was posted by Simon Lindgren // [@simonlindgren](http://www.twitter.com/simonlindgren) // [simonlindgren.com](http://simonlindgren.com).

It is about how to do social network analysis in R, using the [`igraph`](http://igraph.org/r/) package. In creating this notebook, I found the tutorials posted by [Katya Ognyanova](http://kateto.net) very helpful.

In [None]:
# Import required libraries
library(igraph)
library(readr)
library(dplyr)

##### Import data
We have a CSV file with two columns: one is the source (from) and the other is the target (to). We read it into an R data frame using `readr`. The `read_csv` function expects comma-separated columns, while `read_csv2` expects semicolon-separated columns.
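
To make the difference between the two readers concrete, here is a minimal sketch. The temporary files and the column names `from` and `to` are assumptions made for illustration; they are not the contents of `blm.csv`.

```r
library(readr)

# Hypothetical stand-in files; the column names 'from' and 'to' are assumptions
comma_file <- tempfile(fileext = ".csv")
semicolon_file <- tempfile(fileext = ".csv")
writeLines(c("from,to", "alice,bob", "bob,carol"), comma_file)
writeLines(c("from;to", "alice;bob", "bob;carol"), semicolon_file)

edges_comma <- read_csv(comma_file)           # comma-delimited input
edges_semicolon <- read_csv2(semicolon_file)  # semicolon-delimited input

# Both readers should recover the same columns
identical(edges_comma$to, edges_semicolon$to)
```

Reading a semicolon-delimited file with `read_csv` would typically produce a single column, which is a quick way to spot that the wrong reader was used.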

In [None]:
# Read the file into a dataframe
edges_raw <- read_csv("blm.csv")
#edges_raw

###### Prepare edges

In [None]:
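# We want edge weights: repeated edges in the raw data should be counted.
# Building an adjacency matrix counts how often each (from, to) pair occurs.
adjmat <- as.matrix(as_adjacency_matrix(graph_from_data_frame(edges_raw)))

# adjacency matrix to graph; the counts become the 'weight' edge attribute
temp_graph <- graph_from_adjacency_matrix(adjmat, weighted = TRUE)

# back to dataframe (columns: from, to, weight)
# the igraph:: prefix avoids the clash with dplyr's function of the same name
edges <- igraph::as_data_frame(temp_graph, what = "edges")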
edges <- edges %>%
    arrange(desc(weight)) # sort by descending weight

edges
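
As a cross-check on the weights, the same counts can be sketched with `dplyr` alone, without going through a dense adjacency matrix. This is an alternative illustration, not the method used above; the toy data and the column names `from` and `to` are assumptions.

```r
library(dplyr)

# Toy edge list (hypothetical data): the a -> b tie occurs twice
toy_edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("b", "b", "c")
)

# Count identical (from, to) pairs and store the count as 'weight'
toy_weighted <- toy_edges %>%
  count(from, to, name = "weight") %>%
  arrange(desc(weight))

toy_weighted
```

For large networks this may be lighter on memory, since `as.matrix()` on an adjacency matrix allocates one cell per pair of nodes.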

###### Read the edgelist as a graph

In [None]:
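# Create an undirected graph from the dataframe (edge direction is dropped)
g <- graph_from_data_frame(d=edges, directed=F)

# Simplify the graph: remove self-loops and merge multiple edges.
# With igraph's default 'edge.attr.comb' option, the weights of merged edges are summed.
g <- simplify(g, remove.multiple = TRUE,
         remove.loops = TRUE,
         edge.attr.comb = igraph_opt("edge.attr.comb"))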

##### Inspect the graph


The description of an igraph object starts with a four-character code, where a dash marks a property that is absent:
1. D or U, for a directed or undirected graph
2. N for a named graph (where nodes have a name attribute)
3. W for a weighted graph (where edges have a weight attribute)
4. B for a bipartite (two-mode) graph (where nodes have a type attribute)
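
The four flags can also be queried one by one. The sketch below uses a small hypothetical graph (three nodes, two weighted edges) rather than the notebook's data.

```r
library(igraph)

# Toy graph (hypothetical data): undirected, named, weighted
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(2, 1)),
  directed = FALSE
)

is_directed(toy)   # FALSE, so the code starts with U
is_named(toy)      # TRUE, vertices have a 'name' attribute (N)
is_weighted(toy)   # TRUE, edges have a 'weight' attribute (W)
is_bipartite(toy)  # FALSE, no 'type' vertex attribute (shown as -)
```

Printing `toy` should therefore show a header containing `UNW-`.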

In [None]:
g

In [None]:
E(g)$weight # all edge weights

In [None]:
V(g)$name # all vertex names

###### Calculate centrality

In [None]:
# Degree
# 'normalized = TRUE' would divide each score by n - 1 (n = number of vertices)
# Note: g is undirected, so 'mode' is ignored and the three calls return the same values
deg <- degree(g, mode="total", normalized = F)
indeg <- degree(g, mode="in", normalized = F)
outdeg <- degree(g, mode="out", normalized = F)

# set them as node attributes
V(g)$deg <- deg
V(g)$indeg <- indeg
V(g)$outdeg <- outdeg
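
A small sketch of what `normalized` does for degree, and of why in-degree and out-degree coincide with total degree on an undirected graph. The star graph is a hypothetical example, not the notebook's data.

```r
library(igraph)

# Toy graph: one hub connected to four leaves
star <- make_star(5, mode = "undirected")

raw  <- degree(star)                     # 4 1 1 1 1
norm <- degree(star, normalized = TRUE)  # raw degree divided by n - 1

all.equal(norm, raw / (vcount(star) - 1))

# On an undirected graph 'mode' has no effect
all.equal(degree(star, mode = "in"), degree(star, mode = "total"))
```

To obtain distinct in-degree and out-degree values, the graph would have to be built with `directed = TRUE`.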

In [None]:
# Betweenness centrality
# 'normalized' normalises the scores (undirected case) according to Bnorm=2*B/(n*n-3*n+2)
betw <- betweenness(g, directed=F, normalized = T)
V(g)$betw <- betw # set it as a node attribute
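
The normalisation formula quoted in the comment above can be checked by hand on a small undirected graph. This is a sketch on a hypothetical star graph, where the hub lies on every shortest path between leaves.

```r
library(igraph)

star <- make_star(5, mode = "undirected")  # hub = vertex 1
n <- vcount(star)

b_raw  <- betweenness(star, directed = FALSE)
b_norm <- betweenness(star, directed = FALSE, normalized = TRUE)

# Manual normalisation: Bnorm = 2 * B / (n*n - 3*n + 2)
b_manual <- 2 * b_raw / (n * n - 3 * n + 2)

all.equal(b_norm, b_manual)
b_raw[1]  # 6: the hub sits between all choose(4, 2) pairs of leaves
```

Note that `betweenness()` uses the `weight` edge attribute by default when one exists, and treats weights as distances, so heavily weighted edges count as longer paths.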

In [None]:
# We can inspect the attributes we have set, for example:
V(g)$indeg

###### A first plot

In [None]:
plot(g, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink")

In [None]:
# Set node size based on betweenness
V(g)$size <- betw*100 # multiply by suitable number for visualisation

plot(g, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink")

In [None]:
# Set edge width based on weight:
E(g)$width <- E(g)$weight/10 # divide by suitable number for visualisation

plot(g, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink")

###### Sparsify the network
We calculate the mean edge weight in the network.

In [None]:
mean(edges$weight)

We then use the mean as a cutoff point and create a graph in which every edge with a weight below the mean is deleted.

In [None]:
edges_cut_off <- mean(edges$weight)
g.sp <- delete_edges(g, E(g)[weight<edges_cut_off])

plot(g.sp, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink")
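
The effect of the cutoff is easier to see on a tiny example. The triangle below is hypothetical data; it illustrates that `delete_edges` removes edges only, so vertices that lose all their edges stay in the graph as isolates.

```r
library(igraph)

# Toy triangle with weights 1, 2 and 6 (mean = 3)
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c"),
             to = c("b", "c", "a"),
             weight = c(1, 2, 6)),
  directed = FALSE
)

cutoff <- mean(E(toy)$weight)
toy_sp <- delete_edges(toy, E(toy)[weight < cutoff])

ecount(toy_sp)  # 1: only the edge with weight 6 survives
vcount(toy_sp)  # 3: vertex "b" remains, now isolated
```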

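Then, we calculate the mean degree of nodes. Note that the `deg` attribute was computed on the full graph `g`, before any edges were removed, so it reflects the original degrees.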

In [None]:
mean(V(g.sp)$deg)

In [None]:
# nodes_cut_off <- mean(V(g)$deg)
nodes_cut_off <- 5 # or set it manually
g.sp <- delete_vertices(g.sp, V(g.sp)[V(g.sp)$deg<nodes_cut_off])

plot(g.sp, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink")
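
Stored vertex attributes are not updated when edges are deleted. The sketch below, on a hypothetical star graph, contrasts a stored `deg` attribute with a freshly computed degree; which of the two is the right filter depends on the analysis.

```r
library(igraph)

star <- make_star(5, mode = "undirected")
V(star)$deg <- degree(star)              # stored on the full graph

pruned <- delete_edges(star, E(star)[1:2])  # drop two of the four edges

V(pruned)$deg    # 4 1 1 1 1: still the original degrees
degree(pruned)   # 2 0 0 1 1: degrees in the pruned graph

# Filtering on current degree instead of the stored attribute
current <- delete_vertices(pruned, V(pruned)[degree(pruned) < 1])
vcount(current)  # 3
```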

##### Layout the network
There are many available [network layouts](http://igraph.org/r/doc/layout_.html).

In [None]:
# Examples of available layouts
kk <- layout_with_kk(g.sp) # Kamada-Kawai
fr <- layout_with_fr(g.sp) # Fruchterman-Reingold
lgl <- layout_with_lgl(g.sp) # LGL (Large Graph Layout)
mds <- layout_with_mds(g.sp) # MDS (multidimensional scaling)
sph <- layout_on_sphere(g.sp) # vertices placed on the surface of a sphere

plot(g.sp, vertex.label=NA, 
     edge.curved=.4, 
     edge.color="black", 
     vertex.color="pink",
     layout=mds)
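
A layout is simply a numeric matrix with one row of coordinates per vertex, and force-directed layouts such as Fruchterman-Reingold start from random positions. A minimal sketch on a hypothetical ring graph, assuming the seed value 42 is arbitrary:

```r
library(igraph)

ring <- make_ring(10)

set.seed(42)                    # fix the random starting positions
coords1 <- layout_with_fr(ring)

set.seed(42)                    # same seed, same layout
coords2 <- layout_with_fr(ring)

dim(coords1)                    # 10 rows (vertices), 2 columns (x, y)
all.equal(coords1, coords2)
```

Storing the matrix and passing it as `layout=` to several `plot()` calls keeps node positions fixed across plots, which makes them easier to compare.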

In [None]:
# decompose the graph into its connected components
graphs <- decompose(g.sp)

# pick out the largest connected (giant) component
largest <- which.max(sapply(graphs, vcount))

giant <- graphs[[largest]]
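
Before extracting the giant component it can be useful to look at component sizes. The sketch below uses `components()` on a hypothetical graph made of two separate rings; it is an alternative view of the same idea, not a replacement for the cell above.

```r
library(igraph)

# Toy graph: a ring of 4 vertices plus a separate ring of 3
toy <- disjoint_union(make_ring(4), make_ring(3))

comp <- components(toy)
comp$no     # 2 components
comp$csize  # 4 3

# Keep only the vertices that belong to the largest component
keep <- which(comp$membership == which.max(comp$csize))
biggest <- induced_subgraph(toy, keep)
vcount(biggest)  # 4
```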

In [None]:
# tweak the node sizes
# use the stored attribute: 'betw' has one value per vertex of g, not of giant
V(giant)$size <- V(giant)$betw*60

# tweak edge widths
E(giant)$width <- E(giant)$weight/15

In [None]:
# plot with some adjusted settings
plot(giant,
     edge.curved = 0.2,
     vertex.label=NA,
     edge.color="black", 
     vertex.color="black")

In [None]:
# plot with labels instead of nodes

l <- layout_with_fr(giant, grid = "grid") # precompute the layout

plot(giant,
     edge.curved = 0.4,
     vertex.label=V(giant)$name, # graph_from_data_frame stores labels in 'name'
     vertex.label.family = "Helvetica",
     vertex.label.color = "black",
     vertex.label.cex = 0.7,
     edge.color="black",
     vertex.shape="none",
     layout=l,
     main = "...")

##### Community detection
The methods below are based on [this page](https://users.dimi.uniud.it/~massimo.franceschet/R/communities.html).

In [None]:
c1 <- cluster_fast_greedy(giant)
c2 <- cluster_leading_eigen(giant)
c3 <- cluster_edge_betweenness(giant)

modularity(c1)
modularity(c2)
modularity(c3)
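
To see what a community object contains, here is a sketch on a hypothetical graph built from two 4-node cliques joined by a single bridge edge, where the expected split is easy to judge by eye.

```r
library(igraph)

# Toy graph: two complete graphs of 4 vertices, linked by one edge (1 -- 5)
two_cliques <- disjoint_union(make_full_graph(4), make_full_graph(4))
two_cliques <- add_edges(two_cliques, c(1, 5))

comm <- cluster_fast_greedy(two_cliques)

length(comm)      # number of communities found
sizes(comm)       # number of vertices per community
membership(comm)  # community id for each vertex
modularity(comm)  # quality score of this partition
```

Comparing `modularity()` across algorithms, as in the cell above, is one simple way to choose between partitions; higher values indicate denser within-community ties relative to a random baseline.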

In [None]:
# plot communities with shaded regions

plot(c3, giant, # use c1, c2 or c3
     edge.curved = 0.0,
     vertex.label=NA,
     edge.color="black",
     layout=layout_with_fr)

In [None]:
# plot communities without shaded regions

plot(giant,
     edge.curved = 0.0,
     vertex.label=NA,
     edge.color="black", 
     vertex.color=membership(c1), # use c1, c2 or c3
     layout=layout_with_fr)


In [None]:
# plot dendrogram
plot_dendrogram(c3) # use c1, c2 or c3

##### Other ways to visualise networks

In [None]:
# HEATMAP (suitable for graphs with small numbers of nodes)

heatm  <-  as_adjacency_matrix(giant, attr="weight", sparse=F)
colnames(heatm) <- V(giant)$name
rownames(heatm) <- V(giant)$name

palf <- colorRampPalette(c("gold", "dark orange")) 
# reverse the column order, whatever the number of nodes
heatmap(heatm[,ncol(heatm):1], Rowv = NA, Colv = NA, col = palf(100), 
        scale="none", margins=c(10,10) )

In [None]:
# DEGREE DISTRIBUTION PLOT
degr <- degree(giant, mode="total", normalized = F)


deg.dist <- degree_distribution(giant, cumulative=T, mode="all")
plot( x=0:max(degr), y=1-deg.dist, pch=19, cex=1.2, col="orange", 
      xlab="Degree", ylab="Cumulative Frequency")
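
The values returned by `degree_distribution` are relative frequencies indexed from degree 0 upwards. A sketch on a hypothetical star graph (one hub of degree 4, four leaves of degree 1) shows the plain and cumulative forms side by side.

```r
library(igraph)

star <- make_star(5, mode = "undirected")

# Share of vertices with degree 0, 1, 2, 3, 4
dd <- degree_distribution(star)
dd   # 0.0 0.8 0.0 0.0 0.2

# Share of vertices with degree >= 0, 1, 2, 3, 4
dd_cum <- degree_distribution(star, cumulative = TRUE)
dd_cum   # 1.0 1.0 0.2 0.2 0.2
```

With `cumulative = TRUE`, the quantity `1 - deg.dist` plotted above is therefore the share of vertices whose degree is below each value on the x-axis.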

##### Export the graph to Gephi format

In [None]:
library(rgexf)
giant.gexf <- igraph.to.gexf(giant)

f <- file("network.gexf")
writeLines(giant.gexf$graph, con = f)
close(f)
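
Gephi can also open GraphML files, which `igraph` writes directly without an extra package. This is an alternative sketch, not the export route used above; the toy graph and the temporary output path are assumptions for illustration.

```r
library(igraph)

# Toy graph standing in for 'giant'
toy <- make_ring(5)
V(toy)$name <- letters[1:5]
E(toy)$weight <- c(1, 2, 3, 4, 5)

# Hypothetical output location; replace with a real path such as "network.graphml"
out_path <- tempfile(fileext = ".graphml")
write_graph(toy, out_path, format = "graphml")

file.exists(out_path)
```

Vertex and edge attributes such as `name` and `weight` are written to the file, so they should be available in Gephi after import.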