# CSX46 - Class Session 5 - Components

In this class session we are going to find the number of proteins that are in the giant component of the (undirected) protein-protein interaction network, using igraph.

In [1]:
suppressPackageStartupMessages(library(igraph))

Step 1: load the SIF file into a data frame `sif_data`, using the `read.table` function

In [2]:
sif_data <- read.table("shared/pathway_commons.sif",
                       sep="\t",
                       header=FALSE,
                       stringsAsFactors=FALSE,
                       col.names=c("species1",
                                   "interaction_type",
                                   "species2"),
                       quote="",
                       comment.char="")

Step 2: restrict the interactions to undirected protein-protein interactions ("in-complex-with", "interacts-with"), using the `%in%` operator and array indexing with `[`, and keep only the two species columns. The restricted data frame should be called `interac_ppi`.

In [3]:
interac_ppi <- sif_data[sif_data$interaction_type %in% c("in-complex-with",
                                                         "interacts-with"), c(1,3)]

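As a minimal sketch of the same filtering idiom, here is a hypothetical four-row data frame (`toy_sif`, `keep`, and `toy_ppi` are made-up names, and the rows are not taken from the real file). It assumes the same three-column layout as `sif_data`.

```r
# Hypothetical toy data in the same three-column layout as `sif_data`
toy_sif <- data.frame(species1 = c("A", "B", "C", "D"),
                      interaction_type = c("interacts-with",
                                           "controls-expression-of",
                                           "in-complex-with",
                                           "catalysis-precedes"),
                      species2 = c("B", "C", "D", "A"),
                      stringsAsFactors = FALSE)

# Logical vector: TRUE for rows whose type is one of the two undirected PPI types
keep <- toy_sif$interaction_type %in% c("in-complex-with", "interacts-with")

# Keep the matching rows and only the two species columns
toy_ppi <- toy_sif[keep, c("species1", "species2")]
print(toy_ppi)
```

Selecting the columns by name instead of by position (`c(1,3)`) gives the same result here and keeps working if the column order ever changes.
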
Step 3: restrict the data frame to the unique interaction pairs of proteins (ignoring the interaction type), using the `unique` function. Then make an igraph graph object from the data frame, using `graph_from_data_frame`.

In [4]:
interac_ppi_unique <- unique(interac_ppi)
ppi_igraph <- graph_from_data_frame(interac_ppi_unique, directed=FALSE)
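
One thing worth checking on a toy example: `unique` compares rows as ordered pairs, so (A, B) and (B, A) count as different rows even though they describe the same undirected edge. The sketch below uses hypothetical names (`toy_pairs`, `toy_graph`). Whether the real file contains such reversed pairs is not something this notebook establishes, so treat `simplify` as an optional safeguard rather than a required step.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical pairs: one exact duplicate and one reversed duplicate
toy_pairs <- data.frame(species1 = c("A", "A", "B"),
                        species2 = c("B", "B", "A"),
                        stringsAsFactors = FALSE)

toy_unique <- unique(toy_pairs)      # drops the exact duplicate only
toy_graph <- graph_from_data_frame(toy_unique, directed = FALSE)
n_edges_before <- ecount(toy_graph)  # reversed pair survives as a parallel edge

# simplify() merges parallel edges (and removes self-loops)
toy_simple <- simplify(toy_graph)
n_edges_after <- ecount(toy_simple)

print(c(before = n_edges_before, after = n_edges_after))
```

Parallel edges do not change which vertices belong to which component, but they do inflate vertex degrees.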

Find the connected components of the graph `ppi_igraph` using the igraph function `components`. It returns a list, which you should assign to the object name `component_res_list`. Get the `csize` member of the list, which is a vector of the sizes of the components of the graph. Call `max` on that vector to get the size of the giant component of the PPI network.

In [5]:
## call the igraph function `components` on the `ppi_igraph` object; name
## resulting object `component_res_list`
component_res_list <- components(ppi_igraph)

In [6]:
## obtain the list item in the slot named `csize`, and name the
## resulting object `component_sizes_vec`
component_sizes_vec <- component_res_list$csize

In [7]:
## use the `max` function to find the size of the giant component
max(component_sizes_vec)
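
To see what `components` returns, here is a small sketch on a hypothetical five-vertex graph (the names `toy_edges`, `toy_graph`, and `toy_res` are made up for illustration). The graph has two components, {A, B, C} and {D, E}.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical edge list: A-B, B-C form one component; D-E forms another
toy_edges <- data.frame(from = c("A", "B", "D"),
                        to   = c("B", "C", "E"),
                        stringsAsFactors = FALSE)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

toy_res <- components(toy_graph)

print(toy_res$membership)  # component ID for each vertex
print(toy_res$csize)       # number of vertices in each component
print(toy_res$no)          # number of components

# Size of the largest component
toy_giant_size <- max(toy_res$csize)
```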

Let's print out all the component sizes, in decreasing order.

In [8]:
sort(component_sizes_vec, decreasing=TRUE)

Let's get the vertex indices of the vertices that are in the giant component. Use the `membership` slot of the list `component_res_list`, and use `which.max(component_sizes_vec)`.

In [9]:
inds_giant_component <- which(component_res_list$membership == which.max(component_sizes_vec))
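
The same indexing pattern on a hypothetical toy graph (made-up names `toy_graph`, `toy_res`, `toy_inds`) shows how the component ID of the largest component is matched against `membership` to recover vertex indices, and how those indices map back to vertex names.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical graph with components {A, B, C} and {D, E}
toy_edges <- data.frame(from = c("A", "B", "D"),
                        to   = c("B", "C", "E"),
                        stringsAsFactors = FALSE)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)
toy_res <- components(toy_graph)

# ID of the largest component, then the indices of the vertices carrying that ID
giant_id <- which.max(toy_res$csize)
toy_inds <- which(toy_res$membership == giant_id)

# Translate vertex indices back to vertex names
toy_names <- V(toy_graph)$name[toy_inds]
print(toy_names)
```

If two components tie for the largest size, `which.max` picks the first one; that is unlikely to matter for a network with a single giant component.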

Let's compute the vertex degrees, for all the vertices in the giant component. Use the `degree` function and pass the indices of the giant component vertices as the function argument `v`.

In [10]:
ppi_degrees_gc <- degree(ppi_igraph, v=inds_giant_component)

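What is the highest-degree vertex in the giant component, and what is its degree? Use `which.max` on `ppi_degrees_gc`. Note that `which.max` returns a position within `ppi_degrees_gc`, so the "vertex number" printed below is an index into the giant-component vertex list, not necessarily the vertex's index in `ppi_igraph`.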

In [11]:
ind_max <- which.max(ppi_degrees_gc)
print(sprintf("Vertex number %d has maximum degree: %d", ind_max, 
             ppi_degrees_gc[ind_max]))

[1] "Vertex number 6854 has maximum degree: 3600"


Let's compute the shortest-path distances between the highest-degree vertex of the giant component and all the other vertices in the giant component, by calling the `distances` function on the `ppi_igraph` object. In that function call, pass the vertex index of the highest-degree vertex in the giant component as argument `v`, and the vertex indices of the giant-component vertices as argument `to`.

In [12]:
apsp_dists <- distances(ppi_igraph, v=inds_giant_component[ind_max], to=inds_giant_component)

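What is the average distance between the highest-degree vertex of the giant component and all the other vertices of the giant component? Note that `apsp_dists` includes the zero distance from the highest-degree vertex to itself, so the mean computed below is taken over all giant-component vertices, the source included.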

In [13]:
print(sprintf("%0.2f", mean(apsp_dists)))

[1] "1.86"
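
For a sense of how much the self-distance matters, here is a sketch on a hypothetical five-vertex graph (made-up names `toy_graph`, `hub`, `toy_dists`): a hub H joined to A, B, and C, with D hanging off C. On a graph this small the two means differ visibly; on a component with thousands of vertices the difference would be tiny.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical graph: hub H connects to A, B, C; D is attached to C
toy_edges <- data.frame(from = c("H", "H", "H", "C"),
                        to   = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Index of the highest-degree vertex (H, with degree 3)
hub <- unname(which.max(degree(toy_graph)))

# One-row matrix of distances from the hub to every vertex, the hub included
toy_dists <- distances(toy_graph, v = hub, to = V(toy_graph))

mean_with_self <- mean(toy_dists)               # includes the 0 for H itself
mean_without_self <- mean(toy_dists[1, -hub])   # excludes the hub's own column

print(sprintf("%0.2f (with self), %0.2f (without self)",
              mean_with_self, mean_without_self))
```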


Advanced code-spelunking question: go to the igraph organization on GitHub (https://github.com/igraph) and find the source file `components.c`. For the weakly connected components, does the code use a breadth-first search (BFS) or a depth-first search (DFS)?