# Network extension

In this notebook we extend the network with genes that were excluded by the criteria used to select pathways. These genes are present in pathways from all three databases, but not in any pathway that was selected to build the network. Using the STRING app in Cytoscape, we will try to add these genes to the network through protein-protein interactions (PPI).

In [1]:
# check working directory
getwd()

In [2]:
# load libraries
library(RCy3)
library(RNeo4j)
library(stringr)

"changing locked binding for 'length.path' in 'httr' whilst loading 'RNeo4j'"

Open Cytoscape on your computer and install the STRING app.

In [3]:
# check if Cytoscape is connected
cytoscapePing()
cytoscapeVersionInfo()

# install STRING app
installApp('STRINGapp') 
if("string" %in% commandsHelp("")) print("Success: the STRING app is installed") else print("Warning: STRING app is not installed. Please install the STRING app before proceeding.")

[1] "Available namespaces:"
[1] "Success: the STRING app is installed"


To begin, we will create our own network in Cytoscape and then build a PPI network with the STRING app.

In [4]:
# load in network and data files
nodes <- read.table(file.path(getwd(), "network", "nodes.txt"), header = T, sep = "\t")
edges <- read.table(file.path(getwd(), "network", "edges.txt"), header = T, sep = "\t")
inflGenes <- read.table(file.path(getwd(), "inflGenes", "merged_infl_genes.txt"), header = T, sep = "\t")

head(nodes)
head(edges)
head(inflGenes)

key,type
Allograft_rejection,Process
Cytokines,Process
Development and heterogeneity of the ILC family%WikiPathways_20181110%WP3893%Homo sapiens,Process
Diseases,Process
Fibrin Complement Receptor 3 Signaling Pathway%WikiPathways_20181110%WP4136%Homo sapiens,Process
Immune_cell_regulation,Process


source,target
NFkB,TRIP6
NFkB,CASP8
NFkB,CHUK
NFkB,TAB3
NFkB,CXCL1
NFkB,RIPK2


hgnc_symbol,entrezgene,dis_score,GC_score,Type
IL22,50616,0.01,10.08,InflGene
CCR7,1236,0.01,10.12,InflGene
CAT,847,0.01,10.21,InflGene
SOD1,6647,0.33,10.35,InflGene
PLAT,5327,0.01,10.36,InflGene
AQP4,361,0.01,10.38,InflGene


In [5]:
# clean up network and data files
# nodes
colnames(nodes)[1] <- "id"
nodes$id <- as.character(nodes$id)

# edges
edges <- edges[,c(-3,-4)]
colnames(edges)[c(1,2)] <- c("source", "target") 
edges$interaction <- "interacts"
edges$target <- as.character(edges$target)

# inflGenes
inflGenes <- inflGenes[c(-2,-3,-4,-5)]

head(nodes)
head(edges)
head(inflGenes)

id,type
Allograft_rejection,Process
Cytokines,Process
Development and heterogeneity of the ILC family%WikiPathways_20181110%WP3893%Homo sapiens,Process
Diseases,Process
Fibrin Complement Receptor 3 Signaling Pathway%WikiPathways_20181110%WP4136%Homo sapiens,Process
Immune_cell_regulation,Process


source,target,interaction
NFkB,TRIP6,interacts
NFkB,CASP8,interacts
NFkB,CHUK,interacts
NFkB,TAB3,interacts
NFkB,CXCL1,interacts
NFkB,RIPK2,interacts


hgnc_symbol
IL22
CCR7
CAT
SOD1
PLAT
AQP4
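
Before handing the tables to Cytoscape, it can help to confirm that the node table has an `id` column, that the edge table has `source` and `target` columns, and that every edge endpoint has a row in the node table. The sketch below runs on small made-up tables; the helper name `checkNetworkTables` and the toy values are illustrative and not part of the original workflow.

```r
# Hypothetical helper: checks the column names and returns edge endpoints
# that do not appear in the node table.
checkNetworkTables <- function(nodes, edges) {
  stopifnot("id" %in% colnames(nodes),
            all(c("source", "target") %in% colnames(edges)))
  endpoints <- unique(c(as.character(edges$source), as.character(edges$target)))
  setdiff(endpoints, as.character(nodes$id))
}

# Toy tables in the same layout as the cleaned nodes and edges
toyNodes <- data.frame(id = c("NFkB", "TRIP6", "CASP8"),
                       type = c("Process", "Gene", "Gene"),
                       stringsAsFactors = FALSE)
toyEdges <- data.frame(source = c("NFkB", "NFkB", "NFkB"),
                       target = c("TRIP6", "CASP8", "CHUK"),
                       interaction = "interacts",
                       stringsAsFactors = FALSE)

# CHUK is used as an edge endpoint but has no row in toyNodes
missingNodes <- checkNetworkTables(toyNodes, toyEdges)
print(missingNodes)
```

An empty result means that all endpoints are covered by the node table.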


In [6]:
# create network from files
createNetworkFromDataFrames(nodes, edges, title = "MyNetwork", collection = "MyCollection")

Loading data...
Applying default style...
Applying preferred layout...


We now have the network loaded and everything installed and running in Cytoscape. We will use all genes that are in the network, together with the genes associated with inflammation, to create a PPI network via the STRING app.

In [7]:
# get all genes from network and combine them with inflammation genes
networkGenes <- nodes[nodes$type != "Process",]
networkGenes <- networkGenes[-2]
colnames(networkGenes)[1] <- "hgnc_symbol"

networkGenes <- rbind(networkGenes, inflGenes)
networkGenes <- unique(networkGenes)

head(networkGenes)
dim(networkGenes)

Unnamed: 0,hgnc_symbol
11,AGER
12,AKT1
13,APOA1
14,C3
15,C5
16,CALCA


Let's create the PPI network!

In [8]:
# create STRING app API command and create ppi network
string_cmd <- paste('string protein query taxonID=9606 cutoff=0.9 query="',paste(networkGenes$hgnc_symbol, collapse=","),'"',sep="")
commandsGET(string_cmd)

setVisualStyle("default")
setNodeLabelMapping(table.column = "display name")
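
To see what is sent to the STRING app, the command string can be built for a short gene list and printed before it is run. The three symbols below are taken from the inflammation gene table purely as an illustration, and the names `toyGenes` and `toy_cmd` are made up; `taxonID=9606` selects human and `cutoff=0.9` is the confidence cutoff used in the cell above.

```r
# Build the STRING command for a small, illustrative gene list
toyGenes <- c("IL22", "CCR7", "CAT")
toy_cmd <- paste0('string protein query taxonID=9606 cutoff=0.9 query="',
                  paste(toyGenes, collapse = ","), '"')

# Print the command exactly as it would be passed on
cat(toy_cmd, "\n")
```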

### Manual step: merge the networks

In Cytoscape, go to 'Tools' -> 'Merge' -> 'Networks...'. Select the two networks and click 'Advanced options'. For the STRING network, choose 'query term' as the matching column.

Now that we have the merged network, we will extract the edge table and split it into two separate files for the Neo4j part.

In [10]:
# get table columns we need from edge table
table <- getTableColumns(table = "edge", columns = c("name", "interaction"))

head(table)

Unnamed: 0,name,interaction
16384,HGF (pp) JAK2,pp
16385,HGF (pp) SRC,pp
16386,HGF (pp) PLAU,pp
16387,HGF (pp) NRAS,pp
16388,HGF (pp) TGFB2,pp
16389,HGF (pp) RAC1,pp


In [11]:
table_process <- table[table$interaction == "interacts",]
table_pp <- table[table$interaction == "pp",]

head(table_process)
head(table_pp)

Unnamed: 0,name,interaction
14139,Vitamin_B12 (interacts) MYD88,interacts
14140,Vitamin_B12 (interacts) IL18,interacts
14141,Vitamin_B12 (interacts) MMP9,interacts
14142,Vitamin_B12 (interacts) TLR1,interacts
14143,Vitamin_B12 (interacts) TLR2,interacts
14144,Vitamin_B12 (interacts) S100A9,interacts


Unnamed: 0,name,interaction
16384,HGF (pp) JAK2,pp
16385,HGF (pp) SRC,pp
16386,HGF (pp) PLAU,pp
16387,HGF (pp) NRAS,pp
16388,HGF (pp) TGFB2,pp
16389,HGF (pp) RAC1,pp


In [12]:
# clean process table
table_process <- table_process[-2]
table_process <- as.data.frame(lapply(table_process, gsub, pattern ="\\(", replacement = ''))
table_process <- as.data.frame(lapply(table_process, gsub, pattern ="\\)", replacement = ''))
table_process <- as.data.frame(lapply(table_process, gsub, pattern ="interacts", replacement = ''))

table_process <- as.data.frame(str_split_fixed(table_process$name, " ", n = 2))

table_process <- as.data.frame(apply(table_process,2,function(x)gsub('\\s+', '',x)))
colnames(table_process)[c(1,2)] <- c("source", "target")
                                     
head(table_process)
                                     
# save table
write.table(table_process, file.path(getwd(), "results", "cat_gene_table.txt"), col.names = T, row.names = F, sep = "\t", quote = F)

source,target
Vitamin_B12,MYD88
Vitamin_B12,IL18
Vitamin_B12,MMP9
Vitamin_B12,TLR1
Vitamin_B12,TLR2
Vitamin_B12,S100A9


In [13]:
# clean pp table
table_pp <- table_pp[-2]
table_pp <- as.data.frame(lapply(table_pp, gsub, pattern ="\\(", replacement = ''))
table_pp <- as.data.frame(lapply(table_pp, gsub, pattern ="\\)", replacement = ''))
table_pp <- as.data.frame(lapply(table_pp, gsub, pattern ="pp", replacement = ''))

table_pp <- as.data.frame(str_split_fixed(table_pp$name, " ", n = 2))

table_pp <- as.data.frame(apply(table_pp,2,function(x)gsub('\\s+', '',x)))
colnames(table_pp)[c(1,2)] <- c("source", "target")
                             
head(table_pp)

# save table
write.table(table_pp, file.path(getwd(), "results", "ppi_table.txt"), col.names = T, row.names = F, sep = "\t", quote = F)


source,target
HGF,JAK2
HGF,SRC
HGF,PLAU
HGF,NRAS
HGF,TGFB2
HGF,RAC1
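
The two cleaning cells above remove the interaction label with `gsub` and then split on the first space, which works as long as node names contain no spaces and no literal `pp` or `interacts`. As an alternative sketch, not the method used above, the edge name can be split on the full ` (type) ` separator in one step. The helper name `splitEdgeNames` and the third example row are made up for illustration.

```r
# Hypothetical helper: split "A (type) B" into source, interaction and target
splitEdgeNames <- function(edgeNames) {
  # Three capture groups: text before the separator, the label inside the
  # parentheses, and text after the separator
  pattern <- "^(.*) \\(([^()]*)\\) (.*)$"
  data.frame(source = sub(pattern, "\\1", edgeNames),
             interaction = sub(pattern, "\\2", edgeNames),
             target = sub(pattern, "\\3", edgeNames),
             stringsAsFactors = FALSE)
}

toyNames <- c("HGF (pp) JAK2",
              "Vitamin_B12 (interacts) MYD88",
              "Some pathway name (interacts) IL18")
toySplit <- splitEdgeNames(toyNames)
print(toySplit)
```

Because the split happens on the separator rather than on the first space, a source name that contains spaces stays in one piece.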


Now that the edge table is split up, we can count the shared neighbors between process nodes and the added gene nodes. We will use the RNeo4j package for this purpose.

In [19]:
# first connect to Neo4j: start Neo4j, open the URL in a web browser, and create your own username and password
graph = startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "123")

In [20]:
# load in both tables and load them in Neo4J
data = data.frame(read.table(file.path(getwd(), "results", "cat_gene_table.txt"), header = T, sep = "\t"))
data <- unique(data)

query = "
MERGE (source:Category {id:{Category}})
MERGE (target:Gene {id:{Gene}})
CREATE (source)<-[:pathway]-(target)
"

t = newTransaction(graph)

for (i in 1:nrow(data)) {
  Category = data[i, ]$source
  Gene = data[i, ]$target
  
  appendCypher(t, 
               query, 
               Category = Category, 
               Gene = Gene 
               )
}

commit(t)

data1 = data.frame(read.table(file.path(getwd(), "results", "ppi_table.txt"), header = T, sep = "\t"))

query = "
MERGE (source:Gene {id:{Gene}})
MERGE (target:Gene1 {id:{Gene1}})
CREATE (source)<-[:ppi]-(target)
"

y = newTransaction(graph)

for (i in 1:nrow(data1)) {
  Gene = data1[i, ]$source
  Gene1 = data1[i, ]$target
  
  appendCypher(y, 
               query, 
               Gene = Gene, 
               Gene1 = Gene1 
  )
}

commit(y)
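
Because the relationships are written with `CREATE`, every row of a table becomes its own relationship, so a repeated row would later be counted more than once by `count(neighbor)`. The process-gene table is passed through `unique()` above; a similar check for the PPI table could look like the sketch below. The toy rows and variable names are illustrative assumptions.

```r
# Toy PPI table with one exact duplicate row (illustrative values)
toyPpi <- data.frame(source = c("HGF", "HGF", "HGF"),
                     target = c("JAK2", "SRC", "JAK2"),
                     stringsAsFactors = FALSE)

# Count exact duplicate rows, then drop them
nDuplicates <- sum(duplicated(toyPpi))
toyPpiUnique <- unique(toyPpi)

print(nDuplicates)
print(toyPpiUnique)
```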

In [21]:
# perform Neo4J query
shared_neighbors <- cypher(graph, "MATCH(source:Category)-[:pathway]-(neighbor:Gene)-[:ppi]-(target:Gene1)
WHERE NOT (source) = (target)
RETURN DISTINCT source.id AS source_id, target.id AS target_id, count(neighbor) AS common_neighbors")

# keep only the 55 genes that were added via STRING
genes55 <- read.table(file.path(getwd(), "55genes", "55_genes.txt"), header = T, sep = "\t")

colnames(genes55)[1] <- "gene"

# retrieve rows if one of the 55 genes is in that row
shared_neighbors55 <- shared_neighbors[shared_neighbors$target_id %in% genes55$gene,]

head(shared_neighbors55)
dim(shared_neighbors55)

Unnamed: 0,source_id,target_id,common_neighbors
13,Vitamin_B12,PPARA,4
14,Vitamin_B12,C4A,6
17,Vitamin_B12,TF,7
26,Vitamin_B12,ADIPOQ,3
28,Vitamin_B12,LEP,1
49,Vitamin_B12,NGF,3
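
To make the logic of the Cypher query more concrete, the same kind of count can be reproduced in plain R on toy data. This is only a sketch, under the assumption that each row of the process-gene table and each row of the PPI table corresponds to one relationship in Neo4j; the process names, gene names, and column names are made up.

```r
# Toy stand-ins for the process-gene table and the PPI table
catGene <- data.frame(process = c("ProcA", "ProcA", "ProcA", "ProcB"),
                      neighbor = c("G1", "G2", "G3", "G1"),
                      stringsAsFactors = FALSE)
ppi <- data.frame(neighbor = c("G1", "G2", "G3", "G1"),
                  gene = c("X", "X", "Y", "Y"),
                  stringsAsFactors = FALSE)

# Each joined row is one path: process - neighbor - gene
paths <- merge(catGene, ppi, by = "neighbor")

# Count the paths per (process, gene) pair
counts <- aggregate(neighbor ~ process + gene, data = paths, FUN = length)
names(counts)[3] <- "common_neighbors"
print(counts)
```

Here `ProcA` reaches `X` through `G1` and `G2`, so that pair gets a count of 2, while `ProcB` reaches `X` only through `G1`.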


In [22]:
# save shared neighbors table
write.table(shared_neighbors, file.path(getwd(), "results", "shared_neighbors.txt"), row.names = F, sep = "\t", quote = F)
write.table(shared_neighbors55, file.path(getwd(), "results", "shared_neighbors55.txt"), row.names = F, sep = "\t", quote = F)

With these shared neighbors we can decide which genes to include. We opt to include genes that share at least 4 neighbors with a process node.

In [23]:
# keep rows with 4 or more shared neighbors
shared_neighbors55 <- shared_neighbors55[shared_neighbors55$common_neighbors >= 4,]
head(shared_neighbors55)
dim(shared_neighbors55)

Unnamed: 0,source_id,target_id,common_neighbors
13,Vitamin_B12,PPARA,4
14,Vitamin_B12,C4A,6
17,Vitamin_B12,TF,7
125,Vitamin_B12,LCN2,4
185,Vitamin_B12,ELANE,4
193,Vitamin_B12,BPI,4
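
The cutoff of 4 is a choice, and its effect can be inspected by counting how many candidate genes would pass at each possible threshold. The sketch below does this for the six rows shown in the earlier `shared_neighbors55` output; the names `toyShared`, `thresholds`, and `genesKept` are illustrative.

```r
# Six example rows copied from the shared_neighbors55 output
toyShared <- data.frame(source_id = "Vitamin_B12",
                        target_id = c("PPARA", "C4A", "TF", "ADIPOQ", "LEP", "NGF"),
                        common_neighbors = c(4, 6, 7, 3, 1, 3),
                        stringsAsFactors = FALSE)

# Number of distinct genes retained at each threshold
thresholds <- 1:7
genesKept <- sapply(thresholds, function(k) {
  length(unique(toyShared$target_id[toyShared$common_neighbors >= k]))
})
print(data.frame(threshold = thresholds, genes_kept = genesKept))
```

For these six rows, a threshold of 4 keeps three genes (PPARA, C4A, and TF).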


In [24]:
# add these to the edge table and node table
# edge table
colnames(shared_neighbors55)[c(1,2)] <- c("source", "target") 
shared_neighbors55 <- shared_neighbors55[-3]
edges1 <- edges[-3]
edge_table <- rbind(shared_neighbors55, edges1)
edge_table <- unique(edge_table)

head(edge_table)
dim(edge_table)

# save edge table
write.table(edge_table, file.path(getwd(), "results", "edges.txt"), row.names = F, sep = "\t", quote = F)

Unnamed: 0,source,target
13,Vitamin_B12,PPARA
14,Vitamin_B12,C4A
17,Vitamin_B12,TF
125,Vitamin_B12,LCN2
185,Vitamin_B12,ELANE
193,Vitamin_B12,BPI


In [25]:
# node table
source <- as.data.frame(edge_table[,"source"])
target <- as.data.frame(edge_table[,"target"])

colnames(source)[1] <- "id"
colnames(target)[1] <- "id"

node_table <- rbind(source, target)
node_table <- unique(node_table)

# add node typing to table
node_table$type <- "Gene"

# ids of the process nodes
processNodes <- c("Diseases",
                  "WP_OVERVIEW_OF_NANOPARTICLE_EFFECTS",
                  "Immune_cell_regulation",
                  "NFkB",
                  "Allograft_rejection",
                  "Vitamin_B12",
                  "WP_FIBRIN_COMPLEMENT_RECEPTOR_3_SIGNALING_PATHWAY",
                  "WP_DEVELOPMENT_AND_HETEROGENEITY_OF_THE_ILC_FAMILY",
                  "Inflammation",
                  "Cytokines")
node_table$type[node_table$id %in% processNodes] <- "Process"

# type genes associated with inflammation
node_table$type[node_table$id %in% inflGenes$hgnc_symbol] <- "InflGene"

head(node_table)
dim(node_table)

# save node_table
write.table(node_table, file.path(getwd(), "results", "nodes.txt"), row.names = F, sep = "\t", quote = F)

Unnamed: 0,id,type
1,Vitamin_B12,Process
7,NFkB,Process
19,Inflammation,Process
29,Immune_cell_regulation,Process
37,Diseases,Process
41,Cytokines,Process
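
The node table can also be derived from the edge table more compactly by collecting all endpoints in one vector and assigning types from lookup vectors. The sketch below shows this on toy data and is not the code used above; `processIds` and `inflIds` are illustrative stand-ins for the process names and for `inflGenes$hgnc_symbol`.

```r
# Toy edge table and lookup vectors (illustrative values)
toyEdgeTable <- data.frame(source = c("Vitamin_B12", "Vitamin_B12", "NFkB"),
                           target = c("PPARA", "IL22", "CASP8"),
                           stringsAsFactors = FALSE)
processIds <- c("Vitamin_B12", "NFkB")
inflIds <- c("IL22")

# Every endpoint becomes a node; "Gene" is the default type
toyNodeTable <- data.frame(id = unique(c(toyEdgeTable$source, toyEdgeTable$target)),
                           type = "Gene",
                           stringsAsFactors = FALSE)

# Override the default for process nodes and inflammation genes
toyNodeTable$type[toyNodeTable$id %in% processIds] <- "Process"
toyNodeTable$type[toyNodeTable$id %in% inflIds] <- "InflGene"

print(toyNodeTable)
print(table(toyNodeTable$type))
```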


We can use these files to create the network and integrate the gene expression data into this network for analysis.