# 2018-12-04 The world of small RNAs
I found that INTS1 and PUS10 may play an important role in HIV expression. The integrator complex regulates transcriptional pause-release. How do the other members of the integrator complex relate to HIV and to INTS1?

In [None]:
library(ggplot2)
library(RColorBrewer)
theme_set(theme_bw())

In [None]:
# load the data
matrices.dir <- "/home/rcortini/work/CRG/projects/sc_hiv/data/matrices"
merged <- read.table(sprintf('%s/exprMatrix.csv', matrices.dir),
                     header = TRUE, row.names = 1,
                     sep = "\t", check.names = FALSE)

# load sample sheet
sampleSheet <- read.table(sprintf('%s/samplesheet.csv', matrices.dir),
                          header = TRUE,
                          row.names = 1)

# remove dead cells
sampleSheet <- sampleSheet[sampleSheet$status != "dead", ]

# load gene annotations file
gene.annotations <- sprintf("%s/gene_annotations.tsv", matrices.dir)
gene.data <- read.delim(gene.annotations, header = TRUE, sep = "\t",
                        row.names = 1, stringsAsFactors = FALSE)
gene.data <- subset(gene.data, rownames(gene.data) %in% rownames(merged))

NABP1 is part of the integrator complex. Let's look at its relationship to HIV and to INTS1, another member of the complex.

In [None]:
INTS1 <- "ENSG00000164880.15"
PUS10 <- "ENSG00000162927.13"

# look up the Ensembl gene IDs of NABP1 and DKC1 from their gene symbols
NABP1 <- rownames(gene.data)[which(gene.data$gene_symbol == "NABP1")]
DKC1 <- rownames(gene.data)[which(gene.data$gene_symbol == "DKC1")]

In [None]:
# logical mask over the rows of sampleSheet: TRUE for treated cells
treated <- sampleSheet$status == "treated"
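
The mask above has one entry per row of `sampleSheet`, so using it as a column index into `merged` is only safe if both tables list the same cells in the same order. Since dead cells were dropped from `sampleSheet` but not from `merged`, that is worth checking. A minimal sketch on toy data, assuming (the files above do not confirm this) that the row names of the sample sheet are the column names of the expression matrix; `toy.expr` and `toy.sheet` are made-up stand-ins:

```r
# Toy stand-ins for merged and sampleSheet (hypothetical data)
toy.expr <- data.frame(cellA = c(1, 0), cellB = c(3, 2),
                       cellC = c(0, 5), cellD = c(4, 4),
                       row.names = c("gene1", "gene2"))
toy.sheet <- data.frame(status = c("treated", "dead", "control", "treated"),
                        row.names = c("cellA", "cellB", "cellC", "cellD"),
                        stringsAsFactors = FALSE)

# Drop dead cells from the sheet only, as in the loading cell
toy.sheet <- toy.sheet[toy.sheet$status != "dead", , drop = FALSE]

# The logical mask now has 3 entries, but the matrix still has 4 columns
mask <- toy.sheet$status == "treated"
aligned <- identical(rownames(toy.sheet), colnames(toy.expr))

# Indexing by cell name does not depend on the two tables being aligned
treated.cells <- rownames(toy.sheet)[toy.sheet$status == "treated"]
toy.treated <- toy.expr[, treated.cells, drop = FALSE]
```

If `aligned` comes out `FALSE` on the real data, selecting columns by name, as in the last two lines, avoids picking the wrong cells.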

In [None]:
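options(repr.plot.width = 2.5, repr.plot.height = 2)
# INTS1 vs NABP1 expression in treated cells, with a linear fit
# .data[[id]] maps a column whose name is stored in a variable
gg <- ggplot(as.data.frame(t(merged[, treated])),
             aes(.data[[INTS1]], .data[[NABP1]])) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(x = "INTS1", y = "NABP1")
print(gg)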

There are lots of cases where NABP1 is zero and INTS1 is non-zero, and vice versa.
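
To put a number on "lots of cases", the cells can be cross-tabulated by whether each gene is detected. A sketch on made-up values; with the real data, `x` and `y` would be the INTS1 and NABP1 rows of `merged` restricted to treated cells:

```r
# Hypothetical expression values for two genes across eight cells
x <- c(0, 2, 0, 5, 1, 0, 3, 0)   # stand-in for INTS1
y <- c(1, 0, 0, 4, 0, 2, 0, 0)   # stand-in for NABP1

# 2x2 table: is each gene detected (non-zero) in each cell?
detected <- table(INTS1 = x > 0, NABP1 = y > 0)
print(detected)

# Number of cells where exactly one of the two genes is detected
n.discordant <- sum(xor(x > 0, y > 0))
```

The off-diagonal entries of `detected` are the "zero in one gene, non-zero in the other" cells described above.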

In [None]:
options(repr.plot.width = 2.5, repr.plot.height = 2)
# PUS10 vs DKC1 expression in treated cells, with a linear fit
gg <- ggplot(as.data.frame(t(merged[, treated])),
             aes(.data[[PUS10]], .data[[DKC1]])) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(x = "PUS10", y = "DKC1")
print(gg)

In [None]:
# select all genes whose symbol starts with "INTS" (integrator subunits)
INTS.idx <- which(startsWith(gene.data$gene_symbol, "INTS"))
INTS.genes <- rownames(gene.data)[INTS.idx]
INTS.matrix <- t(merged[INTS.genes, treated])
# label columns with gene symbols instead of Ensembl IDs
colnames(INTS.matrix) <- gene.data$gene_symbol[INTS.idx]

In [None]:
library(corrplot)
options(repr.plot.width = 6, repr.plot.height = 6)
# pairwise Pearson correlation between INTS genes across treated cells
INTS.corr <- cor(INTS.matrix)
corrplot(INTS.corr, type = "upper", order = "hclust",
         tl.col = "black", tl.srt = 45)
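
Given how many zeros appear in the scatter plots above, a rank-based correlation is one possible cross-check on the Pearson values. Genes that are constant across the selected cells are worth dropping first, since `cor()` returns `NA` for them. A sketch on a toy matrix (`toy` is made up, and whether Spearman is the better choice for this dataset is a judgment call):

```r
# Toy cells-by-genes matrix; geneC is constant (all zero) across cells
toy <- cbind(geneA = c(0, 1, 2, 0, 5),
             geneB = c(0, 2, 3, 1, 9),
             geneC = c(0, 0, 0, 0, 0))

# Keep only genes with non-zero variance
keep <- apply(toy, 2, var) > 0
toy <- toy[, keep, drop = FALSE]

# Rank-based (Spearman) correlation between the remaining genes
toy.corr <- cor(toy, method = "spearman")
```

With the real data, the same two steps would be applied to `INTS.matrix` before calling `corrplot()`.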