Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add differential expression analyses in ANCA-associated vasculitis (#14)
* Fix DOI * Add NARES differential expression notebook * Add GPL file from GEO Downloaded on 15 Sept 2017 * Only ignore the sample metadata in greenelab/rheum-plier-data * Add GPA blood diff exp notebook * Remove lines * Ignore private kidney data * Move function for summarizing expression to util * Add table of contents * Add glomeruli differential expression analysis * Update: save B data.frame for plotting (NARES) * Update: save B data.frame for plotting (GPA blood) * Update: save B data.frame for plotting (kidney) * Add notebook -- first pass at looking across AAV sets differential expression of LVs * Update notebook title * Update: add note about LV96 * Update: add information about clinical data * Write wrapper function for DLVE * Move wrapper function to DLVE util * Use wrapper for GPA blood * Use wrapper for kidney * Save venn diagram as png
- Loading branch information
1 parent
664690a
commit 62047c2
Showing
27 changed files
with
277,134 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
--- | ||
title: "NARES differential expression" | ||
output: | ||
html_notebook: | ||
toc: true | ||
toc_float: true | ||
--- | ||
|
||
**J. Taroni 2018** | ||
|
||
The NARES study ([Grayson, et al. _Arthritis Rheumatol_. 2015](https://dx.doi.org/10.1002/art.39185)) | ||
included three sets of patients with ANCA-associated vasculitis: those with | ||
active nasal disease, prior nasal disease, and those with no history of nasal | ||
disease. | ||
There is also a composite comparator group. | ||
|
||
Here, we'll test for PLIER LVs that are differentially expressed between groups. | ||
We'll focus on the multiPLIER LVs (e.g., the recount2 PLIER model), as | ||
we have demonstrated the validity of this approach in prior analyses. | ||
|
||
## Functions and directory set up | ||
|
||
```{r} | ||
`%>%` <- dplyr::`%>%` | ||
source(file.path("util", "test_LV_differences.R")) | ||
``` | ||
|
||
```{r} | ||
# plot and result directory setup for this notebook | ||
plot.dir <- file.path("plots", "18") | ||
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE) | ||
results.dir <- file.path("results", "18") | ||
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE) | ||
``` | ||
|
||
## Read in data | ||
|
||
### Clinical data | ||
|
||
This clinical data includes information like measures of disease severity | ||
([`BVAS`](https://doi.org/10.1136/ard.2008.101279)) and disease duration. | ||
We're most interested in the disease labels (`Classification`). | ||
|
||
```{r} | ||
nares.demo <- readr::read_tsv(file.path("data", "sample_info", | ||
"NARES_demographic_data.tsv")) %>% | ||
dplyr::mutate(Classification = dplyr::recode(Classification, | ||
C1 = "Healthy", | ||
C2 = "Sarcoidosis", | ||
C3 = "Allergic Rhinitis", | ||
C4 = "EGPA", | ||
V1 = "GPA (no prior nasal disease)", | ||
V2 = "GPA with prior nasal disease", | ||
V3 = "GPA with active nasal disease")) | ||
``` | ||
|
||
### recount2 LVs | ||
|
||
```{r} | ||
recount.b <- readRDS(file.path("results", "13", "NARES_recount2_B.RDS")) | ||
``` | ||
|
||
## NARES differential latent variable expression (DLVE) | ||
|
||
We're going to compare the seven classification groups, which correspond to | ||
different disease labels (and were recoded above). | ||
|
||
```{r} | ||
nares.df.list <- LVTestWrapper(b.matrix = recount.b, | ||
sample.info.df = nares.demo, | ||
phenotype.col = "Classification", | ||
file.lead = "NARES_recount2_model", | ||
plot.dir = plot.dir, | ||
results.dir = results.dir) | ||
``` | ||
|
||
### Example LV | ||
|
||
```{r} | ||
nares.limma.df <- nares.df.list$limma | ||
nares.limma.df %>% | ||
dplyr::filter(adj.P.Val < 0.05) %>% | ||
dplyr::arrange(adj.P.Val) | ||
``` | ||
|
||
```{r} | ||
# read in recount2 summary data.frame | ||
summary.file <- file.path("results", "08", "recount2_PLIER_summary.tsv") | ||
recount.summary <- readr::read_tsv(summary.file) | ||
``` | ||
|
||
Let's plot `LV96`, as it looks biologically interesting because of it's | ||
putative association with natural killer (NK) cells. | ||
We'd expect myeloid cell, specifically granulocyte, signatures to be | ||
differentially expressed in this dataset, but not necessarily lymphocyte | ||
signatures. | ||
First, does the _name_ agree with the recount2 PLIER summary? | ||
|
||
```{r} | ||
recount.summary %>% | ||
dplyr::filter(`LV index` == 96) | ||
``` | ||
|
||
Yes, `SVM NK cells activated` is significantly associated with `LV96`. | ||
|
||
```{r} | ||
recount.b.df <- nares.df.list$b.df | ||
recount.b.df %>% | ||
dplyr::filter(LV == "96,SVM NK cells activated") %>% | ||
ggplot2::ggplot(ggplot2::aes(x = Classification, y = Value)) + | ||
ggplot2::geom_boxplot() + | ||
ggplot2::geom_jitter(position = ggplot2::position_jitter(0.15)) + | ||
ggplot2::theme_bw() + | ||
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1), | ||
plot.title = ggplot2::element_text(hjust = 0.5, face = "bold")) + | ||
ggplot2::ggtitle("LV 96 SVM NK cells activated") | ||
``` | ||
|
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
--- | ||
title: "GPA blood differential expression" | ||
output: | ||
html_notebook: | ||
toc: true | ||
toc_float: true | ||
--- | ||
|
||
**J. Taroni 2018** | ||
|
||
Here, we'll perform differential expression (of LV) analyses for the PBMC data | ||
from Cheadle C, Berger AE, Andrade F, et al. | ||
[Transcription of PR3 and Related Myelopoiesis Genes in Peripheral Blood Mononuclear Cells in Active Wegener’s Granulomatosis](https://dx.doi.org/10.1002/art.27398). | ||
_Arthritis & Rheumatism_, 2010. doi: 10.1002/art.27398. | ||
|
||
We'll be comparing the healthy controls, the patients with a GPA signature | ||
(GPA-positive) and the patients without a GPA signature (GPA-negative). | ||
We'll first need to project this data into the recount2 PLIER model latent space. | ||
|
||
## Functions and directory set up | ||
|
||
```{r} | ||
`%>%` <- dplyr::`%>%` | ||
source(file.path("util", "test_LV_differences.R")) | ||
source(file.path("util", "plier_util.R")) | ||
``` | ||
|
||
```{r} | ||
# plot and result directory setup for this notebook | ||
plot.dir <- file.path("plots", "19") | ||
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE) | ||
results.dir <- file.path("results", "19") | ||
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE) | ||
``` | ||
|
||
## Read in data | ||
|
||
The GPA blood file is in the form of a GEO series matrix, which means the | ||
sample metadata is in what are essentially comment fields at the beginning of | ||
the file. | ||
Thus, some ugliness/wrangling will be necessary. | ||
|
||
### Expression data | ||
|
||
```{r} | ||
series.mat.file <- file.path("data", "expression_data", | ||
"GSE18885_series_matrix.txt") | ||
# expression matrix | ||
gpa.ma.data <- | ||
readr::read_delim(series.mat.file, | ||
delim = "\t", | ||
comment = "!", | ||
col_names = TRUE, | ||
skip = 1) | ||
# this information about GPL6140 was downloaded on 15 Sept 2017, and reportedly | ||
# last updated on 18 Jan 2013 | ||
gpl.from.geo <- readr::read_tsv(file.path("data", "expression_data", | ||
"GPL6104-11576.txt"), comment = "#") | ||
# use the gene symbols from the GPL file -- PLIER uses gene symbols | ||
gpa.ma.annot <- gpl.from.geo %>% | ||
dplyr::select(c(ID, ILMN_Gene)) %>% | ||
dplyr::left_join(y = gpa.ma.data, | ||
by = c("ID" = "ID_REF")) | ||
colnames(gpa.ma.annot)[2] <- "Gene" | ||
``` | ||
|
||
```{r} | ||
# are there any duplicates in the gene symbol column? | ||
sum(duplicated(gpa.ma.annot$Gene)) | ||
``` | ||
|
||
```{r} | ||
# aggregate & write to file | ||
agg.ma.df <- PrepExpressionDF(dplyr::select(gpa.ma.annot, -ID)) | ||
readr::write_tsv(agg.ma.df, | ||
path = file.path("data", "expression_data", | ||
"GSE18885_annotated_mean.pcl")) | ||
# get expression matrix | ||
exprs.mat <- as.matrix(dplyr::select(agg.ma.df, -Gene)) | ||
rownames(exprs.mat) <- agg.ma.df$Gene | ||
``` | ||
|
||
### Metadata | ||
|
||
As mentioned above, metadata is also extracted from the series matrix file. | ||
|
||
```{r} | ||
# get more information from the series matrix file | ||
# line 41 has the sample names | ||
conn <- file(series.mat.file) | ||
open(conn) | ||
smpl.name <- read.table(conn, skip = 41, nrow = 1) | ||
close(conn) | ||
# line 51 contains the WG signature status -- I'll call this GPA signature | ||
# GPA-positive, GPA-negative to be in line with the current disease name | ||
conn <- file(series.mat.file) | ||
open(conn) | ||
gpa.sig.status <- read.table(conn, skip = 51, nrow = 1) | ||
close(conn) | ||
# this is the GEO sample accession information | ||
conn <- file(series.mat.file) | ||
open(conn) | ||
gsm.info <- read.table(conn, skip = 77, nrow = 1) | ||
close(conn) | ||
# get those lines into data.frame format | ||
smpl.info.df <- as.data.frame(t(dplyr::bind_rows(gsm.info, | ||
smpl.name, | ||
gpa.sig.status))[-1, ]) | ||
colnames(smpl.info.df) <- c("Sample", "Name", "GPA_signature") | ||
# remove 1 row data.frames read in from the series matrix | ||
rm(conn, gsm.info, smpl.name, gpa.sig.status) | ||
``` | ||
|
||
```{r} | ||
# extract fraction ("cell type(s)") from sample names | ||
smpl.info.df <- smpl.info.df %>% | ||
dplyr::mutate(Cell_type = sub(".*\\ ", "", Name)) | ||
# recode the GPA signature bit | ||
smpl.info.df <- | ||
smpl.info.df %>% | ||
dplyr::mutate(GPA_signature = dplyr::case_when( | ||
grepl("control", GPA_signature) ~ "Control", | ||
grepl("WG Sig -", GPA_signature) ~ "GPAneg", | ||
grepl("WG Sig +", GPA_signature) ~ "GPApos" | ||
)) | ||
# write to file | ||
readr::write_tsv(smpl.info.df, | ||
path = file.path("data", "sample_info", | ||
"GSE18885_sample_info.tsv")) | ||
``` | ||
|
||
```{r} | ||
# remove data.frames that won't be needed | ||
rm(agg.ma.df, gpa.ma.annot, gpa.ma.data, gpl.from.geo) | ||
``` | ||
|
||
## Apply recount2 model | ||
|
||
```{r} | ||
# load model itself | ||
recount.plier <- readRDS(file.path("data", "recount2_PLIER_data", | ||
"recount_PLIER_model.RDS")) | ||
# project | ||
recount.b <- GetNewDataB(exprs.mat = as.matrix(exprs.mat), | ||
plier.model = recount.plier) | ||
``` | ||
|
||
```{r} | ||
# save B matrix to file | ||
b.file <- file.path(results.dir, "GPA_blood_recount2_B.RDS") | ||
saveRDS(recount.b, file = b.file) | ||
``` | ||
|
||
## Differential expression | ||
|
||
### PBMC only | ||
|
||
We're going to restrict our analyses to the peripheral blood mononuclear cell | ||
(PBMC) fraction. | ||
|
||
```{r} | ||
# sample information | ||
pbmc.df <- smpl.info.df %>% | ||
dplyr::filter(Cell_type == "PBMC") | ||
``` | ||
|
||
```{r} | ||
# b matrix (latent variables) | ||
pbmc.b <- recount.b[, pbmc.df$Sample] | ||
``` | ||
|
||
### Differential expression analysis itself | ||
|
||
```{r} | ||
LVTestWrapper(b.matrix = pbmc.b, | ||
sample.info.df = pbmc.df, | ||
phenotype.col = "GPA_signature", | ||
file.lead = "GPA_blood_recount2_model", | ||
plot.dir = plot.dir, | ||
results.dir = results.dir) | ||
``` |
Large diffs are not rendered by default.
Oops, something went wrong.
Oops, something went wrong.