Posterior predictive check for bulk RNA sequencing data

The input data set is a tidy representation of a differential gene transcript abundance analysis

To install:

For linux systems, in order to exploit multi-threading, from R write:

fileConn<-file("~/.R/Makevars")
writeLines(c( "CXX14FLAGS += -O3","CXX14FLAGS += -DSTAN_THREADS", "CXX14FLAGS += -pthread"), fileConn)
close(fileConn)

Then, install with

devtools::install_github("stemangiola/ppcSeq")

You can get the test dataset with

ppcSeq::counts

You can convert a list of BAM/SAM files into a tidy data frame of annotated counts

counts.ppc = 
    ppcSeq::counts %>%
    mutate(is_significant = FDR < 0.01) %>%
    ppc_seq(
        formula = ~ Label,
        significance_column = PValue,
        do_check_column = is_significant,
        value_column = value,
        percent_false_positive_genes = "5%"
    )

## [1] "2019-08-08 21:29:49 AEST"
## [1] "2019-08-08 21:32:44 AEST"
## [1] "2019-08-08 21:33:06 AEST"
## [1] "2019-08-08 21:36:35 AEST"

The new posterior predictive check has been added to the original data frame

counts.ppc

## # A tibble: 15 x 4
##    symbol   `sample wise data` plot   `tot deleterious outliers`
##    <chr>       <list<df[,11]>> <list>                      <int>
##  1 SLC16A12          [21 × 11] <gg>                            0
##  2 CYP1A1            [21 × 11] <gg>                            1
##  3 ART3              [21 × 11] <gg>                            1
##  4 DIO2              [21 × 11] <gg>                            0
##  5 OR51E2            [21 × 11] <gg>                            0
##  6 MUC16             [21 × 11] <gg>                            0
##  7 CCNA1             [21 × 11] <gg>                            0
##  8 LYZ               [21 × 11] <gg>                            1
##  9 PPM1H             [21 × 11] <gg>                            0
## 10 SUSD5             [21 × 11] <gg>                            0
## 11 TPRG1             [21 × 11] <gg>                            0
## 12 EPB42             [21 × 11] <gg>                            0
## 13 LRRC38            [21 × 11] <gg>                            0
## 14 SUSD4             [21 × 11] <gg>                            0
## 15 MMP8              [21 × 11] <gg>                            1

The new data frame contains plots for each gene

We can visualise the top five differentially transcribed genes

counts.ppc %>% 
    slice(1:2) %>% 
    pull(plot) %>% 
    cowplot::plot_grid(plotlist = ., align = "v", ncol = 1, axis="b", rel_widths = 1 )

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
R		R
README_files/figure-gfm		README_files/figure-gfm
data		data
dev		dev
inst		inst
man		man
src		src
.Rbuildignore		.Rbuildignore
.Rhistory		.Rhistory
.gitignore		.gitignore
.travis.yml		.travis.yml
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
configure		configure
configure.win		configure.win
ppcSeq.Rproj		ppcSeq.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Posterior predictive check for bulk RNA sequencing data

About

Releases

Packages

Languages

evan-wehi/ppcSeq

Folders and files

Latest commit

History

Repository files navigation

Posterior predictive check for bulk RNA sequencing data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages