# About the notebook
[Back to the topic](pathway_toc.ipynb)

This notebook is step 01 of the pathway analysis. It processes the counts with the DESeq2 package and stores the results in a file for later use.

<img src="./fig/03 pathway analysis steps.png">

----

# Set environment

In [1]:
source("Pathway_config.R")
source("Pathway_util.R")

Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,


# Import data

In [2]:
# import RData (annomapres0, annogenecnts0)
attach(file.path(OUTDIR, "HTS-Pilot-Annotated-STAR-counts.RData"))
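Calling `attach()` on an `.RData` file places its objects on the search path instead of copying them into the global environment. The sketch below shows this behaviour with a throwaway file. The names `toy_meta`, `toy_counts`, and `toy_rdata` are hypothetical stand-ins, not part of the pipeline.

```r
# Save two toy objects to a temporary .RData file (hypothetical stand-ins
# for annomapres0 and annogenecnts0)
toy_meta   <- data.frame(Label = c("s1", "s2"), Media = c("YPD", "YPD"))
toy_counts <- data.frame(gene = c("g1", "g2"), s1 = c(0, 5), s2 = c(1, 7))
toy_file   <- tempfile(fileext = ".RData")
save(toy_meta, toy_counts, file = toy_file)
rm(toy_meta, toy_counts)

# Attach the file as a database on the search path and list its contents
toy_env <- attach(toy_file, name = "toy_rdata")
attached_objects <- sort(ls(toy_env))
print(attached_objects)

# The objects can be found by name although they are not in the global environment
in_global <- exists("toy_counts", envir = globalenv(), inherits = FALSE)
visible   <- exists("toy_counts")

# Detach when done so that later cells do not pick up stale objects
detach("toy_rdata", character.only = TRUE)
```

Listing the attached environment is a quick way to confirm that the file contains the objects the following cells expect.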

Prepare the `columnData` DataFrame and the `countData` matrix:
- columnData: sample metadata (one row per sample)
- countData: count matrix (genes in rows, samples in columns)

In [3]:
# columnData --- metadata
annomapres0 %>%
    dplyr::filter(enrichment_method == "RZ")  %>%
    DataFrame ->
    columnData
rownames(columnData) <- columnData[["Label"]]

head(columnData[, c("Label", "Strain", "Media")], 3)

DataFrame with 3 rows and 3 columns
              Label   Strain    Media
        <character> <factor> <factor>
1_RZ_J       1_RZ_J      H99      YPD
10_RZ_C     10_RZ_C    mar1d      YPD
11_RZ_J     11_RZ_J    mar1d      YPD

In [4]:
# countData  --- count matrix
annogenecnts0 %>%
    dplyr::select(as.character(c("gene", columnData[["Label"]]))) %>%
    as.data.frame %>%
    column_to_rownames("gene") %>%
    as.matrix ->
    countData

head(countData, 3)

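| | 1_RZ_J | 10_RZ_C | 11_RZ_J | 12_RZ_P | 13_RZ_J | 14_RZ_C | 15_RZ_C | 16_RZ_P | 2_RZ_C | 21_RZ_C | ⋯ | 27_RZ_P | 3_RZ_J | 35_RZ_P | 36_RZ_J | 38_RZ_P | 4_RZ_P | 40_RZ_J | 45_RZ_P | 47_RZ_P | 9_RZ_C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNAG_00001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CNAG_00002 | 204 | 76 | 92 | 64 | 230 | 182 | 200 | 129 | 168 | 124 | ⋯ | 43 | 107 | 150 | 109 | 95 | 51 | 235 | 122 | 112 | 106 |
| CNAG_00003 | 40 | 24 | 18 | 34 | 56 | 53 | 54 | 40 | 40 | 41 | ⋯ | 9 | 24 | 26 | 43 | 43 | 11 | 53 | 46 | 41 | 35 |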
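The DESeq2 documentation stresses that the columns of the count matrix must be in the same order as the rows of the sample metadata. The cell above achieves this by selecting columns in the order of `columnData[["Label"]]`. A minimal base R sketch of the same check and realignment follows; it uses the first three samples shown above, and the names `toy_col`, `toy_cnt`, `shuffled`, and `aligned` are hypothetical.

```r
# Toy metadata for three samples, with row names set to the sample labels
toy_col <- data.frame(Label  = c("1_RZ_J", "10_RZ_C", "11_RZ_J"),
                      Strain = factor(c("H99", "mar1d", "mar1d")),
                      stringsAsFactors = FALSE)
rownames(toy_col) <- toy_col[["Label"]]

# Toy count matrix: genes in rows, samples in columns
toy_cnt <- matrix(c(0, 204, 40,    # 1_RZ_J
                    0,  76, 24,    # 10_RZ_C
                    0,  92, 18),   # 11_RZ_J
                  nrow = 3,
                  dimnames = list(c("CNAG_00001", "CNAG_00002", "CNAG_00003"),
                                  c("1_RZ_J", "10_RZ_C", "11_RZ_J")))

# Simulate a count matrix whose columns arrive in a different order
shuffled <- toy_cnt[, c("11_RZ_J", "1_RZ_J", "10_RZ_C")]
order_ok_before <- identical(colnames(shuffled), rownames(toy_col))

# Every sample in the metadata must be present before reordering
stopifnot(all(rownames(toy_col) %in% colnames(shuffled)))

# Reorder the columns by sample label so they match the metadata rows
aligned <- shuffled[, rownames(toy_col), drop = FALSE]
order_ok_after <- identical(colnames(aligned), rownames(toy_col))
```

Running a check like `identical(colnames(countData), rownames(columnData))` before building the DESeq object catches a mismatch early.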


# Convert the count matrix to a gene expression matrix using the DESeq2 model

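Additive model: the design `~ Media + Strain` fits a main effect for media and a main effect for strain, with no interaction between them.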

In [5]:
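### Build the DESeq object from the counts (additive design)
dds_add <- DESeqDataSetFromMatrix(countData, columnData, ~ Media + Strain)

### Estimate size factors (one per sample)
dds_add <- estimateSizeFactors(dds_add)

### Estimate dispersion parameters (one per gene)
dds_add <- estimateDispersions(dds_add)

### Fit the negative binomial GLM and test.
### DESeq() wraps the steps above: it reuses the size factors and
### re-estimates the dispersions, as the messages below show.
dds_add <- DESeq(dds_add)

### Regularized-log (rlog) transformed expression values
rld_add <- rlog(dds_add)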

gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
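By default, `estimateSizeFactors()` uses the median-of-ratios method: each sample's size factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across samples. The base R sketch below illustrates the idea on toy data. The function name `median_ratio_size_factors` and the toy matrix are hypothetical, and the sketch is not DESeq2's own implementation.

```r
# Median-of-ratios size factors for a genes x samples count matrix (sketch)
median_ratio_size_factors <- function(counts) {
  log_counts    <- log(counts)
  log_geo_means <- rowMeans(log_counts)     # -Inf for any gene with a zero count
  keep          <- is.finite(log_geo_means) # use only genes with all counts > 0
  apply(log_counts[keep, , drop = FALSE], 2, function(sample_log_counts) {
    exp(median(sample_log_counts - log_geo_means[keep]))
  })
}

# Toy data: sample B is sequenced twice as deeply as sample A
toy <- matrix(c(10, 20, 40, 0,    # sample A
                20, 40, 80, 5),   # sample B
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("A", "B")))

sf <- median_ratio_size_factors(toy)
print(sf)

# Dividing each column by its size factor puts the samples on a common scale
normalized <- sweep(toy, 2, sf, "/")
```

In this toy case the size factor of B is twice that of A, so the normalized counts of the first three genes agree between the two samples.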


If you are interested in the multiplicative model, which adds the `Media:Strain` interaction term to the design, you can run the following code and store the results in the RData file as well.

```
### Build the DESeq object from the counts (design with interaction term)
dds_mult <- DESeqDataSetFromMatrix(countData, columnData, ~ Media + Strain + Media:Strain)

### Estimate size factors (one per sample)
dds_mult <- estimateSizeFactors(dds_mult)

### Estimate dispersion parameters (one per gene)
dds_mult <- estimateDispersions(dds_mult)

### Fit the negative binomial GLM and test
dds_mult <- DESeq(dds_mult)

### Regularized-log (rlog) transformed expression values
rld_mult <- rlog(dds_mult)
```
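The two designs differ only in the interaction column of the model matrix. The sketch below makes this visible with base R's `model.matrix()` on a toy 2 x 2 layout. The media level `TC` is a hypothetical placeholder, since only `YPD` appears in the metadata preview, and `toy_design`, `mm_add`, and `mm_mult` are illustrative names.

```r
# Toy layout: every combination of two media and two strains
toy_design <- expand.grid(
  Media  = factor(c("YPD", "TC"), levels = c("YPD", "TC")),      # "TC" is hypothetical
  Strain = factor(c("H99", "mar1d"), levels = c("H99", "mar1d"))
)

# Additive design: intercept plus one column per non-reference level
mm_add <- model.matrix(~ Media + Strain, data = toy_design)

# Multiplicative design: the same columns plus the interaction
mm_mult <- model.matrix(~ Media + Strain + Media:Strain, data = toy_design)

# The column that only the multiplicative design contains
extra_cols <- setdiff(colnames(mm_mult), colnames(mm_add))
print(colnames(mm_add))
print(extra_cols)
```

The extra coefficient lets the strain effect differ between media, which the additive design assumes away.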

# Store the results

In [6]:
outfile <- file.path(OUTDIR, "dds_rld.RData")
save(dds_add,  rld_add,
     # dds_mult, rld_mult,   # uncomment if the multiplicative model was also fit
     file = outfile)