# Steps 1 & 2: Generating Files From the Wald Test and Likelihood Ratio Test

## Site(s) Used:

* For the [demo version](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html)

* For the [installation of DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)


## Defining Output Paths


This is where the Wald test and likelihood ratio test results will be written.

*Suggestion:* 

* Make a parent directory called `1___Structured_Data_Files` and use its full path as the value of `parent_directory`.

In [None]:
# Define the parent directory for the Wald and likelihood ratio test output
parent_directory <- "/path/to/summary/statistics/removed/1___Structured_Data_Files"
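
If the directory does not exist yet, it can be created from R before anything is written to it. The sketch below is a minimal example under stated assumptions: it uses a temporary location (`tempdir()`) and a hypothetical variable name, `demo_parent_directory`, so that it runs anywhere. In practice you would substitute your own `parent_directory`.

```r
# Hypothetical demo path under a temporary folder; replace with your own parent_directory
demo_parent_directory <- file.path(tempdir(), "1___Structured_Data_Files")

# Create the directory (and any missing parent folders) only if it is not there yet
if (!dir.exists(demo_parent_directory)) {
  dir.create(demo_parent_directory, recursive = TRUE)
}

# Confirm that the directory now exists
dir.exists(demo_parent_directory)
```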

## Loading Counts Matrix


First the counts matrix needs to be loaded. Make sure it is the one without any summary statistics. In my case it is called `

Make sure to use the full path below as an input for `counts_matrix_file`.

In [None]:
# Define the path to your counts matrix file
counts_matrix_file <- "/path/to/my/counts/matrix/txt/file/myCountsMatrixWithShorterColumnNamesAsInput.txt"

# Read the counts matrix from the TSV file
counts_matrix <- read.table(counts_matrix_file, header = TRUE, row.names = 1, sep = "\t")

head(counts_matrix)

In [None]:
print(colnames(counts_matrix))

DESeq2 requires sample metadata, so I have set it up with the sample names matching the column names of the counts matrix.


In [None]:


sample_metadata <- data.frame(
  Sample = c("Control.01__Control",
             "Control.02__Control",
             "Control.03__Control",
             "Control.04__Control",
             "Control.05__Control",
             "Control.06__Control",
             "Experiment.01__Experimental",
             "Experiment.02__Experimental",
             "Experiment.03__Experimental",
             "Experiment.04__Experimental",
             "Experiment.05__Experimental",
             "Experiment.06__Experimental"),
    
  Treatment = c("Untreated", "Untreated", "Untreated",
                "Untreated", "Untreated", "Untreated",
                "Knockdown", "Knockdown", "Knockdown",
                "Knockdown", "Knockdown", "Knockdown"),
    
  Timepoint = c("IndependentVariableType1", "IndependentVariableType1", "IndependentVariableType1",
            "IndependentVariableType2", "IndependentVariableType2", "IndependentVariableType2",
            "IndependentVariableType1", "IndependentVariableType1", "IndependentVariableType1",
            "IndependentVariableType2", "IndependentVariableType2", "IndependentVariableType2")
)

# View the table

sample_metadata
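
DESeq2 pairs the rows of the metadata with the columns of the counts matrix by position, so it is worth checking that the names agree and are in the same order before building the data set. The sketch below shows one way to do that on toy data; `toy_counts`, `toy_metadata`, and the shortened sample names are hypothetical stand-ins for `counts_matrix` and `sample_metadata`.

```r
# Toy counts matrix: 3 genes x 4 samples (hypothetical values)
toy_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3),
                  c("Control.01", "Control.02", "Experiment.01", "Experiment.02"))
)

# Toy metadata with one row per sample, in the same order as the columns
toy_metadata <- data.frame(
  Sample = c("Control.01", "Control.02", "Experiment.01", "Experiment.02"),
  Treatment = c("Untreated", "Untreated", "Knockdown", "Knockdown")
)

# TRUE only when every sample name matches its column, in the same order
names_match <- identical(colnames(toy_counts), as.character(toy_metadata$Sample))
names_match

# Using the sample names as row names makes the pairing explicit
rownames(toy_metadata) <- toy_metadata$Sample
```

With the real objects, the same check would compare `colnames(counts_matrix)` against `sample_metadata$Sample`.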


Below are the metadata columns available for the tentative design formula I am going to use.

In [None]:
colnames(sample_metadata)

## DESeq2 Data Set

When it comes to the design formula for the `DESeqDataSetFromMatrix` object, make sure you put the formula directly into `design`. Do not store it in a variable first and then feed that variable to `design`; it will most likely not work.

In [None]:
library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = counts_matrix,
                              colData = sample_metadata,
                              design = ~ Timepoint + Treatment  )
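
One possible reason a stored design fails is that the variable holds plain text rather than a formula object; `design` expects a formula (or a model matrix). The base-R sketch below, with hypothetical variable names, shows the difference and how `as.formula()` converts one into the other. It does not call DESeq2 itself.

```r
# A design written as plain text is a character string, not a formula
design_as_text <- "~ Timepoint + Treatment"
inherits(design_as_text, "formula")     # FALSE

# as.formula() turns the text into a real formula object
design_as_formula <- as.formula(design_as_text)
inherits(design_as_formula, "formula")  # TRUE

# The variables named in the formula must be columns of the sample metadata
design_variables <- all.vars(design_as_formula)
design_variables
```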


## Manually Identify: Factor Levels

You may have to identify the reference level yourself. According to the DESeq2 vignette:

> By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels. In order to see the change of reference levels reflected in the results names, you need to either run DESeq or nbinomWaldTest/nbinomLRT after the re-leveling operation. Setting the factor levels can be done in two ways, either using factor

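Therefore, below I set `Untreated` as the reference level, which makes `Knockdown` the experimental group. The two cells show the two equivalent ways of doing this, `factor` with explicit levels or `relevel`; running either one is enough.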

In [None]:
dds$Treatment <- factor(dds$Treatment, levels = c("Untreated", "Knockdown"))

In [None]:
dds$Treatment <- relevel(dds$Treatment, ref = "Untreated")
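
To see why the reference has to be set by hand here, the base-R sketch below works on a plain character vector (a hypothetical stand-in for `dds$Treatment`). By default the levels are alphabetical, so `Knockdown` would come first and be treated as the reference.

```r
# Hypothetical stand-in for the Treatment column
treatment <- c("Untreated", "Untreated", "Knockdown", "Knockdown")

# Default: levels are sorted alphabetically, so "Knockdown" is the first (reference) level
default_factor <- factor(treatment)
levels(default_factor)

# Option 1: give the levels explicitly, reference first
explicit_factor <- factor(treatment, levels = c("Untreated", "Knockdown"))

# Option 2: keep the factor and move the chosen reference to the front
releveled_factor <- relevel(default_factor, ref = "Untreated")

# Both options end up with the same level order
identical(levels(explicit_factor), levels(releveled_factor))
```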

## Pre-filtering Low Counts

In [None]:
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
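
The filter above keeps only genes whose counts, summed across all samples, reach at least 10. A small sketch on a toy matrix (hypothetical gene names and values, plain base R rather than a `DESeqDataSet`) shows which rows survive:

```r
# Toy counts: 4 genes x 4 samples (hypothetical values)
toy_counts <- matrix(
  c( 0, 1, 0, 2,    # geneA: total 3
     5, 4, 6, 3,    # geneB: total 18
     0, 0, 0, 0,    # geneC: total 0
    10, 0, 0, 0),   # geneD: total 10
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("sample", 1:4))
)

# Same rule as above: keep rows with a total count of at least 10
toy_keep <- rowSums(toy_counts) >= 10
filtered_counts <- toy_counts[toy_keep, , drop = FALSE]

rownames(filtered_counts)
```

Note that `geneD` passes even though all of its reads come from a single sample, because the rule looks only at the row total.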

## Differential Expression Analysis Main

In [None]:
dds <- DESeq(dds)
res <- results(dds)
res

## Manually Specify What You Want To Compare

Here I would like to compare the treatments: Knockdown versus Untreated (the reference).

### Wald Test (Default)

In [None]:
res <- results(dds, contrast=c("Treatment","Knockdown", "Untreated"))

It is customary to order the results by smallest p-value.

In [None]:
resOrdered <- res[order(res$pvalue),]

In [None]:
# Convert to a plain data frame and preview the first rows
df_resOrdered <- as.data.frame(resOrdered)
head(df_resOrdered)
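
DESeq2 can report `NA` p-values for some genes, so it helps to know how `order()` treats them: by default, `NA` values are placed last. The toy data frame below (hypothetical gene names and p-values) illustrates the ordering step.

```r
# Toy results table with one missing p-value (hypothetical values)
toy_res <- data.frame(
  pvalue = c(0.20, NA, 0.001, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Smallest p-value first; the NA row ends up at the bottom
toy_ordered <- toy_res[order(toy_res$pvalue), , drop = FALSE]

rownames(toy_ordered)
```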

Now it is time to output the results from the Wald test. For version control purposes, the file name includes the date and the time in hours, minutes, and seconds (12-hour clock), followed by AM or PM.

In [None]:
# Create a timestamp with date and time
timestamp <- format(Sys.time(), format = "%Y_%m_%d_%I%M%S%p")

# Create the Wald test subdirectory under the parent directory
wald_test_subdirectory <- file.path(parent_directory, "1___Wald_Test")
dir.create(wald_test_subdirectory, showWarnings = FALSE, recursive = TRUE)

# Define the full path for the Wald test file with date and time
Wald_test_file_path <- file.path(wald_test_subdirectory, paste0("Wald_Test_Contrast_Treatment_Knockdown_Control_", timestamp, ".tsv"))

# Write the data frame to a TSV file
write.table(df_resOrdered, file = Wald_test_file_path, sep = "\t", row.names = TRUE)
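
To make the timestamp format concrete, the sketch below formats a fixed, hypothetical moment instead of `Sys.time()`. `%I` is the hour on a 12-hour clock and `%p` is the AM/PM marker, whose exact text depends on the locale. A 24-hour variant using `%H` is shown as an alternative, on the assumption that you may want file names that sort chronologically.

```r
# Fixed example time (hypothetical) so the result is reproducible
example_time <- as.POSIXct("2024-03-05 14:07:09", tz = "UTC")

# Same format string as above: date, then 12-hour time with AM/PM
stamp_12h <- format(example_time, format = "%Y_%m_%d_%I%M%S%p")
stamp_12h

# Alternative: 24-hour time, so names sort in chronological order
stamp_24h <- format(example_time, format = "%Y_%m_%d_%H%M%S")
stamp_24h
```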

### Examining Data:

In [None]:
summary(res)

To count how many adjusted p-values are below 0.05, run the following command:

In [None]:
sum(res$padj < 0.05, na.rm=TRUE)

There are 1598 genes with an FDR (adjusted p-value) below 0.05, which is nice.
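
The `padj` column holds p-values adjusted for multiple testing; DESeq2 uses the Benjamini-Hochberg method by default. The base-R sketch below reproduces that adjustment on a few hypothetical p-values and shows why `na.rm = TRUE` is needed when counting.

```r
# Hypothetical raw p-values for four genes
raw_p <- c(0.001, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjustment, as in base R's p.adjust()
adjusted_p <- p.adjust(raw_p, method = "BH")
adjusted_p

# Count adjusted p-values below 0.05
n_significant <- sum(adjusted_p < 0.05)

# With a missing value, the sum is NA unless na.rm = TRUE is set
padj_with_na <- c(adjusted_p, NA)
n_significant_na <- sum(padj_with_na < 0.05, na.rm = TRUE)
```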

In [None]:
res05 <- results(dds, alpha=0.05)
summary(res05)

In [None]:
sum(res05$padj < 0.05, na.rm=TRUE)

In [None]:
resultsNames(dds)

### Likelihood Ratio Test

In [None]:
# Fit the LRT against a reduced model that drops the variable of interest (Treatment)
dds_reduced <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ Timepoint)

In [None]:
# Extract results from the likelihood ratio test
results_LRT <- results(dds_reduced)

# View the first rows (not yet ordered by p-value)
head(results_LRT)
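
The idea behind a likelihood ratio test is to fit a full and a reduced model and ask whether dropping the term of interest makes the fit significantly worse. The sketch below is only an analogy for a single gene: it uses a Poisson GLM from base R on hypothetical counts, whereas DESeq2 fits a negative binomial GLM with its own dispersion estimates. All object names are illustrative.

```r
# Hypothetical counts for one gene across 8 samples
toy_gene <- data.frame(
  count     = c(10, 12, 9, 11, 30, 28, 33, 31),
  Timepoint = factor(rep(c("T1", "T2"), times = 4)),
  Treatment = factor(rep(c("Untreated", "Knockdown"), each = 4),
                     levels = c("Untreated", "Knockdown"))
)

# Full model includes Treatment; reduced model drops it
full_fit    <- glm(count ~ Timepoint + Treatment, family = poisson, data = toy_gene)
reduced_fit <- glm(count ~ Timepoint, family = poisson, data = toy_gene)

# LRT statistic: drop in deviance, compared to a chi-squared distribution
lrt_stat <- deviance(reduced_fit) - deviance(full_fit)
lrt_df   <- df.residual(reduced_fit) - df.residual(full_fit)
lrt_p    <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

lrt_p
```

A small p-value here means the model needs `Treatment` to explain the counts, which is the same question the `reduced = ~ Timepoint` call asks for every gene.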

In [None]:
# Create a timestamp with date and time
timestamp <- format(Sys.time(), format = "%Y_%m_%d_%I%M%S%p")

# Define the subdirectory for the LRT test under the parent directory
lrt_test_subdirectory <- file.path(parent_directory, "2___Likelihood_Ratio_Test")
dir.create(lrt_test_subdirectory, showWarnings = FALSE, recursive = TRUE)

# Define the full path for the LRT test file with date and time
lrt_test_file_path <- file.path(lrt_test_subdirectory, paste0("Likelihood_Ratio_Test_", timestamp, ".tsv"))

# Write the data frame to a TSV file
write.table(as.data.frame(results_LRT), file = lrt_test_file_path, sep = "\t", row.names = TRUE)
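
When these TSV files are read back later, the gene names stored as row names need to line up with the header. One option, sketched below on a toy table with hypothetical values and a temporary file path, is to write with `col.names = NA`, which adds an empty header cell above the row names, and to read with `row.names = 1`. This is an alternative to the call above, not a description of what it produces.

```r
# Toy results table (hypothetical values)
toy_results <- data.frame(
  log2FoldChange = c(1.5, -0.7),
  padj = c(0.01, 0.20),
  row.names = c("geneA", "geneB")
)

# Temporary output path so the example runs anywhere
toy_path <- file.path(tempdir(), "toy_results.tsv")

# col.names = NA writes an empty header cell above the row-name column
write.table(toy_results, file = toy_path, sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)

# Read it back, using the first column as row names
read_back <- read.table(toy_path, header = TRUE, sep = "\t", row.names = 1)
read_back
```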

## Log fold change shrinkage for visualization and ranking

In [None]:
resLFC <- lfcShrink(dds, coef="Treatment_Knockdown_vs_Untreated", type="apeglm")
resLFC
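
The `coef` value passed to `lfcShrink` has to be one of the names returned by `resultsNames(dds)`, which follow the pattern variable, level, `vs`, reference. The sketch below builds the name from its parts and checks it against a hypothetical stand-in for that output; the vector `example_results_names` is an assumption about what this design would produce, not captured output.

```r
# Parts of the coefficient name for this comparison
variable  <- "Treatment"
level     <- "Knockdown"
reference <- "Untreated"

# Pattern: <variable>_<level>_vs_<reference>
coef_name <- paste0(variable, "_", level, "_vs_", reference)
coef_name

# Hypothetical stand-in for resultsNames(dds) under ~ Timepoint + Treatment
example_results_names <- c(
  "Intercept",
  "Timepoint_IndependentVariableType2_vs_IndependentVariableType1",
  "Treatment_Knockdown_vs_Untreated"
)

# The coefficient can only be used if it appears in the list
coef_is_available <- coef_name %in% example_results_names
coef_is_available
```

If the name is missing, the usual cause is that the reference level was changed after `DESeq` was run, in which case `DESeq` needs to be rerun.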

## Independent hypothesis weighting

In [None]:
library("IHW")
resIHW <- results(dds, filterFun=ihw)
summary(resIHW)
sum(resIHW$padj < 0.05, na.rm=TRUE)
metadata(resIHW)$ihwResult

## Data Exploration:

In [None]:
plotMA(res, ylim=c(-2,2))

In [None]:
plotMA(resLFC, ylim=c(-2,2))

## Session info

Currently I am using R 4.2.2.10 on the cluster.

In [None]:
sessionInfo()

In [None]:
print("Done")