# DESeq2 Analysis

This notebook uses DESeq2, an R/Bioconductor package, to perform differential gene expression analysis.
DESeq2 is widely used for bulk RNA-seq count data and provides statistical methods for identifying genes that are differentially expressed between experimental conditions.
The goal is to take raw count data, normalize it, and identify genes whose expression changes significantly between conditions.
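
To make the normalization step concrete, here is a minimal base-R sketch of the median-of-ratios idea that DESeq2 uses by default to estimate size factors. The matrix `toy` and its gene and sample names are made up for illustration; in practice DESeq2 computes this internally, so this is only a sketch of the idea, not the package's code.

```r
# Toy count matrix (hypothetical values): sample s2 is sequenced 2x as deeply
# as s1, and s3 is sequenced 3x as deeply.
toy <- matrix(
  c(10, 20, 30,
    100, 200, 300,
    50, 100, 150),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Per-gene log geometric mean across samples (the pseudo-reference sample).
log_geo_means <- rowMeans(log(toy))

# Size factor per sample: median ratio of its counts to the pseudo-reference,
# ignoring genes with a zero count anywhere (non-finite log geometric mean).
size_factors <- apply(toy, 2, function(sample_counts) {
  usable <- is.finite(log_geo_means) & sample_counts > 0
  exp(median((log(sample_counts) - log_geo_means)[usable]))
})

# Dividing each column by its size factor removes the depth difference.
normalized <- sweep(toy, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

After scaling, the three toy samples have identical normalized counts, because they differ only in sequencing depth.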

In [4]:
## Load data.
# DESeq2 expects raw (unnormalized) integer counts, which is what this file contains.
# Raw counts are generated by tools such as featureCounts or HTSeq-count.
counts_path <- file.path("..", "data", "GSE164073_raw_counts_GRCh38.p13_NCBI.tsv.gz")
counts <- read.delim(counts_path)

# Inspect the first few rows
head(counts)

,GeneID,GSM4996084,GSM4996085,GSM4996086,GSM4996087,GSM4996088,GSM4996089,GSM4996090,GSM4996091,GSM4996092,GSM4996093,GSM4996094,GSM4996095,GSM4996096,GSM4996097,GSM4996098,GSM4996099,GSM4996100,GSM4996101
,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,100287102,2,5,3,2,5,2,4,2,3,2,2,4,3,2,2,3,3,3
2,653635,244,236,337,266,317,226,303,196,219,202,168,201,235,221,270,262,234,314
3,102466751,25,17,34,22,24,19,23,15,21,15,19,22,15,17,16,18,17,23
4,107985730,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
5,100302278,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,645520,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
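
Before reshaping the table, it can be worth confirming that it really looks like raw counts: unique gene identifiers, no missing values, and non-negative whole numbers. The sketch below runs these checks on a small stand-in data frame; `toy_counts` and its two sample column names are hypothetical, and the same checks could be applied to `counts`.

```r
# Small stand-in for `counts` (hypothetical sample columns).
toy_counts <- data.frame(
  GeneID = c(100287102, 653635, 102466751),
  GSM_A  = c(2L, 244L, 25L),
  GSM_B  = c(5L, 236L, 17L)
)

sample_cols <- setdiff(names(toy_counts), "GeneID")

# Each gene identifier should appear exactly once.
ids_unique <- anyDuplicated(toy_counts$GeneID) == 0

# There should be no missing values anywhere in the table.
no_missing <- !anyNA(toy_counts)

# Every count should be a non-negative whole number.
all_whole <- all(vapply(
  toy_counts[sample_cols],
  function(x) all(x >= 0 & x == round(x)),
  logical(1)
))

print(c(ids_unique = ids_unique, no_missing = no_missing, all_whole = all_whole))
```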


In [7]:
library(dplyr)
library(tibble)
# Move GeneID into the row names and convert to a numeric matrix
# (genes in rows, samples in columns), the layout DESeq2 expects
count_mat <- counts %>%
  tibble::column_to_rownames("GeneID") %>%
  as.matrix()

print(head(count_mat))

          GSM4996084 GSM4996085 GSM4996086 GSM4996087 GSM4996088 GSM4996089
100287102          2          5          3          2          5          2
653635           244        236        337        266        317        226
102466751         25         17         34         22         24         19
107985730          1          1          1          0          0          0
100302278          0          0          0          0          0          0
645520             0          0          0          0          1          0
          GSM4996090 GSM4996091 GSM4996092 GSM4996093 GSM4996094 GSM4996095
100287102          4          2          3          2          2          4
653635           303        196        219        202        168        201
102466751         23         15         21         15         19         22
107985730          1          0          0          0          0          0
100302278          0          0          0          0          0          0
645520      
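
The output above contains genes with zero or near-zero counts in every sample. A common, optional step is to drop such rows before fitting the model, which reduces object size and run time. The sketch below shows one way to do this on a toy matrix built from the first three samples of four genes shown above; the threshold of 10 total reads and the names `toy_mat` and `filtered` are assumptions for illustration, not values taken from this analysis.

```r
# Toy subset: first three samples of four genes from the output above.
toy_mat <- matrix(
  c(2,   5,   3,
    244, 236, 337,
    1,   1,   1,
    0,   0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("100287102", "653635", "107985730", "100302278"),
    c("GSM4996084", "GSM4996085", "GSM4996086")
  )
)

# Keep genes with at least `min_total` reads summed across all samples.
# The cutoff is an assumed, adjustable value.
min_total <- 10
keep <- rowSums(toy_mat) >= min_total
filtered <- toy_mat[keep, , drop = FALSE]

print(filtered)
```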

In [None]:
# Next, prepare a sample metadata table (one row per sample)
# so that DESeq2 knows which condition each sample belongs to and which comparisons to perform
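
A minimal sketch of what such a metadata table could look like, and how it might be passed to DESeq2 together with the counts. The `condition` labels below are placeholders: the real group assignments for these GSM samples are not given in this notebook and would need to come from the GEO sample annotations. The toy matrix uses the first four samples of three genes shown earlier.

```r
# Toy count matrix: first four samples of three genes from the table above.
sample_ids <- c("GSM4996084", "GSM4996085", "GSM4996086", "GSM4996087")
toy_mat <- matrix(
  c(2,   5,   3,   2,
    244, 236, 337, 266,
    25,  17,  34,  22),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("100287102", "653635", "102466751"), sample_ids)
)

# Metadata: one row per sample, row names matching the matrix column names.
# PLACEHOLDER labels: these are not the real conditions of these samples.
col_data <- data.frame(
  condition = factor(
    c("control", "control", "treated", "treated"),
    levels = c("control", "treated")  # first level is the reference
  ),
  row.names = sample_ids
)

# DESeq2 requires the samples to be in the same order in both objects.
stopifnot(identical(colnames(toy_mat), rownames(col_data)))

# Build the DESeq2 dataset only if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = toy_mat,
    colData   = col_data,
    design    = ~ condition
  )
  print(dds)
}
```

Setting the factor levels explicitly makes the reference group unambiguous, so that log fold changes are reported relative to the first level.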