# KEGG Analysis

# Phase 2 vs. Phase 2

## 0. load libraries

In [21]:
library(tidyverse)
library(ggupset) 
library(enrichplot)
library(clusterProfiler) # for GSEA() and gseKEGG()

## 1. read CSVs

In [42]:
# get list of files
files <- list.files(
    path = '/project/pi_sarah_gignouxwolfsohn_uml_edu/julia/CE_2024/CE24_RNA-seq/analysis/diff_expression/phase2_v_phase2/deseq_res_files/all_genes',
    pattern = '\\.csv$',
    full.names = TRUE
    )

head(files)

`gseKEGG` takes the full gene list rather than only the significant genes, ranked by log2 fold change (LFC) in decreasing order, as in GSEA. From what I've read, this can be more informative than `enrichKEGG` because it uses the magnitude and direction of the LFC instead of relying on a significance cutoff alone.
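To make the difference in inputs concrete, here is a minimal sketch on made-up data. The toy data frame, its `padj` column, and the cutoffs are assumptions for illustration only; none of the values come from this analysis.

```r
# Toy DESeq2-like results (hypothetical gene IDs and values)
toy <- data.frame(
  X              = c("101", "102", "103", "104", "105"),
  log2FoldChange = c(2.1, -0.3, 0.8, -1.7, NA),
  padj           = c(0.001, 0.60, 0.20, 0.01, NA)
)

# Over-representation input (enrichKEGG-style): only the IDs passing a cutoff
ora_genes <- toy$X[!is.na(toy$padj) & toy$padj < 0.05 & abs(toy$log2FoldChange) > 1]

# GSEA input (gseKEGG-style): every gene with a statistic,
# as a named numeric vector sorted in decreasing order
ranked <- setNames(toy$log2FoldChange, toy$X)
ranked <- sort(ranked[!is.na(ranked)], decreasing = TRUE)

print(ora_genes)
print(ranked)
```

The over-representation input discards the fold-change values once the cutoff is applied, while the ranked vector keeps them for every gene.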

In [43]:
names(files) <- gsub("^DEG_", "", tools::file_path_sans_ext(basename(files)))

df_list <- lapply(files, function(f) {
  df <- read.csv(f)

  # remove LOC from gene IDs
  df$X <- sub("^LOC", "", df$X)

  df
})

names(df_list) <- names(files)

all_df <- lapply(df_list, function(df) {
  geneList <- as.numeric(df$log2FoldChange)
  names(geneList) <- df$X

  # remove NA values
  geneList <- geneList[!is.na(geneList)]

  # sort decreasing
  sort(geneList, decreasing = TRUE)
})

names(all_df)
head(all_df$wc_cc)
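Before running GSEA it can help to check each ranked vector for tied statistics and repeated gene IDs, since both affect the ranking. The sketch below uses a small hypothetical vector in place of an element of `all_df`. The tie fraction it computes is only a rough analogue of the percentage reported in the warnings, and the de-duplication rule (keep the largest absolute LFC per ID) is an assumption of this sketch, not something the notebook does.

```r
# Hypothetical ranked vector standing in for one element of `all_df`
gene_list <- sort(
  c("111" = 1.5, "222" = 0.4, "333" = 0.4, "444" = -2.0, "111" = 0.9),
  decreasing = TRUE
)

# Fraction of genes that share a statistic with at least one other gene
tied <- duplicated(gene_list) | duplicated(gene_list, fromLast = TRUE)
tie_fraction <- mean(tied)

# Gene IDs that appear more than once
dup_ids <- unique(names(gene_list)[duplicated(names(gene_list))])

# One possible way to resolve duplicates: keep the entry with the
# largest absolute LFC per ID, then restore the decreasing order
dedup <- gene_list[order(abs(gene_list), decreasing = TRUE)]
dedup <- dedup[!duplicated(names(dedup))]
dedup <- sort(dedup, decreasing = TRUE)

print(tie_fraction)
print(dup_ids)
print(dedup)
```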

## 2. run `gseKEGG`

### 1. effect of single exposure after period of recovery
- CC vs. WC
- CC vs. HC
- CC vs. BC

In [44]:
# CC vs. WC
k.cc.wc <- gseKEGG(geneList = all_df$wc_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.wc)

“There are ties in the preranked stats (0.54% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

In [45]:
# CC vs. HC
k.cc.hc <- gseKEGG(geneList = all_df$hc_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.hc)

“There are ties in the preranked stats (0.57% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”


ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue,rank,leading_edge,core_enrichment
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<chr>,<chr>
cvn04144,Endocytosis - Crassostrea virginica (eastern oyster),214,0.9957203,1.341435,0.000172953,0.02421342,0.02421342,56,"tags=1%, list=0%, signal=1%",111110117/111099630


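Endocytosis: the cell membrane engulfs external substances and carries the material inside the cell.

Interpretation (tentative): the NES is positive (1.34), so the pathway's genes sit toward the top of the ranked list, i.e. toward higher log2 fold change. If a positive LFC in `hc_cc` means higher expression in HC relative to CC, then this pathway is enriched in HC. Note that the leading edge is small (`tags=1%`, two genes in `core_enrichment`), so the signal is driven by a couple of top-ranked genes.

Enriched GO terms for this pathway are involved in cell migration/organization.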
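To check which genes drive this result, the `core_enrichment` string can be split into gene IDs and looked up in the ranked list. The sketch below copies `core_enrichment` and `NES` from the CC vs. HC output row; the ranked vector and its LFC values are made up for illustration, and in the notebook the lookup would be against `all_df$hc_cc`.

```r
# Values copied from the CC vs. HC result row
core_enrichment <- "111110117/111099630"
nes <- 1.341435

# Split the slash-separated string into individual gene IDs
core_genes <- strsplit(core_enrichment, "/", fixed = TRUE)[[1]]

# Hypothetical ranked list; these LFC values are invented for the example
ranked <- c("111110117" = 8.2, "111099630" = 7.9,
            "111000001" = 0.1, "111000002" = -3.4)

# Position and statistic of each leading-edge gene in the ranking
core_rank <- match(core_genes, names(ranked))
core_lfc  <- ranked[core_genes]

# The sign of NES says which end of the ranked list the gene set leans toward
direction <- if (nes > 0) "top of ranked list (higher LFC)" else "bottom of ranked list (lower LFC)"

print(data.frame(gene = core_genes, rank = core_rank, lfc = unname(core_lfc)))
print(direction)
```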

In [47]:
# CC vs. BC
k.cc.bc <- gseKEGG(geneList = all_df$bc_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.bc)

“There are ties in the preranked stats (0.55% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

### 2. effect of single late exposure 
- CC vs. CW
- CC vs. CH
- CC vs. CB

In [48]:
# CC vs. CW
k.cc.cw <- gseKEGG(geneList = all_df$cw_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.cw)

“There are ties in the preranked stats (0.77% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

In [49]:
# CC vs. CH
k.cc.ch <- gseKEGG(geneList = all_df$ch_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.ch)

“There are ties in the preranked stats (0.6% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

In [50]:
# CC vs. CB
k.cc.cb <- gseKEGG(geneList = all_df$cb_cc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cc.cb)

“There are ties in the preranked stats (0.75% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

### 3. Effect of timing of initial stress exposure
- CW vs. WC
- CH vs. HC
- CB vs. BC

In [51]:
# CW vs. WC
k.cw.wc <- gseKEGG(geneList = all_df$cw_wc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cw.wc)

“There are ties in the preranked stats (0.62% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

In [52]:
# CH vs. HC
k.ch.hc <- gseKEGG(geneList = all_df$ch_hc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.ch.hc)

“There are ties in the preranked stats (0.41% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways

In [53]:
# CB vs. BC
k.cb.bc <- gseKEGG(geneList = all_df$cb_bc,
               organism     = 'cvn',
              # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
as.data.frame(k.cb.bc)

“There are ties in the preranked stats (0.51% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”
no term enriched under specific pvalueCutoff...



ID,Description,setSize,enrichmentScore,NES,pvalue,p.adjust,qvalue
<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


no enriched pathways
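The nine cells in this section repeat the same call with a different element of `all_df`, so they could be collapsed into one loop that returns a summary table. This is a rough sketch: `run_gsea_all` and `kegg_fun` are hypothetical names, and the demonstration uses a stand-in function so it runs without downloading KEGG data. Calling `run_gsea_all(all_df)` with the default `kegg_fun` would use the same `gseKEGG` arguments as the cells in this section.

```r
# Run one enrichment function over every contrast and count enriched terms.
# `run_gsea_all` and `kegg_fun` are hypothetical helpers, not clusterProfiler API.
run_gsea_all <- function(gene_lists,
                         kegg_fun = function(gl) {
                           clusterProfiler::gseKEGG(geneList     = gl,
                                                    organism     = "cvn",
                                                    pvalueCutoff = 0.05,
                                                    verbose      = FALSE)
                         }) {
  res <- lapply(gene_lists, function(gl) as.data.frame(kegg_fun(gl)))
  data.frame(
    contrast   = names(res),
    n_enriched = unname(vapply(res, nrow, integer(1))),
    row.names  = NULL
  )
}

# Toy demonstration with a stand-in for gseKEGG (made-up data and rule)
toy_lists <- list(a_b = c(g1 = 2, g2 = -1), c_d = c(g1 = 0.5, g2 = 0.1))
toy_fun <- function(gl) {
  # pretend a pathway is enriched only when the top statistic exceeds 1
  if (max(gl) > 1) data.frame(ID = "toy01") else data.frame(ID = character(0))
}

summary_tbl <- run_gsea_all(toy_lists, kegg_fun = toy_fun)
print(summary_tbl)
```

Keeping the per-contrast results in a named list as well would preserve the full output for any contrast that does return enriched pathways.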