# Volcano Plots
# Phase 2 vs. Phase 2

This notebook uses the results from [deseq_p2.v.p2.ipynb](https://github.com/jgmcdonough/CE24_RNA-seq/blob/main/analysis/diff_expression/phase2_v_phase2/deseq_p2.v.p2.ipynb) to generate publication-formatted volcano plots.

## 0. Load libraries

In [25]:
library(tidyverse) # for ggplot and dplyr
library(cowplot) # for combining plots

## 1. Load CSVs


Across the different comparisons, the DESeq analysis produced 32 output files, each containing *all genes*. Here, I start by assigning directionality to each gene and then write CSVs that contain only the differentially expressed genes (DEGs).

In [26]:
# get list of files
files <- list.files(
    path = '/project/pi_sarah_gignouxwolfsohn_uml_edu/julia/CE_2024/CE24_RNA-seq/analysis/diff_expression/phase2_v_phase2/deseq_res_files',
    pattern = '\\.csv$',
    full.names = TRUE
    )

head(files)

The order of the oyster treatments in each file name is intentional: the treatment listed second is the 'baseline' from DESeq. In other words, if the file name is bb_cc.csv, then the LFC values are for BB relative to CC (a positive LFC = higher expression in BB, a negative LFC = higher expression in CC).

For each CSV, I want to read it in and assign directionality to DEGs (or mark genes as not significant). A gene counts as a DEG when padj < 0.05 and |log2FoldChange| > 1.

In [27]:
# read each results file and label every gene by DEG direction
deg_list <- lapply(files, function(f) {

  # read file
  df <- read.csv(f)

  # get basename of file without extension (so just bb_cc if bb_cc.csv)
  name <- tools::file_path_sans_ext(basename(f))
  # separate the two treatments (bb and cc)
  groups <- strsplit(name, "_")[[1]]
  # assign treatment names to variables
  g1 <- groups[1] # bb (treatment of interest)
  g2 <- groups[2] # cc (DESeq baseline)

  # set default for new col to NS (not significant)
  df$DEG_group <- "NS"

  # positive LFC: higher expression in g1 (bb if cc is baseline)
  df$DEG_group[df$padj < 0.05 & df$log2FoldChange >  1] <- g1

  # negative LFC: higher expression in g2 (the baseline, cc)
  df$DEG_group[df$padj < 0.05 & df$log2FoldChange < -1] <- g2

  df
})
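
As a quick sanity check of the labelling rule above, here is a minimal sketch on a toy table. The gene names, values, and the `bb_cc` name are invented for illustration and are not taken from the real result files. It also shows that a gene with `padj = NA` (which DESeq2 can report, for example after independent filtering) keeps the default "NS" label, because an `NA` in a logical index selects nothing when the assigned value has length one.

```r
# toy results table; all values are invented for illustration only
toy <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.3, -1.8, 0.4, 3.1, -2.5),
  padj           = c(0.001, 0.01, 0.2, NA, 0.04)
)

# hypothetical comparison name: bb is compared against the baseline cc
groups <- strsplit("bb_cc", "_")[[1]]

# same rule as in the cell above
toy$DEG_group <- "NS"
toy$DEG_group[toy$padj < 0.05 & toy$log2FoldChange >  1] <- groups[1]
toy$DEG_group[toy$padj < 0.05 & toy$log2FoldChange < -1] <- groups[2]

toy$DEG_group
# "bb" "cc" "NS" "NS" "cc"
```

geneD has a large fold change but no adjusted p-value, so it is treated as not significant rather than raising an error.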


In [28]:
# name list elements after the files
names(deg_list) <- tools::file_path_sans_ext(basename(files))

# basename() removes directory path, keeps only the filename
# tools::file_path_sans_ext(...) removes the file extension (.csv)
# names(deg_list) assigns those filenames as the names of the list elements
# now can access elements by name
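
Once the list is named, a per-comparison summary is a one-liner. The sketch below uses a small stand-in list (`toy_list`, with invented comparison names and labels) rather than the real `deg_list`, so it is only an illustration of the access pattern.

```r
# hypothetical stand-in for deg_list: two tiny comparisons with invented labels
toy_list <- list(
  bb_cc = data.frame(DEG_group = c("bb", "NS", "cc", "bb")),
  bb_ww = data.frame(DEG_group = c("NS", "NS", "ww"))
)

# access one comparison by its name
toy_list[["bb_cc"]]$DEG_group

# number of DEGs (anything not labelled "NS") in each comparison
deg_counts <- sapply(toy_list, function(df) sum(df$DEG_group != "NS"))
deg_counts
```

The same `sapply()` call should work on `deg_list` itself, assuming every element has the `DEG_group` column added above.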

Write out *only* the DEGs for downstream analysis.

In [29]:
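out_dir <- "/project/pi_sarah_gignouxwolfsohn_uml_edu/julia/CE_2024/CE24_RNA-seq/analysis/diff_expression/phase2_v_phase2/deseq_res_files/DEGs"

# make sure the output directory exists (no-op if it is already there)
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (nm in names(deg_list)) {

  # results table for this comparison
  df <- deg_list[[nm]]

  # keep only rows labelled with a treatment (drop "NS")
  deg_df <- df[!is.na(df$DEG_group) & df$DEG_group != "NS", ]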

  write.csv(
    deg_df,
    file = file.path(out_dir, paste0("DEG_", nm, ".csv")),
    row.names = FALSE
  )
}
