# 4sU-seq data analysis using DESeq2 results output (mutant vs wt)
This Jupyter notebook contains the scripts used to analyze 4sU-seq differential expression from DESeq2 results. Specifically, it uses the results from comparing GR mutant data to wt GR (files `res_sofvswt.csv` and `res_ctrlvswt.csv`), as in Lammer et al., 2023.

# Table of contents
1. [Load packages and files](#initialize)
2. [Save gene lists](#savelists)

## Load packages and files <a name="initialize"></a>
Load the required packages and the DESeq2 results files, then combine the results into a single dataframe (`dfNoZero`) from which genes with a zero baseMean have been removed.
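To make the formatting step concrete, here is a minimal sketch on two tiny hypothetical results tables (the gene IDs, symbols, and values are invented for illustration and are not data from this study). It mirrors the logic of `formatResFiles`: per-contrast columns are prefixed with the cell name, genes without a symbol are dropped, and genes with a zero baseMean are removed while NA `padj` values are kept. Unlike the notebook, which reads `ens` from the first file only, this sketch assumes every table carries its `ens` column and checks that the gene order matches, because the columns are combined by row position rather than merged on gene ID.

```python
import numpy as np
import pandas as pd

# Hypothetical DESeq2-style results for two contrasts (same genes, same row order)
sof = pd.DataFrame({
    "ens": ["ENSG01.1", "ENSG02.3", "ENSG03.1", "ENSG04.2"],
    "name": ["GENEA", "GENEB", np.nan, "GENED"],
    "baseMean": [120.0, 0.0, 35.0, 8.0],  # GENEB has zero counts, so DESeq2 gives NA stats
    "log2FoldChange": [1.8, np.nan, -0.4, -1.2],
    "lfcSE": [0.3, np.nan, 0.5, 0.4],
    "padj": [0.001, np.nan, 0.6, np.nan],  # GENED: NA padj that should be kept
})
ctrl = sof.assign(
    log2FoldChange=[0.1, np.nan, 0.2, -0.1],
    lfcSE=[0.3, np.nan, 0.5, 0.4],
    padj=[0.9, np.nan, 0.8, 0.7],
)

def combine(tables, cellnames):
    # Combining by row position is only valid if every table has the same gene order
    for t in tables[1:]:
        assert (t["ens"].values == tables[0]["ens"].values).all(), "gene order differs"
    df = tables[0][["ens", "name"]].copy()
    for cell, t in zip(cellnames, tables):
        for col in ["baseMean", "log2FoldChange", "lfcSE", "padj"]:
            df[cell + "_" + col] = t[col]
    df = df[df["name"].notna()]  # drop genes without a symbol
    baseMeanCols = [cell + "_baseMean" for cell in cellnames]
    return df[df[baseMeanCols].min(axis=1) > 0]  # drop zero-baseMean genes

combined = combine([sof, ctrl], ["sof", "ctrl"])
print(combined[["ens", "sof_padj", "ctrl_padj"]])
```

Only GENEA and GENED survive: GENEB is removed for its zero baseMean and the third gene for its missing symbol, while GENED keeps its NA `sof_padj`.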

In [None]:
#analyze DESeq2 results comparing fold change of SoF or Ctrl over wt GR
import os
import pandas as pd
pd.options.mode.chained_assignment = None  #silence SettingWithCopyWarning
#input p-value cutoff and data paths here
pCutoff = 0.05
#os.path.join builds paths that work on Windows and Unix alike
resDir = os.path.join("..", "data")
outDir = os.path.join("..", "analyses")
cellnames = ["sof", "ctrl"]
#DESeq2 results files are named res_<cell>vswt.csv
samplefiles = ["res_" + cell + "vswt" for cell in cellnames]
resFiles = [os.path.join(resDir, file + ".csv") for file in samplefiles]

In [None]:
#format data into one dataframe, remove rows with zero baseMean
def formatResFiles(resFiles, cellnames):
    #assumes every results file lists the same genes in the same row order
    #(e.g., all contrasts extracted from one DESeq2 object), because columns
    #are combined by row index rather than merged on gene ID
    allCols = ["ens", "name", "baseMean", "log2FoldChange", "lfcSE", "padj"]
    lessCols = ["baseMean", "log2FoldChange", "lfcSE", "padj"]
    tables = []
    for i, file in enumerate(resFiles):
        #only the first file needs the gene ID and gene name columns
        tables.append(pd.read_csv(file, usecols=allCols if i == 0 else lessCols))
    df = tables[0][["ens", "name"]].copy()
    for cell, table in zip(cellnames, tables):
        for col in lessCols:
            df[cell + "_" + col] = table[col]
    #remove genes with NA gene names
    df = df[df["name"].notna()]
    #remove rows with zero baseMean in any comparison: DESeq2 reports NA log2FoldChange
    #for these genes, whereas genes with NA padj (e.g., from independent filtering) are kept
    baseMeanCols = [cell + "_baseMean" for cell in cellnames]
    dfNoZero = df[df[baseMeanCols].min(axis=1) > 0].copy()
    return dfNoZero

dfNoZero = formatResFiles(resFiles, cellnames)

## Save gene lists <a name="savelists"></a>
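Subset the results into genes constitutively upregulated or downregulated in SoF GR cells compared to wt GR cells. These are genes that are significant in the SoF vs wt comparison (padj < pCutoff) but not in the Ctrl vs wt comparison (padj >= pCutoff), split by the sign of the SoF log2 fold change. Each list is saved as gene symbols, Ensembl IDs, and Ensembl IDs without version suffixes, one ID per line.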
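The selection rule can be checked on a small hypothetical table (invented IDs and values, assuming the column names produced by `formatResFiles`). GENEC is significant in both contrasts and is excluded. GENEE is significant in SoF but has an NA `ctrl_padj`; because comparisons with NaN evaluate to False in pandas, it is excluded as well. If such genes should instead count as unchanged in Ctrl, the condition would need an explicit `| dfNoZero["ctrl_padj"].isna()` term.

```python
import numpy as np
import pandas as pd

pCutoff = 0.05
# Hypothetical combined table in the notebook's column layout
dfNoZero = pd.DataFrame({
    "ens": ["ENSG01.1", "ENSG02.4", "ENSG03.2", "ENSG04.1", "ENSG05.3"],
    "name": ["GENEA", "GENEB", "GENEC", "GENED", "GENEE"],
    "sof_log2FoldChange": [2.1, -1.5, 1.2, 0.9, -0.7],
    "sof_padj": [0.001, 0.01, 0.02, 0.3, 0.04],
    "ctrl_padj": [0.6, 0.8, 0.01, 0.9, np.nan],
})

# Significant in SoF vs wt, not significant in Ctrl vs wt (NaN ctrl_padj -> False)
sofOnly = (dfNoZero["sof_padj"] < pCutoff) & (dfNoZero["ctrl_padj"] >= pCutoff)
up = dfNoZero[sofOnly & (dfNoZero["sof_log2FoldChange"] > 0)].copy()
down = dfNoZero[sofOnly & (dfNoZero["sof_log2FoldChange"] < 0)].copy()
# Strip the Ensembl version suffix ("ENSG01.1" -> "ENSG01")
up["nover"] = up["ens"].str.split(".").str[0]

print(up["nover"].tolist())   # ['ENSG01']
print(down["name"].tolist())  # ['GENEB']
```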

In [None]:
#save gene lists: significant in SoF vs wt (padj < pCutoff) and not significant in Ctrl vs wt
#note: genes with NA ctrl_padj fail the >= comparison and are therefore excluded
import os
sofOnly = (dfNoZero["sof_padj"] < pCutoff) & (dfNoZero["ctrl_padj"] >= pCutoff)
dfSoFPos = dfNoZero[sofOnly & (dfNoZero["sof_log2FoldChange"] > 0)].copy()
dfSoFDown = dfNoZero[sofOnly & (dfNoZero["sof_log2FoldChange"] < 0)].copy()
for label, subset in [("SoFPos", dfSoFPos), ("SoFDown", dfSoFDown)]:
    #Ensembl IDs without the version suffix ("ENSG...ID.7" -> "ENSG...ID")
    subset["nover"] = subset["ens"].str.split(".").str[0]
    #one ID per line, no header: gene symbols, versioned and unversioned Ensembl IDs
    for suffix, col in [("sym", "name"), ("ens", "ens"), ("ens_nover", "nover")]:
        subset.to_csv(os.path.join(outDir, label + "_" + suffix + ".txt"),
                      columns=[col], header=False, index=False)