# Day 2

Today, we will start using nf-core pipelines to find differentially expressed genes in our dataset.
We are using data from the following paper: https://www.nature.com/articles/s41593-023-01350-3#Sec10

1. Please take some time to read through the paper and understand their approach, hypotheses and goals.

What was the objective of the study?

To better understand the molecular effects of chronic opioid use and physical dependence on the brain's reward circuitry, specifically in the context of chronic pain. The researchers developed a mouse model to examine how oxycodone withdrawal affects gene expression in key reward areas, both with and without pre-existing chronic neuropathic pain. The study aimed to identify the resulting transcriptional maladaptations and to use them to predict and validate potential drug targets, ultimately suggesting that HDAC1/HDAC2 inhibition could offer a new way to treat chronic pain in opioid-dependent individuals.

What do the conditions mean?

oxy: Oxycodone. This group received the opioid whose effects on physical dependence and withdrawal were being studied.

sal: Saline. This is the control (placebo) group. Saline injections mimic the injection procedure without delivering the active drug, which lets the researchers isolate the effects of oxycodone.

What do the genotypes mean?

SNI: Spared nerve injury. In this model, one of the three main nerves in the leg is partially injured, which reliably induces long-term sensory hypersensitivity, specifically mechanical allodynia and thermal hyperalgesia (increased sensitivity to heat).


Sham: Sham surgery, the control surgery. This group underwent the same surgical procedure as the SNI group (incision, exposure of the nerve), but the nerve itself was not injured or manipulated. Mice in the Sham groups (Sham-Sal and Sham-Oxy) are pain-free and serve as a baseline for separating the effects of oxycodone, withdrawal, and SNI-induced pain.

Imagine you are the bioinformatician in the group who conducted this study. They hand you the raw files and ask you to analyze them.

What would you do?

Which groups would you compare to each other?

Please also mention which outcome you would expect to see from each comparison.

## Analysis Approach:
1. **QC & Processing**: FastQC → trim → align to mouse genome → count reads
2. **Differential Expression**: Compare conditions using DESeq2/edgeR
3. **Functional Analysis**: Pathway enrichment and GO analysis

## Key Comparisons:
1. **SNI-Oxy vs SNI-Sal**: Oxycodone effects in chronic pain
2. **Sham-Oxy vs Sham-Sal**: Oxycodone effects without pain
3. **SNI-Sal vs Sham-Sal**: Chronic pain effects alone
4. **SNI-Oxy vs Sham-Oxy**: Pain effects in oxycodone-treated animals
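
The four comparisons above follow from the 2 x 2 design (surgery x treatment). As a minimal sketch, assuming group labels of the form `SNI_Oxy` (a naming choice made here for illustration, not taken from the paper or the sheet), they can be written down as (test, reference) pairs that a differential expression tool could consume later:

```python
from itertools import product

surgeries = ["SNI", "Sham"]   # called "Genotype" in the sheet
treatments = ["Oxy", "Sal"]   # called "Condition" in the sheet

# All four groups of the 2 x 2 design, e.g. "SNI_Oxy" (hypothetical label format).
groups = [f"{s}_{t}" for s, t in product(surgeries, treatments)]

# (test group, reference group) pairs, in the order of the list above.
contrasts = [
    ("SNI_Oxy", "SNI_Sal"),    # oxycodone effect under chronic pain
    ("Sham_Oxy", "Sham_Sal"),  # oxycodone effect without pain
    ("SNI_Sal", "Sham_Sal"),   # chronic pain effect alone
    ("SNI_Oxy", "Sham_Oxy"),   # pain effect in oxycodone-treated animals
]

for test, reference in contrasts:
    # Guard against typos: every contrast must use existing group labels.
    assert test in groups and reference in groups
    print(f"{test} vs {reference}")
```

Each pair differs in exactly one factor, so any change seen in a comparison can be attributed to that factor alone.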

## Expected Results:
- **SNI-Oxy vs SNI-Sal**: Strong changes in HDAC genes, opioid pathways, withdrawal genes
- **Sham-Oxy vs Sham-Sal**: Moderate reward pathway changes
- **SNI-Sal vs Sham-Sal**: Pain/inflammatory gene changes
- **SNI-Oxy vs Sham-Oxy**: Enhanced drug tolerance/dependence genes

Your group gave you a very suboptimal Excel sheet (conditions_runs_oxy_project.xlsx) with the information you need for each run they uploaded to the SRA.<br>
So, instead of diving straight into downloading the data and starting the analysis, you first need to tidy up this lazy table.<br>
Use Python and Pandas to get the table into a more sensible shape.<br>
Then, perform some overview analysis and plot the results:
1. How many samples do you have per condition?
2. How many samples do you have per genotype?
3. How often do you have each condition per genotype?

1) 8
2) 8
3) 4 


In [None]:
import pandas as pd

conditions_table_xlsx = "conditions_runs_oxy_project.xlsx"
# Use the SRA run accession as the row index.
df = pd.read_excel(conditions_table_xlsx, index_col="Run")


Run,Patient,RNA-seq,DNA-seq,condition: Sal,Condition: Oxy,Genotype: SNI,Genotype: Sham
SRR23195505,?,x,,x,,x,
SRR23195506,?,x,,,x,,x
SRR23195507,?,x,,x,,,x
SRR23195508,?,x,,,x,x,
SRR23195509,?,x,,,x,x,
SRR23195510,?,x,,x,,x,
SRR23195511,?,x,,,x,,x
SRR23195512,?,x,,x,,,x
SRR23195513,?,x,,x,,x,
SRR23195514,?,x,,,x,,x


In [None]:
df = df.fillna(False)
df = df.replace("x", True)


Run,Patient,RNA-seq,DNA-seq,condition: Sal,Condition: Oxy,Genotype: SNI,Genotype: Sham
SRR23195505,?,True,False,True,False,True,False
SRR23195506,?,True,False,False,True,False,True
SRR23195507,?,True,False,True,False,False,True
SRR23195508,?,True,False,False,True,True,False
SRR23195509,?,True,False,False,True,True,False
SRR23195510,?,True,False,True,False,True,False
SRR23195511,?,True,False,False,True,False,True
SRR23195512,?,True,False,True,False,False,True
SRR23195513,?,True,False,True,False,True,False
SRR23195514,?,True,False,False,True,False,True
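
Chaining `fillna(False)` and `replace("x", True)` gives the right result, but recent pandas versions may emit a downcasting `FutureWarning` for it. A possible alternative is to compare each cell against the marker directly. The sketch below uses a small hand-built stand-in frame (the first three runs shown above) instead of the Excel file, so the name `raw` is purely illustrative:

```python
import numpy as np
import pandas as pd

# Stand-in for the raw sheet: "x" marks membership, empty cells are NaN.
raw = pd.DataFrame(
    {
        "condition: Sal": ["x", np.nan, "x"],
        "Condition: Oxy": [np.nan, "x", np.nan],
        "Genotype: SNI": ["x", np.nan, np.nan],
        "Genotype: Sham": [np.nan, "x", "x"],
    },
    index=pd.Index(["SRR23195505", "SRR23195506", "SRR23195507"], name="Run"),
)

# eq("x") is True where the cell holds "x" and False everywhere else,
# including NaN, so no separate fillna step is needed.
flags = raw.eq("x")
print(flags)
```

The result has a proper boolean dtype in every column, which makes later masking and summing straightforward.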


In [None]:
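import numpy as np

conditions = ["Sal", "Oxy"]

# One boolean mask per label; note the inconsistent capitalisation of the sheet's headers.
masks = [df["condition: Sal"].astype(bool), df["Condition: Oxy"].astype(bool)]
df["Condition"] = np.select(masks, conditions, default="Unknown")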
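
Once the one-hot columns are collapsed into single `Condition` and `Genotype` columns, the three overview questions reduce to `value_counts` and `crosstab`. The sketch below rebuilds only the ten runs displayed above by hand (the names `flags` and `tidy` are illustrative), so its counts differ from those of the full table:

```python
import numpy as np
import pandas as pd

# The ten runs shown above: SRR23195505 .. SRR23195514.
runs = [f"SRR231955{n:02d}" for n in range(5, 15)]
sal = [True, False, True, False, False, True, False, True, True, False]
sni = [True, False, False, True, True, True, False, False, True, False]

flags = pd.DataFrame(
    {
        "condition: Sal": sal,
        "Condition: Oxy": [not v for v in sal],
        "Genotype: SNI": sni,
        "Genotype: Sham": [not v for v in sni],
    },
    index=pd.Index(runs, name="Run"),
)

# Collapse each pair of boolean columns into one label column.
tidy = pd.DataFrame(index=flags.index)
tidy["Condition"] = np.select(
    [flags["condition: Sal"], flags["Condition: Oxy"]], ["Sal", "Oxy"], default="Unknown"
)
tidy["Genotype"] = np.select(
    [flags["Genotype: SNI"], flags["Genotype: Sham"]], ["SNI", "Sham"], default="Unknown"
)

per_condition = tidy["Condition"].value_counts()             # question 1
per_genotype = tidy["Genotype"].value_counts()               # question 2
per_group = pd.crosstab(tidy["Condition"], tidy["Genotype"])  # question 3
print(per_group)
```

For the plots, each of these objects can be passed to pandas' built-in plotting, for example `per_group.plot(kind="bar")`, provided matplotlib is installed.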

They were kind enough to also provide the number of bases per run, so you can estimate how much space the data will take up on your cluster.<br>
Add a new column with this information (base_counts.csv) to your fancy table and sort your dataframe by base count and condition.

Then select the 2 smallest runs from your dataset and download them from SRA (maybe an nf-core pipeline can help here?...)

In [None]:
bases_per_run_csv = "base_counts.csv"
bases = pd.read_csv(bases_per_run_csv, index_col="Run")

In [None]:
# Both tables are indexed by "Run", so join them on their indexes.
df = df.merge(bases, left_index=True, right_index=True)

In [None]:
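# Column name assumed to be "Bases" as in base_counts.csv; a "_x" suffix only
# appears if the merge cell above was executed more than once.
df = df.sort_values(by=["Bases", "Condition"], ascending=True)


#df["condition: Sal"].sum()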

In [None]:
df[["condition: Sal", "Condition: Oxy"]]
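
One way to get from the sorted table to the `ids.csv` file passed to nf-core/fetchngs is sketched below. fetchngs expects one accession per line. The base counts here are made-up placeholders (the real values come from base_counts.csv), and the column name `Bases` is an assumption about that file:

```python
from pathlib import Path

import pandas as pd

# Placeholder base counts for illustration only; not the real run sizes.
table = pd.DataFrame(
    {
        "Condition": ["Sal", "Oxy", "Sal", "Oxy"],
        "Bases": [6_000_000_000, 5_500_000_000, 7_200_000_000, 5_900_000_000],
    },
    index=pd.Index(
        ["SRR23195505", "SRR23195506", "SRR23195507", "SRR23195508"], name="Run"
    ),
)

# Sort by condition first, then by size within each condition.
table = table.sort_values(by=["Condition", "Bases"])

# Pick the two runs with the fewest bases, regardless of condition.
smallest = table.nsmallest(2, "Bases")

# Write one run accession per line, without a header.
Path("ids.csv").write_text("\n".join(smallest.index) + "\n")
print(smallest)
```

Picking the smallest runs keeps the test download short; for the real analysis, all runs would go into the ID file.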

In [23]:
!nextflow run nf-core/fetchngs -profile conda --input ids.csv --outdir results


[1m[38;5;232m[48;5;43m N E X T F L O W [0;2m  ~  [mversion 25.04.7[m
[K
Launching[35m `https://github.com/nf-core/fetchngs` [0;2m[[0;1;36mdistraught_lalande[0;2m] DSL2 - [36mrevision: [0;36m8ec2d934f9 [master][m
[K
[33mWARN: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  There is a problem with your Conda configuration!

  You will need to set-up the conda-forge and bioconda channels correctly.
  Please refer to https://bioconda.github.io/
  The observed channel order is 
  [defaults]
  but the following channel order is required:
  [conda-forge, bioconda, defaults]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[39m[K
[33mWARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`[39m[K


-[2m----------------------------------------------------[0m-
                                        [0;32m,--.[0;30m/[0;32m,-.[0m
[

While your files are downloading, go back to the paper and explain how you would try to reproduce the analysis.<br>
When you are done, shout so we can discuss the different ideas.

In [26]:
!nextflow run nf-core/fetchngs --input ids.csv -profile docker --outdir results -resume


[1m[38;5;232m[48;5;43m N E X T F L O W [0;2m  ~  [mversion 25.04.7[m
[K
Launching[35m `https://github.com/nf-core/fetchngs` [0;2m[[0;1;36melegant_wright[0;2m] DSL2 - [36mrevision: [0;36m8ec2d934f9 [master][m
[K
[33mWARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`[39m[K


-[2m----------------------------------------------------[0m-
                                        [0;32m,--.[0;30m/[0;32m,-.[0m
[0;34m        ___     __   __   __   ___     [0;32m/,-._.--~'[0m
[0;34m  |\ | |__  __ /  ` /  \ |__) |__         [0;33m}  {[0m
[0;34m  | \| |       \__, \__/ |  \ |___     [0;32m\`-._,-`-,[0m
                                        [0;32m`._,._,'[0m
[0;35m  nf-core/fetchngs v1.12.0-g8ec2d93[0m
-[2m----------------------------------------------------[0m-
[1mCore Nextflow options[0m
  [0;34mrevision       : [0;32mmaster[0m
  [0;34mrunName        : [0;32melegant_wrigh