# Modeling the dynamics of EMT reveals genes associated with pan-cancer intermediate states and plasticity

The bioRxiv preprint is available [here](https://www.biorxiv.org/content/10.1101/2024.10.03.616309v1). 


## Overview

This repository contains the code used to analyse the epithelial-to-mesenchymal transition (EMT) from single-cell RNA sequencing (scRNA-seq) data. The code is written in Python (single-cell processing pipeline), Julia (modeling and Bayesian parameter inference), and R (cell scoring), and is provided as Jupyter notebooks (.ipynb) and R Markdown (.Rmd) files.


## Data

The raw data used in the analysis come from public GEO series:
* Cook and Vanderhyden, 2020 (GSE147405)
* Pastushenko et al., 2018 (GSE110357)
* van Dijk et al., 2018 (GSE114397)
* Panchy et al., 2022 (GSE213753)

(Full citations to be added.)
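
As a convenience, the accessions listed above can be turned into GEO landing-page links. This is a minimal sketch and not part of the repository's code: the dictionary and the `geo_url` helper are illustrative names, and the URL pattern is the standard GEO accession query page.

```python
# Accessions copied from the list above; the helper name is hypothetical.
GEO_ACCESSIONS = {
    "Cook and Vanderhyden, 2020": "GSE147405",
    "Pastushenko et al., 2018": "GSE110357",
    "van Dijk et al., 2018": "GSE114397",
    "Panchy et al., 2022": "GSE213753",
}

GEO_BASE_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="


def geo_url(accession: str) -> str:
    """Return the GEO landing-page URL for a series accession such as GSE147405."""
    if not accession.startswith("GSE"):
        raise ValueError(f"expected a GEO series accession, got {accession!r}")
    return GEO_BASE_URL + accession


if __name__ == "__main__":
    for study, accession in GEO_ACCESSIONS.items():
        print(f"{study}: {geo_url(accession)}")
```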


## Project contents

### 1 - scanpy

scRNA-seq processing pipeline, based primarily on Scanpy.

Each notebook runs a single sample from the dataset it is named after. The sample is selected at the top of the notebook, which then imports sample-specific parameters from all_run_settings.xlsx and functions from universal_dataset_functions.ipynb and the matching \*\_functions.ipynb.

universal_dataset_functions.ipynb contains functions shared by all samples. Each \*\_functions.ipynb file contains dataset-specific functions that override them.

The code for each sample can be run from one of the following notebooks:
* 3 - Pastushenko_singleRun.ipynb
* 5 - vanDijk_singleRun.ipynb
* 7 - Cook_singleRun.ipynb
* 9 - Panchy.ipynb

No other files need to be adjusted, since they are all imported by these notebooks; only the directory references need updating.
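
The relationship between universal and dataset-specific functions can be pictured with a small sketch. This is an illustration of the override idea only, not the mechanism the notebooks use: the function names, the `n_genes` field, and the thresholds below are all hypothetical.

```python
def universal_filter(cells, min_genes=200):
    """Hypothetical shared default: keep cells with at least `min_genes` detected genes."""
    return [c for c in cells if c["n_genes"] >= min_genes]


def cook_filter(cells, min_genes=500):
    """Hypothetical dataset-specific override with a stricter threshold."""
    return [c for c in cells if c["n_genes"] >= min_genes]


# Functions available to every dataset (stand-in for universal_dataset_functions.ipynb).
UNIVERSAL_FUNCTIONS = {"filter_cells": universal_filter}

# Per-dataset replacements (stand-in for the *_functions.ipynb files).
DATASET_OVERRIDES = {"Cook": {"filter_cells": cook_filter}}


def functions_for(dataset):
    """Start from the universal functions, then apply any dataset-specific overrides."""
    resolved = dict(UNIVERSAL_FUNCTIONS)
    resolved.update(DATASET_OVERRIDES.get(dataset, {}))
    return resolved


if __name__ == "__main__":
    cells = [{"n_genes": 300}, {"n_genes": 600}]
    print(functions_for("Cook")["filter_cells"](cells))
    print(functions_for("vanDijk")["filter_cells"](cells))
```

A dataset without an entry in `DATASET_OVERRIDES` simply falls back to the universal version.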


The directory contains:

* 1 - universal_dataset_functions.ipynb
* 2 - Pastushenko_functions.ipynb
* 3 - Pastushenko_singleRun.ipynb
* 4 - vanDijk_functions.ipynb
* 5 - vanDijk_singleRun.ipynb
* 6 - Cook_functions.ipynb
* 7 - Cook_singleRun.ipynb
* 8 - Cook_singleRun_removedRuns.ipynb
* 9 - Panchy.ipynb
* 9.5 - Panchy_getKRT15highcells.ipynb
* 9.5 - Panchy_KRT15highcells.csv
* all_run_settings.xlsx


### 2 - ucell code

* EMTscore - removed samples.Rmd
* EMTscore - samples.Rmd

### 3 - DE genes

* Epithelial - log2FC analysis.ipynb
* Intermediate - log2FC analysis.ipynb
* Mesenchymal - log2FC analysis.ipynb
* nCells directory
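
For readers unfamiliar with the log2FC quantity named in these notebooks, a log2 fold change compares a gene's mean expression in one group of cells against another. The sketch below is a generic definition, not the notebooks' exact computation: the function name and the pseudocount of 1 are assumptions.

```python
import math


def log2_fold_change(mean_in_state, mean_in_rest, pseudocount=1.0):
    """Generic log2 fold change of mean expression between two groups of cells.

    The pseudocount (an assumption here) avoids division by zero and
    damps ratios for lowly expressed genes.
    """
    if mean_in_state < 0 or mean_in_rest < 0:
        raise ValueError("mean expression values must be non-negative")
    return math.log2((mean_in_state + pseudocount) / (mean_in_rest + pseudocount))


if __name__ == "__main__":
    # A gene with mean expression 3.0 in one state and 1.0 elsewhere.
    print(log2_fold_change(3.0, 1.0))
```

Positive values indicate higher expression in the state of interest, negative values lower expression.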

### 4 - rnavelocity code

* \_withoutContaminantCellLine_OVCA420-EGF.csv
* \_withoutContaminantCellLine_OVCA420-TGFB1.csv
* \_withoutContaminantCellLine_OVCA420-TNF.csv
* 1 - Cook_singleRun_filterContaminantCells_OVCA420.ipynb
* 2 - Cook_singleRun_realigned.ipynb
* Cook-realigned_run_settings.xlsx

### 5 - ode model code

* Bayesian parameter fitting, one I state.ipynb
* Bayesian parameter fitting, two I state.ipynb
* graphing fitted parameters, one I state.ipynb
* graphing fitted parameters, two I state.ipynb
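
To give a rough idea of what a model with one intermediate (I) state might look like, here is a minimal Python sketch of a linear epithelial (E), intermediate (I), mesenchymal (M) system integrated with forward Euler. It is an illustration only: the repository's models are written in Julia, and the reversible linear form, the rate names `k1` to `k4`, and the example values are assumptions rather than the fitted model.

```python
def simulate_emt(k1, k2, k3, k4, e0=1.0, i0=0.0, m0=0.0, dt=0.01, steps=1000):
    """Forward-Euler integration of an assumed linear E <-> I <-> M model.

    dE/dt = -k1*E + k2*I
    dI/dt =  k1*E - (k2 + k3)*I + k4*M
    dM/dt =  k3*I - k4*M

    Returns a list of (time, E, I, M) tuples; E + I + M is conserved.
    """
    e, i, m = e0, i0, m0
    trajectory = [(0.0, e, i, m)]
    for n in range(1, steps + 1):
        de = -k1 * e + k2 * i
        dm = k3 * i - k4 * m
        di = -(de + dm)  # the three rates sum to zero, so the total is conserved
        e, i, m = e + dt * de, i + dt * di, m + dt * dm
        trajectory.append((n * dt, e, i, m))
    return trajectory


if __name__ == "__main__":
    # Hypothetical rate constants, chosen only to produce a visible transition.
    t, e, i, m = simulate_emt(k1=1.0, k2=0.1, k3=0.5, k4=0.05)[-1]
    print(f"t={t:.1f}  E={e:.3f}  I={i:.3f}  M={m:.3f}")
```

A two-I-state variant would add a second intermediate compartment and two more pairs of rates in the same pattern.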

### 6 - ODE genes

* Epithelial - kparam correlations.ipynb
* Intermediate - kparam correlations.ipynb
* Mesenchymal - kparam correlations.ipynb
* nCells directory
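
The notebook names suggest correlating gene-level quantities with fitted rate parameters. As a hedged illustration of that kind of analysis, the sketch below computes a plain Pearson correlation between two equal-length lists, for example a gene's mean expression per sample and a fitted parameter per sample. The pairing of inputs and the function name are assumptions; the notebooks may use a different statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-sample values: gene expression vs. a fitted rate parameter.
    expression = [0.2, 0.5, 0.9, 1.4]
    k_param = [0.05, 0.11, 0.20, 0.26]
    print(round(pearson(expression, k_param), 3))
```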

### marker gene lists

* Cell Cycle Markers - G1,S Genes.csv
* Cell Cycle Markers - G2,M Genes.csv
* Markers - EMP Cook 2022.csv
* Markers - MSigDB.csv
* Markers - PanglaoDB, Epithelial.csv


## Package requirements

Conda environment files are provided in the `environment files` directory.



### Package requirements 
 - [DifferentialEquations.jl](https://diffeq.sciml.ai/stable/)
 - [Turing.jl](https://turing.ml/stable/)
 - MCMCChains.jl
 - StochasticDelayDiffEq.jl
 - DiffEqSensitivity.jl
 - DiffEqCallbacks.jl
 - DecisionTree.jl
 - ScikitLearn.CrossValidation
 - ModelingToolkit.jl
 - StatsPlots.jl
 - Distributions.jl
 - Statistics.jl
 - Catalyst.jl
 - ParameterizedFunctions.jl
 - DiffEqJump.jl
 - Plots.jl (with the `pyplot()` backend)
 - DataFrames.jl
 - DelimitedFiles.jl
 - CSV.jl
 - JLD.jl

### Project contents
 - `README.md` : this file with information about the repository and [paper](https://doi.org/10.1101/2022.06.15.496246)
 - `Modeling_MDSCs.ipynb` :  Jupyter notebook containing code blocks for all simulations and figures in the paper. Code blocks within the notebook are intended to be run independently. 
 - `Modeling_MDSCs_julia_v1.8.ipynb` :  Jupyter notebook (updated for Julia 1.8) containing code blocks for all simulations and figures in the paper. Code blocks within the notebook are intended to be run independently. 
 - `tumor_data.xlsx` : data used in the analysis, see Spigel, D. R. et al. FIR: Efficacy, Safety, and Biomarker Analysis of a Phase II Open-Label Study of Atezolizumab in PD-L1–Selected Patients With NSCLC. Journal of Thoracic Oncology 13, 1733–1742 (2018). URL [https://www.jto.org/article/S1556-0864(18)30603-8/fulltext](https://www.jto.org/article/S1556-0864(18)30603-8/fulltext) and Laleh, N. G. et al. Classical mathematical models for prediction of response to chemotherapy and immunotherapy. PLOS Computational Biology 18, e1009822 (2022). URL [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009822](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009822).
 - `gillespie.csv` : Gillespie simulation results (see Figure S5)

### Acknowledgments
We would like to thank E.J. Fertig for valuable discussions and guidance, and the Tumor Microenvironment Program at the USC-Norris Comprehensive Cancer Center for their support. We would like to thank all members of the Roussos Torres and MacLean labs for valuable input and discussions. Figures 1A, 5A, & 6A were created with [BioRender](https://biorender.com/).

### Contributors
<a href="https://github.com/maclean-lab/ModelingMDSCs/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=maclean-lab/ModelingMDSCs" />
</a>

