A package for deconvoluting bulk RNA-seq data. It uses scRNA-seq data to train a CVAE model that generates new scRNA-seq data, then fits the generated data to bulk RNA-seq data to obtain representative scRNA-seq data.
scMet is a CVAE model trained on scRNA-seq data to generate new, plausible scRNA-seq profiles. The generated profiles are then fitted to bulk RNA-seq data to obtain single-cell data that represent the bulk sample.
Clone the repository.
git clone https://github.com/gwyang6/scMet.git
Navigate to the project directory.
cd scMet
Set up a virtual environment.
conda create -n scMET python=3.7
conda activate scMET
Install the dependencies.
pip install -r requirements.txt
Show the help message.
cd code
python Main.py -h
Run the example data.
python Main.py
- scanpy
- pandas
- combat
- numpy
- warnings (Python standard library, no installation needed)
- matplotlib
- sklearn (provided by the scikit-learn package)
- tqdm
- scipy
- tensorflow
- random (Python standard library, no installation needed)
- anndata
Read multiple single-cell gene expression data files and their corresponding cell type annotation files. Merge them into one large pool of cells, then randomly sample cells from that pool to generate multiple simulated bulk RNA-seq samples.
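The sampling step can be sketched as follows. This is a minimal illustration, not the code in Main.py: the function name `simulate_pseudobulk`, its parameters, and the choice to sample with replacement and sum the counts are all assumptions.

```python
import numpy as np

def simulate_pseudobulk(expr, cell_types, n_samples=5, cells_per_sample=100, seed=0):
    """Build simulated bulk samples from a merged single-cell matrix.

    expr:       (n_cells, n_genes) expression matrix (hypothetical input layout)
    cell_types: length n_cells array of cell type labels
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    type_names = np.unique(cell_types)

    bulk = np.zeros((n_samples, expr.shape[1]))
    fractions = np.zeros((n_samples, type_names.size))
    for i in range(n_samples):
        # Draw cells at random (with replacement) from the merged pool.
        idx = rng.integers(0, expr.shape[0], size=cells_per_sample)
        # A simulated bulk profile is the sum of the sampled cells.
        bulk[i] = expr[idx].sum(axis=0)
        # Keep the true composition so it can be compared with estimates later.
        for j, name in enumerate(type_names):
            fractions[i, j] = np.mean(cell_types[idx] == name)
    return bulk, fractions, type_names
```

Returning the true fractions alongside each simulated sample makes it possible to check a deconvolution result against a known composition.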
Apply the ComBat algorithm to batch-correct the simulated bulk RNA-seq data together with the real bulk RNA-seq data. This reduces the technical differences between scRNA-seq data and bulk RNA-seq data and yields batch-corrected real bulk RNA-seq data for deconvolution.
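To illustrate the idea behind this step, the sketch below aligns the per-gene mean and spread of each batch. It is deliberately not ComBat: it omits the empirical Bayes shrinkage of batch parameters and any covariates, and the function name `align_location_scale` is hypothetical. The package itself relies on the combat dependency listed above.

```python
import numpy as np

def align_location_scale(data, batch):
    """Simplified location/scale batch adjustment (not full ComBat).

    data:  (n_genes, n_samples) expression matrix
    batch: length n_samples array of batch labels, e.g. "sim" and "real"
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    # Per-gene statistics across all samples, used as the common target.
    grand_mean = data.mean(axis=1, keepdims=True)
    pooled_std = data.std(axis=1, keepdims=True)

    corrected = np.empty_like(data)
    for b in np.unique(batch):
        cols = batch == b
        mu = data[:, cols].mean(axis=1, keepdims=True)
        sd = data[:, cols].std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0  # avoid division by zero for constant genes
        # Standardize within the batch, then map onto the common scale.
        corrected[:, cols] = (data[:, cols] - mu) / sd * pooled_std + grand_mean
    return corrected
```

After this adjustment every batch has the same per-gene mean, which is the location part of what ComBat estimates more robustly.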
Preprocess the single-cell gene expression data. Use the cell type annotation files to identify cell type-specific genes and their expression levels. Then solve the deconvolution problem by applying NNLS (Non-Negative Least Squares) to the bulk RNA-seq data with these genes and expression levels, which gives the cell type proportions.
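A minimal sketch of the NNLS step, using `scipy.optimize.nnls`. The helper names `signature_matrix` and `deconvolve` are hypothetical, and the signature here is simply the mean profile of each cell type over all genes; selecting cell type-specific genes first, as described above, is left out for brevity.

```python
import numpy as np
from scipy.optimize import nnls

def signature_matrix(expr, cell_types):
    """Mean expression per cell type, returned as (n_genes, n_types)."""
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    names = np.unique(cell_types)
    sig = np.column_stack([expr[cell_types == n].mean(axis=0) for n in names])
    return sig, names

def deconvolve(bulk, sig):
    """Estimate cell type proportions for one bulk sample."""
    # Solve min ||sig @ x - bulk|| subject to x >= 0.
    coef, _residual = nnls(sig, np.asarray(bulk, dtype=float))
    total = coef.sum()
    # Normalize the non-negative coefficients so they sum to 1.
    return coef / total if total > 0 else coef
```

For a bulk vector built as a known mixture of the signature columns, `deconvolve` should recover the mixing weights up to numerical precision.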
Train a CVAE (Conditional Variational Autoencoder) model on single-cell metabolic gene expression profiles and their corresponding cell annotations. The cell annotations serve as the conditional input, and the training batches are drawn in random order. Record the training loss at each iteration and use the AdamW optimizer for backpropagation. Save a well-performing model for generating single-cell data.
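The objective that such a model minimizes can be sketched in plain NumPy. This is a forward pass only, with small random weights and a single hidden layer chosen for illustration; the real model is built and trained with TensorFlow and AdamW, and its architecture may differ. All names below (`one_hot`, `init_params`, `cvae_loss`) and the layer sizes are hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer cell type labels as the conditional input."""
    labels = np.asarray(labels, dtype=int)
    out = np.zeros((labels.size, n_classes))
    out[np.arange(labels.size), labels] = 1.0
    return out

def init_params(n_genes, n_classes, hidden=16, latent=4, seed=0):
    """Small random weights for a one-hidden-layer encoder and linear decoder."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return 0.1 * rng.standard_normal((rows, cols))

    return {
        "W_enc": w(n_genes + n_classes, hidden),
        "W_mu": w(hidden, latent),
        "W_logvar": w(hidden, latent),
        "W_dec": w(latent + n_classes, n_genes),
    }

def cvae_loss(x, cond, params, rng):
    """CVAE loss for one mini-batch: reconstruction error plus KL divergence."""
    # The encoder sees the expression profile and the condition together.
    h = np.tanh(np.hstack([x, cond]) @ params["W_enc"])
    mu = h @ params["W_mu"]
    logvar = h @ params["W_logvar"]
    # Reparameterization trick: z = mu + sigma * eps.
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    # The decoder sees the latent code and the same condition.
    x_hat = np.hstack([z, cond]) @ params["W_dec"]
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return recon + kl, recon, kl
```

Because the decoder takes the condition as input, new cells of a chosen type can be generated by sampling `z` from a standard normal and decoding it together with that type's one-hot vector.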
Use the trained CVAE model to generate a large number of new single-cell gene expression profiles with their corresponding cell annotations. Filter the generated cells by their correlation with the original single-cell data of each cell type. Keep the cells whose correlation exceeds a chosen threshold as the source for fitting the bulk RNA-seq data in the next step.
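The filtering step might look like the sketch below. It assumes that each generated cell is compared with the mean profile of the original cells of the same type using Pearson correlation; the function name `filter_by_correlation` and the default threshold of 0.8 are hypothetical, not values taken from the package.

```python
import numpy as np

def filter_by_correlation(generated, gen_labels, reference, ref_labels, threshold=0.8):
    """Return a boolean mask of generated cells that pass the correlation filter.

    generated: (n_generated, n_genes) generated expression profiles
    reference: (n_cells, n_genes) original single-cell profiles
    """
    generated = np.asarray(generated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    gen_labels = np.asarray(gen_labels)
    ref_labels = np.asarray(ref_labels)

    keep = np.zeros(generated.shape[0], dtype=bool)
    for name in np.unique(gen_labels):
        ref_mask = ref_labels == name
        if not ref_mask.any():
            continue  # no original cells of this type to compare against
        # Mean profile of the original cells of this type.
        centroid = reference[ref_mask].mean(axis=0)
        for i in np.where(gen_labels == name)[0]:
            r = np.corrcoef(generated[i], centroid)[0, 1]
            # A NaN correlation (constant profile) compares False and is dropped.
            keep[i] = r >= threshold
    return keep
```

The mask can be applied as `generated[keep]` to obtain the cells passed on to the bulk fitting step.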