scMet

scMet is a package for deconvoluting bulk RNA-seq data. It trains a conditional variational autoencoder (CVAE) on scRNA-seq data, uses the trained model to generate new, plausible scRNA-seq data, and then fits the generated data to bulk RNA-seq data to obtain single-cell profiles that can represent each bulk sample.

Installation

Clone the repository.

```
git clone https://github.com/gwyang6/scMet.git
```

Navigate to the project directory.
```
cd scMet
```

Set up a virtual environment.

```
conda create -n scMET python=3.7
conda activate scMET
```

Install the dependencies.
```
pip install -r requirements.txt
```
Show the command-line help.
```
cd code
python Main.py -h
```
Run the example data.
```
python Main.py
```

Dependencies

scanpy
pandas
combat
numpy
warnings (Python standard library)
matplotlib
scikit-learn (imported as sklearn)
tqdm
scipy
tensorflow
random (Python standard library)
anndata

Steps

1. Reading and Merging Single-cell Gene Expression Data

Read multiple single-cell gene expression data files and their corresponding cell type annotation files. Merge them into one large dataset, then randomly sample cells from it to generate multiple simulated bulk RNA-seq samples.
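
The sampling idea can be sketched as follows. This is a minimal illustration, not the code in this repository: the function name `simulate_bulk`, its parameters, and the toy data are assumptions, and the sketch assumes a simulated bulk sample is the sum of the expression of randomly drawn cells.

```python
import numpy as np

def simulate_bulk(sc_expr, cell_types, n_samples=5, cells_per_sample=100, seed=0):
    """Build simulated bulk samples by summing randomly drawn cells.

    sc_expr: (n_cells, n_genes) expression matrix (hypothetical input layout).
    cell_types: (n_cells,) array of cell type labels.
    Returns the bulk matrix, the true cell type proportions, and the type names.
    """
    rng = np.random.default_rng(seed)
    type_names, type_idx = np.unique(cell_types, return_inverse=True)
    n_cells, n_genes = sc_expr.shape
    bulk = np.zeros((n_samples, n_genes))
    props = np.zeros((n_samples, len(type_names)))
    for i in range(n_samples):
        # Draw cells with replacement and sum their expression per gene.
        picked = rng.choice(n_cells, size=cells_per_sample, replace=True)
        bulk[i] = sc_expr[picked].sum(axis=0)
        # Record which fraction of the drawn cells belongs to each type.
        counts = np.bincount(type_idx[picked], minlength=len(type_names))
        props[i] = counts / cells_per_sample
    return bulk, props, type_names

# Toy data standing in for the merged single-cell dataset.
rng = np.random.default_rng(1)
sc_expr = rng.poisson(2.0, size=(300, 20)).astype(float)
cell_types = np.array(["T", "B", "Myeloid"])[rng.integers(0, 3, size=300)]
bulk, props, type_names = simulate_bulk(sc_expr, cell_types)
```

Keeping the true proportions next to each simulated sample makes it possible to check a deconvolution result against a known answer.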

2. Batch Correction of Simulated and Real Bulk RNA-seq Data

Apply the ComBat algorithm to batch-correct the simulated and real bulk RNA-seq data. This reduces the technical differences between the scRNA-seq-derived data and the bulk RNA-seq data and yields batch-corrected real bulk RNA-seq data for deconvolution.
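
To illustrate what batch correction does to the data, the sketch below applies a simplified per-gene location-scale adjustment. It is not ComBat: ComBat additionally shrinks the batch parameters with empirical Bayes, which is omitted here. The function name `location_scale_adjust` and the toy data are assumptions made for this example.

```python
import numpy as np

def location_scale_adjust(expr, batch):
    """Simplified stand-in for batch correction (NOT full ComBat).

    expr: (n_samples, n_genes) matrix, assumed to be on a log scale.
    batch: (n_samples,) batch labels, e.g. "sim" and "real".
    Each gene is standardized within each batch and then mapped back to
    the overall mean and the average within-batch standard deviation.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    grand_mean = expr.mean(axis=0)
    target_std = np.mean([expr[batch == b].std(axis=0) for b in labels], axis=0)
    corrected = np.empty_like(expr)
    for b in labels:
        rows = batch == b
        mu = expr[rows].mean(axis=0)
        sd = expr[rows].std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero for flat genes
        corrected[rows] = (expr[rows] - mu) / sd * target_std + grand_mean
    return corrected

# Toy data: simulated and real bulk samples with different offsets and spreads.
rng = np.random.default_rng(0)
sim = rng.normal(5.0, 1.0, size=(8, 10))
real = rng.normal(7.0, 2.0, size=(6, 10))
expr = np.vstack([sim, real])
batch = np.array(["sim"] * 8 + ["real"] * 6)
corrected = location_scale_adjust(expr, batch)
```

After the adjustment, each gene has the same mean in both batches, which is the effect the batch-correction step aims for.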

3. Preprocessing of Single-cell Gene Expression Data for Deconvolution

Preprocess the single-cell gene expression data. Use the cell type annotation files to identify cell type-specific genes and their expression levels. Then solve the deconvolution problem by applying NNLS (non-negative least squares) to the bulk RNA-seq data with these genes and expression levels, which yields the cell type proportions.
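
The NNLS step can be sketched with `scipy.optimize.nnls`. The signature matrix layout (genes by cell types), the function name `estimate_proportions`, and the normalization of the coefficients to sum to one are assumptions for this example, not necessarily what the package does.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(signature, bulk):
    """Estimate cell type proportions for one bulk sample.

    signature: (n_genes, n_types) mean expression of the cell type-specific
               genes in each cell type (hypothetical layout).
    bulk: (n_genes,) expression of the same genes in the bulk sample.
    """
    # Solve min ||signature @ coef - bulk|| subject to coef >= 0.
    coef, residual = nnls(signature, bulk)
    total = coef.sum()
    props = coef / total if total > 0 else coef
    return props, residual

# Toy example: a noise-free mixture of three cell types.
rng = np.random.default_rng(0)
signature = rng.uniform(0.0, 10.0, size=(50, 3))
true_props = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_props
props, residual = estimate_proportions(signature, bulk)
```

With real data the residual is not zero, and its size gives a rough indication of how well the signature genes explain the bulk sample.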

4. Training CVAE Model for Generating Single-cell Data

Train a CVAE (conditional variational autoencoder) on single-cell metabolic gene expression profiles and their corresponding cell annotations. The cell annotations serve as the conditional input, and the training batches are randomized. Record the training loss at each iteration and use the AdamW optimizer for backpropagation. Save a well-performing model for generating single-cell data.
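
The objective that such a model minimizes can be sketched in plain NumPy. This is an illustration of the general CVAE loss (reconstruction error plus KL divergence, with the cell type label concatenated to both encoder and decoder inputs), not the network defined in this repository. The untrained linear layers, the squared-error reconstruction term, and all names and dimensions are assumptions; the optimizer step is not shown.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer cell type labels as one-hot condition vectors."""
    return np.eye(n_classes)[labels]

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

def cvae_loss(x, x_recon, mu, log_var):
    """Mean over cells of squared reconstruction error plus KL term."""
    recon = np.sum((x - x_recon) ** 2, axis=1)
    return np.mean(recon + kl_to_standard_normal(mu, log_var))

rng = np.random.default_rng(0)
n_cells, n_genes, n_types, latent_dim = 16, 30, 3, 4
x = rng.random((n_cells, n_genes))
cond = one_hot(rng.integers(0, n_types, n_cells), n_types)

# Untrained linear layers stand in for the encoder and decoder networks.
W_mu = rng.normal(0, 0.1, (n_genes + n_types, latent_dim))
W_lv = rng.normal(0, 0.1, (n_genes + n_types, latent_dim))
W_dec = rng.normal(0, 0.1, (latent_dim + n_types, n_genes))

enc_in = np.concatenate([x, cond], axis=1)           # encoder sees x and the label
mu, log_var = enc_in @ W_mu, enc_in @ W_lv
z = reparameterize(mu, log_var, rng)
x_recon = np.concatenate([z, cond], axis=1) @ W_dec  # decoder sees z and the label
loss = cvae_loss(x, x_recon, mu, log_var)
```

Because the decoder receives the label, new cells of a chosen type can later be generated by sampling `z` from a standard normal distribution and concatenating the desired one-hot label.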

5. Generating and Filtering Simulated Single-cell Data

Use the trained CVAE model to generate a large number of new single-cell gene expression profiles with corresponding cell annotations. Filter the generated data by their correlation with the original single-cell data of each cell type. Keep the cells whose correlation exceeds a chosen threshold as the source for fitting bulk RNA-seq data in the next step.
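
The filtering step can be sketched as below. The choice of Pearson correlation against the mean profile of the matching cell type, the threshold value of 0.8, and the function name `filter_generated` are assumptions for this example; the package may define the correlation and the threshold differently.

```python
import numpy as np

def filter_generated(gen_expr, gen_labels, ref_expr, ref_labels, threshold=0.8):
    """Flag generated cells that resemble the original cells of their type.

    A generated cell is kept when the Pearson correlation between its
    profile and the mean original profile of its cell type is at least
    `threshold`. Returns a boolean mask over the generated cells.
    """
    keep = np.zeros(len(gen_expr), dtype=bool)
    for label in np.unique(gen_labels):
        ref_rows = ref_labels == label
        if not ref_rows.any():
            continue  # no original cells of this type to compare against
        centroid = ref_expr[ref_rows].mean(axis=0)
        for i in np.flatnonzero(gen_labels == label):
            r = np.corrcoef(gen_expr[i], centroid)[0, 1]
            keep[i] = r >= threshold
    return keep

# Toy data: type A genes increase along the gene axis, type B genes decrease.
profile_a = np.arange(10, dtype=float)
profile_b = profile_a[::-1].copy()
ref_expr = np.vstack([profile_a, profile_a + 1, profile_b, profile_b + 1])
ref_labels = np.array(["A", "A", "B", "B"])

# The second generated cell is labeled A but looks like type B.
gen_expr = np.vstack([profile_a * 2, profile_b, profile_b * 3])
gen_labels = np.array(["A", "A", "B"])
keep = filter_generated(gen_expr, gen_labels, ref_expr, ref_labels)
selected = gen_expr[keep]
```

In the toy data the mislabeled second cell is dropped, while the two cells that match their cell type are kept.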

6. Fitting Bulk RNA-seq Data using Selected Single-cell Data
