Seurat.IntegrateData (v3)
Description: GenePattern module which implements the batch correction algorithm derived from the Seurat software package (Seurat version 3.2.0).
Authors: Jonathan Zamora and Edwin F. Juárez, UCSD
Contact: Forum Link
Algorithm Version: Seurat 3.2.0
Summary
The Seurat.IntegrateData module integrates multiple single-cell datasets (correcting for batch effects) and identifies shared cell states that are present across datasets, regardless of their origin. Once the module has integrated the datasets, the returned object contains a new Assay that holds an integrated, batch-corrected expression matrix for all cells. This batch-corrected expression matrix can then be used for downstream analyses and visualizations.
References
"Seurat V3 Paper:" Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell. 2019 Jun 13;177(7):1888-1902.e21. doi: 10.1016/j.cell.2019.05.031. Epub 2019 Jun 6. PMID: 31178118; PMCID: PMC6687398.
Technical Notes
This module runs in the Docker container tagged genepattern/seurat-suite:4.0.3.
GitHub repository for this module's code & version: https://github.com/genepattern/Seurat.IntegrateData/tree/v3
Parameters
Name | Description |
---|---|
input_files | Gene expression matrices (columns are cell IDs and rows are genes), one file per batch, each stored as a .txt file or as a Seurat Object (.rds) file. If your files are in 10x Genomics (Cell Ranger) format or stored in a .tar or .gz file, please use the Seurat.QC module to pre-process your data and then use the .txt or .rds output from Seurat.QC in this module. For sample data see test data file 1 and test data file 2. |
use_batch_names | (default = TRUE) Map each input file to a batch number, beginning with Batch 1 and going up to Batch n, where n is the number of input files. When set to FALSE, the file names are used as the batch names. |
ncomps | (default = 50) The number of principal components to be used in the Principal Component Analysis (PCA) for batch correction. |
nCount_RNA | (default = TRUE) Whether the batch correction script produces a violin plot of the number of molecules detected within each cell of the single-cell datasets. |
nFeature_RNA | (default = TRUE) Whether the batch correction script produces a violin plot of the number of genes detected within each cell of the single-cell datasets. |
output_file_name | (default = 'batch_correction_results') Base name for the .pdf and .rds output files. |
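The batch naming rule described for use_batch_names can be illustrated with a short sketch. This is not the module's own code (the module is written in R); the function name map_batches and the example file names are hypothetical, and the choice to strip directories from file names when use_batch_names is FALSE is an assumption.

```python
from pathlib import Path


def map_batches(input_files, use_batch_names=True):
    """Return (file, batch label) pairs in input order.

    Hypothetical helper that mirrors the documented naming rule only.
    """
    mapping = []
    for index, path in enumerate(input_files, start=1):
        if use_batch_names:
            # Documented behaviour: Batch 1 .. Batch n, in input order.
            label = f"Batch {index}"
        else:
            # Assumption: the label is the file name without its directory.
            label = Path(path).name
        mapping.append((str(path), label))
    return mapping


if __name__ == "__main__":
    # Hypothetical input files, one per batch.
    for file, label in map_batches(["data/sample_a.txt", "data/sample_b.rds"]):
        print(f"{file} -> {label}")
```

With the default setting, the pairs produced here correspond to the rows of the batch mapping table shown on the first page of the .pdf output.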
Output Files
- batch_correction_log.txt
  - This .txt file contains a log of each process carried out during the batch correction script's execution.
- output_file_name.rds
  - This .rds file can be used on another one of GenePattern's Seurat suite modules, such as the Seurat.Clustering module.
  - It contains an integrated / batch-corrected expression matrix for all cells present in the input files.
  - Note: This file inherits its name from the output_file_name parameter listed above.
- output_file_name.pdf
  - This .pdf file contains both a UMAP plot and a Violin plot of the integrated Seurat Objects that are derived from gene expression datasets.
  - Additionally, the first page will display a batch mapping table. In short, this table maps input files to numbered batches, starting at Batch 1 and going up to Batch n, where n represents the number of input files given to the script.
  - Note: This file inherits its name from the output_file_name parameter listed above.
License
Seurat.IntegrateData is distributed under a modified BSD license available at https://github.com/genepattern/Seurat.IntegrateData/blob/v3/LICENSE.
Platform Dependencies
Task Type | CPU Type | Operating System | Language |
---|---|---|---|
| | Any | Any | R 4.0.2 |
Version Comments
Version | Release Date | Description |
---|---|---|
3 | Mar. 12th, 2021 | Input types now include .rds, .tsv, and .txt files. The output now shows the data before batch correction in addition to after batch correction. |
2 | Mar. 3rd, 2021 | Introduced updates to parameters, program structure, and output format. |
1 | Nov. 4th, 2020 | Initial release of Seurat.BatchCorrection. |