Merge pull request #676 from maxplanck-ie/develop
Develop
katsikora committed Aug 24, 2020
2 parents 4145ce4 + 9a46d31 commit 4b961d8
Showing 10 changed files with 56 additions and 10 deletions.
5 changes: 5 additions & 0 deletions docs/content/workflows/mRNA-seq.rst
@@ -119,6 +119,11 @@ Like the other workflows, differential expression can be performed using the ``-

.. note:: The first entry defines which group of samples are control. This way, the order of comparison and likewise the sign of values can be changed. The DE analysis might fail if your sample names begin with a number. So watch out for that!
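
The two caveats in the note above can be checked before a run. The following is a minimal sketch, assuming a tab-separated sample sheet with ``name`` and ``condition`` columns; the function name ``check_sample_sheet`` and the example sample names are hypothetical and not part of snakePipes.

```python
import csv
import io


def check_sample_sheet(text):
    """Return (control_group, problems) for a tab-separated sample sheet.

    Assumes columns called "name" and "condition" (an assumption for this sketch).
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    problems = []
    for row in rows:
        # The DE analysis might fail for sample names that begin with a number.
        if row["name"][:1].isdigit():
            problems.append(f"sample name starts with a number: {row['name']}")
    # The first entry defines which group of samples is treated as control.
    control = rows[0]["condition"] if rows else None
    return control, problems


sheet = "name\tcondition\nctrl_1\tcontrol\nctrl_2\tcontrol\n3_treated\ttreated\n"
control, problems = check_sample_sheet(sheet)
print(control)
print(problems)
```

Reordering the rows so that a different condition comes first changes the reported control group, which is how the direction of the comparison (and the sign of the values) is switched.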

Differential Splicing
---------------------

In addition to differential expression, differential splicing analysis can be performed by using the ``--rMats`` option together with a sample sheet. This invokes rMATS turbo on the samples.

Complex designs with blocking factors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

45 changes: 43 additions & 2 deletions docs/content/workflows/scRNA-seq.rst
@@ -8,9 +8,10 @@ What it does

The scRNA-seq pipeline is intended to process UMI-based data, expecting the cell barcode and UMI in Read1, and the cDNA sequence in Read2.

There are currently two analysis modes available:
There are currently three analysis modes available:
- "Gruen" to reproduce CellSeq2 data analysis by Gruen et al.
- "STARsolo" which uses STAR solo for mapping and quantitation.
- "Alevin" based on Salmon for generating the count matrix.

The general procedure for mode "Gruen" involves:

@@ -29,6 +30,13 @@ The general procedure for mode "STARsolo" involves:

UMIs in the read headers are used to avoid counting PCR duplicates. A number of bigWig and QC plots (e.g., from ``plotEnrichment``) are generated as well.

Mode "Alevin" involves:

1. Generation of a salmon index used for mapping.
2. Mapping and generation of a readcount matrix.
3. Estimation of the uncertainty of gene counts using the bootstrap method implemented in Salmon Alevin.
4. General QC of the Alevin run using the AlevinQC R package.
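
Steps 1 to 3 above can be sketched as two Salmon command lines assembled from the Alevin options in the default configuration. This is an illustrative sketch only, not the rules snakePipes actually runs: the function name, the file paths, and the ``numBootstraps`` key are hypothetical, and the AlevinQC step is not covered.

```python
import shlex


def alevin_commands(cfg, transcripts, tgmap, read1, read2, index_dir, out_dir):
    """Assemble salmon command lines (as argument lists) from config values."""
    # Step 1: build the Salmon index, passing through extra index options.
    index_cmd = ["salmon", "index", "-t", transcripts, "-i", index_dir]
    index_cmd += shlex.split(cfg.get("salmonIndexOptions") or "")

    # Step 2: map reads and quantify; Read1 carries barcode + UMI, Read2 the cDNA.
    alevin_cmd = [
        "salmon", "alevin",
        "-l", cfg["alevinLibraryType"],
        "-1", read1,
        "-2", read2,
        "--" + cfg["prepProtocol"],
        "-i", index_dir,
        "--tgMap", tgmap,
        "-o", out_dir,
    ]
    if cfg.get("expectCells"):
        alevin_cmd += ["--expectCells", str(cfg["expectCells"])]
    # Step 3: bootstrapped counts; "numBootstraps" is a hypothetical config key.
    if cfg.get("numBootstraps"):
        alevin_cmd += ["--numCellBootstraps", str(cfg["numBootstraps"])]
    return index_cmd, alevin_cmd


cfg = {
    "alevinLibraryType": "ISR",
    "prepProtocol": "celseq2",
    "salmonIndexOptions": "--type puff -k 31",
    "expectCells": None,
    "numBootstraps": 30,  # hypothetical value for illustration
}
index_cmd, alevin_cmd = alevin_commands(
    cfg, "transcripts.fa", "tx2gene.tsv",
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
    "Salmon/index", "Alevin/sample1",
)
print(" ".join(index_cmd))
print(" ".join(alevin_cmd))
```

Leaving ``expectCells`` empty, as in the default configuration, simply omits the corresponding flag in this sketch.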

.. image:: ../images/scRNAseq_pipeline.png


@@ -121,6 +129,11 @@ The default configuration file is listed below and can be found in ``snakePipes/
myKit: CellSeq384
BCwhiteList:
STARsoloCoords: ["1","7","8","7"]
##mode Alevin options
alevinLibraryType: "ISR"
prepProtocol: "celseq2"
salmonIndexOptions: --type puff -k 31
expectCells:
#generic options
libraryType: 1
bwBinSize: 10
@@ -241,10 +254,29 @@ The following will be produced in the output directory when the workflow is run

- The **VelocytoCounts** directory contains loom files in sample subdirectories.
- The **VelocytoCounts_merged** directory contains one loom file with all samples merged.
- The **STARsolo* directory contains bam files and 10X-format cell count matrices produced by STARsolo.
- The **STARsolo** directory contains bam files and 10X-format cell count matrices produced by STARsolo.

The remaining folders are described in the Gruen mode above.

The following output structure will be produced when running in Alevin mode::

├── Alevin
├── Annotation
├── cluster_logs
├── FastQC
├── multiQC
├── originalFASTQ
├── Salmon
├── scRNAseq.cluster_config.yaml
├── scRNAseq.config.yaml
├── scRNAseq_organism.yaml
├── scRNAseq_pipeline.pdf
├── scRNAseq_run-1.log
└── scRNAseq_tools.txt

- The **Salmon** directory contains the generated genome index.
- The **Alevin** directory contains the matrix files (both bootstrapped and raw) per sample in subdirectories.
- The **multiQC** directory contains an additional alevinQC html file generated per sample.
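
A finished run can be compared against the directory listing above. The following is a minimal sketch; the helper name ``missing_outputs`` is hypothetical, and ``cluster_logs`` is left out of the check on the assumption that it depends on how the workflow was submitted.

```python
from pathlib import Path
import tempfile

# Directories taken from the Alevin-mode output listing.
EXPECTED_DIRS = ["Alevin", "Annotation", "FastQC", "multiQC", "originalFASTQ", "Salmon"]


def missing_outputs(outdir):
    """Return the expected output directories that are absent from outdir."""
    root = Path(outdir)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Demonstration on a temporary directory holding only part of the structure.
with tempfile.TemporaryDirectory() as tmp:
    for d in ("Alevin", "Salmon", "multiQC"):
        (Path(tmp) / d).mkdir()
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```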

Understanding the outputs: mode Gruen
--------------------------------------
@@ -287,6 +319,15 @@ Cell filtering, metrics collection and threshold selection are done as above onl
Clustering is done with RaceID default settings. The fully processed RaceID object is written to sc.minT\*.RData, the tsne plot with the clustering information to sc.minT\*.tsne.clu.png.
Top 10 and top 2 markers are calculated, and the resulting plots and tables written out as above. Violin and feature plots are generated for the top2 marker list and saved to files as in the description above. Session info is written to sessionInfo.txt. Statistical procedures and results are summarized in Stats_report.html.

Understanding the outputs: mode Alevin
--------------------------------------

- **Main result:** output folders containing the raw and bootstrapped count matrices are found under the sample subfolders under ``Alevin``. The sample-specific Alevin folders contain the matrices, as well as column data (barcodes) and row data (genes).

- The corresponding annotation files are ``Annotation/genes.filtered.bed`` and ``Annotation/genes.filtered.gtf``.

- The QC plots (both from multiQC and AlevinQC) are available in the ``multiQC`` folder.
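
As a quick sanity check on a sample folder, the barcode and gene label files can be read to obtain the dimensions of the count matrix. This is a sketch under assumptions: the file names ``quants_mat_rows.txt`` and ``quants_mat_cols.txt`` follow the naming Salmon Alevin commonly uses, and their exact location inside each sample subfolder should be confirmed against an actual run. The demonstration uses made-up labels in a temporary folder.

```python
from pathlib import Path
import tempfile


def read_labels(path):
    """Read one label per line, skipping empty lines."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def matrix_shape(sample_dir,
                 barcode_file="quants_mat_rows.txt",   # assumed file name
                 gene_file="quants_mat_cols.txt"):     # assumed file name
    """Return (number of barcodes, number of genes) for one sample folder."""
    sample_dir = Path(sample_dir)
    barcodes = read_labels(sample_dir / barcode_file)
    genes = read_labels(sample_dir / gene_file)
    return len(barcodes), len(genes)


# Demonstration with made-up labels written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "quants_mat_rows.txt").write_text("AAACCT\nAAAGGT\nAACCTT\n")
    Path(tmp, "quants_mat_cols.txt").write_text("geneA\ngeneB\n")
    demo_shape = matrix_shape(tmp)

print(demo_shape)
```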


Example images
~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/GRCz10.yaml
@@ -7,7 +7,7 @@ hisat2_index: "/data/repository/organisms/GRCz10_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCz10_ensembl/BWAindex/genome.fa"
bwameth_index: "/data/repository/organisms/GRCz10_ensembl/BWAmethIndex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCz10_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/GRCz10_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/genes.bed"
genes_gtf: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/GRCz10_ensembl/ensembl/release-88/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/SchizoSPombe_ASM294v2.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/BowtieI
hisat2_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/ensembl/release-35/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/Ensembl/release-35/genes.bed"
genes_gtf: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/Ensembl/release-35/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/SchizoSPombe_ASM294v2_ensembl/Ensembl/release-35/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/dm3.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/dm3_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/dm3_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/dm3_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/dm3_ensembl/ensembl/release-78/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/dm3_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/dm3_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.bed"
genes_gtf: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/dm3_ensembl/Ensembl/release-78/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/dm6.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/dm6_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/dm6_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/dm6_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/dm6_ensembl/ensembl/release-79/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/dm6_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/dm6_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/dm6_ensembl/Ensembl/release-79/genes.bed"
genes_gtf: "/data/repository/organisms/dm6_ensembl/Ensembl/release-79/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/dm6_ensembl/Ensembl/release-79/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/hg38.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/GRCh38_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCh38_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCh38_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCh38_ensembl/gencode/release_27/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCh38_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/GRCh38_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/GRCh38_ensembl/gencode/release_27/genes.bed"
genes_gtf: "/data/repository/organisms/GRCh38_ensembl/gencode/release_27/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/GRCh38_ensembl/gencode/release_27/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/hs37d5.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/hs37d5_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/hs37d5_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/hs37d5_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/hs37d5_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/hs37d5_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/genes.bed"
genes_gtf: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/hs37d5_ensembl/gencode/release_19/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/mm10_gencodeM13.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/GRCm38_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCm38_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCm38_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCm38_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/GRCm38_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/genes.bed"
genes_gtf: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/GRCm38_ensembl/gencode/m13/genes.slop.gtf"
2 changes: 1 addition & 1 deletion snakePipes/shared/organisms/mm9.yaml
@@ -6,7 +6,7 @@ bowtie2_index: "/data/repository/organisms/GRCm37_ensembl/BowtieIndex/genome"
hisat2_index: "/data/repository/organisms/GRCm37_ensembl/HISAT2Index/genome"
bwa_index: "/data/repository/organisms/GRCm37_ensembl/BWAindex/genome.fa"
known_splicesites: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/HISAT2/splice_sites.txt"
star_index: "/data/repository/organisms/GRCm37_ensembl/STARIndex/2.7.1a/"
star_index: "/data/repository/organisms/GRCm37_ensembl/STARIndex/2.7.4a/"
genes_bed: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/genes.bed"
genes_gtf: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/genes.gtf"
extended_coding_regions_gtf: "/data/repository/organisms/GRCm37_ensembl/gencode/m1/genes.slop.gtf"