File Naming
Using Snakemake makes it possible to start the pipeline from any step, and it automatically generates the required input files. However, this constrains how files are named in this analysis. If you want to use your own input, you MUST name the input files according to the following rules.

For example, if you already have FastQ files that are not available in the SRA archive, you must rename them according to the table provided below and place them in the respective path. Say you have FastQ files for two paired-end samples: you may name them `sample1.sra_1.fastq`, `sample1.sra_2.fastq` and `sample2.sra_1.fastq`, `sample2.sra_2.fastq`, and keep them in the `base/fastq` folder. The pipeline will then run with `sample1` and `sample2` as the SRR_ID values. You should also update the corresponding details in your `samples.csv` file. For this case, it would be the following:
run,is_paired,condition
sample1,true,condition1
sample2,true,condition2
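
The renaming described above can be scripted. The following is only a minimal sketch, not part of CardioPipeLine: the helper names `stage_paired_fastq` and `write_samples_csv` are hypothetical, and it assumes that copying the files into `base/fastq` under the expected names is all the pipeline needs.

```python
import csv
import shutil
from pathlib import Path


def stage_paired_fastq(sample, forward, reverse, base):
    """Copy one paired-end sample into base/fastq under the expected names.

    Hypothetical helper: `sample` takes the place of the SRR_ID.
    """
    fastq_dir = Path(base) / "fastq"
    fastq_dir.mkdir(parents=True, exist_ok=True)
    targets = [
        fastq_dir / f"{sample}.sra_1.fastq",  # forward reads
        fastq_dir / f"{sample}.sra_2.fastq",  # reverse reads
    ]
    for src, dst in zip((forward, reverse), targets):
        shutil.copyfile(src, dst)
    return targets


def write_samples_csv(rows, path):
    """Write samples.csv with the header run,is_paired,condition."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["run", "is_paired", "condition"])
        writer.writerows(rows)
```

Calling `stage_paired_fastq("sample1", "my_R1.fastq", "my_R2.fastq", "base")` would create `base/fastq/sample1.sra_1.fastq` and `base/fastq/sample1.sra_2.fastq`; the source file names here are placeholders.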
In the table below, `repository/` is the path to the downloaded CardioPipeLine repository and `base/` is the base folder path defined in your `config.yaml` file.
File Name | Path | Note |
---|---|---|
config.yaml | repository/config | Main configuration file |
samples.csv | repository/config | Samples information |
SRR_ID.sra | base/sra | SRA file |
SRR_ID.sra.fastq | base/fastq | Reads for Single Layout |
SRR_ID.sra_1.fastq | base/fastq | Forward Reads for Paired Layout |
SRR_ID.sra_2.fastq | base/fastq | Reverse Reads for Paired Layout |
SRR_ID.sra.filtered.fastq | base/filtered | rRNA filtered reads for Single Layout |
SRR_ID.sra.filtered_1.fastq | base/filtered | Forward rRNA filtered reads for Paired Layout |
SRR_ID.sra.filtered_2.fastq | base/filtered | Reverse rRNA filtered reads for Paired Layout |
SRR_ID.sra.Aligned.sortedByCoord.out.bam | base/bams | Aligned and sorted BAM file |
SAindex + other files | base/index/star | Index files for STAR |
*.stats, *.dat | base/index/sortmerna/idx | Index files for SortMeRNA |
pos.bin + other files | base/index/salmon | Index files for Salmon |
kallisto.idx | base/index/kallisto | Index file for Kallisto |
quant.sf | base/mappings/salmon/SRR_ID | Quantification with Salmon |
abundance.tsv | base/mappings/kallisto/SRR_ID | Quantification with Kallisto |
SRR_ID_gene_expression.tsv | base/mappings/stringtie/SRR_ID | Quantification with StringTie |
SRR_ID.stringtie.counts | base/deseq2/counts/SRR_ID | Raw counts generated from StringTie output |
SRR_ID.star.counts | base/deseq2/counts/SRR_ID | Raw counts generated from BAM file |
SRR_ID.salmon.counts | base/deseq2/counts/SRR_ID | Raw counts generated from Salmon output |
SRR_ID.kallisto.counts | base/deseq2/counts/SRR_ID | Raw counts generated from Kallisto output |
combined.counts | base/deseq2/analysis/star | Count Matrix generated from SRR_ID.star.counts for all samples |
combined.counts | base/deseq2/analysis/salmon | Count Matrix generated from SRR_ID.salmon.counts for all samples |
combined.counts | base/deseq2/analysis/stringtie | Count Matrix generated from SRR_ID.stringtie.counts for all samples |
combined.counts | base/deseq2/analysis/kallisto | Count Matrix generated from SRR_ID.kallisto.counts for all samples |
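
To check your own inputs against the table, the read-related names can be derived from the SRR_ID and the layout. This is an illustrative sketch only: `expected_read_files` is a hypothetical helper, and `"base"` stands in for the base folder set in `config.yaml`.

```python
from pathlib import PurePosixPath


def expected_read_files(srr_id, is_paired, base="base"):
    """Return the file names the table lists for one sample's reads.

    Hypothetical helper; the patterns are copied from the table above.
    """
    root = PurePosixPath(base)
    # Paired layout uses the _1/_2 suffixes, single layout has none.
    suffixes = ["_1.fastq", "_2.fastq"] if is_paired else [".fastq"]
    return {
        "sra": str(root / "sra" / f"{srr_id}.sra"),
        "fastq": [str(root / "fastq" / f"{srr_id}.sra{s}") for s in suffixes],
        "filtered": [
            str(root / "filtered" / f"{srr_id}.sra.filtered{s}") for s in suffixes
        ],
        "bam": str(root / "bams" / f"{srr_id}.sra.Aligned.sortedByCoord.out.bam"),
    }
```

Comparing the returned paths with what is actually on disk shows which steps would need to be regenerated for a given sample.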
DESeq2 output files are named based on the conditions provided in the `config.yaml` file. Consider the following `samples.csv` file:
run,is_paired,time
SRR0000001,true,48
SRR0000002,false,24
and the following options in `config.yaml`:
deseq2:
  design_column: "time"
  reference: "24"
  design: "~ condition"
  counts:
    star: "true"
    salmon: "true"
    stringtie: "true"
    kallisto: "true"
In the above case, the following output files will be generated:
File Name | Path | Comment |
---|---|---|
star_48_vs_24.csv | base/deseq2/analysis/star | DE analysis based on STAR output |
stringtie_48_vs_24.csv | base/deseq2/analysis/stringtie | DE analysis based on StringTie output |
salmon_48_vs_24.csv | base/deseq2/analysis/salmon | DE analysis based on Salmon output |
kallisto_48_vs_24.csv | base/deseq2/analysis/kallisto | DE analysis based on Kallisto output |
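
The names in this table appear to follow the pattern `<tool>_<condition>_vs_<reference>.csv`. The sketch below reproduces that pattern. It is inferred from this single example rather than taken from the pipeline code, so `deseq2_output_files` is a hypothetical helper, and its behaviour with more than two conditions is an assumption.

```python
def deseq2_output_files(conditions, reference, tools, base="base"):
    """List DE result files, assuming <tool>_<condition>_vs_<reference>.csv.

    Hypothetical helper: the pattern is inferred from the example table.
    """
    # Distinct non-reference conditions, in first-seen order.
    others = [c for c in dict.fromkeys(conditions) if c != reference]
    outputs = []
    for tool in tools:
        for condition in others:
            name = f"{tool}_{condition}_vs_{reference}.csv"
            outputs.append(f"{base}/deseq2/analysis/{tool}/{name}")
    return outputs
```

With the values of the `time` column (`"48"`, `"24"`), reference `"24"`, and the four enabled tools, this yields the four paths listed in the table.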