File Naming

Snakemake lets you start the pipeline from any step and automatically generates the required input files. However, this puts constraints on how files are named in this analysis. If you want to use your own input, you MUST name the input files according to the following rules.

For example, if you already have FastQ files that are not available in the SRA archive, you must rename them according to the table provided below and put them in the respective path. Say you have FastQ files for two paired-end samples: you may name them sample1.sra_1.fastq, sample1.sra_2.fastq and sample2.sra_1.fastq, sample2.sra_2.fastq, and keep them in the base/fastq folder. The pipeline will then treat sample1 and sample2 as the SRR_IDs. You should also update the respective details in your samples.csv file. For this case, it will be as follows:

run,is_paired,condition
sample1,true,condition1
sample2,true,condition2
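
The naming rule for FastQ inputs can be sketched in Python. This is a minimal illustration and not part of CardioPipeLine: the function name expected_fastq_names is hypothetical, and the base argument stands for the base folder set in config.yaml.

```python
from pathlib import Path


def expected_fastq_names(base, run, is_paired):
    """Return the FastQ paths expected for one sample, per the naming table.

    Hypothetical helper for illustration; not a CardioPipeLine function.
    """
    fastq_dir = Path(base) / "fastq"
    if is_paired:
        # Paired layout: forward and reverse reads carry _1 and _2 suffixes.
        return [
            fastq_dir / f"{run}.sra_1.fastq",
            fastq_dir / f"{run}.sra_2.fastq",
        ]
    # Single layout: one file without a read-number suffix.
    return [fastq_dir / f"{run}.sra.fastq"]


paired = expected_fastq_names("base", "sample1", True)
single = expected_fastq_names("base", "sample2", False)
print([p.name for p in paired])
print([p.name for p in single])
```

Comparing these names with the files you actually have is a quick way to confirm that the renaming step was done correctly before running the pipeline.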

In the table below, repository/ is the path to the downloaded CardioPipeLine repository and base/ is the base folder path defined in your config.yaml file.

File Name	Path	Note
config.yaml	repository/config	Main configuration file
samples.csv	repository/config	Samples information
SRR_ID.sra	base/sra	SRA file
SRR_ID.sra.fastq	base/fastq	Reads for Single Layout
SRR_ID.sra_1.fastq	base/fastq	Forward Reads for Paired Layout
SRR_ID.sra_2.fastq	base/fastq	Reverse Reads for Paired Layout
SRR_ID.sra.filtered.fastq	base/filtered	rRNA filtered reads for Single Layout
SRR_ID.sra.filtered_1.fastq	base/filtered	Forward rRNA filtered reads for Paired Layout
SRR_ID.sra.filtered_2.fastq	base/filtered	Reverse rRNA filtered reads for Paired Layout
SRR_ID.sra.Aligned.sortedByCoord.out.bam	base/bams	Aligned and sorted BAM file
SAindex + other files	base/index/star	Index files for STAR
.stats, .dat	base/index/sortmerna/idx	Index files for SortMeRNA
pos.bin + other files	base/index/salmon	Index files for Salmon
kallisto.idx	base/index/kallisto	Index file for Kallisto
quant.sf	base/mappings/salmon/SRR_ID	Quantification with Salmon
abundance.tsv	base/mappings/kallisto/SRR_ID	Quantification with Kallisto
SRR_ID_gene_expression.tsv	base/mappings/stringtie/SRR_ID	Quantification with StringTie
SRR_ID.stringtie.counts	base/deseq2/counts/SRR_ID	Raw counts generated from StringTie output
SRR_ID.star.counts	base/deseq2/counts/SRR_ID	Raw counts generated from BAM file
SRR_ID.salmon.counts	base/deseq2/counts/SRR_ID	Raw counts generated from Salmon output
SRR_ID.kallisto.counts	base/deseq2/counts/SRR_ID	Raw counts generated from Kallisto output
combined.counts	base/deseq2/analysis/star	Count Matrix generated from SRR_ID.star.counts for all samples
combined.counts	base/deseq2/analysis/salmon	Count Matrix generated from SRR_ID.salmon.counts for all samples
combined.counts	base/deseq2/analysis/stringtie	Count Matrix generated from SRR_ID.stringtie.counts for all samples
combined.counts	base/deseq2/analysis/kallisto	Count Matrix generated from SRR_ID.kallisto.counts for all samples
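
To check which per-sample results already exist, the paths in the table can be assembled programmatically. The sketch below is only an illustration under the assumption that the table is complete for these entries; expected_sample_outputs is a hypothetical helper, not a function provided by the pipeline.

```python
from pathlib import Path


def expected_sample_outputs(base, srr_id):
    """Map each per-sample result to the path listed in the naming table.

    Hypothetical helper for illustration; not a CardioPipeLine function.
    """
    base = Path(base)
    mappings = base / "mappings"
    counts = base / "deseq2" / "counts" / srr_id
    outputs = {
        "bam": base / "bams" / f"{srr_id}.sra.Aligned.sortedByCoord.out.bam",
        "salmon": mappings / "salmon" / srr_id / "quant.sf",
        "kallisto": mappings / "kallisto" / srr_id / "abundance.tsv",
        "stringtie": mappings / "stringtie" / srr_id / f"{srr_id}_gene_expression.tsv",
    }
    # Raw count files share one folder and differ only in the method name.
    for method in ("star", "salmon", "stringtie", "kallisto"):
        outputs[f"{method}_counts"] = counts / f"{srr_id}.{method}.counts"
    return outputs


outputs = expected_sample_outputs("base", "SRR0000001")
# List the results that are not yet present on disk.
missing = [name for name, path in outputs.items() if not path.exists()]
print(missing)
```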

DESeq2 output files are named based on the conditions provided in the config.yaml file. Consider the following samples.csv file:

run,is_paired,time
SRR0000001,true,48
SRR0000002,false,24

and the following options in config.yaml:

deseq2:
  design_column: "time"
  reference: "24"
  design: "~ condition"
  counts:
    star: "true"
    salmon: "true"
    stringtie: "true"
    kallisto: "true"

In this case, the following output files will be generated:

File Name	Path	Comment
star_48_vs_24.csv	base/deseq2/analysis/star	DE analysis based on STAR output
stringtie_48_vs_24.csv	base/deseq2/analysis/stringtie	DE analysis based on StringTie output
salmon_48_vs_24.csv	base/deseq2/analysis/salmon	DE analysis based on Salmon output
kallisto_48_vs_24.csv	base/deseq2/analysis/kallisto	DE analysis based on Kallisto output
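
The output names above appear to follow the pattern <method>_<level>_vs_<reference>.csv. The Python sketch below reproduces them from the example samples.csv and config values. The pattern is inferred from this single example, so treat it as an assumption; deseq2_output_names is a hypothetical helper, not part of the pipeline.

```python
import csv
import io

# Example samples.csv content from this page.
SAMPLES_CSV = """run,is_paired,time
SRR0000001,true,48
SRR0000002,false,24
"""


def deseq2_output_names(samples_csv, design_column, reference, methods):
    """Build DE result file names, assuming <method>_<level>_vs_<reference>.csv."""
    levels = []
    for row in csv.DictReader(io.StringIO(samples_csv)):
        level = row[design_column]
        # Every non-reference level is compared against the reference.
        if level != reference and level not in levels:
            levels.append(level)
    return [f"{m}_{level}_vs_{reference}.csv" for m in methods for level in levels]


names = deseq2_output_names(
    SAMPLES_CSV,
    design_column="time",
    reference="24",
    methods=["star", "stringtie", "salmon", "kallisto"],
)
print(names)
```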
