
File Naming

Rohit Suratekar edited this page Sep 26, 2020 · 2 revisions

Because the pipeline is built on Snakemake, you can start it from any step, and Snakemake will automatically generate any required input files that are missing. However, this places constraints on how files must be named. If you want to use your own input files, you MUST name them according to the following rules.

For example, if you already have FastQ files that are not available in the SRA archive, you must rename them according to the table below and place them in the corresponding path. Suppose you have FastQ files for two paired-end samples. You can name them `sample1.sra_1.fastq` and `sample1.sra_2.fastq`, and `sample2.sra_1.fastq` and `sample2.sra_2.fastq`, and keep them in the `base/fastq` folder. The pipeline will then treat `sample1` and `sample2` as the SRR_IDs. You should also update the corresponding entries in your `samples.csv` file. For this example, it would be:

```csv
run,is_paired,condition
sample1,true,condition1
sample2,true,condition2
```
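The renaming step described above can be scripted. The following is a minimal Python sketch, assuming you want to copy (rather than move) your original files. `stage_fastq` and `write_samples_csv` are hypothetical helpers and are not part of CardioPipeLine. One file is treated as a single-end sample, and two files are treated as a paired-end sample (forward first, then reverse).

```python
import csv
import shutil
from pathlib import Path


def stage_fastq(sample_id, files, base):
    """Copy FastQ files into base/fastq under the names the pipeline expects.

    Hypothetical helper (not part of CardioPipeLine): one file is treated as a
    single-end sample, two files as a paired-end sample (forward, reverse).
    """
    if len(files) == 1:
        names = [f"{sample_id}.sra.fastq"]
    elif len(files) == 2:
        names = [f"{sample_id}.sra_1.fastq", f"{sample_id}.sra_2.fastq"]
    else:
        raise ValueError("expected one (single-end) or two (paired-end) FastQ files")
    fastq_dir = Path(base) / "fastq"
    fastq_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    for src, name in zip(files, names):
        dst = fastq_dir / name
        shutil.copyfile(src, dst)  # copy, so the original files are left untouched
        targets.append(dst)
    return targets


def write_samples_csv(path, rows):
    """Write samples.csv with the run,is_paired,condition columns (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["run", "is_paired", "condition"])
        for run, is_paired, condition in rows:
            writer.writerow([run, "true" if is_paired else "false", condition])
```

For example, `stage_fastq("sample1", ["s1_R1.fastq", "s1_R2.fastq"], "base")` would create `base/fastq/sample1.sra_1.fastq` and `base/fastq/sample1.sra_2.fastq`. The resulting `samples.csv` belongs in `repository/config`.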

In the table below, `repository/` is the path to the downloaded CardioPipeLine repository, and `base/` is the base folder path defined in your `config.yaml` file.

| File name | Path | Note |
|---|---|---|
| `config.yaml` | `repository/config` | Main configuration file |
| `samples.csv` | `repository/config` | Sample information |
| `SRR_ID.sra` | `base/sra` | SRA file |
| `SRR_ID.sra.fastq` | `base/fastq` | Reads for single-end layout |
| `SRR_ID.sra_1.fastq` | `base/fastq` | Forward reads for paired-end layout |
| `SRR_ID.sra_2.fastq` | `base/fastq` | Reverse reads for paired-end layout |
| `SRR_ID.sra.filtered.fastq` | `base/filtered` | rRNA-filtered reads for single-end layout |
| `SRR_ID.sra.filtered_1.fastq` | `base/filtered` | Forward rRNA-filtered reads for paired-end layout |
| `SRR_ID.sra.filtered_2.fastq` | `base/filtered` | Reverse rRNA-filtered reads for paired-end layout |
| `SRR_ID.sra.Aligned.sortedByCoord.out.bam` | `base/bams` | Aligned and sorted BAM file |
| `SAindex` and other files | `base/index/star` | Index files for STAR |
| `*.stats`, `*.dat` | `base/index/sortmerna/idx` | Index files for SortMeRNA |
| `pos.bin` and other files | `base/index/salmon` | Index files for Salmon |
| `kallisto.idx` | `base/index/kallisto` | Index file for Kallisto |
| `quant.sf` | `base/mappings/salmon/SRR_ID` | Quantification with Salmon |
| `abundance.tsv` | `base/mappings/kallisto/SRR_ID` | Quantification with Kallisto |
| `SRR_ID_gene_expression.tsv` | `base/mappings/stringtie/SRR_ID` | Quantification with StringTie |
| `SRR_ID.stringtie.counts` | `base/deseq2/counts/SRR_ID` | Raw counts generated from StringTie output |
| `SRR_ID.star.counts` | `base/deseq2/counts/SRR_ID` | Raw counts generated from the BAM file |
| `SRR_ID.salmon.counts` | `base/deseq2/counts/SRR_ID` | Raw counts generated from Salmon output |
| `SRR_ID.kallisto.counts` | `base/deseq2/counts/SRR_ID` | Raw counts generated from Kallisto output |
| `combined.counts` | `base/deseq2/analysis/star` | Count matrix generated from `SRR_ID.star.counts` for all samples |
| `combined.counts` | `base/deseq2/analysis/salmon` | Count matrix generated from `SRR_ID.salmon.counts` for all samples |
| `combined.counts` | `base/deseq2/analysis/stringtie` | Count matrix generated from `SRR_ID.stringtie.counts` for all samples |
| `combined.counts` | `base/deseq2/analysis/kallisto` | Count matrix generated from `SRR_ID.kallisto.counts` for all samples |
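To check that your own files are in the right place, it can help to compute the locations from the table above for a given sample. The following Python sketch does that. `expected_paths` and `combined_counts_path` are hypothetical helpers that only mirror the table. They are not code from the pipeline.

```python
from pathlib import Path

# Tools whose per-sample raw counts are written to base/deseq2/counts/SRR_ID
COUNT_TOOLS = ("star", "salmon", "stringtie", "kallisto")


def expected_paths(srr_id, is_paired, base):
    """Return the per-sample file locations listed in the naming table.

    Hypothetical helper: names and folders are copied from the table,
    not taken from CardioPipeLine's Snakemake rules.
    """
    base = Path(base)
    # Paired-end files carry a _1/_2 suffix; single-end files have none
    mates = ("_1", "_2") if is_paired else ("",)
    return {
        "sra": base / "sra" / f"{srr_id}.sra",
        "reads": [base / "fastq" / f"{srr_id}.sra{m}.fastq" for m in mates],
        "filtered": [base / "filtered" / f"{srr_id}.sra.filtered{m}.fastq" for m in mates],
        "bam": base / "bams" / f"{srr_id}.sra.Aligned.sortedByCoord.out.bam",
        "salmon": base / "mappings" / "salmon" / srr_id / "quant.sf",
        "kallisto": base / "mappings" / "kallisto" / srr_id / "abundance.tsv",
        "stringtie": base / "mappings" / "stringtie" / srr_id / f"{srr_id}_gene_expression.tsv",
        "counts": {
            tool: base / "deseq2" / "counts" / srr_id / f"{srr_id}.{tool}.counts"
            for tool in COUNT_TOOLS
        },
    }


def combined_counts_path(tool, base):
    """Location of the all-sample count matrix for one tool (hypothetical helper)."""
    return Path(base) / "deseq2" / "analysis" / tool / "combined.counts"
```

For instance, `[p for p in expected_paths("sample1", True, "base")["reads"] if not p.exists()]` lists the raw read files that are still missing for `sample1`.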

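DESeq2 output files are named after the conditions being compared. These are the values in the `samples.csv` column selected by `design_column` in `config.yaml`, with the `reference` level named last. Consider the following `samples.csv` file: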

```csv
run,is_paired,time
SRR0000001,true,48
SRR0000002,false,24
```

and the following options in `config.yaml`:

```yaml
deseq2:
  design_column: "time"
  reference: "24"
  design: "~ condition"
  counts:
    star: "true"
    salmon: "true"
    stringtie: "true"
    kallisto: "true"
```

In the above case, the following output files will be generated. Because the reference level is `24`, each file compares `48` against `24`:

| File name | Path | Note |
|---|---|---|
| `star_48_vs_24.csv` | `base/deseq2/analysis/star` | DE analysis based on STAR output |
| `stringtie_48_vs_24.csv` | `base/deseq2/analysis/stringtie` | DE analysis based on StringTie output |
| `salmon_48_vs_24.csv` | `base/deseq2/analysis/salmon` | DE analysis based on Salmon output |
| `kallisto_48_vs_24.csv` | `base/deseq2/analysis/kallisto` | DE analysis based on Kallisto output |
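The naming rule in the example can be sketched in Python. This sketch assumes something the example does not show: when there are more than two levels, every non-reference level is compared against `reference`. `deseq2_output_names` is a hypothetical helper, and the real files are written by the pipeline's DESeq2 step into `base/deseq2/analysis/TOOL/`. The `counts` block is passed as a plain dict, which avoids a YAML parser dependency.

```python
import csv
import io


def deseq2_output_names(samples_csv, design_column, reference, counts):
    """Predict DESeq2 result file names of the form TOOL_LEVEL_vs_REFERENCE.csv.

    Hypothetical helper. samples_csv is the text of samples.csv; counts mirrors
    the `counts` block of config.yaml, where the string "true" enables a tool.
    Assumption: each non-reference level is compared against `reference`
    (the page only shows the two-level case).
    """
    rows = csv.DictReader(io.StringIO(samples_csv))
    # Values are read as strings, so reference must be a string such as "24"
    levels = sorted({row[design_column] for row in rows})
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not found in column {design_column!r}")
    tools = [tool for tool, enabled in counts.items() if str(enabled).lower() == "true"]
    return {
        tool: [f"{tool}_{level}_vs_{reference}.csv" for level in levels if level != reference]
        for tool in tools
    }


samples = "run,is_paired,time\nSRR0000001,true,48\nSRR0000002,false,24\n"
counts = {"star": "true", "salmon": "true", "stringtie": "true", "kallisto": "true"}
print(deseq2_output_names(samples, "time", "24", counts)["star"])  # ['star_48_vs_24.csv']
```

Setting a tool to `"false"` in the dict simply drops it from the result. This mirrors the expectation that disabled count sources produce no DE table.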