When the genome is provided via fasta and gtf, the workflow tries to generate the genome index and gene bed if they are not provided. This works fine when the fasta and gtf are uncompressed. However, if either the fasta or the gtf is compressed (gz format in my case) and the corresponding downstream file (i.e. genome index or gene bed) is not provided, I get the following error:
Not a valid path value: '../genome/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz'
Not a valid path value: '../genome/Mus_musculus.GRCm38.102.gtf.gz'
My params.yaml is:
input: './samplesheet.csv'
outdir: './results/'
fasta: '../genome/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz'
gtf: '../genome/Mus_musculus.GRCm38.102.gtf.gz'
narrow_peak: true
aligner: 'bowtie2'
read_length: 150
max_memory: '50.GB'
max_cpus: 24
save_reference: true
My command is:
HTTPS_PROXY=http://localhost:1081 nextflow run $PWD/../pipeline/nf-core-chipseq/2_1_0/main.nf -profile singularity -resume -params-file params.yaml
Both should be correct, because the pipeline works fine if I uncompress 'Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz' and 'Mus_musculus.GRCm38.102.gtf.gz' and point fasta and gtf at the uncompressed files.
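For anyone hitting the same error, the workaround I used can be sketched roughly as below. It assumes `gzip` and `mktemp` are on the PATH. `decompress_keep` is a hypothetical helper name I made up for illustration, not something the pipeline provides. It writes the uncompressed copy next to the original and leaves the .gz file untouched.

```shell
# Hypothetical helper (not part of nf-core/chipseq): decompress FILE.gz to FILE,
# keeping the original .gz. Prints the path of the uncompressed file.
decompress_keep() {
    src="$1"
    dest="${src%.gz}"    # strip a trailing ".gz" from the path
    # Skip the work if an uncompressed copy is already there.
    if [ ! -f "$dest" ]; then
        # Write to a temporary file first so a failed gzip run
        # does not leave a truncated output behind.
        tmp_out="$(mktemp "$dest.XXXXXX")" || return 1
        if gzip -dc "$src" > "$tmp_out"; then
            mv "$tmp_out" "$dest"
        else
            rm -f "$tmp_out"
            return 1
        fi
    fi
    printf '%s\n' "$dest"
}
```

Calling `decompress_keep ../genome/Mus_musculus.GRCm38.102.gtf.gz` (and the same for the fasta) and then setting fasta and gtf in params.yaml to the printed paths avoids the error in my setup. It does not fix the underlying handling of compressed inputs.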
I also found some related issues in other nf-core workflows:
https://github.com/nf-core/rnaseq/issues/1311
https://github.com/nf-core/atacseq/issues/277
https://github.com/nf-core/cutandrun/issues/187
Thank you.