Pipeline ignores --kallisto_index #21

redst4r · 2019-12-21T03:28:59Z

Hi,

I'm trying to run kallisto, using a precomputed index, but it keeps failing.

nextflow run nf-core/scrnaseq --reads 'fastq/*_R{1,2}_001.fastq.gz' \
   --aligner "kallisto" \
   --kallisto_gene_map resources/transcripts_to_genes.txt \
   --kallisto_index resources/Homo_sapiens.GRCh38.cdna.all.idx \
   --chemistry "V2" \
   --barcode_whitelist resources/10xv2_whitelist.txt \
   --outdir results \
   --type 10x \
   -profile docker

I get the error "Must provide either a GTF file ('--gtf') or transcript to gene mapping ('--txp2gene') to align with Alevin", which is odd since I'm not trying to run Alevin.

Adding --gtf resources/Homo_sapiens.GRCh38.96.gtf gives me "Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes".

Adding --transcriptome_fasta resources/Homo_sapiens.GRCh38.cdna.all.fa.gz runs into issue #20.

Either way, neither --gtf nor --transcriptome_fasta should be needed for kallisto with a precomputed index!
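To make the expectation concrete, here is a minimal Python sketch of aligner-specific input validation. It is not the pipeline's actual logic (that lives in main.nf and is written in Nextflow); the function name and the exact rules are assumptions for illustration, and only the parameter names are taken from the commands and error messages in this thread.

```python
from typing import Dict, List, Optional


def missing_reference_inputs(params: Dict[str, Optional[str]]) -> List[str]:
    """Return the reference inputs a run would still need (hypothetical rules)."""
    aligner = params.get("aligner")
    problems: List[str] = []

    if aligner == "kallisto":
        # Assumption: a prebuilt index replaces the transcriptome FASTA
        # (and the genome FASTA + GTF that could be used to derive one).
        has_index_source = (
            params.get("kallisto_index")
            or params.get("transcriptome_fasta")
            or (params.get("fasta") and params.get("gtf"))
        )
        if not has_index_source:
            problems.append("kallisto_index, transcriptome_fasta, or fasta + gtf")
        # Assumption: a supplied gene map replaces the GTF used to build one.
        if not params.get("kallisto_gene_map") and not params.get("gtf"):
            problems.append("kallisto_gene_map or gtf")
    elif aligner == "alevin":
        # Only Alevin runs should trigger the Alevin-specific requirement.
        if not params.get("txp2gene") and not params.get("gtf"):
            problems.append("txp2gene or gtf")

    return problems
```

Under these assumed rules, the command above (kallisto with both --kallisto_index and --kallisto_gene_map) would need nothing else, and the Alevin check would never be consulted.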

I tried to trace the issue in main.nf but had no luck; the logic is quite complicated (I only started using Nextflow two days ago).

apeltzer · 2019-12-21T10:39:46Z

Pull Request #22 should fix this issue. You can test it with the following command, replacing [....] with your parameters of choice:

nextflow run apeltzer/scrnaseq -r dev [....]

redst4r · 2019-12-23T19:24:24Z

Hi,

Thanks for looking into this! Unfortunately, with nextflow run apeltzer/scrnaseq -r dev [....] I now get the following error:

Launching `apeltzer/scrnaseq` [astonishing_sanger] - revision: 03b29169ca [dev]
  nf-core/scrnaseq v1.0.1dev
----------------------------------------------------
Pipeline Release  : dev
Run Name          : astonishing_sanger
Reads             : fastq_path/20190805_A1_S2*_R{1,2}_001.fastq.gz
GTF Reference     : false
Save Reference?   : false
Aligner           : kallisto
Kallisto Index    : resources/Homo_sapiens.GRCh38.cdna.all.idx
Droplet Technology: 10x
Chemistry Version : V2
Kallisto Gene Map : resources/transcripts_to_genes.txt
BUSTools Correct  : true
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - nfcore/scrnaseq:dev
Output dir        : nf_pipeline_results
Script dir        : ~/.nextflow/assets/apeltzer/scrnaseq
Config Profile    : docker
----------------------------------------------------
[-        ] process > get_software_versions -
[-        ] process > unzip_10x_barcodes    -
No such variable: gtf_extract_transcriptome

 -- Check script '.nextflow/assets/apeltzer/scrnaseq/main.nf' at line: 324 or see '.nextflow.log' file for more details

It seems the gtf_extract_transcriptome channel never gets defined if no --gtf is specified.
Oddly enough, I don't even see where the extract_transcriptome process (where gtf_extract_transcriptome is used) would be called, or used as an input, anywhere in the script.
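The failure mode described here, a name that is bound only inside a conditional branch but referenced unconditionally later, can be illustrated outside Nextflow. The Python sketch below is an analogy rather than the pipeline's code: the function names are hypothetical, and an empty list stands in for an empty channel. Only gtf_extract_transcriptome and extract_transcriptome come from the error message and process list in this thread.

```python
from typing import Dict, List, Optional


def plan_inputs(params: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Bind every downstream input name, even when its source option is absent."""
    # Bound unconditionally: an empty list plays the role of an empty channel,
    # so later references can never hit an undefined name.
    inputs: Dict[str, List[str]] = {"gtf_extract_transcriptome": []}
    gtf = params.get("gtf")
    if gtf:
        inputs["gtf_extract_transcriptome"] = [gtf]
    return inputs


def steps_to_run(inputs: Dict[str, List[str]]) -> List[str]:
    """Schedule a step only if its input actually carries something."""
    steps: List[str] = []
    if inputs["gtf_extract_transcriptome"]:
        steps.append("extract_transcriptome")
    return steps
```

With no --gtf the step is simply skipped instead of the whole script aborting on an unknown variable.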

redst4r · 2019-12-24T02:40:17Z

In addition, if I specify a GTF file via --gtf, the pipeline now finishes almost immediately, executing only get_software_versions, multiqc and output_documentation:

executor >  local (3)
[cc/67d864] process > get_software_versions    [100%] 1 of 1 ✔
[-        ] process > unzip_10x_barcodes       -
[-        ] process > extract_transcriptome    -
[-        ] process > build_salmon_index       -
[-        ] process > makeSTARindex            -
[-        ] process > build_kallisto_index     -
[-        ] process > build_gene_map           -
[-        ] process > build_txp2gene           -
[-        ] process > alevin                   -
[-        ] process > alevin_qc                -
[-        ] process > star                     -
[-        ] process > kallisto                 -
[-        ] process > bustools_correct_sort    -
[-        ] process > bustools_count           -
[-        ] process > bustools_inspect         -
[1b/3c0d46] process > multiqc (1)              [100%] 1 of 1 ✔
[68/89d65f] process > output_documentation (1) [100%] 1 of 1 ✔
[nf-core/scrnaseq] Pipeline completed successfully

Looks like kallisto never gets triggered!
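For longer runs it can help to list the processes that never started instead of reading the progress table by eye. The sketch below is a small Python helper that assumes the progress-line layout shown in the output above ("[-        ] process > NAME -" for a process with no submitted tasks). That layout is inferred from this thread, not from a documented stable format, and the function name is hypothetical.

```python
import re
from typing import List

# Matches idle lines such as "[-        ] process > kallisto                 -".
# Lines with a task hash (e.g. "[cc/67d864] ...") do not match.
_IDLE_LINE = re.compile(r"^\[-\s*\]\s+process\s+>\s+(\S+)\s+-\s*$")


def processes_never_started(log_text: str) -> List[str]:
    """Return process names whose progress line shows no submitted task."""
    names: List[str] = []
    for line in log_text.splitlines():
        match = _IDLE_LINE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names
```

Applied to the output above, this would report kallisto and the three bustools processes among the ones that never ran.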

apeltzer · 2020-01-12T09:41:58Z

The first part is expected and intentional: without a GTF there is no way to extract the transcriptome, because the annotation is what identifies the exons on the selected genome FASTA.

Double checking the processes, this is also correctly configured:

 nextflow run apeltzer/scrnaseq -r dev -profile test,docker --aligner kallisto -resume --fasta https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa --gtf https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf

This should work ...?

This runs through and produces

galanisl · 2020-02-12T14:56:44Z

What I see is that the index has to be specified as a directory rather than a .idx file:

--kallisto_index /path/to/index/kallisto_index
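A quick pre-flight check makes the directory-versus-file distinction visible before launching a run. This is a minimal Python sketch using only the standard library; the function name is hypothetical, and whether the pipeline wants a directory or an .idx file depends on the revision being run (the directory requirement is the observation reported here for the dev branch).

```python
from pathlib import Path


def describe_index_argument(path_str: str) -> str:
    """Classify what a --kallisto_index value points at on disk."""
    path = Path(path_str)
    if path.is_dir():
        return "directory"
    if path.is_file():
        return "file"
    return "missing"
```

For example, calling describe_index_argument on the path you intend to pass shows at a glance whether it is the kind of path this revision expects.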

However, if the index is specified, the pipeline doesn't execute the following processes:

bustools_correct_sort
bustools_count
bustools_inspect

winni2k · 2020-02-13T15:29:45Z

I get the "No such variable: gtf_extract_transcriptome" error as well. When I try the test command (#21 (comment)), I get the following output and error:

nextflow run apeltzer/scrnaseq -r dev -profile test,docker --aligner kallisto -resume --fasta https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa --gtf https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf
N E X T F L O W  ~  version 20.01.0
Launching `apeltzer/scrnaseq` [insane_mccarthy] - revision: b953dac979 [dev]
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/scrnaseq v1.0.1dev
----------------------------------------------------
Pipeline Release  : dev
Run Name          : insane_mccarthy
Reads             : data/*{1,2}.fastq.gz
Genome Reference  : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa
GTF Reference     : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf
Save Reference?   : false
Aligner           : kallisto
Droplet Technology: 10x
Chemistry Version : V3
Kallisto Gene Map : false
BUSTools Correct  : true
Max Resources     : 6 GB memory, 2 cpus, 2d time per job
Container         : docker - nfcore/scrnaseq:dev
Output dir        : ./results
Launch dir        : /tmp/tmp.aPO3eT73ZU
Working dir       : /tmp/tmp.aPO3eT73ZU/work
Script dir        : /home/warkre/.nextflow/assets/apeltzer/scrnaseq
User              : warkre
Config Profile    : test,docker
Config Description: Minimal test dataset to check pipeline function
----------------------------------------------------
executor >  local (10)
[16/0befac] process > get_software_versions [100%] 1 of 1 ✔
[2c/ab2dc0] process > unzip_10x_barcodes    [100%] 1 of 1 ✔
[77/637742] process > extract_transcriptome [100%] 1 of 1 ✔
[-        ] process > build_salmon_index    -
[-        ] process > makeSTARindex         -
[24/81cf56] process > build_kallisto_index  [100%] 1 of 1 ✔
[77/5b2ed7] process > build_gene_map        [100%] 1 of 1 ✔
[-        ] process > build_txp2gene        -
[-        ] process > alevin                -
[-        ] process > alevin_qc             -
[-        ] process > star                  -
[15/1d4072] process > kallisto              [100%] 1 of 1 ✔
[4c/4d7795] process > bustools_correct_sort [100%] 2 of 2, failed: 2, retries: 1 ✘
[-        ] process > bustools_count        -
[-        ] process > bustools_inspect      -
[36/c05311] process > multiqc               [100%] 1 of 1 ✔
[3d/f77b23] process > output_documentation  [100%] 1 of 1 ✔
[nf-core/scrnaseq] Pipeline completed with errors
WARN: Access to undefined parameter `skip_bustools` -- Initialise it to a default value eg. `params.skip_bustools = some_value`
[c1/49fe9e] NOTE: Process `bustools_correct_sort (S10_L001_bus_output)` terminated with an error exit status (139) -- Execution is retried (1)
Error executing process > 'bustools_correct_sort (S10_L001_bus_output)'

Caused by:
  Process `bustools_correct_sort (S10_L001_bus_output)` terminated with an error exit status (139)

Command executed:

  bustools correct -w 10x_V3_barcode_whitelist -o S10_L001_bus_output/output.corrected.bus S10_L001_bus_output/output.bus
  mkdir -p tmp
  bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus

Command exit status:
  139

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Found 6794880 barcodes in the whitelist
  Processed 0 bus records
  In whitelist = 0
  Corrected = 0
  Uncorrected = 0
  Read in 0 BUS records
  .command.sh: line 4:   409 Segmentation fault      (core dumped) bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus

Work dir:
  /tmp/tmp.aPO3eT73ZU/work/4c/4d779592ad3f6672d12d0ccd1e1449

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
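The exit status 139 in the report above matches the segmentation fault in the error output: shells such as bash report a command killed by a signal as 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV. A minimal Python sketch of that arithmetic, using only the standard library (the helper name is hypothetical):

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[str]:
    """Map a shell exit status above 128 to a signal name, if one applies."""
    if status <= 128:
        # Ordinary exit codes carry no signal information.
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number is defined on this platform.
        return None
```

Here signal_from_exit_status(139) yields the SIGSEGV name, consistent with bustools sort crashing after "Read in 0 BUS records".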

apeltzer · 2020-02-14T10:16:06Z

I should probably sit down with this and provide a complete fix, with additional tests for supplied indices (which aren't fully covered so far...)

winni2k · 2020-02-14T10:45:15Z

Might be a good idea...

But I think the problem here is a bit broader than just the --kallisto_index argument not working. I have poked around a little with different command line arguments, and I can't figure out even a single way to get this pipeline to run using just kallisto.

Also, after further testing, it appears that the latest version of kallisto (0.46.2) is broken. When I follow the kallisto tutorial with kallisto v0.46.1, things appear to run through OK.

apeltzer · 2020-02-28T19:37:07Z

Thanks for the heads-up about kallisto 0.46.2 being broken. The tests probably ran fine at some point (this was covered by multiple tests...), but were then unfortunately broken by attempts to fix some cases where kallisto wasn't running inside the tests at all.

Unfortunately, I also ran into memory issues on Travis CI, which might be resolved now that we have switched entirely to GitHub Actions. Let's see; I'm currently going through 250+ GitHub issues / things :-(

grst · 2022-03-07T20:27:45Z

This should be fixed in the latest dev version.

apeltzer added the bug label Dec 21, 2019

apeltzer added this to the 1.0.1 milestone Dec 21, 2019

apeltzer self-assigned this Dec 21, 2019

apeltzer added a commit to apeltzer/scrnaseq that referenced this issue Dec 21, 2019

Fix issue nf-core#21 , supply index

2c19f24

apeltzer mentioned this issue Dec 21, 2019

PR for 1.0.1 Bugfixes #22

Merged

8 tasks

apeltzer mentioned this issue Mar 6, 2020

Add FastQC support. Remove multiqc_config.yaml file from conf folder. #32

Closed

8 tasks

grst added this to Done in scrnaseq Mar 7, 2022

grst moved this from Done to Waiting for feedback in scrnaseq Mar 7, 2022

apeltzer closed this as completed Jun 8, 2022

scrnaseq automation moved this from Waiting for feedback to Done Jun 8, 2022
