NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TX2GENE error on iGenomes TAIR10 #1132

holmrenser · 2023-11-24T09:28:43Z

Description of the bug

I tried running nf-core/rnaseq 3.13.2 on the Arabidopsis thaliana TAIR10 genome from iGenomes and hit an error in the tx2gene step, which processes the annotation GTF. Previous pipeline versions ran on the same genome without this issue.

Command used and terminal output

nextflow run nf-core/rnaseq --input samplesheet_full.csv --outdir mapping_full --genome TAIR10 --max_cpus 8 -profile docker -r 3.13.2

ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TX2GENE (genome.filtered.gtf)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TX2GENE (genome.filtered.gtf)` terminated with an error exit status (1)

Command executed:

  tx2gene.py \
      --quant_type salmon \
      --gtf genome.filtered.gtf \
      --quants quants \
      --id gene_id \
      --extra gene_name \
      -o tx2gene.tsv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TX2GENE":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File "/home/<omitted>/.nextflow/assets/nf-core/rnaseq/bin/tx2gene.py", line 162, in <module>
      if not map_transcripts_to_gene(args.quant_type, args.gtf, args.quants, args.id, args.extra, args.output):
    File "/home/<omitted>/.nextflow/assets/nf-core/rnaseq/bin/tx2gene.py", line 122, in map_transcripts_to_gene
      transcript_attribute = discover_transcript_attribute(gtf_file, transcripts)
    File "/home/<omitted>/.nextflow/assets/nf-core/rnaseq/bin/tx2gene.py", line 59, in discover_transcript_attribute
      attributes = dict(item.strip().split(" ", 1) for item in cols[8].split(";") if item.strip())
  ValueError: dictionary update sequence element #4 has length 1; 2 is required

Work dir:
  /lustre/<omitted>/full_experiment/work/57/7fc5bc40a9829787f3723fa27f46a9

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Nextflow version: 23.10.0.5889
Hardware: 96CPU server, lustre filesystem
Executor: local
Container engine: Docker
OS: Ubuntu 20.04.6
Version of nf-core/rnaseq: 3.13.2

Guy2Horev · 2023-12-19T16:46:55Z

I got the same error today after the workflow was updated.

pinin4fjords · 2024-01-03T17:05:38Z

This is genuinely a bad GTF file rather than a pipeline issue: there's a semicolon in one of the gene names, specifically at line 33090:

1   ensembl CDS 4810488 4811109 .   +   0   exon_number "1"; gene_biotype "protein_coding"; gene_id "AT1G14040"; gene_name "PHO1;H3"; gene_source "ensembl"; gene_version "1"; p_id "P3587"; protein_id "AT1G14040.1"; protein_version "1"; transcript_biotype "protein_coding"; transcript_id "AT1G14040.1"; transcript_name "PHO1;H3"; transcript_source "ensembl"; transcript_version "1"; tss_id "TSS29975";

"PHO1;H3" is not a good value and it's upsetting the parsing of the semicolon-delimited attributes field.
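To illustrate the failure, here is a minimal reproduction that applies the expression from the traceback to the first few attributes of the line above. The variable names are made up for this sketch; only the split-and-dict expression is taken from the traceback.

```python
# First attributes of the offending GTF line (column 9), truncated for brevity.
attrs = (
    'exon_number "1"; gene_biotype "protein_coding"; '
    'gene_id "AT1G14040"; gene_name "PHO1;H3"; gene_source "ensembl";'
)

# Same logic as the traceback: split on every ";", then split key from value.
items = [item.strip().split(" ", 1) for item in attrs.split(";") if item.strip()]

# The semicolon inside "PHO1;H3" yields a stray one-element item: ['H3"'].
stray = [item for item in items if len(item) != 2]
print(stray)

try:
    dict(items)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

The stray item sits at index 4, which is why the traceback complains about sequence element #4.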

It wasn't an issue previously because we didn't sample enough lines of the GTF to reach this one (which @MatthiasZepper fixed).

I'll try to add something to skip a limited number of bad lines (we don't need them all for this part of the code). In the meantime I recommend you review our guidelines on reference file usage: you really are better off using more recent files from Ensembl (and you can complain to Ensembl about invalid formatting like this).

pinin4fjords · 2024-01-03T17:21:25Z

Actually, I think I can do better and allow those semicolons. PR incoming. I still maintain they're a silly idea though...
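One way to tolerate semicolons inside quoted values is to match whole key/value pairs with a regular expression instead of splitting on every ";". The following is only a sketch of that idea, not necessarily what the PR does; `ATTRIBUTE_RE` and `parse_gtf_attributes` are hypothetical names introduced here.

```python
import re

# A key, whitespace, then either a double-quoted value (which may contain ";")
# or an unquoted value running up to the next ";" or the end of the field.
ATTRIBUTE_RE = re.compile(r'\s*([^\s;]+)\s+(?:"([^"]*)"|([^;]*))\s*(?:;|$)')


def parse_gtf_attributes(field):
    """Parse a GTF attribute column into a dict, keeping ';' inside quotes."""
    attributes = {}
    for match in ATTRIBUTE_RE.finditer(field):
        key, quoted, bare = match.groups()
        # Prefer the quoted capture; fall back to the stripped unquoted value.
        attributes[key] = quoted if quoted is not None else bare.strip()
    return attributes


example = (
    'exon_number "1"; gene_id "AT1G14040"; gene_name "PHO1;H3"; '
    'transcript_id "AT1G14040.1";'
)
parsed = parse_gtf_attributes(example)
print(parsed["gene_name"])
```

Because the quoted alternative is tried first, the semicolon in "PHO1;H3" stays inside the value rather than ending the attribute.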

drpatelh · 2024-01-03T18:39:38Z

Fixed in #1150

holmrenser added the bug label Nov 24, 2023

drpatelh added this to the 3.13.3 milestone Jan 3, 2024

drpatelh assigned pinin4fjords Jan 3, 2024

pinin4fjords mentioned this issue Jan 3, 2024

Be more flexible on attribute values in GTFs #1150

Merged

10 tasks

pinin4fjords linked a pull request Jan 3, 2024 that will close this issue

Be more flexible on attribute values in GTFs #1150


drpatelh closed this as completed Jan 3, 2024
