
KeyError 'gene_id' in salmon_tx2gene.py #720

Closed
4 tasks done
didillysquat opened this issue Nov 4, 2021 · 2 comments
Labels
bug Something isn't working WIP Work in progress
Milestone

Comments

@didillysquat

Check Documentation

I have checked the following places for your error:

Description of the bug

The salmon_tx2gene.py script fails with the following error:

Traceback (most recent call last):
    File "/home/humebc/.nextflow/assets/nf-core/rnaseq/bin/salmon_tx2gene.py", line 88, in <module>
      tx2gene(args.gtf, args.salmon, args.id, args.extra, args.output)
    File "/home/humebc/.nextflow/assets/nf-core/rnaseq/bin/salmon_tx2gene.py", line 47, in tx2gene
      gene_dict[attr_dict[gene_id]].append(attr_dict)
  KeyError: 'gene_id'

The error occurs because the "gene_id" key is missing from the attribute key-value pairs of some lines in the .gtf file used as input. Only a small number of lines are affected; the vast majority do contain "gene_id". I originally gave the pipeline a GFF file as input: GCF_017654675.1_Xenopus_laevis_v10.1_genomic.gff.gz.
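To confirm which records trigger the KeyError, one could scan the converted GTF for records whose attribute column (9th field) has no gene_id. The following is a minimal sketch, assuming standard GTF attribute syntax (key "value";); the helper names parse_attributes and find_missing_gene_id are hypothetical and not part of the pipeline.

```python
import re
from typing import Dict, Iterable, List

# Matches GTF-style attribute pairs such as: gene_id "LOC123";  (assumed format)
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a key -> value dict (hypothetical helper)."""
    return dict(ATTR_RE.findall(field))


def find_missing_gene_id(lines: Iterable[str], key: str = "gene_id") -> List[int]:
    """Return 1-based line numbers of GTF records whose attributes lack `key`."""
    missing = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        if key not in parse_attributes(cols[8]):
            missing.append(lineno)
    return missing


# Toy input: the second record has no gene_id attribute.
example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "t2";\n',
]
print(find_missing_gene_id(example))  # -> [2]
```

On a real file, passing an open file handle instead of the toy list would list every offending line.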

The offending line in the salmon_tx2gene.py is this one:

gene_dict[attr_dict[gene_id]].append(attr_dict)

Because the "gene_id" key is present in the vast majority of lines, this line can be wrapped in a try/except block and the script then completes without issue, e.g.:

# Inside the per-record loop: skip records that lack the gene ID attribute
try:
    gene_dict[attr_dict[gene_id]].append(attr_dict)
except KeyError:
    continue
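The snippet above only makes sense inside a loop. A self-contained sketch of that loop is shown below, assuming gene_dict is a defaultdict(list) and each record has already been parsed into an attribute dict; build_gene_dict is a hypothetical stand-in, and the real salmon_tx2gene.py may be structured differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_gene_dict(
    attr_dicts: Iterable[Dict[str, str]], gene_id: str = "gene_id"
) -> Dict[str, List[Dict[str, str]]]:
    """Group attribute dicts by gene ID, skipping records without one.

    Hypothetical stand-in for the loop around the failing line.
    """
    gene_dict: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    skipped = 0
    for attr_dict in attr_dicts:
        try:
            # attr_dict[gene_id] is evaluated first, so a missing key raises
            # before the defaultdict creates any entry.
            gene_dict[attr_dict[gene_id]].append(attr_dict)
        except KeyError:
            skipped += 1  # record has no gene ID attribute: skip it
            continue
    if skipped:
        print(f"Skipped {skipped} record(s) without a '{gene_id}' attribute")
    return dict(gene_dict)


records = [
    {"gene_id": "g1", "transcript_id": "t1"},
    {"gene_id": "g1", "transcript_id": "t2"},
    {"transcript_id": "t3"},  # would raise KeyError without the try/except
]
result = build_gene_dict(records)
print({g: [r["transcript_id"] for r in rs] for g, rs in result.items()})
```

Reporting the number of skipped records keeps the silent skip visible, which helps when checking how many transcripts are missing from the output table.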

Alternatively, I suppose this problem could be fixed in the part of the workflow that converts the GFF file into a .gtf file, by ensuring that every output line has a "gene_id" key-value pair.
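A post-processing step applied to the converted GTF could enforce that guarantee. This is a minimal sketch under loudly stated assumptions: ensure_gene_id is a hypothetical helper, the default policy of copying transcript_id into gene_id (treating an orphan transcript as its own gene) is an illustrative choice rather than anything the pipeline does, and records with neither attribute are simply dropped.

```python
import re
from typing import Iterable, Iterator, Optional

# Matches GTF-style attribute pairs such as: transcript_id "t1";  (assumed format)
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def ensure_gene_id(
    lines: Iterable[str], fallback: Optional[str] = "transcript_id"
) -> Iterator[str]:
    """Yield GTF lines so that every record carries a gene_id attribute.

    Records lacking gene_id get one copied from `fallback` (hypothetical
    policy); records with neither attribute are dropped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # pass comments and blank lines through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed record: drop it
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs:
            yield line
        elif fallback and fallback in attrs:
            # Prepend a gene_id attribute built from the fallback value.
            cols[8] = f'gene_id "{attrs[fallback]}"; ' + cols[8].lstrip()
            yield "\t".join(cols) + "\n"
        # else: neither attribute present, so the record is dropped


gtf = [
    "##gff-version 2\n",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "t2";\n',
    'chr1\tsrc\tregion\t1\t500\t.\t+\t.\tnote "x";\n',
]
for out_line in ensure_gene_id(gtf):
    print(out_line, end="")
```

Passing fallback=None turns this into a pure filter, which matches the behaviour of the try/except patch (records without gene_id are left out) but applies it before salmon_tx2gene.py runs.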

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:
nextflow -c nextflow.config.heim run nf-core/rnaseq --input /home/humebc/projects/20211102_heim/nf_rna_seq/heim_sample_sheet.csv --pseudo_aligner salmon --transcript_fasta /home/humebc/projects/20211102_heim/xenopus_references/GCF_017654675.1_Xenopus_laevis_v10.1_rna.fna.gz --salmon_index /home/humebc/projects/20211102_heim/xenopus_references/xenopus.gentrome.index -profile docker --fasta /home/humebc/projects/20211102_heim/xenopus_references/GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz --gff /home/humebc/projects/20211102_heim/xenopus_references/GCF_017654675.1_Xenopus_laevis_v10.1_genomic.gff.gz
  2. See error:
    See above

Expected behaviour

I would expect the script to complete and produce the salmon_tx2gene.tsv.

nextflow.log

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware:
    linux server
  • Executor:
  • OS:
    linux ubuntu
  • Version
    v3.3

Nextflow Installation

  • Version:
    21.04.1.5556

Container engine

  • Engine:
    Docker
  • Version:
    Docker version 20.10.5, build 55c4c88

Additional context

@didillysquat didillysquat added the bug Something isn't working label Nov 4, 2021
@drpatelh drpatelh added this to the 3.5 milestone Dec 13, 2021
@drpatelh drpatelh added the WIP Work in progress label Dec 13, 2021
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Dec 13, 2021
@drpatelh
Member

Added the try logic in 9c0641a @didillysquat.

The other option, checking during the conversion, will be trickier because we are using gffread to do that rather than a custom script. If you can find a solution with gffread then we can try to add it, but for now this is patched based on your initial suggestion.

Will close for now but feel free to re-open if things change. Cheers!

@didillysquat
Author

Great. Thanks @drpatelh !
