handle KeyError: transcript_id not found #52

Closed
wants to merge 2 commits into from

@ygidtu ygidtu commented Nov 4, 2020

  1. Gene records don't have a transcript_id, so only try to read transcript_id from transcript and exon records.
  2. Wrap the "transcript_id" lookup in a try/except to give a better error message.
  3. Use gzip to handle gzipped GTF files.

I still strongly recommend using pysam to handle BAM file I/O.
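
For illustration, the three changes listed above could be sketched roughly as follows. This is a minimal sketch and not the code from the PR: the helper names `open_gtf` and `transcript_ids` are hypothetical, and the input is assumed to be a standard nine-column, tab-separated GTF with `key "value";` attributes.

```python
import gzip


def open_gtf(path):
    """Open a GTF file as text, handling gzip compression (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def transcript_ids(lines):
    """Yield transcript_id values from transcript and exon records only."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in ("transcript", "exon"):
            continue  # gene records carry no transcript_id
        tags = fields[8]
        # Split each attribute into key and value on the first space only.
        d = dict(kv.strip().split(" ", 1)
                 for kv in tags.strip().strip(";").split("; "))
        try:
            yield d["transcript_id"].strip('"')
        except KeyError:
            raise SystemExit(
                "ERROR: 'transcript_id' attribute is missing in GTF file.")
```

A caller would pass the open file handle directly, for example `for tid in transcript_ids(open_gtf(path)): ...`, where `path` is whatever GTF path the script receives.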

@dgarrimar dgarrimar left a comment

I think this proposal is great, and with it we can solve #2 and #46. I'd just provide a meaningful error message instead of printing the GTF line. I have proposed a couple of lines of code, with a structure analogous to the other error messages in the script.

Comment on lines +293 to +295
except KeyError as e:
    print(line)
    exit(e)

Suggested change
- except KeyError as e:
-     print(line)
-     exit(e)
+ except KeyError:
+     print("ERROR: 'transcript_id' attribute is missing in GTF file.")
+     exit(1)

It may be better not to print the GTF line and instead provide a clearer error message with this structure, analogous to the one used for other error messages in the code.

@dgarrimar dgarrimar self-requested a review December 4, 2020 17:32

@dgarrimar dgarrimar left a comment

I'll take this opportunity to suggest code that fixes #48.

  d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
- transcript_id = d["transcript_id"]
+ try:
+     transcript_id = d["transcript_id"]

Suggested change
- transcript_id = d["transcript_id"]
+ transcript_id = re.findall('transcript_id ("[^"]+")', tags)[0]

Since it affects the same portion of code, I'm also suggesting this line, which fixes #48.
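
As a rough illustration of how the two extraction approaches differ, the sketch below runs both on a made-up attribute string in which one value contains a space. The `tags` value is an assumption chosen for the example; the actual input behind #48 is not shown in this conversation.

```python
import re

# Hypothetical attribute column; the value of gene_name contains a space.
tags = 'gene_id "G1"; transcript_id "T1"; gene_name "my gene";'

# Dict-based parsing as in the diff context: splitting on every space
# produces three pieces for gene_name, so dict() raises ValueError.
try:
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
    dict_result = d["transcript_id"]
except ValueError:
    dict_result = None

# Regex-based extraction from the suggestion: only the transcript_id
# attribute is inspected, and the captured value keeps its quotes.
regex_result = re.findall('transcript_id ("[^"]+")', tags)[0]

# With no transcript_id present, findall returns [] and [0] raises
# IndexError (not KeyError).
try:
    re.findall('transcript_id ("[^"]+")', 'gene_id "G1";')[0]
    missing_error = None
except IndexError as err:
    missing_error = type(err).__name__
```

One consequence of the last case: if the regex line is placed inside the `try` block, a handler written as `except KeyError` would not catch a missing attribute, so `IndexError` would need to be handled as well.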

@dgarrimar

Hi @ygidtu, the modified proposals have been added, and will be part of the next release. Thank you!
