handle KeyError: transcript_id not found #52

ygidtu · 2020-11-04T02:19:28Z

Gene records don't have a transcript_id, so transcript_id is now read only from transcript and exon records.
Added a try/except around the "transcript_id" lookup to give a better error message.
Used gzip to handle gzipped GTF files.

I still strongly recommend using pysam to handle BAM file I/O.
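
A minimal sketch of the gzip handling mentioned in the description, assuming a compressed GTF can be recognised by a `.gz` extension. `open_gtf` is a hypothetical helper name used for illustration, not a function from sashimi-plot.py.

```python
import gzip

def open_gtf(path):
    """Open a GTF file as text, handling gzip compression.

    Hypothetical helper: compression is detected from the file
    extension only, which is an assumption of this sketch.
    """
    if path.endswith(".gz"):
        # "rt" makes gzip yield str lines instead of bytes
        return gzip.open(path, "rt")
    return open(path, "r")
```

The rest of the parsing loop can then iterate over the returned file object without caring whether the input was compressed.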

dgarrimar

I think this proposal is great, and with it we can solve #2 and #46. I'd just provide a meaningful error message instead of printing the GTF line. I have proposed a couple of lines of code, structured like the other error messages in the script.

dgarrimar · 2020-12-04T17:17:05Z

sashimi-plot.py

+                        except KeyError as e:
+                                print(line)
+                                exit(e)


Suggested change

-                        except KeyError as e:
-                                print(line)
-                                exit(e)
+                        except KeyError:
+                                print("ERROR: 'transcript_id' attribute is missing in GTF file.")
+                                exit(1)

It may be better not to print the GTF line and instead give a clearer error message with this structure, analogous to the other error messages in the code.
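
For context, a self-contained sketch of how the suggested handler behaves. The attribute parsing is copied from the diff under review; the wrapper function `get_transcript_id` and the sample attribute strings are hypothetical, added only so the snippet can run on its own.

```python
import sys

def get_transcript_id(tags):
    """Return the transcript_id value from a GTF attribute column.

    Hypothetical wrapper around the lines discussed in this review.
    """
    # Parse 'key "value"; key "value";' pairs into a dict
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
    try:
        return d["transcript_id"]
    except KeyError:
        # Fixed message instead of echoing the offending GTF line
        print("ERROR: 'transcript_id' attribute is missing in GTF file.")
        sys.exit(1)
```

With an attribute column such as `gene_id "G1"; gene_name "X";` the function prints the message and exits with status 1; with `gene_id "G1"; transcript_id "T1";` it returns the value including its double quotes.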

dgarrimar

I'm taking the opportunity to suggest code to fix #48.

dgarrimar · 2020-12-04T17:34:41Z

sashimi-plot.py

                        d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
-                        transcript_id = d["transcript_id"]
+                        try:
+                                transcript_id = d["transcript_id"]


Suggested change

-                                transcript_id = d["transcript_id"]
+                                transcript_id = re.findall('transcript_id ("[^"]+")', tags)[0]

Since it affects the same portion of code, I'm taking the opportunity to suggest this line too, which fixes #48.
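
A small sketch of how the suggested regular expression behaves. The details of #48 are not reproduced in this thread, so the attribute string below is a hypothetical example of one case where the two approaches differ: an attribute value that contains a space.

```python
import re

# Hypothetical attribute column; the gene_name value contains a space
tags = 'gene_id "G1"; transcript_id "T1"; gene_name "my gene";'

# Splitting each key/value pair on " " yields three fields for gene_name,
# so building the dict raises ValueError
try:
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
    split_ok = True
except ValueError:
    split_ok = False

# The regex only looks at the transcript_id attribute; the capture group
# keeps the surrounding double quotes, as the dict-based value did
transcript_id = re.findall('transcript_id ("[^"]+")', tags)[0]
```

Note that when the attribute is absent, `re.findall` returns an empty list, so indexing with `[0]` raises `IndexError` rather than `KeyError`. If this line is combined with the error handling suggested above, the `except` clause would need to account for that.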

dgarrimar · 2021-01-22T14:48:05Z

Hi @ygidtu, the modified proposals have been added, and will be part of the next release. Thank you!

ygidtu added 2 commits November 4, 2020 10:11

handle transcript_id key error

ea04d3a

using gzip handle bgzipped gtf file

c1b0db9

emi80 requested review from abreschi and dgarrimar December 4, 2020 16:07

dgarrimar requested changes Dec 4, 2020

dgarrimar self-requested a review December 4, 2020 17:32

dgarrimar requested changes Dec 4, 2020

ygidtu closed this Dec 7, 2020

dgarrimar mentioned this pull request Jan 21, 2021

KeyError: 'transcript_id' with Ensemble human annotation #2

Closed
