Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draw_fusions.R content extending beyond page #56

Closed
erprateek opened this issue Mar 27, 2020 · 5 comments
Closed

Draw_fusions.R content extending beyond page #56

erprateek opened this issue Mar 27, 2020 · 5 comments

Comments

@erprateek
Copy link

Screen Shot 2020-03-26 at 5 56 00 PM

Hi Sebastian,

Attaching a screenshot of an example plot that extends beyond the page. ABL1 is known to have 12 exons but I guess the plot needs to resize if the first gene covers everything on the page like BCR: 23 exons. Do you know how this can be adjusted?

Here are 2 example rows for this fusion:
BCR ABL1 +/+ +/+ 22:23632600 9:133729451 splice-site splice-site translocation downstream upstream 27 27 7 423 1256 high . . duplicates(21) CATCCAGAGAGAGAAG___AGGGCGAACAAGGGCAGCAAGGCTACGGAGAGGCTGAAGAAGAAGCTGTCGGAGCAGGAGTCACTGCTGCTGCTTATGTCTCCCAGCATGGCCTTCAGGGTGCACAGCCGCAACGGCAAG___AGTTACACGTTCCTGATCTCCTCTGACTATGAGCGTGCAGAGTGGAGGGAGAACATCCGGGAGCAGCAGAAGAAGT___GTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAG___ATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAA|AAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAG___GTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTG in-frame IQREKRANKGSKATERLKKKLSEQESLLLLMSPSMAFRVHSRNGKSYTFLISSDYERAEWRENIREQQKKCFRSFSLTSVELQMLTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSS|kALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHS .
BCR ABL1 +/+ +/+ 22:23632601 9:133729454 splice-site splice-site translocation downstream upstream 1 0 7 368 1256 medium . . duplicates(4) CATCCGGGAGCAGCAGAAGAAGT___GTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAG___ATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAG|CagccactgGatttaagcagagTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAG___GTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGA out-of-frame IREQQKKCFRSFSLTSVELQMLTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSSs|shwi* .

@suhrig
Copy link
Owner

suhrig commented Mar 27, 2020

This may look like a display error at first glance, but it's really not. It's an error in the GENCODE annotation. Let me explain: draw_fusions.R chooses the transcript to draw based on the following criteria in order:

  1. Find a transcript that has a splice site at the fusion breakpoint.
  2. If you are using the develop version: Find the transcript with the highest coverage.
  3. Additional criteria detailed in the manual.

In this case, GENCODE annotates exactly one transcript with a breakpoint at position 9:133729454, namely ENST00000393293.4, so the first rule is triggered. This transcript has only two exons, in contrast to the canonical transcript of ABL1, which has 12. I'm not sure where this transcript comes from. GENCODE has some errors and this may be one of them. The transcript certainly does not look sensible with only two exons and no stop codon. If you want draw_fusions.R to pick a different transcript, you need to delete the erroneous transcript from the GENCODE annotation.

In the future, the issue you are seeing may be alleviated by two improvements that I am currently thinking about implementing in the transcript selection of draw_fusions.R:

  1. I might change the selection to only consider the coverage and ignore all other rules. I will need to test if this has adverse effects, though.
  2. Since the new version of Arriba will provide columns for transcript IDs, I am going to add an option to draw_fusions.R which enables the user to decide if the script should pick the transcript predicted by Arriba or the transcript with the highest coverage. It will then be very easy to force the script to select a certain transcript by means of modifying the fusions.tsv file before passing to the script.

@erprateek
Copy link
Author

erprateek commented Apr 1, 2020

Hi Sebastian,

Based on your suggestion, I have created a python and bash script that can cleanup GENCODE data. This basically removes any transcripts that do not comply with a standard set of features that a protein coding transcript should have. Wanted to share in case anyone else faced this problem.

Archive.zip
(Requires pyranges to read in gtf as dataframe)

I think it will also be helpful to provide Arriba a preferred transcript input file that some users may find useful. The idea being, for clinical use case one may want to pick a different known transcript for a given gene than in a discovery setting. Since you plan to make this seamless between Arriba and draw_fusions.R, this preferred transcript gets selected by Arriba and then can be passed on to draw_fusions.R as well. Let me know what you think

@suhrig
Copy link
Owner

suhrig commented Apr 1, 2020

Thanks for the script! I will have a look at it. Currently, I have implemented two options:

  1. pick the transcript with the highest coverage (and matching splice site)
  2. pick the transcript provided in the fusions file

I am thinking about a third option (pick the canonical transcript). It will be helpful for me to see what criteria you use to select the transcript. Thanks!

@erprateek
Copy link
Author

I can speak for myself here and say I am really looking forward for #2. This really brings out the WYSIWYG behavior. Wondering if you thought about the canonical transcript option for Arriba itself. That way, again, the behavior is maintained and one does not see a disconnect between the tool output and the plot. Just a thought.

@suhrig
Copy link
Owner

suhrig commented Oct 10, 2020

Closing this one, too, as explained in issue #51.

@suhrig suhrig closed this as completed Oct 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants