Making transcript identity assignments with adaptive exon searching #22

dewyman · 2018-04-16T23:13:43Z

This is a fairly difficult problem. The first version I implemented was greedy: for every query exon in the SAM transcript, it chose the most closely matched annotated exon. Here is an example that illustrates why this doesn't work.

Transcript c33255/f1p0/3238: The exon coordinates are as follows:
827670-827775
829003-829104
847654-847806
849484-849602
851927-852110
852671-852766
853391-853424
853474-855121
856449-857242
When we try to match the very first exon, the greedy match is 827669-827775, which is found only in annotated transcript ENST00000609139.5. Visually, we can see in the UCSC genome browser shot below that there were other options. For instance, the first four exons of ENST00000608189.4 (second transcript) match the query.
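To make the failure mode concrete, here is a minimal sketch of greedy per-exon matching. It is not the actual implementation. The coordinates and the transcript names `T1` and `T2` are made-up toy values, not the real annotation, and `greedy_match` is a hypothetical helper that scores a candidate by the summed distance between its endpoints and the query's.

```python
def greedy_match(query_exon, annotated_exons):
    """Return the annotated exon whose endpoints are closest to the query exon."""
    q_start, q_end = query_exon
    return min(
        annotated_exons,
        key=lambda e: abs(e[0] - q_start) + abs(e[1] - q_end),
    )

# Toy annotation (hypothetical): exon (start, end) -> transcripts containing it.
annotation = {
    (99, 200): {"T1"},   # closest to the first query exon, but only in T1
    (95, 200): {"T2"},   # slightly worse match, but part of T2
    (300, 400): {"T2"},  # exact match for the second query exon
}
query = [(100, 200), (300, 400)]

# Greedy: pick the single best exon for each query exon independently.
chosen = [greedy_match(q, annotation) for q in query]

# Transcripts consistent with every greedy choice.
consistent = set.intersection(*(annotation[e] for e in chosen))
```

In this toy case the greedy choice for the first exon commits to `T1`, so `consistent` ends up empty even though `T2` has a plausible exon for every query exon.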

dewyman · 2018-04-17T00:05:17Z

Here is another special case that I found: c10882/f1p1/3230. This situation suggests that we may need to impose a similarity requirement on exon matches: if the difference at either the 5' end or the 3' end exceeds 10 base pairs, the exons are not a match.
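A possible form of that similarity requirement, as a sketch only: the function name `is_match`, the `(start, end)` tuple representation, and the constant name are assumptions for illustration. Only the 10 bp threshold comes from the comment above.

```python
MAX_END_DIFF = 10  # base pairs; threshold proposed in the comment above

def is_match(query_exon, annotated_exon, max_diff=MAX_END_DIFF):
    """Return True if both exon ends differ by at most max_diff base pairs."""
    q_start, q_end = query_exon
    a_start, a_end = annotated_exon
    return (
        abs(q_start - a_start) <= max_diff
        and abs(q_end - a_end) <= max_diff
    )

# First exon of c33255/f1p0/3238 against its greedy match from the annotation.
close_enough = is_match((827670, 827775), (827669, 827775))
```

The check is symmetric in the two ends, so it does not need to know the strand to decide which coordinate is the 5' end and which is the 3' end.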

dewyman · 2018-04-17T06:46:18Z

Solution: Loose exon assignments and transcript pool approach

For each exon in query transcript:
- Fetch annotated exons that overlap with query exon
- match_pool := transcripts that contain these exons
- transcript_pool := intersection(match_pool, previous transcript_pool)

By the end, the only transcripts left in the transcript pool are those that contain an overlapping exon for every exon in the query transcript. A final cross-check of the total exon counts in the query and the annotation ensures that we don't report a transcript that has more exons than the query.
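The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the actual TALON code: the annotation is assumed to be a plain dict from transcript ID to a list of `(start, end)` exons, the names `overlaps` and `candidate_transcripts` are hypothetical, and the toy coordinates are invented. A real implementation would likely use an interval tree or a database query rather than a linear scan.

```python
def overlaps(a, b):
    """Return True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def candidate_transcripts(query_exons, annotation):
    """Return annotated transcripts that overlap every query exon.

    annotation: dict mapping transcript ID -> list of (start, end) exons.
    """
    pool = None
    for q in query_exons:
        # Transcripts with at least one exon overlapping this query exon.
        match_pool = {
            tid for tid, exons in annotation.items()
            if any(overlaps(q, e) for e in exons)
        }
        # Intersect with the pool carried over from previous exons.
        pool = match_pool if pool is None else pool & match_pool
        if not pool:
            break  # no transcript can satisfy all exons
    pool = pool or set()
    # Final cross-check: drop transcripts with more exons than the query.
    return {tid for tid in pool if len(annotation[tid]) <= len(query_exons)}

# Toy annotation (hypothetical coordinates and IDs).
annotation = {
    "T1": [(99, 200)],
    "T2": [(95, 200), (300, 400)],
    "T3": [(95, 200), (300, 400), (500, 600)],
}
result = candidate_transcripts([(100, 200), (300, 400)], annotation)
```

Here `T1` drops out because it has nothing overlapping the second query exon, and `T3` is removed by the exon-count check, leaving only `T2`.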

dewyman added planning build labels Apr 16, 2018

dewyman added this to the TALON: Exon-based comparison setup milestone Apr 16, 2018

dewyman self-assigned this Apr 16, 2018

dewyman closed this as completed Apr 17, 2018
