filtering low-identity regions of alignments #7

ekg · 2019-07-10T09:39:27Z

I'd like to remove parts of alignments that have low identity. The idea would be to take a longer alignment and break it into multiple alignments, removing regions where the identity drops below some threshold over a window of a given length. This would have to work on alignments that carry cigar strings.
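A minimal Python sketch of the windowed split described above. It assumes an extended CIGAR using only `=`, `X`, `I` and `D` (a plain `M` hides mismatches, so identity cannot be measured from it), defines the identity of a window as the number of `=` columns divided by the window length, and treats `I` as consuming query and `D` as consuming target, as in PAF `cg:Z:` tags. `split_alignment`, `window` and `min_identity` are hypothetical names, not an existing fpa API.

```python
import re
from itertools import groupby

# Only =/X/I/D are accepted: identity needs '=' (match) vs 'X' (mismatch).
_CIGAR_FULL = re.compile(r"(?:\d+[=XID])+")
_CIGAR_OP = re.compile(r"(\d+)([=XID])")


def expand_cigar(cigar):
    """Expand e.g. '3=1X' into ['=', '=', '=', 'X'], one op per alignment column."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"need an =/X/I/D CIGAR, got {cigar!r}")
    cols = []
    for n, op in _CIGAR_OP.findall(cigar):
        cols.extend(op * int(n))
    return cols


def compress(cols):
    """Inverse of expand_cigar: run-length encode a list of ops."""
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(cols))


def low_identity_mask(cols, window, min_identity):
    """True for every column covered by a window whose identity is below min_identity.

    If the alignment is shorter than `window`, the whole alignment is one window.
    """
    n = len(cols)
    w = min(window, n)
    if w == 0:
        return []
    prefix = [0]  # prefix[i] = number of '=' columns before column i
    for op in cols:
        prefix.append(prefix[-1] + (op == "="))
    diff = [0] * (n + 1)  # difference array marking covered column ranges
    for i in range(n - w + 1):
        if (prefix[i + w] - prefix[i]) / w < min_identity:
            diff[i] += 1
            diff[i + w] -= 1
    mask, covered = [], 0
    for i in range(n):
        covered += diff[i]
        mask.append(covered > 0)
    return mask


def split_alignment(cigar, window, min_identity):
    """Return (q_start, q_end, t_start, t_end, sub_cigar) for each kept piece.

    Offsets are relative to the start of the alignment on each sequence.
    """
    cols = expand_cigar(cigar)
    bad = low_identity_mask(cols, window, min_identity)
    q_at, t_at = [0], [0]  # query/target offsets before each column
    for op in cols:
        q_at.append(q_at[-1] + (op != "D"))
        t_at.append(t_at[-1] + (op != "I"))
    pieces, i, n = [], 0, len(cols)
    while i < n:
        if bad[i]:
            i += 1
            continue
        j = i
        while j < n and not bad[j]:
            j += 1
        a, b = i, j
        while a < b and cols[a] in "ID":  # a piece should not start with an indel
            a += 1
        while b > a and cols[b - 1] in "ID":  # ... nor end with one
            b -= 1
        if a < b:
            pieces.append((q_at[a], q_at[b], t_at[a], t_at[b], compress(cols[a:b])))
        i = j
    return pieces
```

For example, `split_alignment("20=10X20=", window=10, min_identity=0.5)` returns `[(0, 16, 0, 16, '16='), (34, 50, 34, 50, '16=')]`. Note that the windowed test also trims a few good columns next to the low-identity block, because every column inside a failing window is discarded; the window length controls how much is lost.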

The goal is to provide a controllable limit on the collapse of diverged regions of sequences in graphs built from PAF-based alignments. Applying this filter should give the graph more large bubbles and make it more "open", but with fewer small bubbles.

natir · 2019-07-10T13:44:48Z

I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.

To check that I understand the idea of the filter, take this overlap:

A 100 50 100 + B 100 0 50 10 50 50 255 cg:Z:15I10X15I

fpa must either split this overlap and output its two "good" parts, or just filter the overlap out.
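A sketch of the two outcomes described above (splitting versus filtering out) for a PAF record whose alignment has already been cut into pieces. It assumes tab-separated PAF, the `+` strand only, and pieces given as offsets relative to the alignment start plus an `=`/`X`/`I`/`D` sub-CIGAR. `rewrite_record`, `cigar_stats` and the `mode` values are hypothetical, not part of fpa. The test record is also hypothetical: its CIGAR `20=10X20=` places a low-identity block between two exact-match blocks. Optional tags other than `cg:Z:` are not carried over to split records.

```python
import re


def cigar_stats(cigar):
    """Return (matching bases, block length) for an =/X/I/D CIGAR (PAF columns 10 and 11)."""
    matches = block = 0
    for n, op in re.findall(r"(\d+)([=XID])", cigar):
        block += int(n)
        if op == "=":
            matches += int(n)
    return matches, block


def rewrite_record(line, pieces, mode="split"):
    """Turn one PAF line plus its kept pieces into zero or more PAF lines.

    mode 'split': emit one record per piece, with shifted coordinates.
    mode 'drop':  keep the original record only if a single piece covers
                  the whole alignment; otherwise emit nothing.
    """
    f = line.rstrip("\n").split("\t")
    if f[4] != "+":
        raise NotImplementedError("this sketch handles the '+' strand only")
    qs, qe, ts, te = int(f[2]), int(f[3]), int(f[7]), int(f[8])
    if mode == "drop":
        whole = len(pieces) == 1 and pieces[0][:4] == (0, qe - qs, 0, te - ts)
        return [line.rstrip("\n")] if whole else []
    if mode != "split":
        raise ValueError(f"unknown mode {mode!r}")
    out = []
    for q0, q1, t0, t1, cg in pieces:
        matches, block = cigar_stats(cg)
        fields = [f[0], f[1], str(qs + q0), str(qs + q1), f[4],
                  f[5], f[6], str(ts + t0), str(ts + t1),
                  str(matches), str(block), f[11], f"cg:Z:{cg}"]
        out.append("\t".join(fields))
    return out
```

The `drop` mode is the conservative option: it never emits a partial overlap, so it cannot create new overlap ends, at the cost of losing the good flanks entirely.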

ekg · 2019-07-10T14:00:15Z

> I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.

I understand. I appreciate that this is a new direction for fpa, as you haven't been working with these strings before. On my side, I can't really work without the cigar strings.

> To check that I understand the idea of the filter, take this overlap:
> fpa must either split this overlap and output its two "good" parts, or just filter the overlap out.

That'd be the idea. No worries if this isn't trivial for you to do or isn't useful for your work. I can implement the modifier in another context.

natir · 2019-07-10T14:18:51Z

At the moment my parser ignores the optional fields of the PAF, and it would take time to adapt it and to write a cigar string parser.

This feature seems very interesting and important to me, but it requires a lot of code to be written and unfortunately I don't have time to write it yet.

If you want to have this behaviour quickly, you may have to develop it yourself.
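The parser work mentioned above (reading the PAF optional fields and parsing the cigar string) could look roughly like the following Python sketch. It assumes tab-separated PAF with the 12 mandatory columns followed by SAM-style `TAG:TYPE:VALUE` fields; `i` and `f` values are converted to numbers and other types (`A`, `Z`, `H`, `B`) are kept as strings. `parse_paf_line` and `parse_cigar` are hypothetical names, not fpa's parser.

```python
import re

PAF_COLUMNS = ("qname", "qlen", "qstart", "qend", "strand",
               "tname", "tlen", "tstart", "tend", "matches", "block_len", "mapq")
_INT_COLUMNS = {"qlen", "qstart", "qend", "tlen", "tstart", "tend",
                "matches", "block_len", "mapq"}
_TAG = re.compile(r"([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_paf_line(line):
    """Parse one PAF line into a dict; optional fields go into rec['tags']."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    rec = {}
    for name, value in zip(PAF_COLUMNS, fields[:12]):
        rec[name] = int(value) if name in _INT_COLUMNS else value
    tags = {}
    for raw in fields[12:]:
        m = _TAG.fullmatch(raw)
        if m is None:
            raise ValueError(f"malformed optional field: {raw!r}")
        tag, typ, val = m.groups()
        if typ == "i":
            val = int(val)
        elif typ == "f":
            val = float(val)
        tags[tag] = val
    rec["tags"] = tags
    return rec


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
```

A record without a `cg` tag simply has no `"cg"` key in `rec["tags"]`, so a filter built on top of this can skip or pass through such records instead of failing, which matters since cigar strings are not present in every PAF file.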
