filtering low-identity regions of alignments #7

Open
ekg opened this issue Jul 10, 2019 · 3 comments


ekg commented Jul 10, 2019

I'd like to remove parts of alignments that have low identity. The idea would be to take a longer alignment and break it into multiple alignments, removing regions where the identity drops below some threshold over a window of a given length. This would have to work on top of alignments with cigar strings.
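As a rough illustration of the windowed filter described above, here is a minimal sketch. It assumes the CIGAR has already been parsed into `(length, op)` pairs and uses the extended `=`/`X` operators, since a plain `M` does not say whether a column matches. The names `low_identity_mask` and `kept_segments`, and the `window` and `min_identity` parameters, are hypothetical and not part of fpa.

```python
from typing import List, Tuple


def low_identity_mask(ops: List[Tuple[int, str]], window: int,
                      min_identity: float) -> List[bool]:
    """Flag each alignment column covered by a window whose identity is too low."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # One entry per alignment column: 1 for a match ('='), 0 for anything else.
    cols: List[int] = []
    for length, op in ops:
        cols.extend([1 if op == "=" else 0] * length)
    n = len(cols)
    if n == 0:
        return []
    # Alignments shorter than the window are scored as a single window.
    w = min(window, n)
    bad = [False] * n
    matches = sum(cols[:w])
    for start in range(n - w + 1):
        if matches / w < min_identity:
            for i in range(start, start + w):
                bad[i] = True
        # Slide the window one column to the right.
        if start + w < n:
            matches += cols[start + w] - cols[start]
    return bad


def kept_segments(bad: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, end) that were not flagged."""
    segments = []
    start = None
    for i, flag in enumerate(bad):
        if not flag and start is None:
            start = i
        elif flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(bad)))
    return segments


mask = low_identity_mask([(15, "="), (10, "X"), (15, "=")], window=10, min_identity=0.8)
print(kept_segments(mask))  # [(0, 8), (32, 40)]
```

Because every column of a failing window is flagged, the kept segments are trimmed a little into the well-matching flanks. Flagging only the window centre would be a less aggressive alternative.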

The goal is to provide a controllable limit on collapse between diverged regions of sequences in graphs that are built from PAF-based alignments. Applying this filter should give the graph more large bubbles and make it more "open", with fewer small bubbles.

Owner

natir commented Jul 10, 2019

I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.

To understand the idea of the filter, for this overlap:

A 100 50 100 + B 100 0 50 10 50 50 255 cg:Z:15I10X15I

fpa must either split this overlap and output its two "good" parts, or just filter the overlap out.
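The "split" option could look roughly like the sketch below. It is only an illustration under several assumptions: forward-strand alignments, a CIGAR limited to the `M`, `=`, `X`, `I` and `D` operators, and a hypothetical rule that any single `X`, `I` or `D` run of at least `min_bad_len` columns breaks the alignment. The function name `split_alignment` and the coordinates and CIGAR in the usage line are made up for the example.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([M=XID])")
QUERY_OPS = set("M=XI")   # operators that consume query bases
TARGET_OPS = set("M=XD")  # operators that consume target bases


def split_alignment(cigar: str, qstart: int, tstart: int,
                    min_bad_len: int) -> List[Tuple[int, int, int, int, str]]:
    """Split at long X/I/D runs; return (qstart, qend, tstart, tend, cigar) pieces."""
    pieces = []
    q, t = qstart, tstart
    piece_q, piece_t = q, t
    piece_ops: List[str] = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        is_break = op in "XID" and length >= min_bad_len
        if is_break and piece_ops:
            # Close the piece that ends just before the bad run.
            pieces.append((piece_q, q, piece_t, t, "".join(piece_ops)))
            piece_ops = []
        if op in QUERY_OPS:
            q += length
        if op in TARGET_OPS:
            t += length
        if is_break:
            # The next piece starts right after the bad run.
            piece_q, piece_t = q, t
        else:
            piece_ops.append(f"{length}{op}")
    if piece_ops:
        pieces.append((piece_q, q, piece_t, t, "".join(piece_ops)))
    return pieces


print(split_alignment("15=10X15=", qstart=50, tstart=0, min_bad_len=5))
# [(50, 65, 0, 15, '15='), (75, 90, 25, 40, '15=')]
```

An empty result would correspond to the "just filter out" case, where nothing outside the bad runs is left to report.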

Author

ekg commented Jul 10, 2019

> I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.

I do understand you. I appreciate this is a new direction for fpa, as you haven't worked with these strings before. On my side, I can't really work without the cigar strings.

> To understand the idea of the filter, for this overlap:
> fpa must either split this overlap and output its two "good" parts, or just filter the overlap out.

That'd be the idea. No worries if this isn't something trivial for you to do or useful for your work. I can implement the modifier in another context.

Owner

natir commented Jul 10, 2019

At the moment my parser ignores the optional fields of the PAF, and it would take time to adapt it and write a cigar string parser.
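For a sense of the work involved, here is a minimal sketch of reading the optional `cg:Z:` field and parsing the CIGAR. It is written in Python rather than in fpa's own code, assumes tab-separated PAF with the 12 mandatory columns followed by SAM-style `TAG:TYPE:VALUE` fields, and uses hypothetical names (`parse_paf_line`, `parse_cigar`). The record in the usage lines is made up.

```python
import re
from typing import Dict, List, Tuple

TAG = re.compile(r"^([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)$")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_paf_line(line: str) -> Dict[str, object]:
    """Parse the 12 mandatory PAF columns and collect optional tags by name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a PAF record needs at least 12 columns")
    record: Dict[str, object] = {
        "query_name": fields[0],
        "query_len": int(fields[1]),
        "query_start": int(fields[2]),
        "query_end": int(fields[3]),
        "strand": fields[4],
        "target_name": fields[5],
        "target_len": int(fields[6]),
        "target_start": int(fields[7]),
        "target_end": int(fields[8]),
        "matches": int(fields[9]),
        "block_len": int(fields[10]),
        "mapq": int(fields[11]),
    }
    tags: Dict[str, str] = {}
    for field in fields[12:]:
        m = TAG.match(field)
        if m:  # silently skip anything that is not TAG:TYPE:VALUE
            tags[m.group(1)] = m.group(3)
    record["tags"] = tags
    return record


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Turn a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


rec = parse_paf_line("A\t100\t50\t90\t+\tB\t100\t0\t40\t30\t40\t255\tcg:Z:15=10X15=")
cigar = rec["tags"].get("cg")  # None when the file carries no CIGAR
print(parse_cigar(cigar) if cigar else "no cigar")
```

Looking the tag up with `.get("cg")` keeps records without a CIGAR usable, so a CIGAR-based filter can be skipped for files that do not provide one.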

This feature seems very interesting and important to me, but it requires a lot of code to be written and unfortunately I don't have time to write it yet.

If you want to have this behaviour quickly, you may have to develop it yourself.
