N-containing reads never align, so it can appear that we never succeed in including them #167

Closed
ekg opened this issue Nov 30, 2015 · 4 comments

Comments

@ekg
Member

ekg commented Nov 30, 2015

This demonstrates a failure (via msga) of path editing:

test git:(master) ✗ (for seq in GRCh38_alts/FASTA/HLA/A-3105.fa; do time vg msga -f $seq -B 256 -k 22 -K 11 -X 1 -E 4 -Q 22 -D >hla/$(basename $seq .fa).vg; done )
loading GRCh38_alts/FASTA/HLA/A-3105.fa
preparing initial graph
building xg index
building GCSA2 index
gi|157734152:29655295-29712160: adding to graph, attempt 1
gi|157734152:29655295-29712160: aligning sequence of 56866bp against 2585 nodes
gi|157734152:29655295-29712160: editing graph
gi|157734152:29655295-29712160: normalizing graph and node size
gi|157734152:29655295-29712160: sorting and compacting ids
building xg index
building GCSA2 index
testing inclusion of gi|157734152:29655295-29712160
edit failed! {"rank": 533, "position": {"node_id": 533}, "edit": [{"from_length": 8, "to_length": 8}, {"to_length": 128, "sequence": "ATGTGTCTCTGGCAGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAATGTTTTTGTGATAACTCAGATTGC"}]} is not a simple match!

I've attached the graph and the alignment generating this at this point in the process.

failed-edit.gam.json.txt
failed-edit.vg.json.txt

[Image: screenshot from 2015-11-30 10 37 09]

Not clear what's going on, but now it's testable.

@ekg
Member Author

ekg commented Nov 30, 2015

The problem is that the mapper cannot align sequences that contain too many Ns! As a result, we can include these particular edits in the graph, but no later alignment will match them exactly. This breaks path labeling.
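
To make the failure concrete, here is a minimal Python sketch of the check implied by the `edit failed!` message. It assumes a mapping counts as a "simple match" only when every edit has equal `from_length` and `to_length` and carries no `sequence`; that reading is inferred from the JSON in the log above, not taken from the vg source. The helper names and the shortened example mappings are hypothetical.

```python
def is_simple_match(mapping):
    """Return True if every edit is a plain match (assumed definition)."""
    for edit in mapping.get("edit", []):
        same_length = edit.get("from_length", 0) == edit.get("to_length", 0)
        if not same_length or edit.get("sequence"):
            return False
    return True


def n_fraction(mapping):
    """Fraction of the bases carried in edit sequences that are N."""
    seq = "".join(e.get("sequence", "") for e in mapping.get("edit", []))
    return seq.upper().count("N") / len(seq) if seq else 0.0


# Shortened, made-up mappings with the same shape as the one in the log.
failing = {
    "position": {"node_id": 533},
    "edit": [
        {"from_length": 8, "to_length": 8},
        {"to_length": 12, "sequence": "ACGNNNNNNTGA"},
    ],
}
clean = {
    "position": {"node_id": 533},
    "edit": [{"from_length": 8, "to_length": 8}],
}

print(is_simple_match(failing), n_fraction(failing))
print(is_simple_match(clean), n_fraction(clean))
```

A mapping that still carries an N-rich insertion after re-alignment fails the check, which is the situation reported above.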

So we can take the failed alignment and include it, yielding:

vg mod -i 'gi|157734152:29655295-29712160-failed-edit.gam' 'gi|157734152:29655295-29712160-failed-edit.vg'

[Image: screenshot from 2015-11-30 11 11 59]

Notice, though, that these alternative paths are exactly equivalent, so a round of normalization will merge them back together.

Then, if we align again, we'll see that the N-containing fragment does not map, and conclude that our editing process failed because we can't get a perfect walk through the graph to match our input sequence.
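
The inclusion test can be pictured as asking whether any walk through the graph spells the input sequence exactly. The following toy Python sketch shows that idea on a hand-made graph. It is not how vg performs the test (vg re-aligns through the mapper, which is where the N-rich fragment drops out); the graph, node ids, and function name are hypothetical, and node sequences are assumed to be non-empty.

```python
def spells(nodes, edges, start, target):
    """Return True if some walk beginning at `start` spells `target` exactly."""
    stack = [(start, 0)]  # (node id, offset in target where this node begins)
    while stack:
        node, offset = stack.pop()
        seq = nodes[node]
        if target[offset:offset + len(seq)] != seq:
            continue  # this node does not match the target at this offset
        end = offset + len(seq)
        if end == len(target):
            return True
        for nxt in edges.get(node, []):
            stack.append((nxt, end))
    return False


# Toy graph: node 2 is an N-only branch parallel to node 3.
nodes = {1: "ATG", 2: "NNNN", 3: "C", 4: "GCA"}
with_n_branch = {1: [2, 3], 2: [4], 3: [4]}
without_n_branch = {1: [3], 3: [4]}

print(spells(nodes, with_n_branch, 1, "ATGNNNNGCA"))
print(spells(nodes, without_n_branch, 1, "ATGNNNNGCA"))
```

If the N-containing branch is not represented in the graph, or cannot be reached by the test, no walk spells the input and the inclusion check reports a failure.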

@ekg changed the title from "some path inclusion is still broken" to "N-containing reads never align, so it can appear that we never succeed in including them" on Nov 30, 2015
@ekg
Member Author

ekg commented Nov 30, 2015

Two possible solutions.

  1. In banded alignment, attempt alignment of unaligned regions once their mates indicate a likely best alignment.
  2. Include the paths directly in editing. The fixes to path handling make this feasible, as paths are preserved during node splitting, end-to-end merging, and homologous merging (as in normalization).

The latter looks somewhat attractive. However, it isn't easy, because the editing semantics need to be adjusted to keep track of the reference-matching bits and also to tag the additional pieces that the edits add.
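
As a rough illustration of why option 2 depends on paths surviving graph edits, the toy Python sketch below splits a node and rewrites every path that visits it so that the spelled sequence is unchanged. It models paths only as forward-strand lists of node ids; the data layout and function names are hypothetical and far simpler than vg's mappings and edits.

```python
def split_node(nodes, paths, node_id, offset, new_id):
    """Split `node_id` at `offset`; the right-hand part becomes `new_id`."""
    seq = nodes[node_id]
    if not 0 < offset < len(seq):
        raise ValueError("offset must fall strictly inside the node")
    if new_id in nodes:
        raise ValueError("new_id is already in use")
    nodes[node_id], nodes[new_id] = seq[:offset], seq[offset:]
    # Every visit to the old node becomes a visit to both halves, in order.
    for name in list(paths):
        updated = []
        for step in paths[name]:
            updated.append(step)
            if step == node_id:
                updated.append(new_id)
        paths[name] = updated


def path_sequence(nodes, walk):
    """Concatenate node sequences along a walk."""
    return "".join(nodes[n] for n in walk)


nodes = {1: "ATGTGTCT", 2: "CTGG"}
paths = {"ref": [1, 2]}
before = path_sequence(nodes, paths["ref"])
split_node(nodes, paths, node_id=1, offset=3, new_id=3)
after = path_sequence(nodes, paths["ref"])
print(paths["ref"], before == after)
```

The sketch leaves out the harder part named above: keeping track of which pieces match the reference and tagging the pieces that an edit adds.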

@ekg
Member Author

ekg commented Nov 30, 2015

Both solutions could be implemented. Neither would hurt. The concern with 1 is that there may still be parts of the graph that can't be aligned to, whereas this problem "goes away" in 2. However... we still need to test for path inclusion if we want to debug graph construction. So. Hm.
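
One loose way to picture option 1 is to project where an unaligned band should land from the bands next to it that did align. The Python sketch below does this in plain linear coordinates, which is a strong simplification of alignment against a graph. It assumes bands are contiguous and gap-free, and the function name and example numbers are hypothetical; nothing here reflects vg's actual banded alignment code.

```python
def infer_band_starts(band_lengths, starts):
    """Fill None entries in `starts` from the nearest aligned neighbour."""
    inferred = list(starts)
    # Forward pass: project each unaligned band from the band before it.
    for i in range(1, len(inferred)):
        if inferred[i] is None and inferred[i - 1] is not None:
            inferred[i] = inferred[i - 1] + band_lengths[i - 1]
    # Backward pass: fill leading gaps from the band after them.
    for i in range(len(inferred) - 2, -1, -1):
        if inferred[i] is None and inferred[i + 1] is not None:
            inferred[i] = inferred[i + 1] - band_lengths[i]
    return inferred


# Four bands of 256 bp; the middle two (say, N-rich) failed to align.
lengths = [256, 256, 256, 256]
observed = [1000, None, None, 1768]
print(infer_band_starts(lengths, observed))
```

The projected positions would only say where to attempt a targeted alignment of the unaligned bands; whether that alignment then succeeds for N-rich sequence is a separate question.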

@ekg
Member Author

ekg commented Mar 24, 2016

This has been resolved for some time.

@ekg closed this as completed on Mar 24, 2016