-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong output for two ~5Mb long sequences #7
Comments
Hi, You are not doing anything wrong. Probably, it is my fault. I decided to set the heuristic
In this case, I am getting a
Let me know if that is what you expect or if the score difference (123502-123585) is a bug, and I should patch it. That decision -- which I honestly don't like -- was made because many people tend to compare the full-WFA against other methods based on heuristics (e.g., banded, drops, ...) and dismiss it. In many cases, people fail to understand that the default is an exact method that can take time significantly more time than a heuristic one. Revisiting this decision, I should put a warning on the README and, I don't know, maybe switching the default back to full-WFA. PS. The alignment. |
Thanks. This fixed the issue (lh3/lv89@7a740f7). 123585 is the correct edit distance for global alignment. 123502 is for extension alignment. I mildly prefer to make the exact mode the default. The heuristic is more likely to fail on real data when we have long tandem duplications. One possible way is to add an explicit parameter to the main API and require the users to enable/disable heuristics: wavefront_align(wavefront_aligner_t*, char*, int, char*, int heuristic_strategy); PS: after a second thought, I feel adding an explicit parameter might not be a good idea because other parameters like type of alignment are more important. |
I am using WFA2-lib as a library to align two MHC sequences. The edit distance should be 123502 but WFA2 gives 2164110 in very short time. The two sequences are MHC-00GRCh38 and MHC-CHM13 from file "MHC-61.agc" at this Zenodo record. The code calling WFA2 is here:
https://github.com/lh3/lv89/blob/a16c30bba590e7de734edbdeeab55e4849e9ed20/main.c#L65-L71
PS: the output is correct for two ~100k sequences.
The text was updated successfully, but these errors were encountered: