explain ref agree normalization rules #381
Did y'all (i.e., @ahwagner, @andreasprlic) see that I addressed #377 in this PR a few weeks ago?
Thanks for the ping on this thread @reece, I missed the original.
I think these changes accurately represent prior discussion on these points. I do think that the distinction between Allele and variant is artificial, particularly as we do not normalize reference-agree and reference-disagree Alleles the same. In addition, we (for well-justified reasons) represent Alleles as a state on a reference sequence, inherently comparing the state to some underlying reference.
To be very clear about what I mean, for two (short) reference sequences:
Reference 1: AACTA
Reference 2: AACGA
A resultant sequence of AACCA may be expressed as two different Alleles, conceptually:
- A `C` at interresidue coordinates `<3,4>` on Reference 1
- A `C` at interresidue coordinates `<3,4>` on Reference 2
Each of these results in a different digest, as the information of the reference sequence is inherent to the representation of each Allele. The primary distinction between Allele and similar entities from other major variant formats is that we:
- Do not represent the reference state, as it is retrievable
- Do not provide a special character representation (e.g. `=`) of the reference-match state
I am not advocating that we compute resultant sequence to create digests; the process is far too cumbersome to be practical in implementation and the gains would be nearly non-existent (i.e. analogous reference sequences are too dissimilar for the above scenario to realistically occur). But I think we should revise the text here to better capture the notion of allele as a reference-match agnostic state instead of a reference-sequence agnostic state. The edits I have suggested here reflect this subtlety.
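To make the two-reference example concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the dictionary-based Allele layout, the reference names `ref1` and `ref2`, and the helpers `make_allele`, `toy_digest`, `resultant_sequence`, and `is_reference_agree` are hypothetical, and `toy_digest` is a stand-in hash over a JSON serialization, not the project's actual digest algorithm.

```python
import hashlib
import json

# Hypothetical reference store; "ref1" and "ref2" stand in for
# Reference 1 and Reference 2 from the example above.
REFERENCES = {"ref1": "AACTA", "ref2": "AACGA"}


def make_allele(ref_id, start, end, state):
    """Build a toy Allele: a state at an interresidue interval on a reference."""
    return {"reference": ref_id, "start": start, "end": end, "state": state}


def toy_digest(allele):
    """Stand-in digest (NOT the real algorithm): hash a canonical JSON form."""
    blob = json.dumps(allele, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(blob.encode("utf-8")).hexdigest()[:24]


def resultant_sequence(allele):
    """Apply the Allele's state to its reference and return the full sequence."""
    ref = REFERENCES[allele["reference"]]
    return ref[: allele["start"]] + allele["state"] + ref[allele["end"] :]


def is_reference_agree(allele):
    """True when the state matches the reference at the Allele's interval."""
    ref = REFERENCES[allele["reference"]]
    return ref[allele["start"] : allele["end"]] == allele["state"]


a1 = make_allele("ref1", 3, 4, "C")
a2 = make_allele("ref2", 3, 4, "C")

# Same resultant sequence, but the digests differ because the
# reference identifier is part of each Allele's representation.
same_result = resultant_sequence(a1) == resultant_sequence(a2)
different_digest = toy_digest(a1) != toy_digest(a2)
```

Under these assumptions, both Alleles resolve to the same resultant sequence while producing different digests. Reference agreement is derived by looking up the reference rather than being stored on the Allele, which mirrors the "reference-match agnostic" framing.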
Looks good to me (but I'm not a reviewer and am totally powerless! Yay!)
Changes applied per 👍 from @reece
Anyone want to hit the merge button?
Finally got around to Larry's requests for explaining ref agree normalization rules.
Instead of writing a new section, I overhauled an existing related section (`should-normalize`).