how to handle IUPAC ambiguous codes #11

Open
lituan opened this issue Jul 21, 2017 · 0 comments

lituan commented Jul 21, 2017

I read a paragraph from the tmap-book, but I do not understand the phrase "lexicographically smallest DNA base" or why R is converted to C. Can you explain this?

Ambiguous IUPAC codes in the reference/target FASTA will be converted to the lexicographically smallest DNA base that is not compatible to the IUPAC code to ensure minimum reference bias. For example, an IUPAC base R, which represents an A or a G, will be converted to a C. All Ns in the reference will be converted to As. Furthermore, any non-IUPAC character will be treated as an N. The ambiguity codes will only be re-considered when calculating the NM and MD SAM record optional tags.
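One way to read the quoted rule: "lexicographically smallest" means the first base in alphabetical order A < C < G < T. R stands for A or G, so both A and G are compatible with it; walking A, C, G, T, the first base that is *not* compatible is C. Picking a base that matches neither possibility means reads carrying an A or a G are not favored over each other, which is the "minimum reference bias" the paragraph mentions. N is compatible with every base, so no incompatible base exists and the text sends it to A instead. Below is a minimal sketch of that reading, assuming case-insensitive input and leaving unambiguous bases unchanged. `IUPAC` and `convert_reference_base` are hypothetical names; this is not tmap's actual source code.

```python
# Hypothetical sketch of the rule quoted from the tmap-book; not tmap's real code.
# Each IUPAC nucleotide code maps to the set of DNA bases it represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def convert_reference_base(ch: str) -> str:
    """Return the base that replaces one reference character."""
    code = ch.upper()  # assumption: lowercase input is treated like uppercase
    if code not in IUPAC:
        code = "N"  # any non-IUPAC character is treated as an N
    if code == "N":
        return "A"  # N is compatible with all four bases, so it becomes A
    compatible = IUPAC[code]
    if len(compatible) == 1:
        return code  # unambiguous bases are kept as-is
    # Lexicographic (alphabetical) order is A < C < G < T; return the first
    # base NOT represented by the code. Codes here cover at most 3 bases,
    # so at least one base is always incompatible.
    return next(base for base in "ACGT" if base not in compatible)


def convert_reference(seq: str) -> str:
    """Apply the conversion to every character of a reference sequence."""
    return "".join(convert_reference_base(ch) for ch in seq)


if __name__ == "__main__":
    print(convert_reference_base("R"))  # C: R = {A, G}, first base outside it is C
    print(convert_reference("ACGTRN?"))  # ACGTCAA
```

Under this reading, Y (C or T) becomes A, W (A or T) becomes C, and V (A, C or G) becomes T. Per the quoted text, the original ambiguity codes only come back into play when the NM and MD tags are computed.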
