seqtk doesn't support the full range of ambiguity codes. #51

Closed
hexylena opened this issue Feb 16, 2015 · 1 comment

@hexylena

E.g. for a test sequence:

>ambig
ACGTMRWSYKVHDBN

We'll see the following bases identified as ambiguous:

#chr    position  base
ambig   5         M
ambig   6         R
ambig   7         W
ambig   8         S
ambig   9         Y
ambig   10        K

It would be great if seqtk could support the following IUPAC ambiguities:

B   C or G or T
D   A or G or T
H   A or C or T
V   A or C or G
N   any base

If I wrote any C at all, I'd make a PR, but since I don't, I'll just leave an issue. The relevant code appears to be around L812 and L438, which check whether the value obtained from bitcnt_table is == 2, whereas the three-base codes (B, D, H, V) have a value of 3 and N has a value of 4.
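To make the bit-count point concrete, here is a minimal Python sketch. It is not seqtk's C code. It assumes a 4-bit encoding of A=1, C=2, G=4, T=8 (an assumption for illustration), and the names `BASE_BITS`, `IUPAC_BASES`, `iupac_mask` and `base_count` are hypothetical. Under that assumption, the number of set bits in a code's mask equals the number of bases the code can stand for.

```python
# Assumed 4-bit encoding, for illustration only (not taken from seqtk).
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# Standard IUPAC nucleotide codes and the bases each one can represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}

def iupac_mask(code):
    """OR together the bits of every base the code can represent."""
    mask = 0
    for base in IUPAC_BASES[code.upper()]:
        mask |= BASE_BITS[base]
    return mask

def base_count(code):
    """Count the set bits in the mask, i.e. the number of possible bases."""
    return bin(iupac_mask(code)).count("1")

if __name__ == "__main__":
    for code in "ACGTMRWSYKVHDBN":
        print(code, base_count(code))
```

A test for `base_count(code) == 2` therefore matches only M, R, W, S, Y and K, which is the behaviour described above.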

@lh3
Owner

lh3 commented Feb 16, 2015

Both cases are intentional. As its name implies, listhet gives the heterozygous positions. B/D/H/V/N are not hets. randbase is intended to work on hets only.
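For anyone who needs the wider set of ambiguity positions outside seqtk, the distinction can be sketched in Python. This is an illustrative helper, not seqtk's implementation and not a seqtk option; `IUPAC_NBASES` and `list_positions` are hypothetical names. With the default range it reports only two-base codes (the hets), and widening `max_bases` to 4 also reports B/D/H/V/N.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_NBASES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def list_positions(name, seq, min_bases=2, max_bases=2):
    """Return (name, 1-based position, base) for codes whose base count is in range.

    Characters that are not IUPAC nucleotide codes are skipped.
    """
    hits = []
    for pos, base in enumerate(seq.upper(), start=1):
        n = IUPAC_NBASES.get(base)
        if n is not None and min_bases <= n <= max_bases:
            hits.append((name, pos, base))
    return hits

if __name__ == "__main__":
    seq = "ACGTMRWSYKVHDBN"
    # Two-base codes only (hets).
    for row in list_positions("ambig", seq):
        print(*row, sep="\t")
    # Every ambiguous code, including B/D/H/V/N.
    for row in list_positions("ambig", seq, max_bases=4):
        print(*row, sep="\t")
```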

@lh3 lh3 closed this as completed Feb 16, 2015