Explain seqtk comp columns #47

Closed
slowkow opened this issue Jan 6, 2015 · 6 comments

slowkow commented Jan 6, 2015

The usage says:

seqtk comp
Usage:  seqtk comp [-u] [-r in.bed] <in.fa>

Output format: chr, length, #A, #C, #G, #T, #2, #3, #4, #CpG, #tv, #ts, #CpG-ts

Can you please explain the meaning of #2 #3 #4 #CpG #tv #ts #CpG-ts? I read your code, but it is hard to digest for me.

@tseemann

@slowkow I agree, that piece of code is challenging, with multiple levels of indirection and bit encoding!

My guesses are:

  • ts: transition, i.e. adjacent A<=>G or C<=>T
  • tv: transversion, i.e. the other possible [AGTC]<=>[AGTC] adjacent pairs
  • CpG: CG pair (revcom aware)
  • CpG-ts: CG pair (revcom aware), but allowing transitions in the 1st (and/or 2nd) base
  • 2: number of ambiguous IUPAC bases with 2 possible values
  • 3: ditto with 3
  • 4: ditto with 4 (I assume this means "N")
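
To make the #2, #3 and #4 guesses concrete, here is a minimal Python sketch that counts plain bases and groups IUPAC ambiguity codes by how many bases each can stand for. It only illustrates the interpretation above and is not seqtk's actual implementation; the names `IUPAC_DEGENERACY` and `base_composition` are made up for this example, and it ignores case.

```python
from collections import Counter

# Number of concrete bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def base_composition(seq):
    """Count A/C/G/T and the ambiguity classes #2, #3, #4 (hypothetical helper)."""
    counts = Counter(seq.upper())
    comp = {base: counts.get(base, 0) for base in "ACGT"}
    for n in (2, 3, 4):
        # Sum the occurrences of every code that can stand for exactly n bases.
        comp[f"#{n}"] = sum(
            c for code, c in counts.items() if IUPAC_DEGENERACY.get(code) == n
        )
    return comp


if __name__ == "__main__":
    print(base_composition("ACGTRYNNB"))
```

For the input `ACGTRYNNB`, R and Y fall into #2, B into #3, and the two N into #4.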

slowkow commented Feb 26, 2015

@tseemann Thanks for the helpful explanation! Your explanation sounds correct to me.

slowkow closed this as completed Feb 26, 2015

pengchy commented Nov 21, 2015

And what does the "-u" parameter mean?

@tseemann

@pengchy According to the code, -u sets upper_only=1, and it seems to count only uppercase letters in the composition statistics, i.e. it masks all lowercase letters.
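
As a rough illustration of that reading of -u, the Python sketch below skips lowercase (typically soft-masked) letters when an `upper_only` flag is set. It is not seqtk's code; the function name `count_bases` and its signature are assumptions made for this example.

```python
def count_bases(seq, upper_only=False):
    """Count A/C/G/T; with upper_only=True, lowercase letters are skipped."""
    counts = {base: 0 for base in "ACGT"}
    for ch in seq:
        if upper_only and not ch.isupper():
            continue  # treat lowercase letters as masked
        base = ch.upper()
        if base in counts:
            counts[base] += 1
    return counts


if __name__ == "__main__":
    print(count_bases("ACgtAC"))                   # all letters counted
    print(count_bases("ACgtAC", upper_only=True))  # lowercase g and t skipped
```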

pengchy commented Nov 21, 2015

Thank you @tseemann, it is very helpful.

@tseemann

👍
