
What base to use in FASTA files #10

Open
alexjironkin opened this issue Apr 28, 2016 · 1 comment

Comments

@alexjironkin

This is a placeholder for a discussion on what base to put in the FASTA files. Currently:

  • SNP fail - N
  • NON-SNP fail - N

While this works, it seems too simplistic. For example, at a position with 100x coverage and a 90/10 split between REF and ALT, it doesn't make sense (to me) to put an N. We certainly can't call a SNP here, but it's not N either. Where there are 50/50 splits, I also think we shouldn't be putting an N; it should be a proper mixed base coded with an extended IUPAC code. The same goes for positions with 0 REF and 2 ALTs. A lot of information is lost by collapsing all of these into N.

In fact, should N be used only when all 4 bases have been observed? Otherwise, the position should be excluded from the analysis (in this case, the whole column will be).
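To make the mixed-base idea concrete, here is a minimal Python sketch of the standard IUPAC nucleotide ambiguity codes, mapping a set of observed bases to a single FASTA character. The name `iupac_code` is hypothetical and not part of any existing tool. Note that in IUPAC, N already means "any of A/C/G/T", which matches the suggestion that N be reserved for positions where all 4 bases are seen.

```python
# IUPAC nucleotide codes keyed by the set of bases each one represents.
IUPAC_CODES = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",    # purine
    frozenset("CT"): "Y",    # pyrimidine
    frozenset("CG"): "S",    # strong
    frozenset("AT"): "W",    # weak
    frozenset("GT"): "K",    # keto
    frozenset("AC"): "M",    # amino
    frozenset("CGT"): "B",   # not A
    frozenset("AGT"): "D",   # not C
    frozenset("ACT"): "H",   # not G
    frozenset("ACG"): "V",   # not T
    frozenset("ACGT"): "N",  # any base
}


def iupac_code(bases):
    """Hypothetical helper: return the IUPAC code for an iterable of A/C/G/T bases."""
    key = frozenset(b.upper() for b in bases)
    if key not in IUPAC_CODES:
        raise ValueError(f"unsupported base set: {sorted(key)}")
    return IUPAC_CODES[key]
```

For example, a 50/50 A/G split would be written as `iupac_code("AG")`, i.e. R, and a position with 0 REF reads and two ALTs C and T as Y, rather than N in both cases.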

@alexjironkin
Author

A new proposal has been made by @richardemyers:

  • POS: no reads mapped = -
  • POS: Fails depth, quality, etc. = N or - (user-defined?)
  • POS: Passes all filters, no ALT present = REF
  • POS: Passes all filters, ALT present = ALT
  • POS: Fails AD ratio = REF or Mixture, based on number of bases above a mixture threshold
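The rules above can be sketched as a single decision function. This is a minimal, non-authoritative sketch under stated assumptions: the `Position` record, its field names, the `fasta_base` function, and the default `mixture_threshold` of 0.2 are all hypothetical, and "REF or Mixture" is read as "REF if fewer than two bases reach the threshold, otherwise the IUPAC code for the bases that do".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# IUPAC ambiguity codes for mixtures, keyed by the set of bases they represent.
IUPAC_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


@dataclass
class Position:
    """Hypothetical summary of one reference position after variant calling."""
    ref: str                  # reference base
    alt: Optional[str]        # called ALT base, or None if no ALT present
    depth: int                # number of mapped reads
    passes_filters: bool      # depth, quality, etc. (AD ratio checked separately)
    passes_ad_ratio: bool     # allele-depth ratio filter
    # Reads per base; keys are assumed to be A/C/G/T only.
    base_counts: Dict[str, int] = field(default_factory=dict)


def fasta_base(pos: Position, filtered_char: str = "N",
               mixture_threshold: float = 0.2) -> str:
    """Choose the FASTA character for a position following the proposed rules."""
    if filtered_char not in ("N", "-"):
        raise ValueError("filtered_char must be 'N' or '-'")
    if pos.depth == 0:
        return "-"                # no reads mapped
    if not pos.passes_filters:
        return filtered_char      # fails depth, quality, etc. (user-defined)
    if not pos.passes_ad_ratio:
        # REF or mixture, based on how many bases reach the mixture threshold.
        total = sum(pos.base_counts.values())
        above = {b.upper() for b, n in pos.base_counts.items()
                 if total > 0 and n / total >= mixture_threshold}
        if len(above) < 2:
            return pos.ref
        return IUPAC_CODES[frozenset(above)]
    # Passes all filters: ALT if one was called, otherwise REF.
    return pos.alt if pos.alt is not None else pos.ref
```

With these assumptions, the 90/10 A/G example from the first comment, `Position("A", None, 100, True, False, {"A": 90, "G": 10})`, is written as A at the default threshold of 0.2, but as R if `mixture_threshold=0.05`. Where exactly the mixture threshold should sit is the open question this sketch leaves to the user.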

A REF-only call can fail with LowQual; see #9.
