What base to use in FASTA files #10

alexjironkin · 2016-04-28T07:38:00Z

This is a placeholder for a discussion on what base to put in the FASTA files. Currently:

SNP fail - N
NON-SNP fail - N

While this works, it seems too simplistic. For example, with a coverage of 100 and a 90/10 split between REF and ALT, it doesn't make sense (to me) to put an N. We certainly can't call a SNP here, but it's not N either. Where there are 50/50 splits, I also think we shouldn't be putting an N; it should be a proper mixed base coded with the extended IUPAC code. The same goes for positions with 0 REF and 2 _ALT_s. A lot of information is lost by collapsing all of these into N.

In fact, should N be used only when all 4 bases have been observed? Otherwise, the position should be excluded from the analysis (in which case the whole column will be excluded).
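To make the mixed-base idea concrete, here is a minimal sketch of how per-base read counts could be turned into an IUPAC ambiguity code. The function name `mixed_base` and the `min_fraction` cutoff (0.2 here) are hypothetical and only for illustration; they are not part of the current code. Note that under the IUPAC scheme N already means "any of A, C, G, T", which matches the suggestion above.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mixed_base(counts, min_fraction=0.2):
    """Return the IUPAC code for the bases seen in at least min_fraction of reads.

    counts maps each of A/C/G/T to its read count at one position.
    min_fraction is an assumed, illustrative cutoff.
    """
    total = sum(counts.values())
    if total == 0:
        return "-"  # no reads at this position
    present = frozenset(
        base for base, n in counts.items() if n / total >= min_fraction
    )
    # If the cutoff is so high that no base qualifies, fall back to N.
    return IUPAC.get(present, "N")


print(mixed_base({"A": 50, "G": 50}))  # 50/50 split between A and G
print(mixed_base({"A": 90, "G": 10}))  # minor allele below the cutoff
```

With this cutoff, a 50/50 A/G position is written as R rather than N, while a 90/10 split keeps the major base. Whether 90/10 should be reported as a mixture is exactly the open question, and it depends on the threshold chosen.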

alexjironkin · 2016-04-28T07:41:55Z

A new proposal has been suggested by @richardemyers:

POS: no reads mapped = -
POS: Fails depth, Quality etc = N or - (user defined?)
POS: Passes all filters, no ALT present = REF
POS: Passes all filters, ALT present = ALT
POS: Fails AD ratio = REF or Mixture, based on number of bases above a mixture threshold
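
The rules above could be sketched roughly as follows. This is only one reading of the proposal: the function name `call_position`, the `passes_quality` flag (standing in for depth, quality and other filters), and both thresholds are assumptions made for illustration. In particular, "Fails AD ratio" is interpreted here as "the top ALT is below `min_alt_ratio`", and a mixture is emitted only when two or more bases reach `mixture_threshold`.

```python
# IUPAC codes for mixtures, keyed by the sorted bases they represent.
MIXTURE_CODES = {
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def call_position(ref, counts, passes_quality=True,
                  min_alt_ratio=0.9, mixture_threshold=0.2, fail_char="N"):
    """Pick the FASTA character for one position (illustrative sketch).

    ref    -- reference base at this position
    counts -- read counts per base (A/C/G/T), e.g. {"A": 50, "G": 50}
    """
    depth = sum(counts.values())
    if depth == 0:
        return "-"                      # no reads mapped
    if not passes_quality:
        return fail_char                # N or -, user defined
    alts = {b: n for b, n in counts.items() if b != ref and n > 0}
    if not alts:
        return ref                      # passes all filters, no ALT present
    top_alt, top_n = max(alts.items(), key=lambda kv: kv[1])
    if top_n / depth >= min_alt_ratio:
        return top_alt                  # passes all filters, ALT present
    # Fails AD ratio: REF or mixture, depending on how many bases
    # reach the mixture threshold.
    above = sorted(b for b, n in counts.items()
                   if n / depth >= mixture_threshold)
    if len(above) >= 2:
        return MIXTURE_CODES["".join(above)]
    return ref


print(call_position("A", {"A": 50, "G": 50}))  # mixture of A and G
print(call_position("A", {"A": 95, "G": 5}))   # minor ALT below threshold
```

The REF-only branch does not model the LowQual case from #9; that would have to be folded into `passes_quality` or handled as a separate rule.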

A REF-only call can fail with LowQual; see #9.

richardemyers mentioned this issue Apr 29, 2016

LowQual on REF base. #9

Open

alexjironkin added enhancement question labels Aug 24, 2016
