Incorrect handling of non-N nucleotides with quality zero after Q-score binning #230

Open
ivan-mh opened this issue Jul 5, 2023 · 0 comments

ivan-mh commented Jul 5, 2023

Dear developers,

Thanks a lot for this amazing tool!

I would like to ask about Strelka's support for Q-score-binned FASTQ files. Q-score binning is currently the default option on new sequencing machines, e.g. NextSeq1000, NextSeq2000 or NovaSeq6000. After Q-score binning is applied, bases with scores 0-2 are assigned score 0. As a result, some non-N bases (i.e. A, C, G or T) end up with quality zero.
For a validation sample, I applied Q-score binning only to bases with scores 0-2. After running the sample through Strelka, I saw hundreds of somatic variant calls supported by a single read. In addition, ~1% of the expected somatic variant calls and ~0.15% of the expected germline variant calls were no longer called.
When I then masked those non-N, quality-zero bases by replacing them with Ns, the somatic and germline variant calls were identical to those obtained before Q-score binning; only the read support of the variants was typically one read lower, which is expected.
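
For reference, the two preprocessing steps described above (binning scores 0-2 to 0, then masking quality-zero bases with N) can be sketched roughly as follows. This is only an illustration, not the exact script used for the validation sample: the function names are hypothetical, and it assumes Phred+33 encoded qualities (where `!` means score 0) and simple four-line FASTQ records.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def bin_low_qualities(qual: str, max_low: int = 2) -> str:
    """Set every score in 0..max_low to 0 and leave higher scores untouched."""
    zero = chr(PHRED_OFFSET)
    return "".join(
        zero if ord(c) - PHRED_OFFSET <= max_low else c for c in qual
    )


def mask_zero_quality_bases(seq: str, qual: str) -> str:
    """Replace every base whose quality is zero with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    zero = chr(PHRED_OFFSET)
    return "".join("N" if q == zero else b for b, q in zip(seq, qual))


def mask_fastq_records(lines):
    """Yield FASTQ lines with quality-zero bases masked.

    Assumes four-line records; an incomplete trailing record is dropped.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header
        yield mask_zero_quality_bases(seq.rstrip("\n"), qual.rstrip("\n")) + "\n"
        yield plus
        yield qual
```

For an uncompressed file, `mask_fastq_records(open("in.fastq"))` could be written out line by line with `writelines`; the quality strings themselves are left unchanged, only the bases are replaced.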

Do I understand correctly that, after Q-score binning, non-N bases with quality zero need to be masked (replaced by Ns) to get correct variant calls from both Strelka somatic and Strelka germline?

Many Thanks,
Ivan
