Clarification - how to avoid high FPR #377
With respect to using the mask in the [...] Another option, one that is mentioned in the docs, by the way, is to perform error correction. In my experience, this leads to a dramatic reduction in memory requirements and FPR, and the only side effect is a small loss of sensitivity (1-2 low-coverage SNPs are erroneously treated as sequencing errors and "corrected"). In kevlar's recommended BAM preprocessing workflow we use the Lighter error corrector.
Ok, I have tried running the software with a mask at the count stage. The commands have been:
I have run 5 trios. The FPRs have been as follows:
I'm going to try performing error correction on the fastq files and see if that makes any difference. However, it seems that the kevlar count outputs already have an extremely low FPR. Is it usual for the kevlar filter command to report a much higher FPR than the kevlar count data it uses? The documentation states that a much higher error rate of up to 0.05 would be acceptable, but that doesn't seem to be working for me.
I'm trying to run Kevlar on a human trio, sequenced on a BGI-Seq 500, so the data has a fairly low error rate, but the reads are not error-corrected. For some inexplicable reason we have been given about 50X-worth of data when we expected 30X. I am getting some errors for too-high FPR. I ran:
The kevlar filter stage fails with a high FPR. The documentation states that an FPR of <0.5 for control samples and <0.05 for case samples is required, and that this can be achieved for uncorrected reads with 36-72GB of space and a suitable mask. I have tried the above commands for three trios so far, and the FPRs are as follows:
So, the FPRs reported for the samples at the count stage are under the thresholds stated in the documentation, but the filter command reports a much higher FPR and produces an error message.
I have applied the --mask option to the filter command. I just noticed that the count command also has a --mask option - should I be giving the same mask to the count command as well? The documentation doesn't really make this clear. Should I allocate even more memory to the count process?
I would greatly appreciate being set in the right direction on this one, but I think the documentation could also do with clarification. Many thanks.
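For intuition on the memory question, here is a rough sketch of how the FPR of a hashed k-mer counting table depends on allocated memory and on the number of distinct k-mers stored. It uses the standard Bloom-filter-style approximation and is not kevlar's own accounting; the function names, the default of 4 tables, and the 8-bit cells are all assumptions made for illustration.

```python
import math


def estimated_fpr(distinct_kmers, memory_bytes, num_tables=4, bits_per_cell=8):
    """Approximate FPR of a hashed counting table split into `num_tables` tables.

    Assumption: each distinct k-mer hashes uniformly into one cell per table,
    and a false positive needs a collision in every table.
    """
    cells_per_table = memory_bytes * 8 / bits_per_cell / num_tables
    # Expected fraction of occupied cells in a single table.
    occupancy = 1.0 - math.exp(-distinct_kmers / cells_per_table)
    return occupancy ** num_tables


def memory_for_fpr(distinct_kmers, target_fpr, num_tables=4, bits_per_cell=8):
    """Invert the approximation: bytes needed to reach `target_fpr`."""
    occupancy = target_fpr ** (1.0 / num_tables)
    cells_per_table = -distinct_kmers / math.log(1.0 - occupancy)
    return cells_per_table * num_tables * bits_per_cell / 8


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for gigabytes in (8, 16, 32, 64):
        fpr = estimated_fpr(distinct_kmers=5e9, memory_bytes=gigabytes * 1e9)
        print(f"{gigabytes:>3} GB -> estimated FPR {fpr:.4f}")
```

Under this approximation the FPR falls either when memory goes up or when the number of distinct k-mers goes down. Sequencing errors add many spurious distinct k-mers, which is consistent with error correction and masking both reducing the memory needed for a given FPR.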