correcting low cov reads from heterozygous genomes #7
Comments
Hello, We've never explicitly tested this, but I can give you some information on what I would expect to happen. For FMLRC, the coverage of your long reads doesn't actually matter, since reads are all corrected individually. Instead, the outcome will depend largely on the short-read data you're using. FMLRC looks for evidence of the k-mer sequences in the short reads, so if a particular allele is absent from the short-read data (or present only at a low frequency), FMLRC will treat it as a sequencing error and will most likely correct it to an allele that is present in the short-read data. However, if you have multiple alleles and those alleles are present at the required thresholds, then FMLRC should recognize each allele as a valid k/K-mer. Does that make sense? As for suggested parameters, I don't have any reason to believe one value for
Let me know if you have any more questions!
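To make the expectation above concrete, here is a minimal Python sketch of a "solid k-mer" check. It is only an illustration: FMLRC queries a BWT of the short reads, whereas this sketch uses a plain in-memory counter, and the names `count_kmers`, `non_solid_positions`, and the `min_count` threshold are hypothetical stand-ins, not FMLRC's actual parameters or API.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the short reads (stand-in for a BWT lookup)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def non_solid_positions(long_read, counts, k, min_count):
    """Return start positions of k-mers with too little short-read support.

    min_count is a hypothetical threshold used only for this illustration.
    """
    return [
        i
        for i in range(len(long_read) - k + 1)
        if counts[long_read[i:i + k]] < min_count
    ]

# Toy short-read set: two well-supported alleles and one rare variant.
allele_a = "ACGTACGTTA"
allele_b = "ACGTCCGTTA"
rare = "ACGTGCGTTA"
short_reads = [allele_a] * 3 + [allele_b] * 3 + [rare]
counts = count_kmers(short_reads, k=5)

# A long read carrying allele B is fully supported, so nothing is flagged.
supported = non_solid_positions(allele_b, counts, k=5, min_count=2)

# A long read carrying the rare variant has unsupported k-mers, which a
# corrector of this kind would treat as errors.
flagged = non_solid_positions(rare, counts, k=5, min_count=2)
```

In this toy data both alleles clear the threshold, so a long read from either haplotype is left alone, while the variant seen only once in the short reads is flagged regardless of how often it appears in the long reads.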
Closing due to inactivity. Feel free to open if you have more questions.
Hello,
In the code, indels and single-base changes are indistinguishable. When there are multiple possible corrections to choose from, we calculate the edit distance between the uncorrected sequence and each candidate correction. The short answer is that any k-mer that is not solid (i.e. present) in the short-read BWT will be treated as an error, even if that same k-mer block occurs hundreds or thousands of times in the long-read data (remember, each long read is handled independently). Currently, solid is defined using two parameters:
So if the short 5-bp insertion meets the above requirements in your short-read dataset, then I would not expect fmlrc to correct it, because it will not consider those k-mers to be errors.
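The edit-distance step mentioned above can be sketched as follows. This is a hedged illustration, not fmlrc's code: it assumes the candidate closest to the uncorrected sequence is preferred, and the helper names `edit_distance` and `pick_correction` are hypothetical. Note that a substitution and an indel each cost one edit, which is one way to see why the two are indistinguishable at this stage.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]

def pick_correction(uncorrected, candidates):
    """Assumed rule: keep the candidate with the smallest edit distance.

    Ties go to the first candidate listed (an arbitrary choice here).
    """
    return min(candidates, key=lambda c: edit_distance(uncorrected, c))

# A substitution and a one-base insertion are both a single edit.
sub_cost = edit_distance("ACGT", "ACCT")
indel_cost = edit_distance("ACGT", "ACGGT")

# Two hypothetical candidate corrections for the same noisy segment.
chosen = pick_correction("ACGTTTGCA", ["ACGTTGCA", "ACGAAAGCA"])
```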
Clear now, thanks!
Hello,
I have about 20x (per allele) PromethION data from a highly heterozygous plant genome. I also have plenty of short-read data to align to it for the error correction.
I wonder whether, with FMLRC, the small allelic variants (substitutions and indels) will be washed away at the error-correction step. I want to remove errors but keep the allelic variation so that I can assemble the two alleles separately. We know that this is already possible with Illumina; I want to do it with long reads now.
Did you ever try your tool with low-coverage (~20x raw data) ONT reads from a heterozygous genome? Do you have any suggestions (k and K size, T, ...) on how not to lose allelic variation?
thanks!