New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Insect 5hmC values anomalous #29
Comments
I would first recommend using Guppy for modified base calling as Megalodon is being deprecated and all the necessary features from Megalodon to achieve the highest accuracy results are ported into Guppy. As for the correspondence of the 5mC and the 5mC+5hmC results, these should not necessarily be equal. The 5mC model was trained to detect 5mC, not both 5mC and 5hmC as 5mC (as would occur in bisulfite data to some extent). Thus when the Remora model sees data from a new modified base not modeled the results are not known. It may be that some other modified base also exists here (4mC, 5caC, 5fC, etc) that may be causing these results. Again with the principle that modified bases not seen by the model may fall into any modeled category with unknown frequencies. This is one of the advantages of nanopore modified base detection over chemical methods (such as bisulfite). When we are able to extend our modified base catalogue the results will be much richer than chemical conversion methods. The 5mC+5hmC model is the first step in this direction. As for the higher frequency of 5hmC in your data I would recommend a bit more investigation. Are these confident calls? Try to adjust the aggregation thresholds in |
Thanks for the detailed response. I used megalodon rather than guppy because I already had an assembled genome that I wanted to get methylation calls for, rather than calling with guppy, then having to map the unmapped bam file back to the genome. I have done that once, but it's a hassle. In the meantime it's easier to run megalodon with remora against the genome to get modification data. I appreciate your comments regarding miscalling of non-5mC mods as 5mC, thats' why I went with the 5hmc-5mc model, to reduce those miscalls. I believe this dual model worked, by giving me expected levels of 5mC, but the 5hmC results are hard to explain. The input here is whole genomic DNA derived from adult wasps, with no conversions or chemical modifications, sequenced on an 9.4.1 flow cell to 60X coverage. I filtered them with even more strict confidence than default by For comparison, I recently called 5mC and 5hmC in an ant genome, where they both came out as expected values (less than 1% for 5mC and nearly 0% for 5hmC.) There I had used a previous model from rerio, As for ground truth, I haven't checked via HPLC or mass spec. I am willing to look with a cheap ELISA assay for 5hmC, but honestly, nobody has ever seen such incredibly high 5hmC levels in any type of cell from any organism, from bacteria, plants, fungi, or animals that I am aware of, and certainly not any insects. I can't imagine that 4mC, 5-carboxy, or 5-formylC are explaining >50% of modified C called from this wasp DNA. More likely I'm doing something wrong or this model isn't handling wasp DNA very well. I am willing to try the old res_dna_r941_min_modbases_5mC_5hmC_v001.cfg model again but it will take me awhile to set it up because megalodon is crashing when I try it at the moment. I will probably have to do the guppy -> unaligned bam -> aligned bam -> modbam2bed -> awk summary to get there. It'll take me a couple days. |
I agree that this certainly seem very high, but as noted this may suggest another modified base that is not modeled. I don't know enough about wasp genomics to know if this has been tested, but if the only bases present are C, 5mC and 5hmC I would highly doubt that the released Remora model would produce so many false positives, though it is certainly in the realm of possibilities and we will look to address this if this is the case. As for the counting method, this seems reasonable, though I would suggest |
Just an update. I re-ran the basecaller with guppy, using the old research model with >80% stringency, Then I ran the updated model, I believe with such sparse and low incidence of 5mC, the data is replicating within 2% no matter what model is used, so long as stringency is applied. That's reassuring. And with low values it's going to be near limit of detection probably. Also, marcus1487, I respect the method of averaging all calls that you proposed, however the method I am using averages the percentage methylation itself, therefore it normalizes any uneven coverage. I don't want errant contigs with 1000x coverage biasing my overall methylation. In practice though, after I applied the 80% filter, both count methods came back with indistinguishable values anyway! |
Hi @dithiii, I'm now using Guppy and Remora pipeline. This pipeline gave me fast5, fastq, and bam files for each sample. The problem is that I have unaligned bam files with methylation tags and I want to align those bam files to reference genome. I think, minimap2 doest not accept bam files as input. I want to keep alignment info and match this info to methylation info. Do you have any experience with that to recommend me something? Thank you. |
Hi @kaanokay , Map unmapped Bams to a reference. The -TXX,YY flag carries over the modifications from bam to fastq. The -y flag copies input fastq commands to output. Convert sam to sorted bam. Then I am using Clair3 to call variants, which uses Whatshap internally to call haplotypes. You can visualize regions with Methylartist and convert to bed file with modbam2bed. Good luck! |
Dear @dithiii, I'll try your pipeline soon. My first aim is differential methylation analysis and second one is phasing for differential methylation. If you have any suggestion for DMR and phasing analyses, I'm very appreciated with that. Thank you so much for sharing. |
I am working with an insect genome and trying to call 5mC and optionally 5hmC. When using megalodon with
--remora-modified-bases dna_r9.4.1_e8
model calling 5mC only, I get around 6% 5mC (too high). while calling 5hmC_5mC, I get 0.55% 5mC (about right), but I am getting near 60% 5hmC which seems FAR too high to be realistic. I've never heard of an insect with such incredibly high 5hmC.My calls are
The text was updated successfully, but these errors were encountered: