Dear modkit developers,
I've been facing this curious issue with m6A pileup) and differential methylation (dmr), and would like to hear your opinion on whether or not to trust on results:
I'm analysing m6A on a viral RNAs in a timeseries designh (0h, 24h, and 48h, of infection). On the 0h timepoint, expectedly, I've got very few reads aligned to the virus ref genome ( ~3k mapped reads only).
I know that modkit pileup recommends an N sampling of 10-50k reads for performing that "pass" threshold calculation (and I use -n 50000 on my runs), but it does still seem to run well even for these 0h samples like 0h_rep1 (~2.1k reads mapped) and 0h_rep3 (~4.8k reads mapped), finding a confidence 'pass' threshold over 80%.
Then, modkit dmr also seems to run well (on both 0h-24h and 0h-vs-48h), even though warning on a great amount of failed sites due to "missing-in-one-condition", which is the 0h one.
After filtering for pval < 0.05 (column 15 from the dmr bedMethyl), I get tens of differentially modified m6A sites, with an effect size (column) bias towards the 0h sample group. All sites are highly differentially modified in the 0h group.
Shall I trust these results? Or it's just a statistical bias from the little input size (%mapped reads) of the 0h sample.
JFYI: 24h and 48h have several hundreds thousands (200-700k) mapped reds against the same genome. And it's dmr bedMethyl file displays a good balance of effect sizes (some more modified in the 24h, and some more in the 48h).
Thanks,
Best,
Elton
Dear modkit developers,
I've been facing this curious issue with m6A pileup) and differential methylation (dmr), and would like to hear your opinion on whether or not to trust on results:
I'm analysing m6A on a viral RNAs in a timeseries designh (0h, 24h, and 48h, of infection). On the 0h timepoint, expectedly, I've got very few reads aligned to the virus ref genome ( ~3k mapped reads only).
I know that modkit pileup recommends an N sampling of 10-50k reads for performing that "pass" threshold calculation (and I use -n 50000 on my runs), but it does still seem to run well even for these 0h samples like 0h_rep1 (~2.1k reads mapped) and 0h_rep3 (~4.8k reads mapped), finding a confidence 'pass' threshold over 80%.
Then, modkit dmr also seems to run well (on both 0h-24h and 0h-vs-48h), even though warning on a great amount of failed sites due to "missing-in-one-condition", which is the 0h one.
After filtering for pval < 0.05 (column 15 from the dmr bedMethyl), I get tens of differentially modified m6A sites, with an effect size (column) bias towards the 0h sample group. All sites are highly differentially modified in the 0h group.
Shall I trust these results? Or it's just a statistical bias from the little input size (%mapped reads) of the 0h sample.
JFYI: 24h and 48h have several hundreds thousands (200-700k) mapped reds against the same genome. And it's dmr bedMethyl file displays a good balance of effect sizes (some more modified in the 24h, and some more in the 48h).
Thanks,
Best,
Elton