Hello @ArtRand,
Thank you for your tool; it helps me a lot.
I have a couple of questions about the DMR score. I am trying to find DMRs between 2 samples over fixed-size regions (TSS regions of 2 kbp each, provided in a BED file). The command I used is as follows:
    modkit dmr pair \
      -a \${methylbed[i]} \
      -b \${methylbed[j]} \
      --regions-bed \${filename_i}_\${filename_j}.${bedtype}.intersected \
      --min-valid-coverage ${cov_cutoff} \
      --ref ${reference} \
      --missing quiet \
      --base C \
      --threads ${task.cpus} \
      --header \
      --log-filepath \${filename_i}_\${filename_j}.dmr.log \
      --out-path \${filename_i}_\${filename_j}.dmr \
      --force
I found that the score column is not well correlated with difference_pct_modified = abs(a_pct_modified - b_pct_modified) * 100, as shown in this plot. Some points (TSSs) have extremely high scores (log2 scale) even though their difference_pct_modified is not particularly high, which seems counterintuitive to me.

In #191, you mentioned that "the score is, unfortunately, somewhat correlated with the number of potentially modified positions (CpGs in this case I believe) in the region."
So my questions are:
- Should I normalize the score by the number of CpG sites in each TSS (or whatever region) and by the depth at that region, so that the score is more standardized? If yes, is there an available or suggested formula?
- Should I call true DMRs based on both the score (from modkit) and difference_pct_modified (as calculated above), rather than on the score alone? If yes, can you suggest a suitable cut-off, for example score >= 4 and difference_pct_modified >= 50, as illustrated in this image?
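
To make the second question concrete, here is a minimal Python sketch of the combined filter I have in mind. It is only an illustration: the sample rows are made up, the `name` column and the function names are my own assumptions, and the thresholds (4 and 50) are the example values from my question, not recommended cut-offs. It assumes `a_pct_modified` and `b_pct_modified` are fractions between 0 and 1 and that the threshold is applied to the raw `score` column.

```python
import csv
import io

# Hypothetical rows for illustration only; these are not real modkit results.
SAMPLE_TSV = (
    "chrom\tstart\tend\tname\tscore\ta_pct_modified\tb_pct_modified\n"
    "chr1\t1000\t3000\ttss_1\t25.0\t0.90\t0.20\n"
    "chr1\t8000\t10000\ttss_2\t300.0\t0.55\t0.45\n"
    "chr2\t500\t2500\ttss_3\t2.0\t0.10\t0.85\n"
)


def difference_pct_modified(row):
    """Absolute difference in percent modified between samples a and b."""
    return abs(float(row["a_pct_modified"]) - float(row["b_pct_modified"])) * 100


def select_regions(tsv_text, min_score=4.0, min_diff_pct=50.0):
    """Keep regions that pass BOTH the score and the difference threshold.

    The default thresholds are the example values from the question,
    not validated cut-offs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        diff = difference_pct_modified(row)
        if float(row["score"]) >= min_score and diff >= min_diff_pct:
            kept.append((row["name"], float(row["score"]), round(diff, 2)))
    return kept


if __name__ == "__main__":
    for name, score, diff in select_regions(SAMPLE_TSV):
        print(name, score, diff)
```

With these made-up rows, `tss_2` has a very high score but only a 10-point difference, so it is dropped by the combined filter; this is exactly the kind of region I am unsure about.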
