Normalized the DMR scores

Hello @ArtRand,
Thank you for your tool; it helps me a lot.
I have a couple of questions about the DMR score. I am trying to find DMRs between 2 samples over fixed-size regions (TSS regions of 2 kbp each, provided in a BED file). The command I used is as follows:

```
modkit dmr pair \
    -a \${methylbed[i]} \
    -b \${methylbed[j]} \
    --regions-bed \${filename_i}_\${filename_j}.${bedtype}.intersected \
    --min-valid-coverage ${cov_cutoff} \
    --ref ${reference} \
    --missing quiet \
    --base C \
    --threads ${task.cpus} \
    --header \
    --log-filepath \${filename_i}_\${filename_j}.dmr.log \
    --out-path \${filename_i}_\${filename_j}.dmr \
    --force
```
- I found that the score column is not well correlated with **difference_pct_modified=abs(a_pct_modified - b_pct_modified)*100**, as shown in this plot. Some points (TSSs) have extremely high scores (log2 scale) even though their difference_pct_modified is not particularly high. This seems counterintuitive to me.
![image](https://github.com/user-attachments/assets/26ee4084-dd78-4e90-9b2e-039177c21767)

- In [191](https://github.com/nanoporetech/modkit/issues/191), you mentioned that "the score is, unfortunately, somewhat correlated with the number of potentially modified positions (CpGs in this case I believe) in the region."
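
For reference, the comparison behind the plot can be sketched in Python as below. This is only a minimal sketch under assumptions: each row is a dict keyed by the header names `score`, `a_pct_modified`, and `b_pct_modified` (for example from `csv.DictReader(f, delimiter="\t")` on the `--header` output; the exact column names may differ between modkit versions), and the helpers `diff_pct_modified`, `ranks`, and `spearman` are hypothetical names of mine, not part of modkit.

```python
def diff_pct_modified(row):
    """abs(a_pct_modified - b_pct_modified) * 100, as defined above.

    Assumes both columns hold fractions in [0, 1]; drop the * 100 if
    your modkit version already reports percentages.
    """
    return abs(float(row["a_pct_modified"]) - float(row["b_pct_modified"])) * 100


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

With `rows` loaded from the DMR table, `spearman([float(r["score"]) for r in rows], [diff_pct_modified(r) for r in rows])` gives a single number for how well the two quantities agree in rank, which is less sensitive to the log2 scaling of the score axis than a linear correlation.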

**So my questions are:**
- Should I normalize the score by the number of CpG sites in each TSS (or whatever region) and by the depth at that region, so that the score is more standardized? If yes, is there an available or suggested formula?
- Should I call true DMRs based on both the score (from modkit) and difference_pct_modified (as calculated above), rather than on the score alone? If yes, can you suggest a suitable cutoff, for example score >= 4 and difference_pct_modified >= 50, as illustrated in this image?
![image](https://github.com/user-attachments/assets/4bb21102-28e1-4898-8b36-d90460d2ca95)
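
To make the two questions concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption rather than a recommended method: the cutoffs are just the example values from the question, `call_dmrs` and `score_per_site` are hypothetical helper names, the column names are taken from my table header, and `n_sites` (the CpG count per region) has to be supplied separately. Dividing the score by the site count is only the simplest normalization I could think of; I do not know whether it is statistically appropriate.

```python
def call_dmrs(rows, min_score=4.0, min_diff=50.0):
    """Keep regions that pass BOTH the score and the effect-size cutoff.

    rows: iterable of dicts with 'score', 'a_pct_modified' and
    'b_pct_modified' (assumed to be fractions in [0, 1]).
    The default cutoffs are the example values from the question only.
    """
    kept = []
    for row in rows:
        score = float(row["score"])
        diff = abs(float(row["a_pct_modified"]) - float(row["b_pct_modified"])) * 100
        if score >= min_score and diff >= min_diff:
            kept.append(row)
    return kept


def score_per_site(score, n_sites):
    """Naive per-site normalization: score divided by the CpG count.

    n_sites is a hypothetical input (number of CpGs in the region);
    this is an exploratory idea, not a validated formula.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return score / n_sites
```

The filter requires both conditions at once, so a region with a very high score but a small methylation difference (the points that puzzled me in the first plot) would be excluded.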





Normalized the DMR scores #261
