# DMR Calling

Call Differentially Methylated Regions (DMRs) from multiple ALLC files. DMR calling consists of two steps:
1. Call Differentially Methylated Sites (DMS) with a permutation-based goodness-of-fit test.
2. Call DMRs from the DMS results.

The results are stored in RegionDS format, which provides versatile functions for genomic region analysis; these are covered in the following sections.

## Import

In [1]:
import pathlib
from ALLCools.mcds import RegionDS
from ALLCools.dmr import call_dms, call_dmr

## Parameters

In [2]:
mc_bulk_dir = '../../data/HIPBulk/mc_bulk/'
# Build a dict mapping sample name -> ALLC path
allc_table = {allc_path.name.split('.')[0]: str(allc_path) 
              for allc_path in pathlib.Path(mc_bulk_dir).glob('*/*.CGN-Merge.allc.tsv.gz')}
samples = list(allc_table.keys())
allc_paths = list(allc_table.values())

chrom_size_path = '../../data/genome/mm10.main.nochrM.chrom.sizes'
output_dir = 'test_HIP'
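The dict comprehension above derives each sample name from the part of the file name before the first dot. The following minimal sketch shows that rule on its own and adds a duplicate check, because a plain dict comprehension silently keeps only the last path when two files map to the same name. The helper names `sample_name` and `build_allc_table` and the example paths are hypothetical and are not part of ALLCools.

```python
import pathlib


def sample_name(allc_path):
    """Return the part of the file name before the first dot."""
    return pathlib.PurePosixPath(allc_path).name.split('.')[0]


def build_allc_table(allc_paths):
    """Map sample name -> ALLC path, refusing duplicated sample names."""
    table = {}
    for path in allc_paths:
        name = sample_name(path)
        if name in table:
            raise ValueError(f'Duplicated sample name {name!r}: {table[name]} and {path}')
        table[name] = str(path)
    return table


# Hypothetical paths, following the '*/*.CGN-Merge.allc.tsv.gz' pattern used above
example_paths = [
    'mc_bulk/sampleA/sampleA.CGN-Merge.allc.tsv.gz',
    'mc_bulk/sampleB/sampleB.CGN-Merge.allc.tsv.gz',
]
example_table = build_allc_table(example_paths)
```

Note that a sample name which itself contains a dot would be truncated by this rule, so such names need a different splitting strategy.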

## Call Differentially Methylated Sites
Identify DMS from multiple ALLC files. This is the most time-consuming step. To estimate how long it will take for a given set of ALLC files, do a test run on a small region first, for example by passing `region="chr1:10000000-11000000"` to `call_dms()`.
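As a rough illustration of how a timed test run could be extrapolated, the sketch below scales the elapsed time by the ratio of genome size to tested base pairs. It assumes runtime grows linearly with region length, which is only a crude approximation because the real cost depends on the number of cytosine sites tested and on the `cpu` setting. The helpers `parse_region` and `estimate_runtime` are hypothetical and not part of ALLCools.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.rsplit(':', 1)
    start, end = span.split('-')
    return chrom, int(start), int(end)


def estimate_runtime(test_seconds, test_regions, genome_size):
    """Linearly extrapolate a test-run time to the whole genome.

    test_seconds: wall-clock time of the test run
    test_regions: list of 'chrom:start-end' strings used in the test run
    genome_size:  total base pairs, e.g. summed from the chrom sizes file
    """
    tested_bp = sum(end - start
                    for _, start, end in map(parse_region, test_regions))
    return test_seconds * genome_size / tested_bp


# Example with made-up numbers: 60 s for 1 Mb, extrapolated to 100 Mb
estimated_seconds = estimate_runtime(60, ['chr1:10000000-11000000'], 100_000_000)
```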

In [3]:
call_dms(
    output_dir=output_dir,
    allc_paths=allc_paths,
    samples=samples,
    chrom_size_path=chrom_size_path,
    cpu=45,
    max_row_count=50,
    n_permute=3000,
    min_pvalue=0.01,
    # Here we only process a few small regions for the demo.
    # Omit the region parameter to run DMR calling on the whole genome.
    # The parameter can also be used to call DMS/DMRs in specific regions of interest.
    region=['chr1:0-100',
            'chr1:10000000-10010000',
            'chr19:5000000-5100000'])

RMS tests for 1923 sites.
RMS tests for 105 sites.
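To give an intuition for what a permutation-based goodness-of-fit test does at a single site, here is a minimal pure-Python sketch. It assumes that "RMS" in the log above refers to a root-mean-square statistic computed on a samples × (methylated, unmethylated) count table. The function names, the normalization of the statistic, and the shuffling scheme are illustrative assumptions, not the ALLCools implementation.

```python
import random
from math import sqrt


def rms_statistic(table):
    """RMS deviation between observed counts and counts expected under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    squared_sum = 0.0
    n_cells = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            squared_sum += (observed - expected) ** 2
            n_cells += 1
    return sqrt(squared_sum / n_cells)


def permutation_p_value(table, n_permute=1000, seed=0):
    """Permutation p-value for one site.

    table: one [methylated, unmethylated] read-count pair per sample.
    Read states are shuffled across samples, which keeps both the per-sample
    coverage and the overall methylated/unmethylated totals fixed.
    """
    rng = random.Random(seed)
    sample_labels, states = [], []
    for i, (mc, uc) in enumerate(table):
        sample_labels += [i] * (mc + uc)
        states += [0] * mc + [1] * uc  # column 0 = methylated, 1 = unmethylated
    observed = rms_statistic(table)
    n_extreme = 0
    for _ in range(n_permute):
        rng.shuffle(states)
        permuted = [[0, 0] for _ in table]
        for sample, state in zip(sample_labels, states):
            permuted[sample][state] += 1
        # small tolerance guards against floating-point ties
        if rms_statistic(permuted) >= observed - 1e-12:
            n_extreme += 1
    # add-one correction so the p-value is never exactly zero
    return (n_extreme + 1) / (n_permute + 1)
```

In this sketch, a site where all samples share the same methylation fraction gets a p-value of 1, while a site that is fully methylated in one sample and unmethylated in another gets a p-value close to `1 / (n_permute + 1)`. This also shows why the smallest achievable p-value depends on the number of permutations.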


## Call Differentially Methylated Regions

In [4]:
call_dmr(
    output_dir=output_dir,
    p_value_cutoff=0.001,
    frac_delta_cutoff=0.3,
    max_dist=250,
    residual_quantile=0.7,
    corr_cutoff=0.3,
    cpu=30)
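One way to picture the role of a distance threshold such as `max_dist` is to merge nearby significant sites into regions. The sketch below is a simplified illustration under that assumption only: it ignores the p-value, methylation-difference, residual, and correlation cutoffs passed above, and `merge_sites` is a hypothetical helper, not the ALLCools algorithm.

```python
def merge_sites(positions, max_dist=250):
    """Group site positions into (start, end) regions.

    A site joins the current region when it lies within max_dist
    of that region's last site; otherwise it starts a new region.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_dist:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos])  # start a new region
    return [tuple(region) for region in regions]


# Made-up positions on a single chromosome
example_regions = merge_sites([100, 200, 1000, 1100, 5000], max_dist=250)
```

With these made-up positions, the sites at 100 and 200 form one region, 1000 and 1100 form a second, and the isolated site at 5000 forms a single-site region.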

## Results in RegionDS
After `call_dms` and `call_dmr`, the `output_dir` contains xarray.Datasets for DMR and DMS stored in zarr format. The RegionDS class handles these datasets automatically via `RegionDS.open(output_dir)`.
Some key design principles of RegionDS:
- Like MCDS, RegionDS is based on the xarray.Dataset class. It inherits all of its APIs and, with the zarr backend, can efficiently handle large matrices whose size exceeds physical memory.
- RegionDS focuses on genomic region analysis and provides functions for region annotation, motif analysis, and correlation analysis.
- A key parameter of RegionDS is `region_dim`, which tells most of its functions which region set to operate on. By default, after DMR calling, RegionDS uses `'dmr'` as the `region_dim`. When you open a RegionDS that contains multiple datasets (e.g., different annotations added in the following sections), only those related to `region_dim` are loaded.
- By changing the `region_dim` parameter, you can open other datasets (e.g., `dms`) as well. The `select_dir` parameter also lets you specify which related datasets to open.

In [5]:
dmr_ds = RegionDS.open(output_dir)
dmr_ds

Using dmr as region_dim


In [6]:
dms_ds = RegionDS.open(output_dir, region_dim='dms')
dms_ds

## Directory Structure of RegionDS

RegionDS stores all the datasets in `output_dir` using `xarray.Dataset.to_zarr`.

In [7]:
!tree -L 1 -h test_HIP/

test_HIP/
├── [ 320]  chrom_sizes.txt
├── [ 290]  dmr
└── [ 278]  dms

2 directories, 1 file


Each dataset directory contains an [xarray.Dataset](xarray:xarray.Dataset) stored in zarr format via
[{func}`to_zarr()`](xarray:xarray.Dataset.to_zarr).


In [8]:
!tree -L 1 -h test_HIP/dmr

test_HIP/dmr
├── [  61]  count_type
├── [ 248]  dmr
├── [ 248]  dmr_chrom
├── [ 308]  dmr_da
├── [ 278]  dmr_da_frac
├── [ 248]  dmr_end
├── [ 248]  dmr_length
├── [ 248]  dmr_ndms
├── [ 248]  dmr_start
├── [ 278]  dmr_state
└── [  61]  sample

11 directories, 0 files


The `{output_dir}/.ALLCools` file contains the configuration recognized by `RegionDS.open`.

In [9]:
!cat test_HIP/.ALLCools

ds_region_dim:
  dmr: dmr
  dms: dms
region_dim: dmr
