In [1]:
# notebook config (optional)
%load_ext lab_black
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import logging, ray

logging.getLogger().setLevel(logging.INFO)

# Multipass Processing for MiD Hi-C/HiChIP

The "MiD" workflow digests nuclei into smaller digested fragments (DFs) than current in situ Hi-C does. A sequenced fragment (SF) of a given length is therefore expected to contain more DFs.

To make use of this information, we introduced a multipass processing pipeline. In the mapping step, the pipeline keeps splitting reads by ligation pattern beyond the single split used in current Hi-C processing. As a result, each PET can have more than two mapped fragments.
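
The idea of splitting a read at every ligation junction can be sketched as follows. This is a minimal illustration, not the code used by `multipass_process`: the helper name `split_at_ligation_sites`, the `min_len` cutoff, and the choice to drop the junction bases entirely are all assumptions made for the example. Only the ligation pattern is taken from the mapping call later in this notebook.

```python
import re
from typing import List

# Ligation-junction pattern copied from the mapping call in this notebook.
LIGATION_SITE = re.compile(
    rb"(CT[ATCG]AT[ATCG]AG)|(CT[ATCG]ATAA)|(TTAT[ATCG]AG)|(TTATAA)"
)


def split_at_ligation_sites(seq: bytes, min_len: int = 20) -> List[bytes]:
    """Hypothetical helper: cut a read at every ligation junction.

    Simplification: the junction bases are discarded, and pieces shorter
    than min_len (an assumed cutoff) are dropped as too short to map.
    """
    pieces = []
    start = 0
    for match in LIGATION_SITE.finditer(seq):
        pieces.append(seq[start : match.start()])
        start = match.end()
    pieces.append(seq[start:])  # remainder after the last junction
    return [p for p in pieces if len(p) >= min_len]


read = b"A" * 25 + b"CTGATCAG" + b"G" * 25 + b"TTATAA" + b"C" * 25
fragments = split_at_ligation_sites(read)
print(len(fragments))
```

A read with two junctions yields three candidate fragments here, which is how a single PET can end up with more than two mapped fragments.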

At the moment, we use this information only to refine validpair calling, by prioritizing cis validpairs over interchromosomal validpairs and religation pairs. Future work could extract multi-way interactions.
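
To make the prioritization concrete, here is a small sketch of choosing one pair from a PET with several mapped fragments. It is not the logic implemented in `multipass_process`: the `Frag` type, the `classify` and `pick_pair` helpers, the definition of a religation pair as two adjacent digested fragments, and the relative order of interchromosomal versus religation pairs are all assumptions for illustration.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple


class Frag(NamedTuple):
    """Hypothetical record for one mapped fragment of a PET."""

    chrom: str
    frag_id: int  # index of the digested fragment along its chromosome


def classify(a: Frag, b: Frag) -> str:
    """Label a fragment pair (assumed definitions, see lead-in)."""
    if a.chrom != b.chrom:
        return "trans"
    gap = abs(a.frag_id - b.frag_id)
    if gap == 0:
        return "same_fragment"
    if gap == 1:
        return "religation"
    return "cis"


# Lower number wins. Cis first, as stated in the text; the order of the
# remaining two classes is an assumption.
PRIORITY = {"cis": 0, "trans": 1, "religation": 2}


def pick_pair(frags: List[Frag]) -> Optional[Tuple[Frag, Frag, str]]:
    """Return the highest-priority pair among all fragment combinations."""
    best = None
    for a, b in combinations(frags, 2):
        kind = classify(a, b)
        if kind not in PRIORITY:
            continue  # same-fragment pairs are ignored in this sketch
        if best is None or PRIORITY[kind] < PRIORITY[best[2]]:
            best = (a, b, kind)
    return best


pet = [Frag("chr1", 10), Frag("chr1", 11), Frag("chr2", 5), Frag("chr1", 40)]
print(pick_pair(pet))
```

In this example the PET contains a religation pair, interchromosomal pairs, and cis pairs; the sketch reports the first cis pair it encounters.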

In [4]:
from multipass_process.mpmap import multipass_mapping_from_hicpro
from multipass_process.mvp import genome_digestion, construct_mpp_validpair

We provide a convenient script for multipass processing; its help information is shown below. For users who want the details, the individual steps are broken down afterwards.

In [4]:
!hicpro_to_mvp.py -h

usage: hicpro_to_mvp.py [-h] [--bowtie2_path BOWTIE2_PATH]
                        [--samtools_path SAMTOOLS_PATH] [--num_cpus NUM_CPUS]
                        [--ligation_site LIGATION_SITE]
                        [--digestion_site DIGESTION_SITE] [--mapq MAPQ]
                        hicpro_results project_name genome_index genome_fa

Multipass mapping from HiC-Pro results and process to multipass-processed
validpairs (MVP)

positional arguments:
  hicpro_results        HiC-Pro output directory
  project_name          project/sample name of HiC-Pro output
  genome_index          bowtie2 genome index
  genome_fa             genome sequence file

optional arguments:
  -h, --help            show this help message and exit
  --bowtie2_path BOWTIE2_PATH
                        bowtie2 program path, default to find in env PATH
  --samtools_path SAMTOOLS_PATH
                        samtools program path, default to find in env PATH
  --num_cpus NUM_CPUS   number of cpu cores to use, defa

Step 1: Multipass mapping from HiC-Pro results

In [5]:
multipass_mapped_bam = multipass_mapping_from_hicpro(
    "/Extension_HDD2/Hanbin/ES_Cell/E14/HiC3_HL/HL28_Smc1_MiDHiChIP_Test/HL28_Smc1_MiDHiChIP_out/",
    "data",
    b"(CT[ATCG]AT[ATCG]AG)|(CT[ATCG]ATAA)|(TTAT[ATCG]AG)|(TTATAA)",
    "/home/software/bowtie2-2.2.9/genome/mm9/mm9",
    38,
    "bowtie2",
    "samtools",
)

This step generates a paired BAM file under the HiC-Pro output directory. The next step resolves PETs with multiple mapped fragments into different types of proximity-ligated products.

In [5]:
digested_frags = genome_digestion(
    "/home/software/genome_index/mouse/bowtie_indexing/mm10.fa", "(CT[ATCG]AG)|(TTAA)"
)
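
For intuition, an in silico digestion with the same site pattern can be sketched on in-memory sequences. This is not the implementation of `genome_digestion`: the function name `digest_sequences`, the dictionary input, the returned `(start, end)` tuples, and placing the cut at the first base of each site are assumptions chosen to keep the example short.

```python
import re
from typing import Dict, List, Tuple

# Digestion-site pattern copied from the genome_digestion call above.
# The lookahead lets overlapping sites be reported as well.
DIGESTION_SITE = re.compile(r"(?=(CT[ATCG]AG|TTAA))")


def digest_sequences(seqs: Dict[str, str]) -> Dict[str, List[Tuple[int, int]]]:
    """Hypothetical helper: split each sequence into half-open fragments.

    Simplification: the cut is placed at the first base of every site;
    the real cut position depends on the enzyme.
    """
    fragments = {}
    for chrom, seq in seqs.items():
        cuts = [m.start() for m in DIGESTION_SITE.finditer(seq.upper())]
        bounds = [0] + cuts + [len(seq)]
        # Skip empty intervals, e.g. when a site sits at position 0.
        fragments[chrom] = [
            (s, e) for s, e in zip(bounds[:-1], bounds[1:]) if e > s
        ]
    return fragments


toy_genome = {"chrT": "AAAACTGAGCCCCTTAAGGGG"}
toy_frags = digest_sequences(toy_genome)
print(toy_frags)
```

The toy sequence contains one `CTGAG` site and one `TTAA` site, so it is cut into three fragments.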

In [6]:
ray.init(num_cpus=30)

2021-08-17 17:55:35,619	INFO services.py:1245 -- View the Ray dashboard at http://127.0.0.1:8266


{'node_ip_address': '132.239.183.12',
 'raylet_ip_address': '132.239.183.12',
 'redis_address': '132.239.183.12:49048',
 'object_store_address': '/tmp/ray/session_2021-08-17_17-55-31_375515_9514/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-08-17_17-55-31_375515_9514/sockets/raylet',
 'webui_url': '127.0.0.1:8266',
 'session_dir': '/tmp/ray/session_2021-08-17_17-55-31_375515_9514',
 'metrics_export_port': 59667,
 'node_id': '5f832ba8f39f23e2e697fa778dae6e3e5d6757a3dd2f3d9fab52fa7a'}

In [9]:
construct_mpp_validpair(
    "/Extension_HDD2/Hanbin/ES_Cell/E14/HiC3_HL/HL28_Smc1_MiDHiChIP_Test/HL28_Smc1_MiDHiChIP_out/bowtie_results/bwt2_multipass/data/data.merged.bam",
    10,
    digested_frags,
    "/Extension_HDD2/Hanbin/ES_Cell/E14/HiC3_HL/HL28_Smc1_MiDHiChIP_Test/HL28_Smc1_MiDHiChIP_out/hop_results/data.pympp.mppValidPairs",
    30,
)

# Multi-way ValidHubs

In [7]:
from multipass_process.mvh import construct_mpp_validhub

In [12]:
construct_mpp_validhub(
    "/home/murrelab/MiD_HiChIP_Project/data_to_publish/E14_B2T1_Smc3_out/bowtie_results/bwt2_multipass/data/data.merged.bam",
    10,
    digested_frags,
    "/home/murrelab/MiD_HiChIP_Project/data_to_publish/E14_B2T1_Smc3_out/mvp_results/data.mppValidHubs",
    30,
)