In [1]:
import sys
import os
import pandas
import copy
sys.path.append("..")
from diachr import DiachromaticInteractionSet
from diachr import BaitedDigest
from diachr import BaitedDigestSet

# Interactions at baited digests - Usage

In this Jupyter notebook, the classes ``BaitedDigest`` and ``BaitedDigestSet`` are demonstrated.

The class ``BaitedDigest`` groups interactions by bait. For each bait, interactions are further differentiated by interaction category (``DIX``, ``DI``, ``UIR``, ``UI`` and ``ALL``) and enrichment status (``NE`` or ``EN``). Interactions with the enrichment status ``NN`` or ``EE`` cannot be assigned unambiguously to a baited digest and are therefore not taken into account. For capture Hi-C data, however, such interactions make up only a small percentage (around 15% on average).

The class ``BaitedDigestSet`` can be used to manage a set of ``BaitedDigest`` objects. It is basically a dictionary in which ``BaitedDigest`` objects are stored, with the coordinates of the digests serving as keys.

This structuring of the data makes it possible to investigate interactions at individual baits.
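
To make the grouping concrete, here is a minimal sketch in plain Python. It is not the ``diachr`` implementation. It assumes that in an ``NE`` interaction the second digest is the bait and in an ``EN`` interaction the first one is; the names ``ToyInteraction`` and ``group_by_bait`` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an interaction record (not the diachr class)
ToyInteraction = namedtuple(
    "ToyInteraction",
    ["chrom", "from_a", "to_a", "from_b", "to_b", "category", "status"],
)

CATEGORIES = ("DIX", "DI", "UIR", "UI", "ALL")


def group_by_bait(interactions):
    """Group NE/EN interactions by chromosome and baited digest key."""
    grouped = defaultdict(dict)
    discarded = 0
    for ia in interactions:
        if ia.status == "NE":    # assumption: the second digest is the bait
            start, end = ia.from_b, ia.to_b
        elif ia.status == "EN":  # assumption: the first digest is the bait
            start, end = ia.from_a, ia.to_a
        else:                    # NN or EE: no unambiguous bait, so skip
            discarded += 1
            continue
        key = f"{ia.chrom}\t{start}\t{end}"
        per_bait = grouped[ia.chrom].setdefault(
            key, {cat: {"NE": [], "EN": []} for cat in CATEGORIES}
        )
        # Store the interaction under its own category and under 'ALL'
        per_bait[ia.category][ia.status].append(ia)
        per_bait["ALL"][ia.status].append(ia)
    return dict(grouped), discarded


# Toy data: two interactions share the bait chr21:900-1000, one is NN
toy = [
    ToyInteraction("chr21", 100, 200, 900, 1000, "DI", "NE"),
    ToyInteraction("chr21", 900, 1000, 1500, 1600, "UI", "EN"),
    ToyInteraction("chr21", 100, 200, 300, 400, "UI", "NN"),
]
grouped, discarded = group_by_bait(toy)
print(len(grouped["chr21"]), discarded)
```

With this toy input, both assignable interactions end up under the same bait key, and the ``NN`` interaction is counted as discarded.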

## Input file for testing

We have prepared two small test files in ``Diachromatic11`` format.

In [2]:
# Interactions on 'chr21' and 'chr22' only
INTERACTION_FILE = "../tests/data/baited_digests_d11_interaction_test_file.tsv.gz"
OUT_PREFIX = "TEST_FILE_1"

# 100,000 randomly selected interactions
#INTERACTION_FILE = "../tests/data/baited_digests_d11_interaction_test_file_2.tsv.gz"
#OUT_PREFIX = "TEST_FILE_2"

## Creation of a ``BaitedDigestSet``

First, we create a ``DiachromaticInteractionSet`` from the test file.

In [3]:
# Create DiachromaticInteractionSet
d11_interaction_set = DiachromaticInteractionSet(rpc_rule = 'ht')
d11_interaction_set.parse_file(
    i_file = INTERACTION_FILE,
    verbose = True)

[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../tests/data/baited_digests_d11_interaction_test_file.tsv.gz
	[INFO] Set size: 118,468
[INFO] ... done.


Next, we create a ``BaitedDigestSet`` and pass the ``DiachromaticInteractionSet``.

In [4]:
baited_digest_set = BaitedDigestSet()
read_interactions_info_dict = baited_digest_set.ingest_interaction_set(d11_interaction_set, verbose=True)

[INFO] Reading interactions and group them according to chromosomes and baited digests ...
	[INFO] Total number of interactions read: 118,468
	[INFO] Total number of baited digests: 680
[INFO] ... done.


The function ``get_ingest_interaction_set_info_report()`` returns a string with more detailed information on the ingestion.

In [5]:
print(baited_digest_set.get_ingest_interaction_set_info_report())

[INFO] Report on ingestion of interactions:
	[INFO] Total number of interactions read: 118,468
	[INFO] Discarded NN and EE interactions: 11,996
	[INFO] Total number of ingested NE and EN interactions: 106,472
	[INFO] Broken down by interaction category and enrichment status: 
		[INFO] DIX: 
			[INFO] NE: 1
			[INFO] EN: 0
		[INFO] DI: 
			[INFO] NE: 2,501
			[INFO] EN: 2,559
		[INFO] UIR: 
			[INFO] NE: 2,257
			[INFO] EN: 1,825
		[INFO] UI: 
			[INFO] NE: 46,851
			[INFO] EN: 50,478
		[INFO] ALL: 
			[INFO] NE: 51,610
			[INFO] EN: 54,862
	[INFO] Total number of baited digests: 680
[INFO] End of report.



The function ``get_ingest_interaction_set_table_row()`` returns the same information in table format.

In [6]:
print(baited_digest_set.get_ingest_interaction_set_table_row())

:TR_INGESTION:	TOTAL_INTERACTIONS_READ	DISCARDED	INGESTED	DIX_NE	DIX_EN	DI_NE	DI_EN	UIR_NE	UIR_EN	UI_NE	UI_EN	ALL_NE	ALL_EN	BAITED_DIGESTS
:TR_INGESTION:	118468	11996	106472	1	0	2501	2559	2257	1825	46851	50478	51610	54862	680
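
A table row like this is convenient for sanity checks. The following sketch parses the two lines printed above with plain Python and verifies that the counts add up. It assumes that the returned string consists of a header line and a value line, both tab-separated and prefixed with ``:TR_INGESTION:``, as in the output shown; the helper ``parse_table_row`` is hypothetical and not part of ``diachr``.

```python
# The two lines are copied from the output above
header_line = "\t".join([
    ":TR_INGESTION:", "TOTAL_INTERACTIONS_READ", "DISCARDED", "INGESTED",
    "DIX_NE", "DIX_EN", "DI_NE", "DI_EN", "UIR_NE", "UIR_EN",
    "UI_NE", "UI_EN", "ALL_NE", "ALL_EN", "BAITED_DIGESTS",
])
value_line = "\t".join([
    ":TR_INGESTION:", "118468", "11996", "106472",
    "1", "0", "2501", "2559", "2257", "1825",
    "46851", "50478", "51610", "54862", "680",
])


def parse_table_row(header, values):
    """Map column names to integer counts, dropping the leading row tag."""
    names = header.rstrip("\n").split("\t")[1:]
    numbers = values.rstrip("\n").split("\t")[1:]
    return dict(zip(names, (int(n) for n in numbers)))


counts = parse_table_row(header_line, value_line)

# Discarded plus ingested interactions must equal the total read
assert counts["DISCARDED"] + counts["INGESTED"] == counts["TOTAL_INTERACTIONS_READ"]

# The per-category counts must sum to the 'ALL' counts
for state in ("NE", "EN"):
    per_category = sum(counts[f"{cat}_{state}"] for cat in ("DIX", "DI", "UIR", "UI"))
    assert per_category == counts[f"ALL_{state}"]

discarded_fraction = counts["DISCARDED"] / counts["TOTAL_INTERACTIONS_READ"]
print(f"Discarded NN and EE interactions: {discarded_fraction:.1%}")
```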



## Explanation of the data structure

In the object ``baited_digest_set``, the ``BaitedDigest`` objects are first grouped by chromosome. For example, ``baited_digest_set._baited_digest_dict['chr21']`` contains all ``BaitedDigest`` objects that were created for ``chr21``.

In [7]:
dict_all_baited_digest_objects_on_chr21 = baited_digest_set._baited_digest_dict['chr21']

An individual ``BaitedDigest`` object can be accessed via its digest coordinates. The key is a tab-separated string consisting of chromosome, start and end position.

In [8]:
individual_baited_digest_object_on_chr21 = baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012']
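
Because the keys are plain tab-separated strings, they can be built from and split back into coordinates with a few lines of Python. The helpers ``make_digest_key`` and ``parse_digest_key`` below are hypothetical conveniences for this notebook, not functions provided by ``diachr``.

```python
def make_digest_key(chrom, start, end):
    """Build a key of the form 'chrom<TAB>start<TAB>end'."""
    return f"{chrom}\t{start}\t{end}"


def parse_digest_key(key):
    """Split a key back into chromosome and integer coordinates."""
    chrom, start, end = key.split("\t")
    return chrom, int(start), int(end)


key = make_digest_key("chr21", 33167499, 33175012)
chrom, start, end = parse_digest_key(key)
digest_length = end - start  # difference of the two coordinates
print(repr(key), chrom, digest_length)
```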

A ``BaitedDigest`` object contains all interactions that end in this digest, separated by interaction category and enrichment status. Here, as an example, we access the first interaction of category ``DI`` with enrichment status ``NE``.

In [9]:
baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012'].interactions['DI']['NE'][0].get_category()

'DI'

In [10]:
baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012'].interactions['DI']['NE'][0].enrichment_status_tag_pair

'NE'

In [16]:
baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012'].interactions['DI']['NE'][0].fromB

33167499

In [15]:
baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012'].interactions['DI']['NE'][0].toB

33175012

In [14]:
baited_digest_set._baited_digest_dict['chr21']['chr21\t33167499\t33175012'].interactions['DI']['NE'][0].rp_total

125

## Iterate baited digest set

The data structure makes it possible to iterate over all baited digests and their associated interactions. In this example, for each bait and each interaction category, the numbers of interactions that go from the bait to the left (``NE``) or to the right (``EN``) are determined.
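
The traversal pattern itself can be sketched on a toy nested dictionary, independent of ``diachr``. In this sketch, each bait maps directly to a nested dictionary of lists (in the real object, these lists sit in the ``interactions`` attribute of a ``BaitedDigest``), and the counts are collected into rows instead of being printed. The function ``count_table`` is hypothetical; the toy counts are taken from the first bait in the output further down.

```python
CATEGORIES = ("DIX", "DI", "UIR", "UI", "ALL")
STATES = ("NE", "EN")


def count_table(baited_digest_dict):
    """Return one row of interaction counts per baited digest."""
    rows = []
    for chrom, digests in baited_digest_dict.items():
        for key, interactions in digests.items():
            row = {"key": key}
            for cat in CATEGORIES:
                for state in STATES:
                    row[f"{cat}_{state}"] = len(interactions[cat][state])
            rows.append(row)
    return rows


# Toy structure: placeholder lists whose lengths mimic interaction counts
toy_dict = {
    "chr21": {
        "chr21\t35717316\t35721211": {
            "DIX": {"NE": [], "EN": []},
            "DI": {"NE": [None] * 9, "EN": [None] * 6},
            "UIR": {"NE": [None] * 12, "EN": [None] * 67},
            "UI": {"NE": [None] * 308, "EN": [None] * 29},
            "ALL": {"NE": [None] * 329, "EN": [None] * 102},
        }
    }
}

rows = count_table(toy_dict)
for row in rows:
    print(row)
```

Collecting rows rather than printing makes it straightforward to sort the baits or to load the counts into a table later on.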

In [13]:
di_baited_digests = 0
di_num_total = 0
for chrom in baited_digest_set._baited_digest_dict.keys():
    for baited_digest_key, baited_digest in baited_digest_set._baited_digest_dict[chrom].items():
        dix_num_ne = len(baited_digest.interactions['DIX']['NE'])
        dix_num_en = len(baited_digest.interactions['DIX']['EN'])
        di_num_ne = len(baited_digest.interactions['DI']['NE'])
        di_num_en = len(baited_digest.interactions['DI']['EN'])
        uir_num_ne = len(baited_digest.interactions['UIR']['NE'])
        uir_num_en = len(baited_digest.interactions['UIR']['EN'])
        ui_num_ne = len(baited_digest.interactions['UI']['NE'])
        ui_num_en = len(baited_digest.interactions['UI']['EN'])
        all_num_ne = len(baited_digest.interactions['ALL']['NE'])
        all_num_en = len(baited_digest.interactions['ALL']['EN'])
                
        print('-----------')
        print('Key: ' + baited_digest_key)
        print('Unbalanced without reference: '  + str(dix_num_ne) + ', ' + str(dix_num_en))
        print('Unbalanced: '  + str(di_num_ne) + ', ' + str(di_num_en))
        print('Balanced reference: '  + str(uir_num_ne) + ', ' + str(uir_num_en))
        print('Balanced: '  + str(ui_num_ne) + ', ' + str(ui_num_en))
        print('All: '  + str(all_num_ne) + ', ' + str(all_num_en))

-----------
Key: chr21	35717316	35721211
Unbalanced without reference: 0, 0
Unbalanced: 9, 6
Balanced reference: 12, 67
Balanced: 308, 29
All: 329, 102
-----------
Key: chr21	41739289	41741963
Unbalanced without reference: 0, 0
Unbalanced: 8, 5
Balanced reference: 2, 21
Balanced: 89, 74
All: 99, 100
-----------
Key: chr21	16031845	16036421
Unbalanced without reference: 0, 0
Unbalanced: 1, 3
Balanced reference: 0, 7
Balanced: 79, 205
All: 80, 215
-----------
Key: chr21	30209798	30216814
Unbalanced without reference: 0, 0
Unbalanced: 0, 1
Balanced reference: 1, 35
Balanced: 88, 74
All: 89, 110
-----------
Key: chr21	34667825	34671202
Unbalanced without reference: 0, 0
Unbalanced: 0, 0
Balanced reference: 0, 27
Balanced: 63, 78
All: 63, 105
-----------
Key: chr21	45234082	45236451
Unbalanced without reference: 0, 0
Unbalanced: 0, 0
Balanced reference: 8, 0
Balanced: 97, 132
All: 105, 132
-----------
Key: chr21	25937466	25943895
Unbalanced without reference: 0, 0
Unbalanced: 0, 0
[...]
Unbalanced: 4, 2
Balanced reference: 1, 0
Balanced: 6, 46
All: 11, 48
-----------
Key: chr21	44797276	44803934
Unbalanced without reference: 0, 0
Unbalanced: 12, 10
Balanced reference: 1, 0
Balanced: 59, 86
All: 72, 96
-----------
Key: chr21	30493703	30500222
Unbalanced without reference: 0, 0
Unbalanced: 3, 5
Balanced reference: 0, 7
Balanced: 77, 108
All: 80, 120
-----------
Key: chr21	36445539	36446180
Unbalanced without reference: 0, 0
Unbalanced: 0, 7
Balanced reference: 0, 0
Balanced: 30, 146
All: 30, 153
-----------
Key: chr21	44807647	44822197
Unbalanced without reference: 0, 0
Unbalanced: 0, 2
Balanced reference: 0, 0
Balanced: 71, 82
All: 71, 84
-----------
Key: chr21	32021567	32025593
Unbalanced without reference: 0, 0
Unbalanced: 0, 0
Balanced reference: 0, 0
Balanced: 30, 30
All: 30, 30
-----------
Key: chr21	46323276	46338460
Unbalanced without reference: 0, 0
Unbalanced: 1, 0
Balanced reference: 1, 0
Balanced: 16, 74
All: 18, 74
-----------
Key: chr21	29176986	29185161
[...]
Unbalanced without reference: 0, 0
Unbalanced: 0, 4
Balanced reference: 1, 0
Balanced: 48, 38
All: 49, 42
-----------
Key: chr22	22519070	22523276
Unbalanced without reference: 0, 0
Unbalanced: 3, 4
Balanced reference: 2, 0
Balanced: 35, 28
All: 40, 32
-----------
Key: chr22	45603335	45609029
Unbalanced without reference: 0, 0
Unbalanced: 0, 0
Balanced reference: 9, 0
Balanced: 143, 142
All: 152, 142
-----------
Key: chr22	41964036	41978969
Unbalanced without reference: 0, 0
Unbalanced: 1, 14
Balanced reference: 4, 0
Balanced: 52, 92
All: 57, 106
-----------
Key: chr22	42207306	42217865
Unbalanced without reference: 0, 0
Unbalanced: 0, 2
Balanced reference: 6, 0
Balanced: 66, 68
All: 72, 70
-----------
Key: chr22	40854326	40857794
Unbalanced without reference: 0, 0
Unbalanced: 1, 2
Balanced reference: 15, 0
Balanced: 173, 109
All: 189, 111
-----------
Key: chr22	43148273	43157962
Unbalanced without reference: 0, 0
Unbalanced: 5, 2
Balanced reference: 3, 0
Balanced: 30, 19
All: 38, 21
[...]
Unbalanced: 1, 11
Balanced reference: 0, 0
Balanced: 35, 46
All: 36, 57
-----------
Key: chr22	24464798	24477931
Unbalanced without reference: 0, 0
Unbalanced: 10, 4
Balanced reference: 0, 3
Balanced: 34, 9
All: 44, 16
-----------
Key: chr22	41698697	41704298
Unbalanced without reference: 0, 0
Unbalanced: 3, 7
Balanced reference: 0, 0
Balanced: 20, 65
All: 23, 72
-----------
Key: chr22	46149131	46160186
Unbalanced without reference: 0, 0
Unbalanced: 2, 3
Balanced reference: 0, 0
Balanced: 43, 46
All: 45, 49
-----------
Key: chr22	30324366	30328072
Unbalanced without reference: 0, 0
Unbalanced: 4, 5
Balanced reference: 0, 0
Balanced: 19, 18
All: 23, 23
-----------
Key: chr22	21735202	21760740
Unbalanced without reference: 0, 0
Unbalanced: 1, 1
Balanced reference: 0, 0
Balanced: 24, 84
All: 25, 85
-----------
Key: chr22	31063986	31067156
Unbalanced without reference: 0, 0
Unbalanced: 10, 1
Balanced reference: 0, 0
Balanced: 47, 34
All: 57, 35
-----------
Key: chr22	43341947	43345157
Unba