# Relationship between CHC interactions that were not filtered for distance-dependent interaction frequencies and TAD boundaries

The heatmaps show that the interaction profiles are cell-type specific, especially for strong interactions (with imbalanced read pair counts), but also for all other interaction categories, including the profiles obtained when all categories are combined. In the UCSC Genome Browser, the coverage in profiles from strong interactions (`DIX`) appears to drop at TAD boundaries. One possible explanation is that interactions spread from a bait and are prevented from spreading further by structural obstacles such as TAD boundaries.

The aim here is to investigate whether interactions end at TAD boundaries more often than expected by chance. Furthermore, we investigate whether interactions span TAD boundaries less often than expected by chance. For this purpose, there is the module `TadBoundarySet`, which contains TAD boundaries and supports two functions:

1. `get_distance_to_nearest_tad_boundary(chr, pos) -> distance_to_next_tad`
2. `get_number_of_boundaries_spanned_by_region(chr, sta_pos, end_pos) -> number_of_spanned_tads`

The first function returns the distance to the nearest TAD boundary for a given position. The second function returns the number of TAD boundaries that are spanned by a given region. To process the interactions, the module `DiachromaticInteractionSet` is used.
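The two queries can be illustrated with a minimal sketch. The class `ToyBoundarySet` below is hypothetical and is not the implementation in `diachr`; it assumes that boundaries are stored as sorted point positions per chromosome and that a boundary lying exactly on a region end counts as spanned.

```python
import bisect


class ToyBoundarySet:
    """Hypothetical stand-in for TadBoundarySet (not the diachr implementation)."""

    def __init__(self, boundaries):
        # boundaries: dict mapping chromosome name -> iterable of boundary positions
        self._pos = {chrom: sorted(p) for chrom, p in boundaries.items()}

    def get_distance_to_nearest_tad_boundary(self, chrom, pos):
        positions = self._pos[chrom]
        idx = bisect.bisect_left(positions, pos)
        # Only the boundaries directly left and right of pos can be nearest
        candidates = positions[max(idx - 1, 0):idx + 1]
        return min(abs(pos - b) for b in candidates)

    def get_number_of_boundaries_spanned_by_region(self, chrom, sta_pos, end_pos):
        positions = self._pos[chrom]
        # Assumption: boundaries at sta_pos or end_pos count as spanned
        left = bisect.bisect_left(positions, sta_pos)
        right = bisect.bisect_right(positions, end_pos)
        return right - left


toy = ToyBoundarySet({'chr1': [1000, 5000, 9000]})
nearest = toy.get_distance_to_nearest_tad_boundary('chr1', 5600)
spanned = toy.get_number_of_boundaries_spanned_by_region('chr1', 900, 5100)
```

With sorted positions, both queries reduce to binary searches, which matters when they are called once per interaction for several million interactions.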

In [3]:
import sys
import os
import numpy as np
sys.path.append("..")
from diachr import TadBoundarySet
from diachr import DiachromaticInteractionSet
from scipy import stats
import random

Interactions were evaluated using either the `ST` or the `HT` rule. An FDR of 5% was used for the `ST` rule and an FDR of 1% for the `HT` rule. For both rules, there are interactions without `RPC` filter, with `RPC` filter, and with the complement of the `RPC` filter. With the `RPC` filter, all interactions in which at least one of the four read pair counts is `0` were discarded at the very beginning of the analysis (before the P-value threshold was determined).
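The `RPC` filter criterion can be written as a one-line predicate. The following is only a sketch: the function name `passes_rpc_filter` and the representation of the four read pair counts as a tuple are assumptions made for illustration and are not taken from `DICer`.

```python
def passes_rpc_filter(read_pair_counts):
    """Return True if none of the four read pair counts is zero (hypothetical helper)."""
    if len(read_pair_counts) != 4:
        raise ValueError("Expected exactly four read pair counts")
    return all(count > 0 for count in read_pair_counts)


kept = passes_rpc_filter((5, 3, 1, 2))       # all four counts are non-zero
discarded = passes_rpc_filter((7, 0, 2, 1))  # one count is zero
```

The filter complement would then consist of exactly those interactions for which this predicate is `False`.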

In [4]:
RPC_RULE = "st"
ANALYSIS='ST_FDR005'
#ANALYSIS='ST_RMRO_FDR005'
#ANALYSIS='ST_KRO_FDR005'
#RPC_RULE = "ht"
#ANALYSIS='HT_FDR001'
#ANALYSIS='HT_RMRO_FDR001'
#ANALYSIS='HT_KRO_FDR001'

There is one CHC dataset for each of the 17 cell types and, for eight of the cell types, there are HC data and TAD boundaries.

In [5]:
#CELL_TYPE_SHORT = 'MK'             # Has HC data
#CELL_TYPE_SHORT = 'ERY'           # Has HC data
#CELL_TYPE_SHORT = 'NEU'           # Has HC data
#CELL_TYPE_SHORT = 'MON'           # Has HC data
#CELL_TYPE_SHORT = 'MAC_M0'        # Has HC data
#CELL_TYPE_SHORT = 'MAC_M1'
#CELL_TYPE_SHORT = 'MAC_M2'
#CELL_TYPE_SHORT = 'EP'
CELL_TYPE_SHORT = 'NB'            # Has HC data
#CELL_TYPE_SHORT = 'TB'
#CELL_TYPE_SHORT = 'FOET'
#CELL_TYPE_SHORT = 'NCD4'          # Has HC data
#CELL_TYPE_SHORT = 'TCD4'
#CELL_TYPE_SHORT = 'NACD4'
#CELL_TYPE_SHORT = 'ACD4'
#CELL_TYPE_SHORT = 'NCD8'          # Has HC data
#CELL_TYPE_SHORT = 'TCD8'

A `TadBoundarySet` can be created with one of the eight BED files with the published TADs or a BED file with TADs from all eight cell types that was created using `BedTools`. See bash script in: `../additional_files/javierre_2016/tad_regions_hg38/`.

In [6]:
#tad_boundary_bed_file = '../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_' + CELL_TYPE_SHORT + '_hg38.bed'
tad_boundary_bed_file = '../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed'
chr_size_file = '../additional_files/hg38.chrom.sizes.txt'

An interaction file that was created with `DICer` is read in and reference interactions are re-selected afterwards. The selection of reference interactions can be omitted as soon as the new reference selection (no distinction between `NE` and `EN` and additional `DIX` category) is integrated into `DICer`.

In [7]:
INTERACTION_FILE = '../DICer_interactions/' + ANALYSIS.upper() + '/CHC/JAV_' + CELL_TYPE_SHORT + '_RALT_20000_' + ANALYSIS.lower() + '_evaluated_and_categorized_interactions.tsv.gz'
OUT_PREFIX = 'JAV_' + CELL_TYPE_SHORT + '_RALT_20000_' + ANALYSIS.lower()

d11_interaction_set = DiachromaticInteractionSet(rpc_rule = RPC_RULE)
d11_interaction_set.parse_file(
    i_file = INTERACTION_FILE,
    verbose = True)

report_dict = d11_interaction_set.select_reference_interactions(verbose=True) # This step is now integrated in DICer

[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
	[INFO] Parsed 1,000,000 interaction lines ...
	[INFO] Parsed 2,000,000 interaction lines ...
	[INFO] Parsed 3,000,000 interaction lines ...
	[INFO] Parsed 4,000,000 interaction lines ...
	[INFO] Set size: 4,652,522
[INFO] ... done.
[INFO] Select reference interactions ...
	[INFO] Treating NE and EN as one category ...
	[INFO] First pass: Count directed interactions for different read pair counts ...
	[INFO] Second pass: Select undirected reference interactions for different read pair counts ...
	[INFO] Third pass: Moving DI interactions for which there is no reference to DIX ...
[INFO] ... done.


In [11]:
# Create dictionary for chromosome sizes required for randomization
chr_size_dict = {}
with open(chr_size_file, 'rt') as fp:
    for line in fp:
        chr_key, size = line.rstrip().rsplit('\t')
        chr_size_dict[chr_key] = int(size)

In [13]:
iter_num = 20
random_range = 500000

## Test whether interactions end closer to TAD boundaries than expected by chance

Now we have everything in place to do the first analysis. We compare the distances to the nearest TAD boundary for the following interaction categories:

1. `DIX`: Imbalanced interactions with high read pair counts and without counterpart in the reference interactions
2. `DI`: Imbalanced interactions with counterpart in the reference interactions
3. `UIR`: Balanced reference interactions (comparable to `DI` with respect to total number and distribution of read pair numbers)
4. `UI`: Balanced interactions (remaining powered interactions)
5. `ALL`: All interaction categories combined

The following function determines the distance to the nearest TAD boundary for each interaction and collects the distances in separate lists for the various interaction categories. If the parameter `random_range` is zero, the distance is determined for the outermost end position of the `N` digest (`pos`). If `random_range` is different from zero, e.g. 500,000, a position is randomly selected from the range (`pos - random_range`, `pos + random_range`), clamped to the chromosome, and the distance to the nearest TAD boundary is determined for this position.
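The randomization of a single end position can be sketched in isolation. The helper name `randomized_position` and the optional `rng` argument are assumptions for illustration; the notebook performs the same steps inline and uses the global `random` module.

```python
import random


def randomized_position(pos, random_range, chr_size, rng=random):
    """Shift pos by a random offset within +/- random_range and clamp it to the chromosome."""
    if random_range != 0:
        # randint includes both end points
        pos = rng.randint(pos - random_range, pos + random_range)
    # Positions beyond the chromosome ends are moved to the nearest end
    return min(max(pos, 0), chr_size)


rng = random.Random(0)  # local generator so the example is reproducible
unchanged = randomized_position(1_000_000, 0, 5_000_000, rng)
shifted = randomized_position(1_000_000, 500_000, 5_000_000, rng)
clamped = randomized_position(100, 500_000, 5_000_000, rng)
```

Note that clamping slightly over-represents positions `0` and `chr_size` for interactions that end close to a chromosome end.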

In [9]:
def determine_median_distances_to_tad_boundaries(
    d11_interaction_set: DiachromaticInteractionSet=None,
    tbs: TadBoundarySet=None,
    random_range=0):
    
    tad_dist_lists = {
        'DIX': [],
        'DI': [],
        'UIR': [],
        'UI': [],
        'ALL': [],
    }
    for d11_inter in d11_interaction_set._inter_dict.values():
        
        if d11_inter.enrichment_status_tag_pair == 'EN' or d11_inter.enrichment_status_tag_pair == 'NE':
        
            # Determine the distance to the next TAD from the outermost position of the 'N' digest
            if d11_inter.enrichment_status_tag_pair == 'NE':
                pos = d11_inter.fromA
                if random_range != 0:
                    pos = random.randint(pos - random_range, pos + random_range)

            if d11_inter.enrichment_status_tag_pair == 'EN':
                pos = d11_inter.toB
                if random_range != 0:
                    pos = random.randint(pos - random_range, pos + random_range)
                    
            if chr_size_dict[d11_inter.chrA] < pos:
                pos = chr_size_dict[d11_inter.chrA]
            if pos < 0:
                pos = 0

            dist = tbs.get_distance_to_nearest_tad_boundary(d11_inter.chrA, pos)

            # Add determined distance to list
            tad_dist_lists[d11_inter.get_category()].append(dist)
            tad_dist_lists['ALL'].append(dist)     
    
    return tad_dist_lists

### Randomize interaction end positions

In [14]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file)
tad_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)
print('Observed')
tad_dist_lists_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    tad_dist_lists_medians[i_cat] = np.median(tad_dist_lists[i_cat])
print(tad_dist_lists_medians)
print()

print('Random')
tad_dist_median_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': [],
    },
    'DI': {
        'N': None,
        'MED_LIST': [],
    },
    'UIR': {
        'N': None,
        'MED_LIST': [],
    },
    'UI': {
        'N': None,
        'MED_LIST': [],
    },
    'ALL': {
        'N': None,
        'MED_LIST': [],
    }
}
for random_seed in range(0,iter_num):
    tad_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=random_range)
    tad_dist_lists_random_medians = {}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        tad_dist_median_random_dict[i_cat]['N'] = len(tad_dist_lists_random[i_cat])
        tad_dist_median_random_dict[i_cat]['MED_LIST'].append(np.median(tad_dist_lists_random[i_cat]))
        tad_dist_lists_random_medians[i_cat] = np.median(tad_dist_lists_random[i_cat])
    print(tad_dist_lists_random_medians)
    
print("Done.")

Observed
{'DIX': 58896.0, 'DI': 58645.0, 'UIR': 59516.0, 'UI': 59798.0, 'ALL': 59749.0}

Random
{'DIX': 43138.0, 'DI': 59251.5, 'UIR': 59978.5, 'UI': 60944.0, 'ALL': 60827.0}
{'DIX': 43031.0, 'DI': 59302.5, 'UIR': 59437.0, 'UI': 60935.0, 'ALL': 60796.0}
{'DIX': 39318.0, 'DI': 59135.5, 'UIR': 59667.0, 'UI': 60877.0, 'ALL': 60756.0}
{'DIX': 46985.0, 'DI': 59218.0, 'UIR': 59910.0, 'UI': 60877.0, 'ALL': 60764.0}
{'DIX': 69622.0, 'DI': 59000.0, 'UIR': 59732.0, 'UI': 60999.0, 'ALL': 60855.0}
{'DIX': 58266.0, 'DI': 58936.0, 'UIR': 60062.5, 'UI': 60909.0, 'ALL': 60792.0}
{'DIX': 56453.0, 'DI': 59047.0, 'UIR': 59741.5, 'UI': 60951.0, 'ALL': 60819.0}
{'DIX': 56061.0, 'DI': 59201.5, 'UIR': 59495.0, 'UI': 60927.0, 'ALL': 60794.0}
{'DIX': 44373.0, 'DI': 59404.0, 'UIR': 59651.0, 'UI': 60909.0, 'ALL': 60792.0}
{'DIX': 61038.0, 'DI': 58940.5, 'UIR': 59848.5, 'UI': 60892.0, 'ALL': 60763.0}
{'DIX': 55556.0, 'DI': 59355.0, 'UIR': 59597.0, 'UI': 60920.0, 'ALL': 60797.0}
{'DIX': 59173.0, 'DI': 59163.5, 'UI

In [15]:
print(OUT_PREFIX)
print(INTERACTION_FILE)
print(tad_boundary_bed_file)
print(iter_num)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = tad_dist_median_random_dict[i_cat]['N']
    observed = tad_dist_lists_medians[i_cat]
    mean = np.mean(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    std = np.std(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    # Find number of smaller than observed
    st_obs = 0
    for med in tad_dist_median_random_dict[i_cat]['MED_LIST']:
        #print(str(med) + '\t' + str(observed))
        if observed > med:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NB_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	81	58896.0	51545.75	8473.403111353784	0.8674495835269733	17
DI	175166	58645.0	59153.775	160.27300731876218	-3.174427238319139	0
UIR	175166	59516.0	59753.625	191.6606294339033	-1.2398216613493285	4
UI	3909076	59798.0	60909.2	43.155069227148736	-25.74900283790865	0
ALL	4259489	59749.0	60784.45	39.51262456481472	-26.205548515297732	0
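The columns `MEAN_RAND`, `STD_RAND`, `Z_SCORE` and `ST_OBS` of the table above can be derived with a small helper. This is a sketch using only the standard library; the function name `summarize_against_random` is an assumption. `statistics.pstdev` computes the population standard deviation, which corresponds to the default behavior of `np.std`.

```python
import statistics


def summarize_against_random(observed, random_values):
    """Compare an observed statistic with the same statistic from randomized data."""
    mean = statistics.fmean(random_values)
    std = statistics.pstdev(random_values)  # population standard deviation (ddof=0)
    z_score = (observed - mean) / std
    # Number of randomizations that yield a smaller value than observed
    n_smaller = sum(1 for value in random_values if value < observed)
    return mean, std, z_score, n_smaller


mean, std, z_score, n_smaller = summarize_against_random(10.0, [12.0, 14.0, 16.0])
```

A negative Z-score together with `ST_OBS = 0` means that the observed median distance is smaller than in every randomization. With only 20 iterations, the smallest empirical P-value that can be resolved this way is on the order of 1/20.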


```

JAV_NB_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	81	58896.0	51545.75	8473.403111353784	0.8674495835269733	17
DI	175166	58645.0	59153.775	160.27300731876218	-3.174427238319139	0
UIR	175166	59516.0	59753.625	191.6606294339033	-1.2398216613493285	4
UI	3909076	59798.0	60909.2	43.155069227148736	-25.74900283790865	0
ALL	4259489	59749.0	60784.45	39.51262456481472	-26.205548515297732	0


JAV_MAC_M0_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_MAC_M0_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	138	51896.5	46422.775	5671.682139090219	0.9650972790372251	16
DI	216918	58213.0	59405.975	112.08395012221865	-10.643584551571871	0
UIR	216918	60125.0	60480.95	146.69031154101486	-2.426540623308124	0
UI	4673701	61403.0	61909.05	37.65829921810065	-13.437940918924118	0
ALL	5107675	61209.0	61735.45	34.576690124996055	-15.225575325366894	0

--------------
JAV_MON_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_MON_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MON_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	37	200735.0	179635.85	41347.70606971443	0.5102858660266595	14
DI	90824	205170.0	212639.15	595.2475766435341	-12.547972126349233	0
UIR	90824	208340.0	212788.1	724.434910809798	-6.140096140628795	0
UI	3344705	211594.0	217531.4	155.45623178245378	-38.19338685829476	0
ALL	3526390	211366.0	217278.1	148.07089518200394	-39.927495492838375	0

JAV_MAC_M0_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_MAC_M0_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MAC_M0_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	138	133149.5	136747.55	14811.205536603022	-0.24292755853722411	7
DI	216918	152211.0	161255.375	316.69866888731946	-28.558298119080394	0
UIR	216918	156285.0	164285.35	451.42073224432215	-17.722602061772346	0
UI	4673701	162150.0	171344.2	72.76853715720827	-126.34856160619215	0
ALL	5107675	161452.0	170593.1	65.62537618939795	-139.29215390123414	0

JAV_NCD4_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NCD4_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD4_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	210	122568.0	150617.425	9931.569739063156	-2.8242690467827183	0
DI	217298	192553.0	199531.0	497.2685642587916	-14.032658610546042	0
UIR	217298	201217.5	207851.2	518.5356159416632	-12.793142449729668	0
UI	4934351	209525.0	217810.7	113.76559233792968	-72.83133528974334	0
ALL	5369157	208458.0	216605.05	104.53921513001713	-77.93295549299246	0

JAV_NCD8_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NCD8_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	167	108759.0	132903.55	9416.696642002438	-2.5640148470223743	1
DI	217428	178673.0	184506.75	332.47783760726065	-17.54628230857034	0
UIR	217428	186831.0	191629.675	414.88924049076036	-11.566159185819753	0
UI	4367028	193212.0	200521.425	98.90466053225195	-73.90374690802814	0
ALL	4802051	192174.0	199321.05	100.12865473978965	-71.37866796047003	0

JAV_NEU_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NEU_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NEU_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	12	193656.0	179350.575	38827.74146382809	0.3684330960462101	14
DI	79804	270834.5	272469.05	948.5333270370631	-1.7232393985626608	1
UIR	79804	270146.0	274522.125	932.4110932818206	-4.693342916585527	0
UI	3400441	275960.0	280248.05	135.8754852797222	-31.55867293627196	0
ALL	3560061	275721.0	279944.7	136.01180095859328	-31.053923043676576	0

JAV_ERY_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_ERY_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_ERY_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	51	118830.0	174781.15	30325.463339040674	-1.8450220982433956	0
DI	183963	181009.0	189398.55	491.4018187797029	-17.072688132969756	0
UIR	183963	187943.0	194958.05	496.9970296691923	-14.114873090226107	0
UI	4514222	197698.0	204617.375	118.90730370755196	-58.191337152997285	0
ALL	4882199	196603.0	203639.4	103.05406348126212	-68.2787244122537	0

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	145	128608.0	141195.0	13362.456203857135	-0.9419675400969101	4
DI	197902	162991.5	172641.7	457.9521645324978	-21.0725065790473	0
UIR	197902	169522.0	178017.15	429.05131103400674	-19.799846269032063	0
UI	4523334	178681.0	186882.075	121.2007915609465	-67.66519339006177	0
ALL	4919283	177604.0	185897.4	117.71465499248596	-70.45341975924225	0

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	81	128562.0	151359.4	17254.790524373224	-1.321221487319569	1
DI	175166	186427.0	192627.425	473.5757139835192	-13.092784990692676	0
UIR	175166	194319.5	200046.1	579.1875689273726	-9.887297841363193	0
UI	3909076	199846.0	208633.2	116.35230981806939	-75.5223511569288	0
ALL	4259489	199049.0	207580.45	109.96430102537823	-77.58381511497146	0
```

### Randomize TAD boundaries 1

For each chromosome, the same number of TAD boundaries as observed on that chromosome is placed at random positions drawn from the entire sequence of the chromosome.
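A minimal sketch of this randomization scheme is shown below. The function `draw_random_boundaries` is hypothetical and only illustrates the idea; it assumes uniformly distributed positions and may differ from what `TadBoundarySet.randomize_tad_boundaries` does internally.

```python
import random


def draw_random_boundaries(boundaries, chr_sizes, random_seed=0):
    """Replace the boundaries of each chromosome by equally many uniform random positions."""
    rng = random.Random(random_seed)
    randomized = {}
    for chrom, positions in boundaries.items():
        # One random position per observed boundary, anywhere on the chromosome
        randomized[chrom] = sorted(rng.randint(0, chr_sizes[chrom]) for _ in positions)
    return randomized


observed_boundaries = {'chr1': [100, 200, 300], 'chr2': [50]}
toy_chr_sizes = {'chr1': 1000, 'chr2': 500}
random_boundaries = draw_random_boundaries(observed_boundaries, toy_chr_sizes, random_seed=1)
```

This scheme preserves only the number of boundaries per chromosome. It does not preserve their spacing, so random boundaries can also fall into regions such as centromeres where no interactions end.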

In [13]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)
tad_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)
print('Observed')
tad_dist_lists_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    tad_dist_lists_medians[i_cat] = np.median(tad_dist_lists[i_cat])
print(tad_dist_lists_medians)
print()

print('Random')
tad_dist_median_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': [],
    },
    'DI': {
        'N': None,
        'MED_LIST': [],
    },
    'UIR': {
        'N': None,
        'MED_LIST': [],
    },
    'UI': {
        'N': None,
        'MED_LIST': [],
    },
    'ALL': {
        'N': None,
        'MED_LIST': [],
    }
}
for random_seed in range(0,iter_num):
    tbs.randomize_tad_boundaries(random_seed=random_seed)
    tad_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=0)
    tad_dist_lists_random_medians = {}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        tad_dist_median_random_dict[i_cat]['N'] = len(tad_dist_lists_random[i_cat])
        tad_dist_median_random_dict[i_cat]['MED_LIST'].append(np.median(tad_dist_lists_random[i_cat]))
        tad_dist_lists_random_medians[i_cat] = np.median(tad_dist_lists_random[i_cat])
    print(tad_dist_lists_random_medians)
    
print("Done.")

Observed
{'DIX': 128608.0, 'DI': 162991.5, 'UIR': 169522.0, 'UI': 178681.0, 'ALL': 177604.0}

Random
{'DIX': 296661.0, 'DI': 298918.0, 'UIR': 300871.5, 'UI': 303926.0, 'ALL': 303608.0}
{'DIX': 303743.0, 'DI': 310083.0, 'UIR': 302277.5, 'UI': 299109.0, 'ALL': 299661.0}
{'DIX': 242885.0, 'DI': 296219.5, 'UIR': 294244.0, 'UI': 296194.0, 'ALL': 296095.0}
{'DIX': 423410.0, 'DI': 308715.5, 'UIR': 300054.5, 'UI': 302787.0, 'ALL': 302938.0}
{'DIX': 331878.0, 'DI': 291122.0, 'UIR': 291961.5, 'UI': 294379.5, 'ALL': 294150.0}
{'DIX': 247448.0, 'DI': 294308.0, 'UIR': 293828.0, 'UI': 295586.0, 'ALL': 295442.0}
{'DIX': 269884.0, 'DI': 295994.0, 'UIR': 292969.5, 'UI': 296547.0, 'ALL': 296392.0}
{'DIX': 244068.0, 'DI': 302101.0, 'UIR': 299014.5, 'UI': 299712.0, 'ALL': 299752.0}
{'DIX': 313341.0, 'DI': 311350.5, 'UIR': 306983.0, 'UI': 309050.0, 'ALL': 309056.0}
{'DIX': 330603.0, 'DI': 298013.5, 'UIR': 299906.0, 'UI': 301131.0, 'ALL': 300974.0}
{'DIX': 235847.0, 'DI': 298402.0, 'UIR': 293057.0, 'UI': 29

In [14]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print(iter_num)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = tad_dist_median_random_dict[i_cat]['N']
    observed = tad_dist_lists_medians[i_cat]
    mean = np.mean(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    std = np.std(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    # Find number of smaller than observed
    st_obs = 0
    for med in tad_dist_median_random_dict[i_cat]['MED_LIST']:
        #print(str(med) + '\t' + str(observed))
        if observed > med:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	145	128608.0	298736.65	50187.094998888904	-3.3898883767583383	0
DI	197902	162991.5	297990.275	7103.553597452123	-19.004400142545684	0
UIR	197902	169522.0	296137.6	5221.986888627737	-24.24663307289014	0
UI	4523334	178681.0	297664.05	4549.514806273302	-26.152909720380485	0
ALL	4919283	177604.0	297616.1	4620.586963795833	-25.973345148645247	0


### Randomize TAD boundaries 2

For each TAD boundary, a random position is selected from the surrounding sequence.
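The following sketch illustrates this local randomization. The function `jitter_boundaries` is hypothetical; it assumes that the surrounding sequence is the window of `random_range` bases on either side of a boundary and that positions are clamped to the chromosome, which may differ from `TadBoundarySet.randomize_tad_boundaries_2`.

```python
import random


def jitter_boundaries(boundaries, chr_sizes, random_range, random_seed=0):
    """Move each boundary to a random position within +/- random_range of its origin."""
    rng = random.Random(random_seed)
    jittered = {}
    for chrom, positions in boundaries.items():
        new_positions = []
        for pos in positions:
            new_pos = rng.randint(pos - random_range, pos + random_range)
            # Keep the new position on the chromosome
            new_positions.append(min(max(new_pos, 0), chr_sizes[chrom]))
        jittered[chrom] = sorted(new_positions)
    return jittered


toy_boundaries = {'chr1': [2_000_000, 6_000_000]}
toy_sizes = {'chr1': 8_000_000}
jittered = jitter_boundaries(toy_boundaries, toy_sizes, random_range=500_000, random_seed=3)
```

In contrast to drawing positions from the entire chromosome, this scheme keeps the large-scale distribution of boundaries and only destroys their exact positions.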

In [45]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)
tad_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)
print('Observed')
tad_dist_lists_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    tad_dist_lists_medians[i_cat] = np.median(tad_dist_lists[i_cat])
print(tad_dist_lists_medians)
print()

print('Random')
tad_dist_median_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': [],
    },
    'DI': {
        'N': None,
        'MED_LIST': [],
    },
    'UIR': {
        'N': None,
        'MED_LIST': [],
    },
    'UI': {
        'N': None,
        'MED_LIST': [],
    },
    'ALL': {
        'N': None,
        'MED_LIST': [],
    }
}
for random_seed in range(0,iter_num):
    tbs.randomize_tad_boundaries_2(random_seed=random_seed, random_range=random_range)
    tad_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=0)
    tad_dist_lists_random_medians = {}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        tad_dist_median_random_dict[i_cat]['N'] = len(tad_dist_lists_random[i_cat])
        tad_dist_median_random_dict[i_cat]['MED_LIST'].append(np.median(tad_dist_lists_random[i_cat]))
        tad_dist_lists_random_medians[i_cat] = np.median(tad_dist_lists_random[i_cat])
    print(tad_dist_lists_random_medians)
    
print("Done.")

Observed
{'DIX': 128562.0, 'DI': 186427.0, 'UIR': 194319.5, 'UI': 199846.0, 'ALL': 199049.0}

Random
{'DIX': 150117.0, 'DI': 217394.5, 'UIR': 226447.5, 'UI': 235988.0, 'ALL': 234799.0}
{'DIX': 146286.0, 'DI': 235282.5, 'UIR': 241415.0, 'UI': 251489.0, 'ALL': 250482.0}
{'DIX': 135945.0, 'DI': 242592.0, 'UIR': 248948.5, 'UI': 257017.0, 'ALL': 256112.0}
{'DIX': 168930.0, 'DI': 241999.5, 'UIR': 249020.5, 'UI': 256719.0, 'ALL': 255790.0}
{'DIX': 158216.0, 'DI': 244722.5, 'UIR': 249183.0, 'UI': 257688.0, 'ALL': 256775.0}
{'DIX': 171807.0, 'DI': 248006.0, 'UIR': 253008.0, 'UI': 261583.0, 'ALL': 260692.0}
{'DIX': 231760.0, 'DI': 252148.0, 'UIR': 254994.0, 'UI': 263784.0, 'ALL': 262979.0}
{'DIX': 178029.0, 'DI': 251629.0, 'UIR': 256428.5, 'UI': 263833.5, 'ALL': 263056.0}
{'DIX': 247621.0, 'DI': 259031.0, 'UIR': 263263.5, 'UI': 269169.0, 'ALL': 268514.0}
{'DIX': 269242.0, 'DI': 262915.0, 'UIR': 263525.0, 'UI': 269440.0, 'ALL': 268955.0}
{'DIX': 225158.0, 'DI': 262242.5, 'UIR': 263955.5, 'UI': 26

In [46]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = tad_dist_median_random_dict[i_cat]['N']
    observed = tad_dist_lists_medians[i_cat]
    mean = np.mean(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    std = np.std(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    # Find number of smaller than observed
    st_obs = 0
    for med in tad_dist_median_random_dict[i_cat]['MED_LIST']:
        #print(str(med) + '\t' + str(observed))
        if observed > med:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	81	128562.0	224812.1	67680.10732622992	-1.422132792078147	0
DI	175166	186427.0	256600.625	14281.563680034305	-4.913581353707302	0
UIR	175166	194319.5	261481.325	13442.250388490575	-4.9963230157879535	0
UI	3909076	199846.0	268453.475	12263.15559559916	-5.594602014559851	0
ALL	4259489	199049.0	267670.4	12369.512494031444	-5.547623645888335	0


## Test whether interactions span TAD boundaries less often than expected by chance, taking into account their length

In the second analysis, we investigate how often interactions span TAD boundaries.

In [16]:
def get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set: DiachromaticInteractionSet=None,
    tbs: TadBoundarySet=None,
    random_flip_interaction=False):
        
    spanned_boundary_length_dict = {
        'DIX': {
            'I_NUM': 0, 
            'I_DIST': 0, 
            'SB_NUM': 0       
        },
        'DI': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0        
        },
        'UIR': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0          
        },
        'UI': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0          
        },
        'ALL': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0      
        }
    }
    warn_count = 0
    for d11_inter in d11_interaction_set._inter_dict.values():

        if d11_inter.enrichment_status_tag_pair == 'NE' or d11_inter.enrichment_status_tag_pair == 'EN':

            warn = False
            i_dist = d11_inter.i_dist
            
            # Get interaction distance and number of spanned TAD boundaries
            if random_flip_interaction and random.uniform(0,1) <= 0.5:
                
                if d11_inter.enrichment_status_tag_pair == 'NE':
                    if chr_size_dict[d11_inter.chrA] < d11_inter.toB + i_dist:
                        end_pos = chr_size_dict[d11_inter.chrA]
                        warn_count += 1
                        warn = True
                    else:
                        end_pos = d11_inter.toB + i_dist
                    sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, d11_inter.fromB, end_pos)

                else: # EN
                    if d11_inter.fromA - i_dist < 0:
                        sta_pos = 0
                        warn_count += 1
                        warn = True
                    else:
                        sta_pos = d11_inter.fromA - i_dist
                    sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, sta_pos, d11_inter.toA)
            else:        
                sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, d11_inter.toA, d11_inter.fromB)

            if not warn: # Skip interactions that cannot be flipped
                # Increment numbers for interaction category
                spanned_boundary_length_dict[d11_inter.get_category()]['I_NUM'] += 1
                spanned_boundary_length_dict[d11_inter.get_category()]['I_DIST'] += i_dist
                spanned_boundary_length_dict[d11_inter.get_category()]['SB_NUM'] += sb_num

                # Increment numbers for all interaction categories combined
                spanned_boundary_length_dict['ALL']['I_NUM'] += 1
                spanned_boundary_length_dict['ALL']['I_DIST'] += i_dist
                spanned_boundary_length_dict['ALL']['SB_NUM'] += sb_num
            
    print(warn_count)        
    return spanned_boundary_length_dict

### Flip interactions at baits
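The flipping step mirrors the spanned region to the other side of the bait while keeping the interaction distance. The sketch below restates the coordinate arithmetic of the function above in isolation. The helper name `flipped_region` is hypothetical, and it assumes that `E` marks the baited digest, so that the bait is the second digest for `NE` and the first digest for `EN`.

```python
def flipped_region(tag_pair, from_a, to_a, from_b, to_b, i_dist, chr_size):
    """Return (sta_pos, end_pos) of the mirrored region, or None if it leaves the chromosome."""
    if tag_pair == 'NE':
        # Bait is the second digest: mirror the region to the right of the bait
        end_pos = to_b + i_dist
        return None if end_pos > chr_size else (from_b, end_pos)
    if tag_pair == 'EN':
        # Bait is the first digest: mirror the region to the left of the bait
        sta_pos = from_a - i_dist
        return None if sta_pos < 0 else (sta_pos, to_a)
    raise ValueError("Only 'NE' and 'EN' interactions can be flipped")


flipped_ne = flipped_region('NE', 1000, 2000, 52000, 53000, 50000, 1_000_000)
flipped_en = flipped_region('EN', 52000, 53000, 103000, 104000, 50000, 1_000_000)
not_flippable = flipped_region('EN', 1000, 2000, 52000, 53000, 50000, 1_000_000)
```

Interactions for which the mirrored region would extend beyond a chromosome end are skipped in the notebook. The counts printed per iteration are the numbers of such interactions.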

In [17]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

spanned_boundary_length_random_lists_dict = {
    'DIX': {
        'I_NUM': [], 
        'I_DIST': [], 
        'SB_NUM': []       
    },
    'DI': {
        'I_NUM': [],
        'I_DIST': [], 
        'SB_NUM': []        
    },
    'UIR': {
        'I_NUM': [],
        'I_DIST': [], 
        'SB_NUM': []          
    },
    'UI': {
        'I_NUM': [],
        'I_DIST': [], 
        'SB_NUM': []          
    },
    'ALL': {
        'I_NUM': [],
        'I_DIST': [], 
        'SB_NUM': []      
    }
}
for random_seed in range(0,iter_num):
    print('Iteration: ' + str(random_seed))
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_flip_interaction=True)
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']:
            spanned_boundary_length_random_lists_dict[i_cat][num_type].append(spanned_boundary_length_random_dict[i_cat][num_type])
    
print("Done.")

0
Iteration: 0
7308
Iteration: 1
7346
Iteration: 2
7276
Iteration: 3
7154
Iteration: 4
7343
Iteration: 5
7388
Iteration: 6
7281
Iteration: 7
7314
Iteration: 8
7223
Iteration: 9
7306
Iteration: 10
7246
Iteration: 11
7442
Iteration: 12
7383
Iteration: 13
7260
Iteration: 14
7219
Iteration: 15
7308
Iteration: 16
7242
Iteration: 17
7403
Iteration: 18
7358
Iteration: 19
7183
Done.


In [18]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print(iter_num)
print(INTERACTION_FILE)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = len(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = (spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST'])
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    z_score = (observed - mean) / std
    st_obs = 0 # Find number of smaller than observed
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['SB_NUM']:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(1000000*observed) + '\t' + str(1000000*mean) + '\t' + str(1000000*std) + '\t' + str(z_score) + '\t' + str(st_obs))
    #print(spanned_boundary_length_random_lists_dict)

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	2.2680538633926415	5.7938466873939305	0.9558261034854636	-3.68873878956049	0
DI	20	4.364294473858813	5.028743424164841	0.02876497938496734	-23.099232626367588	0
UIR	20	4.67158606838679	5.07871624707811	0.012572621570107844	-32.38228212159797	0
UI	20	4.84583980753302	4.9554290074603395	0.0028721812393234856	-38.15539159817525	0
ALL	20	4.8336699597801935	4.959779548955314	0.00268866729489603	-46.90412585243189	0


```
JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	2.2680538633926415	5.7938466873939305	0.9558261034854636	-3.68873878956049	0
DI	20	4.364294473858813	5.028743424164841	0.02876497938496734	-23.099232626367588	0
UIR	20	4.67158606838679	5.07871624707811	0.012572621570107844	-32.38228212159797	0
UI	20	4.84583980753302	4.9554290074603395	0.0028721812393234856	-38.15539159817525	0
ALL	20	4.8336699597801935	4.959779548955314	0.00268866729489603	-46.90412585243189	0

JAV_MAC_M0_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_MAC_M0_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	3.3212711224194242	5.8952562422944785	0.6022586898495712	-4.273886227391704	0
DI	20	4.262601807709271	5.101693557002013	0.02128227252911093	-39.42679279879493	0
UIR	20	4.626280291133495	5.123368260830427	0.005488468764225628	-90.56951784749133	0
UI	20	4.805791353721792	5.035608742059688	0.0010225778575759445	-224.74316907534697	0
ALL	20	4.79244624440815	5.038999955928377	0.001113947284469102	-221.33337453013544	0

--------------------------------
JAV_MON_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MON_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_MON_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.0	0.8759961137626953	0.6772963480079356	-1.293371972754874	0
DI	20	0.9438654693255323	1.2367570653718736	0.014084985313977479	-20.79459719107306	0
UIR	20	1.0857996332084487	1.2560550888213249	0.004020824096168021	-42.34342302493046	0
UI	20	1.157758689062913	1.2401269635876475	0.00046422363135255123	-177.4323170165815	0
ALL	20	1.1542793047574706	1.240371908608981	0.0004507791693144821	-190.986207242972	0

JAV_MAC_M0_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MAC_M0_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_MAC_M0_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.968704077372332	1.9581660992597856	0.24078817972484812	-4.109263266237259	0
DI	20	1.26062852098162	1.6439849874604768	0.00788049558487513	-48.64623834250031	0
UIR	20	1.3757087629768459	1.5938912614992025	0.002734584801161256	-79.78633481386436	0
UI	20	1.449247070386364	1.545697797645053	0.0003197890227431707	-301.60737360941465	0
ALL	20	1.4443130341349286	1.5485337877301975	0.0003640496150804199	-286.2817299566323	0


JAV_NCD4_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD4_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_NCD4_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.4282426727892807	1.6316045833271595	0.18661752413126842	-6.448279260694849	0
DI	20	1.0918660887378182	1.2190044351976965	0.007483964119885127	-16.988102083769746	0
UIR	20	1.1546177307175647	1.186243717757414	0.003155827141119062	-10.02145733135271	0
UI	20	1.1713755418237153	1.1023382037823994	0.000668232792036751	103.31330468068212	20
ALL	20	1.1699374864674181	1.1057405605429893	0.0006154992655536579	104.30057274996416	20

JAV_NCD8_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_NCD8_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.45841914844288195	2.4238912473917384	0.41365025812491085	-4.751531179645338	0
DI	20	1.1447222311670464	1.3698231855548757	0.006374844635593466	-35.31081418533638	0
UIR	20	1.2276180996211274	1.3188639387153267	0.0035420682527454246	-25.760610068277366	0
UI	20	1.2677311952309867	1.2441174117237186	0.0006698280169637097	35.25350225615822	20
ALL	20	1.2647317790039228	1.2480725458346926	0.0006553916762371948	25.418743895064388	20

JAV_NEU_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NEU_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_NEU_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.0	0.0	0.0	nan	0
DI	20	0.8250655766728365	0.9936054481875968	0.01080110761996891	-15.603943358842805	0
UIR	20	0.9469371354746864	1.007372834514544	0.002475013175686305	-24.41833426729106	0
UI	20	0.9899389562884163	0.9918781258834157	0.0006203769670280962	-3.1257923779616434	0
ALL	20	0.9880243476501527	0.9921221741029469	0.0006085008676465654	-6.734298454894475	0

JAV_ERY_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_ERY_hg38.bed
20
../DICer_interactions/ST_FDR005/CHC/JAV_ERY_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	1.88473791070206	2.120330149539818	0.6746335039151355	-0.3492151478847893	5
DI	20	1.1210003824177888	1.4561424507270258	0.008496469817955881	-39.44486068802007	0
UIR	20	1.225603786126091	1.4228858811070217	0.003196462768466712	-61.71887779426985	0
UI	20	1.274857214972639	1.3796642012502223	0.000541274490682519	-193.63001227977156	0
ALL	20	1.2710320809227311	1.382058629494515	0.0005837357440644427	-190.20001721793983	0

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed
20

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.19921300900791422	1.6235860234145012	0.3055501571079046	-4.6616667714674795	0
DI	20	1.165488985611845	1.5614443667686007	0.007249445430852096	-54.61871324276112	0
UIR	20	1.2784624730896113	1.5109875209568528	0.002055189068001522	-113.14046551120983	0
UI	20	1.3511935003644306	1.4732999925312198	0.0004947958381685606	-246.7815667543903	0
ALL	20	1.3461871526998521	1.475758315287303	0.00047891406055783825	-270.55201185057405	0

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	20	0.0	1.9381551196264393	0.49827271094476494	-3.889747676431129	0
DI	20	1.0599907268625508	1.3336106204636584	0.006941041888693551	-39.42057950216586	0
UIR	20	1.155667408654959	1.3003940753988346	0.00429908771772739	-33.664506575916505	0
UI	20	1.2060873901624474	1.2398345701933975	0.0008279438546463757	-40.76022769124121	0
ALL	20	1.2024574259256016	1.242891986917607	0.0008556324119404232	-47.2569299943968	0
```
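
Each row of the tables above is derived from three quantities per interaction category: the observed number of spanned boundaries (`SB_NUM`), the observed `I_DIST`, and the list of `SB_NUM` values from the randomizations. The following is a minimal sketch of that calculation as a standalone function. The function name `summarize_spanned_boundaries` and its signature are hypothetical and not part of `diachr`. Unlike the cell above, the sketch returns `nan` explicitly when all randomizations give the same value, which is the situation in the `DIX` row for `JAV_NEU`.

```python
import numpy as np

def summarize_spanned_boundaries(obs_sb_num, obs_i_dist, random_sb_nums):
    """Hypothetical helper (not part of diachr).

    Returns (observed, mean, std, z_score, st_obs), where the first three
    values are scaled by 1,000,000 as in the printed tables.
    """
    random_sb_nums = np.asarray(random_sb_nums, dtype=float)
    scale = 1_000_000 / obs_i_dist

    observed = obs_sb_num * scale
    mean = random_sb_nums.mean() * scale
    std = random_sb_nums.std() * scale

    # Avoid a division by zero if all randomizations are identical
    z_score = (observed - mean) / std if std > 0 else float('nan')

    # Number of randomizations with fewer spanned boundaries than observed
    st_obs = int(np.sum(random_sb_nums < obs_sb_num))
    return observed, mean, std, z_score, st_obs
```

Because `observed`, `mean` and `std` are all divided by the same observed `I_DIST`, the Z-score does not depend on the scaling. It is the same as the Z-score of the raw `SB_NUM` counts.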

### Randomize TAD boundaries 1

For each chromosome, the observed TAD boundaries are replaced by the same number of boundaries at positions selected randomly from the entire sequence of the chromosome.
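
The idea can be sketched as follows. This is not the implementation of `TadBoundarySet.randomize_tad_boundaries`, only an illustration under the assumption that positions are drawn uniformly and independently. The function name `randomize_boundaries_uniform` and the dictionary-based inputs are hypothetical.

```python
import random

def randomize_boundaries_uniform(boundaries_by_chr, chr_sizes, random_seed=0):
    """Hypothetical sketch (not the diachr implementation).

    boundaries_by_chr: dict mapping chromosome name to a list of boundary positions
    chr_sizes: dict mapping chromosome name to chromosome length
    Returns a dict with the same number of boundaries per chromosome,
    placed uniformly at random along the chromosome.
    """
    rng = random.Random(random_seed)  # Local generator for reproducibility
    randomized = {}
    for chrom, positions in boundaries_by_chr.items():
        # Draw as many random positions as there are observed boundaries
        randomized[chrom] = sorted(
            rng.randint(1, chr_sizes[chrom]) for _ in positions
        )
    return randomized
```

Using the iteration index as the seed, as in the cell below, gives a different but reproducible boundary set for each iteration.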

In [18]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# One list per interaction category and quantity, with one entry per iteration
spanned_boundary_length_random_lists_dict = {
    i_cat: {num_type: [] for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']
}
for random_seed in range(0,iter_num):
    print('Iteration: ' + str(random_seed))
    tbs.randomize_tad_boundaries(random_seed=random_seed)
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
        d11_interaction_set=d11_interaction_set,
        tbs=tbs)
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']:
            spanned_boundary_length_random_lists_dict[i_cat][num_type].append(spanned_boundary_length_random_dict[i_cat][num_type])

print("Done.")

Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Iteration: 10
Iteration: 11
Iteration: 12
Iteration: 13
Iteration: 14
Iteration: 15
Iteration: 16
Iteration: 17
Iteration: 18
Iteration: 19
Iteration: 20
Iteration: 21
Iteration: 22
Iteration: 23
Iteration: 24
Iteration: 25
Iteration: 26
Iteration: 27
Iteration: 28
Iteration: 29
Iteration: 30
Iteration: 31
Iteration: 32
Iteration: 33
Iteration: 34
Iteration: 35
Iteration: 36
Iteration: 37
Iteration: 38
Iteration: 39
Iteration: 40
Iteration: 41
Iteration: 42
Iteration: 43
Iteration: 44
Iteration: 45
Iteration: 46
Iteration: 47
Iteration: 48
Iteration: 49
Iteration: 50
Iteration: 51
Iteration: 52
Iteration: 53
Iteration: 54
Iteration: 55
Iteration: 56
Iteration: 57
Iteration: 58
Iteration: 59
Iteration: 60
Iteration: 61
Iteration: 62
Iteration: 63
Iteration: 64
Iteration: 65
Iteration: 66
Iteration: 67
Iteration: 68
Iteration: 69
Iteration: 70
Iteration: 71
It

In [19]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = len(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = (spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST'])
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    z_score = (observed - mean) / std
    st_obs = 0 # Find number of smaller than observed
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['SB_NUM']:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(1000000*observed) + '\t' + str(1000000*mean) + '\t' + str(1000000*std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	100	0.20618671485387652	1.0433047771606152	0.5682829047613528	-1.473065713032986	2
DI	100	1.1724426213450398	1.025360403227732	0.024830463891165024	5.923458327721434	100
UIR	100	1.2129642804957468	1.0015553177048169	0.01589801901557487	13.297817959823687	100
UI	100	1.2427845664424853	0.98646326284693	0.014762672871765782	17.362797768538265	100
ALL	100	1.2408928095610732	0.9874722073269894	0.014797098578713485	17.126371152155777	100


### Randomize TAD boundaries 2

For each TAD boundary, a random position is selected from the surrounding sequence. The extent of the surrounding sequence is controlled by `random_range`.
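
A minimal sketch of this kind of local randomization is shown below. It is not the implementation of `TadBoundarySet.randomize_tad_boundaries_2`. It assumes that "surrounding sequence" means a window of `random_range` bases on either side of the original boundary, clipped to the chromosome, and that positions are drawn uniformly from that window. The function name `jitter_boundaries` and the dictionary-based inputs are hypothetical.

```python
import random

def jitter_boundaries(boundaries_by_chr, chr_sizes, random_range, random_seed=0):
    """Hypothetical sketch (not the diachr implementation).

    Moves each boundary to a random position within +/- random_range of its
    original position (an assumption), clipped to [1, chromosome length].
    """
    rng = random.Random(random_seed)  # Local generator for reproducibility
    jittered = {}
    for chrom, positions in boundaries_by_chr.items():
        new_positions = []
        for pos in positions:
            lower = max(1, pos - random_range)
            upper = min(chr_sizes[chrom], pos + random_range)
            new_positions.append(rng.randint(lower, upper))
        jittered[chrom] = sorted(new_positions)
    return jittered
```

In contrast to drawing positions from the whole chromosome, this keeps the large-scale distribution of boundaries along the chromosome and only perturbs their exact positions.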

In [20]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# One list per interaction category and quantity, with one entry per iteration
spanned_boundary_length_random_lists_dict = {
    i_cat: {num_type: [] for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']
}
for random_seed in range(0,iter_num):
    print('Iteration: ' + str(random_seed))
    #tbs.randomize_tad_boundaries(random_seed=random_seed)
    tbs.randomize_tad_boundaries_2(random_seed=random_seed, random_range=random_range)
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
        d11_interaction_set=d11_interaction_set,
        tbs=tbs)
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']:
            spanned_boundary_length_random_lists_dict[i_cat][num_type].append(spanned_boundary_length_random_dict[i_cat][num_type])

print("Done.")

Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Iteration: 10
Iteration: 11
Iteration: 12
Iteration: 13
Iteration: 14
Iteration: 15
Iteration: 16
Iteration: 17
Iteration: 18
Iteration: 19
Iteration: 20
Iteration: 21
Iteration: 22
Iteration: 23
Iteration: 24
Iteration: 25
Iteration: 26
Iteration: 27
Iteration: 28
Iteration: 29
Iteration: 30
Iteration: 31
Iteration: 32
Iteration: 33
Iteration: 34
Iteration: 35
Iteration: 36
Iteration: 37
Iteration: 38
Iteration: 39
Iteration: 40
Iteration: 41
Iteration: 42
Iteration: 43
Iteration: 44
Iteration: 45
Iteration: 46
Iteration: 47
Iteration: 48
Iteration: 49
Iteration: 50
Iteration: 51
Iteration: 52
Iteration: 53
Iteration: 54
Iteration: 55
Iteration: 56
Iteration: 57
Iteration: 58
Iteration: 59
Iteration: 60
Iteration: 61
Iteration: 62
Iteration: 63
Iteration: 64
Iteration: 65
Iteration: 66
Iteration: 67
Iteration: 68
Iteration: 69
Iteration: 70
Iteration: 71
It

In [21]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = len(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = (spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST'])
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    z_score = (observed - mean) / std
    st_obs = 0 # Find number of smaller than observed
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['SB_NUM']:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(1000000*observed) + '\t' + str(1000000*mean) + '\t' + str(1000000*std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	100	0.20618671485387652	1.4680494097596006	0.7005980637531896	-1.8011221557561425	1
DI	100	1.1724426213450398	1.2701688524974704	0.05773355231210193	-1.692711209317799	0
UIR	100	1.2129642804957468	1.2169195806920257	0.04964025386807656	-0.07967929025484886	66
UI	100	1.2427845664424853	1.1779174451430798	0.04200302009121283	1.544344219976122	90
ALL	100	1.2408928095610732	1.1803934563733418	0.04240421774396076	1.426729613384929	88
