# Relationship between CHC interactions that were not filtered for distance-dependent interaction frequencies and TAD boundaries

The aim here is to investigate whether interactions end at TAD boundaries more often than expected by chance. Furthermore, we investigate whether interactions span TAD boundaries less often than expected by chance. For this purpose, there is the module `TadBoundarySet`, which contains TAD boundaries and supports two functions:

1. `get_distance_to_nearest_tad_boundary(chr, pos) -> distance_to_next_tad`
2. `get_number_of_boundaries_spanned_by_region(chr, sta_pos, end_pos) -> number_of_spanned_tads`

The first function returns the distance to the nearest TAD boundary for a given position. The second function returns the number of TAD boundaries that are spanned by a given region. To process the interactions, the module `DiachromaticInteractionSet` is used.
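To make the two queries concrete, here is a minimal sketch of how they could be answered with sorted boundary positions and binary search. The class `ToyTadBoundarySet` is a hypothetical stand-in written for illustration only. It is not the implementation in `diachr`, and treating region endpoints as inclusive is an assumption.

```python
from bisect import bisect_left, bisect_right


class ToyTadBoundarySet:
    """Hypothetical stand-in for TadBoundarySet (illustration only)."""

    def __init__(self, boundaries_by_chr):
        # Keep one sorted list of boundary positions per chromosome
        self._boundaries = {c: sorted(p) for c, p in boundaries_by_chr.items()}

    def get_distance_to_nearest_tad_boundary(self, chr_key, pos):
        positions = self._boundaries[chr_key]
        idx = bisect_left(positions, pos)
        # Only the boundaries directly left and right of pos can be the nearest
        candidates = positions[max(idx - 1, 0):idx + 1]
        return min(abs(pos - b) for b in candidates)

    def get_number_of_boundaries_spanned_by_region(self, chr_key, sta_pos, end_pos):
        positions = self._boundaries[chr_key]
        # Count boundaries b with sta_pos <= b <= end_pos (inclusive by assumption)
        return bisect_right(positions, end_pos) - bisect_left(positions, sta_pos)


toy_tbs = ToyTadBoundarySet({'chr1': [100_000, 500_000, 900_000]})
print(toy_tbs.get_distance_to_nearest_tad_boundary('chr1', 450_000))
print(toy_tbs.get_number_of_boundaries_spanned_by_region('chr1', 50_000, 600_000))
```

For the toy boundaries, position 450,000 is 50,000 bp away from the nearest boundary, and the region from 50,000 to 600,000 spans two boundaries.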

In [1]:
import sys
import os
import numpy as np
sys.path.append("..")
from diachr import TadBoundarySet
from diachr import DiachromaticInteractionSet
from scipy import stats
import random

The interactions were evaluated using either the `ST` or the `HT` rule. An FDR of 5% was used for the `ST` rule and an FDR of 1% for the `HT` rule. For both rules, there are interactions without `RPC` filter, with `RPC` filter and with the complement of the `RPC` filter. With the `RPC` filter, all interactions in which at least one of the four read pair counts is `0` were discarded at the very beginning of the analysis (before the P-value threshold was determined).

In [2]:
RPC_RULE = "st"
ANALYSIS='ST_FDR005'
#ANALYSIS='ST_RMRO_FDR005'
#ANALYSIS='ST_KRO_FDR005'
#RPC_RULE = "ht"
#ANALYSIS='HT_FDR001'
#ANALYSIS='HT_RMRO_FDR001'
#ANALYSIS='HT_KRO_FDR001'

There is one CHC dataset for each of the 17 cell types and, for eight of the cell types, there are HC data and TAD boundaries.

In [3]:
#CELL_TYPE_SHORT = 'MK'             # Has HC data
CELL_TYPE_SHORT = 'ERY'           # Has HC data
#CELL_TYPE_SHORT = 'NEU'           # Has HC data
#CELL_TYPE_SHORT = 'MON'           # Has HC data
#CELL_TYPE_SHORT = 'MAC_M0'        # Has HC data
#CELL_TYPE_SHORT = 'MAC_M1'
#CELL_TYPE_SHORT = 'MAC_M2'
#CELL_TYPE_SHORT = 'EP'
#CELL_TYPE_SHORT = 'NB'            # Has HC data
#CELL_TYPE_SHORT = 'TB'
#CELL_TYPE_SHORT = 'FOET'
#CELL_TYPE_SHORT = 'NCD4'          # Has HC data
#CELL_TYPE_SHORT = 'TCD4'
#CELL_TYPE_SHORT = 'NACD4'
#CELL_TYPE_SHORT = 'ACD4'
#CELL_TYPE_SHORT = 'NCD8'          # Has HC data
#CELL_TYPE_SHORT = 'TCD8'

A `TadBoundarySet` can be created with one of the eight BED files with the published TADs or a BED file with TADs from all eight cell types that was created using `BedTools`. See bash script in: `../additional_files/javierre_2016/tad_regions_hg38/`.

In [4]:
tad_boundary_bed_file = '../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_' + CELL_TYPE_SHORT + '_hg38.bed'
#tad_boundary_bed_file = '../additional_files/javierre_2016/tad_regions_hg38/merged_tad_boundary_centers.bed'
chr_size_file = '../additional_files/hg38.chrom.sizes.txt'

An interaction file that was created with `DICer` is read in and reference interactions are re-selected afterwards. The selection of reference interactions can be omitted as soon as the new reference selection (no distinction between `NE` and `EN` and additional `DIX` category) is integrated into `DICer`.

In [11]:
INTERACTION_FILE = '../DICer_interactions/' + ANALYSIS.upper() + '/CHC/JAV_' + CELL_TYPE_SHORT + '_RALT_20000_' + ANALYSIS.lower() + '_evaluated_and_categorized_interactions.tsv.gz'
OUT_PREFIX = 'JAV_' + CELL_TYPE_SHORT + '_RALT_20000_' + ANALYSIS.lower()

d11_interaction_set = DiachromaticInteractionSet(rpc_rule = RPC_RULE)
d11_interaction_set.parse_file(
    i_file = INTERACTION_FILE,
    verbose = True)

report_dict = d11_interaction_set.select_reference_interactions(verbose=True) # This step is now integrated in DICer

[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../DICer_interactions/ST_FDR005/CHC/JAV_ERY_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
	[INFO] Parsed 1,000,000 interaction lines ...
	[INFO] Parsed 2,000,000 interaction lines ...
	[INFO] Parsed 3,000,000 interaction lines ...
	[INFO] Parsed 4,000,000 interaction lines ...
	[INFO] Parsed 5,000,000 interaction lines ...
	[INFO] Set size: 5,232,757
[INFO] ... done.
[INFO] Select reference interactions ...
	[INFO] Treating NE and EN as one category ...
	[INFO] First pass: Count directed interactions for different read pair counts ...
	[INFO] Second pass: Select undirected reference interactions for different read pair counts ...
	[INFO] Third pass: Moving DI interactions for which there is no reference to DIX ...
[INFO] ... done.


In [12]:
# Create dictionary for chromosome sizes required for randomization
chr_size_dict = {}
with open(chr_size_file, 'rt') as fp:
    for line in fp:
        chr_key, size = line.rstrip().rsplit('\t')
        chr_size_dict[chr_key] = int(size)

In [13]:
iter_num = 10
random_range = 500000

## Test whether interactions end closer to TAD boundaries than expected by chance

Now we have everything in place to do the first analysis. We compare the distances to the nearest TAD boundary for the following interaction categories:

1. `DIX`: Imbalanced interactions with high read pair counts and without counterpart in the reference interactions
2. `DI`: Imbalanced interactions with counterpart in the reference interactions
3. `UIR`: Balanced reference interactions (comparable to `DI` with respect to total number and distribution of read pair numbers)
4. `UI`: Balanced interactions (remaining powered interactions)
5. `ALL`: All interaction categories combined

The following function determines the distances to the nearest TAD boundary for all interactions and saves them to separate lists for the various interaction categories. If the parameter `random_range` is zero, then the distance is determined for the outermost end position of the `N` digest (`pos`). If `random_range` is different from zero, e.g. 500,000, then a position is randomly selected from the range `[pos - random_range, pos + random_range]` and the distance to the nearest TAD boundary is determined for this position. Randomized positions that fall outside the chromosome are set to the nearest valid position.

In [14]:
def determine_median_distances_to_tad_boundaries(
    d11_interaction_set: DiachromaticInteractionSet=None,
    tbs: TadBoundarySet=None,
    random_range=0):
    
    # Dictionary with lists of distances to nearest TAD boundary
    ntb_dist_lists = {
        'DIX': [],
        'DI': [],
        'UIR': [],
        'UI': [],
        'ALL': []
    }
    for d11_inter in d11_interaction_set._inter_dict.values():
        
        # This analysis is restricted to NE and EN interactions, which typically make up more than 90% of CHC data
        if d11_inter.enrichment_status_tag_pair == 'EN' or d11_inter.enrichment_status_tag_pair == 'NE':
        
            # Determine the distance to the next TAD from the outermost position of the 'N' digest
            if d11_inter.enrichment_status_tag_pair == 'NE':
                pos = d11_inter.fromA
                if random_range != 0:
                    # Randomize position
                    pos = random.randint(pos - random_range, pos + random_range)

            if d11_inter.enrichment_status_tag_pair == 'EN':
                pos = d11_inter.toB
                if random_range != 0:
                    # Randomize position
                    pos = random.randint(pos - random_range, pos + random_range)
            
            # Correct invalid positions that may result from randomization
            if chr_size_dict[d11_inter.chrA] < pos:
                pos = chr_size_dict[d11_inter.chrA]
            if pos < 0:
                pos = 0

            # Determine distance to nearest TAD boundary
            dist = tbs.get_distance_to_nearest_tad_boundary(d11_inter.chrA, pos)

            # Add determined distance to list
            ntb_dist_lists[d11_inter.get_category()].append(dist)
            ntb_dist_lists['ALL'].append(dist)     
    
    return ntb_dist_lists

### Randomize interaction end positions

In [15]:
# Create new TadBoundarySet
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file)

# Get list of distances to nearest TAD boundary
ntb_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# Determine median distances to nearest TAD boundary for each interaction category
print('Observed')
ntb_dist_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    ntb_dist_medians[i_cat] = np.median(ntb_dist_lists[i_cat])
ntb_dist_medians['DI-UIR'] = np.median(ntb_dist_lists['DI']) - np.median(ntb_dist_lists['UIR'])   
print(ntb_dist_medians)

print('Random')
ntb_dist_medians_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': []
    },
    'DI': {
        'N': None,
        'MED_LIST': []
    },
    'UIR': {
        'N': None,
        'MED_LIST': []
    },
    'UI': {
        'N': None,
        'MED_LIST': []
    },
    'ALL': {
        'N': None,
        'MED_LIST': []
    },
    'DI-UIR': {
        'N': None,
        'MED_LIST': []
    }
}
for random_seed in range(0,iter_num):
    
    # Get list of randomized distances to nearest TAD boundary
    ntb_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=random_range)
    
    # Determine medians of randomized distances and add to dictionary
    ntb_dist_random_medians = {} # Only for outputting the results of this iteration
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        ntb_dist_medians_random_dict[i_cat]['N'] = len(ntb_dist_lists_random[i_cat])
        ntb_dist_medians_random_dict[i_cat]['MED_LIST'].append(np.median(ntb_dist_lists_random[i_cat]))
        ntb_dist_random_medians[i_cat] = np.median(ntb_dist_lists_random[i_cat])
    
    # Test whether the difference between the medians of DI and UIR is greater than expected by chance
    ntb_dist_medians_random_dict['DI-UIR']['N'] = len(ntb_dist_lists_random['DI']) # Same number as UIR
    ntb_dist_medians_random_dict['DI-UIR']['MED_LIST'].append(np.median(ntb_dist_lists_random['DI']) - np.median(ntb_dist_lists_random['UIR']))    
    ntb_dist_random_medians['DI-UIR'] = np.median(ntb_dist_lists_random['DI']) - np.median(ntb_dist_lists_random['UIR'])
    
    print(ntb_dist_random_medians)

print()
print("Done.")

Observed
{'DIX': 118830.0, 'DI': 181009.0, 'UIR': 187943.0, 'UI': 197698.0, 'ALL': 196603.0, 'DI-UIR': -6934.0}
Random
{'DIX': 159556.0, 'DI': 188935.0, 'UIR': 194173.0, 'UI': 204576.0, 'ALL': 203539.0, 'DI-UIR': -5238.0}
{'DIX': 166526.0, 'DI': 189272.0, 'UIR': 194999.0, 'UI': 204652.0, 'ALL': 203670.0, 'DI-UIR': -5727.0}
{'DIX': 192665.0, 'DI': 189674.0, 'UIR': 194425.0, 'UI': 204684.0, 'ALL': 203688.0, 'DI-UIR': -4751.0}
{'DIX': 125040.0, 'DI': 188912.0, 'UIR': 195165.0, 'UI': 204662.0, 'ALL': 203676.0, 'DI-UIR': -6253.0}
{'DIX': 171292.0, 'DI': 188878.0, 'UIR': 195228.0, 'UI': 204633.0, 'ALL': 203631.0, 'DI-UIR': -6350.0}
{'DIX': 147853.0, 'DI': 190371.0, 'UIR': 195686.0, 'UI': 204520.0, 'ALL': 203594.0, 'DI-UIR': -5315.0}
{'DIX': 160545.0, 'DI': 189097.0, 'UIR': 195677.0, 'UI': 204785.0, 'ALL': 203805.0, 'DI-UIR': -6580.0}
{'DIX': 198635.0, 'DI': 188827.0, 'UIR': 195833.0, 'UI': 204465.5, 'ALL': 203509.0, 'DI-UIR': -7006.0}
{'DIX': 138376.0, 'DI': 189743.0, 'UIR': 195273.0, 'UI': 

In [16]:
print(OUT_PREFIX)
print(INTERACTION_FILE)
print(tad_boundary_bed_file)
print(iter_num)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL', 'DI-UIR']:
    
    # Get total number of interactions for which distances were determined
    n = ntb_dist_medians_random_dict[i_cat]['N']
    
    # Determine Z-score
    observed = ntb_dist_medians[i_cat]
    mean = np.mean(ntb_dist_medians_random_dict[i_cat]['MED_LIST'])
    std = np.std(ntb_dist_medians_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    
    # Count randomized medians that are smaller than the observed median
    st_obs = 0
    for med in ntb_dist_medians_random_dict[i_cat]['MED_LIST']:
        if observed > med:
            st_obs += 1
    
    # Print line with results for this category
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_ERY_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_ERY_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_ERY_hg38.bed
10

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	51	118830.0	160265.8	22101.602972635264	-1.8747870935562023	0
DI	183963	181009.0	189391.1	542.3424102907682	-15.455365173278773	0
UIR	183963	187943.0	195139.3	510.0409885489596	-14.109258199959756	0
UI	4514222	197698.0	204614.25	88.48198969281827	-78.16562471087113	0
ALL	4882199	196603.0	203638.4	81.66908839946728	-86.14520056337356	0
DI-UIR	183963	-6934.0	-5748.2	735.8013047012081	-1.6115763758825166	1


```
JAV_NCD8_RALT_20000_st_fdr005
../DICer_interactions/ST_FDR005/CHC/JAV_NCD8_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
10

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	167	108759.0	128409.3	12877.996979732525	-1.525881705899277	0
DI	217428	178673.0	184392.0	259.74323860304816	-22.017897485062335	0
UIR	217428	186831.0	191516.1	539.3888578752809	-8.685941379017711	0
UI	4367028	193212.0	200540.8	95.19852940040619	-76.98438249161356	0
ALL	4802051	192174.0	199325.2	104.21784875922167	-68.61780477278602	0
DI-UIR	217428	-8158.0	-7124.1	652.2978537447444	-1.5850121138135507	1


JAV_NCD8_RALT_20000_st_rmro_fdr005
../DICer_interactions/ST_RMRO_FDR005/CHC/JAV_NCD8_RALT_20000_st_rmro_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
10

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	144	109330.0	130252.7	9101.478380461056	-2.298824336595316	0
DI	47131	167939.0	174597.0	1113.5423656062665	-5.979116920598761	0
UIR	47131	178296.0	181366.8	484.6973901312034	-6.3354993497463425	0
UI	2670032	192728.0	199677.55	103.46918623435675	-67.16540694791318	0
ALL	2764438	191964.0	198860.5	91.73112884948054	-75.1816759097809	0
DI-UIR	47131	-10357.0	-6769.8	1333.3841007001697	-2.6902975655074446	0


JAV_NCD8_RALT_20000_st_kro_fdr005
../DICer_interactions/ST_KRO_FDR005/CHC/JAV_NCD8_RALT_20000_st_kro_fdr005_evaluated_and_categorized_interactions.tsv.gz
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
10

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	967	160100.0	168044.6	7100.931491008768	-1.1188109630489316	1
DI	201497	184097.0	189650.0	305.4508143711521	-18.179686347971465	0
UIR	201497	185384.0	191565.5	403.64792827413345	-15.314088261099402	0
UI	1801750	194580.0	202502.2	127.32364273770996	-62.22096564045023	0
ALL	2205711	192688.0	200199.0	122.19410787758957	-61.46777557821607	0
DI-UIR	201497	-1287.0	-1915.5	621.1455948487439	1.0118400665033243	9

```
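The columns `MEAN_RAND`, `STD_RAND`, `Z_SCORE` and `ST_OBS` in the tables above are all derived from one observed value and the list of values from the randomizations. As a minimal sketch, the same computation can be wrapped in a helper. The function name `summarize_randomization` is hypothetical and not part of `diachr`.

```python
import numpy as np


def summarize_randomization(observed, random_values):
    """Return mean, standard deviation, Z-score and the number of
    randomized values that are smaller than the observed value."""
    random_values = np.asarray(random_values, dtype=float)
    mean = random_values.mean()
    std = random_values.std()  # Population standard deviation, the default of np.std
    z_score = (observed - mean) / std
    st_obs = int(np.sum(random_values < observed))
    return mean, std, z_score, st_obs


# Toy example with three randomized medians
mean, std, z_score, st_obs = summarize_randomization(10.0, [12.0, 14.0, 16.0])
print(mean, std, z_score, st_obs)
```

A negative Z-score together with `ST_OBS = 0` means that the observed median is smaller than every randomized median.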

### Randomize TAD boundaries 1

For each chromosome, as many random positions as there are TAD boundaries on that chromosome are selected from the entire sequence of the chromosome and used as randomized TAD boundaries.
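The following is a minimal sketch of this kind of randomization, assuming that positions are drawn uniformly and independently. The function `randomize_boundaries_uniform` and the toy data are hypothetical, and the actual `randomize_tad_boundaries` method of `TadBoundarySet` may differ in its details.

```python
import random


def randomize_boundaries_uniform(boundaries_by_chr, chr_sizes, random_seed=0):
    """Draw as many random boundaries per chromosome as were observed."""
    rng = random.Random(random_seed)  # Local generator, so the seed has no global side effects
    randomized = {}
    for chr_key, positions in boundaries_by_chr.items():
        # Same number of boundaries, drawn uniformly from the whole chromosome
        randomized[chr_key] = sorted(
            rng.randint(0, chr_sizes[chr_key]) for _ in positions)
    return randomized


observed_boundaries = {'chr1': [100_000, 500_000, 900_000], 'chr2': [250_000]}
toy_chr_sizes = {'chr1': 1_000_000, 'chr2': 400_000}
randomized_boundaries = randomize_boundaries_uniform(
    observed_boundaries, toy_chr_sizes, random_seed=1)
print(randomized_boundaries)
```

This randomization preserves the number of boundaries per chromosome but not their spacing.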

In [11]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)
tad_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)
print('Observed')
tad_dist_lists_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    tad_dist_lists_medians[i_cat] = np.median(tad_dist_lists[i_cat])
    
tad_dist_lists_medians['DI-UIR'] = np.median(tad_dist_lists['DI']) - np.median(tad_dist_lists['UIR'])   
print(tad_dist_lists_medians)
print()

print('Random')
tad_dist_median_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': [],
    },
    'DI': {
        'N': None,
        'MED_LIST': [],
    },
    'UIR': {
        'N': None,
        'MED_LIST': [],
    },
    'UI': {
        'N': None,
        'MED_LIST': [],
    },
    'ALL': {
        'N': None,
        'MED_LIST': [],
    },
    'DI-UIR': {
        'N': None,
        'MED_LIST': []
    }
}
for random_seed in range(0,iter_num):
    tbs.randomize_tad_boundaries(random_seed=random_seed)
    tad_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=0)
    tad_dist_lists_random_medians = {}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        tad_dist_median_random_dict[i_cat]['N'] = len(tad_dist_lists_random[i_cat])
        tad_dist_median_random_dict[i_cat]['MED_LIST'].append(np.median(tad_dist_lists_random[i_cat]))
        tad_dist_lists_random_medians[i_cat] = np.median(tad_dist_lists_random[i_cat])
        
    # Test whether the difference between the medians of DI and UIR is greater than expected by chance
    tad_dist_median_random_dict['DI-UIR']['N'] = len(tad_dist_lists_random['DI']) # Same number as UIR
    tad_dist_median_random_dict['DI-UIR']['MED_LIST'].append(np.median(tad_dist_lists_random['DI']) - np.median(tad_dist_lists_random['UIR']))
    
    tad_dist_lists_random_medians['DI-UIR'] = np.median(tad_dist_lists_random['DI']) - np.median(tad_dist_lists_random['UIR'])
    print(tad_dist_lists_random_medians)
    
print("Done.")

Observed
{'DIX': 108759.0, 'DI': 178673.0, 'UIR': 186831.0, 'UI': 193212.0, 'ALL': 192174.0, 'DI-UIR': -8158.0}

Random
{'DIX': 422041.0, 'DI': 352746.0, 'UIR': 362373.5, 'UI': 364033.0, 'ALL': 363437.0, 'DI-UIR': -9627.5}
{'DIX': 355011.0, 'DI': 348282.0, 'UIR': 351559.0, 'UI': 349630.0, 'ALL': 349652.0, 'DI-UIR': -3277.0}
{'DIX': 392952.0, 'DI': 350990.5, 'UIR': 348364.5, 'UI': 350412.0, 'ALL': 350344.0, 'DI-UIR': 2626.0}
{'DIX': 396209.0, 'DI': 347036.0, 'UIR': 353487.0, 'UI': 354478.0, 'ALL': 354094.0, 'DI-UIR': -6451.0}
{'DIX': 316583.0, 'DI': 354426.0, 'UIR': 358959.0, 'UI': 360787.0, 'ALL': 360460.0, 'DI-UIR': -4533.0}
{'DIX': 344871.0, 'DI': 352608.0, 'UIR': 353314.5, 'UI': 355371.0, 'ALL': 355151.0, 'DI-UIR': -706.5}
{'DIX': 332086.0, 'DI': 356248.0, 'UIR': 360022.5, 'UI': 358086.5, 'ALL': 358081.0, 'DI-UIR': -3774.5}
{'DIX': 394592.0, 'DI': 353789.0, 'UIR': 350030.5, 'UI': 353985.0, 'ALL': 353775.0, 'DI-UIR': 3758.5}
{'DIX': 265165.0, 'DI': 355452.0, 'UIR': 365812.0, 'UI': 36

In [12]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print(iter_num)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL', 'DI-UIR']:
    n = tad_dist_median_random_dict[i_cat]['N']
    observed = tad_dist_lists_medians[i_cat]
    mean = np.mean(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    std = np.std(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    # Count randomized medians that are smaller than the observed median
    st_obs = 0
    for med in tad_dist_median_random_dict[i_cat]['MED_LIST']:
        #print(str(med) + '\t' + str(observed))
        if observed > med:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NCD8_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed
10

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	167	108759.0	354953.6	44719.21125019984	-5.505343075542276	0
DI	217428	178673.0	352705.8	2942.6219447968506	-59.14208595763554	0
UIR	217428	186831.0	355727.3	5434.101412377211	-31.080814873146494	0
UI	4367028	193212.0	356839.75	6112.757893332599	-26.768236670795446	0
ALL	4802051	192174.0	356602.3	5864.571425944099	-28.037564564835996	0
DI-UIR	217428	-8158.0	-3021.5	4710.95577882026	-1.0903307611361843	2


### Randomize TAD boundaries 2

For each TAD boundary, a random position is selected from the surrounding sequence. The size of the surrounding region is controlled by `random_range`.
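A minimal sketch of such a local randomization is shown below. It assumes that each boundary is moved to a uniformly drawn position within `random_range` bases on either side and that positions outside the chromosome are clamped. The function `jitter_boundaries` is hypothetical, and the actual `randomize_tad_boundaries_2` method may be implemented differently.

```python
import random


def jitter_boundaries(boundaries_by_chr, chr_sizes, random_range, random_seed=0):
    """Move each boundary to a random position in its surrounding sequence."""
    rng = random.Random(random_seed)
    jittered = {}
    for chr_key, positions in boundaries_by_chr.items():
        new_positions = []
        for pos in positions:
            new_pos = rng.randint(pos - random_range, pos + random_range)
            # Clamp positions that fall outside the chromosome
            new_pos = min(max(new_pos, 0), chr_sizes[chr_key])
            new_positions.append(new_pos)
        jittered[chr_key] = sorted(new_positions)
    return jittered


toy_boundaries = {'chr1': [100_000, 500_000, 900_000]}
toy_sizes = {'chr1': 1_000_000}
jittered_boundaries = jitter_boundaries(
    toy_boundaries, toy_sizes, random_range=50_000, random_seed=1)
print(jittered_boundaries)
```

In contrast to drawing positions from the entire chromosome, this variant roughly preserves the spacing of the boundaries.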

In [13]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)
tad_dist_lists = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)
print('Observed')
tad_dist_lists_medians = {}
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    tad_dist_lists_medians[i_cat] = np.median(tad_dist_lists[i_cat])
    
tad_dist_lists_medians['DI-UIR'] = np.median(tad_dist_lists['DI']) - np.median(tad_dist_lists['UIR'])   
print(tad_dist_lists_medians)
print()

print('Random')
tad_dist_median_random_dict = {
    'DIX': {
        'N': None,
        'MED_LIST': [],
    },
    'DI': {
        'N': None,
        'MED_LIST': [],
    },
    'UIR': {
        'N': None,
        'MED_LIST': [],
    },
    'UI': {
        'N': None,
        'MED_LIST': [],
    },
    'ALL': {
        'N': None,
        'MED_LIST': [],
    },
    'DI-UIR': {
        'N': None,
        'MED_LIST': []
    }
}
for random_seed in range(0,iter_num):
    tbs.randomize_tad_boundaries_2(random_seed=random_seed, random_range=random_range)
    tad_dist_lists_random = determine_median_distances_to_tad_boundaries(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_range=0)
    tad_dist_lists_random_medians = {}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        tad_dist_median_random_dict[i_cat]['N'] = len(tad_dist_lists_random[i_cat])
        tad_dist_median_random_dict[i_cat]['MED_LIST'].append(np.median(tad_dist_lists_random[i_cat]))
        tad_dist_lists_random_medians[i_cat] = np.median(tad_dist_lists_random[i_cat])
        
    # Test whether the difference between the medians of DI and UIR is greater than expected by chance
    tad_dist_median_random_dict['DI-UIR']['N'] = len(tad_dist_lists_random['DI']) # Same number as UIR
    tad_dist_median_random_dict['DI-UIR']['MED_LIST'].append(np.median(tad_dist_lists_random['DI']) - np.median(tad_dist_lists_random['UIR']))
    
    tad_dist_lists_random_medians['DI-UIR'] = np.median(tad_dist_lists_random['DI']) - np.median(tad_dist_lists_random['UIR'])
    print(tad_dist_lists_random_medians)
    
print("Done.")

Observed
{'DIX': 108759.0, 'DI': 178673.0, 'UIR': 186831.0, 'UI': 193212.0, 'ALL': 192174.0, 'DI-UIR': -8158.0}

Random
{'DIX': 164431.0, 'DI': 216901.5, 'UIR': 225060.0, 'UI': 233262.5, 'ALL': 232184.0, 'DI-UIR': -8158.5}
{'DIX': 182267.0, 'DI': 220241.0, 'UIR': 231268.0, 'UI': 239821.0, 'ALL': 238489.0, 'DI-UIR': -11027.0}
{'DIX': 188916.0, 'DI': 234408.0, 'UIR': 240279.5, 'UI': 249079.5, 'ALL': 248076.0, 'DI-UIR': -5871.5}
{'DIX': 232257.0, 'DI': 232870.0, 'UIR': 240416.0, 'UI': 247340.0, 'ALL': 246331.0, 'DI-UIR': -7546.0}
{'DIX': 203042.0, 'DI': 236646.0, 'UIR': 244515.5, 'UI': 252491.0, 'ALL': 251358.0, 'DI-UIR': -7869.5}
{'DIX': 172424.0, 'DI': 243481.5, 'UIR': 248452.5, 'UI': 256665.0, 'ALL': 255646.0, 'DI-UIR': -4971.0}
{'DIX': 174263.0, 'DI': 240977.5, 'UIR': 246941.0, 'UI': 254859.0, 'ALL': 253790.0, 'DI-UIR': -5963.5}
{'DIX': 176901.0, 'DI': 245435.5, 'UIR': 251776.0, 'UI': 260028.5, 'ALL': 258941.0, 'DI-UIR': -6340.5}
{'DIX': 227987.0, 'DI': 244042.0, 'UIR': 249282.0, 'UI'

In [14]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL','DI-UIR']:
    n = tad_dist_median_random_dict[i_cat]['N']
    observed = tad_dist_lists_medians[i_cat]
    mean = np.mean(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    std = np.std(tad_dist_median_random_dict[i_cat]['MED_LIST'])
    z_score = (observed - mean) / std
    # Count randomized medians that are smaller than the observed median
    st_obs = 0
    for med in tad_dist_median_random_dict[i_cat]['MED_LIST']:
        #print(str(med) + '\t' + str(observed))
        if observed > med:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(observed) + '\t' + str(mean) + '\t' + str(std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_NCD8_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD8_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	167	108759.0	195583.0	25314.99734939745	-3.4297455694604917	0
DI	217428	178673.0	236291.8	10007.067445560662	-5.75781069863388	0
UIR	217428	186831.0	243054.15	8536.940649465709	-6.585866331812763	0
UI	4367028	193212.0	251004.05	8335.098516064463	-6.933577316286759	0
ALL	4802051	192174.0	249952.1	8400.942904817293	-6.877573226556357	0
DI-UIR	217428	-8158.0	-6762.35	1833.9559434457524	-0.7610051948018796	2


## Test whether interactions span TAD boundaries less often than expected by chance, taking into account their length

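The second analysis investigates how often interactions span TAD boundaries. Because longer interactions span more boundaries by chance alone, the number of spanned boundaries is related to the total interaction distance.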
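The statistic used for this purpose in the cells below is the number of TAD boundaries per base (TBPB), that is, the total number of spanned boundaries in a category divided by the total interaction distance in that category. The following sketch computes it for three toy interactions. The helper name `tad_boundaries_per_base` and the numbers are made up for illustration.

```python
def tad_boundaries_per_base(spanned_boundary_counts, interaction_distances):
    """TBPB: total number of spanned boundaries divided by total interaction distance."""
    return sum(spanned_boundary_counts) / sum(interaction_distances)


# Three toy interactions: numbers of spanned boundaries and distances in bp
sb_nums = [0, 1, 3]
i_dists = [200_000, 800_000, 1_000_000]
tbpb = tad_boundaries_per_base(sb_nums, i_dists)

# Report as boundaries per megabase, as in the result tables
print("{:.2f}".format(1_000_000 * tbpb))  # prints 2.00
```

Summing over all interactions before dividing gives long interactions more weight than averaging per-interaction ratios would.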

In [36]:
def get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set: DiachromaticInteractionSet=None,
    tbs: TadBoundarySet=None,
    random_flip_interaction=False):
        
    spanned_boundary_length_dict = {
        'DIX': {
            'I_NUM': 0, 
            'I_DIST': 0, 
            'SB_NUM': 0       
        },
        'DI': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0        
        },
        'UIR': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0          
        },
        'UI': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0          
        },
        'ALL': {
            'I_NUM': 0,
            'I_DIST': 0, 
            'SB_NUM': 0      
        }
    }
    warn_count = 0
    for d11_inter in d11_interaction_set._inter_dict.values():

        if d11_inter.enrichment_status_tag_pair == 'NE' or d11_inter.enrichment_status_tag_pair == 'EN':

            warn = False
            i_dist = d11_inter.i_dist
            
            # Get interaction distance and number of spanned TAD boundaries
            if random_flip_interaction and random.uniform(0,1) <= 0.5:
                
                if d11_inter.enrichment_status_tag_pair == 'NE':
                    if chr_size_dict[d11_inter.chrA] < d11_inter.toB + i_dist:
                        end_pos = chr_size_dict[d11_inter.chrA]
                        warn_count += 1
                        warn = True
                    else:
                        end_pos = d11_inter.toB + i_dist
                    sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, d11_inter.fromB, end_pos)

                else: # EN
                    if d11_inter.fromA - i_dist < 0:
                        sta_pos = 0
                        warn_count += 1
                        warn = True
                    else:
                        sta_pos = d11_inter.fromA - i_dist
                    sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, sta_pos, d11_inter.toA)
            else:        
                sb_num = tbs.get_number_of_boundaries_spanned_by_region(d11_inter.chrA, d11_inter.toA, d11_inter.fromB)

            if not warn: # Skip interactions that cannot be flipped
                # Increment numbers for interaction category
                spanned_boundary_length_dict[d11_inter.get_category()]['I_NUM'] += 1
                spanned_boundary_length_dict[d11_inter.get_category()]['I_DIST'] += i_dist
                spanned_boundary_length_dict[d11_inter.get_category()]['SB_NUM'] += sb_num

                # Increment numbers for all interaction categories combined
                spanned_boundary_length_dict['ALL']['I_NUM'] += 1
                spanned_boundary_length_dict['ALL']['I_DIST'] += i_dist
                spanned_boundary_length_dict['ALL']['SB_NUM'] += sb_num
                
    if 0 < warn_count:        
        print('\tSkipped ' + str(warn_count) + ' interactions due to invalid positions after flipping!')
        
    return spanned_boundary_length_dict

### Flip interactions at baits
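As a null model, each interaction is flipped with a probability of 50% to the other side of its enriched digest, keeping its length. The sketch below mirrors the coordinate logic of `get_sum_of_spanned_boundaies_and_total_length` for a single interaction. The function `flip_region_at_bait` is hypothetical, and reading the `E` digest as the bait is an assumption based on the section title.

```python
def flip_region_at_bait(tag_pair, from_a, to_a, from_b, to_b, i_dist, chr_size):
    """Return the flipped region (sta_pos, end_pos) or None if it leaves the chromosome."""
    if tag_pair == 'NE':
        # Enriched digest is the second one: extend to the right of it
        sta_pos, end_pos = from_b, to_b + i_dist
    elif tag_pair == 'EN':
        # Enriched digest is the first one: extend to the left of it
        sta_pos, end_pos = from_a - i_dist, to_a
    else:
        raise ValueError("Only 'NE' and 'EN' interactions can be flipped")
    if sta_pos < 0 or chr_size < end_pos:
        return None  # Such interactions are skipped in the analysis
    return sta_pos, end_pos


# Toy interaction between digests 100,000-105,000 and 400,000-410,000
flipped_ne = flip_region_at_bait('NE', 100_000, 105_000, 400_000, 410_000, 295_000, 1_000_000)
flipped_en = flip_region_at_bait('EN', 100_000, 105_000, 400_000, 410_000, 295_000, 1_000_000)
print(flipped_ne)
print(flipped_en)
```

In the toy example, the `NE` interaction can be flipped, whereas the flipped `EN` interaction would start before the beginning of the chromosome and is therefore reported as `None`.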

In [80]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

# Observed
##########

# Get dictionary with total numbers of spanned boundaries and distances
spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# Calculate the TAD boundaries per base (TBPB) statistic for the different categories
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    spanned_boundary_length_obs_dict[i_cat]['TBPB'] = spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST']

# Calculate difference in TBPB between DI and UIR
spanned_boundary_length_obs_dict['DI-UIR'] = {}
spanned_boundary_length_obs_dict['DI-UIR']['TBPB'] = spanned_boundary_length_obs_dict['DI']['TBPB'] - spanned_boundary_length_obs_dict['UIR']['TBPB']


# Random
########

# Init dictionary that will take the results from all randomizations
spanned_boundary_length_random_lists_dict = {
    'DIX': {
        'I_NUM': [], 
        'TBPB': []
    },
    'DI': {
        'I_NUM': [],
        'TBPB': []
    },
    'UIR': {
        'I_NUM': [],
        'TBPB': []
    },
    'UI': {
        'I_NUM': [],
        'TBPB': []
    },
    'ALL': {
        'I_NUM': [],
        'TBPB': []
    },
    'DI-UIR': {
        'I_NUM': [],
        'TBPB': []
    }
}

# Perform n randomizations and add results to dictionary
for random_seed in range(0,iter_num):
    
    print('Iteration: ' + str(random_seed))
    
    # Get dictionary with total numbers of spanned boundaries and distances after randomization
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs,
    random_flip_interaction=True) # Use the same function but with random flip
    
    # Add results to dictionary
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        
        # Keep track of the number of interactions that were evaluated in this iteration
        spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'].append(spanned_boundary_length_random_dict[i_cat]['I_NUM'])
        
        # Calculate the TAD boundaries per base (TBPB) statistic for the different categories
        tbpb = spanned_boundary_length_random_dict[i_cat]['SB_NUM'] / spanned_boundary_length_random_dict[i_cat]['I_DIST']
        spanned_boundary_length_random_lists_dict[i_cat]['TBPB'].append(tbpb)
    
    # Keep also track of the difference in TBPB between DI and UIR
    i_num_di_uir_mean = (spanned_boundary_length_random_dict['DI']['I_NUM'] + spanned_boundary_length_random_dict['UIR']['I_NUM']) / 2
    spanned_boundary_length_random_lists_dict['DI-UIR']['I_NUM'].append(i_num_di_uir_mean)
    
    tbpb_di_uir_diff = spanned_boundary_length_random_lists_dict['DI']['TBPB'][-1] - spanned_boundary_length_random_lists_dict['UIR']['TBPB'][-1]    
    spanned_boundary_length_random_lists_dict['DI-UIR']['TBPB'].append(tbpb_di_uir_diff)
    
print("Done.")

Iteration: 0
	Skipped 19131 interactions due to invalid positions after flipping!
Iteration: 1
	Skipped 19307 interactions due to invalid positions after flipping!
Iteration: 2
	Skipped 19137 interactions due to invalid positions after flipping!
Iteration: 3
	Skipped 19104 interactions due to invalid positions after flipping!
Iteration: 4
	Skipped 19138 interactions due to invalid positions after flipping!
Iteration: 5
	Skipped 19093 interactions due to invalid positions after flipping!
Iteration: 6
	Skipped 19229 interactions due to invalid positions after flipping!
Iteration: 7
	Skipped 19189 interactions due to invalid positions after flipping!
Iteration: 8
	Skipped 19130 interactions due to invalid positions after flipping!
Iteration: 9
	Skipped 19258 interactions due to invalid positions after flipping!
Iteration: 10
	Skipped 19162 interactions due to invalid positions after flipping!
Iteration: 11
	Skipped 19219 interactions due to invalid positions after flipping!
Iteration: 12


	Skipped 19125 interactions due to invalid positions after flipping!
Done.
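
The TBPB statistic collected in the loop above can be illustrated in isolation. The following is a minimal sketch with made-up counts, not values from this analysis. The helper name `tad_boundaries_per_base` and the zero-length guard are assumptions added for illustration; the notebook itself divides `SB_NUM` by `I_DIST` directly.

```python
def tad_boundaries_per_base(sb_num, i_dist):
    """Return spanned TAD boundaries per base (TBPB); NaN if no bases are covered."""
    if i_dist == 0:
        # Guard added for this sketch only (assumption, not in the notebook)
        return float('nan')
    return sb_num / i_dist

# Made-up toy counts for illustration only
toy_counts = {
    'DI': {'SB_NUM': 3, 'I_DIST': 2_500_000},
    'UIR': {'SB_NUM': 4, 'I_DIST': 2_500_000},
}
toy_tbpb = {
    i_cat: tad_boundaries_per_base(counts['SB_NUM'], counts['I_DIST'])
    for i_cat, counts in toy_counts.items()
}

# Difference in TBPB between DI and UIR, analogous to the 'DI-UIR' entry
toy_tbpb_diff = toy_tbpb['DI'] - toy_tbpb['UIR']

# The summary tables report TBPB multiplied by 1,000,000 (boundaries per Mb)
print("{:.2f}".format(1_000_000 * toy_tbpb['DI']))
```

A negative `toy_tbpb_diff` means that, per base, the `DI` interactions in the toy data span fewer boundaries than the `UIR` interactions.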


In [81]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print(iter_num)
print(INTERACTION_FILE)
print()
print('I_CAT\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS\tI_NUM')

for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL', 'DI-UIR']:
    
    n = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = spanned_boundary_length_obs_dict[i_cat]['TBPB']
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['TBPB'])
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['TBPB'])
    z_score = (observed - mean) / std
    st_obs = 0  # Number of randomized values that are smaller than the observed value
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['TBPB']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['TBPB']:
            st_obs += 1
            
    print(i_cat + '\t' + "{:.2f}".format(1000000*observed) + '\t' + "{:.5f}".format(1000000*mean) + '\t' + "{:.5f}".format(1000000*std) + '\t' + "{:.2f}".format(z_score) + '\t' + str(st_obs) + '\t' + "{:.1f}".format(n))


JAV_NCD4_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD4_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_NCD4_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.43	1.47487	0.24091	-4.34	0	210.0
DI	1.09	1.29737	0.00359	-57.21	0	216999.2
UIR	1.15	1.24561	0.00275	-33.13	0	216919.9
UI	1.17	1.18817	0.00037	-45.04	0	4915845.2
ALL	1.17	1.19095	0.00037	-57.24	0	5349974.3
DI-UIR	-0.06	0.05176	0.00467	-24.54	0	216959.6


```
JAV_NCD4_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NCD4_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_NCD4_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.43	1.47487	0.24091	-4.34	0	210.0
DI	1.09	1.29737	0.00359	-57.21	0	216999.2
UIR	1.15	1.24561	0.00275	-33.13	0	216919.9
UI	1.17	1.18817	0.00037	-45.04	0	4915845.2
ALL	1.17	1.19095	0.00037	-57.24	0	5349974.3
DI-UIR	-0.06	0.05176	0.00467	-24.54	0	216959.6


JAV_NB_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NB_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_NB_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.00	2.02475	0.48036	-4.22	0	81.0
DI	1.06	1.37088	0.00492	-63.14	0	175018.3
UIR	1.16	1.32459	0.00333	-50.73	0	174996.6
UI	1.21	1.27935	0.00055	-132.14	0	3902106.6
ALL	1.20	1.28200	0.00055	-143.59	0	4252202.5
DI-UIR	-0.10	0.04629	0.00537	-26.45	0	175007.5

JAV_MAC_M0_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MAC_M0_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_MAC_M0_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.97	2.06611	0.39446	-2.78	0	138.0
DI	1.26	1.66394	0.00475	-84.98	0	216756.0
UIR	1.38	1.60272	0.00296	-76.74	0	216699.2
UI	1.45	1.55852	0.00052	-211.38	0	4665131.7
ALL	1.44	1.56136	0.00049	-238.97	0	5098724.9
DI-UIR	-0.12	0.06122	0.00601	-29.33	0	216727.6


JAV_MON_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MON_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_MON_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.00	1.00341	0.64104	-1.57	0	37.0
DI	0.94	1.26295	0.00619	-51.51	0	90767.1
UIR	1.09	1.26568	0.00460	-39.11	0	90743.5
UI	1.16	1.25071	0.00056	-166.40	0	3340359.4
ALL	1.15	1.25110	0.00056	-172.86	0	3521906.9
DI-UIR	-0.14	-0.00274	0.00800	-17.40	0	90755.3


JAV_NEU_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_NEU_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_NEU_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.00	0.00000	0.00000	nan	0	12.0
DI	0.83	1.02497	0.00615	-32.49	0	79733.2
UIR	0.95	1.03242	0.00396	-21.58	0	79677.6
UI	0.99	1.01992	0.00050	-60.41	0	3392140.5
ALL	0.99	1.02015	0.00050	-64.56	0	3551563.3
DI-UIR	-0.12	-0.00745	0.00720	-15.89	0	79705.4


JAV_ERY_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_ERY_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_ERY_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	1.88	2.10619	0.58645	-0.38	20	51.0
DI	1.12	1.47763	0.00507	-70.37	0	183836.7
UIR	1.23	1.42556	0.00294	-68.01	0	183788.6
UI	1.27	1.38478	0.00043	-254.74	0	4507833.8
ALL	1.27	1.38737	0.00043	-272.74	0	4875510.0
DI-UIR	-0.10	0.05207	0.00588	-26.66	0	183812.6

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed
100
../DICer_interactions/ST_FDR005/CHC/JAV_MK_RALT_20000_st_fdr005_evaluated_and_categorized_interactions.tsv.gz

I_CAT	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS	I_NUM
DIX	0.20	1.62956	0.29465	-4.85	0	145.0
DI	1.17	1.57886	0.00444	-93.09	0	197773.3
UIR	1.28	1.51568	0.00297	-79.86	0	197740.6
UI	1.35	1.47993	0.00049	-260.48	0	4516837.9
ALL	1.35	1.48251	0.00048	-283.00	0	4912496.7
DI-UIR	-0.11	0.06318	0.00541	-32.56	0	197756.9
```
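
The columns of the tables above can be reproduced with a small helper. This is a sketch under stated assumptions: the function name `summarize_against_null` and the toy values are hypothetical, and the one-sided empirical P-value is an optional addition that the notebook does not compute. The guard for a zero standard deviation returns `nan` explicitly, which matches the `nan` reported for `DIX` in the NEU table, where all randomized values are identical.

```python
import numpy as np

def summarize_against_null(observed, random_values):
    """Z-score, ST_OBS and one-sided empirical P-value against a randomized null."""
    random_values = np.asarray(random_values, dtype=float)
    mean = random_values.mean()
    std = random_values.std()  # population standard deviation (ddof=0), the np.std default
    # With zero spread the z-score is undefined
    z_score = (observed - mean) / std if std > 0 else float('nan')
    # Number of randomized values smaller than the observed value (ST_OBS)
    st_obs = int(np.sum(random_values < observed))
    # Empirical P-value for "observed is unusually small", with +1 correction
    # (not part of the notebook; added for illustration)
    le_obs = int(np.sum(random_values <= observed))
    p_lower = (le_obs + 1) / (len(random_values) + 1)
    return {'MEAN': mean, 'STD': std, 'Z_SCORE': z_score, 'ST_OBS': st_obs, 'P_LOWER': p_lower}

# Made-up null distribution and observed value for illustration only
toy_null = [1.30, 1.29, 1.31, 1.30, 1.28]
summary = summarize_against_null(1.09, toy_null)
```

With 100 iterations and `ST_OBS = 0`, the smallest attainable value of such an empirical P-value would be 1/101, so the z-score remains the more informative column for comparing categories.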

### Randomize TAD boundaries 1

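For each chromosome, as many positions as there are TAD boundaries on that chromosome are selected at random from the entire sequence of the chromosome. The number of boundaries per chromosome is thus preserved, but their positions are not.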
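
The idea behind this randomization can be sketched as follows. This is not the implementation of `TadBoundarySet.randomize_tad_boundaries`; the function `draw_random_boundaries`, the dictionary layout and the uniform draw over 1-based positions are assumptions made for illustration.

```python
import random

def draw_random_boundaries(boundaries_by_chr, chr_sizes, random_seed=None):
    """Draw, per chromosome, as many random positions as there are boundaries."""
    rng = random.Random(random_seed)
    randomized = {}
    for chrom, boundaries in boundaries_by_chr.items():
        # Assumption: positions are 1-based and drawn uniformly from the whole chromosome
        randomized[chrom] = sorted(
            rng.randint(1, chr_sizes[chrom]) for _ in boundaries
        )
    return randomized

# Made-up toy data for illustration only
toy_boundaries = {'chr1': [100_000, 450_000, 900_000], 'chr2': [200_000]}
toy_chr_sizes = {'chr1': 1_000_000, 'chr2': 500_000}
toy_random = draw_random_boundaries(toy_boundaries, toy_chr_sizes, random_seed=0)
```

Using the iteration index as the seed, as in the loop below, makes each randomization reproducible.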

In [39]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# One list per interaction category and count type
spanned_boundary_length_random_lists_dict = {
    i_cat: {'I_NUM': [], 'I_DIST': [], 'SB_NUM': []}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']
}
for random_seed in range(0,iter_num):
    print('Iteration: ' + str(random_seed))
    tbs.randomize_tad_boundaries(random_seed=random_seed)
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
        d11_interaction_set=d11_interaction_set,
        tbs=tbs)
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']:
            spanned_boundary_length_random_lists_dict[i_cat][num_type].append(spanned_boundary_length_random_dict[i_cat][num_type])

print("Done.")

Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Done.


In [40]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = len(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = (spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST'])
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    z_score = (observed - mean) / std
    st_obs = 0  # Number of randomized values that are smaller than the observed value
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['SB_NUM']:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(1000000*observed) + '\t' + str(1000000*mean) + '\t' + str(1000000*std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	10	0.19921300900791422	1.2650026072002554	0.3647557010749146	-2.9219271831845774	0
DI	10	1.165488985611845	1.1698296876633323	0.026965444249828455	-0.1609727624463177	3
UIR	10	1.2784624730896113	1.1751723249423383	0.015860504027395472	6.512412718338732	10
UI	10	1.3511935003644306	1.1737061573979823	0.013248789900107841	13.396494646277365	10
ALL	10	1.3461871526998521	1.173681148122332	0.013441758001741848	12.833589516725866	10


### Randomize TAD boundaries 2

For each TAD boundary, a random position is selected from the surrounding sequence. The size of the surrounding region is controlled by the `random_range` argument.
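
A minimal sketch of this second strategy is shown below. It is not the implementation of `TadBoundarySet.randomize_tad_boundaries_2`; the function `jitter_boundaries`, the symmetric window of `random_range` bases on either side and the clipping to the chromosome ends are assumptions made for illustration.

```python
import random

def jitter_boundaries(boundaries_by_chr, chr_sizes, random_range, random_seed=None):
    """Move each boundary to a random position within its surrounding window."""
    rng = random.Random(random_seed)
    jittered = {}
    for chrom, boundaries in boundaries_by_chr.items():
        new_positions = []
        for pos in boundaries:
            # Assumption: window of +/- random_range, clipped to the chromosome
            lower = max(1, pos - random_range)
            upper = min(chr_sizes[chrom], pos + random_range)
            new_positions.append(rng.randint(lower, upper))
        jittered[chrom] = sorted(new_positions)
    return jittered

# Made-up toy data for illustration only
toy_boundaries = {'chr1': [100_000, 450_000, 900_000]}
toy_chr_sizes = {'chr1': 1_000_000}
toy_jittered = jitter_boundaries(toy_boundaries, toy_chr_sizes, random_range=50_000, random_seed=0)
```

In contrast to drawing positions from the entire chromosome, this keeps the randomized boundaries close to the original ones, so the local boundary density is largely preserved.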

In [41]:
tbs = TadBoundarySet(tad_boundary_bed_file = tad_boundary_bed_file, chr_size_file = chr_size_file)

spanned_boundary_length_obs_dict = get_sum_of_spanned_boundaies_and_total_length(
    d11_interaction_set=d11_interaction_set,
    tbs=tbs)

# One list per interaction category and count type
spanned_boundary_length_random_lists_dict = {
    i_cat: {'I_NUM': [], 'I_DIST': [], 'SB_NUM': []}
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']
}
for random_seed in range(0,iter_num):
    print('Iteration: ' + str(random_seed))
    #tbs.randomize_tad_boundaries(random_seed=random_seed)
    tbs.randomize_tad_boundaries_2(random_seed=random_seed, random_range=random_range)
    spanned_boundary_length_random_dict = get_sum_of_spanned_boundaies_and_total_length(
        d11_interaction_set=d11_interaction_set,
        tbs=tbs)
    for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
        for num_type in ['I_NUM', 'I_DIST', 'SB_NUM']:
            spanned_boundary_length_random_lists_dict[i_cat][num_type].append(spanned_boundary_length_random_dict[i_cat][num_type])

print("Done.")

Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Done.


In [42]:
print(OUT_PREFIX)
print(tad_boundary_bed_file)
print()
print('I_CAT\tN\tOBS\tMEAN_RAND\tSTD_RAND\tZ_SCORE\tST_OBS')
for i_cat in ['DIX', 'DI', 'UIR', 'UI', 'ALL']:
    n = len(spanned_boundary_length_random_lists_dict[i_cat]['I_NUM'])
    observed = (spanned_boundary_length_obs_dict[i_cat]['SB_NUM'] / spanned_boundary_length_obs_dict[i_cat]['I_DIST'])
    mean = np.mean(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    std = np.std(spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']) / spanned_boundary_length_obs_dict[i_cat]['I_DIST']
    z_score = (observed - mean) / std
    st_obs = 0  # Number of randomized values that are smaller than the observed value
    for sb in spanned_boundary_length_random_lists_dict[i_cat]['SB_NUM']:
        if sb < spanned_boundary_length_obs_dict[i_cat]['SB_NUM']:
            st_obs += 1
    print(i_cat + '\t' + str(n) + '\t' + str(1000000*observed) + '\t' + str(1000000*mean) + '\t' + str(1000000*std) + '\t' + str(z_score) + '\t' + str(st_obs))

JAV_MK_RALT_20000_st_fdr005
../additional_files/javierre_2016/tad_regions_hg38/hglft_genome_TADs_MK_hg38.bed

I_CAT	N	OBS	MEAN_RAND	STD_RAND	Z_SCORE	ST_OBS
DIX	10	0.19921300900791422	1.7530744792696453	0.32183846556983664	-4.828078792603349	0
DI	10	1.165488985611845	1.4912093960744	0.03380086434439816	-9.636452107963226	0
UIR	10	1.2784624730896113	1.4398972119701914	0.02138710273619661	-7.548228522199966	0
UI	10	1.3511935003644306	1.4081261714671642	0.013258107942166258	-4.29417767234072	0
ALL	10	1.3461871526998521	1.4103449026326884	0.013688582457762128	-4.6869535345097395	0
