In [3]:
# Create directory for output files generated in this notebook 
NOTEBOOK_RESULTS_DIR = 'results/usage_of_pooler'
%mkdir -p $NOTEBOOK_RESULTS_DIR

# `pooler.py`: Combining read counts from different biological replicates

In this notebook, we describe how to use the Python script `pooler.py` to combine interactions from different Diachromatic interaction files. Two interactions are considered equal if they have the same digest coordinates. The read pair counts of equal interactions are summed up separately by type. For instance, the two interactions:

```
chr2	95043367	95054745	E	chr2	121918565	121924527	N	5:2:8:0
chr2	95043367	95054745	E	chr2	121918565	121924527	N	4:1:7:2
```

will be combined to:

```
chr2	95043367	95054745	E	chr2	121918565	121924527	N	9:3:15:2
```
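
The combination rule can be sketched in a few lines of Python. This is only an illustration of the rule described above and not the code used by `pooler.py`; the function names `parse_interaction` and `combine` are hypothetical.

```python
def parse_interaction(line):
    """Split a Diachromatic interaction line into coordinates, digest fields and counts."""
    fields = line.rstrip("\n").split("\t")
    # Interactions are compared by the coordinates of their two digests
    coords = (fields[0], fields[1], fields[2], fields[4], fields[5], fields[6])
    counts = [int(c) for c in fields[8].split(":")]
    return coords, fields[:8], counts


def combine(line_a, line_b):
    """Sum the read pair counts of two equal interactions, type by type."""
    coords_a, digests_a, counts_a = parse_interaction(line_a)
    coords_b, _, counts_b = parse_interaction(line_b)
    if coords_a != coords_b:
        raise ValueError("interactions have different digest coordinates")
    summed = [a + b for a, b in zip(counts_a, counts_b)]
    return "\t".join(digests_a + [":".join(str(c) for c in summed)])


a = "chr2\t95043367\t95054745\tE\tchr2\t121918565\t121924527\tN\t5:2:8:0"
b = "chr2\t95043367\t95054745\tE\tchr2\t121918565\t121924527\tN\t4:1:7:2"
print(combine(a, b))
```

The printed line should end in `9:3:15:2`, matching the combined interaction shown above.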

It does not matter whether equal interactions occur in the same file or in different files. In this notebook, we assume that each interaction occurs only once within an individual file. This condition can be met by first applying `pooler.py` to each file individually and then to the resulting files together.

For testing purposes, we have prepared four small Diachromatic interaction files. The first file contains only one interaction, which is also contained in the other three files, but with different read pair counts.

In [28]:
!gunzip -c ../../tests/data/test_01/diachromatic_interaction_file_r1.tsv.gz

chr1	46297999	46305684	E	chr1	51777391	51781717	N	1:1:1:0


The second file contains another interaction, which is also contained in two other files.

In [29]:
!gunzip -c ../../tests/data/test_01/diachromatic_interaction_file_r2.tsv.gz

chr1	46297999	46305684	E	chr1	51777391	51781717	N	2:0:1:0
chr17	72411026	72411616	N	chr17	72712662	72724357	N	3:0:1:1


The third file contains another interaction, which is also contained in another file.

In [30]:
!gunzip -c ../../tests/data/test_01/diachromatic_interaction_file_r3.tsv.gz

chr1	46297999	46305684	E	chr1	51777391	51781717	N	0:2:1:0
chr17	72411026	72411616	N	chr17	72712662	72724357	N	3:0:0:2
chr7	69513952	69514636	N	chr7	87057837	87061499	E	3:1:1:2


The fourth file contains another interaction that is not contained in any other file.

In [31]:
!gunzip -c ../../tests/data/test_01/diachromatic_interaction_file_r4.tsv.gz

chr1	46297999	46305684	E	chr1	51777391	51781717	N	1:1:1:0
chr17	72411026	72411616	N	chr17	72712662	72724357	N	3:0:2:0
chr7	69513952	69514636	N	chr7	87057837	87061499	E	2:2:2:1
chr11	47259263	47272706	N	chr11	91641153	91642657	E	3:2:1:3
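
As an alternative to `gunzip -c`, the files can also be inspected from Python with the standard `gzip` module. The helper below is a minimal sketch; the name `read_interactions` is hypothetical and not part of `pooler.py`.

```python
import gzip


def read_interactions(path):
    """Return the rows of a gzipped Diachromatic interaction file as lists of fields."""
    with gzip.open(path, "rt") as handle:
        # Skip empty lines and split each interaction at the tab characters
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
```

Assuming the paths used in this notebook, calling `read_interactions('../../tests/data/test_01/diachromatic_interaction_file_r4.tsv.gz')` should return four rows, with the colon-separated read pair counts in the last field of each row.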


The Python script `pooler.py` can be used to combine the interactions from all files.

In [4]:
%run ../../pooler.py \
--out-prefix $NOTEBOOK_RESULTS_DIR/DEMO \
--required-replicates 2 \
--interaction-files-path ../../tests/data/test_01/

[INFO] Input parameters
	[INFO] --out-prefix: results/usage_of_pooler/DEMO
	[INFO] --interaction-files-path: ../../tests/data/test_01/
	[INFO] --required-replicates: 2

[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r1.tsv.gz
	[INFO] Set size: 1
[INFO] ... done.
[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r3.tsv.gz
	[INFO] Set size: 3
[INFO] ... done.
[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r4.tsv.gz
	[INFO] Set size: 4
[INFO] ... done.
[INFO] Parsing Diachromatic interaction file ...
	[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r2.tsv.gz
	[INFO] Set size: 4
[INFO] ... done.

[INFO] Writing Diachromatic interaction file ...
	[INFO] Required replicates: 2
	[INFO] Target file: results/usage_of_pooler/DEMO_at_least_2_combined_interactions.tsv.gz
[INFO] ... done.


Here is the content of the resulting Diachromatic interaction file. The interaction on chromosome 11 is missing because we required an interaction to occur in at least two replicates (`--required-replicates 2`), and it occurs only in the fourth file. For the remaining interactions, each of the four read pair counts was summed across the files in which the interaction occurs.

In [2]:
!gunzip -c $NOTEBOOK_RESULTS_DIR/DEMO_at_least_2_combined_interactions.tsv.gz | head

chr1	46297999	46305684	E	chr1	51777391	51781717	N	4:4:4:0
chr17	72411026	72411616	N	chr17	72712662	72724357	N	9:0:3:3
chr7	69513952	69514636	N	chr7	87057837	87061499	E	5:3:3:3
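
The pooled counts can be checked independently of `pooler.py`. The following sketch re-implements the pooling for the four test files under the assumption stated earlier that each interaction occurs only once per file. The function `pool` and the constants are hypothetical names used for illustration only.

```python
from collections import defaultdict

# Digest pairs of the four interactions in the test files
CHR1 = "chr1\t46297999\t46305684\tE\tchr1\t51777391\t51781717\tN"
CHR17 = "chr17\t72411026\t72411616\tN\tchr17\t72712662\t72724357\tN"
CHR7 = "chr7\t69513952\t69514636\tN\tchr7\t87057837\t87061499\tE"
CHR11 = "chr11\t47259263\t47272706\tN\tchr11\t91641153\t91642657\tE"

# One list per replicate file (r1 to r4)
replicates = [
    [(CHR1, "1:1:1:0")],
    [(CHR1, "2:0:1:0"), (CHR17, "3:0:1:1")],
    [(CHR1, "0:2:1:0"), (CHR17, "3:0:0:2"), (CHR7, "3:1:1:2")],
    [(CHR1, "1:1:1:0"), (CHR17, "3:0:2:0"), (CHR7, "2:2:2:1"), (CHR11, "3:2:1:3")],
]


def pool(replicates, required_replicates):
    """Sum counts per digest pair and keep pairs seen in enough replicates."""
    sums = {}
    seen_in = defaultdict(int)
    for interactions in replicates:
        for digests, counts in interactions:
            values = [int(c) for c in counts.split(":")]
            if digests in sums:
                sums[digests] = [s + v for s, v in zip(sums[digests], values)]
            else:
                sums[digests] = values
            # Valid only if each interaction occurs once per file
            seen_in[digests] += 1
    return {
        digests: ":".join(str(s) for s in summed)
        for digests, summed in sums.items()
        if seen_in[digests] >= required_replicates
    }


pooled = pool(replicates, required_replicates=2)
for digests, counts in pooled.items():
    print(digests + "\t" + counts)
```

With `required_replicates=2`, the interaction on chromosome 11 is dropped and the three remaining lines carry the same counts as the file written by `pooler.py`.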


In addition, a file with summary statistics is created.

In [33]:
cat $NOTEBOOK_RESULTS_DIR/DEMO_at_least_2_combined_summary.txt

[INFO] Input parameters
	[INFO] --out-prefix: DEMO
	[INFO] --interaction-files-path: ../../tests/data/test_01/
	[INFO] --required-replicates: 2

[INFO] Report on reading files:
	[INFO] Read interaction data from 4 files:
		[INFO] 1 interactions from: 
			[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r1.tsv.gz
			[INFO] Minimum number of read pairs: 0
			[INFO] Skipped because less than 0 read pairs: 0
			[INFO] Minimum interaction distance: 0
			[INFO] Skipped because shorter than 0 bp: 0
			[INFO] Added to set: 1
			[INFO] Set size: 1
		[INFO] 3 interactions from: 
			[INFO] ../../tests/data/test_01/diachromatic_interaction_file_r3.tsv.gz
			[INFO] Minimum number of read pairs: 0
			[INFO] Skipped because less than 0 read pairs: 0
			[INFO] Minimum interaction distance: 0
			[INFO] Skipped because shorter than 0 bp: 0
			[INFO] Added to set: 3
			[INFO] Set size: 3
		[INFO] 4 interactions from: 
			[INFO] ../../tests/data/test_01/diachromatic_in