SPIN (Spatial Position Inference of the Nuclear genome) is a integrative computational method to identify genome-wide chromosome localization patterns relative to multiple nuclearcompartments using TSA-seq, DamID, and Hi-C data.
SPIN requires the following Python packages to be installed:
- Python (tested on version 3.6)
scikit-learn
(tested on version 0.22.2)NumPy
SciPy
pandas
pickle
Juicer tools is required to extract Hi-C data from .hic files. Requires Java Runtime Engine installed on your file system.
After install all dependencies, run the following python command:
python main.py -i <input_signal> --hic <hic_interactions> -w <window_size> -n <number_of_states> -o <output_path> -g <genome_bin> [--prev <previous_model>] [--save]
The options:
- -i <input_signal> : 1D genomic measurements of nuclear organization.
input_signal
file should be a tab-separated text file where each line is a vector of 1D genomic measurements. For example (header not included):
signal1 | signal2 | signal3 | signal4 |
---|---|---|---|
0.4209 | -0.3468 | -0.0405 | 0.01026 |
0.9098 | -0.7316 | -0.0961 | 0.0224 |
1.3589 | -1.0421 | -0.2001 | 0.0229 |
1.4688 | -1.1082 | -0.2716 | 0.0105 |
1.4552 | -1.1599 | -0.3605 | -0.0345 |
1.3385 | -1.1504 | -0.3953 | -0.0727 |
... | ... | ... | ... |
- --hic <hic_interactions> : List of Hi-C interactions added as edges in the model.
hic_interactions
file should be a tab-separated text file where the first two columns are the index numbers of genomic bins (needs to be consistent withgenome_bin
) and the third column is edge weight (opitional). For example (header not included):
bin1 | bin2 | weight |
---|---|---|
1 | 7 | 1.0 |
3 | 5 | 1.0 |
4 | 20 | 1.0 |
1 | 299 | 1.0 |
7 | 312 | 1.0 |
... | ... | ... |
-
-w <window_size> : Window size of each genome bin. Default = 100000
-
-n <number_of_states> : Number of states to estimate for SPIN. Default = 5
-
-o <output_path> : Output path.
-
-g <genome_bin>: Genomic coordinates of each bin.
genome_bin
file should be a tab-separated text file where the first three columns are the genomic coordinates of each bin and the fourth column is the index number (starting from zero). For example (header not included):
chr | start | end | index number |
---|---|---|---|
chr1 | 0 | 25000 | 0 |
chr1 | 25000 | 50000 | 1 |
chr1 | 50000 | 75000 | 2 |
chr1 | 75000 | 100000 | 3 |
chr1 | 100000 | 125000 | 4 |
... | ... | ... | ... |
-
--prev <previous_model>: (opitional) Load previously saved model.
-
--save : (opitional) Save curent model to .pkl file.
Example:
python main.py -i input_chr1.txt --hic Hi-C_chr1.txt -w 25000 -n 5 -m <mode> -o example_chr1 -g bin_chr1.bed --save
Predicted states state_n
can be found under output_path
folder. To convert the results to bed format:
merge2bed.sh genome_bin_file state_n output.bed
The bed file can be manually uploaded as custom annotation tracks to UCSC genome browser for visualization. An additional header must be added as the first line of the file in the following format:
track name='<track_name>' description='<description>' itemRgb='On'
Then go to a UCSC genome browser view, click the "add custom tracks" or "manage custom tracks" button below the tracks window. On the Add Custom Tracks page, load the annotation track data or URL for your custom track into the upper text box and the track documentation (optional) into the lower text box, then click the "Submit" button.
It is recommended to use distinguishable colors for different states. For example:
State | itemRgb |
---|---|
State_1 | 228,26,28 |
State_2 | 55,126,184 |
State_3 | 77,175,74 |
State_4 | 152,78,163 |
State_5 | 255,127,0 |
... | ... |
Additional BED fields can be added (with 9th column showing the color in RGB value). For example (header not included):
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | itemRgb |
---|---|---|---|---|---|---|---|---|
chr1 | 0 | 725000 | State_1 | 0 | . | 0 | 725000 | 102,194,165 |
chr1 | 725000 | 875000 | State_2 | 0 | . | 725000 | 875000 | 252,141,98 |
chr1 | 875000 | 2550000 | State_3 | 0 | . | 875000 | 2550000 | 141,160,203 |
chr1 | 2550000 | 2650000 | State_2 | 0 | . | 2550000 | 2650000 | 252,141,98 |
chr1 | 2650000 | 2800000 | State_1 | 0 | . | 2650000 | 2800000 | 102,194,165 |
chr1 | 2800000 | 2925000 | State_2 | 0 | . | 2800000 | 2925000 | 252,141,98 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Use the bedToBigBed utility to create a bigBed file.
bedToBigBed input.bed chrom.sizes myBigBed.bb
It is also recommended to use track bubs for more configurable custom annotations. See the track hub help page for more information.
In addition, tracks can be visualized on Nucleome Browser. See the Nucleome Browser help page for more instructions.
SPIN annotations for K562 can be found here.
If you use SPIN in your work
@article{wang2020spin,
title={SPIN reveals genome-wide landscape of nuclear compartmentalization},
author={Wang, Yuchuan and Zhang, Yang and Zhang, Ruochi and van Schaik, Tom and Zhang, Liguo and Sasaki, Takayo and Peric-Hupkes, Daniel and Chen, Yu and Gilbert, David M and van Steensel, Bas and others},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}