# Conversion from `.hic` to holoSeq `hseq` format

This Jupyter notebook shows an example of downloading a `.hic` file from GEO (Gene Expression Omnibus), converting it to the holoSeq `hseq` format, and then launching a Panel server from the command line to view it.

This notebook is meant to be run from a git checkout of the entire holoSeq repository, because it imports the conversion code from `../scripts`.

In [None]:
# Run this to install all required dependencies if they are not already in the venv.
# If you created a Jupyter kernel from the package's venv, you can skip this step.
! pip install datashader 'dask[dataframe]' 'holoviews[recommended]' pandas matplotlib bokeh hic-straw
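After installing, a quick check can confirm that every package is importable from the current kernel before the long download starts. This is a minimal sketch: the list uses import names rather than pip names, and `hicstraw` is assumed to be the import name of the `hic-straw` package (true for recent releases). The helper name `missing_modules` is hypothetical and not part of holoSeq.

```python
import importlib.util

# Import names for the packages installed above. "hicstraw" is assumed to be
# the import name of the hic-straw pip package.
REQUIRED_MODULES = ["datashader", "dask", "holoviews", "pandas",
                    "matplotlib", "bokeh", "hicstraw"]


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

This only checks top-level modules; it does not verify optional extras such as `dask[dataframe]`.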

# Configuration
The following paths and names can be changed to point to other samples or output files.

In [1]:
## Name of the sample to put as metadata in the hseq file
HIC_TITLE = "A001C007"
## URL of the .hic file on GEO
HIC_URL = "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6326nnn/GSM6326543/suppl/GSM6326543%5FA001C007%2Ehg38%2Enodups%2Epairs%2Ehic"
## Local download path (the download is a binary .hic file, so name it accordingly)
HIC_FILE = f"{HIC_TITLE}.hic"

## Output file names
hseq_filename = f"{HIC_TITLE}_hseq.txt.gz"
lenfile_name = f"{HIC_TITLE}_hseq.txt.gz.len"

## Number of chromosomes to include in the hseq file.
## In the example file, the first two chromosomes are the "ALL" catchall and the mitochondrial chromosome.
## Set this to 0 to convert all chromosomes.
MAX_CHROM = 5

In [2]:
## Load the conversion code from the holoSeq repository
import sys
sys.path.append("../scripts")
import hic2hseq

# Download from GEO
The given example file is about a 5GB download.

In [None]:
from urllib.request import urlretrieve
urlretrieve(HIC_URL, HIC_FILE)
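Because the file is about 5 GB, it can help to print progress and to avoid re-downloading it when the cell is run again. The sketch below is one way to do that with the standard library only; `download_once` and `make_progress_hook` are hypothetical helper names, not part of holoSeq. It uses the `reporthook` argument of `urlretrieve` and writes to a temporary `.part` file that is renamed only after the transfer finishes, so an interrupted download is never mistaken for a complete `.hic` file.

```python
import os
from urllib.request import urlretrieve


def make_progress_hook(every_mb=100):
    """Return a urlretrieve reporthook that prints progress every `every_mb` MB."""
    step = every_mb * 2**20
    last = {"mark": 0}  # number of `step`-sized chunks already reported

    def hook(block_num, block_size, total_size):
        done = block_num * block_size
        if total_size > 0:
            done = min(done, total_size)  # the final block may overshoot
        mark = done // step
        if mark > last["mark"]:
            last["mark"] = mark
            if total_size > 0:
                print(f"{done / 2**20:,.0f} / {total_size / 2**20:,.0f} MB")
            else:
                print(f"{done / 2**20:,.0f} MB downloaded")

    return hook


def download_once(url, path, every_mb=100):
    """Download `url` to `path` unless `path` already exists; return `path`."""
    if os.path.exists(path):
        print(f"{path} already exists, skipping download")
        return path
    part = path + ".part"
    urlretrieve(url, part, reporthook=make_progress_hook(every_mb))
    os.replace(part, path)  # rename only after the transfer has finished
    return path


# Usage in this notebook:
# download_once(HIC_URL, HIC_FILE)
```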

# Convert to hseq
Converting the entire 5GB .hic file takes 10 to 20 minutes and produces about 500MB of output. Converting fewer chromosomes (a smaller `MAX_CHROM`) is faster and gives a smaller output file.
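Before a long conversion, it can be useful to see which chromosomes the downloaded file contains and which ones a given `MAX_CHROM` would keep. This is a sketch under an assumption: it supposes that `hic2hseq` takes the first `MAX_CHROM` chromosomes in the order they are stored in the `.hic` file and that `0` means all of them, which matches the configuration comments but is not confirmed from the converter's code. Chromosomes are read with the `hic-straw` Python API (`hicstraw.HiCFile(...).getChromosomes()`); the helper names `select_chromosomes` and `read_hic_chromosomes` are hypothetical.

```python
def select_chromosomes(chroms, max_chrom):
    """Return the (name, length) pairs assumed to be converted for `max_chrom`.

    Assumption: the converter keeps the first `max_chrom` chromosomes in file
    order, and 0 (or a negative value) means "all chromosomes".
    """
    chroms = list(chroms)
    if max_chrom <= 0:
        return chroms
    return chroms[:max_chrom]


def read_hic_chromosomes(hic_path):
    """List (name, length) for every chromosome stored in a .hic file."""
    import hicstraw  # import name of the hic-straw package
    hic = hicstraw.HiCFile(hic_path)
    return [(chrom.name, chrom.length) for chrom in hic.getChromosomes()]


# Usage in this notebook, after HIC_FILE has been downloaded:
# for name, length in select_chromosomes(read_hic_chromosomes(HIC_FILE), MAX_CHROM):
#     print(f"{name}\t{length:,}")
```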

In [None]:
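## Open both output files in one with-statement so they are flushed and closed when the block exits
with open(lenfile_name, mode="w") as lenfile_stream, hic2hseq.GzipOut(hseq_filename) as ostream:
    hic2hseq.convert_hic_to_hseq(HIC_FILE, ostream, lenfile_stream, lenfile_name, MAX_CHROM, HIC_TITLE)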
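After conversion, a quick look at the first few lines of the output confirms that something sensible was written before starting the server. This sketch assumes that `GzipOut` produces a standard gzip-compressed text file, which the `.txt.gz` name suggests but which is not confirmed here; it makes no assumptions about the hseq line format and simply prints raw lines. `head_gzip` is a hypothetical helper name.

```python
import gzip
from itertools import islice


def head_gzip(path, n=5):
    """Return the first n lines of a gzip-compressed text file, newlines stripped."""
    with gzip.open(path, mode="rt") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Usage in this notebook (hseq_filename and lenfile_name come from the configuration cell):
# for line in head_gzip(hseq_filename):
#     print(line)
# with open(lenfile_name) as fh:
#     print(fh.read(500))
```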

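# Start the Panel server
The `--show` flag opens the viewer in a browser tab; everything after `--args` is passed to `holoseq_display.py`.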

In [None]:
!panel serve ../scripts/holoseq_display.py --show --args --inFile {hseq_filename} --size 1000
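The `!panel serve` cell blocks the notebook until the server is interrupted. If you prefer to keep working in the notebook while the viewer runs, one option is to start the same command as a background process with `subprocess.Popen`. This is a sketch, not part of holoSeq: `panel_serve_command` and `start_panel_server` are hypothetical helpers that build exactly the arguments used in the `!panel serve` cell, and they assume the `panel` executable is on the kernel's `PATH`.

```python
import subprocess


def panel_serve_command(script, hseq_file, size=1000, show=True):
    """Build the argument list for `panel serve`, mirroring the ! cell."""
    cmd = ["panel", "serve", script]
    if show:
        cmd.append("--show")  # open the app in a browser tab
    # Everything after --args is passed through to the served script.
    cmd += ["--args", "--inFile", hseq_file, "--size", str(size)]
    return cmd


def start_panel_server(cmd):
    """Launch the server in the background; call .terminate() on the result to stop it."""
    return subprocess.Popen(cmd)


# Usage in this notebook:
# server = start_panel_server(panel_serve_command("../scripts/holoseq_display.py", hseq_filename))
# ... later ...
# server.terminate()
```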