Hi there,
I am trying to visualize a Hi-C contact map from Instagraal. The genome is large (~2.6Gb) and the analysis takes a long time. Although I set -t 40 in the job script, the program only used 8 threads:
hicstuff view -t 40 --normalize --binning 5kb --frags fragments_list.txt abs_fragments_contacts_weighted.txt --output map5k.png
INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO :: NumExpr defaulting to 8 threads.
The main problem is the run time: the process takes more than 30 h on a node with 40 CPUs and 187 GB of memory. How should I adjust the --binning parameter for such a large genome?
Is there any way to adjust the maximum number of threads?
I installed hicstuff 3.0.1 using pip install.
Thanks
First, the hicstuff view command is mostly single-threaded. The -t option stands for --trim, not threads; you can always check what each option does using hicstuff view --help. The NumExpr info message can be disregarded here.
Your genome is large and you want to view the entire map. The default DPI (dots per inch) is 300, meaning your image will have 300 pixels for each inch (it can be adjusted with the --dpi option). With the default size of matplotlib figures (6.4x4.8 inches), this means your figure will have 1920x1440 pixels.
If you bin a 2.6Gb genome at 5kb, you will generate a matrix of 520,000x520,000 pixels. You are computing a lot of values that will not be visible in the final image anyway.
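To make the mismatch concrete, here is a minimal sketch of the arithmetic above. The function names are made up for illustration and are not part of hicstuff; the figure size and DPI are the defaults quoted in this thread.

```python
def figure_pixels(width_in=6.4, height_in=4.8, dpi=300):
    """Pixel dimensions of a figure of the given size (inches) and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def n_bins(genome_size_bp, bin_size_bp):
    """Number of bins along one axis of the contact matrix (rounded up)."""
    return -(-genome_size_bp // bin_size_bp)


width_px, height_px = figure_pixels()
bins = n_bins(2_600_000_000, 5_000)

print(width_px, height_px)  # 1920 1440
print(bins)                 # 520000
# Each image pixel along the width would have to stand for this many bins:
print(bins // width_px)     # 270
```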
You have two options:
Massively increase the DPI to get an extremely high resolution (and heavy) image.
Choose the binning to target a reasonable resolution on the final image.
I would strongly recommend the second option: it will make everything much faster and easier, and avoid producing a >40GB figure. In your case, if you really want to view the whole genome matrix, you'd want a ~2000x2000 image, so you would have to bin at 2.6Gb / 2000 = 1.3Mb.
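The same calculation can be written as a small helper for other genome sizes or image widths. This is only a sketch of the rule of thumb above; bin_size_for_image is a hypothetical name, not a hicstuff function.

```python
def bin_size_for_image(genome_size_bp, target_pixels):
    """Smallest bin size (bp) so the whole genome fits in target_pixels bins."""
    if target_pixels <= 0:
        raise ValueError("target_pixels must be positive")
    # Ceiling division, so the matrix never exceeds the target width.
    return -(-genome_size_bp // target_pixels)


genome_size = 2_600_000_000  # ~2.6Gb, as in the question

print(bin_size_for_image(genome_size, 2000))  # 1300000, i.e. 1.3Mb
```

The result is in base pairs, so it needs converting to whatever unit you pass to --binning.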
Of course, you could increase the DPI a bit and bin at a correspondingly finer resolution. If you would like to interactively zoom and explore regions of the map at very high resolutions, I would recommend converting your matrix to cool format and using cooler show instead [github], which can load only the visible part of the matrix in memory.