
NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. #48

Closed
Karimi-81 opened this issue May 15, 2021 · 1 comment
Labels
question Further information is requested

Comments

@Karimi-81

Hi there,
I am trying to visualize a Hi-C contact map from Instagraal. The genome is large (~2.6Gb) and the analysis takes a long time. Although I set -t 40 in the job script, the program only used 8 threads:
hicstuff view -t 40 --normalize --binning 5kb --frags fragments_list.txt abs_fragments_contacts_weighted.txt --output map5k.png
INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO :: NumExpr defaulting to 8 threads.

The main problem is that the process is very time-consuming: it takes more than 30 h on a node with 40 CPUs and 187 GB of memory. How should I adjust the --binning parameter for such a large genome?
Is there any way to adjust the maximum number of threads?
I installed hicstuff 3.0.1 with pip.

Thanks

@cmdoret
Member

cmdoret commented May 20, 2021

Hi @Karimi-81,

First, the hicstuff view command is mostly single-threaded, and the -t option stands for --trim, not the number of threads. You can always check what each option does with hicstuff view --help. The NumExpr info message can be disregarded here.

Your genome is large and you want to view the entire map. The default DPI (dots per inch) is 300, meaning your image will have 300 pixels for each inch (it can be adjusted with the --dpi option). With the default size of matplotlib figures (6.4x4.8 inches), this means your figure will have 1920x1440 pixels.

If you bin a 2.6Gb genome at 5kb, you will generate a matrix of 520,000x520,000 pixels, i.e. about 2.7 x 10^11 cells for an image of fewer than 3 million pixels. You are computing lots of values, but most of them will not be visible in the final image anyway.
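The arithmetic in the two paragraphs above can be checked with a short sketch. It assumes the default 6.4 x 4.8 inch matplotlib figure and a square contact map that can use at most the shorter side of the image; the helpers `image_pixels` and `matrix_bins` are hypothetical and not part of hicstuff.

```python
from __future__ import annotations


def image_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a figure of the given size (inches) saved at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def matrix_bins(genome_bp: int, bin_bp: int) -> int:
    """Number of bins along each axis of a whole-genome contact matrix."""
    # Ceiling division: a partial last bin still occupies a row/column.
    return -(-genome_bp // bin_bp)


genome_bp = 2_600_000_000  # ~2.6 Gb genome from the question
w, h = image_pixels(6.4, 4.8, dpi=300)
n = matrix_bins(genome_bp, 5_000)
print(f"image: {w}x{h} pixels")    # image: 1920x1440 pixels
print(f"matrix: {n:,}x{n:,} bins")  # matrix: 520,000x520,000 bins
# Approximate number of bins squeezed into each pixel along one axis,
# assuming the square map spans at most min(w, h) pixels per side.
print(f"~{n // min(w, h)} bins per pixel per axis")  # ~361 bins per pixel per axis
```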

You have two options:

  • Massively increase the DPI to get an extremely high-resolution (and heavy) image.
  • Choose the binning to target a reasonable resolution in the final image.

I would strongly recommend the second option: it will make everything much faster and easier, and it avoids producing a >40GB figure. In your case, if you really want to view the whole-genome matrix, you would want a ~2000x2000 image, so you would have to bin at 2.6Gb / 2000 = 1.3Mb.
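The recommended bin size comes from dividing the genome length by the target number of pixels per side. A minimal sketch, assuming a square target image; `bin_size_for_image` is a hypothetical helper, and rounding up to a whole kilobase is an arbitrary choice made only for readability.

```python
def bin_size_for_image(genome_bp: int, target_pixels: int, round_to: int = 1_000) -> int:
    """Smallest bin size (bp), rounded up to a multiple of `round_to`,
    that keeps the matrix at or below `target_pixels` bins per side."""
    raw = -(-genome_bp // target_pixels)   # ceil(genome_bp / target_pixels)
    return -(-raw // round_to) * round_to  # round up to a multiple of round_to


print(bin_size_for_image(2_600_000_000, 2_000))  # 1300000, i.e. 1.3 Mb
```

Rounding up rather than down guarantees the resulting matrix never exceeds the target size, so every bin still gets at least one pixel.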

Of course, you could increase the DPI a bit and bin at a correspondingly finer resolution. If you would like to interactively zoom and explore regions of the map at very high resolution, I would recommend converting your matrix to cool format and using cooler show [github] instead, which can load only the visible part of the matrix into memory.
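The DPI/binning trade-off can be sketched the same way. This assumes, as in the estimate above, that the map spans roughly 2000 pixels per side at the default 300 DPI and that this count scales linearly with DPI; `bin_size_at_dpi` and its `pixels_at_300dpi` parameter are hypothetical.

```python
def bin_size_at_dpi(genome_bp: int, dpi: int, pixels_at_300dpi: int = 2_000) -> int:
    """Bin size (bp) giving roughly one bin per pixel, assuming the map
    spans `pixels_at_300dpi` pixels per side at 300 DPI and that this
    pixel count scales linearly with DPI."""
    target_pixels = pixels_at_300dpi * dpi // 300
    return -(-genome_bp // target_pixels)  # ceiling division


genome_bp = 2_600_000_000
for dpi in (300, 600, 1200):
    bin_bp = bin_size_at_dpi(genome_bp, dpi)
    print(f"dpi={dpi}: bin at ~{bin_bp:,} bp")
# dpi=300: bin at ~1,300,000 bp
# dpi=600: bin at ~650,000 bp
# dpi=1200: bin at ~325,000 bp
```

Doubling the DPI halves the bin size, but it quadruples both the number of image pixels and the number of matrix cells to compute, which is why only a modest DPI increase stays practical.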

cmdoret added the question (Further information is requested) label on Oct 26, 2021
ABignaud closed this as completed on Oct 6, 2022