thread usage dependence on cpu architecture #2205

Open
memphizz opened this issue Jan 31, 2020 · 7 comments
Labels
lazyflow issues related to the lazyflow codebase


@memphizz

Hi,
I have noticed a significant difference in CPU utilization and thread usage by ilastik 1.3.3post2 in interactive mode (GUI) while training an Autocontext classifier on two Linux machines. One system is a 36-vcore Intel i9, the other a quad-CPU Intel system with 192 vcores. CPU utilization on the i9 system is consistent with the LAZYFLOW_THREADS and memory parameters (at times it reaches 100% CPU utilization), while the 192-vcore system maxes out at about 8 threads when running the same project file, independent of LAZYFLOW_THREADS.

I tried to undo the change in this commit:
ilastik/lazyflow@77576cc
but that did not change CPU utilization.

btw, thank you for all your efforts developing ilastik. Ilastik was a key tool in a recent paper we published: https://science.sciencemag.org/content/367/6475/eaaz5357.editor-summary

Abbas

@k-dominik
Contributor

Hey @memphizz,

First of all, very nice that ilastik was useful for you! I didn't have time to read that one yet, but the visuals look awesome :)

As for your question, we have noticed this as well (there are some benchmark runs in this issue). I don't understand why the 192-vcore machine would max out at 8 threads - the LAZYFLOW_THREADS setting should be respected in any case. The biggest machine I could get my hands on so far has 20 vcores. On that one (see also the thread referenced above) we didn't see any performance improvements beyond 20 threads, but also found that RAM usage made quite the difference.

@memphizz
Author

memphizz commented Feb 3, 2020

Hi @k-dominik,
I am familiar with the benchmark post you shared. You may recall that I posted a related question about user-defined volume partitioning: https://forum.image.sc/t/ilastik-headless-userdefined-subdivision-of-3d-image-volume-in-pixel-classification/19241/3. The machine with 192 vcores (4x Intel Xeon Platinum 8160) has 1.5 TB of RAM, which is more than the recommended memory-per-thread requirement, so memory limitation should be minimal.

Is there a way to find out which steps of the workflow are the bottleneck? I have tried changing all of the parameters you mentioned in the post without any luck.

btw, the science cover visual was all done using open-source tools. ilastik for segmentation, paraview for mesh generation and blender for rendering. We are gearing up to do a bigger study. Speeding up the interactive training is going to be a big help.

@memphizz
Author

@k-dominik Do you have something like a flowchart that explains data flow and parallelization in ilastik auto-context segmentation?

@k-dominik
Contributor

k-dominik commented Feb 10, 2020

Hi @memphizz,

sorry I meant to get back to you a lot earlier, but I was traveling...

btw, the science cover visual was all done using open-source tools. ilastik for segmentation, paraview for mesh generation and blender for rendering. We are gearing up to do a bigger study. Speeding up the interactive training is going to be a big help.

That's awesome! OSS ftw!

The machine with 192-vcores (4x INTEL XEON PLATINUM 8160) has 1.5TB of ram which is more than the recommended memory per thread requirement, memory limitation should be minimal.

Impressive machine! I would say it is extremely weird that you don't see any performance improvement there beyond 8 threads... that's just really bad. Have you confirmed that the threadpool is initialized with only 8 threads there, or are there many more that just idle?

ilastik works blockwise through the data. Blocking is governed by the BigRequestStreamer, so only a ROI of the result is requested at a time. Which parts of the image have to be touched for that, and all the intermediate results, are determined by the computational graph. Per default there will be as many concurrent batches as there are available threads in the threadpool. Easiest would probably be to check the number of batches there for your export.
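The blockwise scheme described above can be sketched roughly like this. This is a minimal illustration, not lazyflow's actual code; the block shape, the `iter_blocks`/`process_roi` helpers, and the worker count are made up for the example:

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

def iter_blocks(shape, block_shape):
    """Yield (start, stop) ROIs tiling a volume, loosely like BigRequestStreamer's blocking."""
    ranges = [range(0, dim, blk) for dim, blk in zip(shape, block_shape)]
    for start in itertools.product(*ranges):
        stop = tuple(min(s + blk, dim) for s, blk, dim in zip(start, block_shape, shape))
        yield start, stop

def process_roi(roi):
    # Placeholder for the real computation a ROI request would trigger.
    start, stop = roi
    return sum(b - a for a, b in zip(start, stop))

shape = (960, 960, 85)        # the volume size mentioned later in this thread
block_shape = (256, 256, 85)  # hypothetical block size
num_workers = 8               # analogous to LAZYFLOW_THREADS

# As many concurrent block requests in flight as there are worker threads.
with ThreadPoolExecutor(max_workers=num_workers) as pool:
    results = list(pool.map(process_roi, iter_blocks(shape, block_shape)))

print(len(results), "blocks processed")
```

The point of the sketch: even with many workers, effective parallelism is capped by how many blocks the streamer issues concurrently, which is why checking the batch count matters.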

@memphizz
Author

memphizz commented Feb 11, 2020

Hi @k-dominik, thanks for the response. How can I check the number of threads actually initialized in the thread pool?
I typically run ilastik like this:
/ilastik-1.3.3post2-Linux$ LAZYFLOW_THREADS=100 LAZYFLOW_TOTAL_RAM_MB=512000 ./run_ilastik.sh
But I don't know a way to check the threadpool while in ilastik GUI.

@k-dominik
Contributor

aye, sorry, I only touched upon the headless case. Volumina (the viewer) sends requests off via the same threadpool as the computation uses, so you could maybe just put a print statement in the init method of the threadpool to ensure it actually gets enough threads:

https://github.com/ilastik/lazyflow/blob/8d58bc148a5a8f8588187fab4aa3bf644b678090/lazyflow/request/threadPool.py#L39

Here in the init, put maybe something like print("Threadpool number of threads:", num_workers)
so that you can at least see that it gets initialized correctly.
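Concretely, the patched `__init__` could look something like the sketch below. The class body is paraphrased, not copied from the linked file; only the `num_workers` parameter is taken from the suggestion above:

```python
# Sketch of the suggested debug print inside lazyflow's ThreadPool.__init__
# (the real class lives in lazyflow/request/threadPool.py; body simplified here).

class ThreadPool:
    def __init__(self, num_workers):
        self.num_workers = num_workers
        # ... in the real implementation, worker threads are started here ...
        print("Threadpool number of threads:", num_workers)

pool = ThreadPool(40)  # prints: Threadpool number of threads: 40
```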

When ilastik is running you can also use a tool like htop to see how many threads the process has spawned.
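As an alternative to htop, on Linux the OS-level thread count of a process can also be read from the /proc filesystem. A small helper (Linux-only; the `thread_count` name is made up for this example):

```python
import os

def thread_count(pid=None):
    """Count OS-level threads of a process by listing /proc/<pid>/task (Linux only)."""
    if pid is None:
        pid = os.getpid()  # default to the current process
    return len(os.listdir(f"/proc/{pid}/task"))

print("threads in this process:", thread_count())
```

Run with ilastik's PID (from `pgrep` or htop) to compare the spawned thread count against LAZYFLOW_THREADS.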

@memphizz
Author

I inserted the print at the end of the init. Here is the output:

LAZYFLOW_THREADS=40  ./run_ilastik.sh
Warning: Ignoring your non-empty LD_LIBRARY_PATH
Could not find /home/sablagrp/ashirini/Downloads/ilastik-1.3.3post2-Linux/python-scripts/ilastik-install. ignoring.
Starting ilastik from "/home/sablagrp/ashirini/Downloads/ilastik-1.3.3post2-Linux".

(python:43962): Gtk-WARNING **: 07:41:22.100: Unable to locate theme engine in module_path: "adwaita",
Threadpool number of threads: 8
Threadpool number of threads: 40

Not sure why it printed both 8 and 40. It prints 8 twice when I set the threads to 8. Is the function initialized twice?
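One way to find out which code paths construct the pool twice would be to print the call stack alongside the thread count. A debugging sketch (the `pool_creation_report` helper is hypothetical, intended to be called from the same spot as the print above):

```python
import traceback

def pool_creation_report(num_workers):
    """Return the thread-count message plus the call stack that constructed the pool."""
    stack = "".join(traceback.format_stack())
    return f"Threadpool number of threads: {num_workers}\n{stack}"

# Hypothetical use inside ThreadPool.__init__:
print(pool_creation_report(8))
```

Comparing the two stack traces would show whether the same caller constructs the pool twice or two different components each create one.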

I repeated a test (training an auto-context model in 3D, 960x960x85, 8-bit, 18 features in stage 1 and 54 features in stage 2) with threads set to 8, 40, and 192:
8 threads: thread usage was about 6 but reaches 8 at times.
40 threads: thread usage was about 12 but reaches 25 at times.
192 threads: thread usage was about 8 but reaches 12 at times.

It seems that setting the thread count takes effect, but thread usage is limited by other factors. Any idea what's happening?

@m-novikov m-novikov transferred this issue from ilastik/lazyflow Mar 20, 2020
@m-novikov m-novikov added the lazyflow issues related to the lazyflow codebase label Mar 20, 2020