
Need better heuristic for default value of LAZYFLOW_THREADS #1458

Open

stuarteberg opened this issue Apr 5, 2017 · 7 comments

Comments
@stuarteberg (Member) commented Apr 5, 2017

Lou Scheffer was experiencing poor interactive performance during pixel classification on a large 2D image. But on my machine, I get good performance, even using the same project file. The most notable difference between our two systems is the number of CPUs: Lou has 48 (presumably, half are hyper-threads).

We timed (with a stopwatch) how long it took to completely predict all tiles of a 12577x750 image using various settings of LAZYFLOW_THREADS. We found that the benefit of using more CPUs disappears rather quickly. Maximum performance is achieved at around 4-8 threads. Using all 48 CPUs is nearly as bad as using just a single thread.

I believe that our upcoming switch to 512px tiles will improve the situation, but we should still come up with a better heuristic for how to set LAZYFLOW_THREADS if the user hasn't set it themselves. Otherwise, users with beefy workstations are likely to experience worse performance than users with modern laptops.

| LAZYFLOW_THREADS | Time (seconds) |
|------------------|----------------|
| 1                | 95             |
| 2                | 63             |
| 4                | 37             |
| 8                | 35             |
| 16               | 41             |
| 32               | 60             |
| 48 (max)         | 72             |

Image Size: (12577, 750)
Features: All
Prediction classes: 2
Viewer tile size: 256x256
OS: Fedora 20
ilastik version: 1.2.0
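
One possible direction (just a sketch, not anything that exists in lazyflow today) would be to cap the default thread count instead of defaulting to the full CPU count. The function name `default_lazyflow_threads` and the cap of 8 are assumptions based on the timings above:

```python
import os

def default_lazyflow_threads(max_useful=8):
    """Hypothetical default: cap the thread count instead of using all CPUs.

    The timings above saturate around 4-8 threads, so a capped default
    avoids the slowdown seen on many-core machines.  The cap of 8 is an
    assumption, not a tuned constant.
    """
    env = os.environ.get("LAZYFLOW_THREADS")
    if env is not None:
        return int(env)                      # always respect an explicit user setting
    return min(os.cpu_count() or 1, max_useful)
```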


@stuarteberg (Member, Author)

There is some hope that the situation will magically improve when we upgrade to Python 3, since it uses the "New GIL" implementation:

http://www.dabeaz.com/python/NewGIL.pdf

But it's difficult to say for sure.

@akreshuk (Member)

I'm writing it here since it might be related and the old issue is closed now. Our switch to 512x512 tiles for 2D gave a very noticeable speed-up there but, as I just found, also a very noticeable slow-down for 3D data. As a first step, we should make the tile size conditional on the dataset dimensions, but it would be good to understand this behaviour in principle.
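
A minimal sketch of what "tile size conditional on the dataset dimensions" could look like; the function name and the concrete shapes are made up for illustration, not values used by ilastik:

```python
def default_tile_shape(spatial_shape):
    """Hypothetical rule: pick the tile/block shape from the data dimensionality.

    512x512 tiles helped 2D prediction but hurt 3D, so the block shape
    could depend on whether the spatial data is 2D or 3D.  The concrete
    numbers below are assumptions, not tuned values.
    """
    nonsingleton = [s for s in spatial_shape if s > 1]
    if len(nonsingleton) <= 2:
        return (512, 512)        # 2D: large tiles amortize per-tile overhead
    return (64, 64, 64)          # 3D: smaller blocks keep per-block memory in check

# e.g. default_tile_shape((12577, 750, 1)) -> (512, 512)
#      default_tile_shape((250, 250, 250)) -> (64, 64, 64)
```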

stuarteberg added a commit to ilastik/lazyflow that referenced this issue Apr 27, 2017
@akreshuk (Member) commented Jan 12, 2018

Looks like the default behavior only got worse in the new version. Here is a screenshot of my machine doing autocontext. The whole thing feels slow, and most of the time it's running on fewer than 8 cores, each at less than 100%.
[screenshot: CPU usage while running autocontext]

@k-dominik (Contributor) commented Sep 5, 2018

So I did some benchmarking on a machine with 20 cores (40 threads).
The task was pixel classification on a CREMI sample dataset. In addition to varying the number of threads, I also varied the amount of RAM available to lazyflow, with some interesting results:

In short, I could reproduce the behavior that @stuarteberg showed in the first post. However, with more RAM, ilastik could use more and more threads without slowing down.

I guess we see three effects:

  1. With small amounts of RAM and many threads, we get very small block sizes -> the halo overhead slows us down.
  2. With a small number of threads and loads of RAM, all threads start concurrently and finish at roughly the same time. When one thread starts writing its result, the other threads are also more or less finished and have to wait for the writing to complete.
  3. At some point (all threads more or less running all the time, one thread writing all the time) we run into saturation; this can only be overcome by parallel writing.

So in the end, we should find a better heuristic that sets both LAZYFLOW_THREADS and LAZYFLOW_TOTAL_RAM_MB jointly.
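
A minimal sketch of such a joint rule, assuming a per-thread RAM floor below which adding threads stops paying off; the function name, the 75% budget, and the 2 GB floor are all assumptions, not measured constants:

```python
def pick_threads_and_ram(total_ram_mb, n_cpus, min_mb_per_thread=2048):
    """Hypothetical joint heuristic for LAZYFLOW_THREADS and LAZYFLOW_TOTAL_RAM_MB.

    Idea from the benchmark: extra threads only help while each thread
    still gets a reasonably large share of the RAM budget (so block
    sizes, and hence the relative halo overhead, stay sane).
    """
    ram_budget_mb = int(total_ram_mb * 0.75)   # leave headroom for the OS and GUI
    threads = max(1, min(n_cpus, ram_budget_mb // min_mb_per_thread))
    return threads, ram_budget_mb

# e.g. a 40-thread machine with 64 GiB of RAM:
#      pick_threads_and_ram(65536, 40) -> (24, 49152)
```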

[benchmark plots for RAM limits of 16384, 65536, 131072, and 240000 MB]

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/notable-memory-usage-difference-when-running-ilastik-in-headless-mode-on-different-machines/41144/2

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/cpu-and-ram-core-limit-for-ilastik/52428/2

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multiphase-segmentation-and-other-questions/78696/4
