UMAP hangs when training in a multiprocessing.Process #707
Comments
This may be an interaction between numba and multiprocessing. That's certainly beyond my expertise, unfortunately. I don't see why it should be a problem, though. This is definitely challenging to debug. Sorry I can't be more help.
Opened a SO post here: https://stackoverflow.com/questions/68131348/training-a-python-umap-model-hangs-in-a-multiprocessing-process Hoping there's more insight. Will post updates as I debug.
Ok so after many print statements, the hang looks like it starts here:
The shape of X at this point is (100, 100). It works fine outside of the Process, but hangs inside the Process. Interestingly, I discovered that in the minimal example above, if I don't run the 'normal' version first, the 'multiprocess' version works fine. That's really unintuitive. Numba might be maintaining state somewhere? Unfortunately this doesn't help me much, because I still get hangs in the multiprocessing version. More digging required...
Like many umap/numba problems, switching to a different numba threading backend fixed the problem. I was previously using 'workqueue', which would just hang. I switched to 'omp', which showed me an actual error:
Switching to tbb seemed to work with the minimal example above, though I had a fair bit of trouble getting tbb to actually load (see: numba/numba#7148). I'll close this out, but this was definitely a weird interaction between numba and some other multiprocessing stuff. Seems brittle, but not really sure what's to be done about it 🤔
Glad you found a solution, but it definitely seems brittle. In general the tbb backend seems to fix most problems, but sadly it is not the default for a lot of users.
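For anyone landing here later, a minimal sketch of how the backend switch described above can be requested. It assumes numba's documented `NUMBA_THREADING_LAYER` environment variable, set before numba (and therefore umap) is imported. The helper name `select_threading_layer` is hypothetical and not part of numba or umap, and this is not necessarily how the poster made the switch.

```python
import os

# Threading-layer names accepted by numba's NUMBA_THREADING_LAYER setting.
# "workqueue" is numba's built-in layer; "tbb" and "omp" need extra libraries.
KNOWN_LAYERS = {"default", "safe", "forksafe", "threadsafe",
                "tbb", "omp", "workqueue"}


def select_threading_layer(name):
    """Hypothetical helper: request a numba threading layer via the environment.

    Call this before importing numba or umap, since numba reads its
    environment-based configuration at import time.
    """
    if name not in KNOWN_LAYERS:
        raise ValueError(f"unknown threading layer: {name!r}")
    os.environ["NUMBA_THREADING_LAYER"] = name
    return name


# Request tbb, the layer that worked for the minimal example in this thread.
select_threading_layer("tbb")

# Only after this point would one import umap and start the Process.
```

Requesting a layer does not guarantee it loads: if tbb is not installed or is too old, numba raises an error when the first parallel function runs. After a parallel function has executed, `numba.threading_layer()` reports which layer was actually loaded.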
Hey Leland,
Thanks for the great library.
I've got a strange error. It looks like umap training completely hangs if it is run inside a multiprocessing.Process. Minimal example on Python 3.8.5:
This results in the following output:
after which I have to Ctrl-C because nothing happens.
Any ideas what is going on?