BUG: optimize: COBYLA hangs on some CPUs #15527
Comments
Hi @apixandru, thanks for reporting. Did you try the latest release of SciPy? The development version? Also, please remove the zip file; it's a security issue.
@tupui 1.8.0 still hangs, but I'm not sure how to install the development version. I'm getting the
Installing a dev environment is more complicated than this. You have to follow the guide in our docs: https://scipy.github.io/devdocs/dev/contributor/contributor_toc.html#development-environment Basically
@tupui yep, it's dead on dev too
I tried precomputing the results of all my numpy.dot computations and having the constraints just pop the next value from a list, and that worked.
Then I tried running just the numpy.dot function in isolation for all the inputs that I'm sending here, and that also worked. It seems that it takes the combination of both (COBYLA calling constraints that call numpy.dot) for the issue to reproduce in certain cloud environments.
The issue still occurs even in the latest available version.
Describe your issue.
A couple of weeks ago we noticed that some of our workloads were hanging.
I attached the code that reproduces the bug.
Works with any consumer grade Intel CPUs that I tried. It only has problems on servers.
Works with any AMD machine that I tried (consumer or on servers)
Works with Intel Xeon Haswell CPUs E5-2676 v3 (Amazon AWS t2.micro instance or Google GCP N1 instance)
Does not work under any circumstances with Cascade Lake CPUs, for instance Xeon(R) Platinum 8252C CPU (Amazon AWS m5zn.large or Google GCP N2 / C2 with Cascade Lake)
Works (somewhat) with Intel Xeon Ice Lake CPUs, for instance Xeon(R) Platinum 8375C (Amazon AWS m6i.large or Google GCP N2 with Ice Lake)
On Ice Lake it only works if you specify the numpy dependency as numpy==1.20.3; if you let scipy download 1.22.2, then it will hang.
On Cascade Lake it hangs regardless of the versions, and on Haswell it works regardless of the versions.
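Since the behaviour depends on the CPU generation, it helps to record the exact CPU model next to each test result. The following is a minimal sketch, assuming a Linux x86 host where `/proc/cpuinfo` exposes a `model name` field; the helper names are hypothetical and not part of the original report.

```python
from pathlib import Path


def cpu_model_names(cpuinfo_text):
    """Return the distinct 'model name' values found in /proc/cpuinfo-style text."""
    names = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical core repeats the same field, so keep only new values.
        if sep and key.strip() == "model name":
            name = value.strip()
            if name not in names:
                names.append(name)
    return names


def read_cpu_model_names(path="/proc/cpuinfo"):
    """Read the CPU model names of this machine; empty list if the file is missing."""
    p = Path(path)
    return cpu_model_names(p.read_text()) if p.exists() else []
```

Calling `read_cpu_model_names()` on each VM before running the reproduction gives a string that can be matched against the Haswell, Cascade Lake, and Ice Lake cases listed above.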
Here's a sample Dockerfile if you want to deploy it as a Docker container.
Reproduces with every version of scipy since 1.4.0
Sample Dockerfile (our original Dockerfile actually compiles Python from source, but I used this one as an example to keep it very simple; it reproduces this way too)
To reproduce on Ice Lake/Cascade Lake cpus
requirements.txt (pretty much any scipy version since 1.4.0)
To reproduce working on Ice Lake but hanging on Cascade Lake
I included the Dockerfile for convenience; the issue also reproduces if I execute the Python process directly in the VM without Docker.
Reproducing Code Example
main.py
Error message
There is no error. The process just hangs, and the shell that started it becomes completely unusable. There appears to be some sort of access violation, and the whole process is corrupt.
Ice Lake example (It hangs, and no amount of Ctrl+C will get you out of this one; that shell session is dead)
Haswell example (It completes successfully)
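Because a hanging run takes the interactive shell with it, one option when testing a new machine is to launch the script as a child process with a timeout. The sketch below uses only the standard library; `run_with_timeout` is a hypothetical helper, and whether the child can actually be killed on an affected CPU is an assumption that the report does not confirm.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd detached from the terminal.

    Returns ("finished", returncode) if the command exits in time,
    or ("timed_out", None) if it had to be killed after timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,  # POSIX: child gets its own session, away from the shell's terminal
            timeout=timeout_s,       # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timed_out", None)
    return ("finished", proc.returncode)


# Example call (assumes main.py is in the current directory):
# status, code = run_with_timeout([sys.executable, "main.py"], timeout_s=120.0)
```

A `("timed_out", None)` result would correspond to the hang seen on Ice Lake and Cascade Lake, and `("finished", 0)` to the successful Haswell run.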
SciPy/NumPy/Python version information
1.7.3 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
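The hang depends on which NumPy version gets installed, so it is worth capturing the versions on every machine tested. The following is a minimal sketch that reads them from package metadata with the standard library, without importing SciPy or NumPy; `installed_versions` is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example call:
# print(installed_versions(["scipy", "numpy"]), sys.version_info)
```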