BUG: Dot results in EOFError on M1 chipsets #458
Might need to play with the BLAS flags: https://discourse.pymc.io/t/frequently-asked-questions/74/19?u=ricardov94. At this point, it's pretty hard to know if there's anything that can/should be done on the PyTensor side.
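One way to experiment with those flags is the `PYTENSOR_FLAGS` environment variable. A hedged sketch, not a recommendation: the exact linker flags depend on which BLAS is actually installed, and `model.py` is a placeholder script name.

```shell
# Assumption: pointing PyTensor's C backend at Apple's Accelerate framework.
# The blas__ldflags value must match the BLAS actually present on the system.
PYTENSOR_FLAGS='blas__ldflags=-framework Accelerate' python model.py
```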
I can sample the 2nd model on my M1 laptop (MacBook Pro 14-inch 2021, 16GB RAM) without error. I also added an extra zero to the size of the data and could still sample. Can you confirm that the users who raised the issue are using Accelerate BLAS for Apple silicon?
@jessegrabowski I'm using a 2021 MBP with an M1 Pro chipset and 32GB RAM, but I had NumPy built against OpenBLAS instead of Accelerate. I just retried with NumPy built against Accelerate, and I still get the same error. As reported in the issue on the Bambi repo, passing `mp_ctx="forkserver"` works around it.

Also, since Tomás didn't have version details: this has been happening on Python 3.10.9 as well as 3.11.5, with the following package versions:

pytensor==2.16.3
pymc==5.8.2
numpy==1.25.2
Can you run the PyTensor BLAS checker script? I ask because the BLAS that NumPy uses isn't the important one here; it will be the one that PyTensor uses when it compiles your graph to C code.
When I run the BLAS checker, that line reads:
You might be right; I'm not 100% sure how the library checking works (and there's even a PR underway now to change it). I guess my suspicion that it was BLAS was wrong. Does it sample when you set `cores=1`?
Yeah, sampling on one core works because then there's no multiprocessing (only whatever multithreading BLAS does).
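To separate BLAS multithreading from the multiprocessing side, one hedged experiment is to pin the BLAS thread pools to a single thread via environment variables before Python starts (the variable names vary by BLAS implementation, and `model.py` is a placeholder):

```shell
# Assumption: OpenBLAS or Accelerate (vecLib) is the BLAS in use.
# Setting these before Python starts disables BLAS-internal threading,
# so any remaining failure must come from multiprocessing itself.
OPENBLAS_NUM_THREADS=1 VECLIB_MAXIMUM_THREADS=1 python model.py
```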
I was hoping that would throw some other error that the multiprocessing error was hiding. Since I can't reproduce it on my end, I guess I'm stumped, sorry. I hope someone else with a Mac can test your code -- @twiecki?
What version of macOS are you on? (I'm on 13.5.2)

13.1 (22C65)
As I understand it, forking processes is known to be kind of error-prone, so I wonder if any new issues were introduced between 13.1 and 13.5. Let me know if anything breaks when you eventually upgrade 😁
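The fork-safety issue being discussed can be illustrated with the standard library alone: the `multiprocessing` start method determines whether workers inherit the parent process's threads and library state (as with `fork`) or start from a clean process (as with `spawn` and `forkserver`). A minimal sketch, independent of PyMC/PyTensor:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # "fork" copies the parent process wholesale, which can misbehave if the
    # parent holds locks or threaded-library state (e.g. a BLAS thread pool).
    # "forkserver" instead forks workers from a separate, clean server process.
    ctx = mp.get_context("forkserver")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))
```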
You might want to try with nutpie/numba.
Using nutpie or numpyro as the sampler backend works just fine, as does sampling on just one core or setting the multiprocessing context to `forkserver`.
Describe the issue:

Matrix multiplication can result in `EOFError` on M1 chipsets. The following example allows one to see the issue.

Reproducible code example:

Error message:

PyTensor version information:

Context for the issue:

Creating a design matrix with `formulae.design_matrices()` isn't the thing that breaks under `mp_ctx='fork'`, but computing the dot product using `pm.math.dot()` does. Computing the dot product using element-wise multiplication and summation works just fine, presumably because that compiles down differently than `pm.math.dot()`.

I can still get the model to sample by reducing the size of the dataset, so it's not that `pm.math.dot()` never runs on M1 chipsets at all, but it does seem like it's not thread/fork-safe and should be used with the `forkserver` multiprocessing context instead.

Edited to add: The dot product that `pm.math.dot()` has to compute here is dense, so I think it ends up calling `pytensor.tensor.math.dense_dot()`, which I think calls an instance of `pytensor.tensor.math.Dot`. I'm not sure what it is in there that isn't thread/fork-safe; maybe memory allocation for calls to BLAS, or something like that?

Originally posted by @jvparidon in bambinos/bambi#700 (comment)
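The element-wise workaround described above can be checked numerically with plain NumPy. The shapes and names here are hypothetical stand-ins (`X` for the design matrix, `beta` for the coefficient vector), not the reporter's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # stand-in design matrix
beta = rng.normal(size=3)     # stand-in coefficient vector

# Dense dot product, which dispatches to a BLAS routine
mu_dot = X @ beta

# Element-wise multiply-and-sum, which compiles down to a different
# (non-BLAS-dot) computation but yields the same values
mu_sum = (X * beta).sum(axis=-1)

assert np.allclose(mu_dot, mu_sum)
```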