BUG: Bambi logistic regression fails on M1 silicon #6415
Does a super simple model also produce that error?

No, this works fine. The Bambi t-test example also samples without problems.
@tarmojuristo could you try with this example?

```python
import numpy as np
import pymc as pm

size = 100
rng = np.random.default_rng(1234)
x = rng.normal(size=size)
g = rng.choice(list("ABC"), size=size)
y = rng.integers(0, 1, size=size, endpoint=True)

g_levels, g_idxs = np.unique(g, return_inverse=True)
coords = {"group": g_levels}

with pm.Model(coords=coords) as model:
    coef_x = pm.Normal("x")
    coef_g = pm.Normal("g", dims="group")
    p = pm.math.softmax(coef_x + coef_g[g_idxs])
    pm.Bernoulli("y", p=p, observed=y)
    idata = pm.sample()
```
All good:

```
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
```
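As a side note, the `np.unique(..., return_inverse=True)` idiom in the snippet above is what turns the string group labels into the integer indices used to index the `dims`-shaped coefficient vector. A minimal NumPy demonstration (the toy array is made up for illustration):

```python
import numpy as np

g = np.array(["B", "A", "C", "A"])
levels, idxs = np.unique(g, return_inverse=True)
# levels is sorted and unique; levels[idxs] reconstructs the input,
# so idxs can index a coefficient vector declared with dims="group".
print(levels, idxs)
```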
If you agree, I can give you a couple of examples you can try until we find the broken part. What if you do

```python
import bambi as bmb
import numpy as np
import pymc as pm

data = bmb.load_data("adults")
age_mean = np.mean(data["age"])
age_std = np.std(data["age"])
data["age"] = (data["age"] - age_mean) / age_std

model = bmb.Model("income['>50K'] ~ sex + age", data, family="bernoulli")
model.build()
with model.backend.model:
    idata = pm.sample()
```
This one fails with the same error.
Ok! Later today I'll give you another chunk of PyMC code to test. I'll try to reproduce the PyMC model created by Bambi more closely.
Make sure to avoid model code at the top indent level. This can lead to recursion problems with multiprocessing, depending on the OS-dependent fork/spawn setting. Just wrap everything in a simple function:

```python
def run():
    # do stuff
    ...

if __name__ == "__main__":
    run()
```
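The guard matters because, under the "spawn" start method, every worker process re-imports the main module and would re-execute any unguarded top-level code. A minimal stdlib sketch of the pattern (the `square` helper and the numbers are made up for illustration):

```python
import multiprocessing as mp

def square(x):
    return x * x

def run():
    # Keeping the Pool creation inside a function guarded by __main__
    # prevents re-imported workers from spawning pools of their own.
    with mp.Pool(2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    print(run())
```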
Thanks, this is a useful tip, and it definitely avoids some trouble on my Intel-based MacBook. But on M1 the above example still fails, even when wrapped in a function.
+1, I get this error as well when using a Jupyter notebook.
```python
import bambi as bmb
import pandas as pd
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from formulae import design_matrices

data = bmb.load_data("adults")
age_mean = np.mean(data["age"])
age_std = np.std(data["age"])
data["age"] = (data["age"] - age_mean) / age_std

def logit(x):
    """Inverse logit (sigmoid) clamped so the result stays strictly in (0, 1)."""
    eps = np.finfo(float).eps
    result = pt.sigmoid(x)
    result = pt.switch(pt.eq(result, 0), eps, result)
    result = pt.switch(pt.eq(result, 1), 1 - eps, result)
    return result

dm = design_matrices("income['>50K'] ~ sex + age", data)
y = np.array(dm.response)
X = np.array(dm.common)

coords = {"sex_dim": ["Male"]}
with pm.Model(coords=coords) as pm_model:
    eta = 0
    intercept = pm.Normal("intercept", mu=0, sigma=4.34)
    sex = pt.atleast_1d(pm.Normal("sex", mu=0, sigma=5.31, dims="sex_dim"))
    age = pt.atleast_1d(pm.Normal("age", mu=0, sigma=2.5, shape=None))
    # Concatenate parameters
    coefs = pt.concatenate([sex, age])
    # Create linear predictor
    eta += intercept
    eta += pt.dot(X[:, 1:], coefs)  # drop first column because that's for the intercept
    # Apply the inverse of the link function
    p = logit(eta)
    pm.Bernoulli("income", p=p, observed=y)

pm.model_to_graphviz(pm_model)
```

This resembles much more closely what's happening under the hood. Could you run it and let me know whether it works or fails?
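As an aside, the clamping inside that link helper exists so the Bernoulli log-likelihood never evaluates log(0) when the sigmoid saturates. The same idea in plain NumPy (the name `clamped_sigmoid` is mine, not Bambi's, and I use the tanh form to avoid overflow):

```python
import numpy as np

def clamped_sigmoid(x):
    # sigmoid(x) == 0.5 * (1 + tanh(x / 2)); tanh never overflows, unlike
    # exp(-x) for large negative x. Clip keeps the result strictly in (0, 1).
    eps = np.finfo(float).eps
    p = 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))
    return np.clip(p, eps, 1.0 - eps)
```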
When I sample this one, I once again get the EOFError.
Now I think we can try "removing" parts that may be conflicting. I would start with the `dims`.

I am not quite sure how I should remove the dims here, though.
Try

```python
sex = pm.Normal("sex", mu=0, sigma=5.31)
age = pm.Normal("age", mu=0, sigma=2.5)
coefs = pt.stack([sex, age])
```

and

```python
sex = pt.atleast_1d(pm.Normal("sex", mu=0, sigma=5.31))
age = pt.atleast_1d(pm.Normal("age", mu=0, sigma=2.5))
coefs = pt.concatenate([sex, age])
```
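For what it's worth, the two variants should yield the same coefficient vector: `stack` joins scalars along a new axis, while `concatenate` joins already 1-d tensors. NumPy follows the same semantics, so the equivalence is easy to check (toy values are made up for illustration):

```python
import numpy as np

sex = np.float64(0.3)    # scalar draw, like pm.Normal with no dims/shape
age = np.float64(-1.2)

stacked = np.stack([sex, age])                                  # scalars -> shape (2,)
concatenated = np.concatenate([np.atleast_1d(sex), np.atleast_1d(age)])
print(stacked, concatenated)
```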
(Potentially related / helpful) I was getting an identical EOFError. More info here: https://discourse.pymc.io/t/dataset-size-dependent-eoferror/11977
@tarmojuristo is this still a problem?
This is still happening on M1 chipsets for me. As far as I can tell it's an out-of-memory issue when using multiprocessing to sample from multiple chains. Sampling on a single core doesn't trigger the issue (no multiprocessing), and reducing the size of the dataset often resolves it (presumably because memory then suffices). It's odd, though: these datasets are not large enough to expect memory limitations, and sampling the same models on an Intel chipset with four cores and less memory works just fine.
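One way this hypothesis would explain the dataset-size dependence: the parallel sampler serializes the model, including its observed data, to ship it to each worker process, and a truncated or partially written payload surfaces on the reading end as an `EOFError`. A small sketch (illustrative only, not PyMC's actual transport code) showing how the pickled payload grows with the data:

```python
import pickle
import numpy as np

# Serialize a small and a large observed-data array, as a stand-in for
# the per-worker payload; the bytes to transfer scale with dataset size.
rng = np.random.default_rng(0)
small = pickle.dumps(rng.normal(size=1_000))
large = pickle.dumps(rng.normal(size=1_000_000))
print(len(small), len(large))
```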
@jvparidon does it happen to you in the following cases too?

```python
if __name__ == "__main__":
    with model:
        idata = pm.sample()
```
Neither of those fixes the problem. What does work is passing `chains=1`.
I'm sorry but I'm not familiar with that either. Do you think we could write that model in PyMC and, if the issue persists, we open an issue in the PyMC repository?
I'll try to create a minimal working example in PyMC this weekend and do a little poking around to see if switching to a different multiprocessing start method helps.
@jvparidon if you have the bambi model and some data that can be used, I could do it for you :)
Unfortunately I can't share either of those, but I'm sure I can create a synthetic dataset and model when I have a minute.
Just realized we're in the PyMC repo 😅
Yeah, as it turns out it is a PyMC/multiprocessing issue rather than a Bambi issue, so that's convenient. EDIT: After some testing, the above seems to be incorrect. It is an issue with the default multiprocessing context.
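For anyone landing here: Python exposes three start methods ("fork", "spawn", "forkserver"), and macOS has defaulted to "spawn" since Python 3.8. If I remember correctly, `pm.sample` accepts an `mp_ctx` argument for choosing one, but treat that as an assumption and check the docs for your PyMC version. The stdlib side looks like this:

```python
import multiprocessing as mp

# Start methods supported on this platform; "fork" is the historical
# Unix default, "spawn" the macOS (and Windows) default, "forkserver"
# a middle ground that avoids forking a multithreaded parent.
available = mp.get_all_start_methods()
print(available)

# Requesting an explicit context leaves the global default untouched.
ctx = mp.get_context("spawn")
```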
Describe the issue:

Running a logistic regression in Bambi with default settings fails. However, if `chains=1` is set in `model.fit()`, then everything works fine.

Reproducible code example:

Error message:

PyMC version information:

PyMC 5.0.1
PyTensor 2.8.11
Python 3.10
macOS Ventura 13.0.1 on M1
mambaforge

Context for the issue:

No response