Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Bambi logistic regression fails on M1 silicon #6415

Open
tarmojuristo opened this issue Dec 27, 2022 · 26 comments
Open

BUG: Bambi logistic regression fails on M1 silicon #6415

tarmojuristo opened this issue Dec 27, 2022 · 26 comments
Labels
bug macOS macOS related

Comments

@tarmojuristo
Copy link

Describe the issue:

Running a logistic regression in Bambi with default settings fails. However, if chains=1 is set in model.fit() then everything works fine.

Reproduceable code example:

import bambi as bmb
import numpy as np

data = bmb.load_data("adults")

age_mean = np.mean(data["age"])
age_std = np.std(data["age"])

data["age"] = (data["age"] - age_mean) / age_std

model = bmb.Model("income['>50K'] ~ sex + age", data, family="bernoulli")
fitted = model.fit(draws=1000)

Error message:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, sex, age]
Traceback (most recent call last):000 00:00<? Sampling 4 chains, 0 divergences]
  File "/Users/tarmo/Desktop/SALK/streamlit-model/example.py", line 12, in <module>
    fitted = model.fit(draws=1000)
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/bambi/models.py", line 271, in fit
    return self.backend.run(
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/bambi/backend/pymc.py", line 98, in run
    result = self._run_mcmc(
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/bambi/backend/pymc.py", line 289, in _run_mcmc
    idata = pm.sample(
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 529, in sample
    mtrace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 1019, in _mp_sample
    for draw in sampler:
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 449, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 320, in recv_draw
    msg = ready[0].recv()
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/Users/tarmo/mambaforge/envs/pymc/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

PyMC version information:

PyMC 5.0.1
PyTensor 2.8.11
Python 3.10
MacOS Ventura 13.0.1 on M1
mambaforge

Context for the issue:

No response

@twiecki
Copy link
Member

twiecki commented Dec 27, 2022

Does a super simple model

with pm.Model():
    x = pm.Normal("x")
    pm.sample()

also produce that error?

@tarmojuristo
Copy link
Author

no, this works fine. also the bambi t-test example samples without problems.

@tomicapretto
Copy link
Contributor

@tarmojuristo could you try with this example?

import numpy as np
import pymc as pm

size = 100
rng = np.random.default_rng(1234)
x = rng.normal(size=size)
g = rng.choice(list("ABC"), size=size)
y = rng.integers(0, 1, size=size, endpoint=True)

g_levels, g_idxs = np.unique(g, return_inverse=True)
coords = {"group": g_levels}

with pm.Model(coords=coords) as model:
    coef_x = pm.Normal("x")
    coef_g = pm.Normal("g", dims="group")
    p = pm.math.softmax(coef_x + coef_g[g_idxs])
    pm.Bernoulli("y", p=p, observed=y)
    idata = pm.sample()

@tarmojuristo
Copy link
Author

all good: Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

@tomicapretto
Copy link
Contributor

If you agree, I can give you a couple of examples you can try until we find the broken part.

What if you do

import bambi as bmb
import numpy as np
import pymc as pm

data = bmb.load_data("adults")

age_mean = np.mean(data["age"])
age_std = np.std(data["age"])

data["age"] = (data["age"] - age_mean) / age_std

model = bmb.Model("income['>50K'] ~ sex + age", data, family="bernoulli")
model.build()
with model.backend.model:
    idata = pm.sample()

@tarmojuristo
Copy link
Author

this one fails with the same error

@tomicapretto
Copy link
Contributor

Ok! Later today I'll give another chunk of PyMC code to test. I'll try to reproduce the PyMC model created by Bambi more closely

@michaelosthege michaelosthege added the macOS macOS related label Jan 9, 2023
@michaelosthege
Copy link
Member

Make sure to avoid model code on the 0th indent level. This can lead to recursion problems with multiprocessing depending on the OS-dependent fork/spawn setting.

Just wrap everything in a simple run() function:

def run():
    # do stuff


if __name__ == "__main__":
    run()

@tarmojuristo
Copy link
Author

Make sure to avoid model code on the 0th indent level. This can lead to recursion problems with multiprocessing depending on the OS-dependent fork/spawn setting.

Just wrap everything in a simple run() function:

def run():
    # do stuff


if __name__ == "__main__":
    run()

Thanks, this is a useful tip - definitely avoids some trouble on my intel-based macbook - but on M1 the above example still fails, even when wrapped into a function.

@ivanmkc
Copy link

ivanmkc commented Jan 31, 2023

+1, I get this error as well when using a Jupyter notebook.

@tomicapretto
Copy link
Contributor

import bambi as bmb
import pandas as pd
import numpy as np
import pymc as pm
import pytensor.tensor as pt

from formulae import design_matrices

data = bmb.load_data("adults")

age_mean = np.mean(data["age"])
age_std = np.std(data["age"])

data["age"] = (data["age"] - age_mean) / age_std

def logit(x):
    """Logit function that ensures result is in (0, 1)"""
    eps = np.finfo(float).eps
    result = pt.sigmoid(x)
    result = pt.switch(pt.eq(result, 0), eps, result)
    result = pt.switch(pt.eq(result, 1), 1 - eps, result)
    return result

dm = design_matrices("income['>50K'] ~ sex + age", data)
y = np.array(dm.response)
X = np.array(dm.common)

coords = {"sex_dim": ["Male"]}
with pm.Model(coords=coords) as pm_model:
    eta = 0
    intercept = pm.Normal("intercept", mu=0, sigma=4.34)
    sex = pt.atleast_1d(pm.Normal("sex", mu=0, sigma=5.31, dims="sex_dim"))
    age = pt.atleast_1d(pm.Normal("age", mu=0, sigma=2.5, shape=None))
    
    # Concatenate parameters
    coefs = pt.concatenate([sex, age])
    
    # Create linear predictor
    eta += intercept 
    eta += pt.dot(X[:, 1:], coefs) # drop first column because that's for the intercept

    # Use inverse of link function
    p = logit(eta)

    pm.Bernoulli("income", p=p, observed=y)

This resembles much more closely what's happening under the hood. Could you run it and let me know if it works or fails?

pm.model_to_graphviz(pm_model)

image

@tarmojuristo
Copy link
Author

tarmojuristo commented Feb 1, 2023

When I sample this one, I once again get the EOFError.

@tomicapretto
Copy link
Contributor

Now I think we can try "removing" parts that may be conflicting. I would start with the dims for sex or the replacing the custom logit function with pt.sigmoid

@tarmojuristo
Copy link
Author

pt.sigmoid(eta) gives the same EOF error.

I am not quite sure how should I remove dims here though.

@tomicapretto
Copy link
Contributor

Try

    sex = pm.Normal("sex", mu=0, sigma=5.31)
    age = pm.Normal("age", mu=0, sigma=2.5)
    coefs = pt.stack([sex, age])

and

    sex = pt.atleast_1d(pm.Normal("sex", mu=0, sigma=5.31))
    age = pt.atleast_1d(pm.Normal("age", mu=0, sigma=2.5))
    coefs = pt.concatenate([sex, age])

@davipatti
Copy link

davipatti commented Apr 25, 2023

(Potentially related / helpful)

I was getting an identical EOFError whilst using pymc 5.1.1 directly to implement a survival model. For testing I had generated a dataset with 10,000 observations. Reducing the number of observations to 1,000 resolved the EOFError and allowed me to sample from the model.

More info here: https://discourse.pymc.io/t/dataset-size-dependent-eoferror/11977

@tomicapretto
Copy link
Contributor

@tarmojuristo is this still a problem?

@jvparidon
Copy link

This is still happening on M1 chipsets for me. As far as I can tell it's an out-of-memory issue when using multiprocessing to sample from multiple chains. Sampling using only one core doesn't trigger the issue (because no multiprocessing) and reducing the size of the dataset often solves it (because there is sufficient memory then). It's weird though, these datasets are not large enough to expect memory limitations (sampling these same models on an Intel chipset with four cores and less memory works just fine).

@tomicapretto
Copy link
Contributor

@jvparidon does it happen to you in the following cases too?

  1. Using a Python script, put the sampling statement inside the following condition
if __name__ == "__main__":
    with model:
        idata = pm.sample()
  1. Using a Jupyter Notebook?

@jvparidon
Copy link

Neither of those fixes the problem. What does work is passing mp_ctx='forkserver' as an argument to the .fit(). That changes the multiprocessing context for pm.sample() from fork to forkserver. Unfortunately I'm not familiar enough with the multiprocessing internals to know why that fixes things or if there are any risks to changing the multiprocessing context.

@tomicapretto
Copy link
Contributor

I'm sorry but I'm not familiar with that either. Do you think we could write that model in PyMC and, if the issue persists, we open an issue in the PyMC repository?

@jvparidon
Copy link

I'll try to create a minimal working example in PyMC this weekend and do a little poking around to see if switching to forkserver for the multiprocessing context breaks anything obvious. I'll make sure to loop you in if I open an issue or PR.

@tomicapretto
Copy link
Contributor

@jvparidon if you have the bambi model and some data that can be used, I could do it for you :)

@jvparidon
Copy link

Unfortunately I can't share either of those, but I'm sure I can create a synthetic dataset and model when I have a minute.

@tomicapretto
Copy link
Contributor

I'm sorry but I'm not familiar with that either. Do you think we could write that model in PyMC and, if the issue persists, we open an issue in the PyMC repository?

Just realize we're in the PyMC repo 😅

@jvparidon
Copy link

jvparidon commented Sep 22, 2023

Yeah, as it turns out it is a PyMC/multiprocessing issue rather than a Bambi issue so that's convenient.

EDIT: After some testing, the above seems to be incorrect. It is an issue with the default multiprocessing context in pm.sample but as far as I can tell it only crashes Bambi models, not PyMC models. See bambinos/bambi#700 for a reprex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug macOS macOS related
Projects
None yet
Development

No branches or pull requests

7 participants