## PyMC Issue

In [1]:
%pip install -q --upgrade numpy bambi pymc

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Alternative: pinned versions for which the code works
# %pip install --quiet --upgrade numpy==2.3.5 bambi==0.16.0 pymc==5.26.1

In [3]:
# Load Python modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
import pymc as pm
import pytensor
import bambi as bmb

print(f"Running Bambi v{bmb.__version__}")
print(f"Running on PyMC v{pm.__version__}")
print(f"Running on PyTensor v{pytensor.__version__}")
print(f"Running NumPy v{np.__version__}")

Running Bambi v0.16.0
Running on PyMC v5.26.1
Running on PyTensor v2.35.1
Running NumPy v2.3.5


In [5]:
import sys
sys.platform, sys.version

('darwin',
 '3.12.4 (v3.12.4:8e8a4baf65, Jun  6 2024, 17:33:18) [Clang 13.0.0 (clang-1300.0.29.30)]')

## Basic check of `axpy` 

In [6]:
import numpy as np
from scipy import linalg

dtype = "float64"  # I assume
axpy = linalg.blas.get_blas_funcs("axpy", dtype=dtype)
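# Note: the third positional argument of axpy is n (the element count), not the
# scalar a; pass a=-0.09 as a keyword to actually compute a*x + y.
axpy(np.ones(5), np.ones(5), -0.09)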

array([1., 1., 1., 1., 1.])

In [7]:
# import numpy as np
# from scipy import linalg

# dtype = "float64"  # I assume
# axpy = linalg.blas.get_blas_funcs("axpy", dtype=dtype)
# axpy(np.ones(921), np.ones(3), a=-0.09)
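
For reference, BLAS `axpy` computes `a*x + y` over two vectors of the same length. The following is a minimal NumPy sketch of that operation, not the SciPy wrapper itself; the helper name `axpy_reference` is hypothetical, and the explicit length check is an assumption standing in for the wrapper's own argument validation.

```python
import numpy as np


def axpy_reference(x, y, a=1.0):
    """Return a * x + y for two equal-length 1-D arrays."""
    x = np.asarray(x, dtype="float64")
    y = np.asarray(y, dtype="float64")
    if x.shape != y.shape:
        # Stand-in for the wrapper's length validation (assumption).
        raise ValueError(
            f"length mismatch: x has {x.size} elements, y has {y.size}"
        )
    return a * x + y


# Same inputs as cell In [6], with the scalar passed as a keyword.
result = axpy_reference(np.ones(5), np.ones(5), a=-0.09)
print(result)

# Same lengths as the commented-out call in cell In [7].
try:
    axpy_reference(np.ones(921), np.ones(3), a=-0.09)
except ValueError as exc:
    print("rejected:", exc)
```

With the scalar passed as `a`, every element of the result differs from the input `y`, unlike the all-ones output shown in cell In [6].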

## Example: complete pooling model on Radon dataset

A single linear regression model shared by all counties.

In [8]:
radon = pd.read_csv("https://raw.githubusercontent.com/minireference/noBSstats/refs/heads/main/datasets/radon.csv")
radon.shape

(919, 6)

In [9]:
radon.head()

   idnum state  county     floor  log_radon  log_uranium
0   5081    MN  AITKIN    ground   0.788457    -0.689048
1   5082    MN  AITKIN  basement   0.788457    -0.689048
2   5083    MN  AITKIN  basement   1.064711    -0.689048
3   5084    MN  AITKIN  basement   0.000000    -0.689048
4   5085    MN   ANOKA  basement   1.131402    -0.847313


### Bayesian model


We can pool all the data and estimate one big regression to assess the influence of the floor variable
on radon levels across all counties.

\begin{align*}
    R         &\sim \mathcal{N}(M_R, \, \Sigma_R),  \\
    M_R       &=    B_0 + B_f \cdot f,              \\
    \Sigma_R  &\sim \mathcal{T}^+(4, 1),            \\
    B_0       &\sim \mathcal{N}(1, 2),              \\
    B_f       &\sim \mathcal{N}(0, 5).
\end{align*}

The variable $f$ corresponds to the column `floor` in the `radon` data frame,
which will be internally coded as binary
with $0$ representing basement,
and $1$ representing ground floor.
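
As a rough illustration of that coding, here is a hand-rolled pandas sketch using the five rows shown by `radon.head()`. It is not Bambi's internal implementation; the variable names `floor` and `f` are chosen here only for illustration.

```python
import pandas as pd

# The `floor` values of the first five rows of the radon data frame.
floor = pd.Series(
    ["ground", "basement", "basement", "basement", "basement"],
    name="floor",
)

# 0 = basement (reference level), 1 = ground floor.
f = (floor == "ground").astype(int)
print(f.tolist())
```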

By ignoring the county feature, we do not differentiate between counties.
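
To make the generative structure of the complete pooling model concrete, the sketch below draws from the priors with plain NumPy and simulates radon levels for both floor codes. It assumes the second argument of each normal distribution is a standard deviation (as in the Bambi priors used in this notebook) and that the half-Student-t can be drawn as the absolute value of a Student-t draw; it is a prior simulation only, not the sampler that PyMC runs.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 1000

# Prior draws for the three parameters of the complete pooling model.
B0 = rng.normal(loc=1, scale=2, size=n_draws)               # intercept
Bf = rng.normal(loc=0, scale=5, size=n_draws)               # floor effect
Sigma = np.abs(rng.standard_t(df=4, size=n_draws)) * 1.0    # half-Student-t(4, 1)

# Floor codes: 0 = basement, 1 = ground floor.
f = np.array([0, 1])

# Mean radon level for each prior draw (rows) and floor code (columns).
M = B0[:, None] + Bf[:, None] * f[None, :]

# One simulated log-radon value per prior draw and floor code.
R = rng.normal(loc=M, scale=Sigma[:, None])
print(R.shape)
```

The first column of `M` equals `B0` because the floor term vanishes for basement measurements.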

### Bambi model


In [10]:
import bambi as bmb

priors1 = {
    "Intercept": bmb.Prior("Normal", mu=1, sigma=2),
    "floor": bmb.Prior("Normal", mu=0, sigma=5),
    "sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}

mod1 = bmb.Model(formula="log_radon ~ 1 + floor",
                 family="gaussian",
                 link="identity",
                 priors=priors1,
                 data=radon)
mod1

       Formula: log_radon ~ 1 + floor
        Family: gaussian
          Link: mu = identity
  Observations: 919
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 1.0, sigma: 2.0)
            floor ~ Normal(mu: 0.0, sigma: 5.0)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)

### Model fitting and analysis

In [11]:
idata1 = mod1.fit(random_seed=42, chains=1)

Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [sigma, Intercept, floor]


Output()

error: (len(y)-offy>(n-1)*abs(incy)) failed for 1st keyword n: daxpy:n=921
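
The error message names the check that failed inside the `daxpy` wrapper: `len(y)-offy > (n-1)*abs(incy)` with `n=921`. The pure-Python sketch below mirrors that expression to show when it fails. The function names are hypothetical, and the default for `n` (derived from the length of `x`) is an assumption about the wrapper, not something shown in this notebook; the lengths 921 and 3 are taken from the commented-out call in cell In [7].

```python
def axpy_default_n(len_x, offx=0, incx=1):
    """Element count assumed to be used when n is not passed explicitly."""
    return (len_x - offx) // abs(incx)


def y_length_check(len_y, n, offy=0, incy=1):
    """Mirror of the check quoted in the error: len(y)-offy > (n-1)*abs(incy)."""
    return len_y - offy > (n - 1) * abs(incy)


n = axpy_default_n(921)

# y much shorter than x: the check fails, as in the error above.
print(n, y_length_check(3, n))

# y as long as x: the check passes.
print(n, y_length_check(921, n))
```

Under these assumptions, the message is what one would see if an `axpy` call received an `x` of length 921 together with a shorter `y`, even though the model itself has only three free parameters (`sigma`, `Intercept`, `floor`).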

## Example 2: no pooling model

A separate intercept for each county.

### Bayesian model

If we treat the counties as independent,
each one gets its own intercept term:

\begin{align*}
    R_j       &\sim \mathcal{N}(M_j, \, \Sigma_R),  \\
    M_j       &=    B_{0j} + B_f \cdot f,           \\
    \Sigma_R  &\sim \mathcal{T}^+(4, 1),            \\
    B_{0j}    &\sim \mathcal{N}(1, 2),              \\
    B_f       &\sim \mathcal{N}(0, 5).
\end{align*}
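
The sketch below illustrates what "one intercept per county" means in terms of a design matrix, using only the five rows shown by `radon.head()`. It builds the indicator columns by hand with pandas and is not how Bambi constructs its design matrix internally; the names `sample`, `X`, and `n_params` are chosen here for illustration.

```python
import pandas as pd

# County and floor values from the first five rows of the radon data frame.
sample = pd.DataFrame({
    "county": ["AITKIN", "AITKIN", "AITKIN", "AITKIN", "ANOKA"],
    "floor": ["ground", "basement", "basement", "basement", "basement"],
})

# One indicator column per county (no global intercept), plus the floor code.
X = pd.get_dummies(sample["county"], dtype=int)
X["floor"] = (sample["floor"] == "ground").astype(int)
print(X)

# Free parameters: one coefficient per column, plus the noise scale sigma.
n_params = X.shape[1] + 1
print(n_params)
```

On this five-row sample there are two county intercepts, one floor coefficient, and `sigma`. The same count on the full data set grows with the number of distinct counties.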

### Bambi model


In [12]:
priors2 = {
    "county": bmb.Prior("Normal", mu=1, sigma=2),
    "floor": bmb.Prior("Normal", mu=0, sigma=5),
    "sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}

mod2 = bmb.Model("log_radon ~ 0 + county + floor",
                 family="gaussian",
                 link="identity",
                 priors=priors2,
                 data=radon)
mod2

       Formula: log_radon ~ 0 + county + floor
        Family: gaussian
          Link: mu = identity
  Observations: 919
        Priors: 
    target = mu
        Common-level effects
            county ~ Normal(mu: 1.0, sigma: 2.0)
            floor ~ Normal(mu: 0.0, sigma: 5.0)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)

In [13]:
mod2.build()

In [14]:
model = mod2.backend
model.components

{'sigma': <bambi.backend.model_components.ConstantComponent at 0x123bf8fe0>,
 'mu': <bambi.backend.model_components.DistributionalComponent at 0x123bf8bc0>}

### Model fitting and analysis

In [None]:
idata2 = mod2.fit(random_seed=42)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, county, floor]


Output()

IN _build_subtree
self.ndim=87
IN _single_step
left.q_grad.shape=(87,)
in try
right.q_grad.shape=(87,)
in else
right.q_grad.shape=(87,)

(the same sequence of debug lines repeats, interleaved across the 4 chains)

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)

