Storing and reading inference data with arviz not working #44

@shakasaki

Hi there, I've been running mcbackend with the ClickHouse backend and noticed that the inference data can only be stored with cloudpickle, not directly with arviz (as netCDF, for example). Below is a minimal reproducible example that uses pymc. I load the trace and convert it to inference data, but when I try to save it directly (either with idata.to_netcdf() or with az.to_netcdf()) I get the following error:
ValueError: unsupported dtype for netCDF4 variable: bool

It is true that one of the variables in the inference data's xarray dataset is boolean; see the output below the code.

import clickhouse_driver
import mcbackend
import pymc as pm
import arviz as az
import numpy as np
import cloudpickle

# Initialize random number generator
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0] * X1 + beta[1] * X2 + np.random.randn(size) * sigma

ch_client = clickhouse_driver.Client("localhost")
backend = mcbackend.ClickHouseBackend(ch_client)

with pm.Model():
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = alpha + beta[0] * X1 + beta[1] * X2
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
    trace = mcbackend.pymc.TraceBackend(backend)
    pm.sample(trace=trace)

ch_client = clickhouse_driver.Client("localhost")
backend = mcbackend.ClickHouseBackend(ch_client)
run = backend.get_run(trace.run_id)
idata = run.to_inferencedata()

# save with cloudpickle
with open('clickhouse_backend_idata_as_pkl.pkl', mode='wb') as file:
    cloudpickle.dump(idata, file)

with open('clickhouse_backend_idata_as_pkl.pkl', mode='rb') as file:
    instance = cloudpickle.load(file)

print(instance)

# test saving directly
idata.to_netcdf('clickhouse_backend_idata_as_netcdf')
# test saving with arviz
az.to_netcdf(idata, 'clickhouse_backend_idata_as_netcdf_w_az')

# The last two approaches raise: ValueError: unsupported dtype for netCDF4 variable: bool

Output of idata.sample_stats

<xarray.Dataset>
Dimensions:                         (chain: 4, draw: 1000)
Coordinates:
  * chain                           (chain) int64 0 1 2 3
  * draw                            (draw) int64 0 1 2 3 4 ... 996 997 998 999
Data variables: (12/18)
    tune                            (chain, draw) bool False False ... False
    sampler_0__depth                (chain, draw) object 2 2 2 2 2 ... 2 2 2 2 2
    sampler_0__step_size            (chain, draw) object 0.9996654167024928 ....
    sampler_0__tune                 (chain, draw) object False False ... False
    sampler_0__mean_tree_accept     (chain, draw) object 0.8445087419794938 ....
    sampler_0__step_size_bar        (chain, draw) object 0.9975006666986591 ....
    ...                              ...
    sampler_0__process_time_diff    (chain, draw) object 0.000682679999999935...
    sampler_0__perf_counter_diff    (chain, draw) object 0.000682451999978184...
    sampler_0__perf_counter_start   (chain, draw) object 888.214387209 ... 88...
    sampler_0__largest_eigval       (chain, draw) object nan nan nan ... nan nan
    sampler_0__smallest_eigval      (chain, draw) object nan nan nan ... nan nan
    sampler_0__index_in_trajectory  (chain, draw) object 2 -1 2 3 ... -1 1 -2 2
Attributes:
    created_at:     2022-08-09T16:06:07.465421
    arviz_version:  0.12.1
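
The output above shows that, besides the bool `tune` variable, most sampler stats come back with `object` dtype. A possible workaround, sketched below, is to cast every variable in `sample_stats` to a concrete dtype before saving. This is only a sketch and has not been tested against this setup: the helper names `to_netcdf_friendly` and `cast_dataset` are hypothetical (not part of mcbackend or arviz), and it is an assumption that the bool/object dtypes are what triggers the error.

```python
import numpy as np


def to_netcdf_friendly(values):
    """Cast a bool or object array to a concrete numeric (or string) dtype.

    Hypothetical helper, not part of mcbackend or arviz.
    """
    arr = np.asarray(values)
    if arr.dtype == bool:
        # Store booleans as 0/1 integers.
        return arr.astype("int8")
    if arr.dtype != object:
        # Already a concrete dtype; leave it untouched.
        return arr
    flat = arr.ravel()
    # bool is a subclass of int, so check for booleans first.
    if all(isinstance(v, (bool, np.bool_)) for v in flat):
        return arr.astype("int8")
    if all(isinstance(v, (int, np.integer)) for v in flat):
        return arr.astype("int64")
    try:
        return arr.astype("float64")
    except (TypeError, ValueError):
        # Fall back to strings for anything that is not numeric.
        return arr.astype(str)


def cast_dataset(ds):
    """Return a copy of an xarray.Dataset with every data variable cast.

    Hypothetical helper; variable attributes are not preserved here.
    """
    return ds.assign(
        {
            name: (ds[name].dims, to_netcdf_friendly(ds[name].values))
            for name in ds.data_vars
        }
    )


# Possible usage with the idata from the example above (untested assumption):
# idata.sample_stats = cast_dataset(idata.sample_stats)
# idata.to_netcdf("clickhouse_backend_idata_as_netcdf")
```

Note that booleans are stored as 0/1 integers in this sketch, so they would need to be converted back after loading if a bool dtype is required.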
