Passing tuples as coordinate values gives ValueError after sampling #5043

Closed
michaelosthege opened this issue Oct 5, 2021 · 1 comment
@michaelosthege
Member

Description of your problem

This minimal example uses a tuple for coordinate values. Nothing fancy, right?

with pymc3.Model(coords={
    "city": ("Bonn", "Berlin")
}):
    pymc3.Normal("x", dims="city")
    pymc3.sample(return_inferencedata=True)

Kaboom 🧨 after sampling.

ValueError: dimensions ('city',) must have the same length as the number of data dimensions, ndim=0
Traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-132-c3a70b936a71> in <module>
      3 }):
      4     pymc3.Normal("x", dims="city")
----> 5     pymc3.sample(return_inferencedata=True)

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    652         if idata_kwargs:
    653             ikwargs.update(idata_kwargs)
--> 654         idata = pm.to_inference_data(trace, **ikwargs)
    655 
    656     if compute_convergence_checks:

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\backends\arviz.py in to_inference_data(trace, prior, posterior_predictive, log_likelihood, coords, dims, model, save_warmup, density_dist_obs)
    636         return trace
    637 
--> 638     return InferenceDataConverter(
    639         trace=trace,
    640         prior=prior,

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\backends\arviz.py in to_inference_data(self)
    567         """
    568         id_dict = {
--> 569             "posterior": self.posterior_to_xarray(),
    570             "sample_stats": self.sample_stats_to_xarray(),
    571             "log_likelihood": self.log_likelihood_to_xarray(),

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\arviz\data\base.py in wrapped(cls, *args, **kwargs)
     44                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     45                     return None
---> 46             return func(cls, *args, **kwargs)
     47 
     48         return wrapped

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\backends\arviz.py in posterior_to_xarray(self)
    337                 )
    338         return (
--> 339             dict_to_dataset(
    340                 data,
    341                 library=pymc3,

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\backends\arviz.py in dict_to_dataset(data, library, coords, dims, attrs, default_dims, skip_event_dims, index_origin)
    119     """
    120     if default_dims is None:
--> 121         return _dict_to_dataset(
    122             data,
    123             attrs=attrs,

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\arviz\data\base.py in dict_to_dataset(data, attrs, library, coords, dims, skip_event_dims)
    238     data_vars = {}
    239     for key, values in data.items():
--> 240         data_vars[key] = numpy_to_data_array(
    241             values, var_name=key, coords=coords, dims=dims.get(key), skip_event_dims=skip_event_dims
    242         )

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\arviz\data\base.py in numpy_to_data_array(ary, var_name, coords, dims, skip_event_dims)
    196 
    197     # filter coords based on the dims
--> 198     coords = {key: xr.IndexVariable((key,), data=coords[key]) for key in dims}
    199     return xr.DataArray(ary, coords=coords, dims=dims)
    200 

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\arviz\data\base.py in <dictcomp>(.0)
    196 
    197     # filter coords based on the dims
--> 198     coords = {key: xr.IndexVariable((key,), data=coords[key]) for key in dims}
    199     return xr.DataArray(ary, coords=coords, dims=dims)
    200 

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\xarray\core\variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
   2532 
   2533     def __init__(self, dims, data, attrs=None, encoding=None, fastpath=False):
-> 2534         super().__init__(dims, data, attrs, encoding, fastpath)
   2535         if self.ndim != 1:
   2536             raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\xarray\core\variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    313         """
    314         self._data = as_compatible_data(data, fastpath=fastpath)
--> 315         self._dims = self._parse_dimensions(dims)
    316         self._attrs = None
    317         self._encoding = None

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\xarray\core\variable.py in _parse_dimensions(self, dims)
    572         dims = tuple(dims)
    573         if len(dims) != self.ndim:
--> 574             raise ValueError(
    575                 f"dimensions {dims} must have the same length as the "
    576                 f"number of data dimensions, ndim={self.ndim}"

ValueError: dimensions ('city',) must have the same length as the number of data dimensions, ndim=0
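
The ndim=0 in the message suggests the tuple was treated as one scalar object rather than as a sequence of two labels. Below is a minimal NumPy-only sketch of that difference. It assumes (this is not confirmed anywhere in the traceback) that the tuple ends up wrapped in a 0-dimensional object array somewhere along the way; the variable names are illustrative only.

```python
import numpy as np

coords = ("Bonn", "Berlin")

# Assumption: the tuple is stored as ONE opaque object, giving a 0-d array.
as_scalar = np.empty((), dtype=object)
as_scalar[()] = coords

# Converting element-wise gives the 1-d array that a "city" dimension needs.
as_sequence = np.asarray(coords)

print(as_scalar.ndim)    # 0
print(as_sequence.ndim)  # 1
```

A 0-d array cannot be labelled with the single dimension ('city',), which would match the length mismatch reported above.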

Versions and main components

  • PyMC3 Version: main (not the latest commit, but AFAIK there has been no recent work on the converter)
@OriolAbril
Member

I think I found the "culprit". Coordinate values are taken as-is by pymc and arviz and eventually passed as the data argument to https://xarray.pydata.org/en/stable/generated/xarray.IndexVariable.html, which does not seem to support tuples.

We can fix that in https://github.com/arviz-devs/arviz/blob/main/arviz/data/base.py#L249, for example by using np.asarray(coords[key]), but I'm not sure to what extent this is a design choice on xarray's part or an undesired side effect. I don't expect to open a PR with the fix for another 2-3 weeks, so feel free to take over.
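
As a rough sketch of the np.asarray idea, the conversion could be applied to every coordinate value before it reaches the converter. The helper name coords_as_arrays is hypothetical and is not part of pymc or arviz; this only illustrates the conversion itself, not the actual patch.

```python
import numpy as np


def coords_as_arrays(coords):
    """Hypothetical helper: convert every coordinate value to a NumPy array."""
    return {name: np.asarray(values) for name, values in coords.items()}


coords = {"city": ("Bonn", "Berlin")}
converted = coords_as_arrays(coords)

print(converted["city"].ndim)   # 1
print(converted["city"].shape)  # (2,)
```

With the tuple turned into a 1-d array, its number of dimensions matches the single "city" dimension, so tuples and lists would behave the same from that point on.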

I also realized that the workaround that needs to be removed (see #5042) is probably wrong and untested, so the quicker we get rid of it the better.

michaelosthege added the trace-backend (Traces and ArviZ stuff) label Oct 6, 2021
michaelosthege added a commit to michaelosthege/pymc that referenced this issue Oct 8, 2021
And always make them numpy arrays for the InferenceData conversion.

Closes pymc-devs#5043
michaelosthege added a commit to michaelosthege/pymc that referenced this issue Oct 9, 2021
OriolAbril pushed a commit to michaelosthege/pymc that referenced this issue Oct 10, 2021