
BUG: pytensor.shared does not respect masked missing values #258

Closed · kamicollo opened this issue Mar 29, 2023 · 2 comments · Fixed by #260
kamicollo commented Mar 29, 2023

Describe the issue:

This is related to pymc-devs/pymc#6626: it seems that pytensor.shared() (and therefore pm.MutableData()) does not respect masked missing values. The mask is silently dropped in the process, which is especially problematic when the missing data was encoded as an actual number (e.g. a sentinel value).

Reproducible code example:

import numpy as np
import pymc as pm
import pytensor as pt
import arviz as az

# basic example: values greater than 3 are masked
X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
print(pt.shared(X).container.value)  # prints [1 2 3 4]: the mask is gone

# example where inference in PyMC is silently wrong as a result:
real_X = np.random.default_rng().normal(size=1000)
Y = np.random.default_rng().normal(loc=3 * real_X, scale=0.1)

# encode the first ten entries as missing via a sentinel value, then mask them
X = real_X.copy()
X[0:10] = 999
masked_X = np.ma.masked_where(X == 999, X)

with pm.Model() as m:
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    # the mask is dropped here, so the sentinel 999s are treated as real data
    X = pm.Normal("X", 0, 1, observed=pm.MutableData("masked_X", masked_X))
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace = pm.sample()

az.summary(trace)
# yields β ≈ 0, which is incorrect (the true value is 3)
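
For comparison, here is a minimal sketch of the behavior one would expect, assuming PyMC's automatic imputation of masked observed arrays: by skipping pm.MutableData, the mask reaches the model intact and the masked entries become latent variables instead of being read as the sentinel 999s (trace_imputed is an illustrative name; this continues the example above):

# continuing the example above: pass the masked array directly as observed
with pm.Model():
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    # the mask survives here, so PyMC imputes the masked entries
    # as latent variables rather than treating 999 as data
    X = pm.Normal("X", 0, 1, observed=masked_X)
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace_imputed = pm.sample()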

Error message:

No response

PyTensor version information:

2.10.1

Context for the issue:

The failure is silent and can lead to incorrect inference results for PyMC users.

kamicollo added the bug label on Mar 29, 2023
kamicollo (Author) commented:

I also found that this only happens when instantiating the shared variable; setting data afterwards preserves the mask, e.g.:

import numpy as np
import pytensor as pt

X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
sh_val = pt.shared(X)
print(sh_val.get_value())  # prints [1 2 3 4]: the mask is lost
sh_val.set_value(X)
print(sh_val.get_value())  # prints [1 2 3 --]: the mask is kept

ricardoV94 (Member) commented Mar 29, 2023

PyTensor doesn't have a type that behaves like NumPy masked arrays, so this is not just a matter of wrapping a masked array.

In your example, as soon as you try to do some operation, you will get incorrect values:

import pytensor
import numpy as np

X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
sh_val = pytensor.shared(X)
print(sh_val.eval())  # [1 2 3 4]: mask lost at instantiation
sh_val.set_value(X)
print(sh_val.eval())  # [1 2 3 --]: mask kept by set_value
print((sh_val + 1).eval())  # [2 3 4 5]: mask silently dropped by the operation

We could raise explicitly when a user tries to pass a masked array. To actually support NumPy-like behavior, we would need to implement something like MaskedTensorVariables and write all the operations to support that type, similar to how we handle SparseTensorVariables.
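
For illustration, a minimal sketch of what such an explicit check could look like (the function name as_shared_data and its placement are assumptions made for illustration; the actual fix landed in #260):

import numpy as np

def as_shared_data(data):
    # hypothetical guard: reject masked arrays up front instead of
    # silently dropping the mask when creating a shared variable
    if isinstance(data, np.ma.MaskedArray):
        raise NotImplementedError(
            "MaskedArrays are not supported as shared variable values"
        )
    return data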
