
BUG: pytensor.shared does not respect masked missing values #258

Closed · kamicollo opened this issue Mar 29, 2023 · 2 comments · Fixed by #260
kamicollo commented Mar 29, 2023

Describe the issue:

This is related to pymc-devs/pymc#6626: it seems that pytensor.shared() (and therefore pm.MutableData()) does not respect masked missing values. The mask is silently dropped in the process, which is especially problematic when the missing data was encoded as an actual number (e.g. a sentinel value).

Reproducible code example:

import numpy as np
import pymc as pm
import pytensor as pt
import arviz as az

# basic example: values greater than 3 are masked
X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
print(pt.shared(X).container.value)  # prints [1 2 3 4]: the mask is gone

# example where inference in PyMC is silently wrong as a result:
real_X = np.random.default_rng().normal(size=1000)
Y = np.random.default_rng().normal(loc=3 * real_X, scale=0.1)

# encode the first ten entries as missing via a sentinel value, then mask them
X = real_X.copy()
X[0:10] = 999
masked_X = np.ma.masked_where(X == 999, X)

with pm.Model() as m:
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    # the mask is dropped here, so the sentinel 999s are treated as real data
    X = pm.Normal("X", 0, 1, observed=pm.MutableData("masked_X", masked_X))
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace = pm.sample()

az.summary(trace)
# yields β ≈ 0, which is incorrect (the true value is 3)
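
For comparison, here is a minimal sketch of the behavior one would expect, assuming PyMC's automatic imputation of masked observed arrays: by skipping pm.MutableData, the mask reaches the model intact and the masked entries become latent variables instead of being read as the sentinel 999s (trace_imputed is an illustrative name; this continues the example above):

# continuing the example above: pass the masked array directly as observed
with pm.Model():
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    # the mask survives here, so PyMC imputes the masked entries
    # as latent variables rather than treating 999 as data
    X = pm.Normal("X", 0, 1, observed=masked_X)
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace_imputed = pm.sample()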

Error message:

No response

PyTensor version information:

2.10.1

Context for the issue:

The failure is silent and can lead to incorrect inference results for PyMC users.

kamicollo added the bug label on Mar 29, 2023
kamicollo (Author) commented:

I also found that this only happens when instantiating the shared variable; setting data afterwards preserves the mask, e.g.:

import numpy as np
import pytensor as pt

X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
sh_val = pt.shared(X)
print(sh_val.get_value())  # prints [1 2 3 4]: the mask is lost
sh_val.set_value(X)
print(sh_val.get_value())  # prints [1 2 3 --]: the mask is kept

ricardoV94 (Member) commented Mar 29, 2023

PyTensor doesn't have a type that behaves like NumPy masked arrays, so this is not just a matter of wrapping a masked array.

In your example, as soon as you try to do some operation, you will get incorrect values:

import pytensor
import numpy as np

X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
sh_val = pytensor.shared(X)
print(sh_val.eval())  # [1 2 3 4]: mask lost at instantiation
sh_val.set_value(X)
print(sh_val.eval())  # [1 2 3 --]: mask kept by set_value
print((sh_val + 1).eval())  # [2 3 4 5]: mask silently dropped by the operation

We could raise explicitly when a user tries to pass a masked array. To actually support NumPy-like behavior, we would need to implement something like MaskedTensorVariables and write all the operations to support that type, similar to how we handle SparseTensorVariables.
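
For illustration, a minimal sketch of what such an explicit check could look like (the function name as_shared_data and its placement are assumptions made for illustration; the actual fix landed in #260):

import numpy as np

def as_shared_data(data):
    # hypothetical guard: reject masked arrays up front instead of
    # silently dropping the mask when creating a shared variable
    if isinstance(data, np.ma.MaskedArray):
        raise NotImplementedError(
            "MaskedArrays are not supported as shared variable values"
        )
    return data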
