# Using random variables as `observed` #2226

Closed
opened this issue May 26, 2017 · 11 comments

Contributor

### aplavin commented May 26, 2017

I'm trying to build a regression model that takes into account errors in both the x and y measurements. There is a solution here:

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    epsilon = pm.HalfCauchy('epsilon', 5)
    obs_x = pm.Normal('obs_x', mu=x, sd=x_err, shape=len(x))
    obs_y = pm.Normal('obs_y', mu=y, sd=y_err, shape=len(y))
    likelihood = pm.Normal('y', mu=intercept + gradient * obs_x, sd=epsilon, observed=obs_y)
```

But as I understand it, using `obs_y` as `observed` is no longer supported and gives a corresponding error. While that probably makes sense, I can't figure out how to rewrite this model another way. Is something like this correct?

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    epsilon = pm.HalfCauchy('epsilon', 5)
    obs_x = pm.Normal('obs_x', mu=x, sd=x_err, shape=len(x))
    obs_y = pm.Normal('obs_y', mu=intercept + gradient * obs_x, sd=epsilon)
    likelihood = pm.Normal('y', mu=obs_y, sd=y_err, observed=y)
```
Member

### fonnesbeck commented May 27, 2017

You should have another likelihood for x, where x is passed as `observed`. The mean from that likelihood should then be used in the linear model. You want to be using the true underlying x as the covariate, not the observed (noisy) variable.
Member

### fonnesbeck commented May 27, 2017

Going to close this because it is not an issue, but feel free to continue the conversation.

### fonnesbeck closed this May 27, 2017

Contributor Author

### aplavin commented May 27, 2017 • edited

@fonnesbeck, so it should look like this?

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    true_x = pm.Normal('true_x', mu=x, sd=x_err, shape=len(x))
    likelihood_x = pm.Normal('x', mu=true_x, sd=x_err, observed=x)
    true_y = pm.Normal('true_y', mu=intercept + gradient * true_x, sd=y_err)
    likelihood_y = pm.Normal('y', mu=true_y, sd=y_err, observed=y)
```

And where do I put the estimated error (like `epsilon` in the first post), if anywhere? `x_err` and `y_err` here are known for the individual measurements.
Member

### fonnesbeck commented May 27, 2017 • edited

You know the measurement errors? Then you can put them in as the sd for the respective likelihoods, as you have done, though it looks like you have used them twice. It looks to me like your `true_x` is redundant -- I don't see what it's for in light of `likelihood_x` -- and `true_y` should not be stochastic. It should just be:

```
true_y = intercept + gradient * true_x
```

If your error is known, you don't need to estimate it (`epsilon`), though if there is still uncertainty you could use the known value to parameterize a prior for it.
Contributor Author

### aplavin commented May 27, 2017 • edited

@fonnesbeck yes, I totally agree about `true_y`. But I still don't understand how I can write this model using `x_err` only once; it seems like this error should appear both in the `true_x` definition (since the observed `x` equals `true_x` plus noise) and in the likelihood for `x` (otherwise, where do I put `observed=x`?). With your remark on `true_y` applied, the model currently looks like this:

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    true_x = pm.Normal('true_x', mu=x, sd=x_err, shape=len(x))
    likelihood_x = pm.Normal('x', mu=true_x, sd=x_err, observed=x)
    true_y = pm.Deterministic('true_y', intercept + gradient * true_x)
    likelihood_y = pm.Normal('y', mu=true_y, sd=y_err, observed=y)
```
Member

### junpenglao commented May 27, 2017 • edited

The `true_x` should be noiseless, as it is latent. Observe that if you use `x_err` twice, the error basically propagates twice into the observed `x`, and by the Gaussian property the result as defined here is again a Gaussian (a convolution of two Gaussians). What you should do is something like this:

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    # depends on your prior knowledge of true_x
    true_x = pm.Normal('true_x', mu=0, sd=20, shape=len(x))
    likelihood_x = pm.Normal('x', mu=true_x, sd=x_err, observed=x)
    true_y = pm.Deterministic('true_y', intercept + gradient * true_x)
    likelihood_y = pm.Normal('y', mu=true_y, sd=y_err, observed=y)
```
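The double-counting described here is easy to check numerically: adding `N(0, x_err)` noise twice yields a Gaussian whose standard deviation is `sqrt(2) * x_err`, not `x_err`. A quick NumPy sketch (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x_err = 2.0
n = 200_000

# Noise applied once: observed = true + eps, eps ~ N(0, x_err)
once = rng.normal(0.0, x_err, n)

# Noise applied twice (noisy true_x AND a noisy likelihood): the two
# Gaussians convolve, giving variance x_err**2 + x_err**2
twice = rng.normal(0.0, x_err, n) + rng.normal(0.0, x_err, n)

print(once.std())   # ~ x_err
print(twice.std())  # ~ sqrt(2) * x_err
```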
Member

### fonnesbeck commented May 27, 2017

 You only want to account for the observation error once (per source of error).
Contributor Author

### aplavin commented May 27, 2017

 Oh, thanks, now I see!

### payeldas commented Dec 14, 2017

Thanks for this really useful contribution. I have a few related questions:

1. How would you deal with multiple x variables?
2. If I don't know much about `true_x`, what is the best prior to use?
3. If I would like to calculate the posterior predictive distribution given a new observed x, there is now no new observed y. Does this mean I only update `likelihood_x` but not `likelihood_y`? Is this a problem if the new observed x has a different sample size?

Thanks!
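For the multiple-covariates question, one natural extension (a sketch, not from the thread) is to give the latent `true_x` a `shape=(n, k)` design-matrix shape and replace `intercept + gradient * true_x` with a dot product, e.g. `pm.math.dot(true_X, gradient)` in PyMC3. The shape logic is illustrated below with plain NumPy; all names and values are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3                      # 100 observations, 3 noisy covariates

true_X = rng.normal(size=(n, k))   # latent design matrix, one column per covariate
gradient = np.array([1.5, -2.0, 0.5])  # one slope per covariate
intercept = 0.7

# The scalar linear predictor `intercept + gradient * true_x`
# generalizes to a matrix-vector product
mu = intercept + true_X @ gradient

print(mu.shape)  # one predicted mean per observation: (100,)
```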
Member

### junpenglao commented Dec 14, 2017

Hi @payeldas, I think these follow-up questions would be better posted on our discourse: https://discourse.pymc.io

### guro86 commented Mar 20, 2019 • edited

Hello @junpenglao, with the example posted above, how would the joint posterior probability of the intercept, gradient, `true_x`, and `true_y` given the observed x and y be formulated mathematically?

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    # depends on your prior knowledge of true_x
    true_x = pm.Normal('true_x', mu=0, sd=20, shape=len(x))
    likelihood_x = pm.Normal('x', mu=true_x, sd=x_err, observed=x)
    true_y = pm.Deterministic('true_y', intercept + gradient * true_x)
    likelihood_y = pm.Normal('y', mu=true_y, sd=y_err, observed=y)
```
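Reading the densities directly off that model (with $\alpha$ = `intercept`, $\beta$ = `gradient`, and writing $\mathcal{N}(\cdot \mid \mu, \sigma)$ for a normal density), the joint posterior would presumably be:

```latex
p(\alpha, \beta, x_{\mathrm{true}} \mid x, y)
\;\propto\;
\mathcal{N}(\alpha \mid 0, 20)\,
\mathcal{N}(\beta \mid 0, 20)
\prod_{i=1}^{n}
\mathcal{N}(x_{\mathrm{true},i} \mid 0, 20)\,
\mathcal{N}(x_i \mid x_{\mathrm{true},i}, \sigma_{x,i})\,
\mathcal{N}(y_i \mid \alpha + \beta\, x_{\mathrm{true},i}, \sigma_{y,i})
```

Note that `true_y` is a `Deterministic` -- a fixed function $\alpha + \beta\, x_{\mathrm{true}}$ of the other variables -- so it contributes no density term of its own; its posterior follows from those of $\alpha$, $\beta$, and $x_{\mathrm{true}}$.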