
WIP: fix bound evaluation in dist_math #1591

Closed
wants to merge 13 commits into from

Conversation

@fonnesbeck (Member)

This is a first pass at fixing the bug in bound that manifests itself in #1579. I've used tt.stack to get around the heterogeneous arguments, but this results in dimension-matching problems for some models that I don't yet understand.

Also added the tanks example from #1579 to test_examples.

Closes #1579

@twiecki (Member) commented Dec 10, 2016

Does the original code not work?

@fonnesbeck (Member, Author)

Original code?

@twiecki (Member) commented Dec 12, 2016

    ret = 1
    for c in vals:
        ret = ret * (1 * c)  # elementwise product; conditions broadcast against each other
    return ret
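For context, the key property of this loop is that the elementwise product broadcasts: a scalar condition spreads across a vector one. A minimal NumPy sketch of that behavior, with made-up conditions standing in for the symbolic tensors:

    import numpy as np

    # Hypothetical conditions of mixed shape, as bound() might receive them:
    vals = [np.array([True, True, False]),  # elementwise condition, shape (3,)
            np.array(True)]                 # scalar condition

    ret = 1
    for c in vals:
        ret = ret * (1 * c)  # the scalar condition broadcasts over shape (3,)

    print(ret)  # [1 1 0] -- an elementwise mask, not a single boolean

This is the broadcasting that #1449 ran into, and that the Multinomial use case relies on.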

@twiecki twiecki added the WIP label Dec 12, 2016
@twiecki (Member) commented Dec 12, 2016

1b7b4bb

@fonnesbeck (Member, Author)

I assumed it would reintroduce #1449.

@twiecki (Member) commented Dec 12, 2016

Oh, that was the original issue.

I think that bears taking another look: either we have broadcasting in bound, which can sometimes lead to odd behavior as in #1449, or we don't. I don't think there's a middle ground.

@twiecki (Member) commented Dec 12, 2016

Perhaps we should have a kwarg broadcast=False that can turn broadcasting on or off. Both use cases (DiscreteUniform and Multinomial) are valid, I think.
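A rough sketch of that idea (hypothetical signature, not actual pymc3 API; assuming the usual np/tt aliases):

    import numpy as np
    import theano.tensor as tt

    def bound(logp, *conditions, **kwargs):
        # Hypothetical broadcast kwarg, per the suggestion above.
        broadcast = kwargs.pop('broadcast', False)
        if broadcast:
            ok = 1
            for c in conditions:
                ok = ok * (1 * c)  # elementwise; broadcasting allowed (Multinomial-style)
        else:
            # reduce every condition to a single scalar (DiscreteUniform-style)
            ok = tt.all([tt.all(1 * c) for c in conditions])
        return tt.switch(ok, logp, -np.inf)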

@twiecki (Member) commented Dec 12, 2016

How does this handle the #1449 case?

@fonnesbeck (Member, Author)

It does fix it, at least based on the example @AustinRochford used in the issue. Perhaps I will include that as a test, in case of regression.

@fonnesbeck (Member, Author)

Please have a peek at this, @ferrine and/or @AustinRochford.

@ferrine (Member) commented Dec 12, 2016

I'm a bit sick and have a deadline tomorrow :( I'll review the PR in a few days.

@fonnesbeck (Member, Author)

@ferrine no big deal, thanks. Get well.

@AustinRochford (Member)

This looks like a reasonable compromise to me. 👍

try:
    # Stack the conditions and reduce across them, keeping the result elementwise.
    return tt.all(tt.stack([1 * val for val in vals]), axis=0)
except (TypeError, IndexError):
    # Fallback when the conditions cannot be stacked (heterogeneous
    # shapes): reduce each condition to a scalar first.
    return tt.all([tt.all(1 * val) for val in vals])
Member

When is it right to do this second version?

Member Author

Sometimes tt.stack will fail because the elements of vals have different lengths (the TypeError; this happens with the Poisson mixture), and sometimes because there is no axis to iterate over (the IndexError).
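A NumPy analogue of the first failure mode, with made-up conditions:

    import numpy as np

    vals = [np.array([True, True]),          # shape (2,)
            np.array([True, False, True])]   # shape (3,)

    # Conditions of unequal length cannot be stacked into one rectangular
    # array, so the stacking path fails and the fallback reduces each
    # condition to a scalar separately:
    result = all(np.all(1 * v) for v in vals)
    print(result)  # False -- the second condition has a failing element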

@ferrine (Member) commented Dec 12, 2016

You could ravel all the variables after safely converting them in Theano, then concatenate them, and finally use tt.all.

Member Author

I thought stack did most of that (including the conversion) for me automatically?

Member Author

You run into situations where vals is not only of different types but also of different lengths, so we need to deal with those potential combinations; tt.stack doesn't like heterogeneous dimensions. I haven't seen a case that trips up both paths yet, however.

Member

Different dims can be eliminated by ravel.
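A NumPy stand-in for the raveling idea (the symbolic version would use the tt equivalents after converting each input):

    import numpy as np

    def alltrue_ravel(vals):
        # np.ravel turns scalars into shape-(1,) arrays, so dimension
        # mismatches disappear before the conditions are combined.
        flat = [np.ravel(1 * v) for v in vals]
        return np.concatenate(flat).all()

    print(alltrue_ravel([np.array([True, True]), True]))  # True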

@jsalvatier (Member) commented Dec 12, 2016 via email

@fonnesbeck (Member, Author)

The original problem is #1449, which had some unexpected broadcasting behavior.

@jsalvatier (Member)

Right, okay. Then I suggest we make bound_elemwise and bound, and alltrue_elemwise and alltrue, or something similar.

@fonnesbeck (Member, Author) commented Dec 12, 2016

Perhaps it's just simpler (or at least clearer, if not simpler) to call all explicitly in each distribution as needed and pass that to bound, e.g. tt.all(value >= 0), tt.all(0 <= p), etc. That way, each logp call would always pass bound a 1-D set of booleans (one for each condition).
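A sketch of how that might look (a simplified stand-in for dist_math.bound; raw_logp, value, and mu are placeholders):

    import numpy as np
    import theano.tensor as tt

    def bound(logp, *conditions):
        # Assumes callers have already reduced each condition with
        # tt.all(), so every condition is a scalar boolean.
        ok = 1
        for c in conditions:
            ok = ok * c
        return tt.switch(ok, logp, -np.inf)

    # Inside a distribution's logp, e.g. something Poisson-like:
    # return bound(raw_logp, tt.all(value >= 0), tt.all(mu > 0))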

@jsalvatier (Member)

For multivariate distributions? That would be okay. Though I think we should still rename bound to elemwise_bound or similar, just to be less confusing.

We should keep the elemwise bound completely elemwise so that the logps are elemwise (and individual elements can be NaNs). It's useful for debugging.

@fonnesbeck (Member, Author)

No, I mean for all the distributions (multivariates already do this). Unless I am missing something, if any condition for any element of a call to logp fails, the logp should return -inf. You can still debug inside the logp if you need to.

So, if you have a vector-valued Gamma distribution, for example, you would return -inf on the call to logp on any violation of the bound by any element, be it a bad parameter element or a bad value element.

@jsalvatier (Member) commented Dec 13, 2016

Gotcha. I think that seems bad to me: elementwise distributions should be as elementwise as possible, and this change would make them partially a single distribution. Keeping them fully elementwise is more conceptually coherent.

What do you mean you can still debug inside the logp?

The problems I'm thinking of are things like: if you pass bad initial values to an elemwise dist, it's better if you can look at the logp to tell which ones are bad. Or if find_MAP messes up. Or if you write a bad sampler or distribution.

You can also imagine custom samplers using the -inf information to adjust the scale for particular elements.

I also don't really see the benefit. We can just have two functions with clearer names.
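A NumPy illustration of the debugging point, with made-up values:

    import numpy as np

    # With a fully elementwise bound, a bad initial value shows up as
    # -inf only in its own slot:
    value = np.array([2.0, -1.0, 3.0])            # -1.0 violates value >= 0
    logp = np.where(value >= 0, -value, -np.inf)  # stand-in elemwise logp
    print(logp)  # [ -2. -inf  -3.] -> element 1 is the offender

    # A scalar bound would collapse all of this to a single -inf.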

@fonnesbeck (Member, Author)

I think I've lost the plot on this one. I will close this and let someone else take a shot at it. I'm sure the answer is easy and I'm just missing it.

@fonnesbeck fonnesbeck closed this Dec 13, 2016
@fonnesbeck fonnesbeck deleted the alltrue_fix branch December 13, 2016 01:13
Merging this pull request may close: Error when using DiscreteUniform