Adapt and run blackbox likelihood tutorial #28

Merged

OriolAbril merged 10 commits into pymc-devs:main from the blackbox branch on Dec 19, 2021

Conversation

@OriolAbril (Member) commented Jan 23, 2021

I was unable to run the Cython notebook due to Cython compilation issues, so I decided to create a NumPy version of the same concept instead.

The immediate goal was to test the observed/givens split, which seems to be on the right track. I then figured I could share this version too: even though I'm not much of a fan of the blackbox likelihood approach, it seems to be quite common.


@canyon289 (Member)

Skimming over the notebook in this PR, it seems quite nice!

@MarcoGorelli MarcoGorelli self-requested a review January 24, 2021 10:45
@AlexAndorra (Collaborator) left a comment


This is super thorough @OriolAbril, thanks a lot 😍
I left a few comments below -- hope it's useful!

AlexAndorra commented via ReviewNB on 2021-01-24T11:15:53Z
----------------------------------------------------------------

  • This can be problematic when you need to pass parameters set as PyMC3 distributions
  • "If you have a model that is defined as a set of Theano operators then this is no problem": isn't this any PyMC3 model?
  • "but if your model is essentially a "black box" then you won't...": maybe you can quickly define what black box means here -- is it basically any external function that can't do autodiff?

AlexAndorra commented via ReviewNB on 2021-01-24T11:15:54Z
----------------------------------------------------------------

  • "Here, the gradient function is defined in Cython for speed...": I think there is no Cython here, isn't there?
  • "As sigma=1 in this case the dropped factor is only a factor 2...": how do you derive the values of the factor? Maybe it'd be worth to quickly mention what "normalization constants" are?

AlexAndorra commented via ReviewNB on 2021-01-24T11:15:54Z
----------------------------------------------------------------

"We can now check that the gradient Op works was expected we expect it to"


AlexAndorra commented via ReviewNB on 2021-01-24T11:15:55Z
----------------------------------------------------------------

Ideally, I would add a few lines explaining what to make of these summaries


MarcoGorelli commented via ReviewNB on 2021-01-29T12:13:10Z
----------------------------------------------------------------

the "custom distributions" link is dead


@MarcoGorelli (Contributor)

Thanks for doing this. I too was unable to run the Cython version; your work here is much appreciated.

@twiecki (Member) commented Mar 28, 2021

Ping @OriolAbril

@OriolAbril (Member, Author)

I have this in mind, but as we won't add the givens argument, the ideal solution would be to use pm.Potential instead, or to refactor how the arguments are passed so that the DensityDist only takes actual observed variables as the observed kwarg. I will get to this eventually, but for now I will prioritize organizing issues and the contributing guide for Outreachy applicants.
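A minimal sketch of what the pm.Potential route could look like (PyMC3/Theano era; loglike_op is assumed to be a Theano Op wrapping the black-box log likelihood, like the LogLike Op the notebook builds, and the priors are only placeholders):

import pymc3 as pm
import theano.tensor as tt

# `loglike_op` is assumed here: a Theano Op wrapping the black-box
# log likelihood, like the notebook's LogLike Op.
with pm.Model():
    # Placeholder priors for the slope and intercept
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    theta = tt.as_tensor_variable([m, c])
    # Add the black-box term directly to the joint log-probability;
    # no fake "observed" variable is needed.
    pm.Potential("likelihood", loglike_op(theta))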

@mjhajharia (Member)

this is really cool!!!!

@OriolAbril OriolAbril merged commit 3290b46 into pymc-devs:main Dec 19, 2021
@OriolAbril OriolAbril deleted the blackbox branch December 19, 2021 18:35
@rahulrobotics

Hello, thanks for the awesome tutorial! I have a question about the normal_gradients function: I don't understand how it takes the gradient of the likelihood. Also, this comment is present in the function:

"the derivatives are calculated using the central difference, using an iterative method to check that the values converge as step size decreases."

However, I do not see how the normal_gradients function is accomplishing this. There does not seem to be any iteration. Can you please explain how the normal_gradients function works? Thank you!

@OriolAbril (Member, Author)

Hi, can you ask on https://discourse.pymc.io? That way it will be easier for other people with the same question to find, and you might also get answers faster, since more people follow the forum than this PR.

@rayhagimoto commented Jun 6, 2022


Hi Rahul,

I couldn't find your post on discourse.pymc.io. I also had the same question, but after looking at the notebook again I understand what the function is doing. In contrast to the description, normal_gradients isn't calculating partial derivatives using finite differences or doing anything iterative; it is just the symbolic derivative of the log likelihood (explained further in the next paragraph). The source of the confusion is that the comments and descriptions are left over from the original blog post by Matt Pitkin, where an iterative finite-difference approach was used. If you go to the blog post and look at his gradients function, you'll see he is actually performing an iterative finite difference.
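For contrast, here is a rough sketch of the iterative central-difference idea that description refers to (my own illustration, not Matt Pitkin's actual code): the step size is halved until consecutive gradient estimates stop changing.

import numpy as np

def central_diff_gradient(func, theta, h0=1e-2, rtol=1e-6, max_halvings=20):
    # Approximate the gradient of `func` at `theta` with central differences,
    # shrinking the step size until consecutive estimates agree (sketch only).
    theta = np.asarray(theta, dtype=float)
    grads = np.empty_like(theta)
    for i in range(theta.size):
        h = h0
        step = np.zeros_like(theta)
        step[i] = h
        prev = (func(theta + step) - func(theta - step)) / (2 * h)
        for _ in range(max_halvings):
            h /= 2.0
            step[i] = h
            curr = (func(theta + step) - func(theta - step)) / (2 * h)
            if abs(curr - prev) <= rtol * max(1.0, abs(curr)):
                break
            prev = curr
        grads[i] = curr
    return grads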

Explanation of what I mean by symbolic derivative

We are trying to calculate the gradient of the log likelihood, which has been defined by the following functions:

import numpy as np

def my_model(theta, x):
    # Straight-line model: theta = (slope m, intercept c)
    m, c = theta
    return m * x + c

def my_loglike(theta, x, data, sigma):
    # Gaussian log likelihood with the normalization constant dropped
    model = my_model(theta, x)
    return -(0.5 / sigma**2) * np.sum((data - model) ** 2)

Symbolically we could write this as
$$\ln \mathcal{L}(\theta | D_X, \sigma) = -\frac{1}{2\sigma^2}\sum_{X} \left[D_X - M_X(\theta)\right]^2$$
where $\theta = (m,c)$ and $M_X(\theta) = m X + c$.

This function is simple enough that we can calculate the gradient by hand as
$$\frac{\partial}{\partial {\theta}} \ln \mathcal{L} = \frac{1}{\sigma^2} \sum_X \left[D_X - M_X(\theta)\right] \frac{\partial M_X}{\partial \theta}$$
where I'm using the short-hand notation
$$\frac{\partial} {\partial \theta} = \left(\frac{\partial }{\partial m}, \frac{\partial }{\partial c}\right)$$
e.g.
$$\frac{\partial M_X}{\partial \theta} = \left(\frac{\partial M_X}{\partial m}, \frac{\partial M_X}{\partial c}\right) = (X, 1)$$
Compare these expressions with the normal_gradients function definition (repeated below for convenience). They are identical (up to an irrelevant constant factor).

def normal_gradients(theta, x, data, sigma):
    # Analytic gradient of my_loglike with respect to theta = (m, c)
    grads = np.empty(2)
    aux_vect = data - my_model(theta, x)  # /(2*sigma**2)
    grads[0] = np.sum(aux_vect * x)  # d logL / d m
    grads[1] = np.sum(aux_vect)      # d logL / d c
    return grads
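
To see the agreement concretely, here is a quick numerical sanity check on synthetic data (a sketch reusing my_loglike and normal_gradients from the snippets above; with sigma = 1 the plain central difference of my_loglike should match normal_gradients to several decimal places):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 9.0, 50)
data = 0.4 * x + 3.0 + rng.normal(scale=1.0, size=x.size)
theta = np.array([0.4, 3.0])
sigma = 1.0

eps = 1e-6
fd = np.array([
    (my_loglike(theta + np.array([eps, 0.0]), x, data, sigma)
     - my_loglike(theta - np.array([eps, 0.0]), x, data, sigma)) / (2 * eps),
    (my_loglike(theta + np.array([0.0, eps]), x, data, sigma)
     - my_loglike(theta - np.array([0.0, eps]), x, data, sigma)) / (2 * eps),
])

print(fd)                                       # finite-difference estimate
print(normal_gradients(theta, x, data, sigma))  # analytic gradient; identical when sigma == 1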

If you link me the discourse post I am happy to repeat this answer there.

twiecki pushed a commit that referenced this pull request Jan 17, 2023