Add examples of inference for static Bayesian models #9

Closed
LeahPrice opened this Issue Jul 29, 2017 · 8 comments

LeahPrice (Collaborator) commented Jul 29, 2017

Currently, the examples are based on particle filtering, so it would be nice to have an example of how to use this package for inference on static Bayesian models. My plan is to add an example based on estimating the parameters of a linear regression model. This example is sometimes used as a test for new methods (albeit a 'toy' one), and I thought it would be good for pedagogical reasons because of its simplicity. The log normalizing constant is known in closed form for this example.
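
Since the evidence has a closed form under a conjugate prior, the reference value can be computed directly; below is a minimal sketch assuming a normal-inverse-gamma prior (the function and argument names are illustrative, not part of the package):

```r
## Closed-form log evidence for y = X %*% beta + e under the conjugate prior
## beta | sigma^2 ~ N(m0, sigma^2 * V0) and sigma^2 ~ Inverse-Gamma(a0, b0).
## Illustrative sketch only; not necessarily the prior used in the example.
logEvidenceExact <- function(y, X, m0, V0, a0, b0) {
  n  <- length(y)
  Vn <- solve(solve(V0) + crossprod(X))           # posterior scale matrix
  mn <- Vn %*% (solve(V0, m0) + crossprod(X, y))  # posterior mean
  an <- a0 + n / 2
  bn <- b0 + 0.5 * drop(sum(y^2) + t(m0) %*% solve(V0, m0) -
                        t(mn) %*% solve(Vn, mn))
  drop(-(n / 2) * log(2 * pi) +
       0.5 * (as.numeric(determinant(Vn)$modulus) -
              as.numeric(determinant(V0)$modulus)) +
       a0 * log(b0) - an * log(bn) + lgamma(an) - lgamma(a0))
}
```

This gives the reference value that the SMC estimators should recover.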

Data annealing and likelihood annealing are two common approaches, so I will create files for each approach and do a single pull request (they are fairly similar).

  • For the data annealing approach, I'll return the final parameter estimates, the weights and the standard SMC estimator of the log evidence. I plan to do posterior density plots for the marginals.
  • For the likelihood annealing approach, I'll make the temperature schedule one of the input arguments and return the particles and their associated log likelihoods, log priors and weights for each temperature (the results from the power posteriors can be used in recycling schemes). I'll also return the ESS and the log normalizing constant estimates from both the standard SMC estimator and the path sampling estimator (see the sketch after this list).
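
To make the likelihood annealing plan concrete, here is a minimal self-contained sketch of that kind of sampler and the two evidence estimators, using a fixed temperature schedule, multinomial resampling at every temperature and a single random-walk move. All names are illustrative and this is not the planned package implementation:

```r
## Likelihood-annealing SMC sketch. rPrior(N) draws an N x d matrix of prior
## samples; logPrior and logLike each take one parameter vector. temps should
## run from 0 (prior) to 1 (posterior). Illustrative only.
annealSMC <- function(N, temps, rPrior, logPrior, logLike, rwScale = 0.1) {
  theta  <- rPrior(N)                      # particles from the prior (t = 0)
  ll     <- apply(theta, 1, logLike)
  logNC  <- 0                              # standard SMC evidence estimator
  meanLL <- numeric(length(temps))
  meanLL[1] <- mean(ll)                    # E[log-likelihood] under the prior
  ess    <- numeric(length(temps))
  for (k in 2:length(temps)) {
    lw    <- (temps[k] - temps[k - 1]) * ll              # incremental log weights
    logNC <- logNC + max(lw) + log(mean(exp(lw - max(lw))))
    w     <- exp(lw - max(lw)); w <- w / sum(w)
    ess[k]    <- 1 / sum(w^2)
    meanLL[k] <- sum(w * ll)               # weighted mean, used in path sampling
    idx   <- sample.int(N, N, replace = TRUE, prob = w)  # multinomial resampling
    theta <- theta[idx, , drop = FALSE]; ll <- ll[idx]
    ## one random-walk Metropolis move targeting the current power posterior
    prop   <- theta + rwScale * matrix(rnorm(length(theta)), nrow = N)
    llProp <- apply(prop, 1, logLike)
    logAcc <- temps[k] * (llProp - ll) +
              apply(prop, 1, logPrior) - apply(theta, 1, logPrior)
    keep <- log(runif(N)) < logAcc
    theta[keep, ] <- prop[keep, ]; ll[keep] <- llProp[keep]
  }
  ## path sampling: trapezoidal rule over E_t[log-likelihood]
  logNCPath <- sum(diff(temps) * (head(meanLL, -1) + tail(meanLL, -1)) / 2)
  list(theta = theta, logNC = logNC, logNCPath = logNCPath, ess = ess[-1])
}
```

A schedule such as temps <- seq(0, 1, length.out = 21)^5 places more temperatures near the prior, which is usually where the power posteriors change fastest.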

I plan to add adaptation to the likelihood annealing approach in a future pull request, probably as a third file. If three files for one example are too many, I can hold off on the likelihood annealing version until I do the adaptation.

@eddelbuettel @adamjohansen

eddelbuettel (Collaborator) commented Jul 29, 2017

That sounds like a nice idea. Do you plan to keep this self-contained within a file in demo/, or make it a new top-level function with tests, help page and all that?

LeahPrice (Collaborator) commented Jul 29, 2017

My plan was to make it a top-level function with tests and a help page, and later include it in a vignette. For the help pages, I was going to either write a separate help file for each approach and link them with @seealso, or just write a single help file.
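
For the separate-help-file option, a roxygen2-style sketch of the cross-linking (LinRegDA and LinRegLA are placeholder names; hand-written .Rd files with \seealso would be equivalent):

```r
#' Linear regression example via data annealing SMC
#'
#' @seealso \code{\link{LinRegLA}} for the likelihood annealing version.
#' @export
LinRegDA <- function(model = 1) {
  ## implementation omitted in this sketch
  invisible(NULL)
}

#' Linear regression example via likelihood annealing SMC
#'
#' @seealso \code{\link{LinRegDA}} for the data annealing version.
#' @export
LinRegLA <- function(model = 1, temps = seq(0, 1, 0.05)^5) {
  ## implementation omitted in this sketch
  invisible(NULL)
}
```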

What is your preference on demo vs top-level function?

eddelbuettel (Collaborator) commented Jul 29, 2017

I sometimes cheat and leave them in demo/, which is easier, and I think demo/ can suit "expositional" functions. But a full function -- with tests, help page and all that -- is more powerful and more visible, and it is in line with what the package has done before.

So thumbs up from me.

adamjohansen (Collaborator) commented Aug 1, 2017

This all sounds good to me... and I'd use as many source files as you need to organize things logically (if things start to get out of hand we can always introduce some more structure).

I think one of my comments on your last commit was based on forgetting exactly what was in which repository, so just ignore it.

LeahPrice (Collaborator) commented Aug 1, 2017

Thanks @eddelbuettel and @adamjohansen for such quick responses regarding the number of changes in my planned pull request.

Adam's comment about whether it's a good idea to fall back to default data when there are issues with the user-supplied data got me thinking about how I've implemented this example. The choice of priors, tuning parameters, etc. is specific to this one example, so it probably doesn't make sense to let users input their own data. The example is generally used for model choice, so I've included the full data set in the package and users can select the model. Does this sound alright to you? I could try to make the example more generally applicable, but dealing with the priors could be challenging.

While adding the full data set, I went back to the original source and noticed that one observation had been misprinted in the paper I originally took the data from. This unfortunately means I need to redo the numerical integration to get the true log evidence values. It isn't a major issue - just two numbers in the .Rd file that will need updating later.

adamjohansen (Collaborator) commented Aug 1, 2017

Personally, my inclination would be to make both the (hyper)prior parameters and the data something that can be specified from R, because that's closer to the way I'd expect most real applications to look; but that's a detail that could be tweaked later. A working proof-of-concept implementation of this sort of thing seems useful.

The other detail doesn't seem like too much to worry about at this stage, either, but do stick a comment in somewhere indicating that the calculation does need redoing to avoid confusing anyone if it gets forgotten.

LeahPrice (Collaborator) commented Aug 1, 2017

Thanks Adam.

My questions were mostly in response to Adam's comment, so I'll submit this as a pull request now to make the most of the time difference.

I removed the incorrect log evidence and put in the correct values for the two models, but they are only accurate to one decimal place, so I will improve on that later. Sorry, that wasn't really clear from what I said earlier.

adamjohansen (Collaborator) commented Aug 1, 2017

Great, thanks.

LeahPrice closed this Aug 3, 2017
