Linear Regression Example #10
Conversation
This simple example shows how to use the package to sample from the posterior distribution of a static Bayesian model and to estimate the model evidence. The example was originally presented in the literature as a model choice problem which is simple enough for the true log evidence (and Bayes factor) to be obtained via numerical integration. Users can select which model they wish to perform inference on (model 1 or 2).
This looks good to me. There are a few specific comments, but nothing that needs any changes to code other than perhaps one or two of the comments at this stage.
LinReg <- function(model, particles=1000, plot=FALSE) {
    data(radiata)
    # if no data supplied, use default
adamjohansen
Aug 1, 2017
Collaborator
Don't we /always/ use the default at the moment?
    data(radiata)
    # if no data supplied, use default
adamjohansen
Aug 1, 2017
Collaborator
Idem.
namespace LinReg {
    class rad_state
adamjohansen
Aug 1, 2017
Collaborator
Just a remark and I wouldn't necessarily change anything here, but in general I don't know if there's a need to define a class which just contains another type. If it makes the code more readable then fair enough, but an arma::vec or std::pair of arma::vec would be perfectly reasonable classes to use.
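A minimal sketch of that alternative (the typedef names, and the assumption that the state is essentially just a parameter vector, are illustrative rather than the actual contents of rad_state):

    // Sketch only: assumes RcppArmadillo is available and that the particle
    // state amounts to a small parameter vector.
    #include <RcppArmadillo.h>
    #include <utility>

    namespace LinReg {

        // Option A: use the underlying type directly.
        typedef arma::vec rad_state;            // e.g. (alpha, beta, log(sigma^2))

        // Option B: a std::pair if two vectors genuinely need to travel together,
        // say the parameters plus some cached per-particle quantities.
        typedef std::pair<arma::vec, arma::vec> rad_state_pair;
    }

Either keeps the sampler code readable without defining an extra wrapper class, but if the wrapper makes the intent clearer then that is a perfectly good reason to keep it.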
@@ -0,0 +1,56 @@
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-
//
// LinReg_LA.h: Rcpp wrapper for SMC library -- A simple example for estimating
adamjohansen
Aug 1, 2017
Collaborator
It's quite possible that the answer is "yes", and it might be cleaner this way, but is there a good reason for having completely separate implementations of the two approaches to linear regression?
There seems to be an amount of code overlap and my first instinct would have been to share common code between the two (defining two classes with the same name and content in different namespaces just feels like harder work than necessary, but it at least gives a good degree of modularity).
LeahPrice
Aug 2, 2017
Author
Collaborator
I mostly just did that to keep things clean and easy to follow for now and for later when we add in the adaptation. I could combine them and have helper functions to get the prior, f(y_t|theta) and f(y_{1:t}|theta), but it might make the code a little harder to read. I'm also keeping track of the log likelihood and log prior in the likelihood annealing example but not in data annealing, because that wouldn't really make sense.
I don't have a particularly strong preference either way -- combining three sets of files (data annealing, likelihood annealing and likelihood annealing with adaptation) might make it harder to read, but at the same time having three sets of files for one example seems over the top.
adamjohansen
Aug 2, 2017
Collaborator
Fair enough, I can see an argument for keeping the two implementations quite separate to make it very easy for someone looking at the code to see exactly what it does. I really just wanted to check that the existing code hadn't given the impression that there needed to be completely separate source files for each principal function or anything like that.
If I was doing this in practice I'd have written one function which evaluated the likelihood over a specified subset of the data and then just called that appropriately for different implementations, but that's not the only sensible way of doing things and might not be the clearest for someone new to the code.
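Roughly, a sketch of that kind of shared helper, assuming a Gaussian likelihood with theta = (alpha, beta, log(sigma^2)); the names and signature below are illustrative, not code from this PR:

    // Sketch only: one Gaussian log-likelihood over observations [first, last),
    // reusable by both the data-annealing and likelihood-annealing examples.
    #include <RcppArmadillo.h>
    #include <cmath>

    namespace LinReg {

        // theta = (alpha, beta, log(sigma^2)); y and x hold the responses and
        // the (centred) predictor for the radiata pine data.
        inline double logLikelihood(const arma::vec & theta,
                                    const arma::vec & y,
                                    const arma::vec & x,
                                    arma::uword first,
                                    arma::uword last)
        {
            double sigma2 = std::exp(theta(2));
            double ll = 0.0;
            for (arma::uword i = first; i < last; ++i) {
                double resid = y(i) - (theta(0) + theta(1) * x(i));
                ll += -0.5 * std::log(2.0 * arma::datum::pi * sigma2)
                      - 0.5 * resid * resid / sigma2;
            }
            return ll;
        }
    }

    // Data annealing would then call logLikelihood(theta, y, x, 0, t) as t grows,
    // while likelihood annealing would temper gamma * logLikelihood(theta, y, x, 0, n).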
where \eqn{\epsilon_i ~ N(0,\sigma^2)} and \eqn{i=1,\ldots,42}.
Here \eqn{y} is the maximum compression strength and \eqn{x} is either the density
(model 1) or the adjusted density (model 2).
adamjohansen
Aug 1, 2017
Collaborator
Is it possible to tell what the difference between model 1 and model 2 is without reference to the original paper? It might be helpful to the uninitiated reader if it was clear exactly what the two are from the documentation.
LeahPrice
Aug 2, 2017
Author
Collaborator
The difference between the two models is in the choice of predictor: model 1 uses density and model 2 uses the adjusted density (taking into account the resin content). I've tried to make this a bit clearer in the commit I just did but I'm still not very happy with how it's documented.
I know that a more standard way to write this would be to write two sets of equations with different parameters. This is what's done in most papers:
y_i = \alpha + \beta (x_i - \bar{x}) + \epsilon_i where \epsilon_i ~ N(0,\sigma^2)
and
y_i = \gamma + \lambda (z_i - \bar{z}) + \epsilon_i where \epsilon_i ~ N(0,\tau^2)
I was trying to avoid using different notation for the two models because everything else is explained using the first notation and I didn't want to add subscripts because that looks messy in the .Rd format (unless I'm missing something).
If you think what I have now is unclear then please let me know. If we are going to change this work later so that it is more widely applicable then the description of this example would probably be less of an issue.
adamjohansen
Aug 2, 2017
Collaborator
My thinking on this is that most people reading it are likely to be familiar with the maths and so it shouldn't be too off-putting. What we want is for someone looking at the code to be able to understand what it does without consulting external references except where it's essential.
Actually, the two equations are the same up to the labelling of the variables and the values upon which you are regressing -- that is what you exploit in the code -- so my inclination would just be to tell people what the predictor is in a little bit more detail (just tell people what the density is the density of and what adjusted density means).
LeahPrice
Aug 2, 2017
Author
Collaborator
Great, thank you. Hopefully my most recent commit covers this (and I can change it if not).
"Here \eqn{y} is the maximum compression strength in pounds per square inch. The density (in pounds per cubic foot) of the radiata pine is considered a useful predictor, so model 1 uses density for \eqn{x}. Model 2 instead considers the density adjusted for resin content, which is associated with high density but not with strength."
I look forward to seeing this example once finalized. That is really nice to have too.
Giving more detail about the two competing models for the radiata pine regression example and removing obsolete comments about data handling.
Thanks; that should be enough for someone new to this to understand (qualitatively) what's going on, I think. This all looks good to me now.
@LeahPrice I just added two commits for minor polish. R CMD check was pointing out missing globals and missing imports, and the ChangeLog was not quite in the right format (Emacs knows best). Thanks for keeping the ChangeLog up to date.