next manual 2.6.0 #1081
Also from Andrew: Every time I see “Normal,” I wince—but maybe that’s just my problem! I can live with "normal(0,1)", but don't like the "N(0,1)" used in BDA.
From Rob Goedman:
Anything else we want to do this for?
There seems to be a typo on page 34 of stan-reference-2.5.0.pdf. From @seldomworks in #1097:
I think it was intended to be
Where you have the single loop in s for the changepoint, you
and then in the model needs to be changed to convert the matrix
The problem is that as there are more change points, the computational complexity grows. You can see it intuitively from the loops. Suppose there are N items. With a single change point, you need to consider all N positions, and each position requires an amount N of work, so the overall complexity is O(N^2). With two change points, you need to consider all (N choose 2) pairs of positions; that's a quadratic number of pairs, each requiring an amount N of work, so the overall complexity is O(N^3).
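For concreteness, here is a minimal sketch of a marginalized single-change-point model for Poisson counts, in current Stan syntax (not the manual's exact code; the names D, e, and l are illustrative). The nested loop in transformed parameters is where the O(N^2) cost comes from: N candidate change points, each requiring a pass over all N observations.

```stan
data {
  int<lower=1> N;            // number of observations
  array[N] int<lower=0> D;   // counts
}
transformed data {
  real log_unif = -log(N);   // uniform prior over the N candidate change points
}
parameters {
  real<lower=0> e;           // rate before the change point
  real<lower=0> l;           // rate after the change point
}
transformed parameters {
  // lp[s] = log p(s) + log p(D | s, e, l); the double loop makes this
  // O(N^2) for a single change point
  vector[N] lp = rep_vector(log_unif, N);
  for (s in 1:N)
    for (n in 1:N)
      lp[s] += poisson_lpmf(D[n] | (n < s ? e : l));
}
model {
  e ~ exponential(1);
  l ~ exponential(1);
  target += log_sum_exp(lp);  // marginalize out the discrete change point s
}
```

With two change points the same construction needs an lp entry for every pair of positions, which is what drives the complexity to O(N^3).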
From Sebastian Weber on stan-users:
Related to a modeling issue brought up by Guido Biele on stan-users list:
From Andrew via e-mail:
Regarding the Kalman filter examples, I have examples and fully implemented Kalman filtering (of several flavors: with / without missing values, as batch / sequentially) with backward sampling in the generated quantities block here: https://github.com/jrnold/ssmodels-in-stan. I could probably write up that section.
From David Hallvig in issue #1138
HMMs are part of the Time-Series Models chapter (Ch. 6) but are stated, in the first paragraph of that chapter, to be part of a later chapter, even though the reference given points to a subsection of the Time-Series Models chapter itself (i.e., 6.6). Here's the relevant section:
Question from Jon Zelner on stan-users: My implementation is essentially the same as the vanilla GP implementation on page 130 of the Stan reference (hence the lack of a code example). However, since certain symptoms may be less important to the lab-confirmed diagnosis than others, I have been trying to implement the automatic relevance determination on page 133 of the reference manual. When I try to do this, however, the parameter values tend to blow up. So I was wondering if anyone out there with experience working with these kinds of models had a suggestion for a) priors for the relevance parameters and/or b) constraints that might make for a more useful model.

Response from Aki: Note also that rho and eta are weakly identifiable; the ratio of them is better identified and affects the nonlinearity. However, it is usual to use independent priors (I don't remember if anyone uses a joint prior with dependency). I usually like to think about a suitable prior for the length scale l = 1/rho. If the length scale is larger than the scale of the data, the model is practically linear (with respect to the particular covariate) and increasing the length scale does not change the model, so you should use a prior which goes down for larger length-scale values. If the length scale is so small that the correlation between data points is zero, then decreasing the length scale further does not change the model; usually I've had no need to restrict the length scale from going to very small values, but sometimes I have. I usually use a half-t prior for the length scale as a weakly informative prior. Eta corresponds to how much of the variation is explained by the regression function and has a similar role to the prior variance for linear-model weights, so we can use the same weakly informative priors as in linear models; I often use a half-t prior.

Question from Herra Huu: I'm a bit puzzled by Aki's response. It would make sense to me if the dimension of the input (== D) were one. But if I understood the original question correctly, here we would have:
Why, in this case, is the term "automatic relevance determination" misleading? I mean, if for example rho[1] == 0, then we could just drop the first input dimension and we would still get exactly the same results. In general, wouldn't it be true that the closer rho[d] is to zero, the less effect x[,d] would have? (Well, the original scale of the inputs matters, so it's not exactly that straightforward unless we normalize the inputs, etc.)

Response from Andrew: I accept that we should continue to use the term "automatic relevance determination" because it exists and people use it. But I find the term a bit distracting because, from the perspective of Stan, it's just hierarchical Bayesian modeling:
Response from Aki: "In general, wouldn't it be true that the closer rho[d] is to zero, the less effect x[,d] would have?" A priori, yes, but not a posteriori, as the actual dependencies between x and y also matter. What I tried to say is that with a covariate x1 having a linear effect and another covariate x2 having a nonlinear effect, it is possible that rho1 < rho2 even if the predictive relevance of x1 is higher. Rho is related to the relevance, but it is more accurate to say that it measures the nonlinearity (or the expected number of upcrossings, GPML p. 80). I couldn't quickly find a nice example with a GP, but figures 1 and 3 in http://becs.aalto.fi/en/research/bayes/publications/LampinenVehtari_NN2001_preprint.pdf illustrate the same issue with an MLP. We have done the same experiment with GPs, but I just couldn't find whether I have the figures somewhere.
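To make Aki's suggestion concrete, here is a minimal sketch of GP regression with a squared-exponential ARD covariance, parameterized by per-dimension length scales l[d] = 1/rho[d] with weakly informative half-t priors on the length scales and on eta. This is not the manual's example; the names x, y, l, eta, and sigma are illustrative, and the unit prior scales assume roughly standardized inputs and outputs.

```stan
data {
  int<lower=1> N;
  int<lower=1> D;
  array[N] vector[D] x;      // inputs
  vector[N] y;               // outputs
}
parameters {
  vector<lower=0>[D] l;      // per-dimension length scales (ARD), l[d] = 1 / rho[d]
  real<lower=0> eta;         // marginal standard deviation of the GP
  real<lower=0> sigma;       // observation noise standard deviation
}
model {
  matrix[N, N] K;
  for (i in 1:N) {
    K[i, i] = square(eta) + square(sigma);
    for (j in (i + 1):N) {
      real d2 = 0;
      for (d in 1:D)
        d2 += square((x[i][d] - x[j][d]) / l[d]);
      K[i, j] = square(eta) * exp(-0.5 * d2);
      K[j, i] = K[i, j];
    }
  }
  // weakly informative half-t priors (half via the <lower=0> constraints);
  // the prior density on l decays for large length scales, as recommended above
  l ~ student_t(4, 0, 1);
  eta ~ student_t(4, 0, 1);
  sigma ~ student_t(4, 0, 1);
  y ~ multi_normal(rep_vector(0, N), K);
}
```

A length scale much larger than the scale of the data makes the function practically linear in that covariate, which is the sense in which rho (or l) measures nonlinearity rather than predictive relevance.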
From above: “lowercase all math densities, e.g., replacing "Normal" with "normal", including all the ones named after people such as "Weibull" and "Wishart"”. Doesn't every manual of style require the capitalization of proper names?
You can get Bob and me all wound up on this one! In my books, I capitalize Poisson and Wishart and Bernoulli but not normal and binomial and gamma. But I recall that Bob made a compelling argument that, in a software manual, typographical consistency is a more important concern.
Typographically, the Stan manual uses sans-serif for mathematical probability functions; Gelman and Hill and Gelman et al. use typewriter for code functions (in Andrew's book with Jennifer, you see "dpois" in code). I'm OK with sans-serif or serif for mathematical probability functions and OK with lower-case or upper-case for probability functions, but I'm less happy with "N" or "G" for normal or gamma because it looks inconsistent. Andrew, do you want to decide? There's some work in changing it because there
I think it boils down to the following question: do you view the Stan manual as a book or text in its own right, or is it merely a convenient place to store documentation, which really could or should all be online? If the former, then you do need to follow some accepted manual of style, none of which that I know of allow a lowercase first letter for a person's name. If the latter (i.e., if Stan were interactive, you wouldn't have a document, you would have R-like web pages that pop up by calling help, but the software as such doesn't allow that), then I think you can be a bit more lax. The fact that you have a separate citation for the manual from that of the software implies that you view the Stan manual as a stand-alone written text (eligible for an ISBN, perhaps?), and I think you should, at the very least, uppercase proper names. As for what to do with normal or negative binomial, I don't have a good source for that. In my own writing, I tend to leave them lowercase as per standard English, but I can hear the argument that in the realm of statistics they are the "proper names" of the distributions. If you are transcribing code (
From reading math textbooks, I'm comfortable with calligraphic N for normal, and I think 99% of those who use Stan would be as well. I'm less comfortable with G for gamma, as I've also seen the actual Greek letter used, as well as it spelled out much more often than normal is spelled out. So if I had my druthers (which I don't, I know :) ), I'd go with the current Stan typography, which separates mathematical formulæ from text from code, and go with the "proper names upper / standard words lower" split for probability functions, as if there isn't anything special about them (treat them like the rest of the English language). As for N/G, I think you can get away with N, but it's probably better to spell everything out. I certainly agree that regardless of the final decision, the manual must be consistent.
We think of it as a manual. There are really three parts to
Are you equating "online" with being in HTML format? Other people
Don't worry, we'll continue to capitalize names used as names.
Moved from issue #1180, created by @ksvanhorn: On p. 189, the Stan Modeling Language Manual gives example code for reparameterizing a Wishart distribution. The code is incorrect: the last column of the matrix A is never initialized.
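For reference, here is a minimal sketch of the Bartlett-decomposition reparameterization with the matrix A zero-initialized in full, so that its upper triangle, including the last column, is defined before use. This is not the manual's code; the names c, z, and A are illustrative.

```stan
data {
  int<lower=1> K;
  cov_matrix[K] S;                 // scale matrix of the Wishart
  real<lower=K - 1> nu;            // degrees of freedom
}
transformed data {
  matrix[K, K] L = cholesky_decompose(S);
}
parameters {
  vector<lower=0>[K] c;            // c[k] ~ chi_square(nu - k + 1)
  vector[K * (K - 1) / 2] z;       // standard normals for the below-diagonal entries
}
model {
  for (k in 1:K)
    c[k] ~ chi_square(nu - k + 1);
  z ~ normal(0, 1);
}
generated quantities {
  matrix[K, K] W;
  {
    // zero-fill all of A first, so no entry (in particular, none in the
    // last column) is left uninitialized
    matrix[K, K] A = rep_matrix(0, K, K);
    int pos = 1;
    for (j in 1:K) {
      A[j, j] = sqrt(c[j]);
      for (i in (j + 1):K) {
        A[i, j] = z[pos];
        pos += 1;
      }
    }
    W = L * A * A' * L';           // Bartlett decomposition: W ~ wishart(nu, S)
  }
}
```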
Krzysztof Sakrejda on stan-users suggested an alternative description for the
"Here is an example of a function to assign standard normal priors to a vector of coefficients, along with a center and scale, and return the translated and scaled coefficients."

- [x] use Krzysztof's formulation
- [x] add cross-ref to parameterization discussion
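Something along the lines of that description could look like the following minimal sketch (hypothetical names; the unit noise scale in the regression is fixed only for brevity). The _lp suffix lets the function contribute the standard normal priors to the log density when called from the transformed parameters block.

```stan
functions {
  // Assign standard normal priors to the raw coefficients and return the
  // translated and scaled coefficients; the _lp suffix permits the
  // sampling statement inside the function body.
  vector center_lp(vector beta_raw, real mu, real sigma) {
    beta_raw ~ normal(0, 1);
    return mu + sigma * beta_raw;
  }
}
data {
  int<lower=0> N;
  int<lower=1> K;
  matrix[N, K] x;
  vector[N] y;
}
parameters {
  vector[K] beta_raw;        // unscaled, uncentered coefficients
  real mu;                   // center
  real<lower=0> sigma;       // scale
}
transformed parameters {
  vector[K] beta = center_lp(beta_raw, mu, sigma);
}
model {
  mu ~ cauchy(0, 2.5);
  sigma ~ cauchy(0, 5);
  y ~ normal(x * beta, 1);   // unit noise scale, for brevity
}
```

This is the same non-centered parameterization discussed elsewhere in the manual, just packaged as a reusable function, which is what the cross-reference item above is for.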
http://www.r-project.org/doc/R-FDA.pdf
Something along the lines of the SDLC section would be great to have in your user manual. Basically, this describes the software life cycle: how Stan is programmed, released, and managed.
regarding the "process description chapter", I think you mean, "Process for Software Development and Release", which would be wonderful for supporting corporate IT computer systems validation work by providing justification that "STAN is developed in a way which makes it fit for the purpose of the process of performing a Bayesian statistical analyses". Sure it might seem obvious, but for dotting the i's and crossing the t's for a critical review, it makes things that much easier if the process for software development is described and followed. I'd be happy to review and comment if you want before you release (speaking as co-ghost-writer of the R-FDA document cited). |
Note that the next sentence of the manual is wrong. It should say
Contributed by Gokcen Eraslan via patch #1227, which we haven't merged because it was submitted against master:
John Sutton mentioned on stan-users:
Krzysztof Sakrejda pointed out on stan-users that this is wrong:
One more, in the section on "reparameterizing the Cauchy", p. 182: the text "The inverse of the cumulative distribution function, F_X^{-1} : (0, 1) → R, is thus" is followed by an equation for F^{-1}(y) specified in terms of x.
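For reference, assuming the location-scale Cauchy(mu, tau) form used in that discussion, the corrected pair of LaTeX lines might read, with the inverse written consistently in terms of u rather than x:

```latex
F_X(x) = \frac{1}{\pi} \arctan\left(\frac{x - \mu}{\tau}\right) + \frac{1}{2},
\qquad
F_X^{-1}(u) = \mu + \tau \, \tan\left(\pi \left(u - \frac{1}{2}\right)\right),
\quad u \in (0, 1).
```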
Fixing some typos.
This is where issues for the manual after 2.5.0 go.