
next manual 2.6.0 #1081

Closed
bob-carpenter opened this issue Oct 12, 2014 · 46 comments

@bob-carpenter
Contributor

This is where issues for the manual after 2.5.0 go.

@bob-carpenter bob-carpenter self-assigned this Oct 12, 2014
@bob-carpenter bob-carpenter added this to the v2.5.0++ milestone Oct 12, 2014
@bob-carpenter
Contributor Author

Also from Andrew:

Every time I see “Normal,” I wince—but maybe that’s just my problem!

I can live with "normal(0,1)", but don't like the "N(0,1)" used in BDA.

  • lowercase all math densities, e.g., replacing "Normal" with "normal", including all the ones named after people such as "Weibull" and "Wishart"
  • no, don't do it. The problem is for distros like ExpModNormal, which need camel case in order to be parsable.

@bob-carpenter
Contributor Author

  • break sparse and ragged coding out into its own chapter

@betanalpha
Contributor

  • In 9.3 "Zero-Inflated Models", change "Other distributions than the Poisson can also be inflated in this way." to "Other discrete distributions than the Poisson can also be inflated in this way." Zero-inflation doesn't work for continuous distributions.

@syclik syclik modified the milestones: v2.5.0++, Future Oct 20, 2014
@bob-carpenter
Contributor Author

From Rob Goedman:

  • include pointers to new interfaces
    • Julia
    • MATLAB

@syclik
Member

syclik commented Oct 21, 2014

  • fix —fixme— in Assigning subsection in Array Data Types section

@bob-carpenter
Contributor Author

  • add cbind and rbind in index with pointers to append_col and append_row rather than signatures to help out R users looking for the functions.

Anything else we want to do this for?

@bob-carpenter
Contributor Author

From @seldomworks in #1097:

There seems to be a typo on page 34 of stan-reference-2.5.0.pdf. The PDF has

model {
  y ~ normal(x*beta, sigma); // likelihood
}

I think it was intended to be

model {
  y ~ normal(x*beta + alpha, sigma); // likelihood
}
  • clarify this by either adding the alpha back in or mentioning explicitly that x is assumed to include a column of 1s for the intercept; the advantage of keeping alpha separate is that it can be given a different prior (a sketch of both options follows)
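
A minimal sketch of the two alternatives, reusing the names from the example above (not a complete program):

  // alternative 1: explicit intercept alpha, which can get its own prior
  y ~ normal(x * beta + alpha, sigma);

  // alternative 2: fold the intercept into beta by giving x a leading column of 1s
  y ~ normal(x * beta, sigma);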

@bob-carpenter
Contributor Author

  • Add discussion of multiple change points as extension to change point section

Where the single change point model has one loop over s, you instead
need to keep track of pairs (s1, s2). The easiest way to do that is
with a matrix:

  matrix[T, T] lp;
  lp <- rep_matrix(log_unif, T, T);   // T x T matrix filled with log(1/T)
  for (s1 in 1:T)
    for (s2 in 1:T)
      for (t in 1:T)
        lp[s1, s2] <- lp[s1, s2]
                      + poisson_log(D[t],
                                    if_else(t < s1, e, if_else(t < s2, m, l)));

and then the model block needs to be changed to convert the matrix lp to a vector so it can be passed to log_sum_exp:

  increment_log_prob(log_sum_exp(to_vector(lp)));

The problem is that as there are more change points, the computational complexity grows. You can see it intuitively from the loops.

Suppose there are N items. With a single change point, you need to consider all N positions. Each position requires O(N) work, so the overall complexity is O(N^2).

With two change points you need to consider all (N choose 2) pairs of change points. That's a quadratic number of pairs, each requiring O(N) work, so the overall complexity is O(N^3).

@bob-carpenter
Contributor Author

From Sebastian Weber on stan-users:

  • in the MCMC algorithms chapter, include an example of the kind of model you'd run with fixed parameters (e.g., one that uses the generated quantities block to generate variates; a sketch follows below)
  • make sure to mention that you can have parameters in the model and initialize them in the usual ways
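
A minimal sketch of the kind of model meant in the first bullet (the names N, mu, sigma, and y are made up for illustration):

data {
  int<lower=0> N;
  real mu;
  real<lower=0> sigma;
}
model {
  // intentionally empty: nothing is estimated
}
generated quantities {
  real y[N];
  for (n in 1:N)
    y[n] <- normal_rng(mu, sigma);
}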

@bob-carpenter
Contributor Author

  • clean up parameterizations of multilevel 2PL IRT model to ensure identifiability even if the priors don't exactly match the ones given (p. 49)

@bob-carpenter
Contributor Author

Related to a modeling issue brought up by Guido Biele on stan-users list:

  • in the truncation section, add discussion of how y ~ normal(mu,sigma) T[L,] requires y > L or the probability is zero; in user-written truncations, say in a mixture, the truncation must be handled explicitly (a sketch follows below)
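
For example, a minimal sketch of writing that truncation out by hand, as one would have to inside a mixture (assuming a lower truncation point L declared in data and y declared with <lower=L>):

  // y ~ normal(mu, sigma) T[L,] written explicitly;
  // the normalizing term is the log complementary CDF at L
  increment_log_prob(normal_log(y, mu, sigma)
                     - normal_ccdf_log(L, mu, sigma));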

@bob-carpenter
Contributor Author

From Andrew via e-mail:

  • on page 92, in block of code, there should be a semicolon after sigma_x

@bob-carpenter
Contributor Author

  • pull discussion of ragged and sparse matrices into its own chapter

@bob-carpenter
Contributor Author

  • change reference to "standard vector" in doc for segment function to "array" (manual should be written from the user's perspective, where there's no notion of "standard vector") [p. 329]
  • check for other instances to change

@jrnold
Contributor

jrnold commented Nov 19, 2014

Regarding the Kalman filter examples, I have examples and fully implemented Kalman filtering (of several flavors: with / without missing values, as batch / sequentially) with backward sampling in the generated quantities block here: https://github.com/jrnold/ssmodels-in-stan. I could probably write up that section.

@bob-carpenter
Contributor Author

From David Hallvig in issue #1138

  • fix refs
  • thank David Hallvig

The HMM section is part of the Time-Series Models chapter (Ch. 6), but the first paragraph of that chapter states that HMMs are covered in a later chapter (although the reference given points to a subsection of the Time-Series Models chapter itself, i.e., 6.6).

Here's the relevant section:

\chapter{Time-Series Models}

\noindent
Times series data come arranged in temporal order. This chapter
presents two kinds of time series models, regression-like models such
as autogression and moving average models, and hidden Markov models.

In later chapters, we discuss two alternative models which may be
applied to time-series data, 
%
\begin{itemize}
\item Gaussian processes (GP) in \refchapter{gaussian-processes} and 
\item hidden Markov models (HMM) in \refsection{hmms}.
\end{itemize}

@bob-carpenter
Contributor Author

  • summarize discussion in doc

Question from Jon Zelner on stan-users:

My implementation is essentially the same as the vanilla GP implementation on page 130 of the Stan reference (hence the lack of a code example). However, since certain symptoms may be less important to the lab-confirmed diagnosis than others, I have been trying to implement the automatic relevance determination on page 133 of the reference manual. When I try to do this, however, the parameter values tend to blow up. So, I was wondering if anyone out there with experience working with these kinds of models had a suggestion for a) priors for the relevance parameters and/or b) constraints that might make for a more useful model.

Response from Aki:

Note also that rho and eta are only weakly identifiable; their ratio is better identified and affects the nonlinearity. However, it is usual to use independent priors (I don't remember if anyone uses a joint prior with dependency).

I usually like to think about a suitable prior in terms of the length scale l = 1/rho. If the length scale is larger than the scale of the data, the model is practically linear (with respect to that covariate), and increasing the length scale does not change the model. Thus you should use a prior which goes down for larger length scale values. If the length scale is so small that the correlation between data points is zero, then decreasing the length scale further does not change the model either. Usually I've had no need to keep the length scale from going to very small values, but sometimes it's necessary. I usually use a half-t prior for the length scale as a weakly informative prior.

Eta corresponds to how much of the variation is explained by the regression function and has a similar role to the prior variance of linear model weights. Thus we can use the same weakly informative priors as in linear models. I often use a half-t prior for eta.

Question from Herra Huu:

I'm a bit puzzled by Aki's response. It would make sense to me if the dimension of the input (== D) were one. But if I understood the original question correctly, here we would have:
a) D > 1
b) covariance function: f(x[i], x[j]) = eta * exp(-sum_{d=1}^{D} rho[d] * pow(x[i,d] - x[j,d], 2))

Why, in this case, is the term “automatic relevance determination” misleading? I mean, if for example rho[1] == 0, then we could just drop the first input dimension and we would still get exactly the same results. In general, wouldn't it be true that the closer rho[d] is to zero, the less effect x[,d] would have? (Well, the original scale of the inputs matters, so it's not exactly that straightforward unless we normalize the inputs, etc.)

Response from Andrew:

I accept that we should continue to use the term “automatic relevance determination” because it exists, and people use it. But I find the term a bit distracting because, from the perspective of Stan, it’s just hierarchical Bayesian modeling:

  1. It’s not any more “automatic” than any other Bayesian inference
  2. The “relevance” interpretation seems tied to some very specific model choices
  3. “Determination” is just inference.

Response from Aki:

In general, wouldn't it be true that the closer the rho[d] is to zero the less effect x[,d] would have?

A priori yes, but not a posteriori, since the actual dependencies between x and y also have an effect. What I tried to say is that with a covariate x1 having a linear effect and another covariate x2 having a nonlinear effect, it is possible that rho1 < rho2 even if the predictive relevance of x1 is higher. The rho is related to the relevance, but it is more accurate to say that it measures the nonlinearity (or the expected number of upcrossings; GPML, p. 80). I couldn't quickly find a nice example with a GP, but figures 1 and 3 in http://becs.aalto.fi/en/research/bayes/publications/LampinenVehtari_NN2001_preprint.pdf illustrate the same issue with an MLP. We have run the same experiment with GPs, but I couldn't quickly find whether I still have the figures.
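
A minimal sketch of the kind of priors Aki describes, written with explicit length scales l = 1/rho and a squared-exponential ARD covariance (the variable names and half-Cauchy scales are illustrative, not taken from the manual):

data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];
  vector[N] y;
}
parameters {
  vector<lower=0>[D] l;   // per-dimension length scales
  real<lower=0> eta;      // marginal standard deviation
  real<lower=0> sigma;    // noise scale
}
model {
  matrix[N, N] K;
  for (i in 1:N) {
    for (j in 1:N) {
      real d2;
      d2 <- 0;
      for (d in 1:D)
        d2 <- d2 + square((x[i, d] - x[j, d]) / l[d]);
      K[i, j] <- square(eta) * exp(-0.5 * d2);
    }
    K[i, i] <- K[i, i] + square(sigma);
  }
  // weakly informative priors; the <lower=0> constraints make these half-Cauchy
  l ~ cauchy(0, 2.5);
  eta ~ cauchy(0, 2.5);
  sigma ~ cauchy(0, 2.5);
  y ~ multi_normal(rep_vector(0, N), K);
}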

@syclik
Member

syclik commented Nov 24, 2014

  • get rid of references to models by path altogether in the manual. There are only links for some old models --- it is hard to maintain.

@syclik syclik modified the milestones: v2.5.0++, Future Nov 24, 2014
@aadler
Contributor

aadler commented Dec 19, 2014

From above: “lowercase all math densities, e.g., replacing "Normal" with "normal", including all the ones named after people such as "Weibull" and "Wishart"”. Doesn't every manual of style require the capitalization of proper names?

@andrewgelman
Collaborator

You can get Bob and me all wound up on this one! In my books, I capitalize Poisson and Wishart and Bernoulli but not normal and binomial and gamma. But I recall that Bob made a compelling argument that, in a software manual, typographical consistency is a more important concern.


@bob-carpenter
Contributor Author

Typographically, the Stan manual uses:

  • sans-serif: mathematical probability functions
  • typewriter: code functions
  • serif: running text and other mathematical functions like log() or exp()

Gelman and Hill, and Gelman et al., use:

  • typewriter: code functions
  • serif: running text, mathematical probability functions, and other mathematical functions

In Andrew's book with Jennifer, you see "dpois" in code, because
it's BUGS code (which Andrew calls "Bugs"). And then you see three
conventions for distributions: N for normal, upper cased for those named
after a person, and lower cased for others, all in the running text font.

I'm OK with

sans-serif OR serif for mathematical prob functions

and OK with

lower-case or upper-case for probability functions

I'm less happy with "N" or "G" for normal or gamma because it looks inconsistent,
and even less happy when people put it in a script (calligraphic, in TeX-speak) font.
And if we go lower case, I'd just as soon lower-case "poisson" and "weibull".

Andrew --- do you want to decide? There's some work in changing it because there
are hundreds of pages packed with distribution names, but some of the changes
are just macros.

-- Bob


@aadler
Contributor

aadler commented Dec 21, 2014

I think it boils down to the following question: do you view the Stan manual as a book or text in its own right, or is it merely a convenient place to store documentation that really could or should all be online? If the former, then you do need to follow some accepted manual of style, and none that I know of allows a lowercase first letter for a person's name. If the latter (i.e., if Stan were interactive, you wouldn't have a document; you would have R-like web pages that pop up when calling help, but the software as such doesn't allow that), then I think you can be a bit more lax. The fact that you have a separate citation for the manual from that of the software implies that you view the Stan manual as a stand-alone written text (eligible for an ISBN, perhaps?), and I think you should, at the very least, uppercase proper names.

As for what to do with normal or negative binomial, I don't have a good source for that. In my own writing, I tend to leave them lowercase as per standard English, but I can hear the argument that in the realm of statistics, they are the "proper names" of the distributions.

If you are transcribing code (dpois for example) then I believe that as long as it is explicit that it is code (blockquote or typewriter font) it needs to be printed exactly as it should be in a program. Capitalization would be contraindicated if the code is meant to be used lowercase.

From reading math textbooks, I'm comfortable with calligraphic N for normal, and I think 99% of those who use Stan would be as well. I'm less comfortable with G for gamma, as I've also seen the actual Greek letter used, and gamma is spelled out much more often than normal is. So if I had my druthers (which I don't, I know :) ), I'd go with the current Stan typography, which separates mathematical formulæ from text and from code, and go with the "proper names upper / standard words lower" split for probability functions, as if there isn't anything special about them (treat them like the rest of the English language). As for N/G, I think you can get away with N, but it's probably better to spell everything out.

I certainly agree that regardless of the final decision, the manual must be consistent.

@bob-carpenter
Contributor Author

We think of it as a manual. There are really three parts to
the manual:

  • user's/programmer's guide
  • reference manual
  • intro to Bayesian inference, MCMC, MLE and optimization, etc.

Are you equating "online" with being in HTML format? Other people
have said we should render in HTML for searchability.

Don't worry, we'll continue to capitalize names used as names.
The question is only what to do with the mathematical function
symbols, which are neither running text (where they're clearly
capitalized) nor computer code (where they're clearly lowercased).

-- Bob


@bob-carpenter
Contributor Author

Moved from issue #1180 created by @ksvanhorn:

On p. 189 of the Stan Modeling Language Manual, there is example code for reparameterizing a Wishart distribution. The code is incorrect: the last column of the matrix A is never initialized.

  • It needs something like the following additional lines:
    for (i in 1:(K-1))
        A[i,K] <- 0;
    A[K,K] <- sqrt(c[K]);
  • The example code for reparameterizing an inverse Wishart distribution has the same issue.
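
For context, a sketch of how those lines fit into the fill of the factor A (the loop structure and the names c and z follow the manual's example as I recall it, so treat this as illustrative rather than exact):

  matrix[K, K] A;
  {
    int count;
    count <- 1;
    for (j in 1:(K - 1)) {
      for (i in (j + 1):K) {
        A[i, j] <- z[count];     // standard normal below the diagonal
        count <- count + 1;
      }
      for (i in 1:(j - 1))
        A[i, j] <- 0;            // zeros above the diagonal
      A[j, j] <- sqrt(c[j]);     // sqrt of chi-square variate on the diagonal
    }
    for (i in 1:(K - 1))
      A[i, K] <- 0;              // the previously uninitialized last column
    A[K, K] <- sqrt(c[K]);
  }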

@bob-carpenter
Contributor Author

  • include discussion of decision to throw exceptions at boundary in general overview of probability functions in same section as vectorization

@bob-carpenter
Contributor Author

Krzysztof Sakrejda on stan-users suggested an alternative description for the center_lp function example:

Here is an example of a function to assign standard normal priors to a vector of coefficients, along with a center and scale, and return the translated and scaled coefficients.
- [x] use Krzysztof's formulation
- [x] add cross-ref to parameterization discussion
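
A minimal sketch of a function along the lines Krzysztof describes (the signature and names are illustrative; the manual's actual example may differ):

functions {
  vector center_lp(vector beta_raw, real mu, real sigma) {
    beta_raw ~ normal(0, 1);        // standard normal prior on the raw coefficients
    return mu + sigma * beta_raw;   // translated and scaled coefficients
  }
}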

@bob-carpenter
Contributor Author

  • add process description chapter

http://www.r-project.org/doc/R-FDA.pdf

Something along the lines of the SDLC section would be great to have in the user manual. Basically, this describes the software life cycle: how Stan is programmed, released, and managed.

@blindglobe

Regarding the "process description chapter": I think you mean "Process for Software Development and Release", which would be wonderful for supporting corporate IT computer systems validation work by providing justification that "Stan is developed in a way which makes it fit for the purpose of performing Bayesian statistical analyses". Sure, it might seem obvious, but for dotting the i's and crossing the t's in a critical review, it makes things that much easier if the process for software development is described and followed. I'd be happy to review and comment before you release, if you want (speaking as co-ghost-writer of the R-FDA document cited).

@bob-carpenter
Contributor Author

  • replace lambda/sqrt(pi) with lambda/2 in exp_mod_normal doc (the implementation is correct, but our doc was wrong)
  • thank Andrew Ellis in the manual for pointing it out
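
For reference, the exponentially modified normal density in the usual parameterization has lambda/2 as its leading factor (worth double-checking against the implementation before copying into the doc):

$\text{ExpModNormal}(y \mid \mu, \sigma, \lambda) = \frac{\lambda}{2} \exp\!\left(\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2y\right)\right) \text{erfc}\!\left(\frac{\mu + \lambda\sigma^2 - y}{\sqrt{2}\,\sigma}\right)$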

@bob-carpenter
Contributor Author

  • add clarification that inverse Wishart and Wishart both take the scale matrix S as a parameter
  • compare to BDA's notation Sigma ~ Inv-Wishart(inv(S))

@bob-carpenter
Contributor Author

  • Fix this one (from Ben on stan-users)

Note that the next sentence of the manual is wrong. It should say

At $\nu = 1$, the LKJ correlation distribution reduces to the uniform distribution over correlation matrices of order $K$.
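
For reference, the reasoning: the LKJ density on a correlation matrix $\Sigma$ of order $K$ is proportional to $\det(\Sigma)^{\nu - 1}$, which is constant when $\nu = 1$, hence uniform over correlation matrices.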

@bob-carpenter
Contributor Author

Contributed by Gokcen Eraslan via patch #1227, which we haven't merged because it was made against master:

  • remove duplicate "externally" in programming.tex after soft k-means
  • thank Gökçen in the acknowledgments

@bob-carpenter
Contributor Author

John Sutton mentioned on stan-users:

  • append_row needs to have its own description, not just a cut-and-paste of append_col
  • thank John in the acknowledgments

@bob-carpenter
Contributor Author

  • undo the fix that went in for Andrew on cbind and rbind so as not to make the index confusing as to what is a function and what isn't

@bob-carpenter
Contributor Author

  • thank Juan Sebastián Casallas for a doc patch

@bob-carpenter
Contributor Author

Krzysztof Sakrejda pointed out on stan-users that this is wrong:

Arrays, on the other hand, should be traversed in 
row-major (or first-index fastest) order.
  • fix it
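
If I recall the storage conventions correctly (arrays are row-major with the last index varying fastest, matrices are column-major with the first index varying fastest), the efficient loop orders would look something like this sketch:

  // hypothetical fragment; a, m, I, and J are assumed declared elsewhere
  real total_a;
  real total_m;
  total_a <- 0;
  total_m <- 0;
  for (i in 1:I)
    for (j in 1:J)
      total_a <- total_a + a[i, j];   // arrays: last index in the innermost loop
  for (j in 1:J)
    for (i in 1:I)
      total_m <- total_m + m[i, j];   // matrices: row index in the innermost loop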

@sakrejda
Contributor

One more, in the section on "reparameterizing the Cauchy", p. 182: the text "The inverse of the cumulative distribution function, $F_X^{-1} : (0, 1) \rightarrow \mathbb{R}$, is thus" is followed by an equation for $F^{-1}(y)$ specified in terms of x.

  • \pi(x-1/2) should be \pi(y-1/2) ... (?)
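
For reference, the standard Cauchy CDF and its inverse, which is presumably what the corrected text should read: $F_X(x) = \frac{1}{\pi} \arctan(x) + \frac{1}{2}$, so $F_X^{-1}(y) = \tan\!\left(\pi\left(y - \frac{1}{2}\right)\right)$ for $y \in (0, 1)$.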

@bob-carpenter
Contributor Author

  • thank Alex Zvoleff for a code patch

@syclik syclik closed this as completed in eee24e4 Jan 28, 2015
@bob-carpenter
Contributor Author

Fixing some typos.

@bob-carpenter bob-carpenter reopened this Jan 28, 2015
@bob-carpenter
Contributor Author

  • fix issue with normal typesetting in ragged arrays chapter

@syclik syclik closed this as completed Feb 5, 2015