
next manual 2.6.0 #1081

Closed
bob-carpenter opened this issue Oct 12, 2014 · 46 comments

@bob-carpenter
Contributor

This is where issues for the manual after 2.5.0 go.

@bob-carpenter bob-carpenter self-assigned this Oct 12, 2014
@bob-carpenter bob-carpenter added this to the v2.5.0++ milestone Oct 12, 2014
@bob-carpenter
Contributor Author

Also from Andrew:

Every time I see “Normal,” I wince—but maybe that’s just my problem!

I can live with "normal(0,1)", but don't like the "N(0,1)" used in BDA.

  • lowercase all math densities, e.g., replacing "Normal" with "normal", including all the ones named after people such as "Weibull" and "Wishart"
  • no, don't do it. The problem is for distros like ExpModNormal, which need camel case in order to be parsable.

@bob-carpenter
Contributor Author

  • break sparse and ragged coding out into its own chapter

@betanalpha
Contributor

  • In 9.3 "Zero-Inflated Models", change "Other distributions than the Poisson can also be inflated in this way." to "Other discrete distributions than the Poisson can also be inflated in this way." Zero-inflation doesn't work for continuous distributions.

@syclik syclik modified the milestones: v2.5.0++, Future Oct 20, 2014
@bob-carpenter
Contributor Author

From Rob Goedman:

  • include pointers to new interfaces
    • Julia
    • MATLAB

@syclik
Member

syclik commented Oct 21, 2014

  • fix —fixme— in Assigning subsection in Array Data Types section

@bob-carpenter
Contributor Author

  • add cbind and rbind in index with pointers to append_col and append_row rather than signatures to help out R users looking for the functions.

Anything else we want to do this for?

@bob-carpenter
Contributor Author

From @seldomworks in #1097:

There seems to be a typo on page 34 of stan-reference-2.5.0.pdf. The PDF has

model {
  y ~ normal(x*beta, sigma); // likelihood
}

I think it was intended to be

model {
  y ~ normal(x*beta + alpha, sigma); // likelihood
}
  • clarify this by either adding the alpha back in or mentioning explicitly that x is assumed to include a column of 1s for the intercept; the advantage of keeping alpha separate is that it can be given a different prior (a sketch of both options follows)
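
A minimal sketch of the two alternatives, reusing the names from the example above (not a complete program):

  // alternative 1: explicit intercept alpha, which can get its own prior
  y ~ normal(x * beta + alpha, sigma);

  // alternative 2: fold the intercept into beta by giving x a leading column of 1s
  y ~ normal(x * beta, sigma);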

@bob-carpenter
Contributor Author

  • Add discussion of multiple change points as extension to change point section

Where the single change point model has one loop over s, you instead
need to keep track of pairs (s1, s2). The easiest way to do that is
with a matrix:

  matrix[T, T] lp;
  lp <- rep_matrix(log_unif, T, T);   // T x T matrix filled with log(1/T)
  for (s1 in 1:T)
    for (s2 in 1:T)
      for (t in 1:T)
        lp[s1, s2] <- lp[s1, s2]
                      + poisson_log(D[t],
                                    if_else(t < s1, e, if_else(t < s2, m, l)));

and then the model block needs to be changed to convert the matrix lp to a vector so it can be passed to log_sum_exp:

  increment_log_prob(log_sum_exp(to_vector(lp)));

The problem is that as there are more change points, the computational complexity grows. You can see it intuitively from the loops.

Suppose there are N items. With a single change point, you need to consider all N positions. Each position requires O(N) work, so the overall complexity is O(N^2).

With two change points you need to consider all (N choose 2) pairs of change points. That's a quadratic number of pairs, each requiring O(N) work, so the overall complexity is O(N^3).

@bob-carpenter
Contributor Author

From Sebastian Weber on stan-users:

  • in the MCMC algorithms chapter, include an example of the kind of model you'd run with fixed parameters (e.g., one that uses the generated quantities block to generate variates; a sketch follows below)
  • make sure to mention that you can have parameters in the model and initialize them in the usual ways
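
A minimal sketch of the kind of model meant in the first bullet (the names N, mu, sigma, and y are made up for illustration):

data {
  int<lower=0> N;
  real mu;
  real<lower=0> sigma;
}
model {
  // intentionally empty: nothing is estimated
}
generated quantities {
  real y[N];
  for (n in 1:N)
    y[n] <- normal_rng(mu, sigma);
}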

@bob-carpenter
Contributor Author

  • clean up parameterizations of multilevel 2PL IRT model to ensure identifiability even if the priors don't exactly match the ones given (p. 49)

@bob-carpenter
Contributor Author

Related to a modeling issue brought up by Guido Biele on stan-users list:

  • in the truncation section, add discussion of how y ~ normal(mu,sigma) T[L,] requires y > L or the probability is zero; in user-written truncations, say in a mixture, the truncation must be handled explicitly (a sketch follows below)
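
For example, a minimal sketch of writing that truncation out by hand, as one would have to inside a mixture (assuming a lower truncation point L declared in data and y declared with <lower=L>):

  // y ~ normal(mu, sigma) T[L,] written explicitly;
  // the normalizing term is the log complementary CDF at L
  increment_log_prob(normal_log(y, mu, sigma)
                     - normal_ccdf_log(L, mu, sigma));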

@bob-carpenter
Contributor Author

From Andrew via e-mail:

  • on page 92, in block of code, there should be a semicolon after sigma_x

@bob-carpenter
Contributor Author

  • pull discussion of ragged and sparse matrices into its own chapter

@bob-carpenter
Contributor Author

  • change reference to "standard vector" in doc for segment function to "array" (manual should be written from the user's perspective, where there's no notion of "standard vector") [p. 329]
  • check for other instances to change

@jrnold
Contributor

jrnold commented Nov 19, 2014

Regarding the Kalman filter examples, I have examples and fully implemented Kalman filtering (of several flavors: with / without missing values, as batch / sequentially) with backward sampling in the generated quantities block here: https://github.com/jrnold/ssmodels-in-stan. I could probably write up that section.

@bob-carpenter
Contributor Author

From David Hallvig in issue #1138

  • fix refs
  • thank David Hallvig

The HMM section is part of the Time-Series Models chapter (Ch. 6), but the first paragraph of that chapter states that HMMs are covered in a later chapter (although the reference given points to a subsection of the Time-Series Models chapter itself, i.e., 6.6).

Here's the relevant section:

\chapter{Time-Series Models}

\noindent
Times series data come arranged in temporal order. This chapter
presents two kinds of time series models, regression-like models such
as autogression and moving average models, and hidden Markov models.

In later chapters, we discuss two alternative models which may be
applied to time-series data, 
%
\begin{itemize}
\item Gaussian processes (GP) in \refchapter{gaussian-processes} and 
\item hidden Markov models (HMM) in \refsection{hmms}.
\end{itemize}

@bob-carpenter
Contributor Author

  • summarize discussion in doc

Question from Jon Zelner on stan-users:

My implementation is essentially the same as the vanilla GP implementation on page 130 of the Stan reference (hence the lack of a code example). However, since certain symptoms may be less important to the lab-confirmed diagnosis than others, I have been trying to implement the automatic relevance determination on page 133 of the reference manual. When I try to do this, however, the parameter values tend to blow up. So, I was wondering if anyone out there with experience working with these kinds of models had a suggestion for a) priors for the relevance parameters and/or b) constraints that might make for a more useful model.

Response from Aki:

Note also that rho and eta are only weakly identifiable; their ratio is better identified and affects the nonlinearity. However, it is usual to use independent priors (I don't remember if anyone uses a joint prior with dependency).

I usually like to think about a suitable prior in terms of the length scale l = 1/rho. If the length scale is larger than the scale of the data, the model is practically linear (with respect to that covariate), and increasing the length scale does not change the model. Thus you should use a prior which goes down for larger length scale values. If the length scale is so small that the correlation between data points is zero, then decreasing the length scale further does not change the model either. Usually I've had no need to keep the length scale from going to very small values, but sometimes it's necessary. I usually use a half-t prior for the length scale as a weakly informative prior.

Eta corresponds to how much of the variation is explained by the regression function and has a similar role to the prior variance of linear model weights. Thus we can use the same weakly informative priors as in linear models. I often use a half-t prior for eta.

Question from Herra Huu:

I'm a bit puzzled by Aki's response. It would make sense to me if the dimension of the input (== D) were one. But if I understood the original question correctly, here we would have:
a) D > 1
b) covariance function: f(x[i], x[j]) = eta * exp(-sum_{d=1}^{D} rho[d] * pow(x[i,d] - x[j,d], 2))

Why, in this case, is the term “automatic relevance determination” misleading? I mean, if for example rho[1] == 0, then we could just drop the first input dimension and we would still get exactly the same results. In general, wouldn't it be true that the closer rho[d] is to zero, the less effect x[,d] would have? (Well, the original scale of the inputs matters, so it's not exactly that straightforward unless we normalize the inputs, etc.)

Response from Andrew:

I accept that we should continue to use the term “automatic relevance determination” because it exists, and people use it. But I find the term a bit distracting because, from the perspective of Stan, it’s just hierarchical Bayesian modeling:

  1. It’s not any more “automatic” than any other Bayesian inference
  2. The “relevance” interpretation seems tied to some very specific model choices
  3. “Determination” is just inference.

Response from Aki:

In general, wouldn't it be true that the closer the rho[d] is to zero the less effect x[,d] would have?

A priori yes, but not a posteriori, since the actual dependencies between x and y also have an effect. What I tried to say is that with a covariate x1 having a linear effect and another covariate x2 having a nonlinear effect, it is possible that rho1 < rho2 even if the predictive relevance of x1 is higher. The rho is related to the relevance, but it is more accurate to say that it measures the nonlinearity (or the expected number of upcrossings; GPML, p. 80). I couldn't quickly find a nice example with a GP, but figures 1 and 3 in http://becs.aalto.fi/en/research/bayes/publications/LampinenVehtari_NN2001_preprint.pdf illustrate the same issue with an MLP. We have run the same experiment with GPs, but I couldn't quickly find whether I still have the figures.
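
A minimal sketch of the kind of priors Aki describes, written with explicit length scales l = 1/rho and a squared-exponential ARD covariance (the variable names and half-Cauchy scales are illustrative, not taken from the manual):

data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];
  vector[N] y;
}
parameters {
  vector<lower=0>[D] l;   // per-dimension length scales
  real<lower=0> eta;      // marginal standard deviation
  real<lower=0> sigma;    // noise scale
}
model {
  matrix[N, N] K;
  for (i in 1:N) {
    for (j in 1:N) {
      real d2;
      d2 <- 0;
      for (d in 1:D)
        d2 <- d2 + square((x[i, d] - x[j, d]) / l[d]);
      K[i, j] <- square(eta) * exp(-0.5 * d2);
    }
    K[i, i] <- K[i, i] + square(sigma);
  }
  // weakly informative priors; the <lower=0> constraints make these half-Cauchy
  l ~ cauchy(0, 2.5);
  eta ~ cauchy(0, 2.5);
  sigma ~ cauchy(0, 2.5);
  y ~ multi_normal(rep_vector(0, N), K);
}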

@syclik
Member

syclik commented Nov 24, 2014

  • get rid of references to models by path altogether in the manual. There are only links for some old models --- it is hard to maintain.

@syclik syclik modified the milestones: v2.5.0++, Future Nov 24, 2014
@aadler
Contributor

aadler commented Dec 19, 2014

From above: “lowercase all math densities, e.g., replacing "Normal" with "normal", including all the ones named after people such as "Weibull" and "Wishart"”. Doesn't every manual of style require the capitalization of proper names?

@andrewgelman
Collaborator

You can get Bob and me all wound up on this one! In my books, I capitalize Poisson and Wishart and Bernoulli but not normal and binomial and gamma. But I recall that Bob made a compelling argument that, in a software manual, typographical consistency is a more important concern.


@bob-carpenter
Contributor Author

Typographically, the Stan manual uses:

  • sans-serif: mathematical probability functions
  • typewriter: code functions
  • serif: running text and other mathematical functions like log() or exp()

Gelman and Hill, and Gelman et al., use:

  • typewriter: code functions
  • serif: running text, mathematical probability functions, and other mathematical functions

In Andrew's book with Jennifer, you see "dpois" in code, because
it's BUGS code (which Andrew calls "Bugs"). And then you see three
conventions for distributions: N for normal, upper cased for those named
after a person, and lower cased for others, all in the running text font.

I'm OK with

sans-serif OR serif for mathematical prob functions

and OK with

lower-case or upper-case for probability functions

I'm less happy with "N" or "G" for normal or gamma because it looks inconsistent,
and even less happy when people put it in a script (calligraphic, in TeX-speak) font.
And if we go lower case, I'd just as soon lower-case "poisson" and "weibull".

Andrew --- do you want to decide? There's some work in changing it because there
are hundreds of pages packed with distribution names, but some of the changes
are just macros.

-- Bob


@aadler
Contributor

aadler commented Dec 21, 2014

I think it boils down to the following question: do you view the Stan manual as a book or text in its own right, or is it merely a convenient place to store documentation that really could or should all be online? If the former, then you do need to follow some accepted manual of style, and none that I know of allows a lowercase first letter for a person's name. If the latter (i.e., if Stan were interactive, you wouldn't have a document; you would have R-like web pages that pop up when calling help, but the software as such doesn't allow that), then I think you can be a bit more lax. The fact that you have a separate citation for the manual from that of the software implies that you view the Stan manual as a stand-alone written text (eligible for an ISBN, perhaps?), and I think you should, at the very least, uppercase proper names.

As for what to do with normal or negative binomial, I don't have a good source for that. In my own writing, I tend to leave them lowercase as per standard English, but I can hear the argument that in the realm of statistics, they are the "proper names" of the distributions.

If you are transcribing code (dpois for example) then I believe that as long as it is explicit that it is code (blockquote or typewriter font) it needs to be printed exactly as it should be in a program. Capitalization would be contraindicated if the code is meant to be used lowercase.

From reading math textbooks, I'm comfortable with calligraphic N for normal, and I think 99% of those who use Stan would be as well. I'm less comfortable with G for gamma, as I've also seen the actual Greek letter used, and gamma is spelled out much more often than normal is. So if I had my druthers (which I don't, I know :) ), I'd go with the current Stan typography, which separates mathematical formulæ from text and from code, and go with the "proper names upper / standard words lower" split for probability functions, as if there isn't anything special about them (treat them like the rest of the English language). As for N/G, I think you can get away with N, but it's probably better to spell everything out.

I certainly agree that regardless of the final decision, the manual must be consistent.

@bob-carpenter
Contributor Author

We think of it as a manual. There are really three parts to
the manual:

  • user's/programmer's guide
  • reference manual
  • intro to Bayesian inference, MCMC, MLE and optimization, etc.

Are you equating "online" with being in HTML format? Other people
have said we should render in HTML for searchability.

Don't worry, we'll continue to capitalize names used as names.
The question is only what to do with the mathematical function
symbols, which are neither running text (where they're clearly
capitalized) nor computer code (where they're clearly lowercased).

-- Bob


@bob-carpenter
Contributor Author

Moved from issue #1180 created by @ksvanhorn:

On p. 189 of the Stan Modeling Language Manual, there is example code for reparameterizing a Wishart distribution. The code is incorrect: the last column of the matrix A is never initialized.

  • It needs something like the following additional lines:
    for (i in 1:(K-1))
        A[i,K] <- 0;
    A[K,K] <- sqrt(c[K]);
  • The example code for reparameterizing an inverse Wishart distribution has the same issue.
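
For context, a sketch of how those lines fit into the fill of the factor A (the loop structure and the names c and z follow the manual's example as I recall it, so treat this as illustrative rather than exact):

  matrix[K, K] A;
  {
    int count;
    count <- 1;
    for (j in 1:(K - 1)) {
      for (i in (j + 1):K) {
        A[i, j] <- z[count];     // standard normal below the diagonal
        count <- count + 1;
      }
      for (i in 1:(j - 1))
        A[i, j] <- 0;            // zeros above the diagonal
      A[j, j] <- sqrt(c[j]);     // sqrt of chi-square variate on the diagonal
    }
    for (i in 1:(K - 1))
      A[i, K] <- 0;              // the previously uninitialized last column
    A[K, K] <- sqrt(c[K]);
  }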

@bob-carpenter
Contributor Author

  • include discussion of decision to throw exceptions at boundary in general overview of probability functions in same section as vectorization

@bob-carpenter
Contributor Author

Krzysztof Sakrejda on stan-users suggested an alternative description for the center_lp function example:

Here is an example of a function to assign standard normal priors to a vector of coefficients, along with a center and scale, and return the translated and scaled coefficients.
- [x] use Krzysztof's formulation
- [x] add cross-ref to parameterization discussion
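
A minimal sketch of a function along the lines Krzysztof describes (the signature and names are illustrative; the manual's actual example may differ):

functions {
  vector center_lp(vector beta_raw, real mu, real sigma) {
    beta_raw ~ normal(0, 1);        // standard normal prior on the raw coefficients
    return mu + sigma * beta_raw;   // translated and scaled coefficients
  }
}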

@bob-carpenter
Contributor Author

  • add process description chapter

http://www.r-project.org/doc/R-FDA.pdf

Something along the lines of the SDLC section would be great to have in the user manual. Basically, this describes the software life cycle: how Stan is programmed, released, and managed.

@blindglobe

Regarding the "process description chapter": I think you mean "Process for Software Development and Release", which would be wonderful for supporting corporate IT computer systems validation work by providing justification that "Stan is developed in a way which makes it fit for the purpose of performing Bayesian statistical analyses". Sure, it might seem obvious, but for dotting the i's and crossing the t's in a critical review, it makes things that much easier if the process for software development is described and followed. I'd be happy to review and comment before you release, if you want (speaking as co-ghost-writer of the R-FDA document cited).

@bob-carpenter
Contributor Author

  • replace lambda/sqrt(pi) with lambda/2 in exp_mod_normal doc (the implementation is correct, but our doc was wrong)
  • thank Andrew Ellis in the manual for pointing it out
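
For reference, the exponentially modified normal density in the usual parameterization has lambda/2 as its leading factor (worth double-checking against the implementation before copying into the doc):

$\text{ExpModNormal}(y \mid \mu, \sigma, \lambda) = \frac{\lambda}{2} \exp\!\left(\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2y\right)\right) \text{erfc}\!\left(\frac{\mu + \lambda\sigma^2 - y}{\sqrt{2}\,\sigma}\right)$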

@bob-carpenter
Contributor Author

  • add clarification that inverse Wishart and Wishart both take the scale matrix S as a parameter
  • compare to BDA's notation Sigma ~ Inv-Wishart(inv(S))

@bob-carpenter
Contributor Author

  • Fix this one (from Ben on stan-users)

Note that the next sentence of the manual is wrong. It should say

At $\nu = 1$, the LKJ correlation distribution reduces to the uniform distribution over correlation matrices of order $K$.
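
For reference, the reasoning: the LKJ density on a correlation matrix $\Sigma$ of order $K$ is proportional to $\det(\Sigma)^{\nu - 1}$, which is constant when $\nu = 1$, hence uniform over correlation matrices.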

@bob-carpenter
Contributor Author

Contributed by Gokcen Eraslan via patch #1227, which we haven't merged because it was made against master:

  • remove duplicate "externally" in programming.tex after soft k-means
  • thank Gökçen in the acknowledgments

@bob-carpenter
Contributor Author

John Sutton mentioned on stan-users:

  • append_row needs to have its own description, not just a cut-and-paste of append_col
  • thank John in the acknowledgments

@bob-carpenter
Contributor Author

  • undo the fix that went in for Andrew on cbind and rbind so as not to make the index confusing as to what is a function and what isn't

@bob-carpenter
Contributor Author

  • thank Juan Sebastián Casallas for a doc patch

@bob-carpenter
Contributor Author

Krzysztof Sakrejda pointed out on stan-users that this is wrong:

Arrays, on the other hand, should be traversed in 
row-major (or first-index fastest) order.
  • fix it
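
If I recall the storage conventions correctly (arrays are row-major with the last index varying fastest, matrices are column-major with the first index varying fastest), the efficient loop orders would look something like this sketch:

  // hypothetical fragment; a, m, I, and J are assumed declared elsewhere
  real total_a;
  real total_m;
  total_a <- 0;
  total_m <- 0;
  for (i in 1:I)
    for (j in 1:J)
      total_a <- total_a + a[i, j];   // arrays: last index in the innermost loop
  for (j in 1:J)
    for (i in 1:I)
      total_m <- total_m + m[i, j];   // matrices: row index in the innermost loop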

@sakrejda
Contributor

One more, in the section on "reparameterizing the Cauchy", p. 182: the text "The inverse of the cumulative distribution function, $F_X^{-1} : (0, 1) \rightarrow \mathbb{R}$, is thus" is followed by an equation for $F^{-1}(y)$ specified in terms of x.

  • \pi(x-1/2) should be \pi(y-1/2) ... (?)
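
For reference, the standard Cauchy CDF and its inverse, which is presumably what the corrected text should read: $F_X(x) = \frac{1}{\pi} \arctan(x) + \frac{1}{2}$, so $F_X^{-1}(y) = \tan\!\left(\pi\left(y - \frac{1}{2}\right)\right)$ for $y \in (0, 1)$.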

@bob-carpenter
Contributor Author

  • thank Alex Zvoleff for a code patch

@syclik syclik closed this as completed in eee24e4 Jan 28, 2015
@bob-carpenter
Contributor Author

Fixing some typos.

@bob-carpenter bob-carpenter reopened this Jan 28, 2015
@bob-carpenter
Contributor Author

  • fix issue with normal typesetting in ragged arrays chapter

@syclik syclik closed this as completed Feb 5, 2015