
next manual, 2.12.0++ #2051

Closed
bob-carpenter opened this issue Sep 2, 2016 · 27 comments

Summary:

This is where updates for the 2.12 manual should go.

v2.12.0

bob-carpenter commented Sep 2, 2016

From @wds15 on mailing list, moved from stan-dev/math#44:

  • add doc in multi_normal_rng that suggests a more efficient approach based on transforming independent unit normals given a Cholesky factor
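A minimal sketch of that approach, assuming mu, L, and K are in scope, with L the Cholesky factor of Sigma (so Sigma = L * L'):

generated quantities {
  vector[K] z;
  vector[K] y;
  for (k in 1:K)
    z[k] = normal_rng(0, 1);  // independent unit normals
  y = mu + L * z;             // y is a draw from multi_normal(mu, Sigma)
}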

mawds commented Sep 6, 2016

Section 11.2, Meta-Analysis: in the transformed data block the loop runs over j, but the indices on the RHS of the calculation of sigma[j] are i.
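The shape of the fix (the RHS expression and variable names here are placeholders, not the manual's actual code):

for (j in 1:J)
  sigma[j] = sqrt(1.0 / n_t[j] + 1.0 / n_c[j]);  // every index on the RHS is j, matching the loop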

  • fix
  • thank David Mawdsley

bob-carpenter commented Sep 6, 2016

From Gary Schulz on stan-users:

Reference manual for v2.11.0 in section 45.3 says that the exponent of $y$ in the pdf of {\sf InvChiSquare} is $-(\nu/2 - 1)$, but I think it should be $-(\nu/2 + 1)$. (All using LaTeX notation.) Note that it is correct in section 45.4 for the {\sf ScaledInvChiSquare} distribution.

  • correct
  • thank Gary Schulz in acknowledgements
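For reference, the corrected density is the standard inverse chi-square:

$\text{InvChiSquare}(y \mid \nu) = \frac{2^{-\nu/2}}{\Gamma(\nu/2)} \, y^{-(\nu/2 + 1)} \exp\!\left(-\frac{1}{2y}\right)$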

syclik commented Sep 9, 2016

From @davharris in #2065.

The mod operator % isn't listed under Section 32.1's subsection on "Binary Infix Operators" for integers, although it seems to be included in the language. I was only able to find fmod (for reals) in the documentation.
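For reference, a minimal usage sketch of the operator as reported (integer operands, integer result):

transformed data {
  int r;
  r = 7 % 3;  // integer modulus: r == 1
}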

  • add to src/docs/stan-reference/functions.tex
  • thank David Harris in acknowledgements

bob-carpenter commented Sep 13, 2016

  • Add model for hyperprior on Dirichlet in reparameterization section of programming part. Maybe start with beta-binomial.
parameters {
  real<lower=0.1> kappa;  // prior count minus K
  simplex[K] theta;       // prior mean
  simplex[K] phi[J];      // parameters
}

model {
  ... phi is a K-simplex, either parameters or data ...
  for (j in 1:J)
    phi[j] ~ dirichlet(theta * kappa);  // hierarchical prior on phi[j]
  kappa ~ pareto(0.1, 1.5);  // hyperprior on prior count
  theta ~ dirichlet(alpha);  // alpha is hyperprior
}
It decomposes the Dirichlet into a mean theta and a prior count (minus K) kappa. You can just skip the Dirichlet prior on theta, in which case it defaults to uniform over simplexes.
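The beta analogue mentioned above decomposes beta(a, b) the same way, with a = phi * kappa and b = (1 - phi) * kappa; a sketch, with the likelihood elided:

parameters {
  real<lower=0, upper=1> phi;       // prior mean a / (a + b)
  real<lower=0.1> kappa;            // prior count a + b (minus 2)
  real<lower=0, upper=1> theta[J];  // parameters
}
model {
  ... likelihood for theta goes here ...
  theta ~ beta(phi * kappa, (1 - phi) * kappa);  // hierarchical prior
  kappa ~ pareto(0.1, 1.5);                      // hyperprior on prior count
}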

bob-carpenter commented Sep 15, 2016

Martin Stjernman on stan-users wrote in that:

  1. The manual (page 268) gives as an example the following declarations:
vector[5] a[4, 3];
vector[5] b[4];

and then it is stated that the following assignment is legal: b = a[1];

  • declare a as vector[5] a[3, 4]; to make this legal at runtime

  2. He also points out the doc says (p. 267) that
vector[7] mu[3]; // gives a three-dimensional array of 7-vectors

to me this is a one-dimensional array that has 3 elements (length 3) where each element is a 7-element vector

  • fix it to say a one-dimensional array of size 3 containing 7-element vectors

  3. Also,
matrix[7, 2] mu[15, 12]; // gives a 15x12-dimensional array of 7x2 matrices

to me this is a two-dimensional array of 7x2 matrices; it has 15 * 12 elements arranged in 15 rows and 12 columns, where each element is a 7 by 2 matrix

  • fix to say "size" or just say 15 x 12 array rather than using "dimensional"

  4. Another example has
vector<lower=-1, upper=1>[3, 3] corr;

  • make that matrix instead of vector

  5. On page 270,
vector b[4];

  • fix that to say vector[4] b;
  • thank Martin Stjernman in the acknowledgements

bob-carpenter commented Sep 16, 2016

From Luiz Max Carvalho on stan-users:

Imagine I have N (discrete) observations from n individuals. Each observation X_i is a vector {X_1i, X_2i, ..., X_ki}, where sum(X_i) = n. The problem I have is: N is in the millions, and since we can only have n(n + 1)/2 different values for X, this means we'll have many "repeated" observations, i.e., each X_j (j = 1, 2, ..., n(n + 1)/2) will appear f_j times, with sum(f_j) = N. So far, I've been downsampling the data proportional to f_j, but I'd like to use all of the data.

How can I include the frequencies in the multinomial likelihood in Stan? If I were doing this outside of Stan I'd just add the log of each frequency f_j to the (multinomial) log-likelihood of X_j. Is there a way of incrementing l_p to achieve this?

@bgoodri replied:

In this case, you can do

for (j in 1:J) target += f[j] * multinomial_lpmf(...);

Since the duplicative observations are actually observed, this is okay, as opposed to the situation where the f_j are estimates of how many people in the population are in stratum j.
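A minimal sketch of the full model (the names x, f, J, and K are assumptions for illustration):

data {
  int<lower=1> J;        // number of distinct outcomes
  int<lower=1> K;        // number of categories
  int<lower=0> x[J, K];  // the distinct multinomial outcomes
  int<lower=1> f[J];     // observed frequency of each outcome
}
parameters {
  simplex[K] theta;
}
model {
  for (j in 1:J)
    target += f[j] * multinomial_lpmf(x[j] | theta);
}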

  • add example to model sufficient statistics description
  • describe that same technique can be used for weighted regression, but that it's not a Bayesian generative model

bob-carpenter commented Sep 21, 2016

Portia Brat reported on stan-users that there's a bug on pp. 74--75 in the "Optimization through Vectorization" section, which asks users to replace

data {
  matrix[N, K] x;
  ...
}
parameters {
  vector[K] beta[J];
  ...
}
...
for (n in 1:N)
  y[n] ~ normal(x[n] * beta[jj[n]], sigma);

with

y ~ normal(rows_dot_product(beta[jj], x), sigma);

  • instead, the replacement should be

parameters {
  matrix[J, K] beta;
  ...
}
...
y ~ normal(rows_dot_product(x, beta[jj]), sigma);

(With beta declared as a matrix, the multi-index beta[jj] yields a matrix, as rows_dot_product requires; an array of vectors does not.)

  • thank Portia Brat for pointing out the error

bob-carpenter commented Oct 9, 2016

From Sean Matthews on stan-users list:

  • page 131, top: the enclosing brackets around the term for the log-sum-exp function appear not to be size balanced: the close is bigger than the open
  • /stepper/ at the bottom of the page should be /steeper/
  • thank Sean in acknowledgements

sakrejda commented Oct 11, 2016

  • Page 115: the long version of the log-sum-exp equation has a two-argument log function, log(x, y), whereas it should be log(x) + log(y)
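For reference, the standard numerically stable statement of log-sum-exp is

$\log \sum_{n=1}^N \exp(x_n) = \max(x) + \log \sum_{n=1}^N \exp(x_n - \max(x))$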

bob-carpenter commented Oct 18, 2016

  • thank Roman Cheplyaka for a doc patch

bob-carpenter commented Oct 18, 2016

  • change the doc for vectorized functions to something much simpler that says they apply elementwise, with a pointer to a reference section that explains the R and T return-type and argument rules, using R and T explicitly to make it clear how they connect to the doc

jgabry commented Oct 20, 2016

Thanks to @skanskan via stan-dev/rstan#348:

Typo on page 36:

  • vector<lower=-1,upper=1>[3,3] corr; should be matrix<lower=-1,upper=1>[3,3] corr;

Or I suppose it could conceivably be vector<lower=-1,upper=1>[3] corr; if it's a vector of correlations.

bob-carpenter commented Oct 24, 2016

  • SKIP (there's already a discussion just like it---really need to expand into a better methodology section)

Include a simple example for generating data from a univariate regression at the point where we talk about fake-data simulation ("One of the best ways to make sure your model is doing the right thing computationally is to generate simulated ..."). Also reference the "Sampling without Parameters" section in the HMC chapter.

You can generate fake data for a regression given the parameters (and sizes):

data {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model { }
generated quantities {
  real y[10];
  real x[10];
  for (n in 1:10) {
    x[n] = normal_rng(0, 1);
    y[n] = normal_rng(alpha * x[n] + beta, sigma);
  } 
}

If you had priors on alpha, beta, and sigma, you would specify the prior parameters as data and then generate alpha, beta, and sigma in the generated quantities block; in fact, without such priors, you're not generating from the model itself.
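A sketch of that variant, with assumed prior forms (normal(0, 2) on the coefficients, half-normal(0, 1) on the scale); the prior parameters are hard-coded here but would be passed in as data:

generated quantities {
  real alpha;
  real beta;
  real<lower=0> sigma;
  real x[10];
  real y[10];
  alpha = normal_rng(0, 2);        // assumed prior: alpha ~ normal(0, 2)
  beta = normal_rng(0, 2);         // assumed prior: beta ~ normal(0, 2)
  sigma = fabs(normal_rng(0, 1));  // assumed prior: sigma ~ half-normal(0, 1)
  for (n in 1:10) {
    x[n] = normal_rng(0, 1);
    y[n] = normal_rng(alpha * x[n] + beta, sigma);
  }
}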

bob-carpenter commented Oct 24, 2016

  • thank Kyle Meyer for code patches (StanMode!)
  • thank Joerg Rings for doc patches (PyStan)

bob-carpenter commented Oct 26, 2016

  • update HMC chapter to reflect multinomial replacement for slice sampling

UnkindPartition commented Oct 27, 2016

Documentation for sampling statements mentions functions that aren't defined anywhere, such as

  • Increment log probability with poisson_log(n, lambda)
  • Increment log probability with poisson_log_log(n, alpha)

Shouldn't those be *_lpdf/*_lpmf functions instead?

  • yes, fix them

bob-carpenter commented Oct 27, 2016

@feuerbach Correct, those should be _lpdf and _lpmf. And thanks for reporting them here.
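For reference, the corrected entries would read along these lines:

  • Increment log probability with poisson_lpmf(n | lambda)
  • Increment log probability with poisson_log_lpmf(n | alpha)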

bob-carpenter commented Oct 31, 2016

  • add a discussion of the covariance and correlation matrix algebra
Sigma = diag_matrix(sigma) * Omega * diag_matrix(sigma)

Omega[i, j] = Sigma[i, j] / (sigma[i] * sigma[j])

sigma[i] = sqrt(Sigma[i, i])
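In Stan code, the first identity can be computed more efficiently with quad_form_diag; a sketch, assuming Omega is declared as a correlation matrix and sigma as a positive vector of scales:

transformed parameters {
  cov_matrix[K] Sigma;
  Sigma = quad_form_diag(Omega, sigma);  // diag_matrix(sigma) * Omega * diag_matrix(sigma)
}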

bob-carpenter commented Oct 31, 2016

From Stephen Martin on stan-users:

Sampling from betas and dirichlets where the parameters are < 1 pushes the probability mass toward the edges of the constraint. When inverted, it looks like a plateau, and the particle simulation can hit some difficulties, causing divergent transitions.

All that to say: If you want to encourage categorization with high probabilities, reparameterize your model to sample log-odds parameters (logistic regression, softmax regression) and convert to probability for the likelihood. If you do this, you can get a similar effect to 'inverted betas and dirichlets' by simply making the logistic prior very wide, such that most of the probability mass is beyond 1 and -1.

  • add discussion to reparameterization chapter

Plot histograms of inv_logit(logistic_rng(N, 0, sigma)) for a range of orders of magnitude of sigma.

This lets you generalize beyond intercepts, too, if you have other predictors.
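A minimal sketch of the suggested reparameterization for the intercept-only case (the prior width 4 and the bernoulli likelihood are illustrative assumptions; y would be declared as data):

parameters {
  real alpha;  // log-odds
}
transformed parameters {
  real<lower=0, upper=1> theta;
  theta = inv_logit(alpha);  // probability scale for the likelihood
}
model {
  alpha ~ logistic(0, 4);  // wide prior pushes theta's mass toward 0 and 1
  y ~ bernoulli(theta);
}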

bob-carpenter commented Nov 2, 2016

  • add discussion to mixture or clustering or problematic posteriors chapter on what kind of inferences are supported for label-switching models. I wrote this on stan-users in a response to a question from Stephen Martin (in italics):

Just curious, what sort of inferences aren't sensitive to the labels?

You can do prediction, i.e., the likelihood for a new observation x', where you estimate p(y' | x', theta), where theta is all the parameters, x is the predictors, and y is the observed data. It's the same likelihood calculation as in the model.

You can do similarity, i.e., Pr[items n and n' belong to same group | x, y].

Say you're wanting to fit a latent group model from data. You want to know what probability each person has of belonging to each of K groups, then the parameters of said K groups.

Exactly because of label switching this isn't a valid inference.

Is there a way of obtaining such parameters in a way that doesn't have a label switching problem?

No. You can't evaluate the probability that item n belongs to mixture component k. You can get the marginal for a parameter, such as mu, given item i. It'll be multimodal.

The conversation went on, and Stephen suggested adding constraints. In some cases, that can lead to an identifiable model. Also, if the modes truly are symmetric (as in a classic mixture, not as in an LDA model with substantive alternative modes) and the multimodality is just label switching, you can sometimes post-process and do Bayes-like inferences on the identities of the clusters, like the probability of an item belonging to a cluster.

Also, to make things more precise, talk about which inferences are invariant under label switching.

UnkindPartition commented Nov 6, 2016

  • Typo on page 119: «conditionla operator»

skanskan commented Nov 7, 2016

  • Typo on page 245, section "Standardizing Predictors and Outputs":
    "Thus a data point u is standarized is standarized with respect a vector"
    "standarized is" is duplicated.
  • Typo on page 265, section "Cholesky Factors of Covariance Matrices":
    "with a positive diagonal (L[k,k]=0)"
    should be >= 0.
  • positive_infinity() is explained very quickly and no example is provided.

bob-carpenter commented Nov 8, 2016

  • add note in matrices vs. arrays and/or indexing chapter about safety and what happens when you provide an out-of-bounds index
  • add a note about memory locality
  • make the existing discussion that calls out the C++ translation more user friendly by working through the implications explicitly
  • explicitly talk about lack of auto-conversion and existence of functions to convert

bob-carpenter commented Nov 10, 2016

  • add link to MathematicaStan from intro material
  • add Thel Seraphim as developer
  • add Vincent Picaud as developer
  • thank Tobias Madsen for a doc patch

syclik commented Nov 16, 2016

  • integrate_ode_*() functions all have the incorrect return type. The return type should be real[,] instead of real[].
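A usage sketch of the corrected type (the system function sho and the other names are illustrative):

real y_hat[T, D];  // T requested times, D state variables
y_hat = integrate_ode_rk45(sho, y0, t0, ts, theta, x_r, x_i);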
