Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser errors/warnings for new syntax are confusing #1940

Closed
1 task done
jgabry opened this issue Jun 28, 2016 · 12 comments
Closed
1 task done

Parser errors/warnings for new syntax are confusing #1940

jgabry opened this issue Jun 28, 2016 · 12 comments
Assignees
Milestone

Comments

@jgabry
Copy link
Member

jgabry commented Jun 28, 2016

Summary:

When playing around with the new syntax I found that the parser errors and warnings were not very informative. Well, they were informative enough for me because I already know the correct syntax and was intentionally trying to trigger the error messages, but I imagine they would pretty opaque to users. Example in Reproducible Steps, below.

Description:

See below in Reproducible Steps

  • fix error message for missing vertical bar

The duplication problem is a duplicate isseu with #1946, where there are more cases to fix.

Reproducible Steps:

First put the following in logistic.stan

data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real theta;
}
model {
  theta ~ normal(0, 1);
  y ~ bernoulli_logit(theta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N)
    log_lik[n] = bernoulli_logit_log(y[n], theta);
}

then run

library("rstan")
stanc(file = "logistic.stan")

This gives the warning:

DIAGNOSTIC(S) FROM PARSER:
Warning: Deprecated function 'bernoulli_logit_log'; please replace suffix '_log' with '_lpdf' for density functions or '_lpmf' for mass functions
Warning: Deprecated function 'bernoulli_logit_log'; please replace suffix '_log' with '_lpdf' for density functions or '_lpmf' for mass functions

which is good (not one of the ones I found confusing), but why does it appear twice if
bernoulli_logit_log is only used once in the Stan program? Is it because the sampling
statement y ~ bernoulli_logit(theta) is also used? Either way it's kind of weird.

The stuff I found more confusing comes next, when trying to follow the suggestion in the warning. First I modify the bernoulli_logit_log line in generated quantities:

log_lik[n] = bernoulli_logit_lpdf(y[n], theta); \\ I know this is incorrect

and the error is

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
ERROR at line 15

 13:      vector[N] log_lik;
 14:      for (n in 1:N)
 15:        log_lik[n] = bernoulli_logit_lpdf(y[n], theta);
                                             ^
 16:    }

PARSER EXPECTED: <probability function arguments>
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'logistic' due to the above error.

which just says that it expected <probability function arguments>. I assume that's trying to tell me that I forgot the vertical bar but that's not what the message says. Would it be possible to mention the vertical bar in the message? In addition, I used a function that doesn't exist (bernoulli_logit_lpdf even though it's a mass function) but didn't get any error about that. So it seems to be giving me an error about using the wrong arguments to a function that doesn't exist.

So next I use the vertical bar:

log_lik[n] = bernoulli_logit_lpdf(y[n] | theta);

and that gets rid of the mysterious <probability function arguments> and now I get the error about bernoulli_logit_lpdf not existing:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for: 

  bernoulli_logit_lpdf(int, real)

Function bernoulli_logit_lpdf not found.

ERROR at line 15

 13:      vector[N] log_lik;
 14:      for (n in 1:N)
 15:        log_lik[n] = bernoulli_logit_lpdf(y[n] | theta);
                                                           ^
 16:    }

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'logistic' due to the above error.

except it doesn't tell me that I just need to use lpmf instead of lpdf. Would it be possible to detect that that someone is using a function that just has an incorrect suffix?

Current Output:

The current output is given above in Reproducible Steps

Expected Output:

I think it would be helpful if:

  • Errors about functions not existing were thrown before errors about incorrect arguments
  • Errors about missing vertical bars said something about missing vertical bars instead of
    just <probability function arguments>
  • Only one warning is issued if deprecated function is only once in a Stan program
  • Using lpdf instead of lpmf or vice-versa resulted in a more informative message
    indicating that the correct function just has a different suffix

Are there reasons I'm overlook why any of these would be difficult?

Additional Information:

Current Version:

v2.10.0

@jgabry
Copy link
Member Author

jgabry commented Jun 28, 2016

Here are some possible alternatives for a few of the error/warning messages mentioned above:

Current:

  • Function bernoulli_logit_lpdf not found.

Suggestions:

  • Function bernoulli_logit_lpdf not found. Did you mean bernoulli_logit_lpmf?
  • Function bernoulli_logit_lpdf not found. For mass functions use _lpmf instead of _lpdf.

Current: - `PARSER EXPECTED: `

I saw this in the context of a missing vertical bar, but I'm not sure if that's the only time this error is thrown. If it only pertains to the vertical bar then maybe replace with something like one of these

Suggestions:

  • PARSER EXPECTED: vertical bar instead of comma after first argument of bernoulli_logit_lpmf
  • PARSER EXPECTED: '|' instead of ',' after first argument of bernoulli_logit_lpmf

@jrnold
Copy link
Contributor

jrnold commented Jul 3, 2016

Regarding PARSER EXPECTED: <probability function arguments>, between that error message and the manual, it's not clear to me when I'm supposed to use |.

@bob-carpenter bob-carpenter added this to the v2.10.0++ milestone Jul 4, 2016
@bob-carpenter bob-carpenter self-assigned this Jul 4, 2016
@bob-carpenter
Copy link
Contributor

After the first argument whenever there's an lpdf, lpmf, lcdf, or lccdf.

I could relax the syntax to make the '|' optionally a ',', but I've been
reluctant to have two ways of doing the exact same thing.

On Jul 3, 2016, at 3:18 PM, Jeffrey Arnold notifications@github.com wrote:

Regarding PARSER EXPECTED: , between that error message and the manual, it's not clear to me when I'm supposed to use |.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@betanalpha
Copy link
Contributor

But can the parser identify when a | is missing and suggest it instead
of just giving the “probability function arguments” warning?

On Jul 4, 2016, at 8:54 PM, Bob Carpenter notifications@github.com wrote:

After the first argument whenever there's an lpdf, lpmf, lcdf, or lccdf.

I could relax the syntax to make the '|' optionally a ',', but I've been
reluctant to have two ways of doing the exact same thing.

On Jul 3, 2016, at 3:18 PM, Jeffrey Arnold notifications@github.com wrote:

Regarding PARSER EXPECTED: , between that error message and the manual, it's not clear to me when I'm supposed to use |.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@bob-carpenter
Copy link
Contributor

Yes, that's what I plan to do. It's already recognizing
it, just a poor choice of error message---I should've provided
an example, as in variate | parameters. I'm still not exactly
sure how to say that. Should I just say "expected |" and be done
with it?

On Jul 4, 2016, at 12:11 PM, Michael Betancourt notifications@github.com wrote:

But can the parser identify when a | is missing and suggest it instead
of just giving the “probability function arguments” warning?

On Jul 4, 2016, at 8:54 PM, Bob Carpenter notifications@github.com wrote:

After the first argument whenever there's an lpdf, lpmf, lcdf, or lccdf.

I could relax the syntax to make the '|' optionally a ',', but I've been
reluctant to have two ways of doing the exact same thing.

On Jul 3, 2016, at 3:18 PM, Jeffrey Arnold notifications@github.com wrote:

Regarding PARSER EXPECTED: , between that error message and the manual, it's not clear to me when I'm supposed to use |.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or mute the thread.

@betanalpha
Copy link
Contributor

How about “Expected a | separating the random variables from the parameters in ”?

On Jul 4, 2016, at 10:38 PM, Bob Carpenter notifications@github.com wrote:

Yes, that's what I plan to do. It's already recognizing
it, just a poor choice of error message---I should've provided
an example, as in variate | parameters. I'm still not exactly
sure how to say that. Should I just say "expected |" and be done
with it?

On Jul 4, 2016, at 12:11 PM, Michael Betancourt notifications@github.com wrote:

But can the parser identify when a | is missing and suggest it instead
of just giving the “probability function arguments” warning?

On Jul 4, 2016, at 8:54 PM, Bob Carpenter notifications@github.com wrote:

After the first argument whenever there's an lpdf, lpmf, lcdf, or lccdf.

I could relax the syntax to make the '|' optionally a ',', but I've been
reluctant to have two ways of doing the exact same thing.

On Jul 3, 2016, at 3:18 PM, Jeffrey Arnold notifications@github.com wrote:

Regarding PARSER EXPECTED: , between that error message and the manual, it's not clear to me when I'm supposed to use |.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or mute the thread.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

@Samreay
Copy link

Samreay commented Aug 22, 2016

I'll +1 this issue as well if possible.

I was going through the deprecation appendix in the latest stan doco and updating all _log functions to _lpdf calls, and had no idea that the | syntax was even used, especially because the API documentation explicitly uses a comma and not a separator (eg chapter 44.3 for skew_normal - the first distribution I went to update and got stuck on, not a special example).

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Aug 22, 2016 via email

@bob-carpenter
Copy link
Contributor

@betanalpha That would be confusing because any of the arguments can be random variables and the first one isn't required to be a random variable. It's much more work to call out the specific function (I'd have to write a new rule for the case where someone forgets the vertical bar, then create a custom message for it, then have it fail after it's matched and reported an error.) What I went with produces this:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

ERROR at line 5

  3:    }
  4:    model {
  5:      target += normal_lpdf(y, 0, 1);
                               ^
  6:    }

PARSER EXPECTED: <probability function arguments with first
 two arguments separated by a vertical bar;
 e.g., foo_lpdf(y | alpha, beta).
 This syntax required for functions with suffixes
 _lpdf, _lpmf, _lcdf, _lccdf>

@betanalpha
Copy link
Contributor

To be technical, variables on the right hand side of the bar aren’t
actually random anymore. When you condition you condition on
an explicit value. In other words, the conditional distribution is not
defining a distribution for those variables, hence they take on a
different interpretation.

On Aug 22, 2016, at 9:07 AM, Bob Carpenter notifications@github.com wrote:

@betanalpha That would be confusing because any of the arguments can be random variables and the first one isn't required to be a random variable. It's much more work to call out the specific function (I'd have to write a new rule for the case where someone forgets the vertical bar, then create a custom message for it, then have it fail after it's matched and reported an error.) What I went with produces this:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

ERROR at line 5

3: }
4: model {
5: target += normal_lpdf(y, 0, 1);
^
6: }

PARSER EXPECTED: <probability function arguments with first
two arguments separated by a vertical bar;
e.g., foo_lpdf(y | alpha, beta).
This syntax required for functions with suffixes
_lpdf, _lpmf, _lcdf, _lccdf>

You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@bob-carpenter
Copy link
Contributor

Understood, but it's the "anymore" that's problematic. They are
random variables from the Stan program's perspective. And if you
talk to Andrew (or read BDA), then everything's a random variable
in Bayesian stats, even the predictors in a regression that come in
as data and get no distribution. That's why he refuses to let us use
the semicolon notation (the frequentists use that precisely to distinguish
the random variables from other variables). Going against BDA, even with
technical correctness on our side, seems like a mug's game --- it'll just
confuse users and annoy Andrew without much upside.

@betanalpha
Copy link
Contributor

That’s not quite the argument.

The semicolon is purely frequentist and is meant to avoid
interpreting the likelihood as a conditional probability distribution,
and I agree that this has no place in Bayesian inference (in some
sense the difference between | and ; distills the fundamental
differences between frequentist and Bayesian inference).

Instead the difference is in the definition and use of a random
variable. When using a conditional probability distribution, any
variables to the left of the | are having their distribution defined
by the conditional distribution. Any variables to the right of the
|, however, are just be queried by value independent of their
distribution. Sequential generative modeling with conditional
probability distributions is just a way to isolate how a random
variable is defined and how it is used to influence other random
variables.

In other words, the interpretation of what is a random variable
and what isn’t is being locally scoped just to the definition of
that conditional probability distribution. I understand how this
would be confusing to users not familiar with probability, and
even to those who are as the error messages would be
referring to this very local scope and not the global scope of
the program where everything is random variable. It was just
a suggestion. Ultimately anything we choose is either going to
be ambiguous at some level, inconsistent with the actual math,uses
or confusing to most users.

On Aug 22, 2016, at 11:03 AM, Bob Carpenter notifications@github.com wrote:

Understood, but it's the "anymore" that's problematic. They are
random variables from the Stan program's perspective. And if you
talk to Andrew (or read BDA), then everything's a random variable
in Bayesian stats, even the predictors in a regression that come in
as data and get no distribution. That's why he refuses to let us use
the semicolon notation (the frequentists use that precisely to distinguish
the random variables from other variables). Going against BDA, even with
technical correctness on our side, seems like a mug's game --- it'll just
confuse users and annoy Andrew without much upside.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

bob-carpenter pushed a commit that referenced this issue Aug 24, 2016
update error msg for , instead of | in prob funs, fixes #1940
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants