New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error reporting in case of missing 'rec' in let-bindings #1472

Merged
merged 6 commits into from Nov 17, 2017

Conversation

Projects
None yet
8 participants
@Armael
Contributor

Armael commented Nov 10, 2017

This is a first attempt at splitting "easy" and self-contained bits of PR #102 by @charguer, and follows an offline discussion involving @charguer, @gasche, @let-def and myself.
As described in http://chargueraud.org/research/2015/ocaml_errors/ocaml_errors.pdf (section 5, Message for missing "rec"), the purpose of the code extracted in this PR is to provide better error reporting for the case of a forgotten rec keyword.

In essence, what this PR does is additionally track in the environment "ghost" bindings for the non-recursive let-bindings that have been encountered. Then, in case of an unbound identifier, we additionally check if this identifier is present as a ghost binding in the environment. If this is the case, we display a error message indicating that a rec keyword may be missing at the binding site of this identifier.

In Arthur's original patch (and in the first commit of this PR), ghost bindings were implemented by adding a special string marker at the beginning of normal identifier. As this is relatively fragile, I rewrote this mechanism by adding a new kind of "ghost" bindings to typing environments.

This is my first dive in this part of OCaml's source code, so it is very possible that this patch contains rookie mistakes. I'll be happy to take remarks and criticism :-) .

@gasche

I haven't done a full review of the patch yet. Two remarks:

  1. I think it would be nice if the patchset was nice. Here you have Arthur's ugly patch first, and then a more general approach as a diff over it. Maybe it would be possible to add the more general mechanism first, and then implement Arthur's idea on top of it? This impacts the code: in 293b522 you document your Env_ghost_value by saying that it is about let rec, but this is wrong, the explanation is over-specialized. This should be a more generic concept.
  • You chose to implement "ghost value bindings" as one extra kind of environment. I find it surprising as I would rather have expected that you would add a Val_ghost kind to the "value kinds" stored in the value environment. I don't know which of the two approaches is best, but I think it deserves discussion.

Having a separate environment means that there is no ordering between value bindings and ghost value bindings. Having a single environment means that ghost bindings will shadow value bindings of the same name and conversely. In your case it doesn't really matter as the "unbound identifier" error will only arise if there are no normal-value-bindings for that name, but for other uses of ghost bindings, what would be the better behavior?

(One thing that is a bit strange about having it as an extra part of the environment is that it is only about value names, not type names or module names.)

Show outdated Hide outdated typing/env.ml Outdated
Show outdated Hide outdated typing/env.ml Outdated
@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 11, 2017

Contributor

Implementing "ghost value bindings" as a different value kind should indeed result in a cleaner and less invasive patch. I'll rebase the patch and do that.

Contributor

Armael commented Nov 11, 2017

Implementing "ghost value bindings" as a different value kind should indeed result in a cleaner and less invasive patch. I'll rebase the patch and do that.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 11, 2017

Contributor

However, with this way of doing things it seems harder to convince myself that I don't break things: with a separate environment, it's impossible for existing parts of the code to look up ghost bindings by mistake. With a different value kind, it seems possible or even likely for the new "ghost" kind to be matched by a wildcard pattern somewhere and handled as something else.

Contributor

Armael commented Nov 11, 2017

However, with this way of doing things it seems harder to convince myself that I don't break things: with a separate environment, it's impossible for existing parts of the code to look up ghost bindings by mistake. With a different value kind, it seems possible or even likely for the new "ghost" kind to be matched by a wildcard pattern somewhere and handled as something else.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 11, 2017

Member

I guess that git grep -C 20 " Val_" is your friend. But I think it would be nice to find another use-case to have in mind to know whether the way you adapt the code makes sense. (And possibly another use-case to test again.) Ghost values seem like a generally useful thing to have, can we find a second use-case?

Note that there is also Val_unbound that plays a not-completely-different role. I looked (as a result of running the grep above) and it seems useful to hide some object-system variables from other part of the code, and generates a custom error message (currently object-system-specific) when it is found during lookup. Maybe we actually want to generalize Val_unbound with a reason (instance variable, ghost recursive variable, etc.) instead of adding a second kind? (In that case most places don't have to change, although I doubt that they would actually do the right thing today if they were reached by a Val_unbound binding.)

Member

gasche commented Nov 11, 2017

I guess that git grep -C 20 " Val_" is your friend. But I think it would be nice to find another use-case to have in mind to know whether the way you adapt the code makes sense. (And possibly another use-case to test again.) Ghost values seem like a generally useful thing to have, can we find a second use-case?

Note that there is also Val_unbound that plays a not-completely-different role. I looked (as a result of running the grep above) and it seems useful to hide some object-system variables from other part of the code, and generates a custom error message (currently object-system-specific) when it is found during lookup. Maybe we actually want to generalize Val_unbound with a reason (instance variable, ghost recursive variable, etc.) instead of adding a second kind? (In that case most places don't have to change, although I doubt that they would actually do the right thing today if they were reached by a Val_unbound binding.)

@shindere

This comment has been minimized.

Show comment
Hide comment
@shindere

shindere Nov 11, 2017

Contributor
Contributor

shindere commented Nov 11, 2017

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 11, 2017

Contributor

Sure, but having a separate environment does, IMO, not scale. You wouldn't imagine adding one new kind of environment for every different kind of value you'd want to handle, would you?

Yes, I agree it was a bad idea.

I like the idea of reusing Val_unbound, so I'll try that next.

Contributor

Armael commented Nov 11, 2017

Sure, but having a separate environment does, IMO, not scale. You wouldn't imagine adding one new kind of environment for every different kind of value you'd want to handle, would you?

Yes, I agree it was a bad idea.

I like the idea of reusing Val_unbound, so I'll try that next.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 11, 2017

Contributor

Having a single environment means that ghost bindings will shadow value bindings of the same name and conversely. In your case it doesn't really matter [...]

By playing with the idea of reusing Val_unbound, I just realized this is actually an issue. Consider the following piece of code:

let x = 1 in
let x = x+1 in
()

Adding a ghost binding in the environment of the lhs of the second let (x+1) will shadow the outer regular binding x = 1, and then fail, producing a "missing rec" error message.

Contributor

Armael commented Nov 11, 2017

Having a single environment means that ghost bindings will shadow value bindings of the same name and conversely. In your case it doesn't really matter [...]

By playing with the idea of reusing Val_unbound, I just realized this is actually an issue. Consider the following piece of code:

let x = 1 in
let x = x+1 in
()

Adding a ghost binding in the environment of the lhs of the second let (x+1) will shadow the outer regular binding x = 1, and then fail, producing a "missing rec" error message.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 11, 2017

Member

You have to check that no identifier of that name already exists in scope if you want to add a ghost/unbound binding for it. If a name already exists, there is no point in adding the ghost binding anyway, as there will not be an unbound-identifier error later.

Member

gasche commented Nov 11, 2017

You have to check that no identifier of that name already exists in scope if you want to add a ghost/unbound binding for it. If a name already exists, there is no point in adding the ghost binding anyway, as there will not be an unbound-identifier error later.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 13, 2017

Contributor

I pushed a new patchset, based on the idea of reusing Val_unbound. A non-trivial part of the patch consists in extending Env.lookup_* functions to allow disabling "marking", in order to implement @gasche suggestion in the previous comment.
Indeed, one wants to add a ghost binding to the environment only if the name is not bound already. For this, one needs to look up a value in the environment, without triggering the side-effects tracking which bindings/modules are used (in order to print warnings "value foo is not used"). Consequently, the second commit of this patch extends the lookup functions to take an additional optional argument, specifying whether marking should be done (and the default value reflects the current behavior).

Contributor

Armael commented Nov 13, 2017

I pushed a new patchset, based on the idea of reusing Val_unbound. A non-trivial part of the patch consists in extending Env.lookup_* functions to allow disabling "marking", in order to implement @gasche suggestion in the previous comment.
Indeed, one wants to add a ghost binding to the environment only if the name is not bound already. For this, one needs to look up a value in the environment, without triggering the side-effects tracking which bindings/modules are used (in order to print warnings "value foo is not used"). Consequently, the second commit of this patch extends the lookup functions to take an additional optional argument, specifying whether marking should be done (and the default value reflects the current behavior).

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 13, 2017

Contributor

Umm, travis seems to have failed, but I have no idea why?

Contributor

Armael commented Nov 13, 2017

Umm, travis seems to have failed, but I have no idea why?

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Nov 13, 2017

Contributor

Travis is only failing due to the lack of new change entry in Changes.

Contributor

Octachron commented Nov 13, 2017

Travis is only failing due to the lack of new change entry in Changes.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 13, 2017

Contributor

Thanks. I added a Changes entry.

Contributor

Armael commented Nov 13, 2017

Thanks. I added a Changes entry.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 13, 2017

Member

@alainfrisch, I think we need your opinion on the ?mark:bool parameter on lookup functions. Is this something that you agree is a reasonable design? It's a bit invasive, but my intuition is that we would have needed it eventually anyway, so adding this explicit control now is a good idea.

Member

gasche commented Nov 13, 2017

@alainfrisch, I think we need your opinion on the ?mark:bool parameter on lookup functions. Is this something that you agree is a reasonable design? It's a bit invasive, but my intuition is that we would have needed it eventually anyway, so adding this explicit control now is a good idea.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Nov 15, 2017

Contributor

I think we need your opinion on the ?mark:bool parameter on lookup functions. Is this something that you agree is a reasonable design? It's a bit invasive, but my intuition is that we would have needed it eventually anyway, so adding this explicit control now is a good idea.

Yes, I think it is reasonable, the code looks good, and I agree there might be other reasons in the future to need the flag anyway. I'm only wondering whether we shouldn't make the parameter non-optional. It'd be even more invasive (and not really related to the goal of this PR), but this might catch instances where marking is actually not desired, and at least make things more explicit.

Contributor

alainfrisch commented Nov 15, 2017

I think we need your opinion on the ?mark:bool parameter on lookup functions. Is this something that you agree is a reasonable design? It's a bit invasive, but my intuition is that we would have needed it eventually anyway, so adding this explicit control now is a good idea.

Yes, I think it is reasonable, the code looks good, and I agree there might be other reasons in the future to need the flag anyway. I'm only wondering whether we shouldn't make the parameter non-optional. It'd be even more invasive (and not really related to the goal of this PR), but this might catch instances where marking is actually not desired, and at least make things more explicit.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 15, 2017

Contributor

I'm only wondering whether we shouldn't make the parameter non-optional. It'd be even more invasive (and not really related to the goal of this PR), but this might catch instances where marking is actually not desired, and at least make things more explicit.

I could submit an other PR that would do that (but maybe after this one is merged?).

Contributor

Armael commented Nov 15, 2017

I'm only wondering whether we shouldn't make the parameter non-optional. It'd be even more invasive (and not really related to the goal of this PR), but this might catch instances where marking is actually not desired, and at least make things more explicit.

I could submit an other PR that would do that (but maybe after this one is merged?).

@shindere

This comment has been minimized.

Show comment
Hide comment
@shindere

shindere Nov 15, 2017

Contributor
Contributor

shindere commented Nov 15, 2017

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Nov 15, 2017

Contributor

I could submit an other PR that would do that (but maybe after this one is merged?).

Yes, ok, no need to "pollute" this PR.

Would it make sense to have two families of lookup functions, one that marks and one that does not?

There is a risk of combinatorial explosion if we ever need to introduce other flags on the lookup mechanism. And even if we don't, having a single val lookup: mark:bool -> ... seems nicer to me than val lookup_mark: ... and val lookup_dont_mark: ....

Contributor

alainfrisch commented Nov 15, 2017

I could submit an other PR that would do that (but maybe after this one is merged?).

Yes, ok, no need to "pollute" this PR.

Would it make sense to have two families of lookup functions, one that marks and one that does not?

There is a risk of combinatorial explosion if we ever need to introduce other flags on the lookup mechanism. And even if we don't, having a single val lookup: mark:bool -> ... seems nicer to me than val lookup_mark: ... and val lookup_dont_mark: ....

@gasche

gasche approved these changes Nov 15, 2017

I like the final state of the patch and I'm considering merging it (sometime next week?). @garrigue, @alainfrisch, if you have second thoughts about it, now is the time to voice them.

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 17, 2017

Contributor

I just pushed a batch of small tweaks. These should be rebased into the previous ones before merging, but have not been rebased yet for easy reviewing.

Contributor

Armael commented Nov 17, 2017

I just pushed a batch of small tweaks. These should be rebased into the previous ones before merging, but have not been rebased yet for easy reviewing.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 17, 2017

Member

The tweaks are good (I just reviewed them) but they don't deserve separate commits. I don't want to squash the whole PRs because the first 4 commits are cleanly separated. Could you squash the tweaks into the relevant commit(s)?

Member

gasche commented Nov 17, 2017

The tweaks are good (I just reviewed them) but they don't deserve separate commits. I don't want to squash the whole PRs because the first 4 commits are cleanly separated. Could you squash the tweaks into the relevant commit(s)?

charguer and others added some commits Sep 19, 2014

@Armael

This comment has been minimized.

Show comment
Hide comment
@Armael

Armael Nov 17, 2017

Contributor

Done.

Contributor

Armael commented Nov 17, 2017

Done.

Show outdated Hide outdated Changes Outdated

@gasche gasche merged commit c224184 into ocaml:trunk Nov 17, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Apr 9, 2018

Member

In #1678, @xclerc proposed with an example where the warning is inappropriate:

# let myvar1 = 0;;
val myvar1 : int = 0
# let myvar2 = myvar2 + 1;;
Error: Unbound value myvar2.
Hint: You are probably missing the `rec' keyword on line 1.

@mshinwell reports that there are many wrong hints given when working on the Flambda codebase, and proposes to revert this PR for the time being, which sounds like a reasonable idea.

Member

gasche commented Apr 9, 2018

In #1678, @xclerc proposed with an example where the warning is inappropriate:

# let myvar1 = 0;;
val myvar1 : int = 0
# let myvar2 = myvar2 + 1;;
Error: Unbound value myvar2.
Hint: You are probably missing the `rec' keyword on line 1.

@mshinwell reports that there are many wrong hints given when working on the Flambda codebase, and proposes to revert this PR for the time being, which sounds like a reasonable idea.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Apr 9, 2018

Contributor

(responding to @gasche 's comment on #1678) I don't have any examples to hand unfortunately, but I think they were all typos/mistakes along the lines of what @xclerc wrote above.

Contributor

mshinwell commented Apr 9, 2018

(responding to @gasche 's comment on #1678) I don't have any examples to hand unfortunately, but I think they were all typos/mistakes along the lines of what @xclerc wrote above.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Apr 9, 2018

Member

In preparation for the 4.07 freeze and after discussing with @mshinwell and @Armael, I have reverted this PR (commit 90fbe53), except for commit 3b77d91 (Generalize Env.lookup_* functions to allow disabling marking), which is an independent change and used by a subsequent PR.

A modified version will have to be proposed, see #1678.

Member

damiendoligez commented Apr 9, 2018

In preparation for the 4.07 freeze and after discussing with @mshinwell and @Armael, I have reverted this PR (commit 90fbe53), except for commit 3b77d91 (Generalize Env.lookup_* functions to allow disabling marking), which is an independent change and used by a subsequent PR.

A modified version will have to be proposed, see #1678.

damiendoligez added a commit that referenced this pull request Apr 9, 2018

Revert "Merge pull request #1472 from Armael/improved-error-letrec"
This reverts commit c224184, reversing
changes made to 2fc77a2.

As an exception, commit 3b77d91 (Generalize Env.lookup_* functions to allow disabling marking) is NOT reverted, because it was used by subsequent commits.

@Armael Armael referenced this pull request Apr 11, 2018

Merged

Missing "rec" hint: v2 #1720

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment