Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tentative fix for letrec check error (MPR#7706) #1565

Merged
merged 2 commits into from Apr 10, 2018

Conversation

Projects
None yet
6 participants
@gasche
Copy link
Member

gasche commented Jan 11, 2018

See MPR#7706, this definition is unsafe and must be rejected (but is currently accepted):

 let rec x =
    let y = if false then (fun z -> 1) else (fun z -> x 4 + 1) in
    y

The typechecker-level check for recursive value depends on whether
recursive values use a memory size that is statically known or depends
on their dynamic evaluation. An error in this check results in
a potential segfault.

The error is that the check currently considers variables/identifiable
to have a statically-known size. This is certainly wrong for
locally-defined identifiers (that may be bound to
dynamically-sized expressions), but it is not even clear why that
would be desirable for other expressions of the nested letrec, or for
variables from the context.

This patch considers all identifiers to have dynamic size.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Jan 11, 2018

I think this approach is probably too restrictive. It's definitely a regression compared to OCaml before the new check was added. I think that a simple environment mapping local bindings to Static | Dynamic and treating non-local bindings as Static would be a fairly reasonable thing to do.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 11, 2018

@lpw25 I wasn't sure whether that was more restrictive than the previous check, but the testsuite agrees with you. The following were accepted and fail with the restriction:

(* letrec/allowed.ml *)
let rec x = let y = (x; ()) in y;;

(* letrec/nested.ml *)
let rec r = (let rec x = `A r and y = fun () -> x in y)

Clearly for both of these I need to keep a environment to know, for each local binder, whether it was Static or Dynamic. I will change the patch accordingly.

On the other hand, I'm not sure I understand the suggestion to consider non-local identifiers as Static. What is the semantics of Static that makes this correct? Note that, the way the current check is implemented, (check_recursive_expression, at the funny line number 2323), Dynamic expressions that do not depend on any other recursively-bound variable are accepted, so I believe that ... and x = nonlocal_identifier will always pass.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Jan 11, 2018

What is the semantics of Static that makes this correct?

I can see two ways to reason about this. One is that the condition is really that the size of the resulting value needs to be known before executing the recursive binding -- which holds for a non-local identifier since the size can be read from the header. Another way of looking at it is that the reason we need to know the size is so that we can pre-allocate the value and then copy the fields of the result of the binding into this pre-allocated value -- for a non-local identifier we can skip this pre-allocation and just use the existing value itself.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 11, 2018

the condition is really that the size of the resulting value needs to be known before executing the recursive binding -- which holds for a non-local identifier since the size can be read from the header.

Thinking about this more, the answer is a bit different: the "size" of an old binding is always 1. Indeed, all OCaml values have size 1; the only reason why we may have non-1 size is that the values currently being recursively defined share a single block, so we are talking about their in-block layout. Old references may be referenced but are never inlined within the block (that would make no sense), so their size is 1.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Jan 11, 2018

I think what you're saying corresponds to my second argument. Although this:

the only reason why we may have non-1 size is that the values currently being recursively defined share a single block,

seems slightly wrong to me: we must talk about their in-block layout because they need to be pre-allocated. They need to be pre-allocated so that their actual value -- the address of a block -- is available before executing the recursive binding, even though the contents of this block are not yet ready for inspection.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Jan 11, 2018

I also have a slightly more subtle example that produces incorrect results:

# let r = ref None;;
val r : '_weak1 option ref = {contents = None}
# let rec x =
    let s = { contents = 5 } in
    r.contents <- Some s;
    let _ = x in
    s;;
val x : int ref = {contents = 5}
# match !r with
  | None -> assert false
  | Some s -> s == x;;
    - : bool = false

Here the problem is that let bound definitions can escape, which means that mutable constructions cannot be considered Static when they are on the right-hand side of a let.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 11, 2018

I think that the result is incorrect not because of mutable variables escaping, but because of the compilation scheme that is wrong -- because it considers that the static size of x is the static size of s, and thus allocates the memory for the record twice.

 (let x/1002 (extcall "caml_alloc_dummy" 3 val)
   (extcall "caml_update_dummy" x/1002
     (let s/1003 (alloc{test.ml:2,12-28} block-hdr(1024){test.ml:2,12-28} 11)
       x/1002 [] s/1003)
     unit)
@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Jan 11, 2018

because of the compilation scheme that is wrong

I don't think the compilation scheme is wrong, it's just not complete. Not every possible recursive definition can be compiled using it, but doing better seems like a lot of effort. Note that the compilation scheme always allocates recursive definitions twice -- but that doesn't matter if the value is immutable or cannot escape.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 11, 2018

Right, I misunderstood above, the compilation is as expected.

@gasche gasche force-pushed the gasche:letrec-fix branch from 6142315 to 79df06a Jan 11, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 11, 2018

Ok, I pushed an implementation that carries a local environment. I believe that all tests should now pass correctly -- in particular, the dangerous example is rejected.

I keep defaulting to Dynamic if an identifier is not found in the environment (which may be because it is non-local, but also because my environment-tracking implementation is incomplete: it does not handle complex patterns or local modules).

@lpw25, I spent some time trying to find an example of a program that may be unfairly rejected by considering non-local identifiers as Dynamic, but I couldn't think of any. We are talking of a recursive binding that (1) makes guarded or delayed uses of other names of the same recursive nest yet (2) always returns the value of an older identifier without branching (so the body is of the form let ... = ... in old_name or ... ; old_name).

Unless I am missing something, guarded/delayed uses that are ignored can always be erased, so all such bodies are equivalent to just old_name, and we are not forbidding anything by considering non-local bindings as Dynamic. Given that I am still not confident that Static would be a good choice here, and the imprecisions added by the implementation (we may keep under-approximating the set of local variables), I would be in favor of keeping the pessimistic Dynamic default for bindings missing from the size environment.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Jan 15, 2018

@lpw25, @garrigue, @yallop: I would be interested if one of you had a look at the proposed implementation, for which some details may still need to be fleshed out. Does the approach look correct and would you be willing to merge a fleshed-out version of the fix?

I left three FIXME for three aspects that are not handled as well as they could: local recursive nests, non-trivial patterns and local module declarations. My personal guess is that we should fix the third one (traverse module structures and record the size of their declarations) before considering merging the PR, but that the first two can be left for later (maybe with a less anxiety-generating comment than FIXME). Does that sound reasonable?

@yallop

This comment has been minimized.

Copy link
Member

yallop commented Jan 16, 2018

Thanks for looking at this, @gasche! I'll try to review later this week.

@yallop

This comment has been minimized.

Copy link
Member

yallop commented Feb 3, 2018

This fix looks sensible to me. I'm looking forward to the day when the information produced from this analysis is passed through and used in compilation, rather than simply attempting a sound but unconnected approximation of the compilation scheme that is immediately discarded.

My personal guess is that we should fix the third one (traverse module structures and record the size of their declarations)

Does leaving module structures untraversed cause any concrete problems?

maybe with a less anxiety-generating comment than FIXME

It would be useful to distinguish "There is opportunity for refinement/improvement here" and "This is somehow wrong and should be changed".

| Texp_letexception (_, e) -> classify_expression e
| Texp_ident _
| Texp_letexception (_, e) ->
classify_expression env e

This comment has been minimized.

@yallop

yallop Feb 3, 2018

Member

There are several changes here that are purely about style (indentation, etc.). Perhaps they could be put into a separate commit.

@gasche gasche force-pushed the gasche:letrec-fix branch 3 times, most recently from f4d78e3 to b54e47a Feb 25, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Feb 25, 2018

I went back to this PR and made the suggested changes. I looked into dealing with module-level declarations, and gave up, because it's a lot of work and no motivated by an actual need so far. I removed the scary-looking FIXME wordings, and split the commit in two as suggested. As far as I am concerned, and barring any CI surprise, this should be good to go.

@gasche gasche force-pushed the gasche:letrec-fix branch from b54e47a to 9424afd Feb 25, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Feb 26, 2018

The CI fails and I have no idea why: I cannot reproduce the issue on my machine. A CI log is available at https://api.travis-ci.org/v3/job/346054625/log.txt. Excerpt:

Running tests from 'tests/typing-unboxed' ...
 ... testing 'test.ml' with 1 (toplevel) => passed
Running tests from 'tests/typing-warnings' ...
 ... testing 'ambiguous_guarded_disjunction.ml' with 1 (toplevel) => passed
[...]
 ... testing 'unused_types.ml' with 1 (toplevel) => passed
make[1]: *** [new-without-report] Error 1
make: *** [all] Error 2
@trefis

This comment has been minimized.

Copy link
Contributor

trefis commented Feb 26, 2018

@gasche you added a "pr7706.ml" entry to an ocamltests file, but the related .ml is not present. Did you forget to commit it?

@gasche gasche force-pushed the gasche:letrec-fix branch from 9424afd to a56424b Feb 26, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Feb 26, 2018

That's certainly the issue, thanks. I rebased the PR and will look into printing an error message from ocamltest to help debuggability next time.

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Feb 26, 2018

The problem was showing in the log but I missed it:

Running tests from 'tests/letrec-disallowed' ...
 ... testing 'disallowed.ml' with 1 (toplevel) => passed
 ... testing 'extension_constructor.ml' with 1 (toplevel) => passed
 ... testing 'float_block_allowed.ml' with 1 (no-flat-float-array) => skipped (compiler configured with -flat-float-array)
 ... testing 'float_block_allowed.ml' with 1.1 (toplevel) => skipped
 ... testing 'float_block_disallowed.ml' with 1 (flat-float-array) => passed
 ... testing 'float_block_disallowed.ml' with 1.1 (toplevel) => passed
 ... testing 'generic_arrays.ml' with 1 (toplevel) => passed
 ... testing 'lazy_.ml' with 1 (toplevel) => passed
 ... testing 'module_constraints.ml' with 1 (toplevel) => passed
 ... testing 'pr7215.ml' with 1 (toplevel) => passed
 ... testing 'pr7231.ml' with 1 (toplevel) => passed
tests/letrec-disallowed/pr7706.ml: No such file or directory
 ... testing 'unboxed.ml' with 1 (toplevel) => passed
@shindere

This comment has been minimized.

Copy link
Contributor

shindere commented Feb 26, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Feb 26, 2018

I proposed a PR fixing the user-interface issue in #1634.

@alainfrisch alainfrisch added the bug label Feb 28, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Mar 24, 2018

Would someone be available to do a formal review of this PR? ( I'm adding @chambart to the conversation, as he worked on recursive values recently, but don't expect him to actually have time. )

@gasche gasche force-pushed the gasche:letrec-fix branch from a56424b to cb882a2 Apr 9, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Apr 9, 2018

@lpw25 would you be able to review this PR? (I just rebased it.)

@lpw25

lpw25 approved these changes Apr 9, 2018

Copy link
Contributor

lpw25 left a comment

Looks good to me. It might be good to add my nastier example from an earlier comment as a test.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 9, 2018

Actually, whilst I approve this fix for MPR#7706 I think my other example is still going to be wrong with it.

@gasche gasche force-pushed the gasche:letrec-fix branch from cb882a2 to 1115d6a Apr 9, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Apr 9, 2018

The CI was broken (due to unrellated change), so I just rebased the PR again.

@lpw25: could you open a new MPR to track this issue? I don't see a nice way to rule this example out without forbidding things that look like they should be allowed, such as

  let rec thunk = ref (fun () ->
    let v = f () in
    thunk := (fun () -> v);
    v)

Besides, if I understand correctly your example is in the category of "there is a shallow copy of the data that I didn't expect", which is different from an unsound generated code -- and I don't think can break type-soundness.

gasche added some commits Jan 11, 2018

tentative fix for letrec check error (MPR#7706)
The typechecker-level check for recursive value depends on whether
recursive values use a memory size that is statically known or depends
on their dynamic evaluation. An error in this check results in
a potential segfault.

The error is that the check currently considers all
variables/identifiers to have a statically-known size. This is
certainly wrong for locally-defined identifiers, that may be bound to
dynamically-sized expressions. We rule this case out by carrying an
environment during size-checking, that remembers the size
(static or dynamic) of local bindings.

The implementation is incomplete in certain ways, but safely defaults
to Dynamic as the size of bindings that it does not track through the
environment.
@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 9, 2018

I think the fix is quite easy:

  1. Change classify_expression to return:
type sd = Static of Asttypes.mutable_flag | Dynamic

marking the static constructs that create mutable things as mutable.
2. A local identifier that binds a Static Mutable expression is still Dynamic.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 9, 2018

(You should feel free to merge this anyway as is, since it is a clear improvement)

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Apr 9, 2018

I'm not sure that your proposal is correct: I find it unpalatable to handle different kind of subexpressions differently. Maybe there is a wider design that does some more systematic escape analysis that you are proposing a sound sub-analysis of, but maybe this is just a patch for this one example and there are other broken examples of this nature that you are not preventing.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 9, 2018

It just relies on the fact that expressions which are not let bound are used linearly, which the current check is also relying on.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 9, 2018

(Although, my understanding is that Pierre is planning to change all of this Static/Dynamic stuff to be simpler and more restrictive in 4.08 anyway, so we probably shouldn't worry too much about solving my awkward example right now.)

@gasche gasche merged commit 7b4ced8 into ocaml:trunk Apr 10, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Apr 10, 2018

It just relies on the fact that expressions which are not let bound are used linearly

It's easy for me to play the devil advocate because I don't have the details of the checks in memory. Is this true? What about applications (fun s -> ...) {contents=5}, and use of a recursive identifiers in other recursive bindings?

@gasche

This comment has been minimized.

Copy link
Member Author

gasche commented Apr 10, 2018

I submitted a bugreport for Leo's issue at MPR#7768.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 10, 2018

What about applications (fun s -> ...) {contents=5}, and use of a recursive identifiers in other recursive bindings?

Applications are always Dynamic and the recursive identifiers refer not to the value returned by the binding but to the shallow copy of the value.

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Apr 10, 2018

Re. your comment in the mantis issue:

(This does not look obvious to me, and I am frustrated by the constant overloading of what those "Static" and "Dynamic" markers mean. It's past time for some proper design work.)

I do agree this is getting somewhat out of hand. Probably we should just have a simpler more restrictive set of rules for "static" and break backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.