Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only treat pure patterns as inactive #1308

Merged
merged 2 commits into from
Sep 14, 2017
Merged

Conversation

lpw25
Copy link
Contributor

@lpw25 lpw25 commented Aug 30, 2017

Parmatch.inactive does not treat patterns that read mutable variables as active. This leads to the following bug:

# let f { contents = c } =
    fun () -> c;;
val f : 'a ref -> unit -> 'a = <fun>

# let r = ref 10;;
val r : int ref = {contents = 10}

# let h = f r;;
val h : unit -> int = <fun>

#   h ();;
- : int = 10

# r := 20;;
- : unit = ()

# h ();;
- : int = 20

because the read of the reference has been pushed underneath the fun () ->.

This PR fixes this by treating reads of mutable fields as active patterns. To try and avoid conflicts I've made it on top of #1195, so when reviewing this change only the second commit is relevant.

| Tpat_tuple ps | Tpat_construct (_, _, ps) | Tpat_array ps ->
| Tpat_constant c -> begin
match c with
| Const_string _ -> false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you mutate a string literal used in a constant pattern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could imagine:

let f ("foo" as c) = fun () -> c;;
let s = "foo";;
let g = f "foo";;
s.[0] <- 'g';;
g ()

Although in this case the functions won't be merged as the match is not (and cannot be) exhaustive.
Still, the code "feels more correct".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, sorry, the problem is of course mutating the matched value, not the pattern.

Could you adapt to take into account Config.safe_string? (I understand this is a bit useless since string literal patterns are not exhaustive.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have a feeling that what I wrote makes no sense at all, oh well...)

@@ -0,0 +1,2 @@
10
10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic question for @gasche (somehow I'm guessing you're the one who cares/knows about this sort of things) is there a reason why the content of this directory has not been switched to expect tests? Do we actually prefer expect tests nowadays or ..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the expect tests currently only work for checking the output of the compiler itself (inferred types, warnings/errors, etc.), not the dynamic output of the compiled programs as is done here. We could try to generalize / also implement this. (I thought at first that this may be subsumed by ocamltest in #681, but in fact this is orthogonal, ocamltest is a test driver in OCaml but for this kind of tests it still uses .reference files).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

| Tpat_tuple ps | Tpat_construct (_, _, ps) | Tpat_array ps ->
| Tpat_constant c -> begin
match c with
| Const_string _ -> false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this off-line but I'm putting it here so it's not forgotten: false should be changed to Clflags.unsafe_string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather not Config.safe_string. We need the system to be configured with safe strings to be sure that strings cannot be mutated.

Copy link
Contributor

@alainfrisch alainfrisch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks correct to me.

This could potentially lead to performance degradation (allocating intermediate closures for functions with multiple arguments, some of them extracting the value of mutable fields), but it's better to be on the safe side, and people can always write their code to keep the currying compilation scheme. One could consider reporting a warning when currying is broken because of the change, but I don't think it is worth the effort.

So good to merge, after the tweak using Config.safe_string.

@gasche
Copy link
Member

gasche commented Aug 30, 2017

I'm not convinced by this change.

inactive is documented as "matching on them does not incur any computation", which I understand is a way to guard against provoking side-effects affecting matches in the current clause set. (There were bugs with lazy thunk forcing invalidating the assumptions made by the pattern-matching compiler).

The current PR overloads the name with something that becomes false when reading mutable state (arrays or mutable records), apparently getting closer to a notion of "pure" pattern, but it does not update the documentation comment nor explain why the change of semantics is desirable in general.

I don't have a full understanding of the issue, but I have the impression that an optimization was wrong to rely on only this notion of "inactivity" (it needs a stricter check with different semantics), but that you fixed the symptoms by changing the definition instead of changing the optimization.

@lpw25
Copy link
Contributor Author

lpw25 commented Aug 30, 2017

but I have the impression that an optimization was wrong to rely on only this notion of "inactivity" (it needs a stricter check with different semantics), but that you fixed the symptoms by changing the definition instead of changing the optimization.

The inactive function only exists for this optimisation. It is supposed to mean a notion of "pure" pattern and was just incorrect before. I can change the documentation if you like, although for me reading a mutable variable is clearly not "trivial computations" meaning that the documentation is still correct.

@gasche
Copy link
Member

gasche commented Aug 30, 2017

I looked at the story of this code in more details:

  • I think the bug was already in the first implementation of curryfication by @xavierleroy in 1995 ( 62bfd98 )
  • Some version of this bug for match-failure (instead of delayed side-effects) was spotted and fixed by @mauny in 2008 ( 6373550 ), as part of Michel's work on lazy pattern that led to lazy patterns being singled out in the later "fluid" definition ( 6ba024a )

Some remarks:

  • this is unrelated to @maranget's recent work on interaction of side-effects with pattern-matching compilation, contrarily to what I assumed above (by the way, his work appears to be still-unmerged in Fix MPR7241: Pattern matching with mutable and lazy patterns is unsound #717 )
  • it would be correct to change the definition of "fluid" as this optimization test is the only user
  • there should be a clear semantic explanation for when currification (which delays evaluation) is valid or not, to use as documentation comment for the function used in this code path
  • an extremely similar bug has been discussed in the past, PR#7531, "Delayed effects in partially applied functions"; a careful study of the issue would have caught this
  • The issue above was found by differential fuzz-testing the OCaml compilers, as reported in an upcoming ICFP 2017 article. I wondered why this fuzzer did not catch the current bug; the answer is that it does differential testing of native vs. bytecode compilation, and both are affected. We need to plug a naive reference evaluator (@johnwhitington's ocamli?) in their tool to catch more bugs.

@gasche
Copy link
Member

gasche commented Aug 30, 2017

It is supposed to mean a notion of "pure" pattern and was just incorrect before. I can change the documentation if you like, although for me reading a mutable variable is clearly not "trivial computations" meaning that the documentation is still correct.

This is not at all how I read the current documentation. I quote:

(* An inactive pattern is a pattern whose matching needs only
   trivial computations (tag/equality tests).
   Patterns containing (lazy _) subpatterns are active. *)

let rec inactive pat = match pat with
| Tpat_lazy _ ->
    false
| Tpat_any | Tpat_var _ | Tpat_constant _ | Tpat_variant (_, None, _) ->
    true
| Tpat_tuple ps | Tpat_construct (_, _, ps) | Tpat_array ps ->
    List.for_all (fun p -> inactive p.pat_desc) ps
| Tpat_alias (p,_,_) | Tpat_variant (_, Some p, _) ->
    inactive p.pat_desc
| Tpat_record (ldps,_) ->
    List.exists (fun (_, _, p) -> inactive p.pat_desc) ldps
| Tpat_or (p,q,_) ->
    inactive p.pat_desc && inactive q.pat_desc

(* A `fluid' pattern is both irrefutable and inactive *)

let fluid pat =  irrefutable pat && inactive pat.pat_desc

Indeed the notion of "trivial" computation is not defined, but I don't see how one would assume that reading from an immutable block is "trivial" while reading from a mutable block is "not trivial". The implementation and the documentation both only mention forcing a lazy subpattern as the only source of "non-trivial computation", making it pretty clear to me that the intent is "executing OCaml code" versus "only accessing already-evaluated values".

(By the way, Alain would chime in to point out that re-boxing unboxed floating-point values on the fly may execute arbitrary OCaml code if the GC kicks in, but let's ignore that issue for now, just as it is ignored in the rest of the pattern-matching machinery.)

@lpw25
Copy link
Contributor Author

lpw25 commented Aug 30, 2017

I don't see how one would assume that reading from an immutable block is "trivial" while reading from a mutable block is "not trivial"

Well, personally I would never use "trivial" to describe a computation that does side-effects. Still I take your point about the documentation. I'll change it to something more explicit.

Alain would chime in to point out that re-boxing unboxed floating-point values on the fly may execute arbitrary OCaml code if the GC kicks in

Maybe I'm missing something, but I don't see how that is relevant here. We're only concerned here with not pushing the pattern's side-effects under a lambda.

@gasche
Copy link
Member

gasche commented Aug 30, 2017

My point is that you are silently changing the intent of the inactive function from something that was inadequate (it does not prevent wrong optimizations) but well-defined (imho) to something that may be more correct but that you have not clearly defined. Please clearly specify what we are checking in the pattern (what does impure mean) and why it should now be adequate (why is this particular notion of impurity used here? would it have caught previous similar issues such as MPR#7531?).

Maybe I'm missing something, but I don't see how that is relevant here.

The point was that reading a floating-point number from a pure (unboxed) record may execute code, and thus that curryfication, by delaying this read (and the underlying allocation), changes the observable behavior of the code. Apologies for the distraction.

@lpw25
Copy link
Contributor Author

lpw25 commented Aug 30, 2017

My point is that you are silently changing the intent of the inactive function from something that was inadequate (it does not prevent wrong optimizations) but well-defined (imho) to something that may be more correct but that you have not clearly defined.

I really don't see how the old definition was any more well defined than the new one, but I've replaced "needs only trivial computations" with "does not perform side-effects".

true
| Tpat_constant c -> begin
match c with
| Const_string _ -> not Config.safe_string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the not be removed here? When Config.safe_string, reads from strings are inactive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Well spotted. Fixed.

partial = partial'; }} as exp}]
when Parmatch.fluid pat ->
partial = partial'; }} as exp}], Total
when Parmatch.inactive pat ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you decide to change the way (ir)refutability/partiality is checked in this test? (I'm fine with moving away with the fairly unclear name "fluid".) Is there a difference in meaning between Parmatch.irrefutable and this new test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cf Leo's note in the PR description:

To try and avoid conflicts I've made it on top of #1195, so when reviewing this change only the second commit is relevant.

(* A `fluid' pattern is both irrefutable and inactive *)

let fluid pat = irrefutable pat && inactive pat.pat_desc
(* An inactive pattern is a pattern which does not perform side-effects. Patterns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not perform side-effects: matching against it may be duplicated, erased or delayed without change in observable behavior of the program. (I think that pure would also be a fine name, but inactive is acceptable.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are discussing the documentation side, I'd rather move the description of the function to the interface.

I'd also avoid the reference to "performing side-effects" (and the notion of "pure"), since this can have different meaning depending on the context. @gasche explanation seems to correspond closely to what it needed where the function is used, which is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I've moved the comment to the .mli and used @gasche's description rather than "performing side-effects".

@alainfrisch
Copy link
Contributor

and thus that curryfication, by delaying this read (and the underlying allocation), changes the observable behavior of the code. Apologies for the distraction.

Curryfication can always change the allocation behavior, and interleaving of effects between threads and from finalizer codes is not specified. So I don't think there is any issue here with the possibility of triggering the GC because of boxing, since the pattern is irrefutable. (The problem is for pattern-matching, where the "asynchronous" effects can break invariants assumed by the pattern matching compiler and lead to unsound code.)


let fluid pat = irrefutable pat && inactive pat.pat_desc
(* An inactive pattern is a pattern which does not perform side-effects. Patterns
containing (lazy _) subpatterns or reads of immutable fields are active. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/immutable/mutable/

@lpw25 lpw25 changed the title Impure inactive Impure inactive patterns Aug 31, 2017
@lpw25 lpw25 changed the title Impure inactive patterns Only treat pure patterns as inactive Aug 31, 2017
| Tpat_alias (p,_,_) | Tpat_variant (_, Some p, _) ->
inactive p
| Tpat_record (ldps,_) ->
List.exists (fun (_, lbl, p) -> lbl.lbl_mut = Mutable || inactive p) ldps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be List.for_all, = Immutable, &&?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right. One day someone will invent a type system that saves me from having to think so carefully about which way around logical operators need to be.

@gasche
Copy link
Member

gasche commented Sep 1, 2017

I apologize for the final nitpicking, but the current definition of inactive does not quite match the specification, given that it does not check for partiality (erasing/dropping the check of a partial pattern is invalid if the matching would raise a faliure). This could be fixed by either:

  1. checking for totality in this function as before (but I don't understand if it can be done in a type-informed way)
  2. adding a ~partial:Typedtree.partial parameter to the function that makes it explicit that the context should provide the totality information, and uses it to return the result
  3. just adding to the documentation an indication that totality is assumed / a precondition (I don't like this choice very much)

I think (2) would be the nicest choice.

r := 20;;

print_endline (string_of_int (h ()));;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to extend the test to cover the case of a record with some mutable and some immutable fields.

@xavierleroy
Copy link
Contributor

@lpw25 : could you please address @gasche 's nitpicking and @alainfrisch 's suggestion for an extra test? I think this PR should be part of 4.06, but for this to happen you need to act quickly.

@lpw25
Copy link
Contributor Author

lpw25 commented Sep 11, 2017

Ok, nitpicking attended too, branch rebased, and change entry added.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if "Cousin nitpick" is a better character than "Old Scrooge" ;-) but for the record I think this PR is now essentially perfect and would encourage its merge -- after the typo fix, heh.

Changes Outdated
@@ -73,6 +73,10 @@ Working version
stubs.
(Mark Shinwell, review by Pierre Chambart and Leo White)

- GPR#1195: Merge functions based on partiality rather than
Parmatch.irrefutable.
(Leo White, revuew by Thomas Refis and Gabriel Scherer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, you should add Alain in the review list, even if his first round of review was a bit too enthusiastic.

@lpw25 lpw25 merged commit e458e45 into ocaml:trunk Sep 14, 2017
damiendoligez added a commit to damiendoligez/bitstring that referenced this pull request Nov 21, 2017
@nojb
Copy link
Contributor

nojb commented May 2, 2018

Is the following behaviour intended (i.e. is the pattern supposed to be evaluated before all arguments of f are passed) ? See https://caml.inria.fr/mantis/view.php?id=7789:

# let f {contents = c} () = c;;
val f : 'a ref -> unit -> 'a = <fun>
# let r = ref 10;;
val r : int ref = {contents = 10}
# let g = f r;;
val g : unit -> int = <fun>
# g ();;
- : int = 10
# r := 20;;
- : unit = ()
# g ();;
- : int = 10

tbrk added a commit to inria-parkas/sundialsml that referenced this pull request May 2, 2018
@gasche
Copy link
Member

gasche commented May 2, 2018

Yes, the semantics of let f p q = ... are intended to be the same as let f = fun p -> (fun q -> ...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants