Pattern matching with mutable and lazy patterns is unsound #7241

Open
vicuna opened this issue Apr 25, 2016 · 12 comments

@vicuna vicuna commented Apr 25, 2016

Original bug ID: 7241
Reporter: @stedolan
Assigned to: @maranget
Status: assigned (set by @mshinwell on 2017-03-09T12:39:53Z)
Resolution: open
Priority: normal
Severity: crash
Category: typing
Related to: #5992
Monitored by: junsli braibant @gasche @hcarty

Bug description

Optimised pattern matching skips checking conditions that seem redundant. However, since OCaml supports pattern-matching mutable fields, and code execution during matching via "lazy" patterns, the truth of some conditions can vary during matching.

This can cause seemingly-impossible cases to be taken, if forcing lazy values causes mutations that confuse the optimised matching logic. Due to the presence of GADTs, taking an impossible case is a soundness issue.

For example, this program segfaults:

type (_, _) eq = Refl : ('a, 'a) eq
type (_, _) deq = Deq : ('a, 'x) eq option * ('x, 'b) eq option -> ('a, 'b) deq

let deq1 = Deq (Some Refl, None)
let deq2 = Deq (None, Some Refl)

type ('a, 'b) t = { 
  a : bool; 
  mutable b : ('a, 'b) deq;
  mutable c : int Lazy.t
}

let g (type a) (type b) : (a, b) t -> (a, b) eq = function
| { a = true; b = Deq (None, _) }
| { a = true; b = Deq (Some _, _); c = lazy 1 }
| { a = false }
| { b = Deq (_, None) } -> 
   assert false
| { b = Deq (Some Refl, Some Refl) } ->
   Refl

let bad =
  let r = { a = true; b = deq1; c = lazy 1 } in
  r.c <- lazy (r.b <- deq2; 2);
  g r

let castint (type a) (Refl : (int, a) eq) (x : int) : a = x
let _ = print_string (castint bad 42)

This program uses mutation to change a field from "deq1" to "deq2" during matching, making it seem like the impossible "Deq (Some Refl, Some Refl)". (The behaviour is very dependent on the exact sequence of cases in "g", and seemingly-equivalent programs will often give different behaviour).
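The mechanism the report relies on, a lazy pattern running arbitrary code in the middle of a match, can be seen in isolation with a harmless sketch. The type `cell`, the field `hits`, and the function `probe` are hypothetical names introduced only for this illustration. No pattern inspects the mutated field here, so this program is not expected to misbehave.

```ocaml
(* Hypothetical record: `hits` is mutated by forcing `c`. *)
type cell = { mutable hits : int; mutable c : int Lazy.t }

(* Both clauses use lazy patterns, so matching has to force `c`. *)
let probe (r : cell) : string =
  match r with
  | { c = lazy 1; _ } -> "one"
  | { c = lazy _; _ } -> "other"

let () =
  let r = { hits = 0; c = lazy 0 } in
  (* Forcing this suspension mutates the very record being matched. *)
  r.c <- lazy (r.hits <- r.hits + 1; 2);
  let s = probe r in
  Printf.printf "clause taken: %s, hits after matching: %d\n" s r.hits
```

The unsound example above differs in that the forced code overwrites a field (`b`) that other clauses have already tested.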

@vicuna vicuna commented Apr 25, 2016

Comment author: @stedolan

Playing with it a bit more, the example can be simplified to the following, which does not use laziness:

type app = App : ('x -> unit) option * 'x -> app

let app1 = App (Some print_string, "hello")
let app2 = App (None, 42)

type t = { 
  a : bool; 
  mutable b : app
}

let f = function
| { a = false } -> assert false
| { a = true; b = App (None, _) } -> assert false 
| { a = true; b = App (Some _, _) } as r 
    when (r.b <- app2; false) -> assert false
| { b = App (Some f, x) } ->
   f x

let _ = f { a = true; b = app1 }

The issue is not the type-equality behaviour of GADTs, but the existential quantification. In this example, mutation causes the optimised pattern-matching to confuse the values bound under two different existential quantifiers. Either lazy patterns (as in the original example) or when guards (above) are enough to cause mutation during matching.

@vicuna vicuna commented Apr 26, 2016

Comment author: bvaugon

Note that "when" guards are also broken. For example, the following code crashes in a similar way:

type (_, _) eq = Refl : ('a, 'a) eq
type (_, _) deq = Deq : ('a, 'x) eq option * ('x, 'b) eq option -> ('a, 'b) deq
    
let deq1 = Deq (Some Refl, None)
let deq2 = Deq (None, Some Refl)
    
type ('a, 'b) t = { 
  a : bool; 
  mutable b : ('a, 'b) deq;
}
    
let r = { a = true; b = deq1 }

let g (type a) (type b) : (a, b) t -> (a, b) eq = function
  | { a = true; b = Deq (Some _, _) } when (r.b <- deq2; false) ->
    assert false
  | { a = true; b = Deq (None, _) }
  | { a = false }
  | { b = Deq (_, None) } ->
    assert false
  | { b = Deq (Some Refl, Some Refl) } ->
    Refl

let bad = g r
    
let castint (type a) (Refl : (int, a) eq) (x : int) : a = x
let _ = print_string (castint bad 42)

@vicuna vicuna commented Jul 13, 2016

Comment author: @alainfrisch

I did not try to produce an example, and it would be trickier, but even without lazy patterns or when guards, it is impossible to guarantee that arbitrary code won't be executed during pattern matching as soon as patterns need to allocate (e.g. to read a float from unboxed records or arrays). These allocations can trigger the GC and thus finalizers, which could modify mutable parts of the matched value. It might be possible to delay these allocations (at least until the "when" guard), but this would probably require some refactoring of the PM compiler. (Of course, the pattern can "traverse" the unboxed floats.)
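A minimal sketch of the ingredients mentioned here, assuming nothing beyond the standard `Gc` module: a record made only of floats (stored flat), a pattern that binds one of its fields, and a GC alarm standing in for "arbitrary code run by the GC". The names `point`, `alarm_runs`, and `first` are hypothetical. Whether and when the alarm fires depends on the runtime, so no particular count is claimed, and this sketch does not reproduce the bug.

```ocaml
(* A record whose fields are all floats is stored as a flat float block. *)
type point = { mutable x : float; mutable y : float }

let alarm_runs = ref 0

(* Stand-in for arbitrary user code triggered by the GC. A real attack
   would mutate the value currently being matched instead. *)
let _alarm = Gc.create_alarm (fun () -> incr alarm_runs)

(* Binding `x` out of the flat record may require boxing the float,
   which is an allocation and hence a potential GC point. *)
let first (p : point) : float =
  match p with
  | { x; _ } -> x

let () =
  let p = { x = 1.5; y = 2.0 } in
  Gc.full_major ();
  Printf.printf "alarm ran %d time(s); first = %g; y = %g\n"
    !alarm_runs (first p) p.y
```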

@vicuna vicuna commented Jul 13, 2016

Comment author: @alainfrisch

Setting Target version to "later", since there is no clear resolution plan.

@vicuna vicuna commented Jul 19, 2016

Comment author: @garrigue

I don't even think that one needs existentials to do that.
Here is an example in core OCaml; it segfaults even after fixing Matching.check_partial (which currently only downgrades to Partial if the same pattern contains both mutable and lazy parts)

type u = {a: bool; mutable b: int option}

let f x =
  match x with
    {a=false} -> 0
  | {b=None} -> 1
  | _ when (x.b <- None; false) -> 2
  | {a=true; b=Some y} -> y

let _ = f {a=true; b=Some 5}

The (insufficient) fix in matching.ml is

let check_partial is_mutable is_lazy pat_act_list = function
  | Partial -> Partial
  | Total ->
      if
        pat_act_list = [] ||  (* allow empty case list *)
        List.exists (fun (pats, _) -> is_mutable pats) pat_act_list &&
        List.exists (fun (pats, lam) -> is_guarded lam || is_lazy pats)
          pat_act_list
      then Partial
      else Total
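Independently of any compiler fix, user code can sidestep the problem in the core OCaml example by matching on a snapshot of the mutable field rather than on the record itself. The following is a sketch of that workaround, not something proposed in this thread; `f_safe` is a hypothetical name. Once `x.b` has been read into the immutable local `b`, a later assignment to `x.b` cannot change the value being matched.

```ocaml
type u = { a : bool; mutable b : int option }

let f_safe (x : u) : int =
  (* Snapshot the mutable field once, before matching starts. *)
  let b = x.b in
  match x.a, b with
  | false, _ -> 0
  | _, None -> 1
  | _ when (x.b <- None; false) -> 2  (* mutates x.b, but not the snapshot *)
  | true, Some y -> y

let () =
  let x = { a = true; b = Some 5 } in
  Printf.printf "result: %d\n" (f_safe x)
```

The guard still runs and still clears `x.b`, but the last clause reads `y` from the snapshot taken before the guard executed.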

@vicuna vicuna commented Jul 19, 2016

Comment author: @alainfrisch

Frightening. Jacques: do you agree with my guess that this could even occur without when guards (with patterns that allocate and GC alarms/finalizers)?

@vicuna vicuna commented Feb 16, 2017

Comment author: @xavierleroy

Currently being worked on in #717

@github-actions github-actions bot commented Jun 1, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jun 1, 2020
@gasche gasche removed the Stale label Jun 1, 2020
Member

@yallop yallop commented Jun 1, 2020

The program in the original report still causes a segmentation fault, which seems reason enough to leave this open.

There's also been recent activity on this issue elsewhere. Under #717, @trefis wrote in October 2019:

That work is still ongoing, and we definitely hope to fix #7241 as a result of it.

Contributor

@bluddy bluddy commented Jun 1, 2020

I wonder how other languages with pattern matching and mutation deal with this issue (F#, Scala...). Do they all suffer from the same unsoundness?
Also, is there any possible way to address this in a multicore world with mutable state shared between threads?

Member

@gasche gasche commented Jun 1, 2020

Whether an implementation is unsound or not depends on the assumptions and optimizations made during matching. Very naive implementations would typically be correct, and slightly more elaborate implementations may very well be just as buggy as ours.
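To make "very naive" concrete, here is a hand-written sketch of what clause-by-clause evaluation could look like for the earlier core OCaml example with type `u`. It is only an illustration under that assumption, not how the OCaml compiler or any other compiler is known to work; `f_naive` is a hypothetical name. Each clause re-reads the fields it tests and nothing learned in one clause is reused in the next.

```ocaml
type u = { a : bool; mutable b : int option }

(* Every clause is checked from scratch, re-reading x.a and x.b. *)
let f_naive (x : u) : int =
  if not x.a then 0                       (* {a=false} *)
  else if x.b = None then 1               (* {b=None} *)
  else if (x.b <- None; false) then 2     (* _ when guard *)
  else
    match x.b with                        (* {a=true; b=Some y}, re-read *)
    | Some y -> y
    | None -> failwith "no clause matched"

let () =
  match f_naive { a = true; b = Some 5 } with
  | n -> Printf.printf "returned %d\n" n
  | exception Failure msg -> Printf.printf "failed safely: %s\n" msg
```

Under this scheme the guard's mutation leads to a clean failure rather than reading a field of a value that is no longer there.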

Handling multicore is not a difficulty. What we need is to reason about the points during pattern-matching where side-effects may be performed. Currently this happens during evaluation of guards, float unboxing and lazy forcing; Multicore (or really any implementation) could introduce additional "poll points" during pattern-matching, but I don't think this is planned for now (there is no risk of undue latency caused by pattern-matching that would require polling, as it is always bounded in time and very fast).


@github-actions github-actions bot commented Nov 1, 2021

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
