feat(library/init/data/list): add array_list by digama0 · Pull Request #1438 · leanprover/lean3

digama0 · 2017-03-08T07:03:33Z

Add support for an array list type, using the "mutable" array implementation.

leodemoura · 2017-03-08T15:40:06Z

+
+def write {α} (l : array_list α) : fin l^.length → α → array_list α :=
+λ ⟨n, h⟩ v, { l with data := l^.data^.write ⟨n, l^.lt_capacity h⟩ h v }
+


The VM will not be able to perform a destructive update on l^.data even if the reference counter to l is 1 and it owns data. This structure update is currently desugared into

array_list.mk l^.capacity l^.length (l^.data^.write ⟨n, l^.lt_capacity h⟩ h v) l^.len_le

In the current code generator we evaluate arguments from left to right.
So, after we execute l^.data, the array will have at least two references to it: one on the stack, and one from l. So, a destructive update is not performed.
To make sure we can perform a destructive update at run time, we need to write this function using a pattern matching

def write {α} : ∀ (l : array_list α), fin l^.length → α → array_list α | ⟨cap, len, data, hle⟩ ⟨n, h⟩ v := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩

We can inspect whether we are performing destructive updates or not at run time by using the option trace.array.updat.

set_option trace.array.update true vm_eval ((list.to_array_list [1, 2, 3])^.write (fin.of_nat 1) 10)^.read (fin.of_nat 1)

I have considered using match-with to compile structure updates, but this has nasty consequences.
For example, in the current approach we have the following definitional equality

{ l with data := ...}^.capacity =?= l^.capacity

That is, the projection of a non-updated field is definitionally to the same projection before the update.
This property is essential when we are building hierarchies of type classes (e.g., algebra).
If we use match-with to compile structure updates, then these two terms will be provably equal but not definitionally equal.
This is why I want to support lenses in Lean (#1431). With lenses, we can perform this kind of update in a much more convenient way and make sure a destructive update will be performed at run time if the reference counters are 1. Note that, in the lenses definition we will have to use a match-with. Otherwise we will have the same problem.

That being said, I realize that it would be great to have a mechanism to statically enforce that the reference counters are 1 at run time. To be able to do this, we would have to add linear types to Lean.
We have considered this, but the complexity is too high and they are super inconvenient to use.

@leodemoura I tried your version and a few others:

def write2 {α} : ∀ (l : array_list α), fin l^.length → α → array_list α | ⟨cap, len, data, hle⟩ ⟨n, h⟩ v := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ def write3 {α} (l : array_list α) : fin l^.length → α → array_list α | ⟨n, h⟩ v := match l, h with ⟨cap, len, data, hle⟩, h := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ end def write4 {α} (l : array_list α) : fin l^.length → α → array_list α | ⟨n, h⟩ v := let cap := l^.capacity, len := l^.length, data := l^.data, hle := l^.len_le in ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ def write5 {α} (l : array_list α) : fin l^.length → α → array_list α | ⟨n, h⟩ v := match l^.capacity, l^.length, h, l^.data, l^.len_le with cap, len, h, data, hle := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ end set_option trace.array.update true vm_eval ([1, 2, 3]^.to_array_list^.write (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- non-destructive vm_eval ([1, 2, 3]^.to_array_list^.write2 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive vm_eval ([1, 2, 3]^.to_array_list^.write3 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive vm_eval ([1, 2, 3]^.to_array_list^.write4 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- non-destructive vm_eval ([1, 2, 3]^.to_array_list^.write5 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write ⟨n, h⟩ v)^.capacity = l^.capacity := rfl example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write2 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl --failed example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write3 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl --failed example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write4 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write5 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl

Version 2 is the one you presented above, and version 3 is the suggested match expression. Both of them fail to reduce projections definitionally. I have a few alternatives, versions 4 and 5.

I assume that let expressions allow controlling the evaluation order, so I was hoping that version 4 would work. But it doesn't achieve the destructive write - do you know why? Perhaps it is just because l is still in scope, even though it is not being used. If this version can be made to work, it is a good alternative compilation for {s with ...}.

The only version which has both properties is version 5, which uses a match expression on the projections, but this involves an extra construction / destruction step.

I was hoping that version 4 would work. But it doesn't achieve the destructive write - do you know why?

The problem is that there are implicit references to l in the data^.write application (i.e., in the implicit arguments, and one of them is being captured by a closure that is just dead code).
They can be eliminated. I will try to fix this.

That being said, I also think write4 is a great alternative for compiling {s with ...}. I will try to work on this too.

I pushed a fix that erases the artificial dependency that was preventing the destructive update.
write4 is performing a destructive update after this fix.
I will implement the new approach for {s with ...}

@digama0

…{s with ...}` See discussion at leanprover#1438 leanprover#1438 (comment) @digama0 With this commit, the original `array_list.write` will also perform a destructive update when the reference counter for `l` is 1. ```lean def write {α} (l : array_list α) : fin l^.length → α → array_list α := λ ⟨n, h⟩ v, { l with data := l^.data^.write ⟨n, l^.lt_capacity h⟩ h v } ```

leodemoura · 2017-03-09T15:35:09Z

@digama0 I added the new encoding for {s with ...}, but I'm not sure whether we should keep it or not. It is not a complete solution for "enabling" destructive updates. That is, it is only guaranteed to work if the user is updating a single field. Example {s with sz := s^.sz + 1, data := s^.data^.write i v} may or may not update s^.data destructively. It depends on the evaluation order and where the fields sz and data are located in the structure. If we have two array updates, then for sure one of them will not be destructive {s with data1 := s^.data1^.write i1 v1, data2 := s^.data2^.write i2 v2}. There is also a performance problem. For example, suppose we have a structure s with many fields (e.g. field), then the new {s with ...} is much bigger than the previous one. The problem is that the let stores the type of each variable, and the types are not small in a structure like field. I tried to minimize the problem by creating a let only for fields which are not propositions. Another problem is when proving things about definitions that use {s with ...}. We will probably have to execute an initial zeta expansion step. To eliminate these auxiliary lets, but this can get complicated when we want to eliminate only the ones introduced by the {s with ...} notation.

digama0 · 2017-03-10T01:54:29Z

@leodemoura Do you have a specific example with this behavior? I have the following, but it's not showing what you say.

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

set_option trace.array.update true
vm_eval let a := {test . sz := 4, data1 := {data := λ_, 5}, data2 := {data := λ_, 1}}^.write
  (fin.of_nat 1) 10 in
  (a^.data1^.read (fin.of_nat 1), a^.data2^.read (fin.of_nat 2)) -- destructive write

leodemoura · 2017-03-10T02:23:34Z

@digama0 Here is an example

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

def mk_test (n : nat) : test :=
{ data1 := mk_array 3 0,
  data2 := mk_array 3 1,
  sz    := n }

set_option trace.array.update true

#eval test.write (mk_test 10) (fin.of_nat 1) 10
/- 
Output:
[array.update] non-destructive write at #1
[array.update] destructive write at #1
-/
def test.write2 (s : test) (i : fin 3) (v : nat) : test :=
match s with
| ⟨d₁, d₂, s⟩ := ⟨d₁^.write i v, d₂^.write i v, s⟩
end

#eval test.write2 (mk_test 10) (fin.of_nat 1) 10
/-
Output:
[array.update] destructive write at #1
[array.update] destructive write at #1
-/

leodemoura · 2017-03-10T02:26:43Z

In your example

vm_eval let a := {test . sz := 4, data1 := {data := λ_, 5}, data2 := {data := λ_, 1}}^.write
  (fin.of_nat 1) 10 in
  (a^.data1^.read (fin.of_nat 1), a^.data2^.read (fin.of_nat 2)) -- destructive write

The compiler simplifies a^.data1 and a^.data2 at compilation time because a is a constructor after inlining test.write.

leodemoura · 2017-03-10T02:37:44Z

BTW, your example help me to find a potential performance problem in the compiler.
It is being too aggressive in the function inlining step, and may take a very long time in some cases, and/or consume a lot of time. It should have not unfolded/inlined test.write at compilation time.

See discussion at #1438

digama0 · 2017-03-14T07:46:42Z

@leodemoura My recommended solution to your {s with ... } problem is to search for references to any let-bound projections in the body, and replace them with the variables. That is, in the example

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

we compile this to

let _sz := s^.sz in
let _data1 := s^.data1 in
let _data2 := s^.data2 in
{test . sz := _sz, data1 := _data1^.write i v, _data2 := data2^.write i v}

(where let bindings are only generated for data, as you suggested).

leodemoura · 2017-03-15T18:26:03Z

@leodemoura My recommended solution to your {s with ... } problem is to search for references to any let-bound projections in the body, and replace them with the variables. That is, in the example

At commit c58f61e, the transformation was implemented at elaborator.cpp. Your suggestion cannot be reliably implemented there. The problem is that the term is full of "holes" at that point, and we can't collect all occurrences of s^.... since some of these occurrences will be in holes that will be "filled" later.
I think we can try to implement your suggestion in the code generator after we have erases proofs. Then,
1- All holes have already been filled and we can reliably collect all occurrences of s^.....
2- Proofs have been erased, and we don't have to be concerned about occurrences of s^.... in proof terms.
3- Users will not be puzzled by alien let _fname := ... terms in their declarations .

digama0 · 2017-03-15T21:27:40Z

@leodemoura That actually makes a lot of sense. Evaluation order isn't even supposed to be a thing users need to worry about in dependent type theory, and the code generator is the one supplying the evaluation order on top of the original structure anyway.

What you really need is a proper analysis of the arr^.write terms in the expression relative to arr^.read, and ensure that the former are always evaluated first, unless a read depends on a write, in which case a non-destructive write is required. In general, it seems like linear types would definitely be useful, but I am not an expert.

Just a quick comment. I didn't read the Haskell paper carefully, but I think their approach is not very useful for us (i.e., performing destructive updates) because given f : A -o B, they allow the caller to use f with non unique types. So, we cannot perform destructive updates on A.

In their implementation, there are two heaps, one for regular objects and one for linear objects, which must be routed through linear functions and have no GC. For our purposes, you can make sure that A is a linear object before entering the function, by copying it if the refcounter is > 1, that is, assuming that any function with a linear argument will modify its argument and so preempt the copying when the write in the function is found. That way, the code for f can always safely apply destructive updates.

leodemoura · 2017-03-22T15:10:02Z

@jroesch and I needed a char buffer yesterday. We defined it as
https://github.com/leanprover/lean/blob/master/library/data/buffer.lean#L8
We realized it is essentially array_list because the internal array implementation already has support for capacity/size.
Sorry for not addressing this PR earlier. I will not merge it since buffer is simpler.

digama0 · 2017-03-22T15:35:35Z

I am confused by your assertion. Exactly how is array being stored internally, that would cause size and capacity to diverge?

…

On Mar 22, 2017 11:10 AM, "Leonardo de Moura" ***@***.***> wrote: @jroesch <https://github.com/jroesch> and I needed a char buffer yesterday. We defined it as https://github.com/leanprover/lean/blob/master/library/data/buffer.lean#L8 We realized it is essentially array_list because the internal array implementation already has support for capacity/size. Sorry for not addressing this PR earlier. I will not merge it since buffer is simpler. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1438 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA1A7FlNCIh1RBkd2LQcHPn71vZNOI6lks5roTnMgaJpZM4MWcMd> .

leodemoura · 2017-03-22T15:41:23Z

@digama0
parray is used to implement Lean arrays.
https://github.com/leanprover/lean/blob/master/src/library/vm/vm_array.cpp#L14
The capacity in parray is stored before the elements.
https://github.com/leanprover/lean/blob/master/src/library/parray.h#L26
The size is stored in a cell:
https://github.com/leanprover/lean/blob/master/src/library/parray.h#L65

digama0 · 2017-03-22T22:00:40Z

@leodemoura Although it is nice to know that the array implementation has these additional features, I would really like to have access to a raw array type (without even array size stored), from which more complicated verified programs can be built. Perhaps parray can also be exposed by a different name, but for proper complexity control for lean programs, the real underlying array structure should be exposed as much as possible, so that more of it can be formalized and verified and fewer e.g. bounds checks are needed. In particular, defined_array is meant to be an abstraction for uninitialized or partially-initialized arrays, and it would be nice if the VM could actually avoid array initialization for this structure.

…

On Wed, Mar 22, 2017 at 11:41 AM, Leonardo de Moura < ***@***.***> wrote: @digama0 <https://github.com/digama0> parray is used to implement Lean arrays. https://github.com/leanprover/lean/blob/master/src/library/ vm/vm_array.cpp#L14 The capacity in parray is stored before the elements. https://github.com/leanprover/lean/blob/master/src/library/parray.h#L26 The size is stored in a cell: https://github.com/leanprover/lean/blob/master/src/library/parray.h#L65 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1438 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA1A7BHWpVkakfR_bnOmm78BI1ssRckAks5roUElgaJpZM4MWcMd> .

leodemoura · 2017-03-22T22:48:46Z

@digama0 Yes, it would be nice to have a primitive C-like array. We will need this kind of array to interface with external code. We can call them c_array or carray.

digama0 · 2017-03-22T23:38:50Z

@leodemoura Here is a slight variation on defined_array that doesn't require inhabited α for its operation. Would it be feasible to reflect this in C?

structure partial_carray (α : Sort u) (n) (s : fin n → Prop) :=
(data : Π (n : fin n), s n → α)

def carray (α : Sort u) (n) := partial_carray α n (λ_, true)

If I understand correctly, the parameter s will not be erased by the VM, but I can't think of any way to extract data from it (since equality on fin n → Prop is not decidable). Do you know if it is consistent to erase it? The idea is that s represents the subset of values of the array which have been initialized (which is not stored in the data structure). The management of this parameter is only used in order to ensure validity of the data access.

leodemoura · 2017-03-23T05:38:13Z

If I understand correctly, the parameter s will not be erased by the VM, but I can't think of any way to extract data from it (since equality on fin n → Prop is not decidable). Do you know if it is consistent to do so?

An object of type carray has only one field: the function Π (n : fin n), s n → α.
The lambda lifting step may capture unnecessary dependencies, but we can refine it.
We cannot treat this function as an unary function because of partial applications.
The problem is easy to see with safe_div : Pi (x y : nat), y > 0 -> nat.
An user may create the closure safe_div 0 0. This is fine, but he will not be able to build a proof for 0 > 0, and execute safe_divwhich would probably crash the program. However, if we erase the third argument, then we would be able to crash the program usingsafe_div 0 0` (which is a fine term).

BTW, we erase irrelevant data (i.e., types, propositions (which are also types) and proofs) by replacing them with unit (i.e. ()). We currently do not erase functions that produce irrelevant data.
The plan is to replace a term t : Pi (a_1 : A_1) ... (a_n : A_n), B where B is irrelevant with mk_unit_closure_n which is a pre generated byte code for fun a_1 ... a_n, ().
This is needed in cases such as

list.length (list.map (fun _, true) [1, 2, 3])

That is, the user creates a list of propositions using list.map. At runtime, we would be creating a list of units. By using this approach we make sure the code generation is uniform.

digama0 · 2017-03-23T15:22:48Z

@leodemoura

We currently do not erase functions that produce irrelevant data.
The plan is to replace a term t : Pi (a_1 : A_1) ... (a_n : A_n), B where B is irrelevant with mk_unit_closure_n which is a pre generated byte code for fun a_1 ... a_n, ().

I think you can take this one step farther: Just replace t with () in the VM (i.e. Pi (a : A), B is irrelevant if B is). Then if apply is called on (), you ignore the argument and return () again.

The lambda lifting step may capture unnecessary dependencies, but we can refine it.
We cannot treat this function as an unary function because of partial applications.
The problem is easy to see with safe_div : Pi (x y : nat), y > 0 -> nat.
An user may create the closure safe_div 0 0. This is fine, but he will not be able to build a proof for 0 > 0, and execute safe_div which would probably crash the program. However, if we erase the third argument, then we would be able to crash the program using safe_div 0 0 (which is a fine term).

Hm, that's too bad. This seems like an edge case, so it might be worth working around it. What is lambda lifting exactly? If I'm reading the code correctly, you are uncurrying sequences of lambdas. Does a function of type A -> B -> C always get compiled to a two arg function, or will it remain curried if the function is defined that way? If yes, you could eliminate all irrelevant args, and then compile a fully applied function application as normal, and a function application where only irrelevant args are unapplied to a thunk that calls the function. More generally, a member of type Pi (a :: A), B compiles to a function whose arguments are only the relevant arguments in A, unless there are none, in which case it compiles to a thunk.

I'm starting to see why a solid semantics for the VM data types is useful.

leodemoura · 2017-03-23T17:10:54Z

I think you can take this one step farther: Just replace t with () in the VM (i.e. Pi (a : A), B is irrelevant if B is). Then if apply is called on (), you ignore the argument and return () again.

For the bytecode interpreter, this would be reasonable. However, for native code generation, we don't want this kind of overhead.

leodemoura · 2017-03-23T17:14:40Z

What is lambda lifting exactly?

It is one of the preprocessing steps. For each nested lambda (except the ones used for cases_on), we create an auxiliary definition foo a_1 ... a_n. Where a_1, ... a_n are the local constants used in the lambda expression. At runtime, foo a_1 ... a_n becomes a tuple which contains a pointer to foo and a_1 ... a_n. When we get the remaining arguments, we execute foo.

leodemoura · 2017-03-23T17:18:11Z

We reduce the overhead of safe_div by creating an auxiliary definition: safe_div.aux which takes only 2 arguments. That is, the generated code is something like

def safe_div.aux x y :=
... -- code for the original safe_div is here

def safe_div x y h :=
safe_div.aux x y

Then, for fully applied safe_div x y h application, we inline and get

safe_div.aux x y

That is, we only pay for the overhead for partial applications.

feat(library/init/data/list): add array_list

459477b

leodemoura reviewed Mar 8, 2017

View reviewed changes

fixing overwrite optimizations

cb727b3

leodemoura added a commit that referenced this pull request Mar 13, 2017

chore(frontends/lean/elaborator): retract c58f61e

bc7089d

See discussion at #1438

leodemoura closed this Mar 22, 2017


		def write {α} (l : array_list α) : fin l^.length → α → array_list α :=
		λ ⟨n, h⟩ v, { l with data := l^.data^.write ⟨n, l^.lt_capacity h⟩ h v }

Conversation

digama0 commented Mar 8, 2017

Uh oh!

leodemoura Mar 8, 2017

Choose a reason for hiding this comment

Uh oh!

digama0 Mar 8, 2017

Choose a reason for hiding this comment

Uh oh!

leodemoura Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leodemoura Mar 8, 2017

Choose a reason for hiding this comment

Uh oh!

leodemoura commented Mar 9, 2017

Uh oh!

digama0 commented Mar 10, 2017

Uh oh!

leodemoura commented Mar 10, 2017

Uh oh!

leodemoura commented Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leodemoura commented Mar 10, 2017

Uh oh!

digama0 commented Mar 14, 2017

Uh oh!

leodemoura commented Mar 15, 2017

Uh oh!

digama0 commented Mar 15, 2017

Uh oh!

leodemoura commented Mar 22, 2017

Uh oh!

digama0 commented Mar 22, 2017 via email

Uh oh!

leodemoura commented Mar 22, 2017

Uh oh!

digama0 commented Mar 22, 2017 via email

Uh oh!

leodemoura commented Mar 22, 2017

Uh oh!

digama0 commented Mar 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leodemoura commented Mar 23, 2017

Uh oh!

digama0 commented Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leodemoura commented Mar 23, 2017

Uh oh!

leodemoura commented Mar 23, 2017

Uh oh!

leodemoura commented Mar 23, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

leodemoura Mar 8, 2017 •

edited

Loading

leodemoura commented Mar 10, 2017 •

edited

Loading

digama0 commented Mar 22, 2017 •

edited

Loading

digama0 commented Mar 23, 2017 •

edited

Loading