Skip to content
This repository was archived by the owner on Oct 14, 2023. It is now read-only.

feat(library/init/data/list): add array_list#1438

Closed
digama0 wants to merge 2 commits into
leanprover:masterfrom
digama0:master
Closed

feat(library/init/data/list): add array_list#1438
digama0 wants to merge 2 commits into
leanprover:masterfrom
digama0:master

Conversation

@digama0
Copy link
Copy Markdown
Contributor

@digama0 digama0 commented Mar 8, 2017

Add support for an array list type, using the "mutable" array implementation.


def write {α} (l : array_list α) : fin l^.length → α → array_list α :=
λ ⟨n, h⟩ v, { l with data := l^.data^.write ⟨n, l^.lt_capacity h⟩ h v }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VM will not be able to perform a destructive update on l^.data even if the reference counter to l is 1 and it owns data. This structure update is currently desugared into

array_list.mk l^.capacity l^.length (l^.data^.write ⟨n, l^.lt_capacity h⟩ h v) l^.len_le

In the current code generator we evaluate arguments from left to right.
So, after we execute l^.data, the array will have at least two references to it: one on the stack, and one from l. So, a destructive update is not performed.
To make sure we can perform a destructive update at run time, we need to write this function using a pattern matching

def write {α} : ∀ (l : array_list α), fin l^.length → α → array_list α 
| ⟨cap, len, data, hle⟩ ⟨n, h⟩ v :=
  ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ 

We can inspect whether we are performing destructive updates or not at run time by using the option trace.array.updat.

set_option trace.array.update true
vm_eval ((list.to_array_list [1, 2, 3])^.write (fin.of_nat 1) 10)^.read (fin.of_nat 1)

I have considered using match-with to compile structure updates, but this has nasty consequences.
For example, in the current approach we have the following definitional equality

{ l with data := ...}^.capacity   =?=   l^.capacity

That is, the projection of a non-updated field is definitionally to the same projection before the update.
This property is essential when we are building hierarchies of type classes (e.g., algebra).
If we use match-with to compile structure updates, then these two terms will be provably equal but not definitionally equal.
This is why I want to support lenses in Lean (#1431). With lenses, we can perform this kind of update in a much more convenient way and make sure a destructive update will be performed at run time if the reference counters are 1. Note that, in the lenses definition we will have to use a match-with. Otherwise we will have the same problem.

That being said, I realize that it would be great to have a mechanism to statically enforce that the reference counters are 1 at run time. To be able to do this, we would have to add linear types to Lean.
We have considered this, but the complexity is too high and they are super inconvenient to use.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leodemoura I tried your version and a few others:

def write2 {α} : ∀ (l : array_list α), fin l^.length → α → array_list α 
| ⟨cap, len, data, hle⟩ ⟨n, h⟩ v :=
  ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩ 

def write3 {α} (l : array_list α) : fin l^.length → α → array_list α
| ⟨n, h⟩ v :=
  match l, h with
  ⟨cap, len, data, hle⟩, h := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩
  end

def write4 {α} (l : array_list α) : fin l^.length → α → array_list α
| ⟨n, h⟩ v :=
  let cap := l^.capacity, len := l^.length, data := l^.data, hle := l^.len_le in
  ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩

def write5 {α} (l : array_list α) : fin l^.length → α → array_list α
| ⟨n, h⟩ v :=
  match l^.capacity, l^.length, h, l^.data, l^.len_le with
  cap, len, h, data, hle := ⟨cap, len, data^.write ⟨n, nat.lt_of_lt_of_le h hle⟩ h v, hle⟩
  end

set_option trace.array.update true
vm_eval ([1, 2, 3]^.to_array_list^.write (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- non-destructive
vm_eval ([1, 2, 3]^.to_array_list^.write2 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive
vm_eval ([1, 2, 3]^.to_array_list^.write3 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive
vm_eval ([1, 2, 3]^.to_array_list^.write4 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- non-destructive
vm_eval ([1, 2, 3]^.to_array_list^.write5 (fin.of_nat 1) 10)^.read (fin.of_nat 1) -- destructive

example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write ⟨n, h⟩ v)^.capacity = l^.capacity := rfl
example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write2 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl --failed
example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write3 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl --failed
example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write4 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl
example {α} (l : array_list α) (n : ℕ) (h : n < l^.length) (v : α) : (l^.write5 ⟨n, h⟩ v)^.capacity = l^.capacity := rfl

Version 2 is the one you presented above, and version 3 is the suggested match expression. Both of them fail to reduce projections definitionally. I have a few alternatives, versions 4 and 5.

I assume that let expressions allow controlling the evaluation order, so I was hoping that version 4 would work. But it doesn't achieve the destructive write - do you know why? Perhaps it is just because l is still in scope, even though it is not being used. If this version can be made to work, it is a good alternative compilation for {s with ...}.

The only version which has both properties is version 5, which uses a match expression on the projections, but this involves an extra construction / destruction step.

Copy link
Copy Markdown
Member

@leodemoura leodemoura Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping that version 4 would work. But it doesn't achieve the destructive write - do you know why?

The problem is that there are implicit references to l in the data^.write application (i.e., in the implicit arguments, and one of them is being captured by a closure that is just dead code).
They can be eliminated. I will try to fix this.

That being said, I also think write4 is a great alternative for compiling {s with ...}. I will try to work on this too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a fix that erases the artificial dependency that was preventing the destructive update.
write4 is performing a destructive update after this fix.
I will implement the new approach for {s with ...}

leodemoura added a commit to leodemoura/lean that referenced this pull request Mar 9, 2017
…{s with ...}`

See discussion at leanprover#1438
leanprover#1438 (comment)

@digama0 With this commit, the original `array_list.write` will also
perform a destructive update when the reference counter for `l` is 1.

```lean def write {α} (l : array_list α) : fin l^.length → α → array_list α :=
λ ⟨n, h⟩ v, { l with data := l^.data^.write ⟨n, l^.lt_capacity h⟩ h v }
```
@leodemoura
Copy link
Copy Markdown
Member

@digama0 I added the new encoding for {s with ...}, but I'm not sure whether we should keep it or not. It is not a complete solution for "enabling" destructive updates. That is, it is only guaranteed to work if the user is updating a single field. Example {s with sz := s^.sz + 1, data := s^.data^.write i v} may or may not update s^.data destructively. It depends on the evaluation order and where the fields sz and data are located in the structure. If we have two array updates, then for sure one of them will not be destructive {s with data1 := s^.data1^.write i1 v1, data2 := s^.data2^.write i2 v2}. There is also a performance problem. For example, suppose we have a structure s with many fields (e.g. field), then the new {s with ...} is much bigger than the previous one. The problem is that the let stores the type of each variable, and the types are not small in a structure like field. I tried to minimize the problem by creating a let only for fields which are not propositions. Another problem is when proving things about definitions that use {s with ...}. We will probably have to execute an initial zeta expansion step. To eliminate these auxiliary lets, but this can get complicated when we want to eliminate only the ones introduced by the {s with ...} notation.

@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 10, 2017

@leodemoura Do you have a specific example with this behavior? I have the following, but it's not showing what you say.

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

set_option trace.array.update true
vm_eval let a := {test . sz := 4, data1 := {data := λ_, 5}, data2 := {data := λ_, 1}}^.write
  (fin.of_nat 1) 10 in
  (a^.data1^.read (fin.of_nat 1), a^.data2^.read (fin.of_nat 2)) -- destructive write

@leodemoura
Copy link
Copy Markdown
Member

@digama0 Here is an example

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

def mk_test (n : nat) : test :=
{ data1 := mk_array 3 0,
  data2 := mk_array 3 1,
  sz    := n }

set_option trace.array.update true

#eval test.write (mk_test 10) (fin.of_nat 1) 10
/- 
Output:
[array.update] non-destructive write at #1
[array.update] destructive write at #1
-/
def test.write2 (s : test) (i : fin 3) (v : nat) : test :=
match s with
| ⟨d₁, d₂, s⟩ := ⟨d₁^.write i v, d₂^.write i v, s⟩
end

#eval test.write2 (mk_test 10) (fin.of_nat 1) 10
/-
Output:
[array.update] destructive write at #1
[array.update] destructive write at #1
-/

@leodemoura
Copy link
Copy Markdown
Member

leodemoura commented Mar 10, 2017

In your example

vm_eval let a := {test . sz := 4, data1 := {data := λ_, 5}, data2 := {data := λ_, 1}}^.write
  (fin.of_nat 1) 10 in
  (a^.data1^.read (fin.of_nat 1), a^.data2^.read (fin.of_nat 2)) -- destructive write

The compiler simplifies a^.data1 and a^.data2 at compilation time because a is a constructor after inlining test.write.

@leodemoura
Copy link
Copy Markdown
Member

BTW, your example help me to find a potential performance problem in the compiler.
It is being too aggressive in the function inlining step, and may take a very long time in some cases, and/or consume a lot of time. It should have not unfolded/inlined test.write at compilation time.

leodemoura added a commit that referenced this pull request Mar 13, 2017
@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 14, 2017

@leodemoura My recommended solution to your {s with ... } problem is to search for references to any let-bound projections in the body, and replace them with the variables. That is, in the example

structure test :=
(data1 : array nat 3)
(data2 : array nat 3)
(sz: nat)

def test.write (s : test) (i : fin 3) (v : nat) :=
{s with data1 := s^.data1^.write i v, data2 := s^.data2^.write i v}

we compile this to

let _sz := s^.sz in
let _data1 := s^.data1 in
let _data2 := s^.data2 in
{test . sz := _sz, data1 := _data1^.write i v, _data2 := data2^.write i v}

(where let bindings are only generated for data, as you suggested).

@leodemoura
Copy link
Copy Markdown
Member

@leodemoura My recommended solution to your {s with ... } problem is to search for references to any let-bound projections in the body, and replace them with the variables. That is, in the example

At commit c58f61e, the transformation was implemented at elaborator.cpp. Your suggestion cannot be reliably implemented there. The problem is that the term is full of "holes" at that point, and we can't collect all occurrences of s^.... since some of these occurrences will be in holes that will be "filled" later.
I think we can try to implement your suggestion in the code generator after we have erases proofs. Then,
1- All holes have already been filled and we can reliably collect all occurrences of s^.....
2- Proofs have been erased, and we don't have to be concerned about occurrences of s^.... in proof terms.
3- Users will not be puzzled by alien let _fname := ... terms in their declarations .

@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 15, 2017

@leodemoura That actually makes a lot of sense. Evaluation order isn't even supposed to be a thing users need to worry about in dependent type theory, and the code generator is the one supplying the evaluation order on top of the original structure anyway.

What you really need is a proper analysis of the arr^.write terms in the expression relative to arr^.read, and ensure that the former are always evaluated first, unless a read depends on a write, in which case a non-destructive write is required. In general, it seems like linear types would definitely be useful, but I am not an expert.

Just a quick comment. I didn't read the Haskell paper carefully, but I think their approach is not very useful for us (i.e., performing destructive updates) because given f : A -o B, they allow the caller to use f with non unique types. So, we cannot perform destructive updates on A.

In their implementation, there are two heaps, one for regular objects and one for linear objects, which must be routed through linear functions and have no GC. For our purposes, you can make sure that A is a linear object before entering the function, by copying it if the refcounter is > 1, that is, assuming that any function with a linear argument will modify its argument and so preempt the copying when the write in the function is found. That way, the code for f can always safely apply destructive updates.

@leodemoura
Copy link
Copy Markdown
Member

@jroesch and I needed a char buffer yesterday. We defined it as
https://github.com/leanprover/lean/blob/master/library/data/buffer.lean#L8
We realized it is essentially array_list because the internal array implementation already has support for capacity/size.
Sorry for not addressing this PR earlier. I will not merge it since buffer is simpler.

@leodemoura leodemoura closed this Mar 22, 2017
@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 22, 2017 via email

@leodemoura
Copy link
Copy Markdown
Member

@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 22, 2017 via email

@leodemoura
Copy link
Copy Markdown
Member

@digama0 Yes, it would be nice to have a primitive C-like array. We will need this kind of array to interface with external code. We can call them c_array or carray.

@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 22, 2017

@leodemoura Here is a slight variation on defined_array that doesn't require inhabited α for its operation. Would it be feasible to reflect this in C?

structure partial_carray (α : Sort u) (n) (s : fin n → Prop) :=
(data : Π (n : fin n), s n → α)

def carray (α : Sort u) (n) := partial_carray α n (λ_, true)

If I understand correctly, the parameter s will not be erased by the VM, but I can't think of any way to extract data from it (since equality on fin n → Prop is not decidable). Do you know if it is consistent to erase it? The idea is that s represents the subset of values of the array which have been initialized (which is not stored in the data structure). The management of this parameter is only used in order to ensure validity of the data access.

@leodemoura
Copy link
Copy Markdown
Member

If I understand correctly, the parameter s will not be erased by the VM, but I can't think of any way to extract data from it (since equality on fin n → Prop is not decidable). Do you know if it is consistent to do so?

An object of type carray has only one field: the function Π (n : fin n), s n → α.
The lambda lifting step may capture unnecessary dependencies, but we can refine it.
We cannot treat this function as an unary function because of partial applications.
The problem is easy to see with safe_div : Pi (x y : nat), y > 0 -> nat.
An user may create the closure safe_div 0 0. This is fine, but he will not be able to build a proof for 0 > 0, and execute safe_divwhich would probably crash the program. However, if we erase the third argument, then we would be able to crash the program usingsafe_div 0 0` (which is a fine term).

BTW, we erase irrelevant data (i.e., types, propositions (which are also types) and proofs) by replacing them with unit (i.e. ()). We currently do not erase functions that produce irrelevant data.
The plan is to replace a term t : Pi (a_1 : A_1) ... (a_n : A_n), B where B is irrelevant with mk_unit_closure_n which is a pre generated byte code for fun a_1 ... a_n, ().
This is needed in cases such as

list.length (list.map (fun _, true) [1, 2, 3])

That is, the user creates a list of propositions using list.map. At runtime, we would be creating a list of units. By using this approach we make sure the code generation is uniform.

@digama0
Copy link
Copy Markdown
Contributor Author

digama0 commented Mar 23, 2017

@leodemoura

We currently do not erase functions that produce irrelevant data.
The plan is to replace a term t : Pi (a_1 : A_1) ... (a_n : A_n), B where B is irrelevant with mk_unit_closure_n which is a pre generated byte code for fun a_1 ... a_n, ().

I think you can take this one step farther: Just replace t with () in the VM (i.e. Pi (a : A), B is irrelevant if B is). Then if apply is called on (), you ignore the argument and return () again.

The lambda lifting step may capture unnecessary dependencies, but we can refine it.
We cannot treat this function as an unary function because of partial applications.
The problem is easy to see with safe_div : Pi (x y : nat), y > 0 -> nat.
An user may create the closure safe_div 0 0. This is fine, but he will not be able to build a proof for 0 > 0, and execute safe_div which would probably crash the program. However, if we erase the third argument, then we would be able to crash the program using safe_div 0 0 (which is a fine term).

Hm, that's too bad. This seems like an edge case, so it might be worth working around it. What is lambda lifting exactly? If I'm reading the code correctly, you are uncurrying sequences of lambdas. Does a function of type A -> B -> C always get compiled to a two arg function, or will it remain curried if the function is defined that way? If yes, you could eliminate all irrelevant args, and then compile a fully applied function application as normal, and a function application where only irrelevant args are unapplied to a thunk that calls the function. More generally, a member of type Pi (a :: A), B compiles to a function whose arguments are only the relevant arguments in A, unless there are none, in which case it compiles to a thunk.

I'm starting to see why a solid semantics for the VM data types is useful.

@leodemoura
Copy link
Copy Markdown
Member

I think you can take this one step farther: Just replace t with () in the VM (i.e. Pi (a : A), B is irrelevant if B is). Then if apply is called on (), you ignore the argument and return () again.

For the bytecode interpreter, this would be reasonable. However, for native code generation, we don't want this kind of overhead.

@leodemoura
Copy link
Copy Markdown
Member

What is lambda lifting exactly?

It is one of the preprocessing steps. For each nested lambda (except the ones used for cases_on), we create an auxiliary definition foo a_1 ... a_n. Where a_1, ... a_n are the local constants used in the lambda expression. At runtime, foo a_1 ... a_n becomes a tuple which contains a pointer to foo and a_1 ... a_n. When we get the remaining arguments, we execute foo.

@leodemoura
Copy link
Copy Markdown
Member

We reduce the overhead of safe_div by creating an auxiliary definition: safe_div.aux which takes only 2 arguments. That is, the generated code is something like

def safe_div.aux x y :=
... -- code for the original safe_div is here

def safe_div x y h :=
safe_div.aux x y

Then, for fully applied safe_div x y h application, we inline and get

safe_div.aux x y

That is, we only pay for the overhead for partial applications.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants