Tutorial

pikatchu edited this page Apr 8, 2011 · 9 revisions
Clone this wiki locally

Welcome to the "Release for friends"

If you are reading this, you are probably someone I know :-) I am putting together this little tutorial, so that you guys can play with the language, tell me what you think, what you like what you dislike etc ... I will assume that you all know functional programming, I am pretty sure I don't have to explain what a tail recursion is to any of you :-)

The syntax is very OCamlish. It is similar enough so that you can use the caml-mode for emacs (I didn't test Tuareg).

Your first program

(* file hello.lml *)
module HelloWorld = struct
val main: unit -> unit 
let main() = 
    Print.string "hello world!" ; 
    Print.newline()
end

To compile it:

limlc hello.lml -root HelloWorld -o hello

--> the executable hello has been created

The -root option indicates which "main" to use. This is handy if you want to generate multiple executables out of the same code. The signature of every function must be specified, I allowed "val" in modules so that you don't have to create a module signature every time.

Notes: You must have noticed that the argument of the main function is "unit", it should take a "string array", but because I am not done with string implementation it is unit for now.

The classics

  • lists: [1 ; 2 ; 3] <=> 1 :: 2 :: 3 :: []
  • arrays: there is no [| |] syntax for now, use Array.make etc ...
  • tuples: (1, 2, 3)
  • strings: "blabla"
  • (+ - * /) these operators work with both floats and ints (there is no +.)
  • Algebric data types: Myop (1, 2) or Myop 1 2
  • Records: { field1 = 0; field2 = "my_value" }

The pattern-matching:

For now, you can only pattern-match algebric data types, this is because I implemented a straight-forward top-down compilation of pattern-matching. This implies: no values, no "when" patterns:

| A x (A y) -> (* legal *)
| A 0 (A 1) -> (* NO! *)
| A x when x = ... -> (* NO ! *)
| A { field1 = x } -> (* tolerated because x is a variable *)
| A x as y -> (* YES ! *)

Exhaustiveness is checked, and unused branches are detected.

About tuples:

Tuples are very different from those implemented in OCaml. They are "expanded". I will elaborate more later on the reasons. Basically what it means is that (1, 2, 3) is striclty equivalent to ((1, 2), 3). Neither of these constructions allocate, and they never do ! However, the down side is that you cannot write "(1, 2) :: my_list" anymore. This is because 'a is no longer compatible with ('b * 'b). You will have to write "Pair 1 2 :: my_list" instead.

About functions:

Functions with multiple arguments are not curried by default, when you write "f 1 2", the type of f is expected to be "f: int * int -> int" as opposed to "f: int -> int -> int". I didn't implement nested functions (except for those defined with fun), so closure creation is either done through partial application or through anonymous functions.

Every function is recursive by default (no need for rec), and a function can call any function defined (there is no order).

Private functions (functions that are not visible outside of the module), are defined using the keyword "private" in the signature:

val private f: unit -> unit

In a similar manner types can be declared private or abstract. When a type is abstract, its name is visible outside of the module, but not its definition.

This is not very important for now, but you might see function types defined with a #->, this sign means its a function compiled using the C calling convention (which is slower than the native one). It's handy to define external functions, or to call LiML code from a C program.

The module system:

The properties:

  • There is no module nesting. In other words there is only one level of modules.
  • The name of a module is defined within a file, and is completely independent of the name of the file. So if in myfile1.lml you define the module A. The way you will access this module is by using the name A, not Myfile1.A.
  • You cannot open the namespace of a module. I find the module aliasing approach cleaner. If you are going to use the module MySuperLongNameModule multiple times, just define an alias within the module: module MSM = MySuperLongNameModule.

Separate compilation:

Separate compilation is not "file based", it's library based. To create a library you must use the "-lib" option: limlc myfile1.lml myfile2.lml -lib libtest.lmli

This will create the files "libtest.lmli" and "libtest.a" in the same directory. To use the library, just include the file libtest.lmli (make sure the file libtest.a stayed in the same directory).

Linear Types

Ah ! Here we are ! The heart of the language. One line explanation: in LiML, the default is, every value can be used only once ! Some people will argue that these are in fact "unique types" as opposed to "linear types", the difference being subtle (a unique type enforces that only ONE pointer exists at any given time, a linear type enforces that the amount of pointers is stable). However, you will often see unique types been called linear types in the literature, and if P. Wadler can do it, so can I !

In fact, LiML goes even further, not only does the type system guarantees that values are used once, it guarantees that they are indeed used exactly once. You cannot allocate a value and not use it. This guarantees sound usage of the resources, thanks to that, memory usage is handled statically, in more common words, the garbage collection is done statically.

Let's take an example:

let l1 = [1; 2; 3]
let l2 = List.rev l1
List.irelease l2

The right way to reason about values in LiML, is to think of them in terms of "resources", when you use a variable, you are "consuming" a resource. When you allocate a new value, you are allocating a new "resource". Anything that isn't of a primitive type is resource, this includes, file descriptors, records, variants etc ... Primitive types are not resources per se, they can be used multiple times without any restrictions on them. Now, let's comment the code, line by line and reason about it in terms of "resources".

let l1 = [1; 2; 3]

I am creating the "resource" l1.

let l2 = List.rev l1

"List.rev" is consuming the resource l1, I won't be able to re-use l1 passed this point. "List.rev" gives me back a new resource l2.

List.irelease l2 (frees the list)

I am using the resource l2. Passed this point, I used all of my resources, the function can return, the compiler won't complain.

That's right, if I had omitted this last line, the compiler would have complained, explaining that 'l2' hadn't been used.

Let's go further with a more complex example: "List.rev_append"

val rev_append: 'a list * 'a list -> 'a list
let rev_append l1 l2 = 
match l1 with 
| [] -> l2
| x :: rl -> rev_append rl (x :: l2)

Now, again, let's translate this in terms of resources:

let rev_append l1 l2 =

I am given 2 resources, l1 and l2, I will have to use them both once and only once.

match l1 with

I am consuming the resource l1, passed this point, I cannot re-use l1.

| [] -> l2

I am consuming l2, because l1 has already been consumed, I consumed all of the resources, the function can return.

| x :: rl -> rev_append rl (x :: l2)

The resources x and rl have been created. The resource (x :: l2), consumes x and l2 and creates a new resource. Whenever I pass values as arguments to a function, it consumes the arguments, this is safe, because it is the responsibility of the called function to manage its parameters as resources. In this example, it is now the responsibility of "rev_append" to consume the resources x and the newly create resource (x :: l2). Passed this point, all the resources have been consumed, the function can return.

How does this translate in terms of code generated ? Well, that's the good news ! In practice, this transformation will be done "in-place". If you are destroying a resource and allocating a new resource of the same type (or with a similar size), the resource is re-used, instead of being freed and re-allocated. In our example:

| x :: rl -> rev_append rl (x :: l2)

the pattern x :: rl destroys the head of the list and the expression x :: l2 creates a new one. The "in-place" optimization can take place.

Observed values

So far, so good, but there is a problem, sometimes, you don't want to destroy the object you are working on, some other times, you really don't want to !

A striking example is "List.length", let's try to write the code:

let length l =
match l with
[] -> 0
x :: rl -> 1 + length rl

Argh !!! We can not write that ! The resource x is never consumed ! Besides, how often do you want to get the length of a list that you are not going to re-use anyway ? Of course, what we could do is destroy the list, to re-create the same one:

let length l =
match l with
[] -> [], 0
x :: rl -> let rl, n = length rl in n+1, x :: rl

This works but is very frustrating ! Instead, we are going to use so called 'observed' values. An observed value allows you to tell the compiler: I am going to use this, but I don't want this operation to be destructive. After you have observed something, you still have to "really use" (consume) the value that you observed. There is one important restriction on observed values: you cannot embed them in a record or a variant. The way you declare observed values in the type signature is by using the predefined "obs" type. The signature of the "List.length" becomes:

val length: 'a list obs -> int

And its implementation is now a lot more natural:

let length l = match l with [] -> 0 | _ :: rl -> 1 + length rl

As you can see, I never used l (nor rl) in a newly created variant or record. The code is accepted by the compiler. On the caller's side:

let n = List.length !l

The '!' notation is used to indicate that we are observing l.

You must now better understand why tuple are expanded :-) If they weren't, they would be resources, and we wouldn't be able to put observed values in them. This would be far too restrictive !

Shared values

LiML let's you share values, but only explicitly using the type 'a Share.t

This is how you create a shared value: let shared_list = Share.make [1; 2; 3]

shared_list is of type: 'a list Share.t

From there on, I can 'clone' this shared values as many times as I want:

let shared_list2 = Share.clone !shared_list

I can now use shared_list2 and shared_list in a record or a variant, they are first class values. However, they are of type 'int list Share.t', how do I get to work on the 'int list' they contain ?

We need the primitive "visit", which returns an observer on the shared value itself

val visit: 'a Share.t obs -> 'a obs

In our case visit !shared_list returns an observer on the list [1;2;3]. Because a shared value is like any other resource, it must be used, the primitive "Share.release", frees a shared resource, its signature is: val release: 'a Share.t -> 'a option. It will return None, if there are still other users of the shared value, and Some of the original value if you are the last one using it.

Your next question is probably going to be: is there a way to transform an 'a Share.t into and 'a ? The answer is: no, there isn't (other than release). This is because it wouldn't be thread-safe! If I let you transform an 'a Shared into an 'a, you would be able to modify 'a in-place (as we have seen earlier), because you could have passed a pointer on this object to a different thread, this wouldn't be safe. When you think about it, it makes sense, if I want sharing in an parallel program, I need to enforce some sort of "read-only" mode on shared values.

Records

A bit of syntax:

let r = { field1 = 0 } in

Creates a record with exactly one field, called field1.

let r2 = { r with field1 = 2 } in

Consumes r, creates r2, setting the value of field1 to 2 (this operation is always done in-place).

let { r ; field1 = x } = r2 in

extracts the field field1, and binds it's value to x, r2 is consumed, r is bound to the record that has every field in r2 but "field1".

This last part is important, you cannot access the field field1 in the record 'r' anymore (except when it contains a primitive type), because that would let you have multiple pointers on the value r2.field1 ... The compiler will enforce the fact that at some point in the same function, you put a value back in field1 and that you didn't try to access this field in the mean time.

To summarize, I like to compare records in LiML to shelves in a library: Take what you need ... Do some work ... put it back !

A bit of sugar:

x.field1` <=> `let { x ; field1 = v } = !x in v
{ x ; ~field1 }` <=> `{ x ; field1 = field1 }

Arrays

There are 2 kinds of array primitives, those for arrays containing primitive values, and those for arrays containing anything else. Let's start with those that can contain anything, because it's fairly simple: you can only swap elements out of an array ! There is no way to set, or get an element, only swap.

let t = Array.make (fun i -> [i]) 10 Creates the array

let t, y = Array.swap t 0 [2 ; 3] Sets what was in cell 0 to [2 ; 3]. 'y' is now what was in cell 0.

If the index is out of the bounds of the array, the function swap gives you back the element you passed as an argument and leaves the array unchanged.

Now, arrays of primitive values. You can use Array.set and Array.get on those. Beside there is some sugar:

t.(i) <- v` <=> `let t = Array.set t i v
t.(i)` <=> `Array.get !t i

Array.set leaves the array unchanged if the index is out of bounds. Array.get gives back a default value when that is the case.

Dark magic with arrays

You probably guessed that a check on the bounds is performed at runtime to see if the index is in the bounds of the array or not. Well, there is an optimization phase that tries to get rid of these checks. It let's you write quite a few primitives without any checks at all. It works by doing some abstract interpretation of the code. A fixpoint is always reached rapidly so there should be no performance issue at compile time. Basically it is able to analyse the conditions of "if then else" constructions to avoid repeating useless checks.

A small example:

val private add1: int array * int -> int array

let add1 t i =
  if i >= Array.length t
  then t
  else (t.(i) <- t.(i) + 1 ; add1 t (i+1)) (* figured out that i < Array.length t *)

Now you might wonder, did it manage to get rid of the lower bound check too ? Well it depends, if the function was declared public, then no ... Because of separate compilation, this function could be called with any value, even with a negative index. But if it is private, then the compiler does an interprocedural check within the module, if every call to add1 was done with an index greater or equal to zero, the function with be compiled into something that doesn't any bound check at all !

Because it can be frustrating when you are trying to optimize your code to "guess" what the compiler is going to do, I added the option "-bounds", this option pretty prints all the places in the code where the bounds couldn't be checked, it will avoid you the headache of diving into the assembly.

I didn't extend the algorithm on matrices yet, don't try, it won't work :-)

Closures

There are 2 kinds of closures:

  • Observed closures, can be used multiple times, can only embed observed values. Their type looks like this ('a -> 'b) obs
  • Linear closures, can only be used once, can only embed linear values. Their type looks like this ('a -> 'b)

Anonymous functions come in 2 forms:

let l = [2 ; 3 ; 4] in fun (l: int list) -> 1 :: l

This is a "linear" closure, it can only be used once, the compiler checked that 'l' has a linear type. Its type is (int -> int list)

let n = 1 in !fun (x: int) -> n+1 (* notice the ! sign *)

This is an observed closure, it can be called multiple times, the compiler checked that n was observed (in that case, it's a primitive value, which is observed). Its type is (int -> int) obs.

Partial application, using the keyword "partial":

val g: 'a * 'a -> 'a
let f = partial g 1 in (* linear closure *)

f is of type (int -> int)

Partial application using !partial:

val g: 'a obs * 'a -> 'a
let f = !partial g 1 in (* observed closure *)

f is of type (int -> int) obs

The code generator

The compiler uses the llvm back-end and can therefore generate native code. I only tested on linux 32bits for now, work on 64bits linux is in progress (should be done soon). Floats are represented as single precision floats on 32bits, and double precision on 64bits. The type representation is exactly the same as in C. So if you define a record with the same type in both LiML and C, you will be able to use it without any conversions between both worlds.

Future work

  • I need to stabilize the compiler, there are still many many quirks bugs, missing features, etc ...
  • I am finishing the port to 64bits, should be done anytime
  • Finish writing the standard library
  • Write a lot more tests (unit + regression)
  • Perhaps add some form of extensional polymorphism, I was hesitating between different options, I really would like to have type classes, because I prefer them to functors, what do you think ?
  • Write some optimizations, even though, I will do that at the end.

Known bugs

  • Need to add a check to forbid recursive types, they blow up the typer for now
  • partial application is buggy both in the front-end and the back-end
  • closures are very unstable in the back-end, however, you can play with them, the front-end should be fairly stable.

Ahhh, I will stop here, there are many many things which are not yet implemented, or buggy. Vast majority of them are in the back-end. However, the front-end is stable enough so that you can try out the program you can write, what the type system allows etc ... even if it's frustrating not to be able to see the code running.