## Types

OCaml differs from languages such as Python in that it has a strong, static type system. The type system is very expressive and makes it possible to eliminate whole classes of errors from programs.

Unlike many statically-typed languages, OCaml also has type inference.   It is not usually necessary to declare the types of bindings and functions - the compiler will infer the appropriate type automatically.    We have seen type inference in action already in our simple examples.    Here, the compiler infers that the value bound to `x` is an int:

In [1]:
let x = 42

Type inference also works for functions.   Here, `f` is a function that takes two ints and returns another:

In [2]:
let f x y = x + y

`f` is defined in curried form, so its type, `int -> int -> int`, actually means that `f` takes an `int` and returns a function which takes another `int` and finally returns yet another `int`.
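
Because a curried function takes its arguments one at a time, it can be applied to only the first of them. A minimal sketch of this partial application (the names `add` and `add_five` are introduced here purely for illustration):

```ocaml
(* add has type int -> int -> int, which reads as int -> (int -> int). *)
let add x y = x + y

(* Supplying only the first argument yields a new function of type int -> int. *)
let add_five = add 5

let () =
  (* Both forms compute the same result. *)
  Printf.printf "%d %d\n" (add 5 10) (add_five 10)
```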

OCaml does not promote types automatically, so floats are distinct from integers and have their own operators:

In [3]:
let g x y = x +. (float_of_int y)
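
As a further sketch of explicit conversion, the hypothetical function `mean` below sums a list with the integer operator `+` and then converts to `float` before dividing with `/.`. It assumes a non-empty list:

```ocaml
(* Integer arithmetic uses + - * / while float arithmetic uses +. -. *. /. *)
let mean xs =
  (* Sum as an int, then convert explicitly before dividing as floats. *)
  let total = List.fold_left ( + ) 0 xs in
  float_of_int total /. float_of_int (List.length xs)

let () = Printf.printf "%.2f\n" (mean [1; 2; 4])
```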

### Defining new types

OCaml gives us many ways to define new data structures.    One of the most common is the labelled variant:

In [4]:
type colours = Red | Green | Blue

The `colours` type behaves much like an enum in C:

In [5]:
let rose = Red

However, labelled variants are much more powerful than this example shows. They are actually more like unions:

In [6]:
type demo = 
    | BasicInt : int -> demo
    | String : string -> demo
    | Tuple : (int * string * float) -> demo
    | Bob : bool -> demo

Unlike unions, we can't mix up the type of the variant when we access it:

In [7]:
let mydemo = Bob true
in
let print_demo x = match x with
| String s -> print_string s
| BasicInt i -> print_int i
| Tuple _ -> print_string "I'm too lazy"
in
print_demo mydemo

File "[7]", line 3, characters 19-129:
Here is an example of a value that is not matched:
Bob _


This piece of code shows two things: 

  * first, the compiler warns us that our `match` statement (equivalent to a `switch` in C) doesn't have branches to handle all the different labels defined in the `demo` type
  * second, when we run this code, it raises a `Match_failure` exception at runtime because no branch matches `Bob true`

## Destructuring bind

The match statement above is an example of a *destructuring bind* or *pattern match*. This concept also exists in languages such as Python. The left-hand side of the bind specifies a pattern, and if the right-hand side matches the pattern, its values are unpacked and bound to the names given on the left-hand side. The compiler checks, statically, that the right-hand side can be matched by the left-hand side and warns or fails if this is not possible. This technique is extremely powerful - many OCaml programs consist mainly of pattern matches.
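
Patterns are not limited to `match`: a plain `let` can destructure too. The following is a small sketch with made-up names (`quotient`, `label`, `first_or` and so on) showing tuple patterns, nested patterns, and a list pattern that needs a fallback branch:

```ocaml
(* A tuple pattern on the left of a let unpacks both components at once. *)
let (quotient, remainder) = (17 / 5, 17 mod 5)

(* Patterns nest: this unpacks a pair whose second component is itself a pair. *)
let (label, (lo, hi)) = ("range", (1, 10))

(* A list may be empty, so a list pattern belongs in a match with a fallback. *)
let first_or default = function
  | [] -> default
  | x :: _ -> x

let () =
  Printf.printf "%d %d %s %d-%d %d\n"
    quotient remainder label lo hi (first_or 0 [7; 8])
```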

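Here's a version of `print_demo` that unpacks the values carried by each variant, including the fields of the tuple. It still has no branch for `Bob`, so the compiler repeats its warning: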

In [8]:
let print_demo' = function
| BasicInt i -> print_endline ("I got a basic int, and its value was " ^ (string_of_int i))
| String s -> print_endline ("Just a boring string, saying '" ^ s ^ "'")
| Tuple (first, second, third) -> 
    Printf.printf 
        "Oooh, a  tuple.   The first field was %d, the second was '%s' and the third was %0.3f"
            first second third

File "[8]", line 1, characters 18-372:
Here is an example of a value that is not matched:
Bob _


In [9]:
print_demo' (BasicInt 42);
print_demo' (String "hello");
print_demo' (Tuple (12, "ni", 3.14))

I got a basic int, and its value was 42
Just a boring string, saying 'hello'
Oooh, a  tuple.   The first field was 12, the second was 'ni' and the third was 3.140

In this case, we used the `function` form, which acts like a `match` but does not require us to name the variable being matched.
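
For comparison, here is a minimal sketch of a match that covers every constructor of `demo`, including `Bob`, so the compiler has nothing to warn about. `describe_demo` is a name introduced here, and it returns a string rather than printing so that the result is easy to inspect:

```ocaml
type demo =
  | BasicInt : int -> demo
  | String : string -> demo
  | Tuple : (int * string * float) -> demo
  | Bob : bool -> demo

(* Every constructor has a branch, so the match is exhaustive. *)
let describe_demo = function
  | BasicInt i -> "int " ^ string_of_int i
  | String s -> "string " ^ s
  | Tuple (first, second, third) ->
    Printf.sprintf "tuple (%d, %s, %.2f)" first second third
  | Bob b -> "bool " ^ string_of_bool b

let () = print_endline (describe_demo (Bob true))
```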

One very powerful aspect of this style of programming is that, if we ever change the definition of the `demo` type - by adding or removing a label, or by changing the structure of a value - the compiler will tell us about all the places in the code that don't match.  This makes refactoring very pleasant.

OCaml has a few other basic types, the most important of which are records, demonstrated below. With basic types, labelled variants, tuples, records and some other helpful types such as `option`, we can build almost any data structure.

In [10]:
type domestic_animal = Dog | Cat | Budgie | Rabbit
type person = {
    name : string;
    age : int;
    pets : (domestic_animal * string) list;
}
let people = [{name = "Bob"; age = 13; pets = [(Dog, "Rover")]}; {name = "Maria"; age = 39; pets = []}]
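
To show how records, lists and `option` combine, here is a sketch that repeats the definitions above and adds two hypothetical helpers, `first_pet_name` and `pet_owners`. Neither is part of any library:

```ocaml
type domestic_animal = Dog | Cat | Budgie | Rabbit

type person = {
  name : string;
  age : int;
  pets : (domestic_animal * string) list;
}

let people = [
  { name = "Bob"; age = 13; pets = [(Dog, "Rover")] };
  { name = "Maria"; age = 39; pets = [] };
]

(* Hypothetical helper: the name of a person's first pet, if they have one.
   The record pattern { pets; _ } binds only the field we need. *)
let first_pet_name { pets; _ } =
  match pets with
  | [] -> None
  | (_, pet_name) :: _ -> Some pet_name

(* Hypothetical helper: names of everyone who owns at least one pet. *)
let pet_owners ps =
  List.map (fun p -> p.name) (List.filter (fun p -> p.pets <> []) ps)

let () = List.iter print_endline (pet_owners people)
```

Returning `option` from `first_pet_name` forces callers to handle the case of a person with no pets, rather than relying on a sentinel value.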

### Polymorphism

Sometimes we don't care what kind of data we store in a structure, or we don't need to know the type of the data passed into a function in order to do something with it.   For instance, to find the length of a list, we don't need to know the type of data in the list - we just need to know how to traverse it:

In [11]:
List.length

The same function can find the length of a list of strings or a list of ints, without modification:

In [12]:
Printf.printf "%d strings\n" (List.length ["one"; "two"; "three"]);
Printf.printf "%d ints\n" (List.length [1;2;3;4;5])

3 strings
5 ints
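
To see why no element type is needed, here is a hand-written sketch of a length function. It is not the standard library's implementation (which is written differently), but the compiler infers the same polymorphic type, `'a list -> int`, because nothing in the body inspects the elements:

```ocaml
(* Only the shape of the list is examined, never its contents. *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

let () =
  Printf.printf "%d %d\n"
    (length ["one"; "two"; "three"])
    (length [1; 2; 3; 4; 5])
```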


When defining data structures, we might not care what a particular type is, but we might want to ensure that different fields have the *same* type:

In [13]:
type 'a pair = {
    first : 'a;
    second : 'a;
}

We can define a pair of ints, or a pair of strings, but not a pair of an int and a string:

In [14]:
let x = { first = 1; second = 2 }
let y = { first = "foo"; second = "bar" }
let z = { first = "foo"; second = 42 }
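
Functions over a parameterized record are polymorphic in the same way. In this sketch, `swap` and `map_pair` are illustrative names. `map_pair` shows that the element type of the result may differ from that of the input, as long as both fields still agree:

```ocaml
type 'a pair = {
  first : 'a;
  second : 'a;
}

(* 'a pair -> 'a pair: works for any element type. *)
let swap p = { first = p.second; second = p.first }

(* ('a -> 'b) -> 'a pair -> 'b pair: both fields are transformed by f. *)
let map_pair f p = { first = f p.first; second = f p.second }

let () =
  let lengths = map_pair String.length { first = "foo"; second = "quux" } in
  Printf.printf "%d %d\n" lengths.first lengths.second
```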

## Modules

OCaml's module system does three major things:

  * groups types and functions
  * allows types to be made abstract, hiding their implementation details
  * makes it possible to build generic libraries which are specialized by instantiation with different types and functions

### Basic module use

The most basic use of modules is to group types and the functions which operate on them:

In [15]:
module Buf = struct
    type t = string
    let empty = ""
    let append buf str = buf ^ str
    let contents buf = buf
end

In [16]:
Buf.(contents (append (append (append empty "testing ") "testing ") "1 2 3"))

It would be good style to hide the implementation of the `Buf` type, so we could change it later without breaking users of our library.   We can do that by creating a module signature which makes the type abstract:

In [17]:
module type BUF_SIG = sig
    type t
    val empty : t
    val append : t -> string -> t
    val contents : t -> string
end

In [18]:
module AbsBuf : BUF_SIG = Buf

In [19]:
let b = AbsBuf.(append empty "now we can't tell that buf is a string")

The only way we can get the contents of the buffer is to use the `contents` function:

In [20]:
AbsBuf.contents b

As the module is now abstract, we can replace the implementation with one based on lists which satisfies the same signature:

In [21]:
module ListBuf : BUF_SIG = struct
    type t = string list
    let empty = []
    let append buf str = str :: buf
    let contents buf = String.concat "" (List.rev buf)
end

In [22]:
let c = ListBuf.(append (append (append empty "testing ") "testing ") "1 2 3")

We can still get the contents of this buffer using the `contents` function:

In [23]:
ListBuf.contents c

However, `b` and `c` have different types, and the type system prevents us from using functions appropriate for one type on the other:

In [24]:
ListBuf.contents b

Let's look at an example within the toolstack code base. In xcp-idl.git/lib/scheduler.ml we declare some types with concrete definitions, for example:

```ocaml
module Delay = struct
  (* Concrete type is the ends of a pipe *)
  type t = {
    (* A pipe is used to wake up a thread blocked in wait: *)
    mutable pipe_out: Unix.file_descr option;
    mutable pipe_in: Unix.file_descr option;
    (* Indicates that a signal arrived before a wait: *)
    mutable signalled: bool;
    m: Mutex.t
  }
...
end
```

but in the interface, we hide the quite ugly implementation details:

```ocaml
module Delay :
  sig
    type t

    (** Makes a Delay.t *)
    val make : unit -> t

    (** Wait for the specified amount of time. Returns true if we waited
        the full length of time, false if we were woken *)
    val wait : t -> float -> bool

    (** Signal anyone currently waiting with the Delay.t *)
    val signal : t -> unit
  end

```

By doing this, we can guarantee that nobody outside of the module can make any assumptions about how it's implemented, and therefore we are free to change the way it works if that becomes necessary.

This is an excellent refactoring technique: make a type opaque, fix all the things that then break, then modify the concrete implementation.

### Functors

Functors are generic modules, parameterized by other modules which can fill in types and functions used but not defined in the functor.

In the following module, `L.hello` is not defined in the `Hello` functor - it will be provided when the functor is instantiated:

In [25]:
module type HELLO = sig
    val hello : string
end

module Hello (L : HELLO) = struct
    let sayhi () = Printf.printf "%s, world!\n" L.hello
end

Here, we define three modules which match the `HELLO` signature.   Modules are structurally typed, so as long as they contain at least the types and functions mentioned in the signature, they can be used:

In [26]:
module EN = struct
    let hello = "hello"
end

module DE = struct
    let hello = "hallo"
    let goodbye = "auf wiedersehen"
end

module PT = struct
    let hello = "ola"
    let goodbye = "adeus"
    let good_morning = "bom dia"
    let good_afternoon = "boa tarde"
    let good_night = "boa noite"
end

We instantiate the functor with each of our modules.   Instantiating a functor yields a module like any other:

In [27]:
module HelloEN = Hello (EN)
module HelloDE = Hello (DE)
module HelloPT = Hello (PT)

In [28]:
HelloEN.sayhi(); HelloDE.sayhi(); HelloPT.sayhi()

hello, world!
hallo, world!
ola, world!


The `GOODBYE` interface is similar:

In [29]:
module type GOODBYE = sig
    val goodbye : string
end

module Goodbye (L : GOODBYE) = struct
    let saybye () = Printf.printf "%s, C++\n" L.goodbye
end

In [30]:
module GoodbyeDE = Goodbye(DE)

In [31]:
GoodbyeDE.saybye()

auf wiedersehen, C++


However, the `EN` module has no `goodbye` value, so it does not satisfy this interface and cannot be used with `Goodbye`:

In [32]:
module GoodbyeEN = Goodbye(EN)
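
One way to make an English module acceptable is to extend `EN` with the missing value. The sketch below uses `include` to copy the existing definitions. `EN_full` and `message` are names invented for this example, and `message` returns the string instead of printing it:

```ocaml
module type GOODBYE = sig
  val goodbye : string
end

module Goodbye (L : GOODBYE) = struct
  (* Returns the message rather than printing it, to keep the sketch checkable. *)
  let message () = Printf.sprintf "%s, C++" L.goodbye
end

module EN = struct
  let hello = "hello"
end

(* Hypothetical extension: include copies EN's definitions, then we add the
   value that GOODBYE requires. *)
module EN_full = struct
  include EN
  let goodbye = "goodbye"
end

module GoodbyeEN = Goodbye (EN_full)

let () = print_endline (GoodbyeEN.message ())
```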

Let's look at an example of this sort of thing in Xapi's database module.

http://github.com/xapi-project/xen-api/blob/master/ocaml/database/db_cache_types.ml

A functor can be parameterized by more than one module:

In [33]:
module SayItDifferently(P : sig val postprocess : string -> string end)(L : sig val good_afternoon : string end) = struct
    let saybye () = Printf.printf "%s\n" (P.postprocess L.good_afternoon)
end

The `SayItDifferently` functor expects a module providing a `good_afternoon` string, but also another module providing a `postprocess` function, which it applies to the value of `good_afternoon`. We can instantiate it as follows:

In [34]:
module SayItDifferentlyPT = SayItDifferently(struct let postprocess = String.uppercase_ascii end)(PT)

In [35]:
SayItDifferentlyPT.saybye()

BOA TARDE
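
The standard library uses functors in the same way. As a minimal sketch, `Set.Make` takes a module providing a type `t` and a `compare` function and returns a set module specialized to that element type. `StringSet` and `languages` are names chosen for this example:

```ocaml
(* String provides both type t and compare, so it can be passed directly. *)
module StringSet = Set.Make (String)

let languages = StringSet.of_list ["en"; "de"; "pt"; "de"]

let () =
  (* Duplicates are dropped and elements come back in sorted order. *)
  Printf.printf "%d: %s\n"
    (StringSet.cardinal languages)
    (String.concat ", " (StringSet.elements languages))
```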


## Phantom types

Here is a module that wraps access to a resource.   We will use an int to represent a file descriptor.     The "file" can be opened read-write and read-only, and we need to make sure that a user of our module can't write to a read-only file:

In [36]:
module File : sig
   type t
   val open_readwrite : string -> t
   val open_readonly : string -> t
   val read : t -> string
   val write : t -> string -> unit
end = struct
   type t = int
   let open_readwrite filename = 42
   let open_readonly filename = 42
   let read f = "test"
   let write f s = Printf.printf "wrote %s\n" s
end

let _ =
   let f = File.open_readonly "whatever" in
   File.write f "hello"  (* oops *)


wrote hello


In the example above, we opened a file read-only and then wrote to it. With a bit more book-keeping, we could make `write` raise an exception. However, in this case we can use the type system to make writing to a read-only file a _compile time_ error, rather than a runtime error. The technique we will use is called a phantom type:

In [37]:
module PFile : sig
   type ro
   type rw
   type 'a t
   val open_readwrite : string -> rw t
   val open_readonly : string -> ro t
   val read : 'a t -> string
   val write : rw t -> string -> unit
end = struct
   type ro
   type rw
   type 'a t = int
   let open_readwrite filename = 42
   let open_readonly filename = 42
   let read f = "test"
   let write f s = Printf.printf "wrote %s\n" s
end

let _ =
   let f = PFile.open_readonly "whatever" in
   PFile.write f "hello"


`File` and `PFile` are very similar: in particular, the function implementations are identical. The only changes we have made are to introduce two new types, `ro` and `rw`, and to make `t` a parameterized type. `ro` and `rw` are empty - we can't make values with those types, but we can use them to _tag_ values of type `t`. We then add some constraints to the function types to say that, for example, `open_readonly` returns a `t` tagged with `ro`, and `write` requires a `t` tagged with `rw`. The type system tracks those constraints and won't let us call `write` with a `ro t`. `read` can take any kind of `t`. These constraints are all checked at compile-time - no extra object code is generated.
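
For contrast, here is a sketch of the same idea in which the permitted operations do compile and run. It makes two assumptions that are not in the module above: the "file" is backed by an in-memory `Buffer.t` so that `read` returns what was written, and a hypothetical `as_readonly` function is added to drop write permission from a handle:

```ocaml
module PFile : sig
  type ro
  type rw
  type 'a t
  val open_readwrite : string -> rw t
  val open_readonly : string -> ro t
  val read : 'a t -> string
  val write : rw t -> string -> unit
  (* Hypothetical addition: drop write permission from any handle. *)
  val as_readonly : 'a t -> ro t
end = struct
  type ro
  type rw
  (* The parameter 'a is never used in the representation: it is a phantom. *)
  type 'a t = Buffer.t
  let open_readwrite _filename = Buffer.create 16
  let open_readonly _filename = Buffer.create 16
  let read f = Buffer.contents f
  let write f s = Buffer.add_string f s
  let as_readonly f = f
end

let () =
  let f = PFile.open_readwrite "whatever" in
  PFile.write f "hello";
  let g = PFile.as_readonly f in
  (* PFile.write g "oops" would be rejected by the type checker here. *)
  print_endline (PFile.read g)
```

Note that `as_readonly` is the identity function at runtime. Only the tag seen by the type checker changes.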

Once again, let's look in xapi's code to see where we use this. In this case, API references use phantom types to distinguish, for example, between a reference to a VM and a reference to a host.

[ref.ml](http://github.com/xapi-project/xen-api/blob/master/ocaml/xapi-types/ref.ml) defines a type `t` with a phantom type parameter. Internally, it's just a string, but we hide that from everyone else by declaring it abstract in the [ref.mli](http://github.com/xapi-project/xen-api/blob/master/ocaml/xapi-types/ref.mli) file.

The generated file aPI.ml then declares types using this `Ref.t`:

```ocaml
type ref_VM = [`VM] Ref.t 
```

then the signatures of functions that operate on VM references can use this, for example, `get_record`:

```ocaml
val get_record : rpc:(Rpc.call -> Rpc.response) -> session_id:ref_session -> self:ref_VM -> vM_t
```

This means that although references are just strings internally, we can never accidentally mix up one type of reference for another.
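
The pattern can be sketched in a few lines. The `Ref` module below is a hypothetical stand-in written for this example, not xapi's actual `Ref` module, and `ref_host`, `describe_vm` and the reference strings are likewise invented:

```ocaml
(* Hypothetical stand-in for a reference module; not xapi's actual Ref. *)
module Ref : sig
  type 'a t
  val of_string : string -> 'a t
  val string_of : 'a t -> string
end = struct
  (* Internally just a string; the parameter 'a is a phantom tag. *)
  type 'a t = string
  let of_string s = s
  let string_of r = r
end

type ref_VM = [`VM] Ref.t
type ref_host = [`Host] Ref.t

(* Only accepts VM references; passing a ref_host is a compile-time error. *)
let describe_vm (vm : ref_VM) = "VM " ^ Ref.string_of vm

let () =
  let vm : ref_VM = Ref.of_string "vm-1" in
  let _host : ref_host = Ref.of_string "host-1" in
  (* describe_vm _host would not type-check. *)
  print_endline (describe_vm vm)
```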