# Data Types in ACL2

## Scalar Data Types

We have already seen many scalar data types:

| Type       | Description                                | Example |
|:-----------|:-------------------------------------------|:--------|
| `int`      | an integer                                 | 42      |
| `pos`      | a positive integer                         | 12      |
| `neg`      | a negative integer                         | -5      |
| `nat`      | a natural number, or non-negative integer  | 0       |
| `rational` | a rational number                          | 22/7    |
| `boolean`  | a boolean value (`T` or `NIL`)             | T       |
| `symbol`   | an alphanumeric word                       | 'hello  |

Recall that each type features a "colon" keyword that names the type for function definitions and an ACL2 function that recognizes values of that type. For example, when you define a function that accepts a natural number as a parameter, you use the `:nat` keyword, and the function `(natp x)` returns true precisely when `x` is a natural number. The same is true for `:boolean` and `(booleanp x)`, and so on. Notice that the keyword has a `:` in front of the type name, and the function that recognizes elements of that type has a `p` at the end of the type name.
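To make the recognizer convention concrete, here is a small sketch you can type at the ACL2 prompt. It only uses recognizers for the built-in types in the table above; the comments show the value each expression should produce.

```lisp
(natp 7)          ; T: 7 is a natural number
(natp -3)         ; NIL: negative integers are not naturals
(posp 0)          ; NIL: zero is not positive
(rationalp 22/7)  ; T
(booleanp nil)    ; T: NIL is one of the two boolean values
(symbolp 'hello)  ; T
```

Every recognizer returns `T` or `NIL`, and it accepts any value at all as its argument.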

We are now going to define new types. Each new type will have a name, along with the associated "colon" keyword and recognizer function (ending in `p`). Types are defined using `defdata`:

    (defdata type-name
             ...type specifier...)

The type specifier is an ACL2 expression that describes elements of the new type. We introduce these specifiers by example.

Let's start with **range types**. These refer to a subset of one of the numeric types. For example, the integers from 0 to 9 can be specified as follows:

    (defdata single-digit
             (range integer (0 <= _ < 10)))

Notice how the range of valid numbers is specified. You can specify a lower bound, an upper bound, or both. And you can use `<=` or `<` to indicate whether the bound is inclusive or exclusive. The `_` is used as a placeholder for the number that is being defined.
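Here is a small sketch of how the bounds behave. The type name `at-least-ten` is made up for this illustration, and the last line assumes the `single-digit` definition above has already been admitted.

```lisp
; Hypothetical type: only a lower bound is given.
(defdata at-least-ten
         (range integer (10 <= _)))

(at-least-tenp 10)    ; T: the bound is inclusive because of <=
(at-least-tenp 9)     ; NIL
(at-least-tenp 21/2)  ; NIL: a rational, but not an integer
(single-digitp 10)    ; NIL: the upper bound of single-digit is exclusive
```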

> Range types are cool, and some languages still support them, though they are not nearly as popular as they used to be. But think, for example, of a specific type that captures the valid indexes for an array. Pascal and Ada could do that.

Try it! Write a definition for the type of rational numbers between 0 and 1, including both endpoints.

Next, let's consider **enumerations**. These are very popular in other languages. E.g., in C, you can say

    typedef enum { RED, GREEN, BLUE } color;

to define a type that contains just those three enumerated values. In ACL2, you would write

    (defdata color
             (enum '(red green blue)))

A variable of type `color` can have one of the values `'red`, `'green`, or `'blue`. In particular, the following will evaluate to true:

    (colorp 'green)

Notice the possible values are symbols, so you must quote them with a leading single quote. In some ways, enumerations are similar to range types, except they introduce subsets of possible symbol values instead of subsets of possible numeric values.
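A few more checks, assuming the `color` definition above has been admitted, show what the recognizer rejects:

```lisp
(colorp 'red)     ; T
(colorp 'purple)  ; NIL: a symbol, but not one of the enumerated values
(colorp "red")    ; NIL: a string is not a symbol
(colorp 3)        ; NIL
```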

> **Aside:** Take another look at the way the colors are specified: `'(red green blue)`. There are two new concepts here. First, ACL2 supports **lists**, which are actually the major data structure. Here, the list contains the values "red", "green", and "blue". But if you see the expression `(red green blue)` in ACL2, you would interpret that as the call of the function "red" with the parameters "green" and "blue" (which are variable names). The single quote in front of the list is what says "this is not a function call, just take me literally". I.e., a single quote is used to quote the next word, as in `'green`, but it actually quotes the next expression. In `'(red green blue)`, the entire list of colors is quoted.

You try it! Define an enumeration called `traffic-light` that holds the possible values of a traffic light.

Next up, let's consider **union types**. For example, a member of a type could be either an integer or a symbol; that's the union of the types `integer` and `symbol`. Here is a nice example. A `hex-digit` is either a `single-digit` (a range type defined above) or a letter from `a` to `f`:

    (defdata hex-digit
             (oneof single-digit
                    (enum '(a b c d e f))))

Notice that the options accepted by `oneof` include named types (like `single-digit`) or new type specifications (like the enumeration). You can use `oneof` with 2, 3, or any number of arguments, so you can union integers, naturals, rationals, symbols, etc., in a single `oneof` definition.
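As a quick sanity check, assuming both `single-digit` and `hex-digit` have been admitted as above, a value is a `hex-digit` when it satisfies either alternative:

```lisp
(hex-digitp 7)    ; T: 7 is a single-digit
(hex-digitp 'c)   ; T: C is one of the enumerated letters
(hex-digitp 'g)   ; NIL: G is not in the enumeration
(hex-digitp 12)   ; NIL: 12 is an integer, but not a single digit
```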

You try it! Define a datatype that includes the positive and negative integers (but not zero).

## Composite Data Types

**Composite types** store more than one value. In languages like C++, this corresponds to a class or a struct. As we saw in a previous tutorial, ACL2 uses lists to support composite types. Now let's see how lists can be used to define new types.

In ACL2, lists are often (over-)used to represent **product types**. A product type is a pair, or triple, or ..., or n-tuple. That is, a product type consists of a fixed number of elements, each of which is of a particular type. For example, you could represent a date using three numbers, for the month, day, and year:

    (defdata date
             (list (range integer (1 <= _ <= 12))
                   (range integer (1 <= _ <= 31))
                   (range integer (1 <= _ <= 2100))))

Then `(datep (list 1 19 2021))` would be true. Here is another example. You may define a person type that keeps track of a person's first name, last name, and birthdate:

    (defdata person
             (list symbol
                   symbol
                   date))

For example, `(personp '(john galt (10 10 1957)))` would be true.

Notice that **product types** are simply special types of lists, and it is the programmer's responsibility to know, for example, that the first element is the first name, the second is the last name, and so on.
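The following sketch, which assumes the list-based `date` and `person` definitions above, shows both points: the recognizer only checks each position against its type, and fields are retrieved purely by position.

```lisp
(datep '(2 30 2021))   ; T: each field is in range, even though February 30 is not a real date
(datep '(13 1 2021))   ; NIL: 13 is not a valid month
(datep '(1 19))        ; NIL: a date needs exactly three elements

; Fields are found by position: first name, last name, birthdate.
(let ((p '(john galt (10 10 1957))))
  (list (first p)             ; JOHN
        (second p)            ; GALT
        (third (third p))))   ; 1957, the year inside the birthdate
```

The `let` expression evaluates to `(JOHN GALT 1957)`. Nothing in the data itself says which position is which, so that knowledge lives only in the code and in your head.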

Try it! Define a **product type** that can represent a two-dimensional point, given by its `x` and `y` coordinates.

I recommend that you stick with product types when you're learning ACL2, but at some point you may be interested in using something more similar to records in other languages. ACL2 also supports **record types**, which are similar to product types, but with the added feature that the fields in the records have names. For example, here is an alternative way to define the date type:

    (defdata month (range integer (1 <= _ <= 12)))
    (defdata day   (range integer (1 <= _ <= 31)))
    (defdata year  (range integer (1 <= _ <= 2100)))

    (defdata date
             (record (mm . month)
                     (dd . day) 
                     (yy . year)))

Notice that the types for the fields in the records must be type **names**, not type expressions, so we first gave names to the types for months, days, and years.

> What are those dots in `(mm . month)`? It turns out that lists are **not** the most fundamental data structure in ACL2. The fundamental composite data structure is the pair of elements, written `(x . y)`. Lists are, in fact, syntactic sugar for deeply nested pairs. My advice is to ignore all that and simply use lists. If you ever see ACL2 return a dotted pair, treat it as an indication that you created a list incorrectly.

The record definition does more than define `:date` and `datep`. It also defines functions for constructing dates and for accessing the elements of a date. For example, the ACL2 expression `(date 10 10 1957)` returns a date object, and if `d` is a date object, then `(date-yy d)` is the year. It also creates functions for "changing" the value of the fields. For example, `(set-date-yy 1984 d)` returns a **new** date object that has the same day and month as `d`, but with the year 1984. Notice that the `set-` functions return a new object. Remember, there is no such thing as changing the value of a variable in ACL2!
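Here is a minimal sketch that puts those pieces together. It assumes the record version of `date` above has been admitted, and it uses only the functions just described.

```lisp
(let* ((d  (date 10 10 1957))       ; construct a date record
       (d2 (set-date-yy 1984 d)))   ; a new record; d itself is unchanged
  (list (date-yy d)     ; 1957
        (date-yy d2)    ; 1984
        (datep d2)))    ; T: the result is still a date
```

The original `d` still reports 1957 after the call to `set-date-yy`, which is exactly what "returns a new object" means.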

It's your turn! Define a record structure similar to the person type above. How can you construct a new person object? How can you get the month and year of a person's birthday?

Now let's consider defining **list data types**. The easiest type of list is a homogeneous list, e.g., a list of integers. ACL2 provides a simple shortcut for defining lists of a single type:

    (defdata list-of-integers
             (listof int))

After defining this data type, `(list-of-integersp '(1 2 3))` will evaluate to `T`, since `'(1 2 3)` is, in fact, a list of integers. Note that `(list-of-integersp NIL)` is also true, since `NIL`, the empty list, is a list of any type.
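For contrast, here are a few values that the recognizer rejects, assuming `list-of-integers` has been admitted as above:

```lisp
(list-of-integersp '(1 2 3))    ; T
(list-of-integersp nil)         ; T: the empty list
(list-of-integersp '(1 two 3))  ; NIL: one element is a symbol
(list-of-integersp '(1 1/2))    ; NIL: 1/2 is rational but not an integer
(list-of-integersp 5)           ; NIL: an integer is not a list of integers
```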

Lists can contain any type, including other lists. For example, we can define the type of list of lists of integers:

    (defdata list-of-lists-of-integers
             (listof (listof int)))

(We could have also used the previously defined `list-of-integers` type, of course.) Now, `(list-of-lists-of-integersp '((1 2 3) (4 5) (6 7 8 9)))` is true. What about lists of lists of lists of integers? That should be pretty easy, right?

Your turn! Define a type that corresponds to lists of lists of lists of integers. E.g., here is a list of this type: `'(((1 2) (3) (4 5)) ((6 7) (8 9)))`

But what if you wanted lists of lists of lists of ... of lists of integers? I.e., arbitrarily nested lists?

At first, it appears that this is an impossibility, but actually it's quite easy to do in ACL2. Think of an arbitrarily nested list of integers. What is the type of its elements?   The key observation is that each element of this arbitrarily nested list of integers must be either an integer or a (wait for it) arbitrarily nested list of integers! So we can define it like this:

    (defdata nested-list-of-integers
             (listof (oneof integer
                            nested-list-of-integers)))

This is a **recursive type definition**. At first blush, such data types look mysterious, but the truth is that you have probably used similar types in other programming languages. For example, here is one way to define a linked list in C++:

    struct node {
        int         value;
        struct node *next;
    };
    typedef struct node *list;

This makes it very clear. A linked list is a pointer to a node. So a linked list is either empty (when it points to NULL or 0), or it points to a node which consists of an integer and the remainder of the list.
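Returning to `nested-list-of-integers` for a moment, a few checks show how the recursion plays out. This sketch assumes the definition above has been admitted.

```lisp
(nested-list-of-integersp '(1 (2 3) ((4) 5)))  ; T: integers at several depths
(nested-list-of-integersp nil)                 ; T: the empty list
(nested-list-of-integersp '(1 () 2))           ; T: an empty nested list is a valid element
(nested-list-of-integersp '(1 (two)))          ; NIL: TWO is a symbol, not an integer
(nested-list-of-integersp 7)                   ; NIL: a bare integer is an element, not a list
```

The last case is worth a second look: integers are allowed *inside* the list, but the type itself always describes a list.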

The `listof` definition above is actually a shortcut for something much more similar to the above C++ definition:

    (defdata list-of-symbols
             (oneof NIL
                    (cons symbol list-of-symbols)))

What this says is that a list of symbols is either
* empty (i.e., NIL)
* a symbol followed by a list of symbols

In this definition `cons` is a special function. It is actually **the built-in constructor** for lists in ACL2. It is, in effect, the equivalent of the `struct node` from the C++ example above.
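To see `cons` at work, here is a short sketch that builds lists one element at a time. The last two lines assume `list-of-symbols` has been admitted as above.

```lisp
(cons 'a (cons 'b (cons 'c nil)))         ; the list (A B C)
(equal (cons 'a '(b c)) (list 'a 'b 'c))  ; T: both build the same list
(list-of-symbolsp (cons 'a nil))          ; T: a symbol followed by the empty list
(list-of-symbolsp (cons 'a 'b))           ; NIL: the second part is not a list of symbols
```

The final expression fails because `(cons 'a 'b)` is a pair whose second component is neither `NIL` nor another `cons` of the right shape, so it matches neither alternative in the `oneof`.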

You should continue to use `listof` to define lists, but it is important to understand the equivalent recursive data definition using `cons`, because using recursive data definitions allows us to build more complicated data structures, like trees.

Your turn! Define the data type of lists of lists of integers using `cons` instead of `listof`.

Finally, we'll use the recursive defdata types to define trees. A binary tree in C++ looks like

    struct treenode {
        int             value;
        struct treenode *left;
        struct treenode *right;
    };
    typedef struct treenode *tree;

As was the case with lists, this says that a binary tree is a pointer to a tree node. In particular, a binary tree is either
* empty (when the pointer is null)
* a treenode, which consists of a value, and a left and right subtree

We can do this directly in ACL2:

    (defdata bintree
             (oneof NIL
                    (list int bintree bintree)))

Notice how the list is used as a product type to mimic the equivalent record in C++.
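Here is what some values of this type look like, assuming the `bintree` definition above has been admitted. Each node is a three-element list: the value, the left subtree, and the right subtree.

```lisp
(bintreep nil)                                   ; T: the empty tree
(bintreep '(5 nil nil))                          ; T: a single node with no children
(bintreep '(5 (3 nil nil) (8 nil (9 nil nil))))  ; T: 5 with left child 3 and right child 8
(bintreep '(5 nil))                              ; NIL: the right subtree is missing
(bintreep '(five nil nil))                       ; NIL: the value must be an integer
```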

> Aside: ACL2 also lets you write recursive record definitions, so if you prefer using records over product types, you can use records to define recursive structures like trees!

Try it! Define a ternary tree of symbols in ACL2. A ternary tree has up to three subtrees.

We can now define binary trees, ternary trees, quaternary trees, and so on. But what if we want to define trees with an arbitrary number of subtrees? It turns out this is not that hard. Here is an approach:

    (defdata tree
             (oneof NIL
                    (list int
                          (listof tree))))

Now each node in the tree contains an integer and an arbitrary number of subtrees!

Actually, that definition has a small problem, in that it would consider this to be a tree:

    '(10 (nil nil nil nil))

That is a tree with four children -- but actually, it really doesn't have any children, since all the subtrees are null. It's better if we restrict the list of children to consist of non-empty trees. That's easy enough to do:

    (defdata tree
             (list int
                   (listof tree)))

This works, but now we have a different problem. A tree has to have at least one node, so there is no way to have an empty tree. This may be ok. But what if you want to allow empty trees, yet still enforce that a tree has only non-empty children? You can probably think of a solution yourself, maybe something along these lines:

    (defdata non-empty-tree
             (list int
                   (listof non-empty-tree)))
    (defdata tree
             (oneof NIL
                    non-empty-tree))
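A few checks, assuming this last pair of definitions (`non-empty-tree` and `tree`) is the one that has been admitted, confirm that the split does what we want:

```lisp
(treep nil)                                ; T: the empty tree is allowed at the top
(treep '(10 nil))                          ; T: a node with no children
(treep '(10 ((20 nil) (30 ((40 nil))))))   ; T: 10 has children 20 and 30; 30 has child 40
(treep '(10 (nil nil nil nil)))            ; NIL: empty trees are not allowed as children
(non-empty-treep nil)                      ; NIL
```

The problematic value from before, `'(10 (nil nil nil nil))`, is now rejected, while the empty tree is still accepted by `treep`.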

Maybe you think this is a hack, but it's actually reasonable. The same issues crop up in languages like Java and C++. For example, suppose you create a linked list in Java:

    List<Integer> l = new LinkedList<Integer> ();

At this point, `l` is pointing to an **empty** linked list. Notice that `l` is not a null pointer. If it were, we would not be able to add an element to the list `l`! So how does Java handle this? The class `LinkedList` is a wrapper similar to the definition of `tree`. It also defines a private inner class called `LinkedList.Node` which is similar to the definition of `non-empty-tree`. 

Different languages, same problems, same solutions. That's the way computer science works.