## Expression Tutorial

In [None]:
import hail as hl
hl.init()

### What is an Expression?

Data types in Hail are represented by [expression](https://hail.is/docs/devel/expressions.html#expressions) classes. Each data type in Hail has its own expression class. For example, an integer of type `tint32` is represented by an `Int32Expression`. 

We can construct an integer expression in Hail with the [int32](https://hail.is/docs/devel/functions/constructors.html?highlight=int32#hail.expr.functions.int32) function.

In [None]:
hl.int32(3)

Hail has numeric, string, and boolean types, as well as collection types such as arrays, dicts, and structs. There is a complete list of Hail's [types](https://hail.is/docs/devel/types.html#types) and corresponding [expressions](https://hail.is/docs/devel/expressions.html#expressions) in the documentation. 

The `literal` function will convert a Python object into a Hail expression. Let's try it out on a Python list.

In [None]:
hl.literal(['a', 'b', 'c'])

The Python list is converted to an `ArrayExpression` of type `array<str>`, that is, an array of strings.

### Expressions are Lazy

In languages like Python and R, expressions are evaluated and stored immediately. This is called **eager** evaluation.

In [None]:
1 + 2

Eager evaluation won't work on datasets that won't fit in memory. Consider the UK Biobank BGEN file, which is ~2TB but decompresses to >100TB in memory.

To process datasets of this size, Hail uses **lazy** evaluation. When you enter an expression, Hail doesn't execute it immediately: it simply records what you asked it to do.

In [None]:
one = hl.int32(1)
three = one + 2
three
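To make the idea of "recording what you asked" concrete, here is a toy sketch in plain Python. It is an analogy only and does not reflect how Hail is implemented; the `LazyInt` class and its `description` and `evaluate` members are hypothetical names invented for this illustration.

```python
# Toy model of lazy evaluation (an analogy, not Hail's implementation).
class LazyInt:
    """Records an integer computation instead of running it."""

    def __init__(self, compute, description):
        self._compute = compute          # zero-argument function producing the value
        self.description = description   # human-readable record of the request

    def __add__(self, other):
        # Wrap plain Python ints so both operands are lazy.
        if isinstance(other, LazyInt):
            rhs = other
        else:
            rhs = LazyInt(lambda: other, repr(other))
        # Build a new recorded computation; nothing is added yet.
        return LazyInt(lambda: self.evaluate() + rhs.evaluate(),
                       f"({self.description} + {rhs.description})")

    def evaluate(self):
        # Work happens only here, when a result is explicitly requested.
        return self._compute()


one = LazyInt(lambda: 1, "1")
three = one + 2
print(three.description)  # the recorded request
print(three.evaluate())   # the result, computed on demand
```

Building `three` performs no addition. The sum is computed only when `evaluate` is called, which mirrors the distinction between constructing a Hail expression and forcing its evaluation.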

Hail evaluates an expression only when it must. For example:

 - when performing an aggregation,
 - when calling `take`, `collect` or `show`,
 - when exporting or writing to disk.

Hail evaluates expressions by streaming to accommodate very large datasets.
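As a rough illustration of why streaming helps, the plain-Python sketch below computes a mean over values that are produced one at a time, so the full sequence is never held in memory. The function names and the generated data are hypothetical, and Hail's actual execution engine is considerably more involved.

```python
def allele_counts(n):
    """Yield n made-up alternate-allele counts (0, 1, or 2), one at a time."""
    for i in range(n):
        yield i % 3


def streaming_mean(values):
    """Compute a mean in one pass, keeping only a running total and count."""
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else float("nan")


mean_count = streaming_mean(allele_counts(9))
print(mean_count)
```

Because `allele_counts` is a generator, memory use stays constant no matter how large `n` is. Only the running total and count are kept.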

If you want to force the evaluation of an expression, you can do so by calling `value`. Note that this can only be done on an expression with no index, such as `hl.int32(1) + 2`. If the expression has an index, e.g. `table.idx + 1`, then the `value` method will fail. The section on indices below explains this concept further.

In [None]:
three.value

The `show` method can also be used to evaluate and display the expression.

In [None]:
three.show()

### Indices

Expressions carry another piece of information: indices.  Indices record the `Table` or `MatrixTable` to which the expression refers, and the axes over which the expression can vary.

Let's see some examples from the 1000 Genomes dataset:

In [None]:
hl.utils.get_1kg('data/')

In [None]:
mt = hl.read_matrix_table('data/1kg.mt')
mt

Let's add a global field.

In [None]:
mt = mt.annotate_globals(dataset = '1kg')

We can examine any field of the matrix table with the `describe` method. If we examine the field we just added, we see that it has no indices, because it is a global field.

In [None]:
mt.dataset.describe()

The `locus` field is a row field, so it will be indexed by `row`. 

In [None]:
mt.locus.describe()

Likewise, a column field `s` will be indexed by `column`.

In [None]:
mt.s.describe()

And finally, an entry field `GT` will be indexed by both the `row` and `column`.

In [None]:
mt.GT.describe()

Expressions like `locus`, `s`, and `GT` above do not have a single value, but rather a value that varies across rows or columns of `mt`. Therefore, calling the `value` method on these expressions will lead to an error.

Global fields don't vary across rows or columns, so they have a `value`:

In [None]:
mt.dataset.value
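The indexing rules in this section can be summarized with a small plain-Python model. This is a simplified sketch and not Hail's internal representation; the `IndexedExpr` class and the field names below are hypothetical. The sketch assumes that combining two expressions yields one that varies over the union of their axes, and that only an expression with no axes has a single value.

```python
class IndexedExpr:
    """Toy expression that tracks the axes over which it varies."""

    def __init__(self, name, axes=()):
        self.name = name
        self.axes = frozenset(axes)

    def __add__(self, other):
        # The combined expression varies over every axis of either operand.
        return IndexedExpr(f"({self.name} + {other.name})",
                           self.axes | other.axes)

    def has_single_value(self):
        # Only an expression with no axes (like a global field) has one value.
        return not self.axes


global_expr = IndexedExpr("global_value")           # no indices
row_expr = IndexedExpr("row_value", {"row"})        # varies by row
col_expr = IndexedExpr("col_value", {"column"})     # varies by column
entry_expr = row_expr + col_expr                    # varies by row and column

print(sorted(entry_expr.axes))
print(global_expr.has_single_value(), entry_expr.has_single_value())
```

In this model, `entry_expr` picks up both axes, much as `GT` is indexed by both `row` and `column`, while `global_expr` is the only one for which asking for a single value makes sense.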

### `show`, `take`, and `collect`

Although expressions with indices have no `value`, you can use `show` to print the first few values, or `take` and `collect` to localize values to Python. 

`show` and `take` grab the first 10 rows by default, but you can specify a number of rows to grab.

In [None]:
mt.s.show()

In [None]:
mt.s.take(5)

You can `collect` an expression to localize all of its values, for example to get a list of all sample IDs in a dataset.

But be careful -- don't `collect` more data than can fit in memory!

In [None]:
all_sample_ids = mt.s.collect()
all_sample_ids[:5]

### Learning more

Hail has a suite of [functions](https://hail.is/docs/devel/functions/index.html) to transform and build expressions.

Also, see the documentation for the [expressions](https://hail.is/docs/devel/expressions.html) themselves.