# Expressions

In [None]:
import hail as hl
hl.init()

# Eager Evaluation

Python and R use **eager** evaluation.

When you enter an expression, the result is computed immediately and stored.

In [None]:
1 + 2

# Lazy Evaluation

Eager evaluation breaks down for datasets that don't fit in memory.

Consider the UK Biobank BGEN file, which is ~2TB but decompresses to >100TB in memory.

In order to process datasets of this size, Hail uses lazy evaluation.

When you enter an expression, Hail doesn't execute it immediately: it simply records what you asked it to do.

In [None]:
one = hl.int32(1)
three = one + 2
three
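
To make the idea of "recording what you asked" concrete, here is a toy sketch in plain Python. It is not how Hail is implemented; the `Lazy` class and its `evaluate` method are hypothetical names invented for this illustration.

```python
# Toy model of lazy evaluation (hypothetical names, not Hail's internals).
class Lazy:
    """Records an operation instead of computing it."""

    def __init__(self, description, compute):
        self.description = description  # human-readable record of the expression
        self._compute = compute         # zero-argument function, run only on demand

    def __add__(self, other):
        # Wrap plain Python values so both operands are Lazy.
        rhs = other if isinstance(other, Lazy) else Lazy(repr(other), lambda: other)
        return Lazy(
            f"({self.description} + {rhs.description})",
            lambda: self.evaluate() + rhs.evaluate(),
        )

    def evaluate(self):
        """Run the recorded computation."""
        return self._compute()

    def __repr__(self):
        return f"<Lazy {self.description}>"


lazy_one = Lazy("1", lambda: 1)
lazy_three = lazy_one + 2      # nothing is added yet; only the expression is recorded
print(lazy_three)              # <Lazy (1 + 2)>
print(lazy_three.evaluate())   # 3
```

Printing the object shows the recorded expression, while the addition itself happens only inside `evaluate()`.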

Hail evaluates an expression only when it must, for example:

 - when performing an aggregation,
 - when calling `take`, `collect` or `show`,
 - when exporting or writing to disk.

Hail evaluates expressions by streaming to accommodate very large datasets.
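
As a rough illustration of what streaming means in general (not a description of Hail's execution engine), the plain-Python sketch below computes a mean in a single pass over a generator. Only a running total and a count are held in memory. The functions `squares` and `streaming_mean` are hypothetical helpers written for this example.

```python
def squares(n):
    """Yield squares one at a time instead of building a list."""
    for i in range(n):
        yield i * i


def streaming_mean(values):
    """Compute a mean in one pass, keeping only a running total and a count."""
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else float("nan")


# The million squares are never all in memory at once.
mean_of_squares = streaming_mean(squares(1_000_000))
print(mean_of_squares)
```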

You can evaluate an expression that has no indices by accessing its `value` attribute. The `show` method prints the value along with its type.

In [None]:
three.value

In [None]:
three.show()

# Indices

Expressions carry another piece of information: indices. Indices record the `Table` or `MatrixTable` to which the expression refers, and the axes over which the expression can vary.

Let's see some examples from the 1000 Genomes dataset:

In [None]:
hl.utils.get_1kg('data/')

In [None]:
mt = hl.read_matrix_table('data/1kg.mt')
mt

Let's add a global field.

In [None]:
mt = mt.annotate_globals(dataset = '1kg')

And examine some fields.

In [None]:
mt.dataset.describe()

In [None]:
mt.locus.describe()

In [None]:
mt.s.describe()

In [None]:
mt.GT.describe()

Expressions like `locus`, `s`, and `GT` above have no single `value`; instead, their values vary across the rows or columns of `mt`.

Global fields don't vary across rows or columns, so they have a `value`:

In [None]:
mt.dataset.value

# `show`, `take`, and `collect`

Although expressions with indices have no `value`, you can use `show` to print the first few values, or `take` and `collect` to localize values to Python.

In [None]:
mt.s.show()

In [None]:
mt.s.take(5)

You can `collect` an expression to localize all of its values, for example to get a list of all sample IDs in a dataset.

But be careful -- don't `collect` more data than can fit in memory!

In [None]:
all_sample_ids = mt.s.collect()
all_sample_ids[:5]
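
The difference between taking a few values and collecting everything can be sketched in plain Python with a generator. This is an analogy only: `sample_ids` is a hypothetical stand-in for a very large column, made deliberately unbounded to exaggerate the point, and `itertools.islice` plays the role of `take`.

```python
from itertools import islice


def sample_ids():
    """Hypothetical stand-in for a very large column of sample IDs."""
    i = 0
    while True:  # unbounded on purpose: materializing everything would never finish
        yield f"sample_{i}"
        i += 1


# Analogous to take(5): pulls five items and stops.
first_five = list(islice(sample_ids(), 5))
print(first_five)

# Analogous to collect(): list(sample_ids()) would try to hold every value in memory.
```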

# Learning more

Hail has a suite of [functions](https://hail.is/docs/devel/functions/index.html) to transform and build expressions.

Also, see the documentation for the [expressions](https://hail.is/docs/devel/expressions.html) themselves.