## Filter

You can filter the rows of a table with [Table.filter](https://hail.is/docs/devel/hail.Table.html#hail.Table.filter). `filter` returns a new table containing only the rows for which the expression evaluates to `True`.

In [None]:
import hail as hl
import seaborn

hl.utils.get_movie_lens('data/')
users = hl.read_table('data/users.ht')

In [None]:
users.filter(users.occupation == 'programmer').count()
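To make the row-keeping rule concrete, here is a minimal plain-Python sketch of the same idea. It is an analogy, not Hail code: the rows, `is_programmer`, and `filter_rows` are made up for illustration, and `None` stands in for a missing value. It assumes, as the Hail documentation appears to describe, that a row whose predicate is missing is dropped along with rows where it is `False`.

```python
# Plain-Python model of row filtering; this is not Hail code.
# None stands in for a missing value.
rows = [
    {"id": 1, "occupation": "programmer"},
    {"id": 2, "occupation": "artist"},
    {"id": 3, "occupation": None},  # missing occupation
    {"id": 4, "occupation": "programmer"},
]

def is_programmer(row):
    """Return True or False, or None when the occupation is missing."""
    if row["occupation"] is None:
        return None
    return row["occupation"] == "programmer"

def filter_rows(table, predicate):
    """Return a new list holding only the rows whose predicate is exactly True."""
    return [row for row in table if predicate(row) is True]

programmers = filter_rows(rows, is_programmer)
print(len(programmers))  # 2
print(len(rows))         # 4 (the input is untouched)
```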

## Annotate

You can add new fields to a table with [annotate](https://hail.is/docs/devel/hail.Table.html#hail.Table.annotate). Let's add a `cleaned_occupation` field that is missing wherever `occupation` is `'other'` or `'none'`, and equals `occupation` otherwise.

In [None]:
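# Summary statistics of `age` (mean, stdev, ...), useful for the z-score exercise
stats = users.aggregate(hl.agg.stats(users.age))
# Occupation labels to treat as missing
missing_occupations = hl.set(['other', 'none'])

# Set `cleaned_occupation` to missing for those labels; otherwise copy `occupation`
t = users.annotate(
    cleaned_occupation = hl.cond(missing_occupations.contains(users.occupation),
                                 hl.null(hl.tstr),
                                 users.occupation))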
t.show()

Note that `annotate` is functional: it doesn't mutate `users`, but returns a new table. The same is true of `filter`; in fact, all operations in Hail are functional. Describing `users` shows that it still has no `cleaned_occupation` field:

In [None]:
users.describe()
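The same non-mutating behavior can be modeled in plain Python. This is only an analogy under stated assumptions, not how Hail is implemented: `users_model`, the `annotate` helper, and the `age_in_months` field are hypothetical names invented for this sketch.

```python
# Plain-Python model of a functional annotate; this is not Hail code.
users_model = [{"id": 1, "age": 24}, {"id": 2, "age": 53}]  # made-up rows

def annotate(table, **field_fns):
    """Return a new list of rows with extra fields computed from each row."""
    return [
        {**row, **{name: fn(row) for name, fn in field_fns.items()}}
        for row in table
    ]

annotated = annotate(users_model, age_in_months=lambda row: row["age"] * 12)

print(annotated[0])    # {'id': 1, 'age': 24, 'age_in_months': 288}
print(users_model[0])  # {'id': 1, 'age': 24}
```

Because the helper builds new dictionaries rather than updating the existing ones, the original rows never gain the new field.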

Two other methods work like `annotate`: [select](https://hail.is/docs/devel/hail.Table.html#hail.Table.select) and [transmute](https://hail.is/docs/devel/hail.Table.html#hail.Table.transmute). `select` returns a table with the key and an entirely new set of value fields. `transmute` adds the new fields and drops any existing fields referenced in the expressions on the right-hand side, leaving unreferenced fields unchanged. This makes `transmute` useful for transforming data into a new form. Here are some examples.

In [None]:
(users.select(len_occupation = hl.len(users.occupation))
 .describe())

In [None]:
(users.transmute(
    cleaned_occupation = hl.cond(missing_occupations.contains(users.occupation),
                                 hl.null(hl.tstr),
                                 users.occupation))
 .describe())
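The difference between the two methods can be sketched on a single row in plain Python. This is a simplified model, not Hail code: `KEY_FIELDS`, `select`, and `transmute` are hypothetical helpers, the row values are made up, and the fields consumed by `transmute` are passed explicitly because this sketch has no expression objects to inspect.

```python
# Plain-Python model of select and transmute on one row; this is not Hail code.
KEY_FIELDS = ("id",)  # hypothetical key, assuming a table keyed by `id`

row = {"id": 1, "age": 24, "occupation": "other"}  # made-up row

def select(row, **new_fields):
    """Keep the key fields plus only the newly computed value fields."""
    kept = {k: row[k] for k in KEY_FIELDS}
    kept.update(new_fields)
    return kept

def transmute(row, used, **new_fields):
    """Add the new fields and drop the source fields named in `used`."""
    out = {k: v for k, v in row.items() if k not in used}
    out.update(new_fields)
    return out

selected = select(row, len_occupation=len(row["occupation"]))

# None models a missing value for the 'other' and 'none' labels.
cleaned = None if row["occupation"] in {"other", "none"} else row["occupation"]
transmuted = transmute(row, used={"occupation"}, cleaned_occupation=cleaned)

print(selected)    # {'id': 1, 'len_occupation': 5}
print(transmuted)  # {'id': 1, 'age': 24, 'cleaned_occupation': None}
```

In the model, `select` discards `age` and `occupation`, while `transmute` discards only `occupation`, the field that fed the new value.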

Finally, you can add global fields with [annotate_globals](https://hail.is/docs/devel/hail.Table.html#hail.Table.annotate_globals).  Globals are useful for storing metadata about a dataset or storing small data structures like sets and maps.

In [None]:
t = users.annotate_globals(cohort = 5, cloudable = hl.set(['sample1', 'sample10', 'sample15']))
t.describe()

In [None]:
t.cloudable

In [None]:
hl.eval(t.cloudable)

## Exercises


- [Z-score normalize](https://en.wikipedia.org/wiki/Standard_score) the age field of `users`.
- Convert `zip` to an integer.  Hint: Not all zipcodes are US zipcodes!  Use [hl.int32](https://hail.is/docs/devel/functions/constructors.html#hail.expr.functions.int32) to convert a string to an integer.  Use [StringExpression.matches](https://hail.is/docs/devel/expressions.html#hail.expr.expression.StringExpression.matches) to see if a string matches a regular expression.
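Before writing the Hail versions, it may help to see the logic of both exercises in plain Python. This is a rough sketch, not a solution in Hail: the sample ages and zip strings are made up, `zip_to_int` is a hypothetical helper, the five-digit pattern is one assumed definition of a US zip code, and using the population standard deviation is an assumption of this sketch.

```python
import re
import statistics

# Exercise 1 logic: z-score = (value - mean) / standard deviation.
ages = [24, 53, 23, 24, 33]  # made-up sample values
mean = statistics.fmean(ages)
stdev = statistics.pstdev(ages)  # population standard deviation (assumption)
z_scores = [(age - mean) / stdev for age in ages]

# Exercise 2 logic: convert only strings that look like a US zip code.
US_ZIP = re.compile(r"\d{5}")  # assumed pattern: exactly five digits

def zip_to_int(zipcode):
    """Return the zip as an int if it is exactly five digits, else None."""
    return int(zipcode) if US_ZIP.fullmatch(zipcode) else None

print(zip_to_int("85711"))  # 85711
print(zip_to_int("V3N4P"))  # None
```

After normalization, the values in `z_scores` have mean 0 and standard deviation 1, up to floating-point error. Strings that fail the pattern map to `None`, which plays the role of a missing value.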