You can use Python's builtin :mod:`re` module to introduce a regular expression into a :meth:`.Table.where` query.
For example, here we find all states that start with "C".
import re
new_table = table.where(lambda row: re.match('^C', row['state']))
This can also be useful for finding values that don't match your expectations. For example, finding all values in the "phone number" column that don't look like phone numbers:
new_table = table.where(lambda row: not re.match('\d{3}-\d{3}-\d{4}', row['phone']))
Hate regexes? You can use glob (:mod:`fnmatch`) syntax too!
from fnmatch import fnmatch
new_table = table.where(lambda row: fnmatch('C*', row['state']))
This snippet filters the dataset to incomes between 100,000 and 200,000.
new_table = table.where(lambda row: 100000 < row['income'] < 200000)
This snippet filters the dataset to events during the summer of 2015:
import datetime
new_table = table.where(lambda row: datetime.datetime(2015, 6, 1) <= row['date'] <= datetime.datetime(2015, 8, 31))
If you want to filter to events during the summer of any year:
new_table = table.where(lambda row: 6 <= row['date'].month <= 8)
To filter a dataset to the top 10% percent of values we first compute the percentiles for the column and then use the result in the :meth:`.Table.where` truth test:
percentiles = table.aggregate(agate.Percentiles('salary'])
top_ten_percent = table.where(lambda r: r['salary'] >= percentiles[90])
By combining a random sort with limiting, we can effectively get a random sample from a table.
import random
randomized = table.order_by(lambda row: random.random())
sampled = table.limit(10)
With can also get an ordered sample by simply using the step
parameter of the :meth:`.Table.limit` method to get every Nth row.
sampled = table.limit(step=10)