We should define a small set of benchmarks up front. Particularly important to benchmark:

* data loading
* filtering
* reduce
* low-level extension array
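
As a rough illustration, the four areas listed above could be collected into one small harness. This is only a sketch using the Python standard library: the workloads, the data size, and the names `BENCHMARKS` and `run_benchmarks` are hypothetical stand-ins, and `array.array` merely stands in for whatever low-level array type is actually benchmarked.

```python
import array
import csv
import io
import timeit

# Hypothetical stand-in data: 10,000 rows of (id, value) as CSV text.
N_ROWS = 10_000
CSV_TEXT = "id,value\n" + "\n".join(f"{i},{i % 100}" for i in range(N_ROWS))


def load_rows(text=CSV_TEXT):
    """Data loading: parse CSV text into a list of (id, value) int tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(int(i), int(v)) for i, v in reader]


# Shared inputs, built once so the other benchmarks do not time the loading.
ROWS = load_rows()
VALUES = array.array("q", (v for _, v in ROWS))


def filter_rows():
    """Filtering: keep the rows whose value is at least 50."""
    return [row for row in ROWS if row[1] >= 50]


def reduce_rows():
    """Reduce: sum the value column."""
    return sum(v for _, v in ROWS)


def array_take():
    """Low-level array (assumed stand-in): copy every other element."""
    return VALUES[::2]


# One entry per area named in the list above.
BENCHMARKS = {
    "data loading": load_rows,
    "filtering": filter_rows,
    "reduce": reduce_rows,
    "low-level array": array_take,
}


def run_benchmarks(repeat=3, number=5):
    """Return the best per-call time in seconds for each benchmark."""
    results = {}
    for name, func in BENCHMARKS.items():
        timings = timeit.repeat(func, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print(f"{name:>16}: {seconds * 1e3:.3f} ms")
```

Taking the minimum over several repeats is one common way to reduce noise from other processes; a real suite would likely also pin data sizes and record results over time.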