Skip to content
Switch branches/tags
Go to file

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Census Looker

This is a set of scripts to make public Census data downloaded from the U.S. Census Bureau's DataFerrett tool explorable in Looker and Google BigQuery.

There are two scripts, one for uploading the data files that come from DataFerrett into BigQuery and one for transforming the DataFerrett codebook(s) into LookML.

Schema Generator

Because Census datafiles can be extremely wide, it's impractical for users to write out the schemas by hand. And especially when working with multiple data files that often differ in subtle ways, users need a programmatic way to compare and combine the datafiles into a single file in BigQuery.

The Schema Generator takes the location of one or more datafiles as well as the name(s) of an equal number of tables you want to upload those files to on BigQuery. You call the script from the command line as follows:

python -f /Users/Documents/datafile1.csv  /Users/Documents/datafile12.csv -t table1 table2

It examines a sample of rows in each file and returns a text file that contains schema(s) in the format BigQuery expects for each datafile. If you're combining multiple datafiles, it also returns a query that unions that datafiles together into a single unified table (with NULLs filling in columns that do not exist in all datafiles) so that you can create one table containing all your data.

You can find sample input and output files in the samples directory.

Codebook Parser

DataFerrett dynamically creates a codebook based on which data source and columns you download. This codebook is in plaintext (example in samples) and if you download data from multiple data sources, you'll end up with multiple codebooks.

Because you'd need to cross-reference the data values in each codebook by hand, this approach quickly becomes unwieldy for larger datasets. That's where this parser comes in. It takes the location(s) of the codebook(s) you've downloaded, the names of the table(s) the underlying data lives in on BigQuery, the name of the weighting variable (since this data is generally sampled and each respondent is assigned a "weight" to represent some number of Americans like them), and an optional flag to merge the codebooks into one set of LookML files.

Calling the script from the command line looks like this:

python -f "/Users/Documents/codebook1.txt" "/Users/Documents/codebook2.txt" -t "table1" "table2" -m "PWCMPWGT" -o merge

When you run this script, the output is three LookML view files (if you've merged the codebooks) or three view files for each input file (if you haven't merged them). The first contains all the variables from the codebook(s), rewritten as LookML dimensions. The second (denoted with a _filters) contains all the variables from the codebook(s), rewritten as LookML filter-only dimensions. The third (denoted with a _measures) contains a measure to calculate the weighted population of the cohort and a weighted population of the group.

Once loaded into Looker, these files lets you dimensionalize the data and select any combination of dimensions to define your cohort. They also let you set the filter-only fields to independently define your group's characteristics. You can then see how many people meet the filters of your cohort and your group, which allows you to ask questions of the form "How many of [cohort] are in [group]?"

This is useful because it allows you to, for example, define your cohort as all women and your group as voters, allowing you to ask "What percentage of women are voters?" To see the power of this approach in action, visit


No description, website, or topics provided.



No releases published


No packages published