# Census Data API Demo

## Overview

The Census Bureau provides an API for accessing many of its datasets. It returns the requested columns as a JSON array.

> The Census Data Application Programming Interface (API) is a data service that enables software developers to access and use Census Bureau data within their applications.

## Usage

The primary reference is the [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf).

The query model is fairly simple. Every request is a GET with query parameters, for example:  
```https://api.census.gov/data/2014/pep/natstprc?get=STNAME,POP&DATE_=7&for=state:*```

* The URL scheme is always `https`
* The server is always `api.census.gov`
* The path part of the URL always begins with `/data/`, and identifies a vintage, program and survey/dataset

Some key query parameters:

|Param|Example Value|Description|
|---|---|---|
|`get`|`get=STNAME,POP`|The variables to return|
|_variable name_|`DATE_=7`|Predicates, or constraints on the values. See the user guide for details.|
|`for`|`for=county:*`|The geography for which to return data|
|`in`|`in=state:36`|The containing geography that limits `for` wildcard values. Geographies are identified by FIPS code (`36` is New York).|
|`key`|`key=`_your key_|The API key you receive from the Census Bureau|

Values are returned as a JSON array whose first row consists of field names.
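
To make the query model and the response shape concrete, here is a minimal sketch that assembles the example URL from its parts and turns a response into records. The helper names `build_census_url` and `rows_to_records` are hypothetical (not part of any Census library), and the sample response is made up to avoid a network call.

```python
from urllib.parse import urlencode


def build_census_url(vintage, dataset, variables, for_geo, in_geo=None,
                     predicates=None, key=None):
    """Assemble a Census Data API query URL (hypothetical helper)."""
    params = {'get': ','.join(variables)}
    # Predicates are ordinary query parameters named after a variable.
    params.update(predicates or {})
    params['for'] = for_geo
    if in_geo:
        params['in'] = in_geo
    if key:
        params['key'] = key
    # Leave ':', '*' and ',' unescaped so the URL reads like the examples.
    query = urlencode(params, safe=':*,')
    return f'https://api.census.gov/data/{vintage}/{dataset}?{query}'


def rows_to_records(rows):
    """Turn the array-of-arrays response into dicts keyed by the header row."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


url = build_census_url(2014, 'pep/natstprc', ['STNAME', 'POP'],
                       for_geo='state:*', predicates={'DATE_': 7})
print(url)

# A made-up response in the documented shape (not real Census figures).
sample = [['STNAME', 'POP', 'DATE_', 'state'],
          ['Example State', '1000', '7', '01']]
print(rows_to_records(sample))
```

Note that every value in the sample, including the population, is a string; that matches how the API returns data.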

The API provides discovery documents in human- and machine-readable form:

* Dataset discovery: appending `.html` or `.json` to any prefix of path components will show the datasets (endpoints/API base URLs) with links to each of their valid geographies, valid variables, examples, and other metadata.  
E.g., https://api.census.gov/data/2014/pep.html
* Geography levels: appending a path component of `geography.html` (or json) to an endpoint URL will show the geography levels (combinations of `for`/`in` geographies) you can query in a given dataset.  
E.g., https://api.census.gov/data/2014/pep/subcty/geography.html
* Variables: appending a path component of `variables.html` (or json) to an endpoint URL will show the variables (data) you can query in a given dataset.  
E.g., https://api.census.gov/data/2014/pep/subcty/variables.html
* Groups: appending a path component of `groups.html` (or json) to an endpoint URL will show you the groups you can query in a given dataset. Groups are thematic collections of variables. You can request all the variables in a group using the syntax `get=group(`_groupid_`)`. For more information, see https://www.census.gov/data/developers/updates/groups-functionality.html  
E.g., https://api.census.gov/data/2018/acs/acs1/groups.html
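
The discovery URL patterns above can be generated mechanically. The following sketch only builds the URLs and does not fetch them; `discovery_urls` is a hypothetical helper name.

```python
def discovery_urls(endpoint, fmt='json'):
    """Return the discovery-document URLs for a dataset endpoint (hypothetical helper).

    endpoint: an API base URL such as https://api.census.gov/data/2018/acs/acs1
    fmt: 'json' for machine-readable output or 'html' for a browsable page
    """
    base = endpoint.rstrip('/')
    return {
        'datasets': f'{base}.{fmt}',
        'geography': f'{base}/geography.{fmt}',
        'variables': f'{base}/variables.{fmt}',
        'groups': f'{base}/groups.{fmt}',
    }


urls = discovery_urls('https://api.census.gov/data/2018/acs/acs1')
for name, url in urls.items():
    print(f'{name}: {url}')
```

Any of these URLs can be passed to `requests.get` and parsed with `.json()`; the layout of each document is best inspected in the HTML form first.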

## Limits
A single query may request up to 50 variables (more using groups).
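
When a variable list exceeds that limit, one option is to split it into batches and issue one query per batch. This is a minimal sketch; `chunk_variables` is a hypothetical helper, and the variable names are placeholders rather than real Census variables.

```python
def chunk_variables(variables, max_per_query=50):
    """Split a list of variable names into batches that each fit in one query."""
    return [variables[i:i + max_per_query]
            for i in range(0, len(variables), max_per_query)]


# Placeholder names for illustration only.
names = [f'VAR_{i:03d}' for i in range(120)]
batches = chunk_variables(names)
print([len(batch) for batch in batches])
```

The results of the per-batch queries can then be joined on the geography columns, which are returned with every response.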

To make more than 500 requests per day per IP address, you need to include an API key. You can request one at [Request A Key](https://api.census.gov/data/key_signup.html).
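
One way to keep the key out of a shared notebook is to read it from an environment variable. This is a sketch under that assumption; the variable name `CENSUS_API_KEY` and the helper `census_params` are hypothetical choices, not something the API defines.

```python
import os


def census_params(base_params, env_var='CENSUS_API_KEY'):
    """Return a copy of base_params, adding 'key' when the environment variable is set."""
    params = dict(base_params)
    key = os.environ.get(env_var)  # env_var is an assumed name, pick your own
    if key:
        params['key'] = key
    return params


params = census_params({'get': 'STNAME,POP', 'for': 'state:*'})
print(sorted(params))
```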

This requires agreeing to the [Terms of Service](https://www.census.gov/data/developers/about/terms-of-service.html), which, among other things, require attribution and a disclaimer of endorsement by the Census Bureau.


In [None]:
import pandas as pd
import requests

In [None]:
def get_acs5_data(fields, vintage, for_geo, in_geo=None, key=None):
    """Retrieve ACS 5-year fields.
    
    fields: a dict mapping ACS variable names to column names.
    vintage: the year for which to request data.
    for_geo: a dict containing geographies for which to request data.
    in_geo: a dict containing geographies in which to constrain data.
    key: your Census API key
    
    Returns a DataFrame with the columns renamed to human-friendly names."""
    
    url = f'https://api.census.gov/data/{vintage}/acs/acs5'
    params = {
        'get': ','.join(fields.keys()),
        'for': ','.join(f'{k}:{v}' for k,v in for_geo.items()),
    }
    if in_geo:
        # Multiple containing geographies are separated by spaces, not commas.
        params['in'] = ' '.join(f'{k}:{v}' for k,v in in_geo.items())
    if key:
        params['key'] = key
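    r = requests.get(url, params=params)
    r.raise_for_status()

    # The first row holds the field names; the remaining rows hold the data.
    header, *rows = r.json()
    # Rename ACS variables to friendly names; leave other columns (e.g. geographies) as-is.
    columns = [fields.get(column, column) for column in header]
    return pd.DataFrame(rows, columns=columns)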

Now we will retrieve demographic characteristics for ZIP codes (technically, ZIP Code Tabulation Areas, or ZCTAs).

Some relevant variable groups:
* `B02001` is Race
* `B02003` is Detailed Race
* `B03002` is Hispanic or Latino Origin By Race

The last of these permits separate counts for common race/ethnicity combinations.
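
As an aside, a whole group can be requested with the `group(...)` syntax instead of listing its variables one by one. The sketch below only builds such a URL; `group_query_url` is a hypothetical helper, and whether a given group is available for a given geography should be checked in the discovery documents.

```python
from urllib.parse import urlencode


def group_query_url(vintage, group_id, for_geo):
    """Build an ACS 5-year URL requesting every variable in a group (hypothetical helper)."""
    params = {'get': f'group({group_id})', 'for': for_geo}
    # Keep the punctuation readable; spaces are encoded as '+'.
    query = urlencode(params, safe=':*,()')
    return f'https://api.census.gov/data/{vintage}/acs/acs5?{query}'


url = group_query_url(2018, 'B03002', 'zip code tabulation area:*')
print(url)
```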

In [None]:
census_data = get_acs5_data(
    fields = {
        # B19013_001E is median household income in the past 12 months.
        'B19013_001E': 'median_income',
        # Always use the total for a group when computing proportions with its other variables.
        'B03002_001E': 'total_population',
        'B03002_003E': 'non_hispanic_white_population',
        'B03002_004E': 'non_hispanic_black_population',
        'B03002_006E': 'asian_population',  # non-Hispanic Asian alone
        'B03002_012E': 'hispanic_population',  # Hispanic or Latino, any race
    },
    vintage=2018,
    for_geo={
        'zip code tabulation area': '*',
    }
)

In [None]:
census_data.head()

As a complication, these values are all returned as strings.  We need to cast them to numerical types before we can compute percentages. (Note: always leave the geography IDs as strings.)

In [None]:
numeric_columns = [
    'median_income',
    'total_population',
    'non_hispanic_white_population',
    'non_hispanic_black_population',
    'asian_population',
    'hispanic_population',
]
census_data[numeric_columns] = census_data[numeric_columns].astype(float)
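
The reason for leaving geography IDs as strings is easiest to see on a single row. The sketch below uses plain Python and a made-up record (not real Census figures) so it runs without a network call; `cast_record` is a hypothetical helper.

```python
def cast_record(record, numeric_fields):
    """Convert the named fields to float; leave everything else (e.g. geography IDs) as str."""
    return {
        name: (float(value) if name in numeric_fields and value is not None else value)
        for name, value in record.items()
    }


# Made-up row in the API's all-strings format.
raw = {'total_population': '17242', 'zip code tabulation area': '00601'}
clean = cast_record(raw, numeric_fields={'total_population'})
print(clean)

# Casting the ID would drop its leading zeros and change the identifier.
print(str(int(raw['zip code tabulation area'])))
```

A ZCTA that has lost its leading zeros no longer matches the same ID in other datasets, so joins on it would silently fail.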

In [None]:
census_data.describe()

As we can see from the summary, some ZCTAs do not have `median_income` values.
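
Missing estimates may need explicit handling before any statistics are computed. The sketch below treats both nulls and very large negative numbers as missing. It is an assumption that unavailable estimates can appear as negative codes such as `-666666666`; check the ACS documentation for the dataset you use. `clean_estimate` and `MISSING_THRESHOLD` are hypothetical names.

```python
# Assumption: values at or below this threshold are "no estimate" codes,
# not real estimates. The threshold itself is an illustrative choice.
MISSING_THRESHOLD = -100_000_000


def clean_estimate(value):
    """Return the estimate as a float, or None when it is absent or a missing-value code."""
    if value is None:
        return None
    number = float(value)
    return None if number <= MISSING_THRESHOLD else number


# Made-up inputs: a normal value, a null, and an assumed missing-value code.
cleaned = [clean_estimate(v) for v in ['52000', None, '-666666666']]
print(cleaned)
```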

Now we can compute the percentage values (or rather, proportions) from the counts.

In [None]:
census_data['non_hispanic_white_fraction'] = census_data['non_hispanic_white_population'] / census_data['total_population']
census_data['non_hispanic_black_fraction'] = census_data['non_hispanic_black_population'] / census_data['total_population']
census_data['hispanic_fraction'] = census_data['hispanic_population'] / census_data['total_population']
census_data['asian_fraction'] = census_data['asian_population'] / census_data['total_population']

In [None]:
census_data.describe()