<a href="https://colab.research.google.com/github/npr99/URSC645/blob/main/Tasks/CodingExercises/ReadSourceData/URSC645_CensusAPI_Read_SourceData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Read in US Census Data from the API
### Application of Descriptive Statistics: Finding US Census Tract Outliers (2020 Census)
 
---
This notebook demonstrates how to read data from the US Census API, clean the data for use in Python, and explore the data using histograms, descriptive statistics, and mapping. It brings together publicly available data, US Census geography, and descriptive statistics.
 
This notebook assumes that you have some basic experience with Google Colab and running blocks of code. You do not need any Python programming background. A basic introduction to, and curiosity about, descriptive statistics and US Census geography is helpful.

This program is a modification of the notebook found at:
https://github.com/npr99/PlanningMethods_Book/blob/main/notebooks/PLAN_02c_Descriptive_Statistics_2020CensusTracts.ipynb

## Step 1: Obtain Data with Census API

The following section sets up and reads in data from the Census API, using as few lines of code as possible.


In [1]:
# Python packages required to read in and wrangle Census API data
import requests # Required for the Census API
import pandas as pd # For reading, writing and wrangling data

## What does the next block of code do?
The following block of code does three things:
#### 1. Obtain data from the Census API
The first line requests data from the Census API (`https:// api.census.gov / data`). [Available APIs](https://www.census.gov/data/developers/data-sets/decennial-census.html)

The data come from the 2020 Decennial Census (`/ 2020 / dec/`) redistricting data, Public Law 94-171 (`pl`), which is based on the [short form](https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires.2020_Census.html).

The Census API needs three parameters (`params={`) in order to find the correct data for the right geography.
1.   `'get' :` the following [variables](https://api.census.gov/data/2020/dec/pl/variables.html):
> Total Housing Units (`H1_001N`) and Total Population (`P1_001N`)
2.   `'for' : ` the following Census Geography 
> Census tracts (`'tract:*'`)
3.   `'in' : ` the following state
> Texas (`'state:48'`)

You can actually see the same data by going to this weblink:

https://api.census.gov/data/2020/dec/pl?get=H1_001N,P1_001N&for=tract:*&in=state:48
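
To make the connection between the three parameters and this web link concrete, here is a minimal sketch that builds the same URL with Python's standard library, without contacting the Census API. The variable names (`base_url`, `params`, `query`, `url`) are illustrative. The `safe=',:*'` argument is used only to keep commas, colons, and asterisks readable; `requests` does its own encoding and may percent-encode some of these characters when it sends the request.

```python
from urllib.parse import urlencode

base_url = 'https://api.census.gov/data/2020/dec/pl'
params = {'get': 'H1_001N,P1_001N',
          'for': 'tract:*',
          'in': 'state:48'}

# Join the parameters as key=value pairs separated by '&'.
# safe=',:*' leaves those characters unescaped so the result is readable.
query = urlencode(params, safe=',:*')
url = base_url + '?' + query
print(url)
```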

#### 2. Clean the data (convert from json to pandas dataframe)
After the `requests.get` command gets the data from api.census.gov, the data is stored in the variable named `apijson`. The variable name indicates the data source (`api`) and the file type (`json`). [`JSON`](https://www.json.org/json-en.html) (pronounced JAY-son) is a data format, like [CSV](https://www.kaggle.com/rtatman/an-intro-to-json-vs-csv).
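
As a rough illustration of what that JSON looks like, the sketch below parses a small hand-written sample shaped like a Census API response: a list of lists in which the first inner list holds the column names. The sample text is an assumption modeled on the output shown later in this notebook, not a live response, and the names `sample_text`, `header`, and `records` are illustrative. Note that every value arrives as text.

```python
import json

# Hand-written sample in the shape of a Census API response (not live data)
sample_text = '''
[["H1_001N", "P1_001N", "state", "county", "tract"],
 ["3166", "5354", "48", "029", "181004"],
 ["2419", "4251", "48", "029", "181005"]]
'''

rows = json.loads(sample_text)
header = rows[0]     # the first row holds the column names
records = rows[1:]   # the remaining rows hold one geography each

# Pair each column name with the value in the first record
first_record = dict(zip(header, records[0]))
print(header)
print(first_record)
```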

#### 3. Explore the data (view the first 5 rows of the data)
The second-to-last line converts the `json` data into a `pandas` [dataframe](https://towardsdatascience.com/pandas-dataframe-a-lightweight-intro-680e3a212b96) named `tract2020df`: the first row of the JSON supplies the column names and the remaining rows supply the data. The last line tells Google Colab to display the first 5 rows of the dataframe.

In [2]:
apijson = requests.get('https://api.census.gov/data/2020/dec/pl',
                       params={'get': 'H1_001N,P1_001N',
                               'for': 'tract:*',
                                'in': 'state:48'})
# Convert the requested json into pandas dataframe
tract2020df = pd.DataFrame(columns=apijson.json()[0], data=apijson.json()[1:])
tract2020df.head()

Unnamed: 0,H1_001N,P1_001N,state,county,tract
0,3166,5354,48,29,181004
1,2419,4251,48,29,181005
2,2771,5818,48,29,181100
3,2197,4844,48,29,181200
4,2444,4290,48,29,181301
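
Because the API returns every value as text, the columns of a dataframe built this way hold strings rather than numbers. Before computing descriptive statistics, the count columns would need to be converted. The following is a minimal sketch that uses a small hand-made dataframe in place of `tract2020df`. The names `sampledf`, `count_columns`, and `pop_per_unit` are hypothetical. The same `pd.to_numeric` call could be applied to the real data.

```python
import pandas as pd

# Hand-made stand-in for the API result: all values are strings
sampledf = pd.DataFrame(
    columns=['H1_001N', 'P1_001N', 'state', 'county', 'tract'],
    data=[['3166', '5354', '48', '029', '181004'],
          ['2419', '4251', '48', '029', '181005']])

# Convert only the count columns; leave the geography codes as text
count_columns = ['H1_001N', 'P1_001N']
sampledf[count_columns] = sampledf[count_columns].apply(pd.to_numeric)

# Numeric columns now support arithmetic and descriptive statistics
sampledf['pop_per_unit'] = sampledf['P1_001N'] / sampledf['H1_001N']
print(sampledf.dtypes)
print(sampledf['P1_001N'].mean())
```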


The parameters for the Census API can be modified to request data for other Census Surveys, variables, geographies, and states.
For more information on Census API:
*   [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf)
*   [List of available Census APIs](https://www.census.gov/data/developers/data-sets/decennial-census.2010.html)
*   [A list of variables available just from the 2010 short form](https://api.census.gov/data/2010/dec/sf1/variables.html)
*   [A list of State FIPS codes](https://www.census.gov/library/reference/code-lists/ansi.html#state)

### Practice 1: Modify the Census API parameters for Decennial Census 2010
Modify the Census API parameters to request Total Housing Units and Total Population from the 2010 Decennial Census.


### Correct Answer
1. Look up the available API for the 2010 Decennial Census.
     - https://www.census.gov/data/developers/data-sets/decennial-census.2010.html#sf1
     - The URL location is slightly different from the 2020 Decennial Census.
     - `https://api.census.gov/data/2010/dec/sf1`
     - `https://api.census.gov/data/2020/dec/pl`
     - sf1 = summary file 1
     - pl = public law
     - for the 2020 Decennial Census, the sf1 is not available yet.
2. Look up the available variables for the 2010 Decennial Census.
     - Note that the variable names are different from those in the 2020 Decennial Census.
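
One way to keep track of these differences is a small lookup table that records the dataset path and variable names for each census year. The sketch below is an illustration only: the dictionary `DECENNIAL_SOURCES` and the helper `build_request` are hypothetical names, and the values are the ones used in this notebook. It builds the URL and parameters without sending a request.

```python
# Dataset path and variable names used in this notebook, by census year
DECENNIAL_SOURCES = {
    '2020': {'dataset': 'dec/pl', 'housing': 'H1_001N', 'population': 'P1_001N'},
    '2010': {'dataset': 'dec/sf1', 'housing': 'H001001', 'population': 'P001001'},
}


def build_request(year, state_fips='48'):
    """Return the API URL and params for tract-level housing and population."""
    source = DECENNIAL_SOURCES[year]
    url = 'https://api.census.gov/data/' + year + '/' + source['dataset']
    params = {'get': source['housing'] + ',' + source['population'],
              'for': 'tract:*',
              'in': 'state:' + state_fips}
    return url, params


url, params = build_request('2010')
print(url)
print(params)
```

The returned pair could then be passed to `requests.get(url, params=params)`, as in the other code cells of this notebook.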


In [10]:
apijson = requests.get('https://api.census.gov/data/2010/dec/sf1',
                       params={'get': 'H001001,P001001',
                               'for': 'tract:*',
                                'in': 'state:48'})
# Convert the requested json into pandas dataframe
tract2010df = pd.DataFrame(columns=apijson.json()[0], data=apijson.json()[1:])
tract2010df.head()

Unnamed: 0,H001001,P001001,state,county,tract
0,2713,7464,48,141,101
1,1198,2587,48,141,111
2,1732,4248,48,141,205
3,1126,3429,48,141,208
4,2374,6417,48,141,301


### Practice 2: Modify the Census API parameters for ACS
Modify the Census API parameters to request data from the 2018 5-year ACS for the table concept "OWN CHILDREN UNDER 18 YEARS BY FAMILY TYPE AND AGE", by county, for Texas.

### Correct Answer
1. Look up the available API for the ACS 5-year data.
     - https://www.census.gov/data/developers/data-sets/acs-5year.html
2. Look up the variables for the ACS table concept OWN CHILDREN UNDER 18 YEARS BY FAMILY TYPE AND AGE (table `B09002`).
     - https://api.census.gov/data/2018/acs/acs5/variables.html

In [5]:
apijson = requests.get('https://api.census.gov/data/2018/acs/acs5',
                       params={'get': 'group(B09002)',
                               'for': 'county:*',
                                'in': 'state:48'})
# Convert the requested json into pandas dataframe
txcountydf = pd.DataFrame(columns=apijson.json()[0], data=apijson.json()[1:])
txcountydf.head()

Unnamed: 0,B09002_001E,B09002_001EA,B09002_001M,B09002_001MA,B09002_002E,B09002_002EA,B09002_002M,B09002_002MA,B09002_003E,B09002_003EA,...,B09002_019M,B09002_019MA,B09002_020E,B09002_020EA,B09002_020M,B09002_020MA,GEO_ID,NAME,state,county
0,10780,,533,,7181,,656,,1194,,...,304,,1150,,363,,0500000US48013,"Atascosa County, Texas",48,13
1,3166,,212,,2427,,244,,373,,...,96,,174,,95,,0500000US48353,"Nolan County, Texas",48,353
2,708,,118,,469,,152,,99,,...,66,,72,,72,,0500000US48229,"Hudspeth County, Texas",48,229
3,2688,,252,,2255,,330,,348,,...,64,,117,,75,,0500000US48475,"Ward County, Texas",48,475
4,15318,,394,,10585,,727,,1583,,...,358,,1185,,294,,0500000US48203,"Harrison County, Texas",48,203
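
The `group(B09002)` request returns several columns for each line of the table. In ACS data, column names ending in `E` hold estimates and names ending in `M` hold margins of error, while the `EA` and `MA` columns hold annotations. As a small sketch, the estimate columns can be picked out with a regular expression. The list `columns` below is a hand-typed subset of the names in the output above, and `estimate_pattern` and `estimate_columns` are illustrative names.

```python
import re

# Hand-typed subset of the column names returned for group(B09002)
columns = ['B09002_001E', 'B09002_001EA', 'B09002_001M', 'B09002_001MA',
           'B09002_002E', 'B09002_002EA', 'B09002_002M', 'B09002_002MA',
           'GEO_ID', 'NAME', 'state', 'county']

# Match the table ID, an underscore, three digits, and a final 'E'
estimate_pattern = re.compile(r'B09002_\d{3}E')
estimate_columns = [c for c in columns if estimate_pattern.fullmatch(c)]
print(estimate_columns)
```

With the real dataframe, the same idea could be applied to `txcountydf.columns` to keep only the estimates along with `GEO_ID` and `NAME`.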


In [9]:
# Compare data to data obtained through Google Colab
# Locate the observation with GEO_ID 0500000US48015
txcountydf.loc[txcountydf['GEO_ID'] == '0500000US48015']

Unnamed: 0,B09002_001E,B09002_001EA,B09002_001M,B09002_001MA,B09002_002E,B09002_002EA,B09002_002M,B09002_002MA,B09002_003E,B09002_003EA,...,B09002_019M,B09002_019MA,B09002_020E,B09002_020EA,B09002_020M,B09002_020MA,GEO_ID,NAME,state,county
58,6529,,188,,4915,,378,,618,,...,137,,499,,233,,0500000US48015,"Austin County, Texas",48,15
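
The `GEO_ID` value used in this lookup packs several codes into one string. For a county, the text after `US` is the 2-digit state FIPS code followed by the 3-digit county FIPS code. The sketch below splits a county `GEO_ID` on that basis. The function name `split_county_geo_id` is hypothetical, and the approach assumes county-level identifiers like the one above.

```python
def split_county_geo_id(geo_id):
    """Split a county GEO_ID such as '0500000US48015' into its parts."""
    # Everything before 'US' is a prefix; everything after is the FIPS code
    prefix, _, fips = geo_id.partition('US')
    if len(fips) != 5:
        raise ValueError('Expected a 5-digit county FIPS code: ' + geo_id)
    return {'prefix': prefix, 'state': fips[:2], 'county': fips[2:]}


parts = split_county_geo_id('0500000US48015')
print(parts)
```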


## Extended example: Turn code into a function
The following code block is an example of how to turn the code into a function. This function can be used to request data from the Census API for any Census Survey, variables, geography, and state.

In [13]:
def obtain_census_api(
                    state: str = "*",
                    county: str = "*",
                    census_geography: str = 'county:*',
                    vintage: str = "2010", 
                    dataset_name: str = 'dec/sf1',
                    get_vars: str = 'GEO_ID'):

    """General utility for obtaining data from the Census API.
    Args:
        state (str): 2-digit FIPS code. Default * for all states
        county (str): 3-digit FIPS code. Default * for all counties
        census_geography (str): geography for the 'for' clause;
        example 'block:*' would be for all blocks.
        Default is for all counties
        vintage (str): Census Year. Default 2010
        dataset_name (str): Census dataset name. Default Decennial SF1;
        see the list of available Census APIs linked above
        get_vars (str): comma-separated list of variables to get from the API.
    Returns:
        pandas.DataFrame: A dataframe with Census data
    """
    # Check geography hierarchy
    if (
        census_geography == 'county:*' or 
        census_geography == 'tract:*' or 
        census_geography == 'block:*'
        ):
        geography_hierarchy =  '&in=state:' + state + '&in=county:' + county 
    else:
        geography_hierarchy = ''
    # Set up hyperlink for Census API
    api_hyperlink = ('https://api.census.gov/data/' + vintage + '/'+dataset_name + '?get=' + get_vars +
                    geography_hierarchy + '&for=' + census_geography)

    print("Census API data from: " + api_hyperlink)

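    # Obtain Census API JSON Data
    apijson = requests.get(api_hyperlink)
    apijson.raise_for_status()  # Stop early if the request failed

    # Convert the requested json into pandas dataframe
    rows = apijson.json()  # First row is the header, the rest are data
    df = pd.DataFrame(columns=rows[0], data=rows[1:])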

    return df
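
To check what URL the function will request before contacting the API, the string-building step can be tried on its own. The sketch below repeats that logic in a separate helper, hypothetically named `build_census_url`, which returns the URL instead of downloading data. It is meant only for inspecting URLs and mirrors the concatenation in `obtain_census_api`.

```python
def build_census_url(state='*', county='*', census_geography='county:*',
                     vintage='2010', dataset_name='dec/sf1', get_vars='GEO_ID'):
    """Return the URL that the string-building logic above would produce."""
    # County, tract, and block requests are nested within a state and county
    if census_geography in ('county:*', 'tract:*', 'block:*'):
        geography_hierarchy = '&in=state:' + state + '&in=county:' + county
    else:
        geography_hierarchy = ''
    return ('https://api.census.gov/data/' + vintage + '/' + dataset_name +
            '?get=' + get_vars + geography_hierarchy +
            '&for=' + census_geography)


# Texas counties from the 2018 ACS 5-year data
print(build_census_url(state='48', vintage='2018',
                       dataset_name='acs/acs5', get_vars='group(B09002)'))

# A geography outside the county/tract/block list gets no 'in' clause
print(build_census_url(census_geography='state:*'))
```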


In [15]:
txcountydfv2 = obtain_census_api(state = "48",
                county = "*",
                census_geography = 'county:*',
                vintage = "2018", 
                dataset_name = 'acs/acs5',
                get_vars = 'group(B09002)')
txcountydfv2.head()

Census API data from: https://api.census.gov/data/2018/acs/acs5?get=group(B09002)&in=state:48&in=county:*&for=county:*


Unnamed: 0,B09002_001E,B09002_001EA,B09002_001M,B09002_001MA,B09002_002E,B09002_002EA,B09002_002M,B09002_002MA,B09002_003E,B09002_003EA,...,B09002_019M,B09002_019MA,B09002_020E,B09002_020EA,B09002_020M,B09002_020MA,GEO_ID,NAME,state,county
0,10780,,533,,7181,,656,,1194,,...,304,,1150,,363,,0500000US48013,"Atascosa County, Texas",48,13
1,3166,,212,,2427,,244,,373,,...,96,,174,,95,,0500000US48353,"Nolan County, Texas",48,353
2,708,,118,,469,,152,,99,,...,66,,72,,72,,0500000US48229,"Hudspeth County, Texas",48,229
3,2688,,252,,2255,,330,,348,,...,64,,117,,75,,0500000US48475,"Ward County, Texas",48,475
4,15318,,394,,10585,,727,,1583,,...,358,,1185,,294,,0500000US48203,"Harrison County, Texas",48,203
