# PSY 525.001 Spring 2020

## Purpose

This notebook documents Rick Gilmore's explorations of U.S. Census data using a Jupyter notebook. I'm relying on the example notebook hosted here: https://github.com/jtleider/censusdata/blob/master/docs/notebooks/example1.ipynb for significant support and inspiration.

## Preliminaries 

### Install the `censusdata` package

In [5]:
import sys

Now that we have imported the `sys` package, we can use `sys.executable` to install `censusdata` into the same Python interpreter that our notebook is running.

In [6]:
!$sys.executable -m pip install --user censusdata --upgrade

Collecting censusdata
  Using cached https://files.pythonhosted.org/packages/2e/80/09af724ad019b202602cbc47a74737b9609971e3db69e163213732f2f724/CensusData-1.7.tar.gz
Collecting requests (from censusdata)
  Using cached https://files.pythonhosted.org/packages/1a/70/1935c770cb3be6e3a8b78ced23d7e0f3b187f5cbfab4749523ed65d7c9b1/requests-2.23.0-py2.py3-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests->censusdata)
  Using cached https://files.pythonhosted.org/packages/e8/74/6e4f91745020f967d09332bb2b8b9b10090957334692eb88ea4afe91b77f/urllib3-1.25.8-py2.py3-none-any.whl
Collecting idna<3,>=2.5 (from requests->censusdata)
  Using cached https://files.pythonhosted.org/packages/89/e3/afebe61c546d18fb1709a61bee788254b40e736cff7271c7de5de2dc4128/idna-2.9-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->censusdata)
  Using cached https://files.pythonhosted.org/packages/b9/63/df50cac98ea0d5b006c55a399c3bf1db9da7b5a24de7890bc9cfd5dd9e99/certifi-2019.11.

**Note:** This did not work for me, probably because I am using `pyenv` and a virtual environment. When I installed the `censusdata` package from the command line using `pip install censusdata`, and restarted the notebook, things worked.
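
A quick way to diagnose this kind of mismatch is to check which interpreter the notebook kernel is running and whether that interpreter can see the package. The sketch below uses only the standard library; `package_available` is a hypothetical helper name, not part of `censusdata`.

```python
import importlib.util
import sys


def package_available(name):
    """Return True if the current interpreter can import the top-level package `name`."""
    return importlib.util.find_spec(name) is not None


# The path printed here should match the environment where pip installed the package.
print("Interpreter:", sys.executable)
print("censusdata importable:", package_available("censusdata"))
```

If the second line prints `False`, the package was probably installed into a different environment than the one the kernel uses.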

In [1]:
import censusdata

### Import packages and set options

In [2]:
import pandas as pd
import censusdata
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)

## View the data

In [3]:
censusdata.search('acs5', 2015, 'label', 'unemploy')[160:170]

[('B23024_023E',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_023M',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Margin of Error for!!Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_030E',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Income in the past 12 months at or above poverty level:!!No disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_030M',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Margin of Error for!!Income in the pas
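
Each search result is a tuple of variable name, table concept, and label. Names ending in `E` are estimates and names ending in `M` are the matching margins of error, and the label levels are separated by `!!`. As a rough sketch of how to tidy such a list, the helpers below (`estimates_only` and `label_levels`, both hypothetical names) work on a small sample copied from the output above, so no network access is needed.

```python
# Two rows copied from the search output above.
sample = [
    ('B23024_023E',
     'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
     'Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
    ('B23024_023M',
     'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
     'Margin of Error for!!Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
]


def estimates_only(results):
    """Keep only the rows whose variable name ends in 'E' (estimates)."""
    return [row for row in results if row[0].endswith('E')]


def label_levels(label):
    """Split a label on '!!' and drop the trailing colon from each level."""
    return [part.rstrip(':') for part in label.split('!!')]


for name, concept, label in estimates_only(sample):
    print(name, label_levels(label))
```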

Let's try to focus on Centre County, Pennsylvania. First, we need to find the FIPS codes.

In [4]:
censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', 2015)

{'Alabama': censusgeo((('state', '01'),)),
 'Alaska': censusgeo((('state', '02'),)),
 'Arizona': censusgeo((('state', '04'),)),
 'Arkansas': censusgeo((('state', '05'),)),
 'California': censusgeo((('state', '06'),)),
 'Colorado': censusgeo((('state', '08'),)),
 'Connecticut': censusgeo((('state', '09'),)),
 'Delaware': censusgeo((('state', '10'),)),
 'District of Columbia': censusgeo((('state', '11'),)),
 'Florida': censusgeo((('state', '12'),)),
 'Georgia': censusgeo((('state', '13'),)),
 'Hawaii': censusgeo((('state', '15'),)),
 'Idaho': censusgeo((('state', '16'),)),
 'Illinois': censusgeo((('state', '17'),)),
 'Indiana': censusgeo((('state', '18'),)),
 'Iowa': censusgeo((('state', '19'),)),
 'Kansas': censusgeo((('state', '20'),)),
 'Kentucky': censusgeo((('state', '21'),)),
 'Louisiana': censusgeo((('state', '22'),)),
 'Maine': censusgeo((('state', '23'),)),
 'Maryland': censusgeo((('state', '24'),)),
 'Massachusetts': censusgeo((('state', '25'),)),
 'Michigan': censusgeo((('stat

So, Pennsylvania is state 42. Next, we'll list the counties in Pennsylvania.

In [5]:
censusdata.geographies(censusdata.censusgeo([('state', '42'), ('county', '*')]), 'acs5', 2015)

{'Adams County, Pennsylvania': censusgeo((('state', '42'), ('county', '001'))),
 'Allegheny County, Pennsylvania': censusgeo((('state', '42'), ('county', '003'))),
 'Armstrong County, Pennsylvania': censusgeo((('state', '42'), ('county', '005'))),
 'Beaver County, Pennsylvania': censusgeo((('state', '42'), ('county', '007'))),
 'Bedford County, Pennsylvania': censusgeo((('state', '42'), ('county', '009'))),
 'Berks County, Pennsylvania': censusgeo((('state', '42'), ('county', '011'))),
 'Blair County, Pennsylvania': censusgeo((('state', '42'), ('county', '013'))),
 'Bradford County, Pennsylvania': censusgeo((('state', '42'), ('county', '015'))),
 'Bucks County, Pennsylvania': censusgeo((('state', '42'), ('county', '017'))),
 'Butler County, Pennsylvania': censusgeo((('state', '42'), ('county', '019'))),
 'Cambria County, Pennsylvania': censusgeo((('state', '42'), ('county', '021'))),
 'Cameron County, Pennsylvania': censusgeo((('state', '42'), ('county', '023'))),
 'Carbon County, Penn
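
The listing is long, so it can be easier to filter it by name than to scan it by eye. Here is a minimal sketch, assuming only that the listing is a dictionary keyed by place name, as shown above. `find_geographies` is a hypothetical helper, and the sample values are plain strings standing in for the `censusgeo` objects.

```python
def find_geographies(geos, needle):
    """Return the entries of `geos` whose name contains `needle`, ignoring case."""
    needle = needle.lower()
    return {name: geo for name, geo in geos.items() if needle in name.lower()}


# Stand-in for the dictionary returned by censusdata.geographies().
sample = {
    'Cameron County, Pennsylvania': '023',
    'Carbon County, Pennsylvania': '025',
    'Centre County, Pennsylvania': '027',
}

print(find_geographies(sample, 'centre'))
```

The same function should work unchanged on the real dictionary, since it only inspects the keys.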

So, we're county 027 (Centre County). The leading zero matters: FIPS codes are fixed-width strings, not integers. Now, we'll download a set of tables for all of the block groups in state 42, county 027.

In [7]:
centreco_pa = censusdata.download('acs5', 2015,
                             censusdata.censusgeo([('state', '42'), ('county', '027'), ('block group', '*')]),
                             ['B23025_003E', 'B23025_005E', 'B15003_001E', 'B15003_002E', 'B15003_003E',
                              'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E',
                              'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E',
                              'B15003_014E', 'B15003_015E', 'B15003_016E'])

# Calculate % unemployed
centreco_pa['percent_unemployed'] = centreco_pa.B23025_005E / centreco_pa.B23025_003E * 100

# Calculate % no HS education
centreco_pa['percent_nohs'] = (centreco_pa.B15003_002E + centreco_pa.B15003_003E + centreco_pa.B15003_004E
                          + centreco_pa.B15003_005E + centreco_pa.B15003_006E + centreco_pa.B15003_007E + centreco_pa.B15003_008E
                          + centreco_pa.B15003_009E + centreco_pa.B15003_010E + centreco_pa.B15003_011E + centreco_pa.B15003_012E
                          + centreco_pa.B15003_013E + centreco_pa.B15003_014E +
                          centreco_pa.B15003_015E + centreco_pa.B15003_016E) / centreco_pa.B15003_001E * 100

centreco_pa_ue_nohs = centreco_pa[['percent_unemployed', 'percent_nohs']]
centreco_pa_ue_nohs.describe()

| | percent_unemployed | percent_nohs |
|---|---|---|
| count | 99.0 | 96.0 |
| mean | 5.65 | 6.71 |
| std | 5.37 | 6.46 |
| min | 0.0 | 0.0 |
| 25% | 2.13 | 1.09 |
| 50% | 4.23 | 5.35 |
| 75% | 7.25 | 10.65 |
| max | 26.27 | 30.11 |


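So, there are 99 block groups with a defined unemployment percentage and 96 with a defined no-HS percentage. The counts differ because `describe()` counts only non-missing values; a block group whose denominator is zero presumably produces `NaN` and is left out.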
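
To illustrate both the percentage calculation and how missing values affect `count`, here is a small sketch on made-up numbers, not the downloaded data. The `toy` table and its values are invented for illustration. It also builds the column names in a loop instead of typing each one; for the real table the range would be `range(2, 17)`.

```python
import pandas as pd

# Toy stand-in for the downloaded table: three block groups, the last of
# which has nobody in the universe (a denominator of zero).
toy = pd.DataFrame({
    'B15003_001E': [200, 50, 0],  # denominator: total in the education table
    'B15003_002E': [10, 0, 0],
    'B15003_003E': [10, 5, 0],
})

# Column names to add up; only two columns exist in this toy table.
nohs_cols = ['B15003_{:03d}E'.format(i) for i in range(2, 4)]

# skipna=False mirrors chained "+": a missing input gives a missing sum.
nohs_total = toy[nohs_cols].sum(axis=1, skipna=False)
toy['percent_nohs'] = nohs_total / toy['B15003_001E'] * 100

print(toy['percent_nohs'])
# describe() reports a count of 2 here, because 0 / 0 is NaN and is skipped.
print(toy['percent_nohs'].describe()['count'])
```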