# U.S. Census Data Tutorial

In [None]:
#!pip install censusdata

If the censusdata package is not installed in your environment, uncomment the line above to install it with pip.

Reference: [CensusData library documentation](https://jtleider.github.io/censusdata/index.html)

In [None]:
import pandas as pd
import re
import numpy as np
import censusdata

### Main Methods
[CensusData API Documentation](https://jtleider.github.io/censusdata/api.html)

In [None]:
# Search for ACS 2015-2019 5-year estimate variables where the concept 
# includes the text 'population'.
sample = censusdata.search('acs5', 2019, 'concept', 
                           lambda value: re.search('population', value, re.IGNORECASE))

**Parameters:**	
* src (str) – Census data source: ```'acs1'``` for **ACS 1-year estimates**, ```'acs5'``` for **ACS 5-year estimates**, ```'acs3'``` for **ACS 3-year estimates**, ```'acsse'``` for **ACS 1-year supplemental estimates**, ```'sf1'``` for **SF1 data**.
* year (int) – Year of data.
* field (str) – Field in which to search.
* criterion (str or function) – Search criterion: either a string to search for, or a function that is passed the value of the field and returns True for a match and False otherwise.
* tabletype (str, optional) – Type of table from which variables are drawn (only applicable to ACS data). Options are ```'detail'``` (detail tables), ```'subject'``` (subject tables), ```'profile'``` (data profile tables), ```'cprofile'``` (comparison profile tables).

**Returns:**	
List of 3-tuples containing variable names, concepts, and labels matching the search criterion.

**Return type:**	
list
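
Because ```criterion``` can be any function that takes the field value and returns True or False, you can package a reusable matcher. The sketch below is one option, assuming you want case-insensitive **whole-word** matches; unlike the ```re.search``` lambda above, it would not match 'populations'. The helper name ```make_criterion``` is hypothetical and not part of ```censusdata```.

```python
import re

def make_criterion(pattern):
    """Build a function usable as the `criterion` argument of censusdata.search.

    Hypothetical helper: the returned function receives the value of the
    searched field and returns True when `pattern` appears as a whole word,
    ignoring case.
    """
    regex = re.compile(r'\b' + re.escape(pattern) + r'\b', re.IGNORECASE)

    def criterion(value):
        # re.search returns a Match object or None; convert to a strict bool
        return regex.search(value) is not None

    return criterion

# Usage (needs network access to the Census API):
# sample = censusdata.search('acs5', 2019, 'concept', make_criterion('population'))
```

Drop the ```\b``` anchors if you prefer substring matching like the original lambda.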

In [None]:
print(len(sample))

In [None]:
sample

This is the number of variables the search returned. In this case, 10765 variables in the ACS 2015-2019 5-year estimates have a concept that includes the text 'population'.
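
With thousands of matches, the plain list of 3-tuples is awkward to browse. A minimal sketch that loads it into a pandas DataFrame so it can be filtered and sorted; the column names ```variable```, ```concept``` and ```label``` are chosen here for illustration and are not defined by ```censusdata```.

```python
import pandas as pd

def search_results_to_frame(results):
    """Convert censusdata.search output (a list of 3-tuples) into a DataFrame."""
    # Each tuple holds (variable name, concept, label), as described under "Returns"
    return pd.DataFrame(results, columns=['variable', 'concept', 'label'])

# Usage:
# results_df = search_results_to_frame(sample)
# results_df[results_df['variable'].str.startswith('B01003')]
```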

In [None]:
print(sample[0])

Let's use the first result as an example. Based on the output above, the first match is the variable 'B01003_001E', which is the total population estimate in the parent table B01003.
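
ACS detail-table variable names generally follow the pattern ```<table ID>_<line number><suffix>```, where the suffix ```E``` marks an estimate and ```M``` its margin of error. A small parser sketch, assuming that pattern; ```parse_detail_variable``` is a hypothetical helper, and it deliberately returns None for subject (```S...```) and profile (```DP...```) tables, which use other layouts.

```python
import re

# Detail-table pattern, e.g. 'B01003_001E' or 'B19001A_002M' (assumed convention)
_DETAIL_VAR = re.compile(r'^([A-Z]+\d+[A-Z]*)_(\d{3})([A-Z]+)$')

def parse_detail_variable(name):
    """Split an ACS detail-table variable name into (table, line, suffix).

    Returns None when `name` does not match the detail-table pattern.
    """
    match = _DETAIL_VAR.match(name)
    if match is None:
        return None
    return match.groups()

# Usage:
# parse_detail_variable('B01003_001E')  # ('B01003', '001', 'E')
```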

Once you know the parent table you're interested in, you can use the ```printtable``` function to get a clean readout of all of its variables and check whether any others are of interest.

In [None]:
censusdata.printtable(censusdata.censustable('acs5', 2019, 'B01003'))

### Data download

If you want to download data for a particular state, county, etc., start at **Step 1**; otherwise, start at **Step 3**.

**Step 1** To download data for particular states, you first need their geography codes. The ```geographies``` function is built for that:

In [None]:
states = censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', 2019)
print(states['Michigan'])

In [None]:
states

**Step 2** If you want county-level data, do almost the same thing, but add a county component after the state. For example:

In [None]:
counties = censusdata.geographies(censusdata.censusgeo([('state', '26'), ('county', '*')]), 'acs5', 2019)
print(counties['Wayne County, Michigan'])

In [None]:
counties

**Step 3** Now it's time to download the data you want. The example below uses Wayne County, Michigan (state code 26, county code 163) and requests every block group in it. If you don't have a state or county code, leave it as ```'*'```.
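
The geography passed to ```censusgeo``` is a list of (level, code) pairs. If you download several counties, a small builder keeps the list consistent. A minimal sketch, assuming block groups are the target level and that ```'*'``` is the default for an unknown county code; ```geo_components``` is a hypothetical helper.

```python
def geo_components(state, county='*', level='block group'):
    """Build the (level, code) list passed to censusdata.censusgeo.

    Hypothetical helper: `county` defaults to '*' (all counties), and every
    unit at `level` is requested.
    """
    return [('state', state), ('county', county), (level, '*')]

# Usage (needs network access to the Census API):
# geo = censusdata.censusgeo(geo_components('26', '163'))
# data = censusdata.download('acs5', 2019, geo, ['B01003_001E'])
```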

In [None]:
data = censusdata.download('acs5', 2019, censusdata.censusgeo([('state', '26'),
                                                               ('county', '163'),
                                                               ('block group', '*')]),
                          ['B01003_001E'])

In [None]:
data

The number of rows (one per block group) in the downloaded data:

In [None]:
len(data)

### Extra (data formatting, slicing)

This part covers some optional extra steps: renaming the column with pandas, and slicing the data by Census Tract using ```census_cut``` from the local ```Help_Functions``` module.
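
Assigning a list to ```data.columns``` relies on column order, which becomes fragile once you download more than one variable. A sketch of the name-based alternative using ```DataFrame.rename```; the ```COLUMN_LABELS``` mapping and the ```label_columns``` helper are illustrative, not part of the tutorial's code.

```python
import pandas as pd

# Mapping from Census variable name to a readable column label
COLUMN_LABELS = {'B01003_001E': 'TOTAL POPULATION'}

def label_columns(df):
    """Return a copy of `df` with known Census variables renamed."""
    # rename() matches columns by name, so column order does not matter
    # and columns missing from the mapping keep their original names
    return df.rename(columns=COLUMN_LABELS)

# Usage:
# data = label_columns(data)
```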

In [None]:
column_name = ['TOTAL POPULATION']
data.columns = column_name

In [None]:
# Replace the index with a plain list of its entries
# (one censusgeo object per block group)
data.index = data.index.tolist()

In [None]:
data.head()

Sum the block-group populations to get the total population of Wayne County, Michigan:

In [None]:
data['TOTAL POPULATION'].sum()
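
To see how that total splits across tracts, you can group block groups by the tract in their geography name. A sketch, assuming each row's geography name looks like ```'Block Group 1, Census Tract 5303, Wayne County, Michigan'``` and that ```censusgeo``` objects expose it as ```.name``` (plain strings are also accepted); ```tract_of``` and ```population_by_tract``` are hypothetical helpers.

```python
import pandas as pd

def tract_of(geo):
    """Return the 'Census Tract ...' component of a geography name, or None."""
    # Use .name when present (assumed for censusgeo), else treat geo as the name
    name = str(getattr(geo, 'name', geo))
    for part in name.split(', '):
        if part.startswith('Census Tract '):
            return part
    return None

def population_by_tract(df, column='TOTAL POPULATION'):
    """Sum a block-group column up to census tracts."""
    tracts = [tract_of(geo) for geo in df.index]
    # groupby drops rows whose key is None (no tract component found)
    return df[column].groupby(tracts).sum()

# Usage:
# population_by_tract(data).sort_values(ascending=False).head()
```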

### ```census_cut``` usage

In [None]:
from Help_Functions import census_cut

For example, suppose we want the data for the areas in Census Tracts 5303, 5304, 5316, and 5317:

In [None]:
Tracts = ['Census Tract 5303', 'Census Tract 5304','Census Tract 5316', 'Census Tract 5317']

In [None]:
df = census_cut(Tracts, data)
df
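
The source of ```census_cut``` is not shown in this tutorial. If ```Help_Functions``` is unavailable, the following hypothetical stand-in keeps rows whose geography name contains one of the listed tracts. It assumes names like ```'Block Group 1, Census Tract 5303, Wayne County, Michigan'``` and matches whole comma-separated components, so 'Census Tract 5303' does not match 'Census Tract 5303.01'. It is not necessarily how ```census_cut``` works.

```python
import pandas as pd

def filter_tracts(df, tracts):
    """Keep rows whose geography name contains one of `tracts` (hypothetical)."""
    wanted = set(tracts)

    def in_wanted(geo):
        # Use .name when present (assumed for censusgeo), else treat geo as the name
        name = str(getattr(geo, 'name', geo))
        return any(part in wanted for part in name.split(', '))

    mask = [in_wanted(geo) for geo in df.index]
    return df.loc[mask]

# Usage:
# df = filter_tracts(data, Tracts)
```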