# Census Data Tools

MORPC works regularly with census data, including but not limited to ACS 5-year and 1-year estimates, the Decennial Census, PEP, and geographies. The following module is useful for gathering and organizing census data for processes in various workflows. Those workflows are linked where appropriate.

In [1]:
import morpc

## API functions and variables

`api_get()` is a low-level wrapper for Census API requests that returns the results as a pandas dataframe. If necessary, it splits the request into several smaller requests to stay within the 50-variable limit imposed by the API.

The resulting dataframe is indexed by `GEO_ID` (regardless of whether it was requested) and omits fields that were not requested but are returned automatically with each API request (e.g. "state", "county").
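The splitting step can be pictured with a small sketch. This is not the `morpc` implementation, only an illustration under stated assumptions: the helper name `chunk_variables` and the `VAR_###` variable names are hypothetical, and the join key is assumed to be repeated in every partial request so the pieces can be recombined afterwards.

```python
def chunk_variables(variables, limit=50, key="GEO_ID"):
    """Split a variable list into chunks that each fit within `limit`.

    `key` is added to every chunk so the partial results can be joined again.
    (Hypothetical helper, not part of morpc.)
    """
    others = [v for v in variables if v != key]
    size = limit - 1  # reserve one slot per request for the join key
    return [[key] + others[i:i + size] for i in range(0, len(others), size)]


# 120 made-up variable names, for illustration only
requested = [f"VAR_{n:03d}" for n in range(1, 121)]
chunks = chunk_variables(requested)
print([len(c) for c in chunks])  # [50, 50, 23]
```

Each chunk would then be sent as its own request and the results joined on the key column.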

In [2]:
url = 'https://api.census.gov/data/2022/acs/acs1'
params = {
    "get": "GEO_ID,NAME,B01001_001E",
    "for": "county:049,041",
    "in": "state:39"
}

In [3]:
api = morpc.census.api_get(url, params)

Total variables requested: 3
Starting request #1. 3 variables remain.


In [4]:
api

| GEO_ID | NAME | B01001_001E |
|---|---|---|
| 0500000US39041 | Delaware County, Ohio | 226296 |
| 0500000US39049 | Franklin County, Ohio | 1321820 |
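For orientation, the Census API returns JSON as a list of rows whose first row is the header, with every value encoded as a string. The sketch below shows one way such a payload could be keyed by `GEO_ID` while dropping the unrequested fields. The payload literal is a hand-written sample that reuses the values from the output above, and `rows_to_records` is a hypothetical helper, not a `morpc` function.

```python
import json

# Hand-written sample shaped like a Census API response (values from the output above)
raw = '''[["GEO_ID","NAME","B01001_001E","state","county"],
 ["0500000US39041","Delaware County, Ohio","226296","39","041"],
 ["0500000US39049","Franklin County, Ohio","1321820","39","049"]]'''


def rows_to_records(payload, requested, index="GEO_ID"):
    """Key each row by `index` and keep only the requested fields."""
    header, *rows = json.loads(payload)
    records = {}
    for row in rows:
        item = dict(zip(header, row))
        records[item[index]] = {k: item[k] for k in requested if k != index}
    return records


records = rows_to_records(raw, ["GEO_ID", "NAME", "B01001_001E"])
print(records["0500000US39049"])
# {'NAME': 'Franklin County, Ohio', 'B01001_001E': '1321820'}
```

Note that the values are still strings at this point; converting them to numbers is a separate step.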


## American Community Survey (ACS) Data Class

When using ACS data, we will generally be working with data produced by the [morpc-censusacs-fetch](https://github.com/morpc/morpc-censusacs-fetch) workflow. By default, that script saves its output in `./morpc-censusacs-fetch/output_data/`.

The Census ACS Fetch script leverages the `acs_data` class from `morpc.census`.


### Create an initial object that represents a variable group in the ACS data API

The class takes 3 arguments:

1. variable group number
2. the year
3. the type of survey (1 or 5 year estimates)

In [5]:
acs = morpc.census.acs_data('B11001', '2023', '5')

The initial call queries the Census API for the variable definitions and returns a dictionary of the available variables in the group (see `acs.VARS`).

In [6]:
acs.DIMENSIONS

['TOTAL', 'Household Type', 'Living Alone', 'Spouse Present']

In [7]:
acs.VARS

{'B11001_001E': {'label': 'Estimate!!Total:',
  'concept': 'Household Type (Including Living Alone)',
  'predicateType': 'int',
  'group': 'B11001',
  'limit': 0,
  'attributes': 'B11001_001EA,B11001_001M,B11001_001MA'},
 'B11001_002E': {'label': 'Estimate!!Total:!!Family households:',
  'concept': 'Household Type (Including Living Alone)',
  'predicateType': 'int',
  'group': 'B11001',
  'limit': 0,
  'attributes': 'B11001_002EA,B11001_002M,B11001_002MA'},
 'B11001_003E': {'label': 'Estimate!!Total:!!Family households:!!Married-couple family',
  'concept': 'Household Type (Including Living Alone)',
  'predicateType': 'int',
  'group': 'B11001',
  'limit': 0,
  'attributes': 'B11001_003EA,B11001_003M,B11001_003MA'},
 'B11001_004E': {'label': 'Estimate!!Total:!!Family households:!!Other family:',
  'concept': 'Household Type (Including Living Alone)',
  'predicateType': 'int',
  'group': 'B11001',
  'limit': 0,
  'attributes': 'B11001_004EA,B11001_004M,B11001_004MA'},
 'B11001_005E': {'

### Query the API for the desired variables and geography

The `.query()` method queries the API and caches the data in memory under `acs.DATA`. At the same time, it creates a frictionless schema that corresponds to the data.

#### scope:
These are pre-defined sumlevels and scopes for commonly queried geographies. See `morpc.census.SCOPES`.

In [8]:
morpc.census.SCOPES

{'us-states': {'desc': 'all states in the United States',
  'for': 'state:*',
  'in': 'us:*'},
 'ohio': {'desc': 'the State of Ohio', 'for': 'state:39'},
 'ohio-counties': {'desc': 'all counties in the State of Ohio',
  'for': 'county:*',
  'in': 'state:39'},
 'ohio-tracts': {'desc': 'all Census tracts in the State of Ohio',
  'for': 'tract:*',
  'in': 'state:39'},
 'region15-counties': {'desc': 'all counties in the MORPC 15-county region',
  'for': 'county:041,045,049,089,097,129,159,083,101,117,047,073,091,127,141',
  'in': 'state:39'},
 'region15-tracts': {'desc': 'all Census tracts in the MORPC 10-county region',
  'for': 'tract:*',
  'in': ['state:39',
   'county:041,045,049,089,097,129,159,083,101,117,047,073,091,127,141']},
 'regionmpo-parts': {'desc': 'all Census township parts and place parts that are MORPC MPO members',
  'ucgid': '1550000US3902582041,0700000US390410577499999,0700000US390410578899999,0700000US390410942899999,1550000US3918000041,0700000US390411814099999,155000

In [9]:
morpc.census.ACS_VAR_GROUPS['B11001']['dimensions']

['TOTAL', 'Household Type', 'Living Alone', 'Spouse Present']

In [10]:
acs = acs.query(scope='region15-counties')

morpc-acs5-2023-region15-counties-b11001 schema is valid
Total variables requested: 19
Starting request #1. 19 variables remain.


In [11]:
data = acs.DATA
data.head()

| GEO_ID | B11001_001E | B11001_001M | B11001_002E | B11001_002M | B11001_003E | B11001_003M | B11001_004E | B11001_004M | B11001_005E | B11001_005M | B11001_006E | B11001_006M | B11001_007E | B11001_007M | B11001_008E | B11001_008M | B11001_009E | B11001_009M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0500000US39041 | 80455 | 530 | 61028 | 838 | 51755 | 1091 | 9273 | 752 | 3127 | 446 | 6146 | 586 | 19427 | 950 | 16217 | 917 | 3210 | 400 |
| 0500000US39045 | 59979 | 510 | 43253 | 884 | 33336 | 923 | 9917 | 792 | 3221 | 424 | 6696 | 662 | 16726 | 795 | 13678 | 805 | 3048 | 454 |
| 0500000US39047 | 11635 | 239 | 7341 | 347 | 5388 | 325 | 1953 | 260 | 693 | 161 | 1260 | 206 | 4294 | 372 | 3425 | 327 | 869 | 206 |
| 0500000US39049 | 547922 | 1767 | 312682 | 3140 | 210814 | 2936 | 101868 | 2740 | 27792 | 1453 | 74076 | 2069 | 235240 | 3220 | 181656 | 3175 | 53584 | 1904 |
| 0500000US39073 | 11565 | 286 | 8201 | 369 | 5923 | 417 | 2278 | 343 | 952 | 279 | 1326 | 257 | 3364 | 345 | 2816 | 329 | 548 | 120 |
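The column names in this output pair each estimate (suffix `E`) with its margin of error (suffix `M`). A minimal sketch of how such names could be taken apart is shown below. The regular expression and the helper `describe_variable` are assumptions made for illustration, and they only cover detailed-table names of the form seen here (group ID, underscore, three digits, `E` or `M`).

```python
import re

# Assumed pattern for detailed-table columns such as B11001_001E or B11001_009M
VAR_PATTERN = re.compile(r"^(?P<group>[A-Z]\d+[A-Z]*)_(?P<number>\d{3})(?P<kind>E|M)$")


def describe_variable(name):
    """Return (group, sequence number, variable type) for a column name."""
    match = VAR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an ACS estimate or MOE column: {name}")
    kind = "Estimate" if match["kind"] == "E" else "MOE"
    return match["group"], int(match["number"]), kind


print(describe_variable("B11001_001E"))  # ('B11001', 1, 'Estimate')
print(describe_variable("B11001_009M"))  # ('B11001', 9, 'MOE')
```

Annotation columns such as `B11001_001EA` deliberately fail the pattern, so they would need separate handling if requested.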


### For custom queries, pass `for` and `in` parameters to the API query

#### for_param:
(optional) The geographies to query. For example, "state:*" represents all states and "state:39" represents Ohio.

#### in_param:
(optional) A filter for the `for` parameter. In combination, the two allow you to request small geographies inside larger ones.

> Examples: for_param="county:\*", in_param="state:39" would get all counties in Ohio.
> for_param="tract:\*", in_param='state:39,county:041,049' gets all census tracts in Delaware and Franklin Counties.

### Filter the variables using the get parameter

#### get_param:
(Optional) If you want to return a subset of variables, they can be passed here as a list.
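The three parameters correspond to the `get`, `for`, and `in` keys of a Census API query string. The sketch below shows how they might be combined using only the standard library. `build_query` is a hypothetical helper, and how the `acs_data` class assembles its requests internally is not shown in this document.

```python
from urllib.parse import urlencode


def build_query(get_param, for_param, in_param=None):
    """Assemble a Census-style query string from get/for/in values."""
    params = {"get": ",".join(get_param), "for": for_param}
    if in_param is not None:
        params["in"] = in_param
    # Keep ':', '*' and ',' readable instead of percent-encoding them
    return urlencode(params, safe=":*,")


print(build_query(["GEO_ID", "B11001_001E"], "county:*", "state:39"))
# get=GEO_ID,B11001_001E&for=county:*&in=state:39
```

Leaving `in_param` out produces a query with only `get` and `for`, which matches the shape of the `ohio` scope shown earlier.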

### Dimension Tables

When the query is called, the class builds a table that includes the dimensions. This table can be used to produce quick summaries of the data.
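One way to picture the dimension columns: each variable label is a `!!`-separated path, and each level of the path fills one dimension. The sketch below is an illustration only, not the class's actual code; `label_to_dimensions` is a hypothetical helper, and the label and dimension names are taken from `acs.VARS` and `acs.DIMENSIONS` as shown earlier.

```python
DIMENSIONS = ["TOTAL", "Household Type", "Living Alone", "Spouse Present"]


def label_to_dimensions(label, dimensions):
    """Map a '!!'-separated variable label onto named dimensions."""
    # Drop the leading "Estimate" segment, then strip the trailing colons
    levels = [part.rstrip(":") for part in label.split("!!")[1:]]
    # Pad with empty strings so every dimension gets a value
    levels += [""] * (len(dimensions) - len(levels))
    return dict(zip(dimensions, levels))


row = label_to_dimensions(
    "Estimate!!Total:!!Family households:!!Married-couple family", DIMENSIONS
)
print(row)
# {'TOTAL': 'Total', 'Household Type': 'Family households',
#  'Living Alone': 'Married-couple family', 'Spouse Present': ''}
```

Levels are assigned by position, so a value can land under a dimension name that does not describe it literally, as with "Married-couple family" here.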

In [12]:
acs.DIMENSIONS

['TOTAL', 'Household Type', 'Living Alone', 'Spouse Present']

In [15]:
acs = morpc.census.acs_data('B11001', '2023', '5')
acs = acs.query(scope='region15-counties')
acs.DIM_TABLE

morpc-acs5-2023-region15-counties-b11001 schema is valid
Total variables requested: 19
Starting request #1. 19 variables remain.


| | GEO_ID | VARIABLE | VALUE | VAR_TYPE | TOTAL | Household Type | Living Alone | Spouse Present |
|---|---|---|---|---|---|---|---|---|
| 0 | 0500000US39041 | B11001_001E | 80455 | Estimate | Total | | | |
| 1 | 0500000US39045 | B11001_001E | 59979 | Estimate | Total | | | |
| 2 | 0500000US39047 | B11001_001E | 11635 | Estimate | Total | | | |
| 3 | 0500000US39049 | B11001_001E | 547922 | Estimate | Total | | | |
| 4 | 0500000US39073 | B11001_001E | 11565 | Estimate | Total | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 265 | 0500000US39117 | B11001_009M | 208 | MOE | Total | Nonfamily households | Householder not living alone | |
| 266 | 0500000US39127 | B11001_009M | 204 | MOE | Total | Nonfamily households | Householder not living alone | |
| 267 | 0500000US39129 | B11001_009M | 244 | MOE | Total | Nonfamily households | Householder not living alone | |
| 268 | 0500000US39141 | B11001_009M | 335 | MOE | Total | Nonfamily households | Householder not living alone | |


In [17]:
DIM_GROUPS = acs.DIM_TABLE.groupby('GEO_ID')

In [19]:
name, table = next(iter(DIM_GROUPS))  # first (GEO_ID, sub-table) pair

In [31]:
table.loc[table['VAR_TYPE']=='Estimate'].fillna("").set_index(acs.DIMENSIONS).drop(columns = ['GEO_ID', 'VARIABLE', 'VAR_TYPE']).style.to_latex()

'\\begin{tabular}{llllr}\n &  &  &  & VALUE \\\\\nTOTAL & Household Type & Living Alone & Spouse Present &  \\\\\n\\multirow[c]{9}{*}{Total} &  &  &  & 80455 \\\\\n & \\multirow[c]{5}{*}{Family households} &  &  & 61028 \\\\\n &  & Married-couple family &  & 51755 \\\\\n &  & \\multirow[c]{3}{*}{Other family} &  & 9273 \\\\\n &  &  & Male householder, no spouse present & 3127 \\\\\n &  &  & Female householder, no spouse present & 6146 \\\\\n & \\multirow[c]{3}{*}{Nonfamily households} &  &  & 19427 \\\\\n &  & Householder living alone &  & 16217 \\\\\n &  & Householder not living alone &  & 3210 \\\\\n\\end{tabular}\n'

### Save raw data (not dim table) as a frictionless resource with schema

After querying the data, save the data as a frictionless resource with reasonable descriptors. 

In [None]:
acs.save(output_dir='./temp_data/')

In [None]:
acs.SCHEMA

In [None]:
acs.RESOURCE
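The contents of `acs.SCHEMA` are not shown here. As a rough guide to what a frictionless Table Schema for this kind of data might contain, the sketch below builds a plain dictionary from variable metadata shaped like `acs.VARS`. The helper `build_schema`, the `TYPE_MAP` mapping, and the choice of `GEO_ID` as primary key are all assumptions for illustration, not the descriptor that `morpc` actually writes.

```python
import json

# Assumed mapping from Census predicate types to Table Schema field types
TYPE_MAP = {"int": "integer", "float": "number", "string": "string"}


def build_schema(variables, key="GEO_ID"):
    """Build a Table Schema-style dict from ACS variable metadata."""
    fields = [{"name": key, "type": "string"}]
    for name, meta in sorted(variables.items()):
        fields.append({
            "name": name,
            "type": TYPE_MAP.get(meta["predicateType"], "string"),
            "description": meta["label"],
        })
    return {"fields": fields, "primaryKey": key}


# Two entries copied from the acs.VARS output shown earlier
sample_vars = {
    "B11001_001E": {"label": "Estimate!!Total:", "predicateType": "int"},
    "B11001_002E": {"label": "Estimate!!Total:!!Family households:", "predicateType": "int"},
}
schema = build_schema(sample_vars)
print(json.dumps(schema, indent=2))
```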

## Georeference the data for mapping

In [None]:
acs = morpc.census.acs_data('B01001', '2023', '5')
acs = acs.query(get_param=['B01001_001E'], scope='region15-tracts')

In [None]:
acs = acs.georeference()

In [None]:
acs.DATA.explore(column='B01001_001E')
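Joining ACS rows to geometries generally requires a shared key. A Census `GEO_ID` such as `0500000US39041` consists of a three-digit summary level, further component codes, the literal `US`, and then the GEOID used in Census geography files. The sketch below splits those parts. `parse_geo_id` is a hypothetical helper, and whether `georeference()` joins on this key is an assumption, since its internals are not shown here.

```python
def parse_geo_id(geo_id):
    """Split a Census GEO_ID into its summary level and GEOID."""
    prefix, geoid = geo_id.split("US", 1)
    return {"sumlevel": prefix[:3], "geoid": geoid}


print(parse_geo_id("0500000US39041"))
# {'sumlevel': '050', 'geoid': '39041'}
```

Here `050` is the county summary level and `39041` is the state FIPS code (39) followed by the county FIPS code (041).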

## Legacy functions (should still be functional; to be folded into the ACS class)

#### Load the data using frictionless.load_data()

In [None]:
data, resource, schema = morpc.frictionless.load_data('./temp_data/morpc-acs5-2023-state-B01001.resource.yaml', verbose=False)

#### Using ACS_ID_FIELDS to get the fields ids

In [None]:
morpc.census.acs_generate_universe_table(data.set_index("GEO_ID"), "B01001_001")

#### Create a dimension table with the data and the dimension names

In [None]:
dim_table = morpc.census.acs_generate_dimension_table(data.set_index("GEO_ID"), schema, idFields=idFields, dimensionNames=["Sex", "Age group"])

In [None]:
dim_table.loc[dim_table['Variable type'] == 'Estimate'].head()

### Build ACS Variable Group JSON for Dimension names

In [None]:
import requests
r = requests.get('https://api.census.gov/data/2023/acs/acs5/variables.json')
varjson = r.json()

In [None]:
groups = {}
# Entries in variables.json that are predicates or geography fields, not table variables
non_table_entries = ['for', 'in', 'ucgid', 'GEO_ID', 'AIANHH', 'AIHHTL', 'AIRES', 'ANRC']
for variable, meta in varjson['variables'].items():
    if variable in non_table_entries:
        continue
    group = meta['group']
    # Skip groups whose ID ends in a letter (e.g. B01001A) and groups already processed
    if group[-1].isalpha() or group in groups:
        continue
    concept = meta['concept']
    # Collect the label of every variable in this group, sorted by variable name
    # (.get tolerates entries that lack a 'group' key)
    labels = {
        name: other['label']
        for name, other in sorted(varjson['variables'].items())
        if other.get('group') == group
    }
    groups[group] = {
        'concept': concept,
        # Derive dimension names by splitting the concept on " by " and " and "
        'dimensions': ['TOTAL'] + concept.replace(' by ', ':').replace(' and ', ':').split(':'),
        'variables': labels,
    }

In [None]:
groups = {k: v for k, v in sorted(groups.items(), key=lambda item: item[0])}
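To see what the concept-splitting expression in the loop produces, it can be run on its own. The sketch below wraps it in a hypothetical helper, `concept_to_dimensions`. The first concept string is an illustrative example, and the second is the B11001 concept from the `acs.VARS` output.

```python
def concept_to_dimensions(concept):
    """Split a concept string on ' by ' and ' and ' to guess dimension names."""
    parts = concept.replace(" by ", ":").replace(" and ", ":").split(":")
    return ["TOTAL"] + parts


print(concept_to_dimensions("Sex by Age"))
# ['TOTAL', 'Sex', 'Age']
print(concept_to_dimensions("Household Type (Including Living Alone)"))
# ['TOTAL', 'Household Type (Including Living Alone)']
```

The second result differs from the dimension list shown earlier for B11001, which suggests that the generated names are a starting point and may need adjusting by hand. The replacement is also case-sensitive, so a concept written with " BY " would not be split.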

In [None]:
import json
# with open('../morpc/census/acs_variable_groups.json', 'w') as file:
#     json.dump(groups, file, indent=3)

In [None]:
varjson['variables']['B01001_001E']['group']