# Chicago Health Atlas

The Chicago Health Atlas is a useful source of public health and population data for Chicago, and it exposes that data through its own [API](https://api.chicagohealthatlas.org/api-docs/index.html).

In [1]:
import pandas as pd
import requests

In [3]:
API = "https://api.chicagohealthatlas.org/api/v1"

## Sample Request

The request below fetches demographic data for a single community area, identified by its slug.

In [13]:
area_slug = "albany-park"
r = requests.get(f"{API}/place/demography/{area_slug}")
r.json()

{'location': 'albany-park',
 'demographic_data': [{'age_group': '00-04',
   'pop_2000': 4999.0,
   'pop_2010': 4224.0},
  {'age_group': '05-14', 'pop_2000': 8557.0, 'pop_2010': 6843.0},
  {'age_group': '15-24', 'pop_2000': 10032.0, 'pop_2010': 8247.0},
  {'age_group': '25-34', 'pop_2000': 11234.0, 'pop_2010': 10024.0},
  {'age_group': '35-44', 'pop_2000': 8553.0, 'pop_2010': 7828.0},
  {'age_group': '45-54', 'pop_2000': 6357.0, 'pop_2010': 6100.0},
  {'age_group': '55-64', 'pop_2000': 3685.0, 'pop_2010': 4299.0},
  {'age_group': '65-74', 'pop_2000': 2206.0, 'pop_2010': 2176.0},
  {'age_group': '75-84', 'pop_2000': 1469.0, 'pop_2010': 1227.0},
  {'age_group': '85+', 'pop_2000': 563.0, 'pop_2010': 574.0}],
 'total_population_data': {'pop_2000': 57655, 'pop_2010': 51542},
 'topics_data': [{'title': 'Below Poverty Level',
   'data': {'id': None,
    'name': 'Below Poverty Level',
    'value': 17.1,
    'description': 'Percent of households under the poverty level for the years 2007-2011.',

## Community Areas Dataset

A common data ingestion task is fetching public health data by community area.

We have two datasets for the 77 community areas in Chicago:

- [Chicago Data Portal](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6): includes area number
- [Chicago Health Atlas](https://api.chicagohealthatlas.org/api-docs/index.html): includes area slug

We need to look up community areas by area number or area slug depending on the data source, so we should link these two datasets.

In the future, we will also need to map street addresses or longitude/latitude coordinates to community areas, so this will not be the last time we have to reconcile different ways of identifying a community area.
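As a rough illustration of the coordinate-to-area mapping mentioned above, here is a minimal ray-casting sketch in plain Python. It assumes features shaped like the Data Portal GeoJSON (a `MultiPolygon` geometry plus a `community` property). The function names and the two toy squares are made up for the example, and a real pipeline would more likely rely on a geospatial library.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: is (lon, lat) inside a closed ring of [lon, lat] pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(lon, lat, geometry):
    """geometry is a GeoJSON-style dict with type 'MultiPolygon'."""
    for polygon in geometry["coordinates"]:
        outer, holes = polygon[0], polygon[1:]
        in_hole = any(point_in_ring(lon, lat, hole) for hole in holes)
        if point_in_ring(lon, lat, outer) and not in_hole:
            return True
    return False


def find_area(lon, lat, features):
    """Return the 'community' name of the first feature containing the point."""
    for feature in features:
        if point_in_multipolygon(lon, lat, feature["geometry"]):
            return feature["properties"]["community"]
    return None


# Hypothetical toy features: two unit squares side by side, not real areas.
toy_features = [
    {"properties": {"community": "WEST SQUARE"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [[[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]]}},
    {"properties": {"community": "EAST SQUARE"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [[[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]]}},
]

west = find_area(0.5, 0.5, toy_features)
east = find_area(1.5, 0.5, toy_features)
outside = find_area(5.0, 5.0, toy_features)
```

With the real data, `features` would be `geojson_areas["features"]`. Points that fall exactly on a boundary are not handled specially in this sketch.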

In [24]:
r = requests.get(f"{API}/places")
res_places = r.json()
df_places = pd.DataFrame(res_places["community_areas"])
print(f"{len(df_places)} community areas in the places data.")

77 community areas in the places data.


In [25]:
df_places.head()

Unnamed: 0,centroid,geo_type,geometry,id,name,part,resource_cnt,slug
0,"[-87.72156,41.968068]",Community Area,"{""type"":""MultiPolygon"",""coordinates"":[[[[-87.7...",,Albany Park,Far North Side,0,albany-park
1,"[-87.726363,41.81088]",Community Area,"{""type"":""MultiPolygon"",""coordinates"":[[[[-87.7...",,Archer Heights,Southwest side,388,archer-heights
2,"[-87.633974,41.842077]",Community Area,"{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",,Armour Square,South Side,0,armour-square
3,"[-87.708365,41.745757]",Community Area,"{""type"":""MultiPolygon"",""coordinates"":[[[[-87.7...",,Ashburn,Far Southwest side,90,ashburn
4,"[-87.656307,41.744205]",Community Area,"{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",,Auburn Gresham,Far Southwest side,571,auburn-gresham


In [16]:
AREAS_GEOJSON_URL = "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON"

In [26]:
r = requests.get(AREAS_GEOJSON_URL)
geojson_areas = r.json()
df_geojson = pd.DataFrame(geojson_areas["features"])
print(f"{len(df_geojson)} community areas in the GeoJSON data.")

77 community areas in the GeoJSON data.


In [21]:
df_geojson.head()

Unnamed: 0,geometry,properties,type
0,"{'type': 'MultiPolygon', 'coordinates': [[[[-8...","{'community': 'DOUGLAS', 'area': '0', 'shape_a...",Feature
1,"{'type': 'MultiPolygon', 'coordinates': [[[[-8...","{'community': 'OAKLAND', 'area': '0', 'shape_a...",Feature
2,"{'type': 'MultiPolygon', 'coordinates': [[[[-8...","{'community': 'FULLER PARK', 'area': '0', 'sha...",Feature
3,"{'type': 'MultiPolygon', 'coordinates': [[[[-8...","{'community': 'GRAND BOULEVARD', 'area': '0', ...",Feature
4,"{'type': 'MultiPolygon', 'coordinates': [[[[-8...","{'community': 'KENWOOD', 'area': '0', 'shape_a...",Feature


Thankfully, we can link these two datasets by community area name. The output below confirms that all 77 areas match between the two datasets.

In [59]:
# Atlas names are title case ("Albany Park"); GeoJSON names are already upper case ("DOUGLAS").
df_places["name_upper"] = df_places["name"].str.upper()
df_geojson["name_upper"] = df_geojson["properties"].apply(lambda p: p["community"])

In [60]:
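df_left = df_places.set_index("name_upper").drop("id", axis=1)  # "id" is empty in the rows shown above
df_right = df_geojson.set_index("name_upper")
# Left join on the upper-case name; overlapping GeoJSON columns get a "_geojson" suffix.
df_joined = df_left.join(df_right, rsuffix="_geojson")
# A name without a partner would leave nulls in the GeoJSON columns.
n_null_cells = df_joined.isnull().sum().sum()
print(f"{len(df_joined)} community areas in the joined data, with {n_null_cells} null cells.")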

77 community areas in the joined data, with 0 null cells.
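The null-cell count says whether the join came out clean, but not which names failed when it does not. A small set comparison, sketched below, reports mismatches in both directions. The helper names and the "Example Heights" entries are hypothetical and only illustrate a mismatch; they are not taken from either dataset.

```python
def normalize(name):
    """Upper-case a name and collapse repeated whitespace."""
    return " ".join(name.upper().split())


def compare_names(left_names, right_names):
    """Report which normalized names appear on both sides or on only one."""
    left = {normalize(n) for n in left_names}
    right = {normalize(n) for n in right_names}
    return {
        "matched": sorted(left & right),
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
    }


# Toy input: two names that agree, plus one made-up pair that does not.
report = compare_names(
    ["Albany Park", "Rogers  Park", "Example Heights"],
    ["ALBANY PARK", "ROGERS PARK", "EXAMPLE HTS"],
)
```

With the notebook's frames, the inputs would be `df_places["name"]` and the `community` values from `df_geojson["properties"]`.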


In [66]:
df_areas = df_joined.drop([
    "geo_type",
    "geometry_geojson",
    "type"
], axis=1)
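# The key really is spelled "area_numbe" in the GeoJSON properties; it is not a typo here.
df_areas["area_number"] = df_areas["properties"].apply(lambda p: p["area_numbe"])
# Area numbers are strings, so zero-pad them ("1" -> "01") to sort in numeric order.
df_areas["order"] = df_areas["area_number"].apply(lambda s: s.zfill(2))
df_areas = df_areas.sort_values(by="order", ascending=True)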

In [67]:
df_areas.head()

name_upper,centroid,geometry,name,part,resource_cnt,slug,properties,area_number,order
ROGERS PARK,"[-87.670167,42.009623]","{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",Rogers Park,Far North Side,0,rogers-park,"{'community': 'ROGERS PARK', 'area': '0', 'sha...",1,1
WEST RIDGE,"[-87.695013,42.001572]","{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",West Ridge,Far North Side,0,west-ridge,"{'community': 'WEST RIDGE', 'area': '0', 'shap...",2,2
UPTOWN,"[-87.655879,41.965812]","{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",Uptown,Far North Side,0,uptown,"{'community': 'UPTOWN', 'area': '0', 'shape_ar...",3,3
LINCOLN SQUARE,"[-87.687515,41.975172]","{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",Lincoln Square,Far North Side,0,lincoln-square,"{'community': 'LINCOLN SQUARE', 'area': '0', '...",4,4
NORTH CENTER,"[-87.683835,41.947792]","{""type"":""MultiPolygon"",""coordinates"":[[[[-87.6...",North Center,North Side,0,north-center,"{'community': 'NORTH CENTER', 'area': '0', 'sh...",5,5
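The zero-padded `order` column exists because the area numbers are strings, and strings sort character by character. The short example below, using a handful of made-up values, shows the difference between a plain string sort, a zero-padded sort, and a numeric sort.

```python
area_numbers = ["1", "10", "2", "77", "9"]

# Plain string sort compares character by character, so "10" lands before "2".
plain_sorted = sorted(area_numbers)

# Zero-padding to two digits makes string order agree with numeric order.
padded_sorted = sorted(area_numbers, key=lambda s: s.zfill(2))

# Converting to int is an alternative that does not depend on the digit count.
numeric_sorted = sorted(area_numbers, key=int)

print(plain_sorted)    # ['1', '10', '2', '77', '9']
print(padded_sorted)   # ['1', '2', '9', '10', '77']
```

Padding to two digits is enough here only because there are 77 areas. The `int` key would keep working if the numbers ever reached three digits.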


## Extracting Data For All Areas

Now we can query the Chicago Health Atlas API for all 77 community areas, one area slug at a time, and build a table that can be joined to other tables by area number. I could not find an endpoint that returns data for a list of community areas, or for all of them at once, so we make one request per area.

We can use the handy `tqdm` library to track our progress. To avoid overwhelming the API, we put a small `sleep` after each request, which waits for a given number of seconds.

In [75]:
from tqdm import tqdm
from time import sleep

In [76]:
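rows = []
community_areas = list(zip(df_areas["area_number"].values, df_areas["slug"].values))
for number, slug in tqdm(community_areas):
    r = requests.get(f"{API}/place/demography/{slug}")
    r.raise_for_status()  # stop on an HTTP error instead of storing an error payload
    data = r.json()
    # Tag each response so it can be joined to other tables later.
    data["area_number"] = number
    data["area_slug"] = slug
    rows.append(data)
    sleep(0.1)  # brief pause between requests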

100%|██████████| 77/77 [00:31<00:00,  2.45it/s]
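With 77 sequential requests, a single transient failure would abort the whole loop. One possible safeguard, sketched here under the assumption that failures surface as exceptions, is a small retry wrapper with exponential backoff. `fetch_with_retry` and its parameters are hypothetical names for this example, and the demo uses a fake flaky function so that it runs without network access.

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is any zero-argument callable; sleep is injectable so the
    waiting can be observed in tests without real delays.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:  # broad on purpose in this sketch
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Demo: a made-up fetch that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"location": "rogers-park"}


delays = []
result = fetch_with_retry(flaky_fetch, sleep=delays.append)
```

In the loop above, the callable could be something like `lambda: requests.get(url, timeout=10).json()`. Narrowing the `except` clause to `requests.RequestException` would avoid retrying on programming errors.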


In [77]:
df_demo = pd.DataFrame(rows)

In [78]:
df_demo.head()

Unnamed: 0,area_number,area_slug,demographic_data,location,topics_data,total_population_data
0,1,rogers-park,"[{'age_group': '00-04', 'pop_2000': 5073.0, 'p...",rogers-park,"[{'title': 'Below Poverty Level', 'data': {'id...","{'pop_2000': 63484, 'pop_2010': 54991}"
1,2,west-ridge,"[{'age_group': '00-04', 'pop_2000': 5448.0, 'p...",west-ridge,"[{'title': 'Below Poverty Level', 'data': {'id...","{'pop_2000': 73199, 'pop_2010': 71942}"
2,3,uptown,"[{'age_group': '00-04', 'pop_2000': 3253.0, 'p...",uptown,"[{'title': 'Below Poverty Level', 'data': {'id...","{'pop_2000': 63551, 'pop_2010': 56362}"
3,4,lincoln-square,"[{'age_group': '00-04', 'pop_2000': 2720.0, 'p...",lincoln-square,"[{'title': 'Below Poverty Level', 'data': {'id...","{'pop_2000': 44574, 'pop_2010': 39493}"
4,5,north-center,"[{'age_group': '00-04', 'pop_2000': 1877.0, 'p...",north-center,"[{'title': 'Below Poverty Level', 'data': {'id...","{'pop_2000': 31895, 'pop_2010': 31867}"


In [79]:
rows[0]["demographic_data"]

[{'age_group': '00-04', 'pop_2000': 5073.0, 'pop_2010': 3737.0},
 {'age_group': '05-14', 'pop_2000': 7846.0, 'pop_2010': 5161.0},
 {'age_group': '15-24', 'pop_2000': 11945.0, 'pop_2010': 9392.0},
 {'age_group': '25-34', 'pop_2000': 13569.0, 'pop_2010': 11433.0},
 {'age_group': '35-44', 'pop_2000': 10425.0, 'pop_2010': 8867.0},
 {'age_group': '45-54', 'pop_2000': 6830.0, 'pop_2010': 7211.0},
 {'age_group': '55-64', 'pop_2000': 3338.0, 'pop_2010': 5070.0},
 {'age_group': '65-74', 'pop_2000': 2218.0, 'pop_2010': 2368.0},
 {'age_group': '75-84', 'pop_2000': 1547.0, 'pop_2010': 1196.0},
 {'age_group': '85+', 'pop_2000': 693.0, 'pop_2010': 556.0}]

In [80]:
rows[0]["total_population_data"]

{'pop_2000': 63484, 'pop_2010': 54991}
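Because `demographic_data` and `total_population_data` are nested, `df_demo` holds lists and dicts in its cells rather than numbers. One way to get a flat table is to turn each response into a single row of scalar columns. The sketch below is one possible layout; `flatten_record` and the column naming scheme are assumptions for illustration, and the sample record is abbreviated to the first two Rogers Park age groups shown above.

```python
def flatten_record(record):
    """Turn one nested API response into a flat dict of scalar columns."""
    row = {
        "area_number": record["area_number"],
        "area_slug": record["area_slug"],
    }
    # Totals become columns such as total_pop_2000.
    for key, value in record["total_population_data"].items():
        row[f"total_{key}"] = value
    # Each age group contributes one column per year, e.g. pop_2010_age_00-04.
    for group in record["demographic_data"]:
        age = group["age_group"]
        for key, value in group.items():
            if key != "age_group":
                row[f"{key}_age_{age}"] = value
    return row


# Abbreviated sample: only the first two age groups are included.
sample = {
    "area_number": "1",
    "area_slug": "rogers-park",
    "demographic_data": [
        {"age_group": "00-04", "pop_2000": 5073.0, "pop_2010": 3737.0},
        {"age_group": "05-14", "pop_2000": 7846.0, "pop_2010": 5161.0},
    ],
    "total_population_data": {"pop_2000": 63484, "pop_2010": 54991},
}

flat = flatten_record(sample)
```

Applied to the full results, `pd.DataFrame([flatten_record(r) for r in rows])` would give one row per community area, keyed by `area_number`.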

In [81]:
rows[0]["topics_data"]

[{'title': 'Below Poverty Level',
  'data': {'id': None,
   'name': 'Below Poverty Level',
   'value': 22.7,
   'description': 'Percent of households under the poverty level for the years 2007-2011.',
   'stat_type': 'range, percent'}},
 {'title': 'Crowded Housing',
  'data': {'id': None,
   'name': 'Crowded Housing',
   'value': 7.9,
   'description': 'Percent of occupied crowded housing units for the years 2007-2011.',
   'stat_type': 'range, percent'}},
 {'title': 'Dependency',
  'data': {'id': None,
   'name': 'Dependency',
   'value': 28.8,
   'description': 'Percent of persons aged less than 16 or more than 64 years for the years 2007-2011.',
   'stat_type': 'range, percent'}},
 {'title': 'No High School Diploma',
  'data': {'id': None,
   'name': 'No High School Diploma',
   'value': 18.1,
   'description': 'Percent of persons aged 25 years and older with no high school diploma for the years 2007-2011.',
   'stat_type': 'range, percent'}},
 {'title': 'Per capita income',
  'data