# Chicago Health Atlas

[The Chicago Health Atlas](https://chicagohealthatlas.org/) is a great resource for public health and population data. It even has its own [API](https://chicagohealthatlas.org/api/v1)!

In [1]:
import pandas as pd
import requests

In [2]:
API = "https://chicagohealthatlas.org/api/v1"

## Sample Request

This is how we can fetch data from the Chicago Health Atlas API.

This example gets the 2015-2019 population estimate for the Bridgeport community area (#60).

In [3]:
geoid_bridgeport = "1714000-60"
r = requests.get(f"{API}/data", params={
    "layer": "neighborhood",
    "geography": geoid_bridgeport,
    "topic": "POP",
    "period": "2015-2019",
    # empty string = entire population
    "population": ""
})
data = r.json()
data

{'time': '17.04 ms',
 'params': {'layer': 'neighborhood',
  'geography': '1714000-60',
  'topic': 'POP',
  'period': '2015-2019',
  'population': ''},
 'count': 1,
 'results': [{'se': 1007.34981865271,
   'g': '1714000-60',
   'l': 'neighborhood',
   'a': 'POP',
   'p': '',
   'd': '2015-2019',
   'v': 34483.901503}]}
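
The result records use one-letter keys. Judging by the values above and the renamed columns at the end of this notebook, `v` is the estimate and `se` its standard error. Here is a minimal sketch, assuming `se` really is a standard error and that a normal approximation is acceptable, of turning one record into a rough 95% interval. `confidence_interval` is a hypothetical helper, not part of the API.

```python
# One record copied from the sample response above.
record = {
    "se": 1007.34981865271,
    "g": "1714000-60",
    "l": "neighborhood",
    "a": "POP",
    "p": "",
    "d": "2015-2019",
    "v": 34483.901503,
}


def confidence_interval(record, z=1.96):
    """Return (lower, upper) around the estimate, assuming `se` is a standard error."""
    margin = z * record["se"]
    return record["v"] - margin, record["v"] + margin


lower, upper = confidence_interval(record)
print(f"Bridgeport {record['d']}: {record['v']:,.0f} (roughly {lower:,.0f} to {upper:,.0f})")
```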

## Get Coverages

We can explore other metrics besides population (POP) in the [Topic List](https://chicagohealthatlas.org/api/v1/topics).

Once we have chosen a topic, we send a request to the **"coverage"** endpoint to find out which periods and population segments have data for it. For example, the coverage of POP tells us which periods and population segments have population estimates at the community area (neighborhood) level.

In [4]:
coverage = requests.get(f"{API}/coverage/POP", params={
    "layers": "neighborhood"
}).json()

In [5]:
df_coverage = pd.DataFrame(coverage["coverages"]["neighborhood"])
df_coverage.head(25)

|    | period    | population |
|----|-----------|------------|
| 0  | 2015-2019 |            |
| 1  | 2014-2018 |            |
| 2  | 2013-2017 |            |
| 3  | 2012-2016 |            |
| 4  | 2011-2015 |            |
| 5  | 2010-2014 |            |
| 6  | 2009-2013 |            |
| 7  | 2008-2012 |            |
| 8  | 2007-2011 |            |
| 9  | 2006-2010 |            |
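
Before requesting data, it can be handy to check whether a given period and population segment is covered at all. The sketch below assumes each layer's coverage is a list of records with `period` and `population` keys, which is consistent with the DataFrame above but not confirmed by the API documentation. `is_covered` is a hypothetical helper, and the `coverage` dict is a hand-written excerpt rather than a live response.

```python
# Hypothetical excerpt shaped like the coverage response used above:
# coverage["coverages"][layer] is assumed to be a list of records.
coverage = {
    "coverages": {
        "neighborhood": [
            {"period": "2015-2019", "population": ""},
            {"period": "2014-2018", "population": ""},
        ]
    }
}


def is_covered(coverage, layer, period, population=""):
    """Return True if the topic has data for this layer, period and population segment."""
    records = coverage.get("coverages", {}).get(layer, [])
    return any(
        rec["period"] == period and rec["population"] == population
        for rec in records
    )


print(is_covered(coverage, "neighborhood", "2015-2019"))  # True
print(is_covered(coverage, "neighborhood", "1990-1994"))  # False
```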


## Extracting Data For All Areas

Now we can query the Chicago Health Atlas API for all 10 periods that have total population estimates. Later, we can save the other population segment estimates.

We can use the handy `tqdm` library to track our progress. To avoid overwhelming the API, we call `sleep` after each request to pause briefly before sending the next one.

The Health Atlas response data has one-letter field names, so we also import a variable from our `utils` module that helps us rename the results to more descriptive column names. (Remember that we use `os.chdir` to change to the project root so that the `utils` module can be imported.)
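
The contents of `HEALTH_ATLAS_VALUE_COLS` are not shown in this notebook. As a sketch, a mapping like the hypothetical `VALUE_COLS` below would do the job. It is inferred from the keys in the sample response and the renamed columns in the final output, so treat it as an assumption rather than the real definition. The same dict could be passed to `DataFrame.rename(columns=...)`.

```python
# Hypothetical stand-in for HEALTH_ATLAS_VALUE_COLS; the real mapping
# lives in pipeline.utils.data and may differ.
VALUE_COLS = {
    "a": "topic",
    "d": "period",
    "g": "geoid",
    "l": "layer",
    "p": "population",
    "se": "std_error",
    "v": "value",
}


def rename_record(record, mapping=VALUE_COLS):
    """Return a copy of the record with one-letter keys replaced by descriptive names."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Record copied from the sample response earlier in the notebook.
record = {"se": 1007.34981865271, "g": "1714000-60", "l": "neighborhood",
          "a": "POP", "p": "", "d": "2015-2019", "v": 34483.901503}
print(rename_record(record))
```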

In [6]:
import os
os.chdir("../../")

from tqdm import tqdm
from time import sleep
from pipeline.utils.data import HEALTH_ATLAS_VALUE_COLS

In [7]:
# Get periods only for the full population (empty string)
periods_full_population = df_coverage.loc[df_coverage["population"] == "", "period"].tolist()
periods_full_population

['2015-2019',
 '2014-2018',
 '2013-2017',
 '2012-2016',
 '2011-2015',
 '2010-2014',
 '2009-2013',
 '2008-2012',
 '2007-2011',
 '2006-2010']

In [8]:
# Request endpoint for each period
rows = []
for period in tqdm(periods_full_population):
    r = requests.get(f"{API}/data", params={
        "layer": "neighborhood",
        "topic": "POP",
        "period": period,
        # empty string = entire population
        "population": ""
    })
    r.raise_for_status()  # fail fast on an HTTP error instead of parsing an error body
    data = r.json()
    rows.extend(data["results"])
    sleep(0.1)

100%|██████████| 10/10 [00:05<00:00,  1.68it/s]
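
To reuse this loop for other topics, it could be wrapped in a function. The sketch below is one possible shape, not part of the original pipeline: `build_params` and `fetch_periods` are hypothetical names, and the HTTP call is passed in as `get` so the function can be exercised without network access.

```python
from time import sleep

API = "https://chicagohealthatlas.org/api/v1"


def build_params(topic, period, layer="neighborhood", population=""):
    """Build query parameters for the data endpoint (empty population = everyone)."""
    return {"layer": layer, "topic": topic, "period": period, "population": population}


def fetch_periods(get, topic, periods, delay=0.1):
    """Collect result records for every period, pausing between requests.

    `get` is any callable with the same interface as `requests.get`.
    """
    rows = []
    for period in periods:
        response = get(f"{API}/data", params=build_params(topic, period))
        response.raise_for_status()  # stop early on an HTTP error
        rows.extend(response.json()["results"])
        sleep(delay)  # be polite to the API
    return rows
```

With `requests` imported, `fetch_periods(requests.get, "POP", periods_full_population)` would mirror the loop above, and wrapping the periods in `tqdm(...)` brings back the progress bar.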


In [9]:
df_pop = pd.DataFrame(rows)
df_pop = df_pop.rename(columns=HEALTH_ATLAS_VALUE_COLS)
df_pop.head()

|    | topic | period    | geoid      | layer        | population | std_error  | value        |
|----|-------|-----------|------------|--------------|------------|------------|--------------|
| 0  | POP   | 2015-2019 | 1714000-35 | neighborhood |            | 599.526302 | 18763.463356 |
| 1  | POP   | 2015-2019 | 1714000-36 | neighborhood |            | 264.595461 | 4425.440249  |
| 2  | POP   | 2015-2019 | 1714000-37 | neighborhood |            | 225.084211 | 2396.551147  |
| 3  | POP   | 2015-2019 | 1714000-38 | neighborhood |            | 935.904257 | 22636.700945 |
| 4  | POP   | 2015-2019 | 1714000-39 | neighborhood |            | 576.190097 | 14201.388739 |
