# Calling the Rewiring America Health Impacts API from Python

In December, Rewiring America released our second public API. The API is based on data and models
we built to answer questions about how the health of people across the country would be
affected if communities were to electrify their homes in various ways.

You can read about the research on [our web site](https://www.rewiringamerica.org/research/home-electrification-health-benefits),
and in the [New York Times](https://www.nytimes.com/2024/12/10/climate/heat-pumps-savings.html).
Reports and news coverage are awesome, but we also wanted to make sure it
was easy for researchers to get the underlying data, sliced and diced to meet their needs. So we
created an API.

In this notebook we are going to demonstrate how you can call the API directly from Python and what
kinds of questions you can ask it.

## Python Environment Setup

The Rewiring America APIs can be called from any Python 3.8+ environment using the requests package. The Health Impacts API returns data in tabular form, so it is also useful to have Pandas installed in your virtual environment. If you don’t already have both of these packages installed, you can install them with

```shell
pip install requests
```

and

```shell
pip install pandas
```

Those should be the only things you need to ensure are installed before you can run the code in this notebook.

## Imports and configuration

Once we have our environment set up, our next step is to import the packages we are going to use.

In [1]:
import requests
import pandas as pd
from pathlib import Path

We also need the URL of the Health Impacts API and an API key.

In [2]:
HOST = "https://api.rewiringamerica.org"
HEALTH_ADDRESS_URL = f"{HOST}/api/v2/health-impacts/"

API_KEY = None  # Put your API key here, or better yet in the file ~/.rwapi/api_key.txt

In [3]:
if API_KEY is None:
    api_key_path = Path.home() / ".rwapi" / "api_key.txt"

    if api_key_path.is_file():
        with open(api_key_path) as f:
            API_KEY = f.read().strip()
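
Another option is to read the key from an environment variable before falling back to the file. The sketch below shows one way to do that. The variable name `RWAPI_API_KEY` and the helper `load_api_key` are assumptions made for illustration; neither is defined by the API or by this notebook.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(
    env_var: str = "RWAPI_API_KEY",  # hypothetical variable name
    key_file: Optional[Path] = None,
) -> Optional[str]:
    """Return the API key from the environment or a key file, else None."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key

    # Fall back to the same file location the cell above reads.
    path = key_file if key_file is not None else Path.home() / ".rwapi" / "api_key.txt"
    if path.is_file():
        # An empty file is treated the same as a missing key.
        return path.read_text().strip() or None
    return None
```

Calling `API_KEY = load_api_key()` would then replace the two cells above.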

## Our First API Call

### Payload

We are going to start out with a basic call to the API, and then we will build up from there.

In our first call, we will pass in 

- a single metric (premature mortality),
- a single upgrade (an air source heat pump),
- a single state (Wisconsin, denoted by the state FIPS code 55).

The return value will be the nationwide impact on premature mortality, in avoided deaths per year, if every home
in Wisconsin were to upgrade to a heat pump. Note that the effects are nationwide, because winds can
blow pollutants into nearby states. But the bulk of the effect will occur in and near the state of Wisconsin.

NOTE: a list of state FIPS codes can be found [here](https://www.bls.gov/respondents/mwr/electronic-data-interchange/appendix-d-usps-state-abbreviations-and-fips-codes.htm).

In [4]:
payload = {
    "metrics": ["avoided_premature_mortality_incidence"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
    "state_fips": ["55"]
}

### Headers

In addition to the payload, we will send in some standard headers with each call, to indicate
that we are sending the payload in JSON form and we expect the results to come back in JSON.
We also send an `Authorization` header that carries our API key.

In [5]:
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
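
If `API_KEY` is still `None` at this point, the f-string above produces the literal header value `Bearer None`, which is unlikely to authenticate. A minimal sketch of a guard follows; the helper name `build_headers` is hypothetical and not part of the notebook.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Build the request headers, failing early when no key is configured."""
    if not api_key:
        raise ValueError(
            "No API key configured; set API_KEY or create ~/.rwapi/api_key.txt"
        )
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

With this in place, `headers = build_headers(API_KEY)` fails with a clear message before any request is sent.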

### Make the Call

Now we can make the actual call, look at the response code, which should be 200 to indicate
that everything was OK, and then the text of the response.

In [6]:
# Make the API call:
response = requests.post(HEALTH_ADDRESS_URL, json=payload, headers=headers)

In [7]:
response.status_code

200
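
Rather than inspecting `status_code` by hand, a script can let requests raise an exception on any 4xx or 5xx response. The sketch below wraps the call in a hypothetical helper, `fetch_rows`. The 30-second timeout is an arbitrary choice, not an API requirement, and the `"data"` key is the one this notebook reads from the response.

```python
from typing import Any, Dict, List

import requests


def fetch_rows(
    url: str,
    payload: Dict[str, Any],
    headers: Dict[str, str],
    timeout: float = 30.0,  # arbitrary choice, not an API requirement
) -> List[Dict[str, Any]]:
    """POST the payload and return the rows under the "data" key."""
    response = requests.post(url, json=payload, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # failing later when the body is parsed.
    response.raise_for_status()
    return response.json()["data"]
```

A call such as `fetch_rows(HEALTH_ADDRESS_URL, payload, headers)` would return a list of dictionaries that can be passed straight to `pd.DataFrame`.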

### Data from the Response

Now we will make a data frame out of the JSON data that comes back from the call.

We asked for one state, so the response will have one row. It will also have a column
to indicate what the state is. This will be more important when we ask for more than
one state. We asked for one metric, so it will have a column for that. It will also
have a column for the number of households in the state.

In [8]:
df_data = pd.DataFrame(response.json()["data"])

In [9]:
df_data

Unnamed: 0,state_abbreviation,number_of_households,metric,impact,units,warnings
0,WI,2189591,avoided_premature_mortality_incidence,28.0,deaths_per_year,


So looking at the `impact`, we see that if all of the households in Wisconsin were to switch to heat pumps, 28 lives per year would be saved across the continental US.

## Expanding our Query to All States

Now let's make a second query. Just like in the first one, we
pass in a single metric (premature mortality) and a single upgrade (a
medium efficiency heat pump). But unlike in the Wisconsin example, we do not pass in a state. Therefore, we get results for all states in the continental United States. 

In [None]:
payload_all_states = {
    "metrics": ["avoided_premature_mortality_incidence"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
}

response_all_states = requests.post(HEALTH_ADDRESS_URL, json=payload_all_states, headers=headers)
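
The payloads in this notebook differ only in which optional keys they include, so a small helper can keep them consistent. The sketch below is hypothetical (the name `build_payload` is ours) and uses only keys that appear in this notebook. Optional keys are left out entirely when not supplied, which is how the all-states query is expressed.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    metrics: List[str],
    upgrade: str,
    state_fips: Optional[List[str]] = None,
    county_fips: Optional[List[str]] = None,
    group_by: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request payload, omitting optional keys that are not set."""
    payload: Dict[str, Any] = {"metrics": metrics, "upgrade": [upgrade]}
    optional = {
        "state_fips": state_fips,
        "county_fips": county_fips,
        "group_by": group_by,
    }
    for key, value in optional.items():
        if value is not None:
            payload[key] = value
    return payload


# Same content as the all-states payload above: no "state_fips" key at all.
payload_all_states = build_payload(
    ["avoided_premature_mortality_incidence"], "hvac__heat_pump_seer18_hspf10"
)
```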

In [11]:
df_data_all_states = pd.DataFrame(response_all_states.json()["data"])
df_data_all_states

Unnamed: 0,state_abbreviation,number_of_households,metric,impact,units,warnings
0,AL,1409929,avoided_premature_mortality_incidence,15.86,deaths_per_year,
1,AR,884020,avoided_premature_mortality_incidence,11.27,deaths_per_year,
2,AZ,1848670,avoided_premature_mortality_incidence,2.98,deaths_per_year,
3,CA,12098560,avoided_premature_mortality_incidence,166.44,deaths_per_year,
4,CO,1904118,avoided_premature_mortality_incidence,27.33,deaths_per_year,
5,CT,1290316,avoided_premature_mortality_incidence,43.78,deaths_per_year,
6,DC,236562,avoided_premature_mortality_incidence,10.72,deaths_per_year,
7,DE,299274,avoided_premature_mortality_incidence,11.11,deaths_per_year,
8,FL,4840199,avoided_premature_mortality_incidence,10.92,deaths_per_year,
9,GA,2898066,avoided_premature_mortality_incidence,38.97,deaths_per_year,


Our resulting dataframe has the same columns as before, but since we did not 
ask for a specific state, we got all of them. As before, the `impact` describes the total annual benefit to the continental US from installing heat pumps in that state.

### Deriving New Metrics

Since the data is in a data frame, we can manipulate it in a variety of ways. For
example, we could add a column for the ratio of change in mortality to number
of households, then see which states have the highest value for that derived
metric.

In [12]:
df_data_all_states["mortality_per_household"] = (
    df_data_all_states["impact"] / df_data_all_states["number_of_households"]
)

df_data_all_states.nlargest(10, "mortality_per_household")

Unnamed: 0,state_abbreviation,number_of_households,metric,impact,units,warnings,mortality_per_household
32,NY,6859572,avoided_premature_mortality_incidence,870.08,deaths_per_year,,0.000127
29,NJ,3061262,avoided_premature_mortality_incidence,204.19,deaths_per_year,,6.7e-05
17,MA,2449155,avoided_premature_mortality_incidence,131.98,deaths_per_year,,5.4e-05
37,RI,392252,avoided_premature_mortality_incidence,19.81,deaths_per_year,,5.1e-05
18,MD,1791043,avoided_premature_mortality_incidence,89.94,deaths_per_year,,5e-05
6,DC,236562,avoided_premature_mortality_incidence,10.72,deaths_per_year,,4.5e-05
36,PA,4489835,avoided_premature_mortality_incidence,199.92,deaths_per_year,,4.5e-05
20,MI,3679665,avoided_premature_mortality_incidence,141.64,deaths_per_year,,3.8e-05
7,DE,299274,avoided_premature_mortality_incidence,11.11,deaths_per_year,,3.7e-05
5,CT,1290316,avoided_premature_mortality_incidence,43.78,deaths_per_year,,3.4e-05
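
Per-household values this small are hard to compare by eye. One option is to rescale them to avoided deaths per 100,000 households. The sketch below rebuilds three rows from the output above so that it runs on its own; the column name `mortality_per_100k_households` is our own choice.

```python
import pandas as pd

# Three rows copied from the output above.
df_sample = pd.DataFrame(
    {
        "state_abbreviation": ["NY", "NJ", "MA"],
        "number_of_households": [6859572, 3061262, 2449155],
        "impact": [870.08, 204.19, 131.98],
    }
)

# Avoided deaths per year for every 100,000 households in the state.
df_sample["mortality_per_100k_households"] = (
    df_sample["impact"] / df_sample["number_of_households"] * 100_000
)

print(df_sample.round({"mortality_per_100k_households": 2}))
```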


## County-by-County Data in New Jersey

As in the previous examples, we pass in a single metric (premature mortality) and a single upgrade (a medium
efficiency heat pump). This time we also pass a state (New Jersey) and a value of `"*"` for the county. This gives us results for all counties
in New Jersey. For each county, we get the nationwide effects, just as we did for the Wisconsin and all-states examples.

In [None]:
payload_nj_counties = {
    "metrics": ["avoided_premature_mortality_incidence"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
    "state_fips": ["34"],
    "county_fips": ["*"]
}

response_nj_counties = requests.post(HEALTH_ADDRESS_URL, json=payload_nj_counties, headers=headers)

In [14]:
df_data_nj_counties = pd.DataFrame(response_nj_counties.json()["data"])
df_data_nj_counties

Unnamed: 0,state_abbreviation,county_fips,county_name,number_of_households,metric,impact,units,warnings
0,NJ,1,Atlantic,91768,avoided_premature_mortality_incidence,2.48,deaths_per_year,
1,NJ,3,Bergen,330509,avoided_premature_mortality_incidence,56.54,deaths_per_year,
2,NJ,5,Burlington,153027,avoided_premature_mortality_incidence,16.01,deaths_per_year,
3,NJ,7,Camden,176998,avoided_premature_mortality_incidence,23.78,deaths_per_year,
4,NJ,9,Cape May,38257,avoided_premature_mortality_incidence,0.74,deaths_per_year,
5,NJ,11,Cumberland,46731,avoided_premature_mortality_incidence,1.42,deaths_per_year,
6,NJ,13,Essex,268039,avoided_premature_mortality_incidence,18.59,deaths_per_year,
7,NJ,15,Gloucester,99274,avoided_premature_mortality_incidence,12.14,deaths_per_year,
8,NJ,17,Hudson,235109,avoided_premature_mortality_incidence,27.34,deaths_per_year,
9,NJ,19,Hunterdon,44794,avoided_premature_mortality_incidence,0.69,deaths_per_year,


### Comparing to Statewide

The data we got in our last request was just for the state of New Jersey, and was disaggregated
by county. So if we add up the effects of all of the counties, they should add up to the statewide
number we got for New Jersey when we did the query for all states. We can verify this now.

In [15]:
statewide_nj_mortality = df_data_all_states[df_data_all_states["state_abbreviation"] == 'NJ']["impact"].iloc[0]
sum_of_nj_county_mortality = df_data_nj_counties["impact"].sum()

f"At the state level: {statewide_nj_mortality:.1f}; Summed over counties: {sum_of_nj_county_mortality:.1f}."

'At the state level: 204.2; Summed over counties: 204.2.'

## County-by-County Data in Nebraska

We can repeat the query we did in New Jersey for Nebraska.

In [None]:
payload_ne_counties = {
    "metrics": ["avoided_premature_mortality_incidence"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
    "state_fips": ["31"],
    "county_fips": ["*"]
}

response_ne_counties = requests.post(HEALTH_ADDRESS_URL, json=payload_ne_counties, headers=headers)

### Any Warnings?

In earlier examples, all the results returned no warnings. Let's see if there are any when we query all counties in Nebraska.

In [17]:
df_ne_counties = pd.DataFrame(response_ne_counties.json()["data"])
df_ne_counties

Unnamed: 0,state_abbreviation,county_fips,county_name,number_of_households,metric,impact,units,warnings
0,NE,001,Adams,10654,avoided_premature_mortality_incidence,0.07,deaths_per_year,
1,NE,003,Antelope,2663,avoided_premature_mortality_incidence,0.01,deaths_per_year,This result does not meet the recommended samp...
2,NE,005,Arthur,242,avoided_premature_mortality_incidence,0.01,deaths_per_year,This result does not meet the recommended samp...
3,NE,007,Banner,484,avoided_premature_mortality_incidence,0.01,deaths_per_year,This result does not meet the recommended samp...
4,NE,011,Boone,2421,avoided_premature_mortality_incidence,0.03,deaths_per_year,This result does not meet the recommended samp...
...,...,...,...,...,...,...,...,...
87,NE,177,Washington,7506,avoided_premature_mortality_incidence,0.10,deaths_per_year,
88,NE,179,Wayne,2906,avoided_premature_mortality_incidence,0.02,deaths_per_year,This result does not meet the recommended samp...
89,NE,181,Webster,1453,avoided_premature_mortality_incidence,0.05,deaths_per_year,This result does not meet the recommended samp...
90,NE,183,Wheeler,242,avoided_premature_mortality_incidence,0.01,deaths_per_year,This result does not meet the recommended samp...


Several of the rows have a non-null entry for `warnings`. Let's see what the full string says.

In [18]:
df_ne_counties["warnings"].unique()[1]

'This result does not meet the recommended sample size of 5,000 households.'

Rows with small sample sizes should be interpreted with caution. We can filter to avoid these rows.

In [19]:
df_ne_counties[df_ne_counties["number_of_households"] >= 5000]

Unnamed: 0,state_abbreviation,county_fips,county_name,number_of_households,metric,impact,units,warnings
0,NE,1,Adams,10654,avoided_premature_mortality_incidence,0.07,deaths_per_year,
8,NE,19,Buffalo,15496,avoided_premature_mortality_incidence,0.09,deaths_per_year,
11,NE,25,Cass,9201,avoided_premature_mortality_incidence,0.08,deaths_per_year,
20,NE,43,Dakota,6538,avoided_premature_mortality_incidence,0.04,deaths_per_year,
22,NE,47,Dawson,7506,avoided_premature_mortality_incidence,0.09,deaths_per_year,
25,NE,53,Dodge,12107,avoided_premature_mortality_incidence,0.14,deaths_per_year,
26,NE,55,Douglas,202179,avoided_premature_mortality_incidence,2.79,deaths_per_year,
32,NE,67,Gage,7748,avoided_premature_mortality_incidence,0.12,deaths_per_year,
38,NE,79,Hall,19855,avoided_premature_mortality_incidence,0.16,deaths_per_year,
53,NE,109,Lancaster,107264,avoided_premature_mortality_incidence,1.03,deaths_per_year,
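
Filtering on `number_of_households` relies on knowing the 5,000-household threshold. An alternative sketch is to keep only the rows where `warnings` is empty, which follows whatever rule the API applies. It assumes that missing warnings arrive as `None` or `NaN` once the response is converted to a data frame. The rows below are copied from the Nebraska output so the example runs on its own.

```python
import pandas as pd

SMALL_SAMPLE = "This result does not meet the recommended sample size of 5,000 households."

df_sample = pd.DataFrame(
    {
        "county_name": ["Adams", "Antelope", "Arthur", "Washington"],
        "number_of_households": [10654, 2663, 242, 7506],
        "impact": [0.07, 0.01, 0.01, 0.10],
        "warnings": [None, SMALL_SAMPLE, SMALL_SAMPLE, None],
    }
)

# Keep only the rows the API did not flag.
df_unflagged = df_sample[df_sample["warnings"].isna()]
print(df_unflagged["county_name"].tolist())
```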


## Query Multiple Metrics in Multiple States

Sometimes we want bulk data, not just data for one metric or one state. In this
example, we will query several metrics, one for each kind of emissions the model tracks, across three states, New York,
New Jersey, and Connecticut. We can do that by passing multiple values for these arguments to
the API.

In [20]:
payload_bulk = {
    "metrics": ["fine_particulate_matter", "ammonia", "nitrogen_oxides", "volatile_organic_compounds", "sulfur_dioxide"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
    "state_fips": ["36", "34", "09"]
}

In [None]:
response_bulk = requests.post(HEALTH_ADDRESS_URL, json=payload_bulk, headers=headers)

In [22]:
df_data_bulk = pd.DataFrame(response_bulk.json()["data"])
df_data_bulk

Unnamed: 0,state_abbreviation,number_of_households,metric,impact,units,warnings
0,CT,1290316,fine_particulate_matter,379625.25,kg_per_year,
1,NJ,3061262,fine_particulate_matter,72953.86,kg_per_year,
2,NY,6859572,fine_particulate_matter,744221.99,kg_per_year,
3,CT,1290316,ammonia,517453.72,kg_per_year,
4,NJ,3061262,ammonia,1881648.49,kg_per_year,
5,NY,6859572,ammonia,3675170.68,kg_per_year,
6,CT,1290316,nitrogen_oxides,4971106.16,kg_per_year,
7,NJ,3061262,nitrogen_oxides,10158879.27,kg_per_year,
8,NY,6859572,nitrogen_oxides,22374526.76,kg_per_year,
9,CT,1290316,volatile_organic_compounds,195265.63,kg_per_year,


Notice that now we have three rows per metric, one for each state we requested. And instead of the single value for `metric` that we got in earlier queries,
we have several, one for each air pollutant metric.
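
Long-format results like these are often easier to compare after pivoting so that each metric becomes a column. A minimal sketch with pandas, using six rows copied from the output above so that it runs on its own:

```python
import pandas as pd

df_long = pd.DataFrame(
    {
        "state_abbreviation": ["CT", "NJ", "NY", "CT", "NJ", "NY"],
        "metric": ["fine_particulate_matter"] * 3 + ["ammonia"] * 3,
        "impact": [379625.25, 72953.86, 744221.99, 517453.72, 1881648.49, 3675170.68],
    }
)

# One row per state, one column per metric.
df_wide = df_long.pivot(index="state_abbreviation", columns="metric", values="impact")
print(df_wide)
```

The same call should work on `df_data_bulk`, as long as each state and metric pair appears only once.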

## Grouping Results

In this example, instead of aggregating results for all homes together,
we will group the homes by ranges of square footage.

In [None]:
payload_groups = {
    "metrics": ["avoided_premature_mortality_incidence"],
    "upgrade": ["hvac__heat_pump_seer18_hspf10"],
    "state_fips": ["55"],
    "group_by": ["house_size_square_feet_binned"]
}

response_groups = requests.post(HEALTH_ADDRESS_URL, json=payload_groups, headers=headers)


The result we got back contains data grouped into bins based on the square footage of homes.

In [24]:
df_groups = pd.DataFrame(response_groups.json()["data"])
df_groups

Unnamed: 0,state_abbreviation,in_sqft_bin,number_of_households,metric,impact,units,warnings
0,WI,0-1499,1119372,avoided_premature_mortality_incidence,9.5,deaths_per_year,
1,WI,1500-2499,682809,avoided_premature_mortality_incidence,9.75,deaths_per_year,
2,WI,2500-5499,315497,avoided_premature_mortality_incidence,5.76,deaths_per_year,
3,WI,5500+,71913,avoided_premature_mortality_incidence,3.0,deaths_per_year,


If we sum up the impact, we will find that it adds up to the number from our very first query at the top of this notebook.

In [25]:
grouped_sum = df_groups["impact"].sum()
f"{grouped_sum:.1f}"

'28.0'
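
Rounding to one decimal place hides the small difference between the two totals. A sketch of a numeric check with `math.isclose`, using the four binned values above; the tolerance of 0.05 is an assumption chosen to allow for the two-decimal rounding in the output.

```python
import math

statewide_impact = 28.0                  # from the first Wisconsin query
binned_impacts = [9.5, 9.75, 5.76, 3.0]  # from the grouped query above

grouped_total = sum(binned_impacts)

# Each value is rounded to two decimals, so allow a small absolute tolerance.
totals_agree = math.isclose(grouped_total, statewide_impact, abs_tol=0.05)
print(f"{grouped_total:.2f} vs {statewide_impact:.2f}: agree={totals_agree}")
```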

## Combine our Data with U.S. Census Maps

In this final section, we will import a library called `censusdis` that will let us grab
U.S. Census data. If you don't have this in your virtual environment, you can

`pip install censusdis`

to get it.

### Additional Imports

In [26]:
import censusdis.data as ced
from censusdis.datasets import ACS5
import censusdis.maps as cem
from censusdis.states import NJ

### Download Map Data and Merge

In [27]:
gdf_nj_counties = ced.download(
    ACS5, 2023,
    ["NAME"],
    state=NJ, county="*",
    with_geometry=True
)

In [28]:
gdf_map = gdf_nj_counties.merge(df_data_nj_counties, left_on="COUNTY", right_on="county_fips")
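
This merge only matches rows when both key columns hold the same type and format. Census county codes are three-character, zero-padded strings such as `"001"`. If `county_fips` were to arrive as integers or unpadded strings, the merge would silently return no rows. A defensive sketch, using small stand-in frames in place of the real ones:

```python
import pandas as pd

# Stand-ins for gdf_nj_counties and df_data_nj_counties.
df_census = pd.DataFrame(
    {
        "COUNTY": ["001", "003"],
        "NAME": ["Atlantic County, New Jersey", "Bergen County, New Jersey"],
    }
)
df_api = pd.DataFrame({"county_fips": [1, 3], "impact": [2.48, 56.54]})

# Normalize the API key column to three-character, zero-padded strings.
df_api["county_fips"] = df_api["county_fips"].astype(str).str.zfill(3)

# A left merge keeps every census row, so unmatched counties are visible.
df_merged = df_census.merge(
    df_api, left_on="COUNTY", right_on="county_fips", how="left"
)

# Any census row without an API match would show up as a missing impact.
assert df_merged["impact"].notna().all()
```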