# All About REM API Statistics

To estimate the impact of an electrical upgrade
on a home, the REM API runs a
[Monte Carlo](https://en.wikipedia.org/wiki/Monte_Carlo_method)
simulation over one hundred theoretical homes that closely
resemble the target home and, taken as a whole,
probabilistically represent it.

We do this because we generally don't know everything about each
home. Rewiring America has a large database, based on public records,
that tells us things like when a home was built and how many square feet
it is. But it does not give us all the details that we need to accurately
predict savings. So we instead use a set of simulated homes that we know
more about, predict the energy consumption for each of them, 
and compute statistics across the set of homes to decide what is likely
to happen in the given home.

This means that instead of getting one answer for
how much less fuel oil the home will use or how
much money the homeowner will save each year, the API generates
many answers to the question, each based on one
theoretical home.

In the return value from the API, we get statistics across
those theoretical homes including the mean, median, and 20th
and 80th percentile values for savings, emissions, and so on.
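
As a rough illustration of what these summary statistics mean, the sketch below computes them over a made-up set of per-home savings. The sample values, the `summarize` helper, and the choice of percentile interpolation are assumptions made for this example; the API's actual simulation and percentile method are not shown in this notebook.

```python
import random
import statistics


def summarize(samples):
    """Summarize a list of per-home values the way the API response is laid out."""
    # quantiles(n=5) returns the 20th, 40th, 60th, and 80th percentile cut points.
    cuts = statistics.quantiles(samples, n=5, method="inclusive")
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "percentile_20": cuts[0],
        "percentile_80": cuts[3],
    }


# Hypothetical annual dollar savings for a set of simulated homes.
rng = random.Random(0)
simulated_savings = [rng.gauss(1000.0, 250.0) for _ in range(100)]

summary = summarize(simulated_savings)
print(summary)
```

Each key in `summary` mirrors a key in the API's return value, but the numbers here come from random draws, not from any real home.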

The purpose of this notebook is to illustrate how those statistics
work, describe how they are computed, and discuss how and why
they should or should not be used in particular ways.

## Imports and Configuration

In [56]:
import requests
from pathlib import Path

In [72]:
HOST = "https://api.rewiringamerica.org"
# HOST = "http://127.0.0.1:8000"  # Uncomment to test against a local development server.
REM_ADDRESS_URL = f"{HOST}/api/v1/rem/address"

API_KEY = None  # Put your API key here, or better yet in the file ~/.rwapi/api_key.txt

In [73]:
if API_KEY is None:
    api_key_path = Path.home() / ".rwapi" / "api_key.txt"

    if api_key_path.is_file():
        with open(api_key_path) as f:
            API_KEY = f.read().strip()

## Parameters

The address we are interested in, the upgrade we want to model, and the home's current heating fuel.

In [74]:
address = '165 Hope St, Providence, RI 02906'
upgrade = "med_eff_hp_hers_sizing_no_setback"
heating_fuel = 'fuel_oil'

## Make the Request

In [75]:
headers = {"Authorization": f"Bearer {API_KEY}"}

response = requests.get(
    url=REM_ADDRESS_URL,
    headers=headers,
    params=dict(
        address=address, upgrade=upgrade, heating_fuel=heating_fuel
    )
)

In [76]:
response

<Response [200]>

## Pull out the results

We are specifically interested in the total dollar savings.

In [77]:
data = response.json()

In [78]:
fuel_results = data["fuel_results"]

In [79]:
fuel_results['fuel_oil']['baseline']

{'energy': {'mean': {'value': 1896.4883, 'units': 'gallon'},
  'median': {'value': 1835.3017, 'units': 'gallon'},
  'percentile_20': {'value': 1421.099, 'units': 'gallon'},
  'percentile_80': {'value': 2339.4447, 'units': 'gallon'}},
 'emissions': {'mean': {'value': 23339.9583, 'units': 'kgCO2e'},
  'median': {'value': 22586.9396, 'units': 'kgCO2e'},
  'percentile_20': {'value': 17489.3734, 'units': 'kgCO2e'},
  'percentile_80': {'value': 28791.3943, 'units': 'kgCO2e'}},
 'cost': {'mean': {'value': 7583.0629, 'units': '$'},
  'median': {'value': 7338.41, 'units': '$'},
  'percentile_20': {'value': 5682.2303, 'units': '$'},
  'percentile_80': {'value': 9354.2135, 'units': '$'}}}

In [80]:
data['emissions_factors']

{'electricity': {'value': 0.1241, 'units': 'kgCO2e/kWh'},
 'natural_gas': {'value': 6.6798, 'units': 'kgCO2e/therm'},
 'fuel_oil': {'value': 12.3069, 'units': 'kgCO2e/gallon'},
 'propane': {'value': 7.3776, 'units': 'kgCO2e/gallon'}}
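
The emissions numbers and the emissions factors appear to be tied together in a simple way. The sketch below assumes that emissions are energy consumption multiplied by the fuel's emissions factor (the API's exact calculation is not shown here) and checks that assumption against the baseline fuel oil numbers printed above.

```python
# Values copied from the outputs above.
fuel_oil_factor = 12.3069             # kgCO2e per gallon
baseline_mean_gallons = 1896.4883     # mean baseline fuel oil energy
reported_mean_emissions = 23339.9583  # mean baseline fuel oil emissions, kgCO2e

# Assumed relationship: emissions = energy * emissions factor.
estimated_mean_emissions = baseline_mean_gallons * fuel_oil_factor
difference = reported_mean_emissions - estimated_mean_emissions

print(round(estimated_mean_emissions, 2), round(difference, 2))
```

The two values agree to within a small fraction of a kilogram, which is consistent with the factor having been rounded to four decimal places for display.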

## Let's look at fuel oil, since that's what we are replacing

It's a fairly big block of nested dictionaries, but we will go through it piece by
piece.

In [81]:
fuel_oil_results = fuel_results["fuel_oil"]

In [82]:
fuel_oil_results

{'baseline': {'energy': {'mean': {'value': 1896.4883, 'units': 'gallon'},
   'median': {'value': 1835.3017, 'units': 'gallon'},
   'percentile_20': {'value': 1421.099, 'units': 'gallon'},
   'percentile_80': {'value': 2339.4447, 'units': 'gallon'}},
  'emissions': {'mean': {'value': 23339.9583, 'units': 'kgCO2e'},
   'median': {'value': 22586.9396, 'units': 'kgCO2e'},
   'percentile_20': {'value': 17489.3734, 'units': 'kgCO2e'},
   'percentile_80': {'value': 28791.3943, 'units': 'kgCO2e'}},
  'cost': {'mean': {'value': 7583.0629, 'units': '$'},
   'median': {'value': 7338.41, 'units': '$'},
   'percentile_20': {'value': 5682.2303, 'units': '$'},
   'percentile_80': {'value': 9354.2135, 'units': '$'}}},
 'upgrade': {'energy': {'mean': {'value': 66.8291, 'units': 'gallon'},
   'median': {'value': 0.0, 'units': 'gallon'},
   'percentile_20': {'value': 0.0, 'units': 'gallon'},
   'percentile_80': {'value': 127.0023, 'units': 'gallon'}},
  'emissions': {'mean': {'value': 822.4611, 'units': 

The results are divided into three sections:

- `baseline` contains estimates of what was consumed in a typical year before the upgrade
- `upgrade` contains estimates of what is consumed in a typical year after the upgrade
- `delta` contains estimates of the change in consumption in a typical year due to the upgrade

In [83]:
result_keys = fuel_oil_results.keys()
result_keys

dict_keys(['baseline', 'upgrade', 'delta'])
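
Because every section has the same nested shape (section, then metric, then statistic), it can be convenient to flatten the results into rows. The `flatten_results` helper below is a hypothetical convenience for this notebook, not part of the API, and the small `sample` dictionary only reuses the mean energy values that appear in this notebook's outputs.

```python
def flatten_results(results):
    """Turn {section: {metric: {stat: {'value', 'units'}}}} into a list of rows."""
    rows = []
    for section, metrics in results.items():
        for metric, stats in metrics.items():
            for stat, entry in stats.items():
                rows.append((section, metric, stat, entry["value"], entry["units"]))
    return rows


# A cut-down stand-in for fuel_oil_results with the same nesting.
sample = {
    "baseline": {"energy": {"mean": {"value": 1896.4883, "units": "gallon"}}},
    "upgrade": {"energy": {"mean": {"value": 66.8291, "units": "gallon"}}},
    "delta": {"energy": {"mean": {"value": -1829.6592, "units": "gallon"}}},
}

rows = flatten_results(sample)
for row in rows:
    print(row)
```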

Now we are going to pull out some mean numbers for all three.

In [84]:
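def results_for_stat(results: dict, metric: str, stat: str) -> dict:
    """A helper function to pull one statistic for one metric out of each section."""
    # Iterate over the sections of `results` itself (baseline, upgrade, delta)
    # rather than depending on the global `result_keys`.
    return {
        section: values[metric][stat]
        for section, values in results.items()
    }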

### Mean

Let's start with the mean. For many applications, like presenting a single number to
a consumer contemplating an upgrade, this is the place we might start.

In [85]:
mean_energy = results_for_stat(fuel_oil_results, "energy", "mean")
mean_energy

{'baseline': {'value': 1896.4883, 'units': 'gallon'},
 'upgrade': {'value': 66.8291, 'units': 'gallon'},
 'delta': {'value': -1829.6592, 'units': 'gallon'}}

### Mean post-upgrade consumption is not zero?

The first thing to notice is that mean consumption of fuel oil after the upgrade is small but not exactly zero.
The reason is that some homes in the sample space we constructed use fuel oil for a purpose other than
space heating, such as water heating, and so continue to use it after the upgrade. In fact, while the
samples are not exposed in this API, if we look under the hood, 89 out of 200 (45%) of the theoretical
samples have fuel oil water heating, since heating and water heating fuels are highly correlated and
fuel oil is prevalent for both in Rhode Island.

### Mean baseline, upgrade, and change

Now let's look at how the upgrade changed consumption. For the mean of the distribution, consumption after the upgrade should equal baseline consumption plus the change in consumption.

In [86]:
round(mean_energy['upgrade']['value'] - (mean_energy['baseline']['value'] + mean_energy['delta']['value']), 2)

0.0
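
The identity holds for the mean because averaging is linear. As a short derivation, assume each of the $N$ simulated homes has a baseline value $b_i$, a post-upgrade value $u_i$, and a per-home change $d_i = u_i - b_i$ (this notation, and the assumption that the change is computed per home, are introduced here for illustration):

$$
\frac{1}{N}\sum_{i=1}^{N} u_i
= \frac{1}{N}\sum_{i=1}^{N} \left(b_i + d_i\right)
= \frac{1}{N}\sum_{i=1}^{N} b_i + \frac{1}{N}\sum_{i=1}^{N} d_i
$$

Up to rounding of the reported values, the mean after the upgrade is therefore the mean baseline plus the mean change.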

### Median

Now let's do the same analysis, but on the median values.

In [87]:
median_energy = results_for_stat(fuel_oil_results, "energy", "median")
median_energy

{'baseline': {'value': 1835.3017, 'units': 'gallon'},
 'upgrade': {'value': 0.0, 'units': 'gallon'},
 'delta': {'value': -1734.6599, 'units': 'gallon'}}

Unlike the mean, the median of the distribution uses no fuel oil after the upgrade. This is because only a
minority of homes in the sample (in this case, we know it was 45%) use fuel oil for things other than heating,
so more than half of the homes use none at all once the upgrade is done.

Indeed, we can look at the 20th percentile and see that it is also still zero after the upgrade.

In [88]:
results_for_stat(fuel_oil_results, "energy", "percentile_20")["upgrade"]

{'value': 0.0, 'units': 'gallon'}
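
To see how a mean can be positive while the median and 20th percentile are both zero, here is a minimal sketch with ten made-up homes, four of which still burn fuel oil after the upgrade. The numbers are hypothetical and are not the API's underlying samples, and the percentile interpolation method is an assumption.

```python
import statistics

# Hypothetical post-upgrade fuel oil use in gallons: six homes use none,
# four still use some (for example, for water heating).
post_upgrade_gallons = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 120.0, 150.0, 180.0, 210.0]

mean_use = statistics.mean(post_upgrade_gallons)
median_use = statistics.median(post_upgrade_gallons)

# quantiles(n=5) returns the 20th, 40th, 60th, and 80th percentile cut points.
cuts = statistics.quantiles(post_upgrade_gallons, n=5, method="inclusive")
percentile_20, percentile_80 = cuts[0], cuts[3]

print(mean_use, median_use, percentile_20, percentile_80)
```

The zeros dominate the lower half of the distribution, so the median and 20th percentile are zero, while the few homes that still use fuel oil pull the mean and the 80th percentile above zero.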

## Emissions

In addition to consumption of various fuels, the model estimates emissions before and after the upgrade in kgCO2e (a unit that also accounts for other greenhouse gases such as methane). Let's look at some of those numbers for total household emissions, including those for all fuels and all end uses in the home.

In [89]:
total_results = fuel_results["total"]

In [90]:
median_emissions = results_for_stat(total_results, "emissions", "median")

In [91]:
median_emissions

{'baseline': {'value': 24494.8868, 'units': 'kgCO2e'},
 'upgrade': {'value': 4827.1296, 'units': 'kgCO2e'},
 'delta': {'value': -19006.6057, 'units': 'kgCO2e'}}

### Median baseline, upgrade, and change

Now let's see if the median behaves like the mean did when we add things up.
(Spoiler alert: it does not.)

In [92]:
round(median_emissions['upgrade']['value'] - (median_emissions['baseline']['value'] + median_emissions['delta']['value']), 2)

-661.15

What happened? The reason the numbers don't quite add up has to do with how we compute the medians.
`median_emissions['upgrade']` is a median taken over the total emissions of every home in the distribution
after the upgrade. `median_emissions['baseline']` is the median total emissions of every home in the distribution
before the upgrade. But because factors like insulation affect the amount of heat needed, which affects emissions
differently before and after the upgrade, homes can move around in the distribution, which can affect the median.
`median_emissions['delta']` is the median of the change in emissions, which is therefore not necessarily the difference
of the median emissions before and after the upgrade.
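
A tiny example makes the effect concrete. The three homes below are hypothetical, chosen so that their ordering changes after the upgrade; they are not the API's samples, and computing the change per home as upgrade minus baseline is an assumption of this sketch.

```python
import statistics

# Hypothetical per-home emissions in kgCO2e, listed in the same home order.
baseline = [100.0, 200.0, 300.0]
upgrade = [90.0, 20.0, 50.0]

# Per-home change: upgrade minus baseline.
delta = [u - b for b, u in zip(baseline, upgrade)]

# For the mean, upgrade == baseline + delta (up to floating point error).
mean_gap = statistics.mean(upgrade) - (
    statistics.mean(baseline) + statistics.mean(delta)
)

# For the median, the same identity does not have to hold.
median_gap = statistics.median(upgrade) - (
    statistics.median(baseline) + statistics.median(delta)
)

print(round(mean_gap, 2), median_gap)
```

Here the home with the median baseline (200) is not the home with the median post-upgrade emissions (50), and neither is guaranteed to have the median change, so the three medians come from different homes and need not add up.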