# Residential Energy Consumption Survey

In this notebook, we perform some basic analyses of the [Residential Energy Consumption Survey](https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata) (RECS) 2015 microdata.


## Input Datasets

| Dataset | File Name | Data Cleaning |
| --- | --- | --- |
| RECS 2015 Dataset | recs2015_public_v4.csv | |
| RECS 2015 Data Dictionary| codebook_publicv4.xlsx | |

```python
from datetime import date
import pandas as pd
import numpy as np
```

```python
# Load the 2015 RECS public-use microdata (one row per sampled household)
df = pd.read_csv("../Data/2015/recs2015_public_v4.csv")
```

## Weights

From the [EIA Website](https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata):

- Data were collected from more than 5,600 households selected at random using a complex multistage, area-probability sample design. The sample represents 118.2 million U.S. households.

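Each row in the dataset is one sampled household. Its weight, stored in `NWEIGHT`, is the number of U.S. households that this sampled household represents. Summing `NWEIGHT` over all rows therefore estimates the total number of households (about 118.2 million), while the row count only gives the sample size.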
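To see why the weights matter, here is a minimal sketch on a hypothetical three-row sample. The values are invented for illustration and are not taken from RECS. The row count and the unweighted share describe the sample. The weighted sum and the weighted share estimate the population.

```python
import pandas as pd

# Hypothetical toy sample (invented values, not RECS data): each row stands in
# for NWEIGHT households in the population.
sample = pd.DataFrame({
    "FUELHEAT": [1, 5, 5],            # 1 = natural gas, 5 = electricity
    "NWEIGHT": [60_000.0, 20_000.0, 20_000.0],
})

# Estimated population size: the sum of the weights, not the number of rows.
population = sample["NWEIGHT"].sum()

# Unweighted share: every sampled row counts as one household.
unweighted_gas = (sample["FUELHEAT"] == 1).mean()

# Weighted share: each row counts for the households it represents.
weighted_gas = sample.loc[sample["FUELHEAT"] == 1, "NWEIGHT"].sum() / population

print(f"Rows: {len(sample)}, estimated households: {population:,.0f}")
print(f"Unweighted gas share: {unweighted_gas:.0%}")
print(f"Weighted gas share:   {weighted_gas:.0%}")
```

Here one gas-heated row out of three gives an unweighted share of 33%. Because that row represents 60,000 of the 100,000 households, the weighted estimate is 60%. With the real data, `df['NWEIGHT'].sum()` plays the role of `population`.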

```python
# Number of sampled households (rows)
rows = df.shape[0]

print(f"Number of data points : {rows:,.0f}")
```

```
Number of data points : 5,686
```


```python
# Estimated total number of U.S. households: the sum of all weights
total_wgt = df['NWEIGHT'].sum()

print(f"Sum of all weights : {total_wgt:,.0f}")
```

```
Sum of all weights : 118,208,250
```


### What percentage of households use natural gas as their main space heating fuel?

The data dictionary `codebook_publicv4.xlsx` maps each code of the variable `FUELHEAT` (main space heating fuel) to a description:

| Code | Description |
| --- | --- |
| 1 | Natural gas from underground pipes |
| 2 | Propane (bottled gas) |
| 3 | Fuel oil/kerosene |
| 5 | Electricity |
| 7 | Wood (cordwood or pellets) |
| 21 | Some other fuel |
| -2 | Not applicable |
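A mapping like the table above can also be applied to every code at once instead of filtering one code at a time. The sketch below assumes only pandas. It groups the rows by `FUELHEAT`, sums `NWEIGHT` within each code, and divides by the total weight. The helper name `weighted_shares` and the toy rows are hypothetical. On the full dataset, `weighted_shares(df, 'FUELHEAT', FUELHEAT_LABELS)` computes the same ratio as the natural-gas filter used below, for every fuel at once.

```python
import pandas as pd

# Code-to-description mapping copied from the FUELHEAT table above.
FUELHEAT_LABELS = {
    1: "Natural gas from underground pipes",
    2: "Propane (bottled gas)",
    3: "Fuel oil/kerosene",
    5: "Electricity",
    7: "Wood (cordwood or pellets)",
    21: "Some other fuel",
    -2: "Not applicable",
}


def weighted_shares(data: pd.DataFrame, column: str, labels: dict,
                    weight: str = "NWEIGHT") -> pd.Series:
    """Hypothetical helper: weighted share of households for each code of `column`."""
    totals = data.groupby(column)[weight].sum()   # estimated households per code
    shares = totals / data[weight].sum()          # divide by all households
    # Swap numeric codes for readable labels; unknown codes keep their number.
    shares.index = [labels.get(code, code) for code in shares.index]
    return shares.sort_values(ascending=False)


# Hypothetical toy rows (not RECS values); with the real data, pass `df` instead.
toy = pd.DataFrame({
    "FUELHEAT": [1, 1, 5, -2],
    "NWEIGHT": [30.0, 20.0, 40.0, 10.0],
})
shares = weighted_shares(toy, "FUELHEAT", FUELHEAT_LABELS)
print(shares)
```

Because "Not applicable" (-2) stays in both numerator groups and denominator, the shares always sum to 1. That makes a quick sanity check on the weights.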


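To estimate the percentage of households that heat mainly with natural gas, we sum the weights of the sampled households with `FUELHEAT == 1`. This estimates the number of such households, $X_{gas}$. We then divide by the estimated total number of households, $X_{total}$, which is the sum of all weights (about 118.2 million):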

$$ P = \frac{X_{gas}}{X_{total}} $$
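The equation can be wrapped in a small reusable function. In this sketch, the function name `weighted_share` and the toy rows are hypothetical. $X_{gas}$ is the sum of `NWEIGHT` over the rows selected by a boolean mask, and $X_{total}$ is the sum over all rows. The same quantity is also a weighted mean of a 0/1 indicator, which `np.average` computes directly.

```python
import numpy as np
import pandas as pd


def weighted_share(data: pd.DataFrame, mask: pd.Series,
                   weight: str = "NWEIGHT") -> float:
    """Hypothetical helper: P = X_subset / X_total, each X a sum of survey weights."""
    x_subset = data.loc[mask, weight].sum()   # e.g. X_gas: households meeting the condition
    x_total = data[weight].sum()              # X_total: all households represented
    return x_subset / x_total


# Hypothetical toy rows (not RECS values).
toy = pd.DataFrame({"FUELHEAT": [1, 5, 1, 2],
                    "NWEIGHT": [10.0, 30.0, 40.0, 20.0]})

p = weighted_share(toy, toy["FUELHEAT"] == 1)

# Equivalent form: weighted mean of a 0/1 indicator for "heats with natural gas".
indicator = (toy["FUELHEAT"] == 1).astype(float)
p_alt = np.average(indicator, weights=toy["NWEIGHT"])

print(f"{p:.0%} {p_alt:.0%}")
```

On the full dataset, `weighted_share(df, df['FUELHEAT'] == 1)` should match the natural-gas estimate computed below. `weighted_share(df, df['COOLTYPE'].isin([1, 3]))` should likewise match the central-cooling estimate.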

```python
# Keep households whose main space heating fuel is natural gas (FUELHEAT == 1)
gas_df = df.loc[df['FUELHEAT'] == 1]

# X_gas: estimated number of such households
gas_wgt = gas_df['NWEIGHT'].sum()

# P = X_gas / X_total
gas_pct = gas_wgt / total_wgt

print(f"Percentage natural gas : {gas_pct:,.0%}")
```

```
Percentage natural gas : 49%
```


### What percentage of households use central air conditioning?

Households with a central system only (`COOLTYPE == 1`) and households with both a central system and window/wall units (`COOLTYPE == 3`) are both counted.

```python
# COOLTYPE == 1: central unit only; COOLTYPE == 3: central unit and window unit(s)
central_df = df.loc[df['COOLTYPE'].isin([1, 3])]

# Estimated number of households with a central AC unit
cool_wgt = central_df['NWEIGHT'].sum()

# Share of all households
cool_pct = cool_wgt / total_wgt

print(f"Percentage central cooling: {cool_pct:,.0%}")
```

```
Percentage central cooling: 64%
```


### What percentage of households have a water heater storage tank of 50 gallons or more?

In the data dictionary, a large storage tank (50 gallons or more) corresponds to `WHEATSIZ == 3`. The denominator is all households, including those without a storage tank.

```python
# Keep households whose main water heater has a large storage tank (WHEATSIZ == 3)
wh_df = df.loc[df['WHEATSIZ'] == 3]

# Estimated number of such households
wh_wgt = wh_df['NWEIGHT'].sum()

# Share of all households
wh_pct = wh_wgt / total_wgt

print(f"Percentage of households with a storage water heater of 50+ gal: {wh_pct:,.0%}")
```

```
Percentage of households with a storage water heater of 50+ gal: 27%
```
