# Getting started with the OpenDSM library

This Jupyter notebook is an interactive tutorial. It walks through loading data, using OpenEEmeter models, and plotting results. You'll run all the code yourself: execute a cell with `<shift><enter>`, and feel free to edit the code in these cells and dig deeper.

### Note on tutorial scope

This tutorial assumes the reader has properly installed Python and the OpenDSM package (`pip install opendsm`) and has a basic working knowledge of Python syntax and usage.

## Outline

This tutorial is a walkthrough of how to use the OpenDSM package. We'll cover the following:

- Background - why this library
- Loading data
- Fitting OpenEEmeter models
- Prediction of fit models
- Computing metered savings

The tutorial is focused on demonstrating how to use the package to run the Hourly, Daily, and Billing models.

## Background and Cautions

[OpenDSM](https://lfenergy.org/projects/opendsm/) is an open-source library which can be used to calculate avoided energy consumption on an individual meter. It pulls weather data using `EEweather`, fits models on training (baseline) data using the `OpenEEmeter` module, predicts using those models on testing (reporting) data, and corrects models for population-level changes via `GRIDmeter`. This tutorial will focus on the OpenEEmeter portion of this sequence. OpenEEmeter is the successor of the [CalTRACK](http://www.caltrack.org/) methodology. Initially OpenEEmeter was the complete open source implementation of the CalTRACK methodology, but it has since evolved beyond the CalTRACK models.

The `OpenEEmeter` module is built for flexibility and modularity. This makes the package easier to use, but it also makes it very possible to use the module in a way that does not comply with the approved methodology if the documentation and guidance are not followed. For example, the `OpenEEmeter` models set specific hard limits for standardization and consistency, yet they can be configured to change or entirely ignore those limits. This flexibility exists to facilitate research, development, and testing of potential changes to the models. Using `OpenEEmeter` with nondefault configurations therefore does not in itself guarantee compliance with the accepted methodology.

Some new users have assumed that the OpenDSM package constitutes an entire application suitable for running metering analytics at scale. This is not necessarily the case. It is designed instead to be embedded within other applications or to be used in one-off analyses. OpenDSM leaves it up to the user to decide when to use the provided tools and how to embed them within other applications. This limitation is an important consequence of the decision to make the models and tools as open and accessible as possible, since not all users will have access to enterprise-level infrastructure.

As you dive in, remember that this is a work in progress and that we welcome feedback and contributions. To contribute, please open an [issue](https://github.com/opendsm/opendsm/issues) or a [pull request](https://github.com/opendsm/opendsm/pulls) on GitHub.

### Jupyter housekeeping

*Note: these Jupyter cell magics enable some useful special features but are unrelated to eemeter.*

In [2]:
# inline plotting
%matplotlib inline

# allow live package editing
%load_ext autoreload
%autoreload 2

## Importing the OpenDSM library

Once OpenDSM has been installed, it can be imported as shown below.

This tutorial requires OpenDSM version > 1.2.x. Verify the version you have installed.

We will load the `eemeter` and `drmeter` modules separately.

In [3]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import opendsm as odsm
from opendsm import (
    eemeter as em,
    drmeter as dm,
)

print(f"OpenDSM {odsm.__version__}")

OpenDSM 1.2.6
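
The version can also be checked programmatically rather than by eye. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `meets_minimum` are hypothetical, and the floor of `(1, 2, 0)` is an assumption about what "version > 1.2.x" means, so adjust it to your needs.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Cannot parse version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple = (1, 2, 0)) -> bool:
    """True if `version` is at least `minimum` (an assumed floor, see above)."""
    parsed = parse_version(version)
    # Pad short versions such as "1.2" with zeros so the comparison is fair
    parsed = parsed + (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


try:
    installed = metadata.version("opendsm")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("opendsm is not installed in this environment")
else:
    print(f"opendsm {installed}: meets minimum = {meets_minimum(installed)}")
```

Reading the version through `importlib.metadata` avoids importing the package itself, which is convenient in setup scripts that run before any modeling code.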


## Loading data

The essential inputs to OpenDSM library functions are the following:

1. Meter baseline data named `observed`
2. Meter reporting data, also named `observed`
3. Temperature data from a nearby weather station, for both periods, named `temperature`
4. All data is expected to have a timezone-aware datetime index or column named `datetime`
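
To make the expected layout concrete, here is a minimal sketch that builds a small frame in that shape with pandas. The values are synthetic random numbers and the `America/Chicago` time zone is an arbitrary choice for illustration; only the column names and the timezone-aware `datetime` index follow the list above.

```python
import numpy as np
import pandas as pd

# One week of hourly timestamps with a timezone-aware index named "datetime"
index = pd.date_range(
    start="2018-01-01",
    periods=7 * 24,
    freq=pd.Timedelta(hours=1),
    tz="America/Chicago",
    name="datetime",
)

# Synthetic placeholder values, not real meter or weather data
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    {
        "observed": rng.uniform(10.0, 40.0, size=len(index)),     # meter usage
        "temperature": rng.uniform(20.0, 60.0, size=len(index)),  # degrees Fahrenheit
    },
    index=index,
)

# Basic sanity checks on the layout
assert df.index.tz is not None, "index must be timezone-aware"
assert {"observed", "temperature"} <= set(df.columns)
print(df.head())
```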

Users of the library are responsible for obtaining and formatting this data. For weather data, see [eeweather](https://eeweather.openee.io/), which helps match sites to weather stations and can pull and cache temperature data directly from public US data sources. Some samples ship with the library, and we'll load these first to save you the trouble of loading your own data.

We use data classes to store meter data, perform transforms, and validate the data to ensure compliance. The input to these data classes is either a [pandas](https://pandas.pydata.org/) `DataFrame`, when initializing the classes directly, or `Series`, when initializing them with `.from_series`.

The test data contained within the OpenDSM library is derived from [NREL ComStock](https://comstock.nrel.gov/) simulations.

If working with your own data instead of these samples, please refer directly to the excellent pandas documentation for instructions for loading data (e.g., [pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)).

### Important notes about data:
- *These models were developed and tested using Fahrenheit temperatures. Please convert your temperatures accordingly.*
- *All data is expected to be trimmed to its appropriate time period (baseline and reporting) and to contain no extraneous datetimes.*
- *Running `load_test_data` downloads the necessary files from the OpenDSM repository. This can be up to 150 MB.*
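
The first two notes amount to a small preprocessing step. The sketch below uses plain Python with made-up readings. The helper names are hypothetical, and treating the period as a half-open interval (start included, end excluded) is an assumption, not a documented OpenDSM rule.

```python
from datetime import datetime, timezone


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def trim_to_period(readings, start, end):
    """Keep only (timestamp, value) pairs with start <= timestamp < end."""
    return [(ts, value) for ts, value in readings if start <= ts < end]


# Hypothetical hourly temperatures in Celsius with timezone-aware timestamps
readings_c = [
    (datetime(2017, 12, 31, 23, tzinfo=timezone.utc), -2.0),
    (datetime(2018, 1, 1, 0, tzinfo=timezone.utc), 0.0),
    (datetime(2018, 1, 1, 1, tzinfo=timezone.utc), 10.0),
    (datetime(2019, 1, 1, 0, tzinfo=timezone.utc), 100.0),
]

baseline_start = datetime(2018, 1, 1, tzinfo=timezone.utc)
baseline_end = datetime(2019, 1, 1, tzinfo=timezone.utc)

# Drop readings outside the baseline year, then convert what remains
trimmed = trim_to_period(readings_c, baseline_start, baseline_end)
readings_f = [(ts, celsius_to_fahrenheit(value)) for ts, value in trimmed]
print(readings_f)
```

The same arithmetic applies unchanged to a pandas `temperature` column, and label-based slicing on a datetime index gives an equivalent trim.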

In [None]:
# Load in test data
#     - This data contains 100 different meters

df_baseline, df_reporting = odsm.test_data.load_test_data("monthly_treatment_data")

## BILLING EXAMPLE

Let's repeat this to show that the billing model works almost the same way.

Note that the only difference in how these are called is the specific data classes and model used. Everything else remains the same.

- As with the Daily data, Billing data should have hourly temperature.
- Billing data is reversed from a customer perspective. A customer pays for energy after using it, so each bill covers the month prior. To model this, the usage for a given month should be placed on that month's start date.
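
One way to read the second note: if your bills are stamped with their statement dates, each usage value has to move back to the start of the period it covers. The sketch below illustrates that idea with made-up numbers. The function name is hypothetical, and the assumption that each period starts on the previous statement date may not match your utility's billing cycle.

```python
from datetime import date


def restamp_to_period_start(bills, first_period_start):
    """Move each bill's usage from its statement date to its period start.

    `bills` is a list of (statement_date, usage) pairs sorted by date. Each
    billing period is assumed to start on the previous statement date.
    """
    restamped = []
    period_start = first_period_start
    for statement_date, usage in bills:
        restamped.append((period_start, usage))
        period_start = statement_date
    return restamped


# Hypothetical bills: the one issued on 1 Feb covers January usage, and so on
bills = [
    (date(2018, 2, 1), 910.0),
    (date(2018, 3, 1), 780.0),
    (date(2018, 4, 1), 650.0),
]

usage_by_start = restamp_to_period_start(bills, first_period_start=date(2018, 1, 1))
for start, usage in usage_by_start:
    print(start, usage)
```

The restamped dates would still need to become timezone-aware datetimes before being passed to the billing data classes.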

In [None]:
n = 15

# Select the n-th meter from the first index level
meter_id = df_baseline.index.get_level_values(0).unique()[n]

df_baseline_n = df_baseline.loc[meter_id]
df_reporting_n = df_reporting.loc[meter_id]

# Wrap the raw frames in the billing data classes
billing_baseline_data = em.BillingBaselineData(df_baseline_n, is_electricity_data=True)
billing_reporting_data = em.BillingReportingData(df_reporting_n, is_electricity_data=True)

# Fit on the baseline period, then predict over that same period
billing_model = em.BillingModel().fit(billing_baseline_data, ignore_disqualification=False)
billing_model.predict(billing_baseline_data).head()

The billing model's prediction function has additional functionality built in: it can aggregate from averaged daily data to `monthly` or `bimonthly`.

In [None]:
billing_model.predict(billing_baseline_data, aggregation="monthly")

Similarly, the plot function can aggregate to `monthly` or `bimonthly`.

At its core, though, this model is still a modified daily model. This is why the model prediction is not a straight line under either aggregation.

In [None]:
billing_model.plot(billing_baseline_data)

billing_model.plot(billing_baseline_data, aggregation="monthly")

billing_model.plot(billing_baseline_data, aggregation="bimonthly")

## HOURLY Energy Efficiency Model

Just like the daily and billing models, the hourly model follows the same calls, but with new data classes and a new model.

In [None]:
df_baseline, df_reporting = odsm.test_data.load_test_data("hourly_treatment_data")

In [None]:
n = 15

# Select the n-th meter from the first index level
meter_id = df_baseline.index.get_level_values(0).unique()[n]

df_baseline_n = df_baseline.loc[meter_id]
df_reporting_n = df_reporting.loc[meter_id]

# Only the temperature column is passed for the reporting period here
hourly_baseline_data = em.HourlyBaselineData(df_baseline_n, is_electricity_data=True)
hourly_reporting_data = em.HourlyReportingData(df_reporting_n[["temperature"]], is_electricity_data=True)

hourly_model = em.HourlyModel().fit(hourly_baseline_data)

In [None]:
hourly_model.predict(hourly_baseline_data)

## Hourly Demand Response Model

Finally, `drmeter` provides a demand response model meant for measuring short-term demand response events. It too follows the same API structure.

In [4]:
df_baseline, df_reporting = odsm.test_data.load_test_data("hourly_treatment_data")

In [6]:
n = 15

# Select the n-th meter from the first index level
meter_id = df_baseline.index.get_level_values(0).unique()[n]

df_baseline_n = df_baseline.loc[meter_id]
df_reporting_n = df_reporting.loc[meter_id]

# Only the temperature column is passed for the reporting period here
hourly_baseline_data = dm.CaltrackDRBaselineData(df_baseline_n, is_electricity_data=True)
hourly_reporting_data = dm.CaltrackDRReportingData(df_reporting_n[["temperature"]], is_electricity_data=True)

dr_model = dm.CaltrackDRModel().fit(hourly_baseline_data)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.loc[df["observed"] == 0, "observed"] = np.nan


In [7]:
dr_model.predict(hourly_baseline_data)

| datetime | temperature | observed | predicted | predicted_uncertainty |
|---|---|---|---|---|
| 2018-01-01 00:00:00-06:00 | -5.08 | 30.336523 | 31.670276 | 29.326114 |
| 2018-01-01 01:00:00-06:00 | -5.98 | 37.355408 | 32.070312 | 29.326114 |
| 2018-01-01 02:00:00-06:00 | -7.06 | 39.376695 | 32.714753 | 29.326114 |
| 2018-01-01 03:00:00-06:00 | -7.06 | 39.911217 | 32.758016 | 29.326114 |
| 2018-01-01 04:00:00-06:00 | -7.06 | 39.406699 | 32.789302 | 29.326114 |
| ... | ... | ... | ... | ... |
| 2018-12-31 19:00:00-06:00 | 33.98 | 15.388135 | 11.477984 | 29.326114 |
| 2018-12-31 20:00:00-06:00 | 33.98 | 15.261147 | 12.202178 | 29.326114 |
| 2018-12-31 21:00:00-06:00 | 33.98 | 13.970028 | 12.383450 | 29.326114 |
| 2018-12-31 22:00:00-06:00 | 33.92 | 14.095850 | 12.120746 | 29.326114 |


## How to calculate savings or avoided energy use

Savings calculation functions are not provided in `eemeter`. Basic savings are the sum, over one reporting year, of the baseline model's prediction minus the observed reporting-period usage.

- Savings = sum(predicted_baseline - observed_reporting)
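
As a minimal sketch of that formula, the hypothetical helper below sums the per-interval differences. The numbers are made up, and the choice to reject missing values instead of skipping them is an assumption for illustration, not OpenDSM behavior.

```python
import math


def metered_savings(predicted_baseline, observed_reporting):
    """Sum of (baseline-model prediction - observed usage) over the period."""
    predicted = list(predicted_baseline)
    observed = list(observed_reporting)
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must cover the same intervals")
    if any(math.isnan(value) for value in predicted + observed):
        raise ValueError("handle missing values before computing savings")
    return sum(p - o for p, o in zip(predicted, observed))


# Hypothetical values for three reporting-period intervals
predicted = [31.5, 32.0, 30.5]
observed = [28.0, 29.5, 30.0]
print(metered_savings(predicted, observed))
```

Because the helper accepts any iterable, two pandas columns can be passed directly, for example `metered_savings(df["predicted"], df["observed"])`, where `df` is a hypothetical frame holding reporting-period predictions alongside the observed usage for the same intervals.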

