<div class="alert alert-info">

**Note:**

Most users of the EEmeter stack do not directly use the `eemeter`
package for loading their data. Instead, they use the datastore application,
which uses the eemeter internally. To learn to use the datastore, head
over to the datastore basic usage tutorial.

</div>

## Running a meter

We can load this input file into memory with the following:

In [1]:
import json

with open('meter_input_example.json', 'r') as f:  # modify to point to your downloaded input file.
    meter_input = json.load(f)

The file has a single trace of hourly electricity consumption data and some associated project data. Its contents look like this:

In [2]:
!head -15 meter_input_example.json

{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT", 
  "trace": {
    "type": "ARBITRARY_START", 
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED", 
    "unit": "KWH", 
    "records": [
      {
        "start": "2012-01-01T00:00:00+00:00", 
        "value": 0.5148, 
        "estimated": false
      }, 
      {
        "start": "2012-01-01T01:00:00+00:00", 
        "value": 0.9943, 


In [3]:
!tail -25 meter_input_example.json

        "value": 0.4756, 
        "estimated": false
      }, 
      {
        "start": "2016-07-18T23:00:00+00:00", 
        "value": 0.4472, 
        "estimated": false
      }
    ]
  }, 
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP", 
    "zipcode": "95625", 
    "modeling_period_group": {
      "baseline_period": {
        "start": null, 
        "end": "2014-10-01T00:00:00+00:00"
      }, 
      "reporting_period": {
        "start": "2014-11-01T00:00:00+00:00", 
        "end": null
      }
    }
  }
}
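
Before running the meter, it can be useful to sanity-check the loaded dictionary. The following is a minimal sketch in plain Python, not part of the `eemeter` API: `summarize_meter_input` is a hypothetical helper that assumes the input has the structure shown above, and the small `example_input` dictionary is made-up illustration data rather than the contents of the real file.

```python
from datetime import datetime


def summarize_meter_input(meter_input):
    """Summarize a SINGLE_TRACE_SIMPLE_PROJECT-style input dict (hypothetical helper)."""
    trace = meter_input["trace"]
    periods = meter_input["project"]["modeling_period_group"]
    # Parse the ISO 8601 "start" timestamps of all records.
    starts = [datetime.fromisoformat(r["start"]) for r in trace["records"]]
    return {
        "unit": trace["unit"],
        "interpretation": trace["interpretation"],
        "n_records": len(starts),
        "first_start": min(starts) if starts else None,
        "last_start": max(starts) if starts else None,
        "baseline_end": periods["baseline_period"]["end"],
        "reporting_start": periods["reporting_period"]["start"],
    }


# Made-up two-record input that mirrors the layout of the example file.
example_input = {
    "type": "SINGLE_TRACE_SIMPLE_PROJECT",
    "trace": {
        "type": "ARBITRARY_START",
        "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
        "unit": "KWH",
        "records": [
            {"start": "2012-01-01T00:00:00+00:00", "value": 1.0, "estimated": False},
            {"start": "2012-01-01T01:00:00+00:00", "value": 2.0, "estimated": False},
        ],
    },
    "project": {
        "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
        "zipcode": "95625",
        "modeling_period_group": {
            "baseline_period": {"start": None, "end": "2014-10-01T00:00:00+00:00"},
            "reporting_period": {"start": "2014-11-01T00:00:00+00:00", "end": None},
        },
    },
}

summary = summarize_meter_input(example_input)
print(summary["n_records"], summary["first_start"], summary["last_start"])
```

Calling `summarize_meter_input(meter_input)` on the dictionary loaded earlier should work the same way, provided the file follows this layout.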


Next, we can create a meter, model and formatter. These work in tandem to create a model of energy usage.

The `meter` coordinates loading the input data, matching it with appropriate weather data, and
passing it to the formatter and model. It then uses these to calculate a set of outputs, including
energy savings estimates such as annualized weather normalized usage.

The `formatter` formats the trace and project data for use within the model.

The `model` fits a model of energy usage to this formatted data which can be used, given covariate weather data, to predict or model energy usage over an arbitrary period of time.

In [4]:
from eemeter.ee.meter import EnergyEfficiencyMeterTraceCentric
from eemeter.modeling.models import SeasonalElasticNetCVModel
from eemeter.modeling.formatters import ModelDataFormatter

meter = EnergyEfficiencyMeterTraceCentric()
model = (SeasonalElasticNetCVModel, {"cooling_base_temp": 65, "heating_base_temp": 65})
formatter = (ModelDataFormatter, {"freq_str": "D"})

The meter we created is an instance of the EEmeter class which operates on single energy traces.

The model we created is a tuple of (model class, model keyword arguments), not an instantiation of the model. We do it this way to allow easy creation of multiple instances of the model class.

The formatter is, like the model, a tuple of (formatter class, formatter keyword arguments), for the same reason - we want to make multiple instances of the formatter class.
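
The (class, keyword arguments) pattern can be illustrated without `eemeter` at all. The sketch below uses a hypothetical `ToyModel` class and a hypothetical `instantiate` helper; how the meter actually builds its instances internally is not shown in this tutorial, so treat this only as an illustration of why passing a tuple makes it easy to create several independent instances.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not part of eemeter."""

    def __init__(self, cooling_base_temp=65, heating_base_temp=65):
        self.cooling_base_temp = cooling_base_temp
        self.heating_base_temp = heating_base_temp
        self.fitted = False


def instantiate(spec):
    """Build a fresh object from a (class, keyword arguments) tuple."""
    cls, kwargs = spec
    return cls(**kwargs)


model_spec = (ToyModel, {"cooling_base_temp": 65, "heating_base_temp": 65})

# Each call yields a new, independent instance built from the same spec.
first_model = instantiate(model_spec)
second_model = instantiate(model_spec)

# Changing one instance leaves the other untouched.
first_model.fitted = True
print(first_model.fitted, second_model.fitted)
```

Had we passed an already-constructed object instead, every use would share the same state.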

These can be used directly to "evaluate" the meter on the meter input. We'll store the output in `meter_output`.

In [5]:
meter_output = meter.evaluate(meter_input, model=model, formatter=formatter)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


This `meter_output` is quite verbose, so we'll export it to a JSON file, which is a bit more readable.

In [6]:
with open('meter_output_example.json', 'w') as f:  # change this path if desired.
    json.dump(meter_output, f, indent=2)

The content of this file will look something like this:

In [7]:
!head -40 meter_output_example.json

{
  "status": "SUCCESS",
  "failure_message": null,
  "logs": [
    "Using weather_source ISDWeatherSource(\"724828\")",
    "Using weather_normal_source TMY3WeatherSource(\"745160\")"
  ],
  "eemeter_version": "0.4.12",
  "model_class": "SeasonalElasticNetCVModel",
  "model_kwargs": {
    "heating_base_temp": 65,
    "cooling_base_temp": 65
  },
  "formatter_class": "ModelDataFormatter",
  "formatter_kwargs": {
    "freq_str": "D"
  },
  "weather_source_station": "724828",
  "weather_normal_source_station": "745160",
  "derivatives": [
    {
      "label": null,
      "derivative_interpretation": "annualized_weather_normal",
      "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
      "unit": "KWH",
      "baseline": {
        "label": "baseline",
        "value": 5812.037520876966,
        "lower": 109.76058020882165,
        "upper": 109.76058020882165,
        "n": 365,
        "demand_fixture": {
          "2015-01-01T00:00:00+00:00": 42.

Note how this file is organized: it contains a summary of the operations done during meter execution, including everything necessary to recreate the meter run, like the model class and keyword arguments used to initialize it, and the weather data (degrees F, called "demand_fixture") that was used in model building.
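
Since the output records a `status` and a `failure_message`, a small guard can be written before reading any results. This is a sketch under assumptions: `check_meter_output` is a hypothetical helper, it treats any status other than `"SUCCESS"` as a failure (the other possible status values are not listed in this tutorial), and the two example dictionaries are made up.

```python
def check_meter_output(meter_output):
    """Raise if the run did not succeed; otherwise return a short summary."""
    status = meter_output.get("status")
    if status != "SUCCESS":
        raise RuntimeError(
            "Meter run did not succeed (status {!r}): {}".format(
                status, meter_output.get("failure_message")))
    return {
        "eemeter_version": meter_output.get("eemeter_version"),
        "model_class": meter_output.get("model_class"),
        "weather_source_station": meter_output.get("weather_source_station"),
        "n_derivatives": len(meter_output.get("derivatives", [])),
    }


# Made-up outputs that only mimic the top-level keys shown above.
ok_output = {
    "status": "SUCCESS",
    "failure_message": None,
    "eemeter_version": "0.4.12",
    "model_class": "SeasonalElasticNetCVModel",
    "weather_source_station": "724828",
    "derivatives": [{}, {}],
}
# "FAILURE" is a hypothetical status value used for illustration only.
bad_output = {"status": "FAILURE", "failure_message": "example message"}

run_summary = check_meter_output(ok_output)
try:
    check_meter_output(bad_output)
    failed = False
except RuntimeError:
    failed = True
print(run_summary, failed)
```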

Not everyone has data ready to go, so if you are in that bucket, the next section covers how you can get started with data of your own.

## Data preparation

All we'll be doing in this section is creating a data structure that has the same format as the `meter_input_example.json` file above. To do so, we'll use the eemeter `EnergyTrace` helper structure.

Of course, this is not the only way to get data into the necessary format; use this for inspiration, but make changes as necessary to accommodate the particulars of your dataset.

In [8]:
# library imports
from eemeter.structures import EnergyTrace
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz

First, we import the energy data from the sample CSV and transform it into records:

In [9]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
records = [{
    "start": pytz.UTC.localize(row.date.to_pydatetime()),  # naive datetime -> UTC-aware
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]

The records we created look like this:

In [10]:
records[:3]  # the first three records

[{'estimated': False,
  'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
  'value': 57.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
  'value': 64.8},
 {'estimated': False,
  'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
  'value': 49.5}]
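
If pandas is not at hand, records with the same shape can be built with the standard library alone. The sketch below is an alternative to the cell above, not the tutorial's method. It assumes a CSV with `date`, `value` and `estimated` columns, dates written as `YYYY-MM-DD`, booleans written as `True`/`False`, and an empty `value` field for missing readings. The real sample file may differ on any of these points, and `rows_to_records` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime, timezone


def rows_to_records(csv_file):
    """Convert CSV rows into a list of {"start", "value", "estimated"} dicts."""
    records = []
    for row in csv.DictReader(csv_file):
        # Interpret the date as midnight UTC (an assumption about the data).
        start = datetime.strptime(row["date"], "%Y-%m-%d").replace(
            tzinfo=timezone.utc)
        # An empty field is treated as a missing reading.
        value = float(row["value"]) if row["value"].strip() else None
        records.append({
            "start": start,
            "value": value,
            "estimated": row["estimated"].strip().lower() == "true",
        })
    return records


# In-memory CSV standing in for a real file.
sample_csv = io.StringIO(
    "date,value,estimated\n"
    "2011-01-01,57.8,False\n"
    "2011-01-02,64.8,False\n"
    "2011-01-03,,True\n"
)
records_from_csv = rows_to_records(sample_csv)
print(records_from_csv[0])
```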

Next, we load our records into an `EnergyTrace`. We give it units `"KWH"` and interpretation `"ELECTRICITY_CONSUMPTION_SUPPLIED"`, which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer `ArbitraryStartSerializer` to show it how to interpret the records.

In [11]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer())

The energy trace data we created looks like this:

In [12]:
energy_trace.data[:3]  # first three records

Unnamed: 0,value,estimated
2011-01-01 00:00:00+00:00,57.8,False
2011-01-02 00:00:00+00:00,64.8,False
2011-01-03 00:00:00+00:00,49.5,False


Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (we don't use it in this tutorial, but this is how you might identify the saved meter results), the ZIP code of the building, and the dates retrofit work for this project started and completed.

In [13]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]

Here's what our project data looks like.

In [14]:
project_data

project_id                             ABC
zipcode                              50321
retrofit_start_date    2013-06-01 00:00:00
retrofit_end_date      2013-07-01 00:00:00
Name: 0, dtype: object

In [15]:
zipcode = "{:05d}".format(project_data.zipcode)
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)
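
The `"{:05d}"` format above restores leading zeros that are lost when a ZIP code is read as an integer, but it only accepts integers. As a hedged alternative, the hypothetical helper below accepts either an integer or a string, so it also works when the column was read with `dtype=str`.

```python
def normalize_zipcode(value):
    """Return a five-character, zero-padded ZIP code string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError("not a 5-digit ZIP code: {!r}".format(value))
    return text.zfill(5)


print(normalize_zipcode(50321))    # integer input
print(normalize_zipcode("02134"))  # string input keeps its leading zero
print(normalize_zipcode(2134))     # integer input that lost its leading zero
```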

Here's an example of how to get this data into the format the meter expects (exactly the format of `meter_input_example.json` from above).

In [16]:
from collections import OrderedDict

def serialize_meter_input(trace, zipcode, retrofit_start_date, retrofit_end_date):

    data = OrderedDict([
        ("type", "SINGLE_TRACE_SIMPLE_PROJECT"),
        ("trace", trace_serializer(trace)),
        ("project", project_serializer(zipcode, retrofit_start_date, retrofit_end_date)),
    ])

    return data


def trace_serializer(trace):
    data = OrderedDict([
        ("type", "ARBITRARY_START"),
        ("interpretation", trace.interpretation),
        ("unit", trace.unit),
        ("records", [
            OrderedDict([
                ("start", start.isoformat()),
                ("value", record.value if pd.notnull(record.value) else None),
                ("estimated", bool(record.estimated)),
            ])
            for start, record in trace.data.iterrows()
        ]),
    ])
    return data


def project_serializer(zipcode, retrofit_start_date, retrofit_end_date):
    data = OrderedDict([
        ("type", "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP"),
        ("zipcode", zipcode),
        ("modeling_period_group", OrderedDict([
            ("baseline_period", OrderedDict([
                ("start", None),
                ("end", retrofit_start_date.isoformat()),
            ])),
            ("reporting_period", OrderedDict([
                ("start", retrofit_end_date.isoformat()),
                ("end", None),
            ]))
        ]))
    ])
    return data

In [17]:
my_meter_input = serialize_meter_input(
    energy_trace, zipcode, retrofit_start_date, retrofit_end_date)

In [18]:
with open('my_meter_input.json', 'w') as f:
    json.dump(my_meter_input, f, indent=2)

In [19]:
!head -15 my_meter_input.json

{
  "type": "SINGLE_TRACE_SIMPLE_PROJECT",
  "trace": {
    "type": "ARBITRARY_START",
    "interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
    "unit": "KWH",
    "records": [
      {
        "start": "2011-01-01T00:00:00+00:00",
        "value": 57.8,
        "estimated": false
      },
      {
        "start": "2011-01-02T00:00:00+00:00",
        "value": 64.8,


In [20]:
!tail -25 my_meter_input.json

        "value": 73.0,
        "estimated": false
      },
      {
        "start": "2015-01-01T00:00:00+00:00",
        "value": null,
        "estimated": false
      }
    ]
  },
  "project": {
    "type": "PROJECT_WITH_SINGLE_MODELING_PERIOD_GROUP",
    "zipcode": "50321",
    "modeling_period_group": {
      "baseline_period": {
        "start": null,
        "end": "2013-06-01T00:00:00+00:00"
      },
      "reporting_period": {
        "start": "2013-07-01T00:00:00+00:00",
        "end": null
      }
    }
  }
}
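
The `modeling_period_group` block splits the trace around the retrofit: the baseline period ends when work starts, and the reporting period begins when work completes. The sketch below shows one way to read those periods. `period_for` is a hypothetical helper, and its boundary handling (start inclusive, end exclusive) is an assumption, not necessarily how the meter treats records that fall exactly on a boundary.

```python
from datetime import datetime


def _parse(timestamp):
    """Parse an ISO 8601 string, passing through None (open-ended bound)."""
    return datetime.fromisoformat(timestamp) if timestamp is not None else None


def period_for(moment, modeling_period_group):
    """Return "baseline", "reporting", or None for a timezone-aware datetime."""
    for name in ("baseline", "reporting"):
        period = modeling_period_group[name + "_period"]
        start, end = _parse(period["start"]), _parse(period["end"])
        after_start = start is None or moment >= start
        before_end = end is None or moment < end
        if after_start and before_end:
            return name
    return None  # falls in the gap while the retrofit was under way


group = {
    "baseline_period": {"start": None, "end": "2013-06-01T00:00:00+00:00"},
    "reporting_period": {"start": "2013-07-01T00:00:00+00:00", "end": None},
}

for ts in ("2012-03-01T00:00:00+00:00",
           "2013-06-15T00:00:00+00:00",
           "2014-01-01T00:00:00+00:00"):
    print(ts, period_for(datetime.fromisoformat(ts), group))
```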

Now we can run this through the meter exactly the same way we did before:

In [21]:
my_meter_output = meter.evaluate(my_meter_input, model=model, formatter=formatter)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


## Inspecting results

Now that we have some results at our fingertips, let's inspect them. We'll be using the meter output from the first example trace.

The output is mostly made up of a set of "derivatives". These aren't derivatives in the calculus sense - they're just derived from the model output.

Let's take a look at the first one.

In [22]:
derivative = meter_output["derivatives"][0]

We can take a peek at the contents by looking at the keys of the dict.

In [23]:
list(derivative.keys())

['label',
 'derivative_interpretation',
 'trace_interpretation',
 'unit',
 'baseline',
 'reporting']

This derivative has the interpretation 'annualized_weather_normal', which means it contains estimates of annualized, weather normalized usage in the baseline and reporting periods.

In [24]:
derivative['derivative_interpretation']

'annualized_weather_normal'
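
When an output holds several derivatives, looking them up by position gets fragile. As a sketch, the hypothetical helper below groups them by `derivative_interpretation` and `trace_interpretation`, the two keys shown above. It keeps a list per key because this tutorial does not say whether that pair is unique. The example output is made up; the second interpretation name in it is a placeholder, not a real eemeter value.

```python
def group_derivatives(meter_output):
    """Group derivatives by (derivative_interpretation, trace_interpretation)."""
    grouped = {}
    for item in meter_output.get("derivatives", []):
        key = (item["derivative_interpretation"], item["trace_interpretation"])
        grouped.setdefault(key, []).append(item)
    return grouped


# Made-up output with only the keys needed for grouping.
example_output = {
    "derivatives": [
        {"derivative_interpretation": "annualized_weather_normal",
         "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
         "unit": "KWH"},
        {"derivative_interpretation": "placeholder_interpretation",
         "trace_interpretation": "ELECTRICITY_CONSUMPTION_SUPPLIED",
         "unit": "KWH"},
    ]
}

grouped = group_derivatives(example_output)
annualized = grouped[
    ("annualized_weather_normal", "ELECTRICITY_CONSUMPTION_SUPPLIED")]
print(len(grouped), annualized[0]["unit"])
```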

The actual usage values are stored in 'baseline' and 'reporting'.

In [25]:
baseline = derivative['baseline']
reporting = derivative['reporting']

Each contains the estimated value, the `upper` and `lower` margins that are added to and subtracted from it to obtain the bounds, and the weather data used to calculate them (as "demand_fixture").

In [26]:
baseline_value = baseline['value']
baseline_upper_bound = baseline_value + baseline['upper']
baseline_lower_bound = baseline_value - baseline['lower']

In [27]:
baseline_lower_bound, baseline_value, baseline_upper_bound

(5702.2769406681437, 5812.0375208769656, 5921.7981010857875)

In [28]:
reporting_value = reporting['value']
reporting_upper_bound = reporting_value + reporting['upper']
reporting_lower_bound = reporting_value - reporting['lower']

In [29]:
reporting_lower_bound, reporting_value, reporting_upper_bound

(4473.0878250572732, 4612.2408703752799, 4751.3939156932865)

The savings can be calculated by subtracting the reporting value from the baseline value and propagating errors.

In [30]:
savings_value = baseline['value'] - reporting['value']
savings_upper_bound = savings_value + (baseline['upper']**2 + reporting['lower']**2)**0.5
savings_lower_bound = savings_value - (baseline['lower']**2 + reporting['upper']**2)**0.5

In [31]:
savings_lower_bound, savings_value, savings_upper_bound

(1022.5652904407402, 1199.7966505016857, 1377.0280105626312)
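
The same calculation can be wrapped in a function for reuse. This is a sketch, with `savings_with_bounds` as a hypothetical helper. It combines the margins in quadrature exactly as the cell above does, which implicitly assumes the baseline and reporting errors are independent. The numbers fed in are the baseline and reporting values printed earlier, with the reporting margin recovered from the printed bounds.

```python
import math


def savings_with_bounds(baseline, reporting):
    """Return (lower_bound, value, upper_bound) of baseline minus reporting."""
    value = baseline["value"] - reporting["value"]
    # Combine margins in quadrature; math.hypot(a, b) == sqrt(a**2 + b**2).
    upper_margin = math.hypot(baseline["upper"], reporting["lower"])
    lower_margin = math.hypot(baseline["lower"], reporting["upper"])
    return value - lower_margin, value, value + upper_margin


baseline_example = {
    "value": 5812.037520876966,
    "lower": 109.76058020882165,
    "upper": 109.76058020882165,
}
# Margin recovered from the printed reporting value and lower bound.
reporting_margin = 4612.2408703752799 - 4473.0878250572732
reporting_example = {
    "value": 4612.2408703752799,
    "lower": reporting_margin,
    "upper": reporting_margin,
}

lower, value, upper = savings_with_bounds(baseline_example, reporting_example)
percent_savings = 100.0 * value / baseline_example["value"]
print(lower, value, upper, percent_savings)
```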

These values are all in kilowatt hours, as indicated in the derivative structure.

In [32]:
derivative['unit']

'KWH'