Specify maximum baseline and reporting period lengths #68

hshaban · 2018-02-01T10:59:41Z

Minimum baseline and reporting period lengths are defined in Caltrack’s data sufficiency requirements, but long baseline/reporting periods produce significantly different model fits than constrained periods (due to naturally occurring savings, non-routine events, etc.).

We propose setting a limit on the data included in the baseline and reporting periods: 12 months for daily data and 24 months for monthly data.

steevschmidt · 2018-02-01T22:45:43Z

HEA has found that 18 months of daily data is optimal for heating and cooling regressions. This amount provides more accuracy than just 12 months, but does not overwhelm recent trends (a risk with longer periods).

mcgeeyoung · 2018-02-02T16:30:59Z

Interesting. It's a little counterintuitive to select 18 months (given the likelihood of over-fitting to whichever season gets counted twice). Do you have test results that show why 18 months yields better results?

jskromer · 2018-02-06T23:13:58Z

Is any thought given to known or predicted variations in operational characteristics? For example, if you know that the building operation/schedule was recently changed, you probably wouldn't want to include the data from before the change (you'd need to do a preemptive non-routine adjustment). One would prefer to have data on the operational modes that one expects to see in the reporting period.

danrubado · 2018-02-13T19:17:51Z

Energy Trust's view is that data should be selected in increments of 12 months so that a seasonal bias is not introduced when fitting the model, as pointed out by McGee. We also have a preference for limiting the baseline and reporting periods to the 12 months of data closest to the treatment period to limit the impact of factors unrelated to the treatment, as noted by Hassan. You may get better fit statistics using 24 months of data, but it may not represent the pre-retrofit conditions as closely. A longer time series may contain a blend of current and past physical and operational conditions at the site.

hshaban · 2018-02-15T12:08:12Z

Proposed test methodology:

This test should be run separately for daily and monthly data.
Caltrack models will be fit to the baseline periods of all test buildings for different lengths of the baseline period in 3-month increments (i.e. 12, 15, 18, 21 and 24 months).
Total energy usage predictions using the baseline models will be calculated for a 12-month reporting period following a fake project date.
Error metrics (CVRMSE and NMBE) will also be calculated for these predictions.
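
For readers unfamiliar with the two error metrics named above, here is a minimal sketch of how they are commonly computed. It is not taken from the Caltrack codebase: the function names, the optional `n_params` degrees-of-freedom adjustment, and the sign convention for NMBE (positive when the model under-predicts) are all assumptions made for illustration.

```python
import math


def cvrmse(actual, predicted, n_params=0):
    """Coefficient of variation of the RMSE, as a fraction of mean actual usage.

    n_params is an assumed, optional degrees-of-freedom adjustment;
    0 gives the plain out-of-sample form.
    """
    n = len(actual)
    if n != len(predicted) or n - n_params <= 0:
        raise ValueError("inputs must have equal length and n > n_params")
    mean_actual = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (n - n_params)) / mean_actual


def nmbe(actual, predicted, n_params=0):
    """Normalized mean bias error; positive when the model under-predicts.

    The sign convention is an assumption; some references flip it.
    """
    n = len(actual)
    if n != len(predicted) or n - n_params <= 0:
        raise ValueError("inputs must have equal length and n > n_params")
    mean_actual = sum(actual) / n
    bias = sum(a - p for a, p in zip(actual, predicted))
    return bias / ((n - n_params) * mean_actual)
```

In the proposed test, `actual` would be the metered usage in the 12-month reporting period and `predicted` the baseline model's predictions for the same period, computed once per baseline length.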

Acceptance criteria:

The energy use trends and error metrics will be compared for different baseline period lengths. The minimum baseline period length that does not result in inflated out-of-sample errors will be recommended as the maximum baseline/reporting period length in the Caltrack spec.

hshaban · 2018-03-01T17:36:58Z

TEST RESULTS

Background
The length of the baseline and reporting periods that are included in the savings models may affect results in two ways:

Periods that are too short may not capture the full range of input conditions (e.g. weather, occupancy) that are typically experienced.
Periods that are too long increase the chances of unexpected changes in a building’s energy use (e.g. due to a change in occupancy or in the building’s equipment).

It is generally agreed that a minimum 12 months of data should be used in order to capture at least one annual cycle of energy use. However, there are no general guidelines about the maximum length of time to include in savings analysis.

Dataset
Billing data from 1000 residential buildings in Oregon and daily data from 1000 residential buildings in California.

Tested parameters
The Caltrack methods were applied to the full datasets five times, varying only the length of the baseline period used to fit the models: between 12 and 24 months in 3-month increments.
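
As an illustration of the five baseline windows described above, the sketch below derives the start of each window by counting back from a project date. The helper names (`months_before`, `baseline_windows`, `select_baseline`) and the choice of a half-open window `[start, project_date)` are assumptions for this example, not part of the Caltrack methods.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day
    to the last valid day of the target month."""
    idx = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def baseline_windows(project_date, lengths=(12, 15, 18, 21, 24)):
    """Map each baseline length in months to a (start, end) window.

    The window is assumed half-open: start <= reading date < end.
    """
    return {n: (months_before(project_date, n), project_date) for n in lengths}


def select_baseline(readings, start, end):
    """Keep the (date, usage) readings that fall inside the window."""
    return [(d, v) for d, v in readings if start <= d < end]
```

For a hypothetical project date of `date(2017, 3, 1)`, the 12-month window starts on 2016-03-01 and the 24-month window on 2015-03-01; the same reporting period is then predicted from a model fit to each window.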

Results
Figure 1 shows that, for daily data, the estimated normalized consumption increases monotonically with baseline period length. Figure 2 demonstrates that model fits, represented by R-squared, tend to get worse as the period length increases. This is likely because of the second effect pointed out above (greater likelihood of non-routine events that affect energy use). The monotonic increase in baseline energy use for program participants may drive an increase in estimated savings. This is an indication that the rate of naturally occurring savings in this sample may affect results when longer time periods are used for analysis. Therefore, it appears preferable to use a maximum of 12 months of data for analysis, especially with the availability of daily data.

Figure 1. Effect of baseline period length on normalized annual consumption using daily data.

Figure 2. Effect of baseline period length on model R-squared distribution. Model fits get poorer with increasing baseline period length.

When the same test was applied using billing data, no monotonic trend was obvious in the normalized annual consumption. Instead, the normalized annual consumption showed cyclical trends, likely corresponding to the model being weighted towards the seasons with more data. The 24-month baseline model produced a normalized estimate that was slightly higher than the 12-month baseline model.

Figure 3. Effect of baseline period length on normalized annual consumption using billing data. Y axis (Baseline Normalized Annual Consumption) is in percent.
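
The seasonal weighting mentioned for billing data can be made concrete by counting how often each calendar month appears in a baseline window. The sketch below is illustrative only; the function name `calendar_month_counts` and the example end month are assumptions, not part of the test that was run.

```python
from collections import Counter


def calendar_month_counts(end_month, n_months):
    """Count how often each calendar month (1-12) appears in the `n_months`
    consecutive months ending with `end_month` (inclusive)."""
    last = end_month - 1  # zero-based index of the final month
    # Python's % returns a non-negative result, so negative indices wrap
    # correctly into earlier years.
    return Counter((idx % 12) + 1 for idx in range(last - n_months + 1, last + 1))


def is_seasonally_balanced(n_months):
    """A window weights every calendar month equally only when its length
    is a whole number of years."""
    return n_months % 12 == 0
```

With an 18-month window ending in February, September through February are each counted twice while March through August are counted once, so the fit leans towards the heating season. Windows of 12 or 24 months count every calendar month equally.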

Recommendations

We recommend limiting the maximum baseline period length to 12 months of consumption data for both billing and daily models.

hshaban · 2018-03-01T23:52:27Z

Adding some more clarification on the choice of a 12-month baseline period, based on stakeholder input:
The results above are dataset-specific and do not reflect generally expected trends in estimated baseline vs. baseline period length (e.g. estimated baseline energy use will not necessarily increase if a longer baseline is used). However, they do strongly indicate that the predicted baseline may be unstable with different baseline period lengths, which may, in turn, affect calculated savings. A choice must be made by the analyst as to how long this baseline period should be. We are recommending that this choice be limited by setting the maximum baseline period at 12 months, since the year leading to the energy efficiency intervention is the most indicative of near-term energy use, making it amenable to calculating savings in pay-for-performance scenarios.

hshaban · 2018-07-26T16:06:23Z

This update has been integrated in CalTRACK 2. Closing this issue.

hshaban added the monthly and daily caltrack updates label Feb 13, 2018

danrubado mentioned this issue Feb 13, 2018

Data sufficiency requirements for Monthly methods #66

Closed

hshaban mentioned this issue Mar 5, 2018

FINAL CALL FOR COMMENTS: Monthly and Daily Methods Updates #82

Closed

hshaban closed this as completed Jul 26, 2018
