
Accuracy #73

Closed
steevschmidt opened this issue Feb 1, 2018 · 8 comments

Comments

@steevschmidt

HEA believes the accuracy of CalTRACK results is critical to the long-term success of P4P programs; it should be a top priority alongside the other three, and should inform the prioritization of other tasks.

Background --

We have been analyzing residential smart meter data since 2008, and we deployed our first customer-facing disaggregation tool in 2010. We learned quickly that some customers are more energy-savvy than others (e.g. Art Rosenfeld, Gil Masters, CEC and CPUC staffers) and that getting the analysis right for each home was crucial to providing the right recommendations. So we too have been humbled by the challenges in this space. And since very little "ground truth" data exists, we had to come up with other methods to test the accuracy of our system.

We have pursued three primary approaches:

  1. Use Artificial Data Sets: We created our first simplistic "artificial home" in 2013 in order to test various disaggregation methods. This approach to testing energy-tool accuracy was also the conclusion of a CEC research project into a possible "AMI Data Analytics Testbed" led by Martha Brook in 2016. See the final report here, and try the prototype online service (using CBECC-Res) at www.VirtualHomeData.com to create your own artificial home data (current configurations can produce energy data for 400,000 unique artificial homes).
  2. Analyze the Homes of Energy Experts: As mentioned above, there are many industry experts who know their homes well enough to assess results of detailed analysis. Their feedback has been incredibly helpful to HEA to improve our algorithms.
  3. Use Ground-Truth Sub-Metered Data from actual homes: When available this is the preferred approach, but it is plagued with problems. Efforts to date have failed to provide complete and reliable data sets. For example, capturing 17 individual loads doesn't help if the 18th (unmetered) load is a space heater or an attic fan on a thermostat. And when the sub-metered data doesn't add up to the whole-house meter readings, all readings come into question.

HEA has been a long-time champion of P4P because we believe it will drive EE to become more cost-effective. With one P4P contract in place and a second in final negotiations, HEA is highly motivated to make CalTRACK successful.

Can we discuss/develop a strategy for accuracy?

@mcgeeyoung
Contributor

For CalTRACK, we decided to use out-of-sample testing to gauge the uncertainty associated with estimating counterfactual usage. Accuracy is probably not the best way to describe the nature of the uncertainty we're dealing with, but out-of-sample testing proved a reliable way to evaluate methods choices. For CalTRACK 2.0 we would like to keep the same testing regime in place: we can look at specific steps in the methodology and evaluate whether a revised approach yields a better out-of-sample result.
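To make the idea concrete, here is a minimal sketch of out-of-sample testing on synthetic data. The simple HDD regression and the CV(RMSE) threshold are illustrative assumptions, not the CalTRACK specification:

```python
import numpy as np

def fit_hdd_model(hdd, usage):
    # Ordinary least squares: usage ≈ intercept + hdd_coefficient * HDD
    A = np.column_stack([np.ones_like(hdd), hdd])
    coeffs, *_ = np.linalg.lstsq(A, usage, rcond=None)
    return coeffs  # [intercept, hdd_coefficient]

def cvrmse(actual, predicted):
    # Coefficient of variation of RMSE, a common out-of-sample error metric
    return np.sqrt(np.mean((actual - predicted) ** 2)) / np.mean(actual)

# Synthetic daily data: usage driven by heating degree days plus noise
rng = np.random.default_rng(0)
hdd = rng.uniform(0, 30, 365)
usage = 10 + 1.5 * hdd + rng.normal(0, 2, 365)

# Hold out the last ~3 months, fit on the rest, score on the holdout
train, test = slice(0, 270), slice(270, 365)
intercept, coef = fit_hdd_model(hdd[train], usage[train])
predicted = intercept + coef * hdd[test]
print(cvrmse(usage[test], predicted) < 0.2)  # True for this synthetic data
```

A candidate methods change would be accepted if it consistently lowers the holdout error across a portfolio of buildings, not just on the training period.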

@steevschmidt
Author

In addition to validating CalTRACK regression results as suggested above, another bulk approach may be much easier: compare the calculated heating intensity (BTU/sf/HDD) of homes to expected norms.

Possible approach:

  • Collect the total HDDs for the period (either baseline or reporting);
  • Use the regression results (hdd_coefficient) with the total HDDs to estimate heating kWh and heating therms during the period;
  • Use site-energy conversions of kWh to BTU (3,412) and therms to BTU (100,000) to get total heating BTUs for the period;
  • Divide by the size of the home to get BTU/sf/HDD, and compare to the EIA data referenced above.

Note this would only be possible for homes where we have data on all primary heating fuels (e.g. electricity and natural gas).
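The steps above can be sketched as follows; the coefficient names and sample values are hypothetical, not CalTRACK outputs:

```python
KWH_TO_BTU = 3412        # site-energy conversion, BTU per kWh
THERM_TO_BTU = 100_000   # BTU per therm

def heating_intensity(kwh_per_hdd, therms_per_hdd, total_hdd, sqft):
    """Heating intensity in BTU per square foot per HDD."""
    heating_btu = (kwh_per_hdd * KWH_TO_BTU
                   + therms_per_hdd * THERM_TO_BTU) * total_hdd
    return heating_btu / (sqft * total_hdd)

# Illustrative inputs: 0.5 kWh/HDD electric + 0.02 therms/HDD gas, 1,800 sf home
print(round(heating_intensity(0.5, 0.02, total_hdd=1500, sqft=1800), 2))  # 2.06
```

Note that total_hdd cancels algebraically, so the intensity depends only on the per-HDD coefficients and floor area; it is kept as a parameter here only to mirror the steps as listed.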

Any reason this wouldn't work? If it does, it may provide a useful metric for #71.

@steevschmidt
Author

Related to NMEC accuracy, adding a reference to an excellent paper by Sam Borgeson for PG&E on targeting EE programs for SMBs. Snippet from page 54:

Potential sources of NMEC savings bias:

  1. In large samples, mean-zero fluctuations and site-specific changes in consumption are often assumed to cancel out across premises (for every site with an increase, there is a corresponding site with a decrease). However, shared factors like droughts, prevailing economic conditions, etc. can cause shifts in consumption that do not cancel out. Further, these exogenous factors can impact certain customer segments more than others.
  2. Similarly, a weather normalization model that is overly temperature sensitive or was trained using relatively cool (or hot) weather data, could create systematic biases when trying to normalize consumption for a relatively hot (or cold) year.
  3. Trends in energy consumption (e.g. organic LED adoption or plug-load growth) can also undermine the assumption that models trained on pre-period data provide unbiased estimates of counterfactual conditions for the post-period.

@hshaban
Collaborator

hshaban commented Jul 26, 2018

Closing this issue, as out-of-sample testing was used for CalTRACK 2.

@hshaban hshaban closed this as completed Jul 26, 2018
@steevschmidt
Author

All of these issues apply to future CalTRACK improvements; I'd like to request this ticket not be closed, but instead be moved into the "future requests" category.

@hshaban
Collaborator

hshaban commented Jul 26, 2018

Ok, will bring this to discussion with the next working group

@steevschmidt
Author

Recently McGee posted to the Recurve blog an internal discussion titled Accuracy: Why I Hate That Term, which helped me understand his prior comments (and our differing views) on this topic. I realize now that we may have been talking about two different types of accuracy.

A slide from the presentation, with three bullets on accuracy, is shown here:
[screenshot omitted]

From HEA's perspective, the answer to the third bullet is a resounding Yes: NMEC accuracy for residential homes should include identification of ALL non-weather-related changes in energy consumption, no matter what the cause. We are normalizing [residential] building energy use for weather, and nothing else. So it's critical that we identify HVAC loads accurately.

On the other hand, the first two bullets -- and much of the related discussion in the video -- concern the accuracy of attribution (i.e. "explaining"), not the accuracy of NMEC. We agree with McGee that the former is unknowable, and we agree with his analysis of that issue. However, accuracy in NMEC, the intended focus of this Issue, is a different beast altogether: it can be known and measured.

For example, the "True Value" (i.e. Ground Truth) measurement of how much of a building's energy went toward heating in a given period can be measured (not modeled): every year, Gil Masters at Stanford has his building science students do this in a small mobile home with a single resistance heater, and their grade depends on the accuracy of their analysis.

Likewise, when we use CalTRACK to identify heating and cooling loads in a baseline period (in order to normalize them for weather), we could measure the accuracy of the resulting model against ground truth during that same baseline period: did the model produce the same heating load as was measured? One simple example of such a test would be to run CalTRACK on Gil's mobile home and confirm that the heating coefficients fitted over the baseline period imply a heating load similar to what the students measured. But there are other ways as well; I proposed some in #122.
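The baseline ground-truth check described above amounts to a single comparison. A minimal sketch, where the function name, the 5% tolerance, and the sample numbers are all hypothetical:

```python
def heating_load_fractional_error(hdd_coefficient, total_hdd, metered_heating_kwh):
    """Fractional error of the model-implied heating load vs a sub-metered value."""
    modeled_kwh = hdd_coefficient * total_hdd   # heating load implied by the fit
    return (modeled_kwh - metered_heating_kwh) / metered_heating_kwh

# e.g. a fitted coefficient of 1.2 kWh/HDD over 2,000 HDD,
# vs 2,500 kWh of heating recorded by a sub-meter in the same period
err = heating_load_fractional_error(1.2, 2000, 2500)
print(abs(err) <= 0.05)  # True: within a hypothetical 5% tolerance
```

Unlike out-of-sample testing, this compares the model to a measurement taken during the very period the model was trained on, so it tests load identification rather than predictive stability.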

McGee wrote above that "we decided to use out-of-sample testing to gauge the uncertainty associated with estimating counterfactual usage". As described in #123, this works only for buildings with predictable energy use: if the energy use patterns during the period used to build the model differ from the energy use patterns in the out-of-sample period, all bets are off. We need to develop other methods to assess and improve the accuracy of the CalTRACK model against the ground truth during that same baseline period.

@arstein

arstein commented Aug 30, 2019

We concur with Steve about the importance of model accuracy to NMEC. See a related discussion here: https://gridium.com/evo-measurement-verification-accuracy/

Projects
CalTRACK Future Improvements Roadmap
Existing Daily Methods Improvements