Weather station selection criteria should include guidance for site as well as zip code centroid, and a fallback option in case the first weather station fails due to insufficient data #65
Comments
I'm curious what other weather station selection methods were considered. Given the limited number of weather stations listed in the specification, it may be best to use the nearest available station's data, but from what I've read, spatial interpolation by Kriging could also be appropriate. The interpolation methodology would have to be adapted to accommodate occasionally missing data. |
@roberthansen Do you have a link to the Kriging methodology? We hadn't considered it, but it may be particularly useful as we get into hourly savings methods and have a more substantial need for complete hourly data. |
Here's a thread on the topic, including links to examples in a few languages (R and Fortran); the given answer is a generalized case, but hopefully useful: https://stats.stackexchange.com/questions/157543/how-does-kriging-interpolation-work ArcGIS also has some help documentation for using their implementations: http://resources.arcgis.com/en/help/main/10.1/index.html#/What_are_the_different_kriging_models/00310000003q000000/ And here's a Python implementation: https://github.com/bsmurphy/PyKrige There are of course other interpolation methods that may be worth considering, but Kriging seemed to be the preferred option for temperature in my quick research. |
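For a concrete starting point, here is a minimal ordinary-kriging sketch using the PyKrige package linked above. The station coordinates and temperatures are made-up placeholders, not real station data.

```python
# Minimal ordinary-kriging sketch using PyKrige (https://github.com/bsmurphy/PyKrige).
# Station coordinates and temperatures are made-up placeholders, not real NOAA data.
import numpy as np
from pykrige.ok import OrdinaryKriging

lons = np.array([-122.4, -121.9, -122.1, -121.5])   # station longitudes
lats = np.array([37.8, 37.3, 37.6, 38.0])            # station latitudes
temps = np.array([61.0, 64.5, 62.8, 66.2])           # observed temperatures (degF)

# Fit a simple variogram model to the station readings.
ok = OrdinaryKriging(lons, lats, temps, variogram_model="linear")

# Interpolate temperature (and its estimated variance) at a project site.
site_temp, site_var = ok.execute("points", np.array([-122.0]), np.array([37.7]))
print(site_temp[0], site_var[0])  # interpolated value and kriging variance
```

Note that the second return value is the kriging variance, which is the built-in error estimate discussed later in this thread.
|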
I do not support spatial interpolation of weather data values for CalTRACK, for three reasons:
I support McGee's general approach to selecting a weather station for a site and using the data from that station. However, in the event of missing values in the time series, I would suggest the following:
|
Another weather station selection issue to consider: which subset of weather stations should be considered as sources of weather data? Not all weather stations are created equal; they have varying levels of data quality and reliability. In particular, the issues to consider are the frequency of readings, the percentage of missing values in the time series, the length of the time series, and the calibration/accuracy of sensors. We should evaluate station reliability and develop (or adopt from other work) criteria to screen out stations deemed unreliable. |
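As a rough sketch of what such a reliability pre-screen might look like: the thresholds below are illustrative assumptions, not adopted CalTRACK criteria.

```python
# Rough sketch of a station reliability pre-screen: require a minimum record
# length and a maximum fraction of missing readings. Thresholds are
# illustrative assumptions, not adopted CalTRACK criteria.
import pandas as pd

def passes_screen(hourly_temps: pd.Series, min_years: float = 3, max_missing_frac: float = 0.1) -> bool:
    """hourly_temps: Series indexed by timestamp; NaN marks a missing reading."""
    span_years = (hourly_temps.index.max() - hourly_temps.index.min()).days / 365.25
    missing_frac = hourly_temps.isna().mean()
    return span_years >= min_years and missing_frac <= max_missing_frac

idx = pd.date_range("2014-01-01", "2018-01-01", freq="h")
station = pd.Series(60.0, index=idx)
print(passes_screen(station))  # True: a 4-year record with no missing readings
```
|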
Proposed plan:
|
If the approach for selecting the weather station and retrieving/manipulating data requires testing, visualization, formatting, or adjusting, the freely available, open-source tool Elements may be of use. See https://bigladdersoftware.com/projects/elements/ for more information. |
Suggest dealing with hourly data separately (i.e., in different sub-sections), as the considerations will be different (e.g., if talking about "two consecutive missing data points"). Based on (limited) recent experience, it can be harder to find good-quality, complete hourly weather data, so we need to think carefully about how we set the minimum data sufficiency requirement. Also, on the question of moving to the next-nearest weather station if you find bad data, CalTRACK should define a method for the case where you end up 50 miles away, on the other side of a mountain, or on the coastline, i.e., in a place where the weather may be different. It's hard to be prescriptive about this, but you could at least require that the energy modeler justify their choice in cases where they choose a weather station outside of a specified radius. |
@eliotcrowe Your suggestion makes sense to me, as we'll have a number of similar questions regarding data sufficiency once we get to hourly. For now, we should focus this question on monthly and daily weather sufficiency. |
@danrubado, thanks for pushing back on the interpolation methods. I'd like to continue this discussion before moving forward with nearest-neighbor interpolation (or, for that matter, Kriging). I'll address your concerns below.
I disagree that complexity would necessarily be increased significantly. With nearest-neighbor interpolation, each weather reading would, at worst, require checking each weather station's location and time stamps (88 stations if we constrain to the NOAA set, assuming sorted lists and no searching for the most recent reading) before selecting the nearest station and applying its nearest (temporally and spatially) available reading. An alternative to nearest-neighbor would be triangulation, which would use three readings rather than one, and could provide substantially more realistic values when sensors are geographically sparse. Inverse distance weighting and polynomial interpolation could use a subset (up to the full set) of all sensors, and again provide improved results at project sites. In each of these cases, the computational complexity is a function of both the fixed number of stations and the potentially growing number of sites. Since the number of stations is constant, and the number of sites will likely exceed the number of stations, the complexity is essentially O(n), where n is the number of sites. Interpolation methods involving regression (e.g., Kriging) may add complexity to each interpolation calculation as a function of the number of stations, but as this number is constant, the overall computational complexity remains O(n). In any case, any interpolation method is well-suited to parallelization. No interpolation method we might consider is going to overwhelm the processing capability of a modern computer.
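To make the cost argument concrete, here is a minimal sketch of the nearest-station lookup, which does O(number of stations) work per site; the station IDs and coordinates below are placeholders, not the NOAA list.

```python
# Sketch of nearest-neighbor station selection: O(number of stations) work per
# site, so O(n) overall for n sites. Station IDs/coordinates are placeholders.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_station(site_lat, site_lon, stations):
    """stations: iterable of (station_id, lat, lon); returns the closest station_id."""
    return min(stations, key=lambda s: haversine_km(site_lat, site_lon, s[1], s[2]))[0]

stations = [("724940", 37.62, -122.37), ("723940", 34.90, -120.45)]
print(nearest_station(37.77, -122.42, stations))  # -> "724940"
```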
This is a fair point; however, it applies equally, if not more so, to the proposed nearest-neighbor interpolation. I have been looking for studies that either apply or recommend this method for weather data, but I haven't found any such endorsement. Authors acknowledge that it is available and useful for ease of calculation, but that it only produces realistic values when weather stations are immediately adjacent to the location of interpolation. I believe applying this interpolation across the distances between the 88 NOAA stations will yield unsatisfactory approximations of the weather at project sites. If, however, we allow (or require) implementers to set up their own weather stations at each site, or to use nearby, reliable stations within, say, 1 to 10 miles, depending on the terrain and climate zone, this method would be justifiable. One advantage Kriging holds over other interpolation methods is that it gives an estimate of the error in the interpolated value. All interpolation methods necessarily layer additional errors on top of those intrinsic to the station data. With nearest-neighbor interpolation, it's clear that these errors are significant, but just how much is unknown.
Again, resorting to nearest-neighbor interpolation does not address this concern. Indeed, Kriging can be generalized into three dimensions (two spatial plus one temporal) or four (adding elevation/adiabatic lapse rate), and interpolations between readings in space and time become automatic: missing data points are no more of a problem than a weather station that never existed. Finally, here's another resource on several common interpolation methods used for meteorological data, including nearest-neighbor and several flavors of Kriging: Since CalTRACK aims to be an open, repeatable methodology, I would propose considering the deterministic interpolation methods described in the link above, as anyone could apply the methodology to the same data and return identical results. Kriging, while potentially a more accurate method, is stochastic in nature, and thus won't return the same results on repeated analysis. I therefore propose either inverse distance weighting (easier to code) or linear regression (better results). |
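For what it's worth, a minimal sketch of the inverse distance weighting option mentioned above; the distances and readings are made-up placeholders.

```python
# Sketch of inverse distance weighting (IDW), one of the deterministic options
# proposed above. Station distances and readings are made-up placeholders.
def idw_temperature(readings, power=2.0):
    """readings: iterable of (distance_km, temp) pairs from the site to each station."""
    num, den = 0.0, 0.0
    for d, temp in readings:
        if d == 0:
            return temp  # site coincides with a station
        w = 1.0 / d ** power
        num += w * temp
        den += w
    return num / den

readings = [(8.0, 61.0), (25.0, 64.5), (40.0, 66.2)]  # (km to station, degF reading)
print(idw_temperature(readings))  # weighted toward the nearest station
```
|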
Regarding missing data, I recommend reviewing the literature for best practices, as I've seen authors suggest different approaches depending on the sampling interval. I'm not sure how close to real time the calculations need to be, but if we're talking about intermittently missing historical data, it makes sense to interpolate between both nearby stations and the closest station's readings before and after the outage. For real-time calculation, we wouldn't have the nearest reading after the outage, but we could still use the reading before it. More generally, however, I suggest dropping the weather station paradigm and dealing with weather readings independently, so that for each site, at each time of interest, we would interpolate among a time- and space-windowed set of readings. For nearest-neighbor interpolation, this set would always be a single reading (alternatively, interpolating in time between nearest-station readings is another possibility, in which case we would be looking at consecutive readings from one station). For more advanced interpolation methods, the window can be expanded arbitrarily or based on geographical features (e.g., mountain ranges). In this case, we wouldn't need to specify data sufficiency, but we would need to define minimum sensor quality, including proper setup. This would also open up the possibility of incorporating temporary and mobile weather stations. |
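As an illustration of interpolating in time across a short outage at a single station, here is a minimal sketch using pandas; the timestamps and values are placeholders, and the two-reading limit is an assumption rather than an agreed sufficiency rule.

```python
# Sketch of bridging a short outage in one station's hourly series by
# interpolating in time between the readings before and after the gap.
# Timestamps and values are placeholders; the 2-hour limit is an assumption.
import pandas as pd

idx = pd.date_range("2017-01-01", periods=6, freq="h")
temps = pd.Series([50.0, 51.0, None, None, 54.0, 55.0], index=idx)

filled = temps.interpolate(method="time", limit=2)  # only fill up to 2 consecutive gaps
print(filled)  # the two missing hours become 52.0 and 53.0
```
|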
It might be helpful to run an out-of-sample test using a sample of homes in which we select weather data from the primary weather station, and then for the same sample of homes select the fallback weather station's data. It may turn out that the difference is large enough to justify additional cleaning of weather data. It also might turn out not to matter very much. |
One of the first cuts we took at the weather data is to look at how consistent the weather is across a state over the course of a year. Of course this will raise more questions than answers, but I thought others would find it interesting as we dig in. These are plots of all the weather stations in several states (New York, Texas, Illinois, Colorado, and California). You can see that California actually has three weather patterns, but in the other states, the weather is remarkably consistent across the state. The next question will be: how much do the variations in the weather matter? |
I used the following testing methodology to evaluate methods for matching a site to a weather station. To isolate differences in weather from differences in model fits due to the peculiarities of any particular site, this methodology avoids using meter data and focuses entirely on temperature data. The temperature models described in the daily and billing methods assume that an accurate source of site temperature data is available. Weather station matching algorithms are developed to assist in finding an appropriate weather station from which to obtain this temperature data, but to evaluate the effectiveness of these weather mappings, we need a way to test them for accuracy.

What I'm proposing as a more general testing method for weather station matching algorithms is to bootstrap a scenario in which we have knowledge of the weather at a site - in this case, by matching weather station sites to other weather stations. Because we have a clear picture of the weather at each weather station site, we can match it to other stations using metrics such as weather-to-weather cosine distance or RMSE that we can't normally use without "ground truth" temperature data (because few buildings have a reliable source of outdoor temperature data on site). The assumption, of course, is that the most similar source of weather data will be the best one to use in modeling. (I see no reason why this should not be the case, but I could imagine someone coming up with a different similarity metric that more heavily weights the similarity of the shape or mean; this method should work even in those scenarios.)

Weather similarity was determined in the following way. For all weather stations in California meeting the data quality pre-screen, we pulled hourly temperature data for Jan 2014 to Jan 2018. We combined three distance metrics (RMSE, km, and cosine distance) by adding each rank together with equal weight (like scoring a racing team). Using that metric, there are 12 (of 115) stations whose "best" stations have scores of 10 or higher, meaning that for the remaining stations the worst rank we see is a 6 in any category, showing some evidence of stability. Two stations have scores above 20; the max is 29. Scores correspond to size on the map below; a larger score means that the "best" station was less convincingly dominant in the three ranked similarity scores. Example: Station 1: RMSE rank 3, distance rank 4, cosine distance rank 5 -> score 12. Station 2, with a lower score, has better-ranked similarity than Station 1. Bootstrapped "ground truth" station mappings:
The two methods I tested are as follows: Method A:
Method B:
For each method, we measured the percentage of station-sites for which the method found the "ground-truth" station and used this to "score" the method. These scores are reported below and were used to determine the preferred order of the methods. Method A: 56%. Therefore, Method A is recommended over Method B. Method A scores 56% because 56% of the selections made by Method A (restricted to the test station set) exactly matched the ground-truth ranked-distance selections in the table above. There is likely room for improvement here if other methods are proposed, for example one using elevation (although such a method would require having elevation data for the sites in question, which may not always be practical). One could imagine using a scoring method similar to this (i.e., using bootstrapped weather stations as sites) to score the accuracy of various kriging methods, although I did not do so here. |
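To make the scoring concrete, here is a minimal sketch of the combined rank score and the match-rate score described above; all station IDs and metric values are made-up placeholders, not the California station set.

```python
# Sketch of the two scores described above, with made-up placeholder data.
# (1) Combined rank score: rank candidate stations by rmse, distance (km), and
#     cosine distance, then sum the ranks with equal weight; the lowest total
#     is the bootstrapped "ground truth" best match for the target station.
# (2) Method score: fraction of station-sites for which a mapping method picks
#     that same ground-truth station.
import pandas as pd

candidates = pd.DataFrame(
    {"rmse": [1.2, 0.8, 2.5], "km": [30.0, 55.0, 12.0], "cosine_dist": [0.02, 0.01, 0.05]},
    index=["station_a", "station_b", "station_c"],
)
ranks = candidates.rank(method="min")       # rank 1 = most similar on that metric
candidates["score"] = ranks.sum(axis=1)     # equal-weight sum, like scoring a racing team
ground_truth_best = candidates["score"].idxmin()

ground_truth = {"site_1": "station_a", "site_2": "station_b", "site_3": "station_c"}
method_a_map = {"site_1": "station_a", "site_2": "station_c", "site_3": "station_c"}
match_rate = sum(method_a_map[s] == ground_truth[s] for s in ground_truth) / len(ground_truth)
print(ground_truth_best, f"{match_rate:.0%}")
```
|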
Another effort I made was to test the effect of using poorly matched weather data on the following two meters from the greater Philadelphia area (matched weather station USAF ID 725113, ~8 km from the site), using the CalTRACK 1.0 daily model (HDD range 55-65, increment 1; CDD range 65-75, increment 1). Basic fits on two years of data are shown below. (I also performed fits on the first year and the second year alone, and both showed little difference.) Then I picked two weather stations from each state, the idea being that the weather at these stations would diverge widely (but realistically) from the weather observed at the closest ISD station.
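For reference, a minimal sketch of how the degree-day candidate ranges above translate into HDD/CDD series; this is illustrative only, not the reference CalTRACK implementation, and the daily temperatures are placeholders.

```python
# Sketch of building candidate HDD/CDD series over the balance-point ranges
# named above (HDD bases 55-65 degF, CDD bases 65-75 degF, increment 1).
# Illustrative only; not the reference CalTRACK implementation.
import pandas as pd

def degree_days(daily_temps, hdd_base, cdd_base):
    """daily_temps: Series of daily mean temperatures (degF)."""
    hdd = (hdd_base - daily_temps).clip(lower=0)
    cdd = (daily_temps - cdd_base).clip(lower=0)
    return hdd, cdd

daily_temps = pd.Series([40.0, 58.0, 72.0, 85.0])  # placeholder daily means
candidates = {
    (hb, cb): degree_days(daily_temps, hb, cb)
    for hb in range(55, 66)        # HDD balance points 55..65
    for cb in range(65, 76)        # CDD balance points 65..75
}
print(len(candidates))  # 121 candidate (hdd_base, cdd_base) pairs
```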
Then I fit the models using the same CalTRACK 1.0 daily method with the same meter data but with the weather data from each of these stations in turn. The results are charted below, with results for electricity on the left and gas on the right. Commentary on each row of charts:
|
From the summary:
There are a number of possible explanations for this. I would suggest a more accurate conclusion is:
|
That was the intended conclusion, thank you for the clarifying language. |
To further clarify the language, maybe "accuracy of weather data" (open to interpretation, and there are many kinds of inaccuracy) could be adjusted to something like "location of weather station", with an added caveat that the SAME weather station needs to be used for the pre and post cases.

This might be stating the obvious, but it seems that the key element is that all weather stations have similar high/low/shoulder seasonal characteristics. Even if different states see different extremes, the model just needs to associate the highest energy consumption with the highest ambient temperature, and it doesn't seem to matter much whether the highest ambient temperature is 90F or 110F, so long as it occurs at roughly the same time of year(?).
|
"Accuracy" is definitely the wrong language, as it necessitates having something to measure against - a ground truth. I did not measure or get access to that ground truth at this or most other sites, so the lack of a ground truth in this case makes my choice of the word "accuracy" totally meaningless. Oops!
|
I raised the issue of accuracy in issue 73. As described there, I believe you could use a best-in-class building energy simulation tool (like CBECC-Res) to run various tests with different building types and weather data to test CalTRACK methods. These modeling tools create highly accurate disaggregated energy data for these virtual buildings, with the benefit of having no real occupants to mess up the measurements. This may be the closest thing we can get to "ground truth data". |
Right! It's good to remember that there are many different ways to test these methods, and anything that gives us more (or less!) confidence in these methods can be helpful. It's likely there is something to gain from using simulations as a proxy for true ground truth. I think that at the end of the day what matters is the extent to which CalTRACK methods are reliable under a variety of circumstances. The better we understand these methods and their characteristics the better equipped we are to make those methodology decisions. |
This update has been integrated into CalTRACK 2. Closing this issue. Added an issue for weather data averaging in Sandbox: #91
Current CalTRACK methods underspecify weather station selection.
"Weather station mapping requires locating the station nearest to the project. Each project file should contain a zip code that allows matching weather stations to projects"
I propose amending the methods guidance as follows: