Add convenience function for calculating benchmarking metrics #111
Conversation
@cwhanse As I've never worked with data classes before, I'd appreciate it if you just gave a thumbs up if you think I'm on the right path with what I've written so far.
@AdamRJensen Looks like a good start, but the opposite of what I was expecting. The overall flavor of the library is functions that accept series as input and return either a new series or some other object. I think what @cwhanse (correct me if I'm wrong) and I were envisioning was that the data class would have attributes for each metric and no methods, so you end up with something like:

    @dataclass
    class stats:
        rmsd: float
        r_squared: float
        # ...

and a function that returns an instance of that class:

    def compare_series(measured: Series, modeled: Series) -> stats:
        # ...

I really like the way the metric calculation is deferred until it is requested in your implementation, though, so I'm questioning my original thoughts. If it is kept as you have written it, my only feedback would be to decorate the methods you defined for each statistic with
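For concreteness, here is a minimal sketch of that attributes-only design, assuming pandas Series inputs on a shared index; rmsd and r_squared are just the placeholder metrics from the snippet above, not a final metric set:

    from dataclasses import dataclass

    import numpy as np
    import pandas as pd


    @dataclass
    class stats:
        # Plain container: one attribute per metric, no methods.
        rmsd: float
        r_squared: float


    def compare_series(measured: pd.Series, modeled: pd.Series) -> stats:
        # Compute everything eagerly and return a filled-in dataclass.
        difference = modeled - measured
        rmsd = float(np.sqrt((difference ** 2).mean()))
        r_squared = float(measured.corr(modeled) ** 2)
        return stats(rmsd=rmsd, r_squared=r_squared)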
FYI, I think this only exists in Python >= 3.8.
I'm leery of the deferred computation. I think it's confusing for less-experienced users and can lead to difficulty reusing components from this library. I'd prefer the
Is there going to be any consideration for resampling, interval convention, or filtering? Or is that left to the user? In my own experience, these trickier aspects are the most relevant ones when formalizing things in a class. But that's a much larger scope than what's currently proposed.
@wholmgren My initial thought was to let the user handle this. I don't really see how this could look... perhaps you can elaborate? Regarding interval convention, I suppose it doesn't matter much, assuming that the modeled and measured time series follow the same convention. I would propose that a simple NaN check is implemented, e.g., an error is raised if either time series contains NaNs.
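A tiny sketch of that NaN check, assuming pandas Series inputs (the helper name and error message are illustrative only):

    import pandas as pd


    def _check_for_nans(measured: pd.Series, modeled: pd.Series) -> None:
        # Fail loudly instead of silently computing metrics over gaps.
        if measured.isna().any() or modeled.isna().any():
            raise ValueError("measured and modeled series must not contain NaNs")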
Totally reasonable to let the user handle it. You seem like the kind of person who is very careful with comparisons, so maybe this is of relatively little value to you. It's clear to me that many others are not so careful. Also, here's some relevant discussion in the pvlib Google group.
I'm not saying this is the right way to do it here, but (no surprise) I'd point to
What you get from this is a
The final step is to pass that object to a function that calculates metrics. Roughly speaking, this PR is a different take on that step. I'd be happy to say this PR should be restricted to that step and set aside the rest of this discussion. I just wanted to bring up the larger context.
Ah, I suspect there's a variety of opinions here. Pandas' default style is to ignore NaNs. NumPy returns NaNs unless you use the nan-aware functions.

pvanalytics started with a port of the general purpose validation code developed under solarforecastarbiter. I think it would be useful to see a general purpose port of the time series resample/align/fill code that continues to integrate with the validation code. But again, maybe that's a different PR, and for now we should only keep it in mind as we think about interfaces. Also, you might consider adapting the tests in
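For reference, a quick illustration of those defaults (behavior of current pandas and NumPy, shown only to make the point concrete):

    import numpy as np
    import pandas as pd

    data = pd.Series([1.0, 2.0, np.nan])

    data.mean()               # 1.5  -- pandas skips NaN by default
    np.mean(data.values)      # nan  -- plain NumPy propagates NaN
    np.nanmean(data.values)   # 1.5  -- the nan-aware variant ignores it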
Thanks very much @wholmgren.
I'm assuming that those are the inputs to the metrics function proposed here.
I agree.
Correct. I eventually talked myself into believing that we should stay focused on metrics in this PR, but with an acknowledgement that there's much more to consider.
@AdamRJensen add this line to
@wfvining we need to change the last line of the
Test failure being fixed in #115.
Should the function be able to handle nans in the two time series? For example, start by dropping the rows where there is a nan in either the measured or modeled time series? Then calculate N as:
Yes, I think the default behavior should be to ignore NaNs. I'm on the fence about whether that behavior should be controlled by a kwarg.
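One way that default could look, sketched with a hypothetical drop_nan keyword; the names here are illustrative and not part of this PR:

    import pandas as pd


    def _align_series(measured: pd.Series, modeled: pd.Series, drop_nan: bool = True):
        # Keep only timestamps present in both series.
        df = pd.concat({'measured': measured, 'modeled': modeled}, axis=1, join='inner')
        if drop_nan:
            # Drop rows where either series is NaN; N is simply what remains.
            df = df.dropna()
        n = len(df)
        return df['measured'], df['modeled'], n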
Description
A number of metrics exist that can be used for comparing a modeled time series against a measured (reference) time series. I propose adding a convenience function that makes it easy to calculate the most commonly used performance/benchmarking metrics. An overview of benchmarking metrics for solar irradiance time series is given by Gueymard (2014).
Checklist
The following items must be addressed before the code can be merged.
Please don't hesitate to ask for help if you are unsure of how to accomplish any of the items.
You are free to remove any checklist items that do not apply or add additional items that are not on this list.
- New functions added to docs/api.rst.
- Entry in docs/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).