
Add convenience function for calculating benchmarking metrics #111

Closed
wants to merge 5 commits

Conversation

@AdamRJensen (Member) commented Nov 15, 2021

Description

There exist a number of metrics for comparing a modeled time series against a measured (reference) time series. I propose adding a convenience function that makes it easy to calculate the most commonly used performance/benchmarking metrics. Gueymard (2014) provides an overview of benchmarking metrics for solar irradiance time series.
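For context, here is a minimal sketch of how two of the most common metrics could be computed with numpy/pandas (the function names are illustrative, not a proposed API):

import numpy as np
import pandas as pd

def mean_bias_deviation(measured: pd.Series, modeled: pd.Series) -> float:
    # MBD: mean of (modeled - measured); positive means the model overestimates
    return float((modeled - measured).mean())

def root_mean_square_deviation(measured: pd.Series, modeled: pd.Series) -> float:
    # RMSD: quadratic mean of the residuals
    return float(np.sqrt(((modeled - measured) ** 2).mean()))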

Checklist

The following items must be addressed before the code can be merged.
Please don't hesitate to ask for help if you are unsure how to accomplish any of the items.
You are free to remove any checklist items that do not apply, or to add items that are not on this list.

  • Closes #xxx
  • Added new API functions to docs/api.rst
  • Clearly documented all new API functions with PEP257 and numpydoc compliant docstrings
  • Adds description and name entries in the appropriate "what's new" file
    in docs/whatsnew
    for all changes. Includes link to the GitHub Issue with :issue:`num`
    or this Pull Request with :pull:`num`. Includes contributor name
    and/or GitHub username (link with :ghuser:`user`).
  • Non-API functions clearly documented with docstrings or comments as necessary
  • Added tests to cover all new or modified code
  • Pull request is nearly complete and ready for detailed review

@AdamRJensen (Member Author)

@cwhanse As I've never worked with data classes before, I'd appreciate it if you just gave a thumbs up if you think I'm on the right path with what I've written so far.

@wfvining (Collaborator)

@AdamRJensen Looks like a good start, but the opposite of what I was expecting. The overall flavor of the library is functions that accept series as input and return either a new series or some other object. I think what @cwhanse (correct me if I'm wrong) and I were envisioning was that the data class would have attributes for each metric, and no methods, so you end up with something like:

@dataclass
class stats:
    rmsd: float
    r_squared: float
    # ...

and a function that returns an instance of that class:

def compare_series(measured: Series, modeled: Series) -> stats:
    # ...

I really like the way the metric calculation is deferred until it is requested in your implementation, though, so I'm questioning my original thoughts. If kept as you have written it, my only feedback would be to decorate the methods you defined for each statistic with @property (or better yet, @functools.cached_property).
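To illustrate the deferred flavor, here is a minimal sketch of the cached_property pattern (the class and attribute names are hypothetical, and functools.cached_property requires Python >= 3.8):

from dataclasses import dataclass
from functools import cached_property

import numpy as np
import pandas as pd

@dataclass
class DeferredStats:
    measured: pd.Series
    modeled: pd.Series

    @cached_property
    def rmsd(self) -> float:
        # computed on first access, then cached on the instance
        return float(np.sqrt(((self.modeled - self.measured) ** 2).mean()))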

@kandersolar (Member)

@functools.cached_property

FYI, I think this only exists in Python >= 3.8.

@AdamRJensen (Member Author)

Thanks for the feedback @wfvining. I don't have a personal preference, and probably lack sufficient experience to determine which of the two options is preferable. Perhaps @cwhanse or @kanderso-nrel has an opinion?

@cwhanse (Member) commented Nov 16, 2021

I'm leery of the deferred computation. I think it's confusing for less-experienced users, and it can make it harder to reuse components from this library. I'd prefer the data class to have only attributes, plus the compare_series function.
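A minimal sketch of that attributes-only approach (the particular metric set here is an assumption, not a settled list):

from dataclasses import dataclass

import numpy as np
import pandas as pd

@dataclass
class Stats:
    mbd: float
    rmsd: float
    r_squared: float

def compare_series(measured: pd.Series, modeled: pd.Series) -> Stats:
    residuals = modeled - measured
    ss_res = float((residuals ** 2).sum())
    # note: ss_tot is zero for a constant measured series; a real
    # implementation should guard against that division by zero
    ss_tot = float(((measured - measured.mean()) ** 2).sum())
    return Stats(mbd=float(residuals.mean()),
                 rmsd=float(np.sqrt((residuals ** 2).mean())),
                 r_squared=1.0 - ss_res / ss_tot)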

@wholmgren (Member)

Is there going to be any consideration for resampling, interval convention, or filtering? Or is that left to the user? My own experience is that these trickier aspects are the ones most worth formalizing in a class. But that's a much larger scope than what's currently proposed.

@AdamRJensen (Member Author)

@wholmgren My initial thought was to let the user handle this. I don't really see how this would look, though... perhaps you can elaborate?

Regarding the interval convention, I suppose it doesn't matter much, assuming that the modeled and measured time series follow the same convention.

I would propose implementing a simple NaN check, e.g., raising an error if either time series contains NaNs.

@wholmgren (Member)

My initial thought was to let the user handle this... Regarding the interval convention, I suppose it doesn't matter much, assuming that the modeled and measured time series follow the same convention.

Totally reasonable to let the user handle it. You seem like the kind of person who is very careful with comparisons, so maybe this is of relatively little value to you. It's clear to me that many others are not so careful. Also, here's some relevant discussion in the pvlib Google group.

I don't really see how this could look... perhaps you can elaborate?

I'm not saying this is the right way to do it here, but (no surprise) I'd point to solarforecastarbiter for a highly formal example. (Obviously the forecast-specific attributes should be ignored in this discussion.) The workflow goes like this:

  1. Define an Observation and a Forecast. These include the interval conventions.
  2. Put them together in a ForecastObservation. This step helps to define metadata for additional processing steps.
  3. Pass the ForecastObservation to a function, process_forecast_observations, that performs the "preprocessing" steps: making interval labels and lengths consistent, filtering data, and dropping or filling missing data.

What you get from this is a ProcessedForecastObservation that contains resampled, aligned, filtered time series.

The final step is to pass that object to a function that calculates metrics. Roughly speaking, this PR is a different take on that step. I'd be happy to say this PR should be restricted to that step and set aside the rest of this discussion. I just wanted to bring up the larger context.

I would propose implementing a simple NaN check, e.g., raising an error if either time series contains NaNs.

Ah, I suspect there's a variety of opinions here. The pandas default is to ignore NaNs. NumPy returns NaN unless you use the nan* functions. I can see an argument for raising, too. If you dig into solarforecastarbiter's preprocessing.py, you'll find a sophisticated reporting system for various data checks, including NaNs, nighttime, unphysical values, etc.
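To make the difference concrete (plain numpy/pandas behavior, independent of this PR):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s.mean())                  # 2.0 -- pandas skips NaNs by default
print(np.mean(s.to_numpy()))     # nan -- plain numpy propagates NaNs
print(np.nanmean(s.to_numpy()))  # 2.0 -- the nan* variants skip NaNs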

pvanalytics started with a port of the general-purpose validation code developed under solarforecastarbiter. I think it would be useful to see a general-purpose port of the time series resample/align/fill code that continues to integrate with the validation code. But again, that's maybe a different PR; for now we should only keep it in mind as we think about interfaces.

Also, you might consider adapting the tests in test_deterministic to this PR. There are a few edge cases to consider.
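For example, one of the obvious edge cases in pytest style (compare_series and Stats refer to the hypothetical sketch above, not this PR's final API):

import pandas as pd
import pytest

def test_identical_series_has_zero_error():
    # comparing a series against itself should yield zero bias and zero RMSD
    s = pd.Series([100.0, 200.0, 300.0])
    stats = compare_series(s, s.copy())
    assert stats.mbd == pytest.approx(0.0)
    assert stats.rmsd == pytest.approx(0.0)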

@cwhanse (Member) commented Nov 17, 2021

Thanks very much @wholmgren.

resampled, aligned, filtered time series

I'm assuming that those are the inputs to the metrics function proposed here.

I think it would be useful to see a general purpose port of the time series resample/align/fill code

I agree.

@wholmgren (Member)

I'm assuming that those are the inputs to the metrics function proposed here.

Correct. I eventually talked myself into believing that we should stay focused on metrics in this PR, but with an acknowledgement that there's much more to consider.

@cwhanse (Member) commented Nov 17, 2021

@AdamRJensen add this line to the requirements-min.txt file:

dataclasses

@wfvining we need to change the last line of the README:

* `dataclasses` contains classes for normalizing data (e.g. an
  `IVCurve` class)

@cwhanse (Member) commented Dec 2, 2021

The test failure is being fixed in #115.

@AdamRJensen (Member Author) commented Feb 7, 2022

Should the function be able to handle NaNs in the two time series?

For example, start by dropping the rows where there is a NaN in either the measured or modeled time series?

Then calculate N (the number of valid data pairs) as:

N = sum(~np.isnan(modeled - measured))
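A minimal sketch of that pairwise-dropping idea (the helper name is hypothetical):

import pandas as pd

def drop_paired_nans(measured: pd.Series, modeled: pd.Series):
    # keep only the timestamps where both series have valid data
    valid = measured.notna() & modeled.notna()
    return measured[valid], modeled[valid]

Here valid.sum() gives the same count N as the expression above.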

@cwhanse (Member) commented Feb 7, 2022

Should the function be able to handle NaNs in the two time series?

Yes, I think the default behavior should be to ignore NaNs. I'm on the fence about whether that behavior should be controlled by a kwarg.
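If a kwarg were added, it might look something like this (the nan_policy name and its values are purely illustrative):

import numpy as np
import pandas as pd

def compare_series(measured: pd.Series, modeled: pd.Series,
                   nan_policy: str = "ignore") -> dict:
    # hypothetical kwarg: "ignore" drops paired NaNs, "raise" errors out
    if nan_policy == "raise":
        if measured.isna().any() or modeled.isna().any():
            raise ValueError("input series contain NaNs")
    elif nan_policy == "ignore":
        valid = measured.notna() & modeled.notna()
        measured, modeled = measured[valid], modeled[valid]
    else:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    residuals = modeled - measured
    return {"mbd": float(residuals.mean()),
            "rmsd": float(np.sqrt((residuals ** 2).mean()))}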
