Add convenience function for calculating benchmarking metrics #111
Conversation
@cwhanse As I've never worked with data classes before, I'd appreciate it if you just gave a thumbs up if you think I'm on the right path with what I've written so far.
@AdamRJensen Looks like a good start, but the opposite of what I was expecting. The overall flavor of the library is functions that accept series as input and return either a new series or some other object. I think what @cwhanse (correct me if I'm wrong) and I were envisioning was that the data class would have attributes for each metric and no methods, so you end up with something like:

    @dataclass
    class stats:
        rmsd: float
        r_squared: float
        # ...

and a function that returns an instance of that class:

    def compare_series(measured: Series, modeled: Series) -> stats:
        # ...

I really like the way the metric calculation is deferred until it is requested in your implementation, though, so I'm questioning my original thoughts. If it is kept as you have written it, my only feedback would be to decorate the methods you defined for each statistic with
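For concreteness, here is a minimal sketch of that attributes-only design, assuming pandas Series inputs on a shared index; rmsd and r_squared are just the placeholder metrics from the snippet above, not a final metric set:

    from dataclasses import dataclass

    import numpy as np
    import pandas as pd


    @dataclass
    class stats:
        # Plain container: one attribute per metric, no methods.
        rmsd: float
        r_squared: float


    def compare_series(measured: pd.Series, modeled: pd.Series) -> stats:
        # Compute everything eagerly and return a filled-in dataclass.
        difference = modeled - measured
        rmsd = float(np.sqrt((difference ** 2).mean()))
        r_squared = float(measured.corr(modeled) ** 2)
        return stats(rmsd=rmsd, r_squared=r_squared)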
FYI, I think this only exists in Python >= 3.8.
I'm leery of the deferred computation. I think it's confusing for less-experienced users and can lead to difficulty reusing components from this library. I'd prefer the
Is there going to be any consideration for resampling, interval convention, or filtering? Or is that left to the user? In my own experience, these trickier aspects are the most relevant ones when formalizing things in a class. But that's a much larger scope than what's currently proposed.
@wholmgren My initial thought was to let the user handle this. I don't really see how this could look... perhaps you can elaborate? Regarding interval convention, I suppose it doesn't matter much, assuming that the modeled and measured time series follow the same convention. I would propose that a simple NaN check is implemented, e.g., an error is raised if either time series contains NaNs.
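A tiny sketch of that NaN check, assuming pandas Series inputs (the helper name and error message are illustrative only):

    import pandas as pd


    def _check_for_nans(measured: pd.Series, modeled: pd.Series) -> None:
        # Fail loudly instead of silently computing metrics over gaps.
        if measured.isna().any() or modeled.isna().any():
            raise ValueError("measured and modeled series must not contain NaNs")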
Totally reasonable to let the user handle it. You seem like the kind of person who is very careful with comparisons, so maybe this is of relatively little value to you. It's clear to me that many others are not so careful. Also, here's some relevant discussion in the pvlib Google group.
I'm not saying this is the right way to do it here, but (no surprise) I'd point to
What you get from this is a
The final step is to pass that object to a function that calculates metrics. Roughly speaking, this PR is a different take on that step. I'd be happy to say this PR should be restricted to that step and set aside the rest of this discussion. I just wanted to bring up the larger context.
Ah, I suspect there's a variety of opinions here. Pandas' default style is to ignore NaNs. NumPy returns NaNs unless you use the nan-aware functions.

pvanalytics started with a port of the general purpose validation code developed under solarforecastarbiter. I think it would be useful to see a general purpose port of the time series resample/align/fill code that continues to integrate with the validation code. But again, maybe that's a different PR, and for now we should only keep it in mind as we think about interfaces. Also, you might consider adapting the tests in
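For reference, a quick illustration of those defaults (behavior of current pandas and NumPy, shown only to make the point concrete):

    import numpy as np
    import pandas as pd

    data = pd.Series([1.0, 2.0, np.nan])

    data.mean()               # 1.5  -- pandas skips NaN by default
    np.mean(data.values)      # nan  -- plain NumPy propagates NaN
    np.nanmean(data.values)   # 1.5  -- the nan-aware variant ignores it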
Thanks very much @wholmgren.
I'm assuming that those are the inputs to the metrics function proposed here.
I agree.
Correct. I eventually talked myself into believing that we should stay focused on metrics in this PR, but with an acknowledgement that there's much more to consider.
@AdamRJensen add this line to
@wfvining we need to change the last line of the
Test failure being fixed in #115.
Should the function be able to handle nans in the two time series? For example, start by dropping the rows where there is a nan in either the measured or modeled time series? Then calculate N as:
Yes, I think the default behavior should be to ignore NaNs. I'm on the fence about whether that behavior should be controlled by a kwarg.
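One way that default could look, sketched with a hypothetical drop_nan keyword; the names here are illustrative and not part of this PR:

    import pandas as pd


    def _align_series(measured: pd.Series, modeled: pd.Series, drop_nan: bool = True):
        # Keep only timestamps present in both series.
        df = pd.concat({'measured': measured, 'modeled': modeled}, axis=1, join='inner')
        if drop_nan:
            # Drop rows where either series is NaN; N is simply what remains.
            df = df.dropna()
        n = len(df)
        return df['measured'], df['modeled'], n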
Description
A number of metrics exist that can be used for comparing a modeled time series against a measured (reference) time series. I propose adding a convenience function that makes it easy to calculate the most commonly used performance/benchmarking metrics. An overview of benchmarking metrics for solar irradiance time series is given by Gueymard (2014).
Checklist
The following items must be addressed before the code can be merged.
Please don't hesitate to ask for help if you are unsure of how to accomplish any of the items.
You are free to remove any checklist items that do not apply or add additional items that are not on this list.
- New functions added to docs/api.rst.
- Entry in docs/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).