wfvining
Collaborator

Implementation of the stale value detection method from the pvfleets_qa_analysis code.

This is a very similar function to stale_values_diff, but it is based on the difference between consecutive data points after rounding rather than putting a lower bound on the differences. It is also more aggressive in what it considers stale, marking the full sequence of repeated values, rather than just a suffix of the sequence as in stale_values_diff.
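To make the description concrete, here is a minimal sketch of the rounding idea. It is not the code merged in this PR: the function name `stale_by_rounding` and the parameters `window` and `decimals` are hypothetical and chosen only for illustration. It rounds the data, treats a zero difference between consecutive rounded points as "no change", and flags every point in any run of at least `window` unchanged values.

```python
import pandas as pd


def stale_by_rounding(values: pd.Series, window: int = 6,
                      decimals: int = 3) -> pd.Series:
    """Flag every point in any run of `window` or more equal rounded values.

    Illustrative only; names and defaults are assumptions, not the PR's API.
    """
    rounded = values.round(decimals)
    # A new run starts wherever the rounded value differs from the previous
    # one (the first point always starts a run because its diff is NaN).
    run_id = (rounded.diff() != 0).cumsum()
    # Length of the run that each point belongs to.
    run_length = run_id.map(run_id.value_counts())
    # The whole run is marked, not just the points after the first window-1.
    return run_length >= window


data = pd.Series([1.0, 2.0001, 2.0002, 2.0003, 3.0, 4.0])
flags = stale_by_rounding(data, window=3, decimals=3)
```

In this example the three values near 2.0 are all flagged, because they are identical after rounding to three decimals and the run is as long as the window.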

@wfvining wfvining requested a review from cwhanse April 23, 2020 19:11
@wholmgren
Member

wholmgren commented Apr 23, 2020

Do you have a real world data set that benefits from this? In practice we find the existing stale filter to be too aggressive to be useful. Context: SolarArbiter/solarforecastarbiter-core#124

@wfvining
Collaborator Author

I'll see if Matt Muller (who devised this filter) has some specific examples.

@wfvining
Collaborator Author

Matt Muller indicated to me that he hasn't seen much of a problem with the sort of false positives noted in the SFA issue, at least not for the data sets he has used. Obviously this approach has the same issues with rounded (or otherwise altered) data; in particular, Matt called out rounded temperature data as a problem case. I don't see much of a way around that other than making sure the caller uses a window large enough to span more than the time the variable is expected to take to change.

"More aggressive" was a poor choice of words on my part. All I really meant by it was that stale_values_rounding() marks every value in the sequence of values as stale while stale_values_diff() leaves the first window-1 values unmarked. This might actually be a useful feature to add as an option for both functions (perhaps a kwarg include_prefix=True).
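A rough sketch of what such an option could look like is below. Both the helper `stale_runs` and the `include_prefix` keyword are hypothetical; neither is taken from the existing functions, and the sketch compares raw values for equality rather than reproducing either function's actual test for "unchanged".

```python
import pandas as pd


def stale_runs(values: pd.Series, window: int = 3,
               include_prefix: bool = True) -> pd.Series:
    """Flag runs of `window` or more identical consecutive values.

    Hypothetical illustration of an `include_prefix` option.
    """
    # Label each run of identical consecutive values.
    run_id = (values.diff() != 0).cumsum()
    run_length = run_id.map(run_id.value_counts())
    long_run = run_length >= window
    if include_prefix:
        # Mark the full run, including its first window-1 points.
        return long_run
    # Zero-based position of each point within its run.
    position = values.groupby(run_id).cumcount()
    # Leave the first window-1 points of each run unmarked.
    return long_run & (position >= window - 1)


series = pd.Series([5.0, 7.0, 7.0, 7.0, 7.0, 9.0])
with_prefix = stale_runs(series, window=3, include_prefix=True)
without_prefix = stale_runs(series, window=3, include_prefix=False)
```

With `include_prefix=True` all four repeated values are flagged; with `include_prefix=False` only the last two are, since the first window-1 points of the run are left unmarked.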

@wfvining
Collaborator Author

wfvining commented May 28, 2020

wfvining and others added 7 commits June 9, 2020 07:50
The original implementation of this was based on loops, this
implementation uses pandas functions for all manipulations which
should improve performance and maintainability.
We now have two stale-value detection methods. Also adds a note about
caveats for the use of these functions.
Also removes unused pandas import.
Using .count() doesn't work for Python 3.5; len() should be supported
on pretty much every version.
Makes the description somewhat more clear.

Co-Authored-By: Cliff Hansen <cwhanse@sandia.gov>
Co-Authored-By: Cliff Hansen <cwhanse@sandia.gov>
@wfvining wfvining force-pushed the stale-values-rounding branch from 0330226 to 002d450 Compare June 9, 2020 13:52
wfvining added 4 commits June 9, 2020 08:20
Make this match the default for the other gaps functions
Remove note about the differences with the stale_values_diff function
as it is no longer applicable.
@wfvining wfvining requested a review from cwhanse June 9, 2020 14:39
@wfvining wfvining requested a review from cwhanse June 9, 2020 16:11
window comes first in stale_values_diff
@wfvining wfvining merged commit 182bd07 into master Jun 9, 2020
@wfvining wfvining deleted the stale-values-rounding branch June 9, 2020 18:05

3 participants