Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port derivative-/quantile-based clipping detection from pvfleets QA #39

Merged
merged 39 commits into from Jun 9, 2020

Conversation

wfvining
Copy link
Collaborator

Detect clipping in power data by identifying a clipping threshold (through the derivative of the 99.5% quantile of the daily power curve).

All data greater than or equal to the clipping threshold is marked as "clipped".

Night time data is periods of zero power preceding and following the
quadratic_clipped test fixture.
Night time data is periods of zero power preceding and following the
quadratic test fixture.
By keeping more data we can handle the case where there is the same
number of positive power data points at each minute. With a strictly
greater than comparison we end up excluding all the data in this
case. Including data at the first quartile shouldn't have a
substantial effect for large datasets with night time data, and it
makes the filter more useful for a variety of data sets.
Only part of the data set should be marked True to indicate clipping.
Separates the test for whether a value is clipped into a named
predicate which improves readability and satisfies the linter by
shortening the if-line where the predicate is tested.
For the quadratic data set passing freq='10T' should give the same
result as not passing a freq.
Completely goofed the call to isinstance when I wrote that the first
time. Thanks to Coveralls for pointing out that there was no coverage
on this line.
@wfvining
Copy link
Collaborator Author

wfvining commented Apr 28, 2020

Still needs:

  • test for clipping that stops for a brief period in the middle of the day before starting again.
  • test(s) for multiple days of data (not all of which have clipping)
  • add to API documentation

@wfvining wfvining marked this pull request as ready for review April 30, 2020 15:35
@wfvining wfvining requested a review from cwhanse April 30, 2020 15:35
Track and update the longest period of suspected clipping throughout
the loop, not just when clipping ends.

Renames variable to improve clarity.
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably going to be a few iterations on this, let me know if it's easier to talk it out.

pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
Not necessary to make a DataFrame with a minutes column, all the same
operations can be achieved using the original series.

Also adds two parameters to the function to replace the magic
numbers (power_quantile and frequency_quantile).
These two parameters control the calculation of the clipping threshold.
Match the order of the threshold parameters with the order of the data
parameters. Also rename minimum -> power_min to be more descriptive.
Also introduces the power_min parameter which controlls how far above
the median power a level must be for it to be considered clipping.
Makes the calculation of the clipping power level a bit more readable,
it is possible that it also improves the performance by using pandas
functions instead of loops, but I doubt the difference is noticable
given that the loop was only iterating over a single day of data (at
most 1440 values). Readability is substantially better, though.
@wfvining wfvining requested a review from cwhanse May 21, 2020 20:25
freq should be in units of minutes. Was accidentally multiplying
instead of dividing.
Credit PVFleets QA project as the source of the algorithm; however,
the code has been entirely implemented from scratch, so removing the
copyright.
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better with the descriptions in the docstrings.

pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
Comment on lines 203 to 205
each minute of the day, any minute with a count less than the
`frequency_quantile`-quantile of all counts is excluded from
the calculation of the clipping threshold.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a reference to the day/ night detector?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you brought this up. We do a sort of day/night filtering step in _daytime_powercurve. It might be better to pass in only daytime power, or pass a day/night mask. Passing a mask is probably the best solution to limit coupling between modules while still reducing code duplication.

@matt14muller, are there any caveats/pitfalls you can see in changing the original algorithm like this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually starting to wonder if this step is even necessary. We only accept a period with low slope a clipping if it has sufficiently high power (see the power_min parameter). That should prevent using night time or early morning periods where we shouldn't be seeing clipping as the the clipping threshold.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow your question. Can you elaborate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a step where minutes of the day with a low number of positive values are eliminated. I think it is to remove the night/early morning/evening times. Do you think that is really necessary? Can we just include night-time/morning/evening when looking for clipping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just include night-time/morning/evening when looking for clipping?

We can with this algorithm, since there is a power threshold critieria. The SFA function only looks for flat spots, and has to be combined with a night/day detector to avoid labeling night as clipping.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to remove the frequency_quantile parameter (and associated code) and see if the tests still pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhanse @matt14muller I removed the frequency_quantile param. Works fine for the tests, but those are just simple tests... The only problem I can think of with this is if there is enough erroneous high data in the night time that the 95% quantile is driven up. I'm still good with eliminating the parameter though since we have many other quality functions to help users eliminate that problem. If worst comes to worst, users can simply pass only daytime power values, leaving night time out of the equation all together.

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
This makes it more clear that we are looking at the point-to-point
slope, as opposed a more complex numerical approximation of the
derivative.
We already pass all parameters, no need for defaults. By removing the
defaults I can get an error out of the interpreter when I forget to
pass a parameter.

Also exposes the `power_min` parameter in clipping.threshold. I'm
having a hard time describing it well though, It specifies that power
must be greater than `power_min` times the median of the
`power_quantile` quantile of power at each minute of the day,
excluding night, and early morning/late evening.
/ normalized_power.index.to_series().diff()) * freq

clipped_times = _clipped(powercurve, power_derivative,
powercurve.median() * power_min,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this

pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
pvanalytics/features/clipping.py Outdated Show resolved Hide resolved
wfvining and others added 3 commits June 4, 2020 12:52
Change up the order of sentences in the last paragraph (much better - thanks, Cliff). rewording in the third paragraph.

Need to find a way to describe that only the values for the longest period with low-slope high-power are used to compute the threshold. (as is, it reads kind of like we take the average of the full curve).

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Because this algorithm has a minimum power threshold (`power_min`) we
don't need to filter out night time periods (where the slope will
pretty much always be near zero). This could still get messed up if
the `power_quantile` of night time power is very high, but we will
assume that the data you pass in here has reasonable night time
values (most of the time). If it doesn't and you are concerned, then
you can pass data for daytime periods only.
Reflects the removal of the frequency_quantile parameter.

Makes the last step in the calculation of the threshold more clear and
clarifies what is returned.
@wfvining wfvining merged commit f856230 into master Jun 9, 2020
@wfvining wfvining deleted the clipping-derivative branch June 9, 2020 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants