Port derivative-/quantile-based clipping detection from pvfleets QA #39

wfvining · 2020-04-28T21:24:56Z

Detect clipping in power data by identifying a clipping threshold (through the derivative of the 99.5% quantile of the daily power curve).

All data greater than or equal to the clipping threshold is marked as "clipped".
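As a minimal sketch of that last step (the helper name `mark_clipped` and the sample values are made up for illustration, not taken from the PR), the inclusive comparison looks like this:

```python
import pandas as pd


def mark_clipped(power: pd.Series, threshold: float) -> pd.Series:
    """Flag every value at or above the threshold (hypothetical helper)."""
    # Inclusive comparison: values equal to the threshold count as clipped.
    return power >= threshold


power = pd.Series([0.0, 2.0, 5.0, 5.0, 3.0])
flags = mark_clipped(power, threshold=5.0)
```

With a strict `>` the two samples sitting exactly on the threshold would not be flagged.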

Night time data is periods of zero power preceding and following the quadratic_clipped test fixture.

Night time data is periods of zero power preceding and following the quadratic test fixture.

By keeping more data we can handle the case where there is the same number of positive power data points at each minute. With a strictly greater than comparison we end up excluding all the data in this case. Including data at the first quartile shouldn't have a substantial effect for large datasets with night time data, and it makes the filter more useful for a variety of data sets.
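A small illustration of why the inclusive comparison matters, assuming the filter compares per-minute counts against their first quartile (the variable names here are hypothetical):

```python
import pandas as pd

# Same number of positive power readings at every minute of the day.
counts = pd.Series([4, 4, 4, 4])
cutoff = counts.quantile(0.25)

# A strict comparison drops every minute when all counts are equal.
strict = counts[counts > cutoff]

# An inclusive comparison keeps them.
inclusive = counts[counts >= cutoff]
```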

Only part of the data set should be marked True to indicate clipping.

Separates the test for whether a value is clipped into a named predicate which improves readability and satisfies the linter by shortening the if-line where the predicate is tested.

For the quadratic data set passing freq='10T' should give the same result as not passing a freq.

Completely goofed the call to isinstance when I wrote that the first time. Thanks to Coveralls for pointing out that there was no coverage on this line.

wfvining · 2020-04-28T21:41:29Z

Still needs:

- Test for clipping that stops for a brief period in the middle of the day before starting again.
- Test(s) for multiple days of data (not all of which have clipping).
- Add to API documentation.

Gives a complete description of how the clipping threshold is computed

Track and update the longest period of suspected clipping throughout the loop, not just when clipping ends. Renames variable to improve clarity.
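A rough sketch of that idea, not the code from this PR: the function name `longest_run` and the list-of-booleans input are assumptions. The point is that the best run is updated on every flagged sample, so a run that is still open when the data ends is not lost.

```python
def longest_run(flags):
    """Return (start, end) of the longest run of True values, end exclusive."""
    best = (0, 0)
    start = None
    for i, flag in enumerate(flags):
        if flag:
            if start is None:
                start = i
            # Update inside the run, not only when the run ends.
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)
        else:
            start = None
    return best


run = longest_run([False, True, True, False, True, True, True])
```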

cwhanse

Probably going to be a few iterations on this, let me know if it's easier to talk it out.

pvanalytics/features/clipping.py

Not necessary to make a DataFrame with a minutes column, all the same operations can be achieved using the original series. Also adds two parameters to the function to replace the magic numbers (power_quantile and frequency_quantile).
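For illustration only, here is one way to get a per-minute quantile curve straight from a datetime-indexed series without building a DataFrame. The function name `daily_powercurve` is hypothetical, and the default of 0.995 is taken from the 99.5% quantile mentioned in the description.

```python
import pandas as pd


def daily_powercurve(power: pd.Series, power_quantile: float = 0.995) -> pd.Series:
    """Quantile of power at each minute of the day (hypothetical helper)."""
    # Group directly on the minute of the day derived from the index.
    minute_of_day = power.index.hour * 60 + power.index.minute
    return power.groupby(minute_of_day).quantile(power_quantile)


index = pd.to_datetime([
    "2020-01-01 00:00", "2020-01-01 12:00",
    "2020-01-02 00:00", "2020-01-02 12:00",
])
power = pd.Series([1.0, 10.0, 3.0, 20.0], index=index)
curve = daily_powercurve(power, power_quantile=0.5)
```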

These two parameters control the calculation of the clipping threshold.

Match the order of the threshold parameters with the order of the data parameters. Also rename minimum -> power_min to be more descriptive.

Also introduces the power_min parameter which controls how far above the median power a level must be for it to be considered clipping.

Makes the calculation of the clipping power level a bit more readable. It may also improve performance by using pandas functions instead of loops, but I doubt the difference is noticeable given that the loop only iterated over a single day of data (at most 1440 values). Readability is substantially better, though.
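One possible loop-free way to isolate the longest flagged period with pandas, sketched under the assumption that the input is a boolean series; the name `longest_true_run` is hypothetical and this is not necessarily how the PR does it.

```python
import pandas as pd


def longest_true_run(flags: pd.Series) -> pd.Series:
    """Mask selecting only the longest run(s) of True values."""
    ints = flags.astype(int)
    # A new run starts wherever the value changes from the previous sample.
    run_id = ints.diff().ne(0).cumsum()
    # Length of each True run, broadcast back to every sample in the run.
    run_length = ints.groupby(run_id).transform("sum")
    return flags & (run_length == run_length.max())


flags = pd.Series([False, True, True, False, True, True, True])
longest = longest_true_run(flags)
```

If two runs tie for the longest, this sketch keeps both.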

freq should be in units of minutes. Was accidentally multiplying instead of dividing.

Credit PVFleets QA project as the source of the algorithm; however, the code has been entirely implemented from scratch, so removing the copyright.

cwhanse

Much better with the descriptions in the docstrings.

pvanalytics/features/clipping.py

cwhanse · 2020-05-28T23:29:15Z

pvanalytics/features/clipping.py

+        each minute of the day, any minute with a count less than the
+        `frequency_quantile`-quantile of all counts is excluded from
+        the calculation of the clipping threshold.


Is this a reference to the day/ night detector?

I'm glad you brought this up. We do a sort of day/night filtering step in _daytime_powercurve. It might be better to pass in only daytime power, or pass a day/night mask. Passing a mask is probably the best solution to limit coupling between modules while still reducing code duplication.

@matt14muller, are there any caveats/pitfalls you can see in changing the original algorithm like this?

I'm actually starting to wonder if this step is even necessary. We only accept a period with low slope as clipping if it has sufficiently high power (see the power_min parameter). That should prevent night time or early morning periods, where we shouldn't be seeing clipping, from being used as the clipping threshold.

I am not sure I follow your question. Can you elaborate?

There is a step where minutes of the day with a low number of positive values are eliminated. I think it is to remove the night/early morning/evening times. Do you think that is really necessary? Can we just include night-time/morning/evening when looking for clipping?

> Can we just include night-time/morning/evening when looking for clipping?

We can with this algorithm, since there is a power threshold criterion. The SFA function only looks for flat spots, and has to be combined with a night/day detector to avoid labeling night as clipping.

I'll need to remove the frequency_quantile parameter (and associated code) and see if the tests still pass.

@cwhanse @matt14muller I removed the frequency_quantile param. Works fine for the tests, but those are just simple tests... The only problem I can think of with this is if there is enough erroneous high data in the night time that the 95% quantile is driven up. I'm still good with eliminating the parameter though, since we have many other quality functions to help users eliminate that problem. If worst comes to worst, users can simply pass only daytime power values, leaving night time out of the equation altogether.

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>

pvanalytics/features/clipping.py

This makes it more clear that we are looking at the point-to-point slope, as opposed to a more complex numerical approximation of the derivative.
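To make "point-to-point slope" concrete, here is a sketch that assumes a curve indexed by minute of day and a sample interval given in minutes. The function name `point_to_point_slope` is hypothetical, and the scaling by the sample interval mirrors the `* freq` visible in the reviewed diff rather than a confirmed final form.

```python
import pandas as pd


def point_to_point_slope(curve: pd.Series, freq_minutes: float) -> pd.Series:
    """Change in power between consecutive samples, per sample interval."""
    # Minutes elapsed between consecutive samples.
    step = curve.index.to_series().diff()
    # Slope per minute, rescaled to one sample interval.
    return (curve.diff() / step) * freq_minutes


curve = pd.Series([0.0, 5.0, 5.0], index=[0, 10, 20])
slope = point_to_point_slope(curve, freq_minutes=10)
```

The first value is NaN because there is no preceding sample to difference against.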

We already pass all parameters, so there is no need for defaults. By removing the defaults I get an error from the interpreter when I forget to pass a parameter. Also exposes the `power_min` parameter in clipping.threshold. I'm having a hard time describing it well, though: it specifies that power must be greater than `power_min` times the median of the `power_quantile` quantile of power at each minute of the day, excluding night and early morning/late evening.

pvanalytics/features/clipping.py

Provides a significantly better description of how the threshold is calculated.

cwhanse · 2020-06-04T18:11:47Z

pvanalytics/features/clipping.py

+                        / normalized_power.index.to_series().diff()) * freq
+
+    clipped_times = _clipped(powercurve, power_derivative,
+                             powercurve.median() * power_min,


I'm okay with this

pvanalytics/features/clipping.py

Change up the order of sentences in the last paragraph (much better - thanks, Cliff). Reword the third paragraph. Need to find a way to describe that only the values for the longest period with low slope and high power are used to compute the threshold. (As is, it reads as though we take the average of the full curve.) Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>

Because this algorithm has a minimum power threshold (`power_min`) we don't need to filter out night time periods (where the slope will pretty much always be near zero). This could still get messed up if the `power_quantile` of night time power is very high, but we will assume that the data you pass in here has reasonable night time values (most of the time). If it doesn't and you are concerned, then you can pass data for daytime periods only.
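A toy illustration of that reasoning, with a hypothetical predicate `suspected_clipping` and made-up values for `power_min` and `slope_max`: the flat zero-power night samples pass the slope test but fail the power test, so only the flat top of the curve is flagged.

```python
import pandas as pd


def suspected_clipping(curve, slope, power_min, slope_max):
    """Flat AND sufficiently high power (hypothetical predicate)."""
    high_power = curve > power_min * curve.median()
    flat = slope.abs() <= slope_max
    return high_power & flat


# Zero power at night on both ends, a flat top in the middle.
curve = pd.Series([0.0, 0.0, 6.0, 10.0, 10.0, 10.0, 6.0, 0.0, 0.0])
slope = curve.diff()
flags = suspected_clipping(curve, slope, power_min=0.75, slope_max=0.1)
```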

Reflects the removal of the frequency_quantile parameter. Makes the last step in the calculation of the threshold more clear and clarifies what is returned.

wfvining added 9 commits April 28, 2020 13:24

First cut porting pvfleets qa clipping filter

bd110b0

Adds clipping.threshold test that includes night time data.

fc0e229

Night time data is periods of zero power preceding and following the quadratic_clipped test fixture.

Adds clipping.threshold test that includes night and no clipping.

0c70f21

Night time data is periods of zero power preceding and following the quadratic test fixture.

Power equal to the clipping threshold is considered clipped

f224dc8

Test that when clipping is indicated it is not for the full day.

6fe29bc

Only part of the data set should be marked True to indicate clipping.

Adds a '_clipped' predicate to improve readability

14cf896

Separates the test for whether a value is clipped into a named predicate which improves readability and satisfies the linter by shortening the if-line where the predicate is tested.

Add test that passes a frequency string to clipping.threshold

b12c9b2

For the quadratic data set passing freq='10T' should give the same result as not passing a freq.

Use isinstance correctly.

fc164ea

Completely goofed the call to isinstance when I wrote that the first time. Thanks to Coveralls for pointing out that there was no coverage on this line.

wfvining added 5 commits April 30, 2020 07:23

Add threshold-based clipping detection to API documentation.

92bfeea

Update clipping.threshold documentation

bd94eaa

Gives a complete description of how the clipping threshold is computed

Add test for clipping.threshold where clipping is interrupted.

b985674

Test a four day series where only the first day has clipping

a6a9337

Test a four day series with no clipping

b3a6a1c

wfvining marked this pull request as ready for review April 30, 2020 15:35

wfvining requested a review from cwhanse April 30, 2020 15:35

wfvining added 2 commits April 30, 2020 09:44

Include a copy of the pvfleets_qa_analysis license

b95d130

Rework clipping threshold loop

296d8d6

Track and update the longest period of suspected clipping throughout the loop, not just when clipping ends. Renames variable to improve clarity.

cwhanse reviewed May 19, 2020

wfvining added 10 commits May 21, 2020 08:43

Propagate parameters up to the public API

baacd50

These two parameters control the calculation of the clipping threshold.

Rename clip_derivative to derivative_max

dc31efa

re-order parameters to _clipped

d079493

Match the order of the threshold parameters with the order of the data parameters. Also rename minimum -> power_min to be more descriptive.

Better description of _clipped in comments.

c774652

Add a better description of how the clipping power is calculated

0124fdb

Also introduces the power_min parameter which controls how far above the median power a level must be for it to be considered clipping.

Clean up indentation and copyright notice

f81051e

Apply suggested documentation edits from Code Review

b5a14d6

Clarify how the derivative is calculated in clipping.threshold

e4a0f05

wfvining requested a review from cwhanse May 21, 2020 20:25

wfvining added 2 commits May 21, 2020 15:26

Fix units for freq

13e49e7

freq should be in units of minutes. Was accidentally multiplying instead of dividing.

Remove copyright notices

956313d

Credit PVFleets QA project as the source of the algorithm; however, the code has been entirely implemented from scratch, so removing the copyright.

cwhanse reviewed May 28, 2020

Documentation updates from code review

c56caf8

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>

wfvining commented May 29, 2020

pvanalytics/features/clipping.py

wfvining added 3 commits May 29, 2020 09:18

Use 'slope' instead of 'derivative'

232bb05

This makes it more clear that we are looking at the point-to-point slope, as opposed to a more complex numerical approximation of the derivative.

Improve documentation comments on internal functions

32355d6

wfvining commented May 29, 2020

pvanalytics/features/clipping.py

wfvining added 3 commits May 29, 2020 15:33

Rewrote docstring for clipping.threshold

a8b005c

Provides a significantly better description of how the threshold is calculated.

Fix typo

15e3395

Use pandas.Index.{min()|max()} for the min and max of the index

acc7e88

cwhanse approved these changes Jun 4, 2020

wfvining and others added 3 commits June 4, 2020 12:52

Re-fill docstring to shorten lines

b4ceab2

cwhanse approved these changes Jun 9, 2020

Update docstring

1dfb7fc

Reflects the removal of the frequency_quantile parameter. Makes the last step in the calculation of the threshold more clear and clarifies what is returned.

wfvining merged commit f856230 into master Jun 9, 2020

wfvining deleted the clipping-derivative branch June 9, 2020 18:46

cwhanse May 28, 2020

wfvining May 29, 2020

wfvining Jun 2, 2020

matt14muller Jun 2, 2020

wfvining Jun 2, 2020

cwhanse Jun 4, 2020

wfvining Jun 4, 2020

wfvining Jun 4, 2020

cwhanse Jun 4, 2020
