Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port derivative-/quantile-based clipping detection from pvfleets QA #39

Merged
merged 39 commits into from Jun 9, 2020
Merged
Show file tree
Hide file tree
Changes from 35 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
bd110b0
First cut porting pvfleets qa clipping filter
wfvining Apr 28, 2020
fc0e229
Adds clipping.threshold test that includes night time data.
wfvining Apr 28, 2020
0c70f21
Adds clipping.threshold test that includes night and no clipping.
wfvining Apr 28, 2020
f224dc8
Power equal to the clipping threshold is considered clipped
wfvining Apr 28, 2020
6a8e0b1
Relax the criteria for excluding night time data.
wfvining Apr 28, 2020
6fe29bc
Test that when clipping is indicated it is not for the full day.
wfvining Apr 28, 2020
14cf896
Adds a '_clipped' predicate to improve readability
wfvining Apr 28, 2020
b12c9b2
Add test that passes a frequency string to clipping.threshold
wfvining Apr 28, 2020
fc164ea
Use isinstance correctly.
wfvining Apr 28, 2020
92bfeea
Add threshold-based clipping detection top API documentation.
wfvining Apr 30, 2020
bd94eaa
Update clipping.threshold documentation
wfvining Apr 30, 2020
b985674
Add test for clipping.threshold where clippin is interrupted.
wfvining Apr 30, 2020
a6a9337
Test a four day series where only the first day has clipping
wfvining Apr 30, 2020
b3a6a1c
Test a four day series with no clipping
wfvining Apr 30, 2020
b95d130
Include a copy of the pvfleets_qa_analysis license
wfvining Apr 30, 2020
296d8d6
Rework clipping threshold loop
wfvining Apr 30, 2020
51e838e
Refactor _daytime_powercurve to avoid making a DataFrame
wfvining May 21, 2020
baacd50
Propagate parameters up to the public API
wfvining May 21, 2020
dc31efa
Rename clip_derivative to derivative_max
wfvining May 21, 2020
d079493
re-order parameters to _clipped
wfvining May 21, 2020
c774652
Better description of _clipped in comments.
wfvining May 21, 2020
0124fdb
Add a better description of how the clipping power is calculated
wfvining May 21, 2020
edbe087
Refactor _clipping_power to avoid looping over the data
wfvining May 21, 2020
f81051e
Clean up indentation and copyright notice
wfvining May 21, 2020
b5a14d6
Apply suggested documentation edits from Code Review
wfvining May 21, 2020
e4a0f05
Clarify how the derivative is calculated in clipping.threshold
wfvining May 21, 2020
13e49e7
Fix units for freq
wfvining May 21, 2020
956313d
Remove copyright notices
wfvining May 21, 2020
c56caf8
Documentation updates from code review
wfvining May 29, 2020
232bb05
Use 'slope' instead of 'derivative'
wfvining May 29, 2020
ba688a5
Remove defaults from internal functions.
wfvining May 29, 2020
32355d6
Improve documentation comments on internal functions
wfvining May 29, 2020
a8b005c
Rewrote docstring for clipping.threshold
wfvining May 29, 2020
15e3395
Fix typo
wfvining May 29, 2020
acc7e88
Use pandas.Index.{min()|max()} for the min and max of the index
wfvining Jun 2, 2020
5594611
Documentation edits from code review
wfvining Jun 4, 2020
b4ceab2
Re-fill docstring to shorten lines
wfvining Jun 4, 2020
e1e4f9e
Remove frequency_quantile parameter
wfvining Jun 4, 2020
1dfb7fc
Update docstring
wfvining Jun 9, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
29 changes: 29 additions & 0 deletions LICENSES/PVFLEETS_QA_LICENSE
@@ -0,0 +1,29 @@
Copyright (c) 2020 Alliance for Sustainable Energy, LLC.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the
distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1 change: 1 addition & 0 deletions docs/api.rst
Expand Up @@ -102,6 +102,7 @@ Functions for identifying inverter clipping
:toctree: generated/

features.clipping.levels
features.clipping.threshold

Clearsky
--------
Expand Down
151 changes: 151 additions & 0 deletions pvanalytics/features/clipping.py
Expand Up @@ -88,3 +88,154 @@ def levels(ac_power, window=4, fraction_in_window=0.75,
flags = flags | _label_clipping(temp, window=window,
frac=fraction_in_window)
return flags


def _daytime_powercurve(ac_power, power_quantile, frequency_quantile):
# return the `power_quantile` quantile of power at each minute of
# the day after removing night time, early morning, and late
# evening data.
minutes = ac_power.index.hour * 60 + ac_power.index.minute
positive_power = ac_power >= 0
powerfreq = positive_power.groupby(minutes).sum()
min_daytime_freq = powerfreq.quantile(frequency_quantile)
daytimes = powerfreq[powerfreq >= min_daytime_freq].index
daytime_power = ac_power[minutes.isin(daytimes)]

return daytime_power.groupby(
daytime_power.index.hour * 60 + daytime_power.index.minute
).quantile(power_quantile)


def _clipped(power, slope, power_min, slope_max):
# return a mask that is true where `power` is greater than
# `power_min` and the absolute value of `slope` is less
# than or equal to `slope_max`
return (np.abs(slope) <= slope_max) & (power > power_min)


def _clipping_power(ac_power, slope_max, power_min,
power_quantile, frequency_quantile,
freq=None):
# Calculate a power threshold, above which the power is being
# clipped.
#
# - The daytime power curve is calculated using
# _daytime_powercurve(). This function groups `ac_power` by
# minute of the day and returns the `power_quantile`-quantile of
# power for each minute, excluding night time, early morning,
# and late afternoon/evening (the amount of data that is
# excluded is controlled by `frequency_quantile`).
wfvining marked this conversation as resolved.
Show resolved Hide resolved
#
wfvining marked this conversation as resolved.
Show resolved Hide resolved
# - Each timestamp in the daytime power curve that satisfies the
# clipping criteria[*] is flagged.
#
# - The clipping threshold is calculated as the mean power during the
# longest flagged period in the daytime power curve.
#
# [*] clipping criteria: a timestamp satisfies the clipping
# criteria if the absolute value of the slope of the daytime power curve
# is less than `slope_max` and the value of the daytime
# power curve is greater than `power_min` times the median of the
# daytime power curve.
#
# Based on the PVFleets QA Analysis project
if not freq:
freq = pd.Timedelta(pd.infer_freq(ac_power.index)).seconds / 60
elif isinstance(freq, str):
freq = pd.Timedelta(freq).seconds / 60

# Use the slope of the 99.5% quantile of daytime power at
# each minute to identify clipping.
powercurve = _daytime_powercurve(
ac_power, power_quantile, frequency_quantile
)
normalized_power = powercurve / powercurve.max()
power_slope = (normalized_power.diff()
/ normalized_power.index.to_series().diff()) * freq

clipped_times = _clipped(powercurve, power_slope,
powercurve.median() * power_min,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was to apply the median at each minute (of the day), is powercurve.median() is the median of the powercurve time series?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the median for the whole power curve. The intent is to not mark periods of low power as clipping. This codifies the assumption that clipping will always be in periods of relatively high power.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this

slope_max)
clipping_cumsum = (~clipped_times).cumsum()
# get the value of the cumulative sum of the longest True span
longest_clipped = clipping_cumsum.value_counts().idxmax()
# select the longest span that satisfies the clipping criteria
longest = powercurve[clipping_cumsum == longest_clipped]

if longest.index.max() - longest.index.min() >= 60:
# if the period of clipping is at least 60 minutes then we
# have enough evidence to determine the clipping threshold.
return longest.mean()
return None


def threshold(ac_power, slope_max=0.0035, power_min=0.75,
power_quantile=0.995, frequency_quantile=0.25,
freq=None):
"""Detect clipping based on a maximum power threshold.

This is a two-step process. First a clipping threshold is
identified, then any values in `ac_power` greater than or equal
to that threshold are flagged.

The clipping threshold is determined by computing a 'daily power
curve' which is the `power_quantile` quantile of all values in
`ac_power` at each minute of the day (excluding night, early morning,
and late evening). This gives a rough estimate of the maximum power
produced at each minute of the day.

The minutes of the day where the slope of the daily power curve,
normalized by its maximum, is less than `slope_max` are identified. If
wfvining marked this conversation as resolved.
Show resolved Hide resolved
there is a continuous period of time spanning at least one hour where
the slope is less than `slope_max` and the value of the normalized
daily power curve is greater than `power_min` times the median of the
normalized daily power curve then the data has clipping in it. The
average for those minutes in the (not normalized) daily power curve is
used as the clipping threshold. If no sufficiently long period with
both a low slope and high power exists then there is no clipping in
the data.
wfvining marked this conversation as resolved.
Show resolved Hide resolved

Parameters
----------
ac_power : Series
DatetimeIndexed series of AC power data.
slope_max : float, default 0.0035
Maximum absolute value of slope of AC power quantile for
clipping to be indicated. The default value has been derived
empirically to prevent false positives for tracking PV
systems.
power_min: float, default 0.75
The power during periods with slope less than `slope_max` must
be greater than `power_min` times the median normalized
daytime power.
power_quantile : float, default 0.995
Quantile used to calculate the daily power curve.
frequency_quantile : float, default 0.25
After taking the count of positive values in `ac_power` at
each minute of the day, any minute with a count less than the
`frequency_quantile`-quantile of all counts is excluded. This
eliminates early morning/evening times.
freq : string, default None
A pandas string offset giving the frequency of data in
`ac_power`. If None then the frequency is inferred from the
series index.

Returns
-------
Series
True when clipping is indicated.

Notes
-----
This function is based on the pvfleets_qa_analysis project.

"""
threshold = _clipping_power(
ac_power,
slope_max,
power_min,
power_quantile,
frequency_quantile,
freq=freq
)
return ac_power >= threshold
164 changes: 164 additions & 0 deletions pvanalytics/tests/features/test_clipping.py
Expand Up @@ -2,6 +2,7 @@
import pytest
from pandas.util.testing import assert_series_equal
import numpy as np
import pandas as pd
from pvanalytics.features import clipping


Expand Down Expand Up @@ -68,3 +69,166 @@ def test_levels_two_periods(quadratic, quadratic_clipped):
assert clipped[35:40].all()
assert not clipped[0:10].any()
assert not clipped[50:].any()


def test_threshold_no_clipping(quadratic):
"""In a data set with a single quadratic there is no clipping."""
quadratic.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
assert not clipping.threshold(quadratic).any()


def test_threshold_no_clipping_with_night(quadratic):
"""In a data set with a single quadratic surrounded by zeros there is
no clipping."""
quadratic.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
full_day = quadratic.reindex(
pd.date_range(
start='01/01/2020 00:00',
end='01/01/2020 23:50',
freq='10T')
)
full_day.fillna(0)
assert not clipping.threshold(quadratic).any()


def test_threshold_clipping(quadratic_clipped):
"""In a data set with a single clipped quadratic clipping is
indicated."""
quadratic_clipped.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
assert not clipping.threshold(quadratic_clipped).all()
assert clipping.threshold(quadratic_clipped).any()


def test_threshold_clipping_with_night(quadratic_clipped):
"""Clipping is identified in the daytime with periods of zero power
before and after simulating night time conditions."""
quadratic_clipped.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
full_day = quadratic_clipped.reindex(
pd.date_range(
start='01/01/2020 00:00',
end='01/01/2020 23:50',
freq='10T')
)
full_day.fillna(0)
assert not clipping.threshold(full_day).all()
assert clipping.threshold(full_day)[quadratic_clipped.index].any()


def test_threshold_clipping_with_freq(quadratic_clipped):
"""Passing the frequency gives same result as infered frequency."""
quadratic_clipped.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
assert_series_equal(
clipping.threshold(quadratic_clipped),
clipping.threshold(quadratic_clipped, freq='10T')
)


def test_threshold_clipping_with_interruption(quadratic_clipped):
"""Test threshold clipping with period of no clipping mid-day."""
quadratic_clipped.loc[28:31] = [750, 725, 700, 650]
quadratic_clipped.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
clipped = clipping.threshold(quadratic_clipped)

assert not clipped.iloc[0:10].any()
assert not clipped.iloc[28:31].any()
assert not clipped.iloc[50:].any()
assert clipped.iloc[17:27].all()
assert clipped.iloc[32:40].all()


def test_threshold_clipping_four_days(quadratic, quadratic_clipped):
"""Clipping is identified in the first of four days."""
quadratic.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
quadratic_clipped.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
full_day_clipped = quadratic_clipped.reindex(
pd.date_range(
start='01/01/2020 00:00',
end='01/01/2020 23:50',
freq='10T')
)
full_day = quadratic.reindex(
pd.date_range(
start='01/01/2020 00:00',
end='01/01/2020 23:50',
freq='10T')
)
full_day_clipped.fillna(0)
full_day.fillna(0)

# scale the rest of the days below the clipping threshold
full_day *= 0.75

power = full_day_clipped
power.append(full_day)
power.append(full_day)
power.append(full_day)

power.index = pd.date_range(
start='01/01/2020 00:00', freq='10T', periods=len(power)
)

clipped = clipping.threshold(power)

assert clipped['01/01/2020'].any()
assert not clipped['01/02/2020':].any()


def test_threshold_no_clipping_four_days(quadratic):
"""Four days with no clipping."""
quadratic.index = pd.date_range(
start='01/01/2020 07:30',
freq='10T',
periods=61
)
full_day = quadratic.reindex(
pd.date_range(
start='01/01/2020 00:00',
end='01/01/2020 23:50',
freq='10T')
)
full_day.fillna(0)

power = full_day
power.append(full_day * 1.3)
power.append(full_day * 1.2)
power.append(full_day * 1.1)

power.index = pd.date_range(
start='01/01/2020 00:00', freq='10T', periods=len(power)
)

clipped = clipping.threshold(power)

assert not clipped.any()