Refactor detect_clearsky #1074

cwhanse · 2020-10-05T16:43:18Z

I am familiar with the contributing guidelines
Tests added
Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
Pull request is nearly complete and ready for detailed review.
Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Refactor of clearsky.detect_clearsky in preparation for adding the BrightSun clear-sky detection algorithm. Major changes:

uses pd.Series.rolling instead of a Hankel matrix to compute criteria over sliding windows. This change accommodates centered windows, i.e., aligning the criteria for a window to its midpoint rather than to its left endpoint (the default behavior in this code).
calculation of criteria delegated to helper functions.
new kwarg align (default 'left'); otherwise the function signature is unchanged.
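As a rough illustration of why the two approaches compute the same window statistics but label them differently, here is a minimal sketch on synthetic data. It builds a Hankel matrix of sample indices with numpy broadcasting (the construction in the original code may differ) and compares a window mean with pd.Series.rolling:

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0])  # synthetic samples
window = 3
n = len(x)

# Hankel matrix of indices: idx[i, j] = i + j, so row i selects x[i:i+window].
idx = np.arange(n)[:, None] + np.arange(window)[None, :]
# Pad with NaN so windows that run past the end of the data yield NaN.
padded = np.concatenate([x, np.full(window - 1, np.nan)])
H = padded[idx]
mean_hankel = H.mean(axis=1)  # labeled by each window's left endpoint

# Series.rolling labels each window by its right endpoint by default;
# shifting back by window - 1 reproduces the Hankel (left-aligned) labels.
mean_rolling = pd.Series(x).rolling(window).mean().shift(-(window - 1))

np.testing.assert_allclose(mean_hankel, mean_rolling.to_numpy())
```

Because the rolling result is tied to a fixed endpoint of each window, moving the label to another point of the window (for odd windows, including the midpoint) reduces to shifting that result.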

cwhanse · 2020-10-05T21:44:54Z

If clearsky.detect_clearsky is of interest, please take a look, in particular at _calc_stats and the other helper functions.

I need to add some tests for the non-trivial helper functions, the align options, and I'm sure there's room for performance improvement. The align feature is added anticipating the BrightSun method which uses centered windows. I think that the BrightSun method could be implemented using numpy arrays and the Hankel matrix, but using pandas.Series.rolling seems more transparent to me.

We could drop align='right' as an option. The current code's behavior is align='left', but pandas.rolling only offers center=False (the default, which means right-aligned) and center=True (centered). Maintaining the current behavior with align='left' by default wasn't too hard. I'm not sure how much value the align options offer, but there they are. align='center' is more in line (sorry) with what Matt and I had in mind when we wrote the algorithm, but we didn't code it that way in Matlab.
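To illustrate the three alignments, a minimal sketch on a toy series (the data are an assumption for illustration): Series.rolling labels each window by its last sample (center=False) or its middle sample (center=True), and a left-aligned result, matching the old behavior, can be obtained by shifting the right-aligned result back by window - 1 samples.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(7, dtype=float))  # 0.0, 1.0, ..., 6.0
window = 3

right = s.rolling(window).mean()                # center=False: label = last sample
center = s.rolling(window, center=True).mean()  # label = middle sample
left = right.shift(-(window - 1))               # label = first sample (old behavior)

# For an odd window, centering is the same as shifting by half a window.
assert center.equals(right.shift(-((window - 1) // 2)))
```

All three results hold the same window means; only the labels differ.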

wholmgren · 2020-10-06T17:21:51Z

Is there a compelling use case for align='left'? It sounds to me like we should consider it a bug in the initial implementation, and always use center=True (without exposing a kwarg).

I will carefully review when I have the time. I want to understand why this implementation works but the previous attempts to use pandas did not work.

cwhanse · 2020-10-06T20:09:51Z

align='left' is here to preserve the old behavior; personally, I agree that align='center' is the better choice for a default.

I also think we could drop support for array inputs and lose the times argument.

wholmgren · 2020-10-06T20:14:50Z

To put it differently: why should we preserve the old behavior? It sounds like a bug. People that want the old behavior can use an old, buggier version of pvlib.

> I also think we could drop support for array inputs and lose the times argument.

agree

cwhanse · 2020-10-06T20:49:51Z

I guess I'm unwilling to call the left alignment a bug: it's not wrong, just undesirable behavior. I'm OK changing the default to 'center', and open to not offering 'left' and 'right' as options.

wholmgren · 2020-10-08T18:42:26Z

Other than consistency with previous versions, when might a user prefer 'left' or 'right'? Without a compelling reason, I'd hard-code center and then let users ask/contribute if they come up with a good reason for a kwarg.

cwhanse · 2020-10-08T18:47:06Z

> Other than consistency with previous versions, when might a user prefer 'left' or 'right'?

Consistency is the only reason I came up with. The option to align 'left', 'center', 'right' falls out of the combination of numpy and pandas defaults.

> I'd hard-code center

I'm OK with this. Unless you object, I'll keep the 'align' plumbing in the private functions.


cwhanse · 2020-12-24T22:03:18Z

@wholmgren this looks better on the profiler

wholmgren · 2020-12-24T22:39:51Z

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [983bd6f8]
                      <detect_clearsky>
+      14.7±0.4ms       55.6±0.8ms     3.77  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.1±0.1ms         54.2±2ms     3.58  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]

I'm willing to accept this performance decrease if we think it's a net win for readability and ease of reuse in future algorithms.


cwhanse · 2020-12-28T17:58:04Z

@wholmgren I removed the use of the pandas roller (Series.rolling); it was a performance hit. I don't think this is back to the speed of the original, because it uses Series as the internal type rather than arrays. It could go back to arrays, but that would take some work to maintain the centered window alignments.
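Going back to plain arrays mainly means doing the alignment bookkeeping by hand. A minimal sketch, assuming an odd window and NumPy 1.20 or later (for sliding_window_view); centered_window_mean is a hypothetical helper, not part of pvlib:

```python
import numpy as np


def centered_window_mean(x, window):
    """Mean over odd-length centered windows; NaN where the window would
    run past either end of the data."""
    x = np.asarray(x, dtype=float)
    if window % 2 == 0:
        raise ValueError('window must be odd for a unique center sample')
    half = (window - 1) // 2
    out = np.full(x.shape, np.nan)
    if x.size >= window:
        # one row per complete window; row k covers x[k:k + window]
        windows = np.lib.stride_tricks.sliding_window_view(x, window)
        # window k is centered on sample k + half
        out[half:half + windows.shape[0]] = windows.mean(axis=1)
    return out
```

The result should match pd.Series(x).rolling(window, center=True).mean() for odd windows; the extra work is the `half` offset and the NaN padding at both ends.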

wholmgren

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [e53572d9]
                      <detect_clearsky>
+      14.7±0.4ms       54.0±0.5ms     3.66  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.1±0.1ms         52.4±1ms     3.47  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]


wholmgren · 2020-12-28T20:43:49Z

@cwhanse I took the liberty of parameterizing the test in the latest commit, so you'll want to pull before you commit again. Results below with ndays in parentheses.

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [e53572d9]
                      <detect_clearsky>
+     2.08±0.06ms       36.2±0.8ms    17.45  detect_clearsky.DetectClear.time_detect_clearsky(1) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+     2.63±0.09ms       32.2±0.6ms    12.27  detect_clearsky.DetectClear.time_detect_clearsky(1) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]
+      14.2±0.4ms       55.1±0.6ms     3.87  detect_clearsky.DetectClear.time_detect_clearsky(10) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.2±0.2ms       52.0±0.6ms     3.41  detect_clearsky.DetectClear.time_detect_clearsky(10) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]
+         165±1ms          231±2ms     1.40  detect_clearsky.DetectClear.time_detect_clearsky(100) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+         164±2ms          222±1ms     1.35  detect_clearsky.DetectClear.time_detect_clearsky(100) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]

The performance penalty is not so bad for longer data sets.
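A quick calculation from the py3.6 rows of the table above suggests why: the added time grows from about 34 ms to about 66 ms while the data length grows 100-fold, so much of the penalty behaves like per-call overhead. This is a reading of the reported medians only, not a profile:

```python
# Median times (ms) from the py3.6 rows of the asv comparison
ndays = [1, 10, 100]
before_ms = [2.08, 14.2, 165.0]
after_ms = [36.2, 55.1, 231.0]

# Absolute time added by the refactor and the resulting slowdown factor
extra_ms = [a - b for a, b in zip(after_ms, before_ms)]
slowdown = [a / b for a, b in zip(after_ms, before_ms)]

for n, extra, ratio in zip(ndays, extra_ms, slowdown):
    print(f"{n:>3} days: +{extra:5.1f} ms, {ratio:5.2f}x")
```

The ratios computed from the rounded medians differ slightly from asv's own ratio column, which uses unrounded timings.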


wholmgren

Ready to merge?

cwhanse · 2021-01-04T21:16:31Z

> Ready to merge?

Yes for me. @kanderso-nrel ?

kandersolar

I didn't check calculation details this time but LGTM. One small question below.

kandersolar · 2021-01-04T21:40:13Z

pvlib/tests/test_clearsky.py

@@ -569,19 +579,19 @@ def test_detect_clearsky_iterations(detect_clearsky_data):
    alpha = 1.0448
    with pytest.warns(RuntimeWarning):
        clear_samples = clearsky.detect_clearsky(
-            expected['GHI'], cs['ghi']*alpha, cs.index, 10, max_iterations=1)
-    assert (clear_samples[:'2012-04-01 10:41:00'] == True).all()
-    assert (clear_samples['2012-04-01 10:42:00':] == False).all()


De Morgan wonders if this should use .any() instead of .all()?

Oops, this comment is about the line assert not clear_samples['2012-04-01 10:42:00':].all() # expected False

You are correct; it should be .any()
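For reference, the two checks differ exactly as De Morgan's laws predict. A minimal sketch on hypothetical labels for a slice that should contain no clear-sky samples, but where one sample was wrongly labeled clear:

```python
import pandas as pd

# Hypothetical labels: one sample in the "expected False" slice is True.
labels = pd.Series([False, True, False])

# `not labels.all()` only says "at least one sample is False", so this
# weaker check still passes despite the mislabeled sample.
weak_check = not labels.all()    # True

# `not labels.any()` says "no sample is True", which is what
# "expected False" requires; it correctly fails here.
strict_check = not labels.any()  # False

# De Morgan: not any(x) == all(not x); not all(x) == any(not x).
assert strict_check == (~labels).all()
assert weak_check == (~labels).any()
```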

cwhanse added 7 commits October 5, 2020 10:26

refactor detect_clearsky

5c94c64

formatting

af9a314

make array input work

ea5021f

lint, make clearsky a Series

2a5cdeb

make work for py36min

6f39e6b

update arguments in call

153c7fd

fit another diff

8e719ae

cwhanse added 4 commits October 6, 2020 13:10

fix ddof, add test for _calc_stats with window=3

d680f77

actually add test for _calc_stats with window=3

c5a15b9

add test__calc_c5, simplify _calc_c5

c16eed5

really add test__calc_c5 this time

a14eb2c

cwhanse added 2 commits October 8, 2020 11:51

fix _calc_c5, use shift

84cb165

fix test__calc_stats

e4cf350

cwhanse added 9 commits October 8, 2020 12:53

remove align from public function kwargs

f1ae2d4

fiddly window and shifting

1e390c9

Merge branch 'master' of https://github.com/pvlib/pvlib-python into detect_clearsky

0eff949

fix _calc_c5

d39218a

can now label points at the end of the time series

86c4e79

fix formatting

ace81cf

only labels end of data if window is short enough

ffc7fac

improve coverage

25c9c8b

Merge branch 'master' of https://github.com/pvlib/pvlib-python into detect_clearsky

62a3775

cwhanse added the enhancement label Oct 16, 2020

cwhanse added 6 commits December 24, 2020 10:12

add tests for new windowed functions

7c7b4b2

uncomplicate tests, remove align options

20130d8

move common data to fixture

9d64abe

index correctly

0eda545

correct cut/paste/delete errors

4eb70e1

proofread it next time

983bd6f

cwhanse added 5 commits December 28, 2020 08:57

replace roller with numpy

bdee2ce

try this dimensioning

26e36a2

try numpy arrays instead

05d8bc1

remove unneeded conversion to Series

5b4e441

Revert "remove unneeded conversion to Series"

e53572d

This reverts commit 5b4e441.

wholmgren reviewed Dec 28, 2020


pvlib/clearsky.py (outdated, resolved)

pvlib/clearsky.py (outdated, resolved)

pvlib/clearsky.py (outdated, resolved)

param ndays in benchmark

ea82348

cwhanse and others added 3 commits December 29, 2020 09:50

Line formatting

970227d

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

More line formatting

79a329c

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

Remove unneeded declaration

7af5f55

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

wholmgren approved these changes Jan 4, 2021


kandersolar approved these changes Jan 4, 2021


fix test error, .all -> .any

be1829b

kandersolar merged commit 5a21ca4 into pvlib:master Jan 5, 2021

wholmgren mentioned this pull request Jan 11, 2021

compat with centered detect clear sky labels in pvlib>0.8.0 SolarArbiter/solarforecastarbiter-core#644

Open

cwhanse mentioned this pull request Mar 24, 2021

Detect shadows from fixed objects in GHI data pvlib/pvanalytics#101

Merged


wfvining mentioned this pull request Apr 6, 2021

Tests for clearsky.reno fail with pvlib 0.8.1 pvlib/pvanalytics#105

Closed

cwhanse mentioned this pull request May 7, 2021

irradiance.detect_clearsky: possibly redundant input 'times' #954

Open

cwhanse mentioned this pull request Mar 31, 2023

Support for unequal time intervals #1678

Open
