Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor detect_clearsky #1074

Merged
merged 68 commits into from
Jan 5, 2021
Merged

Refactor detect_clearsky #1074

merged 68 commits into from
Jan 5, 2021

Conversation

cwhanse
Copy link
Member

@cwhanse cwhanse commented Oct 5, 2020

  • I am familiar with the contributing guidelines
  • Tests added
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Refactor of clearsky.detect_clearsky in preparation for adding BrightSun detect clearsky algorithm. Major changes:

  • uses pd.Series.rolling instead of Hankel matrix to compute criteria over sliding windows. This change accomodates centered windows, i.e., aligning the criteria for a window to a midpoint rather than to the left endpoint (default behavior in this code).
  • calculation of criteria delegated to helper functions.
  • new kwarg align (default 'left') otherwise function signature is unchanged.

@cwhanse
Copy link
Member Author

cwhanse commented Oct 5, 2020

if clearsky.detect_clearsky is of interest please take a look, in particular, at _calc_stats and the other helper functions.

I need to add some tests for the non-trivial helper functions, the align options, and I'm sure there's room for performance improvement. The align feature is added anticipating the BrightSun method which uses centered windows. I think that the BrightSun method could be implemented using numpy arrays and the Hankel matrix, but using pandas.Series.rolling seems more transparent to me.

We could drop align='right' as an option (the current code's behavior is align='left' but pandas.rolling only offers center=False (default, means right-aligned) and center=True(centered). Maintaining the current behavior withalign='left'by default wasn't too hard. I'm not sure how much value thealignoptions offer, but there they are.align='center'` is more in line (sorry) with what Matt and I had in mind when we wrote the algorithm, but we didn't code it that way in Matlab.

@wholmgren
Copy link
Member

Is there a compelling use case for align='left'? It sounds to me like we should consider it a bug in the initial implementation, and always use center=True (without exposing a kwarg).

I will carefully review when I have the time. I want to understand why this implementation works but the previous attempts to use pandas did not work.

@cwhanse
Copy link
Member Author

cwhanse commented Oct 6, 2020

align='left' is here to preserve the old behavior, personally I agree with align='center' as the better choice for a default.

I also think we could drop support for array inputs and lose the times argument.

@wholmgren
Copy link
Member

To put it differently: why should we preserve the old behavior? It sounds like a bug. People that want the old behavior can use an old, buggier version of pvlib.

I also think we could drop support for array inputs and lose the times argument.

agree

@cwhanse
Copy link
Member Author

cwhanse commented Oct 6, 2020

I guess I'm unwilling to call the left alignment a bug - it's not wrong, rather, it's an undesirable behavior. I'm OK changing the default to 'center'. And open to not offering 'left', 'right' as options.

@wholmgren
Copy link
Member

Other than consistency with previous versions, when might a user prefer 'left' or 'right'? Without a compelling reason, I'd hard-code center and then let users ask/contribute if they come up with a good reason for a kwarg.

@cwhanse
Copy link
Member Author

cwhanse commented Oct 8, 2020

Other than consistency with previous versions, when might a user prefer 'left' or 'right'?

Consistency is the only reason I came up with. The option to align 'left', 'center', 'right' falls out of the combination of numpy and pandas defaults.

I'd hard-code center

I'm OK with this. Unless you object I'll keep the 'align' plumbing in the private functions.

@cwhanse
Copy link
Member Author

cwhanse commented Dec 24, 2020

@wholmgren this looks better on the profiler

@wholmgren
Copy link
Member

wholmgren commented Dec 24, 2020

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [983bd6f8]
                      <detect_clearsky>
+      14.7±0.4ms       55.6±0.8ms     3.77  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.1±0.1ms         54.2±2ms     3.58  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]

I'm willing to accept this performance decrease if we think it's a net win for readability and ease of reuse in future algorithms.

@cwhanse
Copy link
Member Author

cwhanse commented Dec 28, 2020

@wholmgren I removed the use of pandas.roller, it is a performance hit. I don't think this is back to the speed of the original because it uses Series as the internal type rather than array. It could go back to array but that would take some work to maintain the centered interval alignments.

Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [e53572d9]
                      <detect_clearsky>
+      14.7±0.4ms       54.0±0.5ms     3.66  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.1±0.1ms         52.4±1ms     3.47  detect_clearsky.DetectClear.time_detect_clearsky [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]

pvlib/clearsky.py Outdated Show resolved Hide resolved
pvlib/clearsky.py Outdated Show resolved Hide resolved
pvlib/clearsky.py Outdated Show resolved Hide resolved
@wholmgren
Copy link
Member

wholmgren commented Dec 28, 2020

@cwhanse I took the liberty of parameterizing the test in the latest commit, so you'll want to pull before you commit again. Results below with ndays in parenthesis.

asv compare pvlib/master cwhanse/detect_clearsky

All benchmarks:

       before           after         ratio
     [926d2f95]       [e53572d9]
                      <detect_clearsky>
+     2.08±0.06ms       36.2±0.8ms    17.45  detect_clearsky.DetectClear.time_detect_clearsky(1) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+     2.63±0.09ms       32.2±0.6ms    12.27  detect_clearsky.DetectClear.time_detect_clearsky(1) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]
+      14.2±0.4ms       55.1±0.6ms     3.87  detect_clearsky.DetectClear.time_detect_clearsky(10) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+      15.2±0.2ms       52.0±0.6ms     3.41  detect_clearsky.DetectClear.time_detect_clearsky(10) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]
+         165±1ms          231±2ms     1.40  detect_clearsky.DetectClear.time_detect_clearsky(100) [holmgren-mbp.local/conda-py3.6-ephem3.7.6.0-numba0.36.1-numpy1.12.0-pandas0.22.0-pytables3.6.1-scipy1.2.0]
+         164±2ms          222±1ms     1.35  detect_clearsky.DetectClear.time_detect_clearsky(100) [holmgren-mbp.local/conda-py3.8-ephem-numba-numpy-pandas-pytables-scipy]

The performance penalty is not so bad for longer data sets.

cwhanse and others added 3 commits December 29, 2020 09:50
Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
Co-authored-by: Will Holmgren <william.holmgren@gmail.com>
Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge?

@cwhanse
Copy link
Member Author

cwhanse commented Jan 4, 2021

Ready to merge?

Yes for me. @kanderso-nrel ?

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check calculation details this time but LGTM. One small question below.

@@ -569,19 +579,19 @@ def test_detect_clearsky_iterations(detect_clearsky_data):
alpha = 1.0448
with pytest.warns(RuntimeWarning):
clear_samples = clearsky.detect_clearsky(
expected['GHI'], cs['ghi']*alpha, cs.index, 10, max_iterations=1)
assert (clear_samples[:'2012-04-01 10:41:00'] == True).all()
assert (clear_samples['2012-04-01 10:42:00':] == False).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De Morgan wonders if this should use .any() instead of .all()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this comment is about the line assert not clear_samples['2012-04-01 10:42:00':].all() # expected False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, should be .any()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants