Collaborator

@wfvining wfvining commented May 12, 2020

Two methods to identify when the sun is up from power or irradiance data. These are useful when you do not trust/know the timezone of timestamps in your data.

Both functions work by aggregating the data by minute of the day:

  • features.daylight.frequency() looks for the minutes of the day where the vast majority of days (defaults to 80%) in the data have positive values.
  • features.daylight.level() looks for the minutes of the day where the mean of all data for that minute is greater than some percentage (defaults to 20%) of some quantile (defaults to 95%) of the data.
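The two approaches described above can be sketched roughly as follows. This is a minimal illustration written from the prose description only, not the code in this pull request: the function names `daylight_frequency`, `daylight_level`, and `minute_of_day`, the parameter names, and the way "days with data" is counted (distinct dates in the index) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def minute_of_day(index):
    """Minute of the day (0-1439) for each timestamp in a DatetimeIndex."""
    return np.asarray(index.hour * 60 + index.minute)


def daylight_frequency(data, threshold=0.8):
    """Minutes with positive values on at least `threshold` of the days."""
    # Assumption: every calendar date in the index counts as a day with data.
    n_days = data.index.normalize().nunique()
    positive_days = (data > 0).groupby(minute_of_day(data.index)).sum()
    return positive_days[positive_days >= threshold * n_days].index


def daylight_level(data, threshold=0.2, quantile=0.95):
    """Minutes whose mean exceeds `threshold` times a quantile of the data."""
    mean_by_minute = data.groupby(minute_of_day(data.index)).mean()
    cutoff = threshold * data.quantile(quantile)
    return mean_by_minute[mean_by_minute > cutoff].index


# Synthetic data: ten days at 15-minute resolution, daylight 08:00 to 16:00.
index = pd.date_range("2020-01-01", periods=10 * 96, freq="15min")
minutes = minute_of_day(index)
shape = np.sin(np.pi * (minutes - 480) / 480)
in_daylight = (minutes >= 480) & (minutes <= 960)
power = pd.Series(np.where(in_daylight, 1000 * shape, 0.0), index=index)

freq_minutes = daylight_frequency(power)
level_minutes = daylight_level(power)
```

On this synthetic series, a low but positive early-morning minute such as 08:15 (minute 495) is picked up by the frequency sketch but falls below the cutoff of the level sketch, which is the trade-off between the two ideas.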

To Do

  • Add to API documentation
  • move to features.daylight submodule
  • Consider additional tests (did I miss any corner cases?).
    • Add a test where the daytime period is split across two days in the index
  • Update readme to reflect the new modules/submodules

Not working yet, but includes a couple initial tests and the rough
documentation of the function.

'features.time' may not be the right place for it, but it makes sense
to me at the moment. I think this kind of function can be useful in a
quality check, but in and of itself it is not a QA function; it is a
feature extraction function. Another organization option could be
'features.irradiance' (since the function operates on irradiance
data); however, because it can also operate on power data, it doesn't
make sense to me to put it there.
Fixes bug where maximum number of days with data was not calculated
correctly. Since the data was transformed from a series to a frame the
return value of 'data.max()' was a series. The conditional, however,
was expecting a scalar. To avoid this we just compute the minutes of
the day as their own series and use that series to do the grouping
rather than making the minutes of the day another column of the data.
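The kind of bug described in this commit message can be reproduced in a few lines. This is an illustrative reconstruction, not the original code: the variable names and the tiny data set are made up, and only standard pandas behavior is relied on (`DataFrame.max()` returns one value per column, while `Series.max()` returns a scalar).

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=4, freq="12h")
series = pd.Series([0.0, 1.0, 0.0, 2.0], index=index)
minute_values = (index.hour * 60 + index.minute).to_numpy()

# Problematic pattern: minute of day stored as a column, so the data is a frame.
frame = series.to_frame("value")
frame["minute"] = minute_values
frame_counts = (frame[["value"]] > 0).groupby(frame["minute"]).sum()
frame_max = frame_counts.max()  # a Series with one entry per column
# `if frame_max < minimum_days:` would raise ValueError, because the truth
# value of a Series is ambiguous.

# Pattern from the fix: keep the minutes in their own series and group by it.
minute_of_day = pd.Series(minute_values, index=index)
series_counts = (series > 0).groupby(minute_of_day).sum()
series_max = series_counts.max()  # a scalar that is safe to compare
```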
Looks for periods of sufficiently high power or irradiance rather than
just looking at the frequency of non-zero values. Should be more
robust to data with non-zero (but low) night time values than
daytime_frequency, but may suffer from other (as yet unknown)
limitations.
@wfvining
Collaborator Author

@matt14muller let me know if you have any thoughts on the daytime_level() function.

Adds summary of functions to API documentation as well.
@wfvining
Collaborator Author

Our original plan (see README.md) was to put these types of functions in the filtering module. Should we do that, or do we want to keep them here? Personally, I think it makes more sense to put these in features since they are identifying a feature in the data, but I could be convinced otherwise.

@cwhanse
Member

cwhanse commented May 14, 2020

I'm OK with putting these functions in features; maybe make a daylight folder within features.

@matt14muller

matt14muller commented May 14, 2020 via email

@wfvining
Collaborator Author

features.daylight.[frequency|level] seems good to me. I didn't really like features.time now that you mention it.

wfvining added 2 commits May 19, 2020 08:35
More specific name and a place to hold functions for features related
to when the sun is up.
@wfvining wfvining marked this pull request as ready for review May 19, 2020 14:57
@wfvining wfvining requested a review from cwhanse May 19, 2020 14:57
Member

@cwhanse cwhanse left a comment

Some questions to discuss with @matt14muller before we merge

    raise ValueError("Too few days with data (got {}, minimum_days={})"
                     .format(value_frequency.max(), minimum_days))
daylight_minutes = value_frequency[
    value_frequency > threshold * value_frequency.mean()
Member

@cwhanse cwhanse May 28, 2020

@matt14muller I don't understand how this works. value_frequency[minute_of_day] is the count of days when the data value is positive. value_frequency.mean() is the average of these counts. What happens if the data span a summer at high latitude? Since the daylight period is long, value_frequency.mean() will be large compared to data from a winter at the same latitude. So it is more likely to label a minute as daytime during winter than summer?
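The concern raised in this comment can be made concrete with an idealized example. The sketch below assumes perfectly clean data (every daylight minute is positive on every day, every night minute is zero), 30 days of data, and a hypothetical helper `frequency_cutoff`; it only evaluates the `threshold * value_frequency.mean()` expression from the snippet under review for a long and a short daylight period.

```python
import numpy as np


def frequency_cutoff(daylight_minutes, n_days=30, threshold=0.8):
    """Cutoff `threshold * value_frequency.mean()` for idealized clean data."""
    value_frequency = np.zeros(1440)
    # Daylight minutes are positive on every day; all other minutes never are.
    value_frequency[:daylight_minutes] = n_days
    return threshold * value_frequency.mean()


summer_cutoff = frequency_cutoff(18 * 60)  # long days at high latitude
winter_cutoff = frequency_cutoff(6 * 60)   # short days at the same latitude

# A minute with positive values on 12 of the 30 days:
days_positive = 12
labeled_in_summer = days_positive > summer_cutoff
labeled_in_winter = days_positive > winter_cutoff
```

Under these assumptions the cutoff scales with the fraction of the day that is lit, so the same minute passes in the winter data set and fails in the summer one.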

up.
"""
max_power = power_or_irradiance.quantile(quantile)
Member

@matt14muller max_power is the 95th percentile of the time series, e.g., a value representative of solar noon. So it is labeling daylight as minutes when the time series is greater than or equal to the 95th percentile. The labels from this algorithm are going to depend on the season, e.g., early morning minutes are less likely to be labeled in the summer than in the winter since the solar noon peak is higher in summer.
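A small sketch of the seasonal dependence mentioned here, using the defaults stated in the pull request description (20% of the 0.95 quantile). The half-sine daily profile, the peak values of 1000 and 400, the fixed signal level of 120, and the helper name `level_cutoff` are all assumptions chosen for illustration. Day length is deliberately held constant so that only the peak changes.

```python
import numpy as np
import pandas as pd


def level_cutoff(peak, threshold=0.2, quantile=0.95):
    """Cutoff `threshold * quantile` for a half-sine day with the given peak."""
    minutes = np.arange(1440)
    # Idealized profile: zero at night, half sine from 06:00 to 18:00.
    profile = peak * np.clip(np.sin(np.pi * (minutes - 360) / 720), 0, None)
    return threshold * pd.Series(profile).quantile(quantile)


summer_cutoff = level_cutoff(peak=1000)
winter_cutoff = level_cutoff(peak=400)

# The same early-morning signal level compared against each cutoff.
early_morning_level = 120
above_summer_cutoff = early_morning_level > summer_cutoff
above_winter_cutoff = early_morning_level > winter_cutoff
```

With these numbers the same signal level clears the winter cutoff but not the summer one, because the cutoff moves with the solar-noon peak.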

Collaborator Author

I wrote this one based on our discussion a few weeks ago (not based on the pvfleets code, even though it is structurally similar). I had not thought of the issue you pointed out with the higher values in the summer; that might be a good reason not to include this. If you only really want mid-day periods and you don't mind losing the morning/evening data, then I do think that this algorithm works well (even on the wacky irradiance data set that Matt shared with me).

wfvining and others added 2 commits May 28, 2020 14:41
Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
@wfvining wfvining marked this pull request as draft June 2, 2020 17:13
@wfvining
Collaborator Author

Closing this since #67 implements a better solution.

@wfvining wfvining closed this Aug 18, 2020
3 participants