
refactor preprocessing resampling, validation, and alignment #580

Merged: 42 commits, Oct 14, 2020

Conversation

@wholmgren (Member) commented Sep 30, 2020

There is a problem with the current implementation. Flags that represent bad data should be excluded from the time series before resampling (e.g. limits exceeded) and that's not done here. We're trading one problem for another. I don't know how to solve this without a richer flag datamodel or awful hacks. So maybe it's worth addressing that now. Or we can say this is a net win, move on, and address that later as originally planned. I guess I would say address it now but I don't have a good idea of what the datamodel changes would actually look like.

@alorenzo175 can you review and let me know what you think?

@wholmgren added the enhancement (New feature or request), validation (Issue pertains to data validation), and metrics (Issue pertains to metrics calculation) labels Sep 30, 2020
@wholmgren wholmgren added this to the 1.0 rc4 milestone Sep 30, 2020
@alorenzo175 (Contributor) left a comment


Looks good so far. I think the datamodel changes basically involve adding keys like threshold_percentage to QualityFlagFilter. To handle this new issue, I think a key like discard_before_resample would help.

We should also keep in mind that one can specify multiple quality flag filters in the report. Right now they are merged into one (https://github.com/SolarArbiter/solarforecastarbiter-core/pull/580/files#diff-715b43f6fac14c48d8729da1c197d286R430) but if we go the route of adding these keys, maybe the filters should be applied successively.
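
If the filters were applied successively rather than merged, the loop might look like the sketch below; the function name, the mask layout, and the plain-tuple filter representation are all illustrative, not the package's actual API:

```python
import pandas as pd

# Illustrative sketch only: apply each quality flag filter's flags to
# the series in turn, instead of merging all filters into one mask.
def apply_filters_successively(values, flag_masks, filters):
    # values: pd.Series of observation values
    # flag_masks: dict of flag name -> boolean pd.Series (True = flagged)
    # filters: list of tuples of flag names, one tuple per filter
    out = values
    for flag_names in filters:
        # union only the flags belonging to this filter
        combined = pd.Series(False, index=out.index)
        for name in flag_names:
            combined |= flag_masks[name].reindex(out.index)
        out = out[~combined]  # drop the points this filter flags
    return out

idx = pd.date_range('2020-01-01', periods=4, freq='15min')
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
masks = {
    'NIGHTTIME': pd.Series([True, False, False, False], index=idx),
    'LIMITS EXCEEDED': pd.Series([False, False, True, False], index=idx),
}
kept = apply_filters_successively(
    values, masks, [('NIGHTTIME',), ('LIMITS EXCEEDED',)])
```

Here each filter drops its own flagged points before the next filter is considered, which is the "applied successively" behavior.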

# what's the point of copying the fill map, potentially adding a new key,
# and then only accessing one key?
# this would make more sense to me (wholmgren):
# forecast_fill_str = FORECAST_FILL_STRING_MAP.get(forecast_fill_method,
Contributor:

from what I can see, I agree

@wholmgren (Member, Author):

> I think the datamodel changes basically involve adding keys like threshold_percentage to QualityFlagFilter. To handle this new issue, I think a key like discard_before_resample would help.

Would these be exposed in the report json or would they be preset for predefined QualityFlagFilters for each kind of quality flag?

@alorenzo175 (Contributor):

> I think the datamodel changes basically involve adding keys like threshold_percentage to QualityFlagFilter. To handle this new issue, I think a key like discard_before_resample would help.
>
> Would these be exposed in the report json or would they be preset for predefined QualityFlagFilters for each kind of quality flag?

I was thinking they would be in the report json. If we wanted some kind of presets, I would think the dashboard could handle that like it might already do for other fields.

'UNEVEN FREQUENCY', 'LIMITS EXCEEDED', 'CLEARSKY EXCEEDED',
'DAYTIME STALE VALUES', 'INCONSISTENT IRRADIANCE COMPONENTS'
)
type: str
Contributor:

It isn't ideal, but we should probably keep the quality_flags key (and the other parameters would apply to all flags in that tuple). Otherwise all current reports would be broken since I think the report json is loaded through the report datamodel before displaying/downloading.

Member (Author):

So keep the old quality_flags key in addition to the new keys? I agree that could work. A couple of things to consider as alternatives:

  1. script to migrate current reports from old json to new json
  2. create new objects and json keys, leave the old ones untouched until some later time when we implement 1

I'm not sure what would be less work because the testing implications for keeping the old keys sound miserable.

Contributor:

Well, keep the old key and add the new keys except type. So all flags in quality_flags are processed with the other options. You would probably have the same options for many of the flags anyway, and you can still have multiple QualityFlagFilters with different sets of flags.

We've done the migration before and it was a real pain (and motivation to eventually get the report schema versioned). One problem is that you also have to modify the json core posts to the API.

I don't think testing would be bad. All the keys are still used (if type is removed); we just have a couple of new keys with reasonable defaults if empty.
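
Under that approach the datamodel might look roughly like the sketch below. The field names come from this thread; the defaults shown (10. from the docstring under review, True for discard_before_resample) are assumptions, not the final implementation:

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative sketch: keep the existing quality_flags key and add the
# new keys with defaults, so report json that predates them still loads.
@dataclass(frozen=True)
class QualityFlagFilter:
    quality_flags: Tuple[str, ...] = ('USER FLAGGED',)
    discard_before_resample: bool = True       # assumed default
    resample_threshold_percentage: float = 10.

# an "old style" filter that only specified quality_flags still works
old = QualityFlagFilter(quality_flags=('NIGHTTIME',))
```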

@wholmgren (Member, Author):

Should these have the same behavior?

# separate flags:
qflag_1 = QualityFlagFilter(('NIGHTTIME', ), discard_before_resample=False, resample_threshold_percentage=30.)
qflag_2 = QualityFlagFilter(('CLEARSKY', ), discard_before_resample=False, resample_threshold_percentage=30.)

# combined flags
qflag_combined = QualityFlagFilter(('NIGHTTIME', 'CLEARSKY'), discard_before_resample=False, resample_threshold_percentage=30.)

Imagine a 15 minute validation time series like

nighttime = [True, False, False, False]
clearsky = [False, False, True, False]

and we want to resample into 1 hour. If we apply the filters individually, we keep the hourly interval. But if we combine the filters we have [True, False, True, False] and we discard the interval.

I think the "separate flags" case should keep the interval and the "combined flags" case should discard the interval, but I'm open to the idea that they should have the same behavior.
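
Numerically, with resample_threshold_percentage=30 and discard_before_resample=False, the example works out as below; interval_discarded is a stand-in for whatever the resampler actually computes:

```python
import numpy as np

# four 15 minute points making up one hourly interval
nighttime = np.array([True, False, False, False])
clearsky = np.array([False, False, True, False])
threshold_pct = 30.

def interval_discarded(flags, threshold_pct):
    # discard the resampled interval if more than threshold_pct
    # of its points are flagged
    return 100. * flags.mean() > threshold_pct

# separate filters: each flag covers 25% of the interval, under the
# threshold, so the hourly value is kept
separate = (interval_discarded(nighttime, threshold_pct)
            or interval_discarded(clearsky, threshold_pct))

# combined filter: the union [True, False, True, False] covers 50%,
# over the threshold, so the hourly value is discarded
combined = interval_discarded(nighttime | clearsky, threshold_pct)
```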

@alorenzo175 (Contributor):

Interesting. I guess I always thought they would have the same behaviour and be treated separately. I can see how the combined flags case could be useful. Just need to make sure it's documented well.

@wholmgren (Member, Author):

More scope creep.... TimeOfDayFilter and ValueFilter are not currently implemented, but they'll need to fit into this pattern eventually. It's not obvious to me how that would work. I feel like they might need to be eliminated and instead set as keys on QualityFlagFilter (perhaps renamed). It's worth considering that before committing to this refactor.

@wholmgren (Member, Author):

I'd like to think it makes more sense now.

Although, I guess it wouldn't be too bad if there were just a Filter object that had keys quality_flags, time_of_day_range, value_range. Either way, I think we can lay the groundwork in this PR and make a follow on soon after if necessary

I added a few comments on where we could easily add support for this. Best left to another PR, though.
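
For reference, the single Filter object floated above might look like the sketch below; nothing here exists in the package, and the field types are guesses:

```python
import datetime
from dataclasses import dataclass
from typing import Optional, Tuple

# Purely illustrative: one Filter object standing in for
# QualityFlagFilter, TimeOfDayFilter, and ValueFilter.
@dataclass(frozen=True)
class Filter:
    quality_flags: Tuple[str, ...] = ()
    time_of_day_range: Optional[Tuple[datetime.time, datetime.time]] = None
    value_range: Optional[Tuple[float, float]] = None

f = Filter(
    quality_flags=('NIGHTTIME',),
    time_of_day_range=(datetime.time(8), datetime.time(17)),
    value_range=(0., 1100.),
)
```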

docs/source/whatsnew/1.0.0rc4.rst
resample_threshold_percentage : float, default 10.
The percentage of points in a resampled interval that must be
flagged for the resampled interval to be flagged.
Ignored if ``discard_before_resample`` is True.
Contributor:

I think this needs updating?

wholmgren and others added 3 commits October 13, 2020 14:19
Co-authored-by: Tony Lorenzo <atlorenzo@email.arizona.edu>
@wholmgren (Member, Author):

@lboeman we'll need some dashboard updates to go along with this. see #580 (comment)

The easiest thing might be to make separate QualityFlagFilters for each dashboard option (instead of putting them all together like we do now), with each discard_before_resample set by validation.quality_mapping.DISCARD_BEFORE_RESAMPLE.

Let me know if I should make any changes here to support that.
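
The per-flag suggestion could look something like this sketch; the mapping contents and the plain-dict filter representation are stand-ins (the real lookup would use validation.quality_mapping.DISCARD_BEFORE_RESAMPLE):

```python
# Illustrative only: build one filter per dashboard-selected flag,
# looking up discard_before_resample per flag. The mapping contents
# here are assumed for the example.
DISCARD_BEFORE_RESAMPLE = {
    'NIGHTTIME': False,
    'LIMITS EXCEEDED': True,
}

def make_filters(selected_flags):
    return [
        {'quality_flags': (flag,),
         'discard_before_resample': DISCARD_BEFORE_RESAMPLE.get(flag, True)}
        for flag in selected_flags
    ]

filters = make_filters(['NIGHTTIME', 'LIMITS EXCEEDED'])
```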

Successfully merging this pull request may close these issues.

accounting for marginal nighttime and clearsky conditions when resampling observations