
add pvdaq reference sites, fetch, config #438

Merged
merged 25 commits into SolarArbiter:master from the pvdaq branch May 12, 2020

Conversation

@wholmgren (Member) commented May 7, 2020

  • Closes PVDAQ sites for reference data #397.
  • I am familiar with the contributing guidelines.
  • Tests added.
  • Updates entries to docs/source/api.rst for API changes.
  • Adds descriptions to appropriate "what's new" file in docs/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Additional To do:

  • make fetch wrapper functions. Need to reindex to a fixed frequency (most sites report at only quasi-regular intervals). Also need to localize data in the fetch wrapper because PVDAQ apparently returns local time (no DST) but without the tz offset; I pulled tz offset data from Google's timezone API (see the localization sketch after this list).
  • add pv modeling parameters somewhere
  • add sites to the reference forecast configuration too - not sure how that's done.
  • set up SFA API key
  • add sites to google map accessible at https://solarforecastarbiter.org/referencedata/
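
A minimal sketch of that localization step, assuming the offset from Google's timezone API is expressed in hours (the helper name is hypothetical):

    from datetime import timezone, timedelta

    def localize_pvdaq_index(data, utc_offset_hours):
        # PVDAQ reports local time with no DST shifts, so a fixed-offset
        # timezone matches the data and avoids any DST transitions.
        fixed = timezone(timedelta(hours=utc_offset_hours))
        return data.tz_localize(fixed)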

The get_pvdaq_data function is based on pvlib/pvlib-python#664.

It takes about 3 minutes to pull all of the 2020 data.

@wholmgren added the enhancement label May 7, 2020
@wholmgren added this to the 1.0 release candidate milestone May 7, 2020
@wholmgren added the IO label May 7, 2020
@wholmgren (Member Author) commented May 7, 2020

@alorenzo175 @lboeman I found this existing code for conforming to an index:

    # we don't extend the new index to start, end, since reference
    # data has some lag time from the requested end
    # and it isn't necessary to keep the nans between uploads in db
    new_index = pd.date_range(start=data.index[0], end=data.index[-1],
                              freq=observation.interval_length)
    data = data.reindex(new_index)
    # set quality flags

Can we change that to:

    data_resampled = data.resample(observation.interval_length).first()
    new_index = pd.date_range(start=data.index[0], end=data.index[-1],
                              freq=observation.interval_length)
    data = data_resampled.reindex(new_index)

@alorenzo175 (Member) commented

Hmm, and if the interval label is ending? Seems like .last() is more appropriate. And if interval_label == 'mean', seems like a mean would be appropriate?
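
A sketch of that mapping, assuming interval_label takes the values 'beginning', 'ending', or 'mean' (the helper name is hypothetical):

    def resample_for_label(data, freq, interval_label):
        # pick the aggregation that matches the interval label convention
        resampler = data.resample(freq)
        if interval_label == 'ending':
            return resampler.last()
        if interval_label == 'mean':
            return resampler.mean()
        return resampler.first()  # 'beginning'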

@wholmgren (Member Author) commented

I was originally just trying to fix a problem with timestamps that need to be rounded to be on the right interval. Here are a couple of examples:

                         ac_power  dc_power
    Date-Time
    2020-03-12 13:30:00   61000.0   64600.0
    2020-03-12 13:35:00  101700.0  107700.0
    2020-03-12 13:40:00   60300.0   63800.0
    2020-03-12 13:45:00   57300.0   60800.0
    2020-03-12 13:49:58   58400.0   61800.0
    2020-03-12 13:50:00   58400.0   62100.0
    2020-03-12 13:55:00   67200.0   71100.0

and

                         ac_power
    Date-Time
    2020-01-04 12:00:00      1057
    2020-01-04 12:15:01      1685
    2020-01-04 12:30:00      1982
    2020-01-04 12:45:00      2891
    2020-01-04 13:00:00      2428
    2020-01-04 13:15:00      2285
    2020-01-04 13:30:00      2169
    2020-01-04 13:45:00      2075
    2020-01-04 14:00:00      1418
    2020-01-04 14:15:00      1829
    2020-01-04 14:30:01      1181
    2020-01-04 14:45:00      2127

In the first case we could average or we could just discard. In the second case we want to round. If discarding is ok, then we can use resample().first() regardless of the label.

But there are also two sites that report 15s data that we'll need to resample to 1 minute intervals (unless we want to support non-integer interval_lengths in the API).
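
For concreteness, a runnable sketch of the discard-via-resample behavior on a reconstruction of the second example (values copied from above):

    import pandas as pd

    idx = pd.to_datetime(['2020-01-04 12:00:00', '2020-01-04 12:15:01',
                          '2020-01-04 12:30:00'])
    data = pd.DataFrame({'ac_power': [1057, 1685, 1982]}, index=idx)

    # 12:15:01 snaps into the 12:15:00 bin, so its value is kept at a
    # rounded label; a row that lands in an already-occupied bin (like
    # 13:49:58 in the first example) is simply discarded by .first().
    rounded = data.resample('15min').first()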

@alorenzo175 (Member) commented

Dumping data is fine with me.

@wholmgren (Member Author) commented

I'd like some feedback on the proposed pattern before implementing/fixing tests.

pvdaq_reference_sites.json is a dict with keys 'sites' and 'observations'. The sites are all of the pvdaq sites, and the observations are a flat list of all of the pvdaq observations. I don't have a strong opinion about this vs. a nested structure. The code parses a handful of ugly Python dicts and dataframes into Site and Observation objects, so from there it's simple to change the organization of the json file if people prefer something else. I know @lboeman was thinking of something similar for SRML PV sites.

common._prepare_data_to_post gets a new resample_how argument that can be None or a pandas resampler method. For PVDAQ, this lets us drop data or resample sub-minute data to 1 minute data. common.post_observation_data pulls this from observation.extra_parameters, with default None. I could put this kind of logic in reference_observations.pvdaq.fetch if I assign the resample rules to the Site instead of the Observation. While that might be consistent with the existing expectations of all reference site data, I think we need to be more flexible in the future. I also just think it makes more sense to put it in the observation metadata.
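
A sketch of that pattern (the real _prepare_data_to_post takes other arguments; only the resample_how handling is shown, and passing the method as a name string is an assumption):

    def _prepare_data_to_post(data, observation, resample_how=None):
        # resample_how names a pandas resampler method ('first', 'last',
        # 'mean', ...); post_observation_data pulls it from
        # observation.extra_parameters and defaults to None (no resampling).
        if resample_how is not None:
            resampler = data.resample(observation.interval_length)
            data = getattr(resampler, resample_how)()
        ...  # existing reindex / quality-flag logic continues here
        return data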

@wholmgren (Member Author) commented

It's now working with the solararbiter referencedata init and update commands.

See this site for an example: https://dev-dashboard.solarforecastarbiter.org/observations/bfdb2b98-923f-11ea-bfa1-0a580a82013d?start=2020-04-01T22%3A00Z&end=2020-05-09T22%3A00Z

Observations won't update at the other sites because I first created them with not quite the correct extra_parameters (it was a dict instead of a string-ified dict). I deleted the site/observation/forecasts at just the linked site above (major pain; separate issue) and then recreated it with extra_parameters that are strings, and the values post was successful.

@wholmgren requested a review from lboeman May 11, 2020 17:13
@wholmgren (Member Author) commented

Looks like everything is working. There are a couple of sites on the dev dashboard that have the wrong AC/DC capacity since I hadn't yet fixed the W-->MW conversion when they were created, but they would be recreated correctly following a database wipe. It takes about 20 minutes to pull and post all of the 2020 data. About 75% of the time is in the post.

@lboeman (Member) left a comment

This looks good to me. I do like storing the sites as valid API schema. Maybe eventually we convert the full list to a json file with the format:

    {
      "NETWORK": {
        "sites": [...],
        "observations": [...]  // optional
      },
      "NETWORK 2": {...}
    }

Then we can have a set of required extra_parameters, and special cases can be added without needing to update the whole list. Not suggesting this should happen here, but just thinking ahead a little.

    start : datetime
        The beginning of the period to request data for.
    end : datetime
        The end of the period to request data for.
Member: nrel_pvdaq_api_key here

@alorenzo175 (Member) left a comment
I'm good with the approach. Needs tests at least for common's resample_how handling and for fetch/pvdaq.

    # Each year must be queried separately, so iterate over the years and
    # generate a list of dataframes.
    # Consider putting this loop in its own private function with a
    # try / except / try again pattern for network issues and the NREL API
Member: will we see any issues from this?

@wholmgren (Member Author):
The pvlib CI struggled with a different NREL API, but I haven't run into any issues with the pvdaq API. They have a 1000 requests per hour limit but we are nowhere close to that. Let's see if it's a problem in the rc cycle.
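
One way the suggested private retry function could look (hypothetical names; the get_pvdaq_data signature and retry count are assumptions):

    import time

    def _fetch_pvdaq_years(api_key, system_id, years, retries=2):
        # query each year separately, retrying transient network/API errors
        frames = []
        for year in years:
            for attempt in range(retries + 1):
                try:
                    frames.append(get_pvdaq_data(api_key, system_id, year))
                    break
                except Exception:
                    if attempt == retries:
                        raise
                    time.sleep(5)  # brief backoff before trying again
        return frames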

solarforecastarbiter/io/reference_observations/common.py (outdated review thread, resolved)
@wholmgren (Member Author) commented

@alorenzo175 tests pass, coverage is almost 100%, so I think we're close. The mocks grew more aggressive as the hour grew later, but I think they're consistent with other modules.

@wholmgren merged commit 03a2c29 into SolarArbiter:master May 12, 2020
@wholmgren deleted the pvdaq branch May 12, 2020 16:43