Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions to read and retrieve SolarAnywhere irradiance data #1497

Merged
merged 36 commits into from Dec 20, 2023

Conversation

AdamRJensen
Copy link
Member

@AdamRJensen AdamRJensen commented Jul 22, 2022

Description
This PR adds functions for reading and retrieving SolarAnywhere data as discussed in #1310. SolarAnywhere is a commercial dataset, although it provides irradiance data for a few locations for free.

Minor comments
The SolarAnywhere API is asynchronous, hence two requests are made for each data retrieval.

The get_solaranywhere function is currently not designed to handle annual and monthly data, as this seems less relevant to pvlib users.

@AdamRJensen AdamRJensen added enhancement io remote-data triggers --remote-data pytests labels Jul 22, 2022
@AdamRJensen AdamRJensen added this to the 0.10.0 milestone Jul 22, 2022
@AdamRJensen
Copy link
Member Author

AdamRJensen commented Jul 22, 2022

@KWagnerCPR It would be great if you could test the functions and give some feedback on the default values. 😄

@AdamRJensen AdamRJensen marked this pull request as draft July 22, 2022 18:07
@KWagnerCPR
Copy link

@KWagnerCPR It would be great if you could test the functions and give some feedback on the default values. 😄

@AdamRJensen Yes, I'll have a few people on our team test it out and get back to you. Which default values are you referring to?

@AdamRJensen
Copy link
Member Author

@KWagnerCPR In regards to default values I'm thinking of these:

def get_solaranywhere(latitude, longitude, api_key, start=None, end=None,
                      time_resolution=60, spatial_resolution=0.1,
                      true_dynamics=False, source='SolarAnywhereLatest',
                      variables=DEFAULT_VARIABLES, missing_data='Omit',
                      url=URL, map_variables=True, max_response_time=300)

Also, I have chosen not to parse any annual and monthly data, as that seems to be outside the scope of pvlib and would just unnecessarily complicate the function.

Furthermore, the get_solaranywhere function currently does not support retrieval of "probability of exceedance" time series, as the public license does not allow me to retrieve those.

@KWagnerCPR
Copy link

@AdamRJensen

In terms of default values, I would recommend the following:
time_resolution=60,
spatial_resolution=0.01,
true_dynamics=False,
source='SolarAnywhereLatest',
missing_data='FillAverage',

With SolarAnywhere Public, users get access to 1 km (0.01 x 0.01 degree) spatial resolution data, so I think it makes sense to set the default as 0.01 if our focus is on time series requests rather than TGY requests. Also, using the "FillAverage" default for missing data will ensure periods of missing satellite imagery are filled with long term averages for those times, which is likely preferable.

You should be able to request PXX data as a part of the SolarAnywhere public license. The issue might have been that you had the spatial resolution set to 0.1 when you requested it. You also have to include "ProbabilityOfExceedance": 90, in the request.

I think it makes sense not to request the monthly and annual totals at this time.

@AdamRJensen
Copy link
Member Author

AdamRJensen commented Jul 23, 2022

@KWagnerCPR I have updated the default values accordingly. Thanks for the feedback! 😄

Also, I figured out how to work with PXX data, so I've made some modifications to handle that and added a keyword called probability_of_exceedance.

@AdamRJensen
Copy link
Member Author

@pvlib/pvlib-maintainer I am thinking of developing primarily mock tests for the get_solaranywhere function as the SolarAnywhere data retrieval is slow (minutes). However, I would like to include at least one real request of historical data (just two days of data, which would make the time reasonable). Thus, unless I hear any objections I will go ahead and register the pvlib email for a free SolarAnywhere account and add a GitHub Secret called SOLARANYWHERE_API_KEY. If there is a desire to avoid live tests at all, let me know. FYI I imagine writing mock tests for this will be a real pain.

@AdamRJensen AdamRJensen mentioned this pull request Jul 26, 2022
9 tasks
@KWagnerCPR
Copy link

@AdamRJensen We've tested these functions and they are working well!

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

@AdamRJensen
Copy link
Member Author

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

@KWagnerCPR Thank you for catching this error. I had forgotten to raise the error, hence nothing was happening - it was been corrected.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

Once the functions get merged and we make a new release of pvlib, the documentation will be available publicly. We're releasing version 0.9.2 within a few weeks, but I can't promise that this pull request will be merged by then.

@KWagnerCPR
Copy link

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

@KWagnerCPR Thank you for catching this error. I had forgotten to raise the error, hence nothing was happening - it was been corrected.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

Once the functions get merged and we make a new release of pvlib, the documentation will be available publicly. We're releasing version 0.9.2 within a few weeks, but I can't promise that this pull request will be merged by then.

@AdamRJensen Thank you! I'll stay tuned.

@AdamRJensen
Copy link
Member Author

When downloading data from the SolarAnywhere website, missing data may either be filled with average data or represented as missing using one of three options:
image

Missing values are by default correctly converted to Numpy nan. Similarly, I belive that the 'NaN' option would also be correctly converted to Numpy nan since we're using pd.read_csv (though I haven't tested it). However, -999 will have to be converted to numpy nan by the read_solaranywhere function.

@KWagnerCPR Do you have a file or a know station/time period where there is missing data that I could use for testing?

@AdamRJensen AdamRJensen added remote-data triggers --remote-data pytests and removed remote-data triggers --remote-data pytests labels Aug 16, 2022
@kandersolar
Copy link
Member

@kandersolar Could you help with the flake8 issue - I can't really tell what it is?

I think the problem is this:

  1. To use flake8's --diff option, we are piping git diff into flake8 via stdin rather than letting flake8 discover and read files itself
  2. That means flake8 has to identify .py files by parsing the git diff text provided on stdin, which means it wants to decode the stdin text to UTF-8 first
  3. The CSV data file has that troublesome trademark character that cannot be decoded as UTF-8. This character is included in the output of git diff
  4. Thus flake8 breaks while trying to decode the data passed to its stdin

What to do about it? Modify the git diff to only show output for .py files like this, maybe?

git diff upstream/main -- "*.py" | flake8 --exclude pvlib/version.py --ignore E201,E241,E226,W503,W504 --max-line-length 79 --diff

@AdamRJensen
Copy link
Member Author

I think the problem is this:

  1. To use flake8's --diff option, we are piping git diff into flake8 via stdin rather than letting flake8 discover and read files itself
  2. That means flake8 has to identify .py files by parsing the git diff text provided on stdin, which means it wants to decode the stdin text to UTF-8 first
  3. The CSV data file has that troublesome trademark character that cannot be decoded as UTF-8. This character is included in the output of git diff
  4. Thus flake8 breaks while trying to decode the data passed to its stdin

What to do about it? Modify the git diff to only show output for .py files like this, maybe?

git diff upstream/main -- "*.py" | flake8 --exclude pvlib/version.py --ignore E201,E241,E226,W503,W504 --max-line-length 79 --diff

This solved the issue. Thanks!

Co-authored-by: Kevin Anderson <kevin.anderso@gmail.com>
@AdamRJensen AdamRJensen added remote-data triggers --remote-data pytests and removed remote-data triggers --remote-data pytests labels Dec 19, 2023
@AdamRJensen AdamRJensen marked this pull request as ready for review December 19, 2023 17:02
Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but this looks good to me! Note that I did not actually run any code.

pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
Comment on lines 283 to 284
data.index = pd.to_datetime(data['ObservationTime(GMT)'],
format='%m/%d/%Y %H:%M', utc=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data.index = pd.to_datetime(data['ObservationTime(GMT)'],
format='%m/%d/%Y %H:%M', utc=True)
data.index = pd.to_datetime(data['ObservationTime(LST)'],
format='%m/%d/%Y %H:%M')

Maybe better to use the local standard time column?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, I think the tests will need to be adjusted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right that LST is preferred. My main point for saying this is that it's much easier for a user to convert to UTC than to convert to LST,

solaranywhere_api_key = os.environ["SOLARANYWHERE_API_KEY"]
has_solaranywhere_credentials = True
except KeyError:
has_solaranywhere_credentials = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why codecov is claiming that line 102 isn't covered. The CI logs show that the tests are running, so I don't see why this line wouldn't be getting hit. No action needed; just recording my confusion...

@AdamRJensen AdamRJensen added remote-data triggers --remote-data pytests and removed remote-data triggers --remote-data pytests labels Dec 19, 2023
@AdamRJensen AdamRJensen added remote-data triggers --remote-data pytests and removed remote-data triggers --remote-data pytests labels Dec 19, 2023
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about requirements start and end, rest are spelling-level nits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this, or is it fixing a mistake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no change there so I can't remove the file... I think it's fine leaving it. @kandersolar leave a thumb if you have an opinion

pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
payload['Options']['ApplyTrueDynamics'] = True

if probability_of_exceedance is not None:
if type(probability_of_exceedance) != int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance here? I don't know if type == is preferrable.

I would type(probability_of_exceedance) == int: and let all the else's raise the error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to isinstance

probability_of_exceedance

# Add start/end time if requesting non-TMY data
if (start is not None) or (end is not None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should or be and? I'm guessing both start and end are required. If so, then I think we could use an else: raise ValueError here, unless SolarAnywhere supplies some default times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user only specified start or end we would want them to get useful feedback, such as None cannot be converted to timestamp which is what would happen now. If you use and then the error message would first happen much later in the code and be more difficult to interpret.

pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
pvlib/iotools/solaranywhere.py Outdated Show resolved Hide resolved
pvlib/tests/iotools/test_solaranywhere.py Outdated Show resolved Hide resolved
AdamRJensen and others added 2 commits December 20, 2023 17:55
Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
@AdamRJensen AdamRJensen added remote-data triggers --remote-data pytests and removed remote-data triggers --remote-data pytests labels Dec 20, 2023
@kandersolar
Copy link
Member

Ok, going ahead with the merge. Thanks @AdamRJensen!

@kandersolar kandersolar merged commit ac70b14 into pvlib:main Dec 20, 2023
39 of 41 checks passed
@AdamRJensen AdamRJensen deleted the solaranywhere branch December 20, 2023 18:01
@KWagnerCPR
Copy link

We greatly appreciate the work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement io remote-data triggers --remote-data pytests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants