Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sodapro.get_cams_radiation #1175

Merged
merged 59 commits into from
Jun 13, 2021
Merged

Conversation

AdamRJensen
Copy link
Member

@AdamRJensen AdamRJensen commented Feb 22, 2021

  • Closes #xxxx
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Add a function to retrieve CAMS McClear clear-sky radiation as previously unsuccessfully attempted in #271, #274, #279. This is a new pull request as my previous attempt #1172 erroneously had a long/incorrect commit history.

@AdamRJensen
Copy link
Member Author

AdamRJensen commented Feb 22, 2021

@mikofski Thanks for the help in making a new pull request. The previous PR #1172 has now been closed.

As previously mentioned: I've asked the developing team of the CAMS radiation and McClear services, and have been informed that there is not and will not be any demo API email/access. So if tests have to be implemented, we need to register a PVLIB email. Only a registered email address is necessary and no password will ever be public.

The function includes some very convenient features, including the label, integrated, and map_variable arguments, making it very intuitive for Python users. I'd very much like a critical look at the function and feedback on how it can be improved.

@AdamRJensen
Copy link
Member Author

@mikofski As far as I can tell CAMS McClear and ECMWF MACC pvlib.iotools.ecmwf_macc return very different parameters.

@wholmgren
Copy link
Member

I'm not very familiar with the details of CAMS services, but it appears the major differences between this McClear data reader and the existing MACC reader are in spectral components of AOD. McClear returns AOD at 550 nm for a variety of particles, while MACC returns AOD at a variety of wavelengths. So I think they are complementary and both readers can exist in pvlib.

The CAMS data license (available here) allows us to bundle a file for testing.

How about one function for parsing data and one function for fetching data? The fetch function can call the parse function so a user only needs to make one function call. But separating the logic lets users load data in a different manner if desired (e.g. from disk or using threads).

@AdamRJensen
Copy link
Member Author

@wholmgren I will separate it into a get and read/parse function. Any preference in calling it read or parse? Is there a difference?

The nice part about having it in one function is that you know if the file has 6 (verbose=False) or 23 columns (verbose=True). I suppose this can be solved by reading in the file, and then checking how many columns there are, and based on this assign the 6 or 23 column name header.

The primary output of CAMS McClear is the time-series of broad-band clear sky irradiance. This clear-sky model can be used for a number of purposes, e.g. quality checks, etc., and has the major advantage over most clear-sky models in that it is based on data for the specific location and the specific time and not just seasonal trends. The ECMWF MACC reader only returns AOD for different wavelengths and water vapor, which has very different uses.

@wholmgren
Copy link
Member

wholmgren commented Feb 23, 2021

Any preference in calling it read or parse? Is there a difference?

I'm guessing parse. Check out psm3.py for an example of a module with a get_psm3 (downloads data from remote source, passes it to parse), parse_psm3 that converts the buffered data into a well-formatted DataFrame, and finally read_psm3 that simplifies usage for local files.

#1155 has some recent discussion relevant to the benefits of splitting up fetching and parsing.

The nice part about having it in one function is that you know if the file has 6 (verbose=False) or 23 columns (verbose=True). I suppose this can be solved by reading in the file, and then checking how many columns there are, and based on this assign the 6 or 23 column name header.

That could work, but you could also keep it simple and add that flag to the read/parse function too. Most people will use the higher level function for direct downloads + parsing and it's not a big deal to make users of the lower level parse function repeat a parameter.

@AdamRJensen
Copy link
Member Author

@wholmgren Ok, I will attempt to make a similar structure, with parse, read, and get. Or should we stick to just parse and get?
This function differs from psm3, in that arguments are needed to get passed from the get function to the read function and then to the parse function: particularly the integrated, label, and map_variables arguments.

@wholmgren
Copy link
Member

I don't see that it's an issue to pass any of those arguments from one function to another or to expect that a user with local data already knows something about that data. But I don't want to over complicate it - iotools is supposed to be relatively easy to contribute to. It's really fine with me if you want to keep it all in one function.

@AdamRJensen
Copy link
Member Author

AdamRJensen commented Feb 24, 2021

@wholmgren I have now created read and parse sub-functions similar to psm3. It is indeed nice that it can retrieve files, but also work with local files.

Also, I found a way to parse the meta-data nicely and get the column names from the file itself.

The documentation for the two new functions is not fully developed, but I will do this within the next few days. I'll also be working on making tests. Though you are welcome to look through / test the function now.

@AdamRJensen
Copy link
Member Author

AdamRJensen commented Feb 27, 2021

@wholmgren I have renamed the function to 'get_cams_radiation' as I have extended the functions to read/parse/get both CAMS Radiation and McClear as the two services follow the same format (API input is identical, with the exception of one string, specifying whether the users wants Radiation or McClear).

I have been looking into developing tests, but I am very uncertain about which aspects need to be tested? Do we want to register a pvlib email and test the 'get' method by making a real API call?

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good @AdamRJensen! Some notes below, and let's talk about the tests on our call later.

ci/requirements-py36-min.yml Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
pvlib/iotools/sodapro.py Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
if (time_step == '1d') | (time_step == '1M'):
data.index = pd.DatetimeIndex(data.index.date)
# For monthly data with 'right' label, the index should be the last
# date of the month and not the first date of the following month
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way seems reasonable to me (and I personally prefer left-labeling to avoid questions like this), but I wonder what other people think. As a point of comparison, TMY3 data are right-labeled but without a sub-unit shift like this.

I also wonder if all this labeling code is worth it -- maybe we should just return the start observation time and be done with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should just return the start observation time and be done with it.

+1

Copy link
Member Author

@AdamRJensen AdamRJensen Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recognize your concern and would perhaps agree if it the service didn't return monthly data.

My main argument is that Pandas labels minute and daily data by the left interval, however, monthly data is labelled right (i.e., last day of the month). The majority of users of monthly data would be required to manually change the index labels, before being able to merge with any other resample data in pandas.

Personally I think it adds a lot of functionality that makes the function a lot easier to use, and that it would be a shame nixing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW here are some one-liners for converting month-begin data to month-end:

In [20]: s = pd.Series([1,2,3], pd.date_range('2019-01-01', periods=3, freq='MS'))

In [21]: s.index
Out[21]: DatetimeIndex(['2019-01-01', '2019-02-01', '2019-03-01'], dtype='datetime64[ns]', freq='MS')

In [22]: s.resample('M').mean().index
Out[22]: DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'], dtype='datetime64[ns]', freq='M')

# or
In [23]: s.to_period('M').to_timestamp('M').index
Out[23]: DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'], dtype='datetime64[ns]', freq='M')

How about removing the label parameter but adding a docstring example with (a better version of) the above snippet for the benefit of whatever subset of pvlib users want monthly insolation? E.g. this, which renders as this.

pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job with the tests @AdamRJensen.

pvlib/iotools/sodapro.py Outdated Show resolved Hide resolved
if (time_step == '1d') | (time_step == '1M'):
data.index = pd.DatetimeIndex(data.index.date)
# For monthly data with 'right' label, the index should be the last
# date of the month and not the first date of the following month
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should just return the start observation time and be done with it.

+1

AdamRJensen and others added 4 commits June 7, 2021 12:28
Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>
Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>
Also, includes a few minor changes addressing the review by @kanderso-nrel
@AdamRJensen AdamRJensen changed the title Add cams.get_cams_radiation Add sodapro.get_cams_radiation Jun 7, 2021
Includes match keywords for tests that asserts warning and error messages.
Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AdamRJensen. @kanderso-nrel I'll let you decide when to merge this and refrain from further review unless requested.

@AdamRJensen
Copy link
Member Author

@wholmgren @kanderso-nrel I recommend that this first gets merged when a decision on #1245 is made.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here, this looks great. Thanks @AdamRJensen! One more idea, what do you think about adding the new-to-pvlib bhi acronym to the Variables and Symbols list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants