Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to retrieve example dataset paths #1763

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

echedey-ls
Copy link
Contributor

@echedey-ls echedey-ls commented Jun 7, 2023

Done:

I'll wait for milestone assignment:

  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Adds a get_example_dataset_path() function to pvlib.tools that returns a filepath to a dataset bundled with pvlib. See linked issue above.

Here is the documentation link [UPDATED 2023-06-18].

@echedey-ls echedey-ls marked this pull request as ready for review June 7, 2023 11:17
@echedey-ls
Copy link
Contributor Author

This issue is related to #1056. However, I haven't addressed any of it since in first place I don't know which purpose each file has.
Tweaking this new function would leverage work put into changing the structure.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @echedey-ls :)

there isn't a tools.rst file, so I believe it doesn't need referencing.

It's true that we don't usually document the functions in pvlib.tools, but in this case I think we probably should so that curious users can see what the function does and doesn't do. How about a new entry list at the bottom of the iotools page, something like this:

Functions for locating the example data files included in pvlib.

.. autosummary::
   :toctree: generated/

   tools.locate_example_dataset

If we do that, the function will need a complete docstring (parameter descriptions, etc).

pvlib/tests/test_tools.py Outdated Show resolved Hide resolved
pvlib/tools.py Outdated Show resolved Hide resolved
pvlib/tools.py Outdated Show resolved Hide resolved
docs/examples/soiling/plot_greensboro_kimber_soiling.py Outdated Show resolved Hide resolved
echedey-ls and others added 6 commits June 8, 2023 23:24
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jun 9, 2023

Here is the documentation link [UPDATED].

@echedey-ls echedey-ls changed the title Add dataset function Add function to get dataset paths Jun 9, 2023
@echedey-ls echedey-ls changed the title Add function to get dataset paths Add function to retrieve example dataset paths Jun 9, 2023
@kandersolar kandersolar added this to the 0.10.0 milestone Jun 9, 2023
@kandersolar kandersolar mentioned this pull request Jun 9, 2023
@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jun 10, 2023

This should be ok now.
Please have a look at the docs. I've created a table to leverage work on the other issue. Maybe it's better to position it on another paragraph.

Edit: oh, and I ended up adding the parameter, since it won't really impact the use, someone more experienced may be able to organize the data folder.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! @echedey-ls can you resolve the merge conflict? After that I think this will be ready to go.

Maybe @wholmgren would like to review as well.

Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about using 1 public function to do three things:

  1. load resources for pvlib code
  2. load resources for test code
  3. load resources for documentation code

The function name get_test_dataset_path implies that it's only for tests, yet it's used for all 3 of those items. Seems ripe for problems down the road.

I also don't think we need this, at least not until we clean up our data directories and make it clear what is public and what is not. I think the "preferably one way to do it" is now to use importlib.resources.files. Example below

In [3]: import importlib.resources

In [6]: importlib.resources.files('pvlib') / 'data'
Out[6]: WindowsPath('C:/Users/WILHOL/Miniconda3/envs/pvdeg/lib/site-packages/pvlib/data')

In [7]: tmy3_filepath = importlib.resources.files('pvlib') / 'data' / '723170TYA.CSV'

In [8]: tmy3_filepath.exists()
Out[8]: True

@@ -189,8 +188,7 @@ def lookup_linke_turbidity(time, latitude, longitude, filepath=None,
# 1st column: 179.9583 W, 2nd column: 179.875 W

if filepath is None:
pvlib_path = os.path.dirname(os.path.abspath(__file__))
filepath = os.path.join(pvlib_path, 'data', 'LinkeTurbidities.h5')
filepath = get_test_dataset_path('LinkeTurbidities.h5')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely not test data

@kandersolar
Copy link
Member

Good to know about importlib.resources.files! It does require python 3.9 though.

The function name get_test_dataset_path implies that it's only for tests, yet it's used for all 3 of those items.

I suggested including test in the function name in an attempt to prevent confusing it for something like an iotools.get_ function; the thought was to try to make it clear that it's limited to only a select set of special files, not a general purpose data tool. But if we're looking to draw a line between test (as in pvlib.tests) and other types of data then I can see that it could be confusing.

But I don't see why one function for three types of data files might cause problems later on. If all the function does is figure_out_where_pvlib_is() + path_parameter, what's it matter whether the file is LinkeTurbidities.h5 or a testing file?

@wholmgren
Copy link
Member

There's an importlib-resources backport for older python. We already use this approach for importlib-metadata for python < 3.8. This realpython example might be helpful too.

I think your original list of pros to a function are still valid. There the focus is on a function for example code and maybe user onboarding that separates public resources from package structure. If the user-facing function only does that then it might be a local win. On the other hand, it can also be good to propagate standard python patterns in example code so it's less clear that it's a global win. I think the internal code should keep it simple and stick to importlib-resources.

@echedey-ls
Copy link
Contributor Author

I understand the issue. We can limit the scope of this function until #1056 is addressed.
I agree with @wholmgren criteria on that this PR should only satisfy user needs. I will revert unrelated changes.

I suggested including test in the function name in an attempt to prevent confusing it for something like an iotools.get_ function

@kandersolar what is your opinion on renaming it to get_example_dataset_path (or get_pvlib_example_dataset_path, get_bundled_dataset_path, maybe consider other synonym of 'bundled')?

I'd like to bring attention to two points:

  • Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.
  • Tests are being shipped with pvlib when it's installed. Is that on purpose?

echedey-ls and others added 6 commits June 18, 2023 19:35
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Will Holmgren <william.holmgren@gmail.com>
@echedey-ls
Copy link
Contributor Author

This should be fine to go. Renamed to get_example_dataset_path and use cases are only in the examples.

@kandersolar
Copy link
Member

Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.

It seems to use the same sys.modules cache as import does, so as long as we are assuming pvlib is already imported, or will be imported later, then I don't think there is any significant penalty.

Tests are being shipped with pvlib when it's installed. Is that on purpose?

Yes, see #473. There is a non-pvlib discussion here on whether including tests in sdists and wheels is a good idea in general (tl;dr opinions vary).

@echedey-ls
Copy link
Contributor Author

That's good to know @kandersolar !

@kandersolar kandersolar modified the milestones: 0.10.0, 0.10.1 Jun 28, 2023
@kandersolar kandersolar modified the milestones: 0.10.1, v0.10.2 Jul 6, 2023
@echedey-ls echedey-ls mentioned this pull request Aug 3, 2023
20 tasks
@kandersolar kandersolar modified the milestones: v0.10.2, v0.10.3 Sep 21, 2023
@kandersolar kandersolar modified the milestones: v0.10.3, v0.10.4 Dec 20, 2023
@kandersolar kandersolar modified the milestones: v0.10.4, Someday Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add function to retrieve example datasets
4 participants