Add function to retrieve example datasets #924

kandersolar · 2020-03-02T15:10:53Z

It would be nice to have a clean way for gallery examples to get access to the test data files. See #860 (comment)

For a function like load_dataset('greensboro-tmy'):

Pros:

No need to monkey around with filepaths, especially ones that aren't really meant to be public
Associating files with keys means we can move and rename test data files without it being a breaking change
Simplifies example code

Cons:

The data files are in several formats (csv, json, nc, h5 etc), so this function would either have to know the appropriate reading method for each file (complicated) or just return a file handle and let the user parse the contents (less useful).

cwhanse · 2020-03-02T15:21:22Z

+1 to having the function. I think a useful scope is to return the full path to the file. Reading file content and providing output in various formats seems like a large bite to chew.

kandersolar · 2020-03-02T15:42:14Z

Any ideas for what the function should be named if it returns the file path? I'm having trouble coming up with a verb that doesn't seem misleading. Maybe just dataset()?

df = pd.read_csv(dataset('greensboro-tmy'))

with open(dataset('greensboro-tmy')) as f:
    ...  # parse the file contents here

mikofski · 2020-03-02T16:49:50Z

Con: The data files are in several formats (csv, json, nc, h5 etc)

To me this is a "Pro" for having a dedicated data reader, to make it easier for users. If it's a PITA for us, it will be a blocker for them, I think.
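
For illustration only, here is a rough sketch of what a reader that dispatches on file extension might look like. The names `read_data_file` and `_READERS` are hypothetical, and only csv and json are handled, using the standard library. A real version would need more than the suffix: the TMY3 file discussed in this thread ends in .CSV but needs `read_tmy3` rather than a generic csv parser.

```python
import csv
import json
from pathlib import Path


def _read_csv(path):
    # Generic csv parsing: returns a list of rows, each a list of strings.
    with open(path, newline='') as f:
        return list(csv.reader(f))


def _read_json(path):
    # Returns whatever object the JSON document decodes to.
    with open(path) as f:
        return json.load(f)


# Hypothetical mapping from (lowercased) file suffix to reader function.
_READERS = {
    '.csv': _read_csv,
    '.json': _read_json,
}


def read_data_file(path):
    """Parse ``path`` with the reader registered for its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        reader = _READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r} files") from None
    return reader(path)
```

Every additional format (nc, h5, ...) would need its own entry and probably an optional dependency, which is where the maintenance cost comes from.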

wholmgren · 2020-03-02T23:32:55Z

I would be happy if this

pvlib-python/docs/examples/plot_greensboro_kimber_soiling.py

Lines 39 to 43 in c9929e8

    
           # get full path to the data directory 
        
           DATA_DIR = pathlib.Path(pvlib.__file__).parent / 'data' 
        
           # get TMY3 data with rain 
        
           greensboro, _ = read_tmy3(DATA_DIR / '723170TYA.CSV', coerce_year=1990)

looked like

# get full path to the data file
file_path = dataset('greensboro-tmy')

# parse TMY3 data
greensboro, _ = read_tmy3(file_path, coerce_year=1990)

I don't think the broader scope is feasible. To be clear, this is just something for the tests/examples - not for anything else.
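
A minimal sketch of the narrow, path-only version, assuming a simple key-to-filename registry. The `_DATASETS` dict and the `data_dir` argument are hypothetical and not an existing pvlib API; the one entry reuses the filename from the example above.

```python
from pathlib import Path

# Hypothetical registry. Files can be moved or renamed by editing this
# mapping, without changing the keys that examples rely on.
_DATASETS = {
    'greensboro-tmy': '723170TYA.CSV',
}


def dataset(key, data_dir):
    """Return the full path of the data file registered under ``key``.

    ``data_dir`` is the directory holding the data files; in the example
    above it is built as ``pathlib.Path(pvlib.__file__).parent / 'data'``.
    """
    try:
        filename = _DATASETS[key]
    except KeyError:
        raise KeyError(
            f"unknown dataset {key!r}; available: {sorted(_DATASETS)}"
        ) from None
    return Path(data_dir) / filename
```

The function only builds a path and never opens the file, so parsing stays with the caller (read_tmy3, pd.read_csv, open, ...).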

metpy has a get_test_data function with the same idea, but a different implementation because it uses a caching back end that I think we should avoid.

example: https://unidata.github.io/MetPy/latest/examples/XArray_Projections.html#sphx-glr-examples-xarray-projections-py

kandersolar · 2020-03-17T14:50:59Z

I just stumbled across some functions in pkg_resources that might be relevant to this issue: https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction

echedey-ls · 2023-03-13T23:56:47Z

I just stumbled upon this issue looking for ideas for GSoC.
@kanderso-nrel, just a quick note on that might-be-relevant package:
From my experience with both PVLIB and SciencePlots, sticking to __path__[0] (which is already in use in PVLIB) is more than enough, and I believe (or want to believe) it is free of errors. At least, in SciencePlots we haven't recorded any issues with it since the breaking changes in the v2.0.0 release.
About pkg_resources: its use is deprecated in favor of other packages. See the attention directive in the pkg_resources documentation.
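
For reference, one of the standard-library replacements for pkg_resources resource access is importlib.resources, whose files() function is available from Python 3.9. A small sketch follows; the helper name `resource_path` is hypothetical, and the stdlib `json` package is used only so the snippet runs without pvlib installed.

```python
from importlib.resources import files


def resource_path(package, *parts):
    """Locate a resource inside an importable package.

    Returns a Traversable; for a package installed as regular files on
    disk this is an ordinary pathlib.Path.
    """
    resource = files(package)
    # Join one component at a time, since older Python versions only
    # accept a single argument to Traversable.joinpath.
    for part in parts:
        resource = resource.joinpath(part)
    return resource


# Usage with a stdlib package standing in for a package that ships data files.
init_file = resource_path('json', '__init__.py')
print(init_file.is_file())
```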

kandersolar mentioned this issue Mar 2, 2020

add Kimber soiling model #860

Merged

8 tasks

CameronTStark added the enhancement label Mar 6, 2020

CameronTStark added this to the 0.7.3 milestone Mar 6, 2020

wholmgren modified the milestones: 0.7.3, 0.8.0 Jul 17, 2020

wholmgren modified the milestones: 0.8.0, Someday Aug 28, 2020

kandersolar mentioned this issue Jan 21, 2022

Data shift algorithm pvlib/pvanalytics#124

Merged

6 tasks

echedey-ls mentioned this issue Jun 7, 2023

Add function to retrieve example dataset paths #1763

Closed

8 tasks
