Add read_bsrn function #1145

AdamRJensen · 2021-01-24T18:52:14Z

Closes Add BSRN format reader to iotools #1015
I am familiar with the contributing guidelines
Tests added
Updates entries to docs/sphinx/source/api.rst for API changes.
Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
Pull request is nearly complete and ready for detailed review.
Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Related to issue #1015.

Simplified how the start and end line of the data is determined. Improved documentation, e.g. moved constants outside of function.

Add bsrn file and read_bsrn function

AdamRJensen · 2021-01-24T19:43:17Z

@wholmgren I've changed it such that it now commits to pvlib:master. Hope it's correct.

kandersolar · 2021-01-24T23:21:55Z

Hi @AdamRJensen, could you update this PR to resolve the code style issues raised by Stickler? It's easier to review PRs once stickler is satisfied. The style issues are listed here: https://github.com/pvlib/pvlib-python/pull/1145/files

Edit: also please add a new entry here for your function: https://github.com/pvlib/pvlib-python/blob/master/pvlib/iotools/__init__.py

Add bsrn to iotools

wholmgren · 2021-01-25T20:41:01Z

Thanks @AdamRJensen! We will need some tests. Do you want me to push a test to your branch or do you want to take it on yourself? It seemed like you might have been disinclined to add a test in your first PR, but I am happy to help since this will be a big help to me.

AdamRJensen · 2021-01-25T20:52:10Z

Hi @kanderso-nrel, thanks for the heads up!

I wasn't sure if PVlib is very strict on style issues, e.g. line length<79 characters? I have corrected all instances of this, but in some places, it makes the code a little harder to follow / less compact.

AdamRJensen · 2021-01-25T20:53:53Z

@wholmgren I am definitely up for attempting to make tests. Can you point me in a direction to get started?

wholmgren · 2021-01-25T20:59:21Z

Check out pvlib/tests/iotools/test_surfrad.py and pvlib/tests/iotools/test_surfrad.py pvlib/tests/iotools/test_solrad.py.

Add a test file to pvlib/data. Here's an example for the SURFRAD reader. You'll find similar for the other iotools.

Start by making your test as simple as possible. A single data file, no remote io capabilities. Probably ok to only assert the index values and the columns for now.

wholmgren · 2021-01-25T21:00:18Z

I wasn't sure if PVlib is very strict on style issues, e.g. line length<79 characters? I have corrected all instances of this, but in some places, it makes the code a little harder to follow / less compact.

If you think it makes the code less readable then it's acceptable to add # noqa: E501 to that line. Note pep8 also says two spaces before the #.

kandersolar · 2021-01-25T21:04:36Z

If the test file is a proper BSRN data file, we should make sure we're staying within their conditions of data release.

AdamRJensen · 2021-01-25T21:18:56Z

If the test file is a proper BSRN data file, we should make sure we're staying within their conditions of data release.

@kanderso-nrel and @wholmgren any opinion on if I should create a pseudo-file or ask Amelie from BSRN if we can upload a single file for test purposes?

pvlib/iotools/bsrn.py

cwhanse · 2021-01-25T22:28:04Z

@AdamRJensen I think we need to get explicit OK from the WRMC, because pvlib-python isn't a traditional "publication", but (in my view) qualifies as a "project for wide release." What matters, though, is WRMC's view of pvlib-python.

The way I read the Data Release Guidelines, assuming WRMC concurrence pvlib-python can include a file of BSRN data provided:

the file is clearly identified as originated from the WRMC (use the file name for this purpose if there isn't a free-form header)
users are provided with information on how to obtain the original data (function docstring does this, but we could also add a short paragraph to the LICENSE file about using the BSRN file).

wholmgren · 2021-01-25T22:50:56Z

My reading is that BSRN data downloaded directly from NASA websites is subject to NASA's Data and Information Policy. So I don't see any issue with including a test file from either of the GIM or LaRC sites. I'm not opposed to asking - and that may well be more polite in any case - just pointing out that there may be other options.

Air temperature was listed as air_temperature in the docstring instead of temp_air.

pvlib/iotools/bsrn.py

pvlib/tests/iotools/test_bsrn.py

wholmgren · 2021-02-10T04:03:07Z

pvlib/iotools/bsrn.py

+            if line.startswith('*'):  # Find start of all logical records
+                line_no_dict[line[2:6]] = num  # key is 4 digit LR number
+
+    # Determine start and end line of logical record LR0100 to be parsed
+    start_row = line_no_dict['0100'] + 1  # Start line number
+    # If LR0100 is the last logical record, then read rest of file
+    if start_row-1 == max(line_no_dict.values()):
+        end_row = num  # then parse rest of the file
+    else:  # otherwise parse until the beginning of the next logical record
+        end_row = min([i for i in line_no_dict.values() if i > start_row])
+    nrows = end_row-start_row
+
+    # Read file as a fixed width file (fwf)
+    data = pd.read_fwf(filename, skiprows=start_row, nrows=nrows, header=None,
+                       colspecs=COL_SPECS, na_values=[-999.0, -99.9])


It should be possible to avoid reading the file twice by accumulating the first read of just the '0100' section into an IO buffer, stopping the read at the end of the section, and then passing that buffer into pd.read_fwf.

I think this might result in meaningful performance improvements when reading many files. Or maybe we need to profile the code before we put the effort into it.

In any case, I'm ok saving this idea for future work.

@wholmgren The initial opening of the file and looping through all of the lines takes ~414 ms, whereas the pd.read_fwf takes 1.15 s. Tested in a Jupyter Notebook using the %%timeit command.

Other parts of the code take on the order of µs (except setting the datetime index 3 ms).

So there are potential savings, however, after searching for a while I still have no idea of how to write to a buffer line by line. I'm also very fine with saving this idea for future work though.

Thanks for putting in the effort to profile the code! The potential speed up seems modest so let's save it for future work.

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

kandersolar · 2021-02-10T15:14:58Z

A clue for some of the test failures: pandas-dev/pandas#37909

I get failures locally with b5ed6ee and pandas==1.2.2, but adding compression='infer' to the pd.read_fwf call gets them passing.

Also this line should be assert actual.index.equals(expected_index):

pvlib-python/pvlib/tests/iotools/test_bsrn.py

Line 33 in b5ed6ee

assert actual.index.equals()

wholmgren · 2021-02-10T18:19:18Z

@AdamRJensen please review changes on my bsrnfixes branch:

https://github.com/wholmgren/pvlib-python/blob/bsrnfixes/pvlib/iotools/bsrn.py
https://github.com/wholmgren/pvlib-python/blob/bsrnfixes/pvlib/tests/iotools/test_bsrn.py

The function didn't quite produce the expected results when running it against this NASA LRC file that only contains the 0100 measurements: https://cove.larc.nasa.gov/BSRN/LRC49/2021/lrc0121.dat

In the process of trying to fix it, I changed the line number convention to make it make more sense to me. Fine with me if you want to keep the original convention so long as it's fixed to work correctly for files with only 0100.

I also refactored the test to use both files. I didn't actually add the NASA LRC file to git due to the data license considerations above, so you'll need to download it and put it in the right place if you want that test to run correctly. Alternatively, we could make a second test file from the PAY file that only contains the 0100 record.

If you can, try adding my fork as another git remote, fetching my branch, and comparing against yours. Normally I would make a pull request to your fork, but for some reason github won't give me the option to select it as the base repository.

Finally, it appears the bsrn.py file uses windows line breaks. Can you change that to unix line breaks?

AdamRJensen · 2021-02-10T22:25:56Z

@kanderso-nrel Thank you very much for solving the issue regarding the compression. Cleared a lot of issues! There are still 3 issues left... any ideas on these?

@wholmgren I prefer keeping it consistent with Python starting the line numbers at zero, hence I have added start=2 to the enumerate function. Also, I attempted to change the file to Unix line ending.. but I have no way of checking if it is correct now.

kandersolar · 2021-02-10T22:51:06Z

Older pandas versions had inconsistent behavior for offset strings... there's a relevant pandas issue or PR about this but I couldn't quickly find it. Anyway I'd try T as the unit for the Timedelta because it means the same as minute in pandas lingo but might be supported by our minimum pandas version. If python36-min doesn't like that either then I'll try to find that pandas page I'm remembering and figure out what will work across all versions.

For the codecov failure, it's because not 100% of your lines of code get run by the test suite: https://github.com/pvlib/pvlib-python/pull/1145/files#annotation_969052144. If @wholmgren is okay with leaving the non-gzip branch untested then 96 can be left alone. I haven't read the code in-depth enough to know whether it would be easy to engineer a test that runs 112.

All logical records after LR0100 have been removed to reduce space (be below 25 MB), but also to test the functionality of files with few logical records.

wholmgren · 2021-02-10T23:12:38Z

the bsrn.py file in this zip should have the right line breaks: bsrn.py.zip

wholmgren

I can fix line breaks in a follow up PR.

@kanderso-nrel any other comments?

kandersolar

I didn't run any code myself, but docs LGTM with a minor typo.

pvlib/iotools/bsrn.py

Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>

AdamRJensen · 2021-02-11T14:53:58Z

I'm unsure if there's anything else I need to do, but if that's the case please let me know. Thank you all for your patience and help!

wholmgren · 2021-02-11T15:42:16Z

Thanks @AdamRJensen for the new feature and for your patience during the review!

I expect it'll be a month or so before we release 0.9.0, but I'll make another pre-release by the end of next week.

AdamRJensen added 9 commits January 15, 2021 15:49

Add bsrn file to read bsrn files

73e6f9e

Related to issue #1015.

simplified read_bsrn function

31203b3

Simplified how the start and end line of the data is determined. Improved documentation, e.g. moved constants outside of function.

Simplified selection of rows in read_bsrn

7a437c3

Added read_bsrn to api.rst

93b92c8

Merge pull request #1 from AdamRJensen/AdamRJensen-patch-1

b0a6ddf

Add bsrn file and read_bsrn function

Delete 2021_01_16_read_bsrn_pull_request_v2.py

c6447a3

Improved format, e.g removed trailing white spaces

52a952a

Fixed spacing issues

870095a

Update v0.9.0.rst

444e979

AdamRJensen added 3 commits January 25, 2021 20:38

Add iotools.bsrn and import read_bsrn

f11288e

Merge pull request #2 from AdamRJensen/patch-3

93deb2f

Add bsrn to iotools

Split multiple lines to obey 75 character limit

2fc95ad

AdamRJensen changed the title ~~Add bsrn_read function~~ Add read_bsrn function Jan 25, 2021

AdamRJensen marked this pull request as ready for review January 25, 2021 20:15

AdamRJensen added 2 commits January 25, 2021 21:21

Corrected indentation

656bbda

Fixed indentation again

c10d75b

cwhanse reviewed Jan 25, 2021

View reviewed changes

pvlib/iotools/bsrn.py Outdated Show resolved Hide resolved

wholmgren added this to the 0.9.0 milestone Jan 26, 2021

wholmgren added the enhancement label Jan 26, 2021

AdamRJensen reopened this Feb 9, 2021

Fixed doc error

536f53c

Air temperature was listed as air_temperature in the docstring instead of temp_air.

wholmgren reviewed Feb 10, 2021

View reviewed changes

AdamRJensen and others added 6 commits February 10, 2021 09:24

Handle file start date explicitly

63dd3ac

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

Correct pytest fixture magic

6924183

Co-authored-by: Will Holmgren <william.holmgren@gmail.com>

Fix indentation broken by previous commit

5e5f9d5

Correct Dataframe to DataFrame in doc string

5cf3d30

Add offset to line num after explicitly handling start date

71895d1

Update test_bsrn.py

b5ed6ee

AdamRJensen added 3 commits February 10, 2021 22:58

Added compression='infer', fixed end line number issue

23d4525

Fixed test issue

8dae943

Changed timedelta unit from min to minute

23c3455

AdamRJensen added 4 commits February 11, 2021 00:02

Add files via upload

260b68a

All logical records after LR0100 have been removed to reduce space (be below 25 MB), but also to test the functionality of files with few logical records.

Changed to_timedelta unit from minute' to 'T'

6b52a72

Updated test to cover unzipped and zipped files

7c12848

Removed error causing blank line in test file

17206df

wholmgren approved these changes Feb 10, 2021

View reviewed changes

AdamRJensen added 2 commits February 11, 2021 00:27

Change to Unix end of line character from file by wholmgren

9466626

Remove extra line at end of file

4d3a21c

kandersolar approved these changes Feb 10, 2021

View reviewed changes

pvlib/iotools/bsrn.py Outdated Show resolved Hide resolved

Fix typo in bsrn.py doc string

4db55be

Co-authored-by: Kevin Anderson <57452607+kanderso-nrel@users.noreply.github.com>

wholmgren merged commit 59a8b6c into pvlib:master Feb 11, 2021

kandersolar mentioned this pull request Jul 22, 2021

Add retrieval function for BSRN data #1254

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add read_bsrn function #1145

Add read_bsrn function #1145

AdamRJensen commented Jan 24, 2021 •

edited

Loading

AdamRJensen commented Jan 24, 2021

kandersolar commented Jan 24, 2021 •

edited

Loading

wholmgren commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

wholmgren commented Jan 25, 2021

wholmgren commented Jan 25, 2021 •

edited

Loading

kandersolar commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

cwhanse commented Jan 25, 2021

wholmgren commented Jan 25, 2021 •

edited

Loading

wholmgren Feb 10, 2021

AdamRJensen Feb 10, 2021

wholmgren Feb 10, 2021

kandersolar commented Feb 10, 2021 •

edited

Loading

wholmgren commented Feb 10, 2021

AdamRJensen commented Feb 10, 2021

kandersolar commented Feb 10, 2021

wholmgren commented Feb 10, 2021

wholmgren left a comment

kandersolar left a comment

AdamRJensen commented Feb 11, 2021

wholmgren commented Feb 11, 2021

Add read_bsrn function #1145

Add read_bsrn function #1145

Conversation

AdamRJensen commented Jan 24, 2021 • edited Loading

AdamRJensen commented Jan 24, 2021

kandersolar commented Jan 24, 2021 • edited Loading

wholmgren commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

wholmgren commented Jan 25, 2021

wholmgren commented Jan 25, 2021 • edited Loading

kandersolar commented Jan 25, 2021

AdamRJensen commented Jan 25, 2021

cwhanse commented Jan 25, 2021

wholmgren commented Jan 25, 2021 • edited Loading

wholmgren Feb 10, 2021

Choose a reason for hiding this comment

AdamRJensen Feb 10, 2021

Choose a reason for hiding this comment

wholmgren Feb 10, 2021

Choose a reason for hiding this comment

kandersolar commented Feb 10, 2021 • edited Loading

wholmgren commented Feb 10, 2021

AdamRJensen commented Feb 10, 2021

kandersolar commented Feb 10, 2021

wholmgren commented Feb 10, 2021

wholmgren left a comment

Choose a reason for hiding this comment

kandersolar left a comment

Choose a reason for hiding this comment

AdamRJensen commented Feb 11, 2021

wholmgren commented Feb 11, 2021

AdamRJensen commented Jan 24, 2021 •

edited

Loading

kandersolar commented Jan 24, 2021 •

edited

Loading

wholmgren commented Jan 25, 2021 •

edited

Loading

wholmgren commented Jan 25, 2021 •

edited

Loading

kandersolar commented Feb 10, 2021 •

edited

Loading