Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Improve file parsing #982

Merged
merged 55 commits into from
Feb 25, 2022
Merged

ENH: Improve file parsing #982

merged 55 commits into from
Feb 25, 2022

Conversation

rstoneback
Copy link
Collaborator

@rstoneback rstoneback commented Jan 27, 2022

Description

Addresses #763, #762

  • Added support for user supplied variables in file template strings, not just those used for dates.
  • Improved support for delimited filenames. More than one variable can be within a single delimited section. Delimited sections are not required to have a variable.
  • Updated docstring in from_os as well as supporting methods.
  • Added tests for delimited filenames, user variables, and wildcard use.
  • Condensed existing tests
  • Output from process_parsed_filenames is now always sorted

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Please also list any relevant details for
your test configuration

  • Verified parsing delimited filenames now works on COSMIC. COSMIC filenames violate a strict interpretation of delimited information. Two examples using the fixed_width parser, two examples using the delimited parser.
pysat.Files.from_os(data_path, '{year2:04d}.{day2:03d}/ionPrf_C{sat_id:03d}.{year:04d}.{day:03d}.{hour:02d}.{minute:02d}.???_????.????_nc')
pysat.Files.from_os(data_path, '{year2:04d}.{day2:03d}/ionPrf_C{sat_id:03d}.{year:04d}.{day:03d}.{hour:02d}.{minute:02d}.G{code:02d}_{year3:04d}.{radio:04d}_nc')
pysat.Files.from_os(data_path, '{year2:04d}.{day2:03d}/ionPrf_C{sat_id:03d}.{year:04d}.{day:03d}.{hour:02d}.{minute:02d}.{code:8}.{radio:04d}_nc', delimiter='.')
pysat.Files.from_os(data_path, '{year2:04d}.{day2:03d}/ionPrf_C{sat_id:03d}.{year:04d}.{day:03d}.{hour:02d}.{minute:02d}.*.{radio:04d}_nc', delimiter='.')

image

  • Added unit tests

Test Configuration:

  • Operating system: MacOS
  • Version number: Python 3.9.7

Checklist:

  • Make sure you are merging into the develop (not main) branch
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • Add a note to CHANGELOG.md, summarizing the changes

If this is a release PR, replace the first item of the above checklist with the release
checklist on the wiki: https://github.com/pysat/pysat/wiki/Checklist-for-Release

@rstoneback
Copy link
Collaborator Author

Tests are failing due to some Meta thing which I believe @jklenzing has already identified. The pull request forms are passing.

@rstoneback rstoneback marked this pull request as ready for review January 28, 2022 00:24
@aburrell
Copy link
Member

The figure you use in the example doesn't show the entire command you use for testing. Can you include that as a code snippet?

Copy link
Member

@aburrell aburrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tried things out yet, but looked over the code.

CHANGELOG.md Outdated Show resolved Hide resolved
pysat/_files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
rstoneback and others added 2 commits January 28, 2022 10:33
Co-authored-by: Angeline Burrell <aburrell@users.noreply.github.com>
pysat/_files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
pysat/_files.py Outdated Show resolved Hide resolved
pysat/_files.py Outdated Show resolved Hide resolved
@aburrell
Copy link
Member

Ok, can confirm this does NOT work if you use the * in the middle of the specifiers. You can use a * on the front end, though.

pysat/utils/files.py Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Outdated Show resolved Hide resolved
Comment on lines 238 to 240
# engage the fixed width filename parser. If there is not a common
# delimiter then the fixed width parser is suggested though not always
# required. Given the range of standards compliance across the decades of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# engage the fixed width filename parser. If there is not a common
# delimiter then the fixed width parser is suggested though not always
# required. Given the range of standards compliance across the decades of
# engage the fixed width filename parser. Given the range of standards
# compliance across the decades of

And fix spacing. Recommend removing that sentence because from this level, you can't specify which parser is used. I was reading this today and it took me a moment before I realized I had no control over that aspect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A developer can control which parser is used when starting here as a template. If delimiter=None or left unassigned then the fixed width parser is used.

I don't understand the fix spacing comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If there is not a common delimiter then the fixed width parser is suggested though not always required" is not accurate in this setting because you can ONLY differentiate between the two by providing or not providing a delimiter.

"fix spacing" means that I don't know if the lines are now too long or short as a part of the whole comment block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct that you can only provide a single given delimiter or use the fixed width parser. In practice though, the delimited parser will likely work on filenames that don't have even a single instance of the given delimiter. I think I even test for that.

"fix spacing" means that I don't know if the lines are now too long or short as a part of the whole comment block.

Got it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub shows me that this hasn't been addressed? Which is confusing because this is marked as outdated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes I think it gets confused. I updated the comment a bit to specifically note I am talking about modifying the function call from_os below the comment.

@rstoneback
Copy link
Collaborator Author

Ok, can confirm this does NOT work if you use the * in the middle of the specifiers. You can use a * on the front end, though.

You can use the '?' in the middle of specifiers though. Requires knowing ahead of time how many characters are needed.

@aburrell
Copy link
Member

Requires knowing ahead of time how many characters are needed.

Also requires that the number of characters is always the same. For both reasons, this is why I couldn't use it.

pysat/utils/files.py Outdated Show resolved Hide resolved
pysat/utils/files.py Show resolved Hide resolved
@rstoneback rstoneback merged commit 74f19f6 into develop Feb 25, 2022
@rstoneback rstoneback deleted the improve_file_parsing_1 branch February 25, 2022 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants