-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: General Madrigal instrument #67
Conversation
Updated the JRO docstring and file header comments.
Removed the old Madrigal template, which was actually a broken implementation of a general instrument.
Updated the JRO ISR download docstring, removing the unnecessary "+" formatting suggestion.
Added a function to specify Madrigal instrument codes, grouping them by pandas-compatible or xarray-compatible.
Updated the general unit tests by: - adding a new unit test for the new known Madrigal instrument code function, and - updated tests for `_check_madrigal_params` to capture new, better evaluation.
Added a general instrument sub-module for time-series Madrigal data.
Updated the instrument init by: - breaking out the imports to different lines and - adding the new instrument.
Updated the changelog with a summary of the enhancements so far.
Added a function to return the known madrigal file formats and test these formats for pysat parsing compatibility. Also updated docstrings and local comments.
Updated the GNSS TEC, JRO ISR, and pandas general instruments to use the new general file format function. Also updated the pandas general list methods to handle kindat and year formatting. Improved the pandas general instrument by only including Madrigal instrument codes with parsable file formats and adding download test support for one of the possible instruments.
Added unit tests for `known_madrigal_inst_codes` and updated the version evaluations to use the packaging module.
Added the packaging module as a dependency. It was already required by other modules, so is not adding additional overhead.
Added a description of the general pandas instrument to the docs.
Updated the changelog to include the new improvements and cycled the year for the next release.
Fixed test method names that weren't updated when copying from an older test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aburrell! When following your example the software downloads data for Jan 1, 2015. I tried to sort out error location but so far it looks like it is coming from Madrigal (?) 🤷
Long list of filename formats you added here!
# If the kindat (madrigal tag) is not known, advise user | ||
self.kindat = kindat | ||
if self.kindat == '': | ||
logger.warning('`inst_id` did not supply KINDAT, all will be returned.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I registered pysatMadrigal modules I got warnings from this line.
In [4]: pysat.utils.registry.register_by_module(pysatMadrigal.instruments)
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
I haven't traced the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That warning is unavoidable for the general instrument, but important to raise. Basically, it means you could potentially be mixing data sets that should be in separate Instruments.
# Warn if file format has multiple '*' wildcards | ||
num_wc = len(fstr.split("*")) | ||
if num_wc >= 3: | ||
msg = "".join(["file format string has multiple '*' ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also got this,
In [1]: import pysat
import
In [2]: import pysatMadrigal as py_mad
pysat WARNING: file format string has multiple '*' wildcards, may not be parsable by pysat
pysat WARNING: file format string has '*' between formatting constraints, may not be parsable by pysat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that one is confusing. No idea why that one appears on module import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this comes from the Jicamarca init. I think I can fix this by adding a "verbose" flag to the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed now.
Data does load for Jan 1,
|
Was super not happy to need to do that, but couldn't find another solution. |
Makes sense. Thanks for the bummer repetitive work! |
Updated the general madrigal instrument example.
Updated the test day to be different, and hopefully work.
Added a verbosity flag to the general Madrigal file format function, implemented this flag in the instruments that use the function, and added unit tests for it's success.
Set the general pandas instrument download tests to True for all tags.
Expanded ranges of values that indicate all kindat options should be considered.
Allows Madrigal to store HDF5 files using either the 'h5' or 'hdf5' extensions. Also changed the date for the testing to comply with the file available for tag 7800.
Use presence of '*' to choose the correct file parsing method.
Updated test dates for current known potential instruments.
These tests are passing locally, but require the file format parsing changes available on the 'develop' branch of pysat. Because of this, I am adding a version cap for pysat and putting this merge on hold until the next pysat release. |
Fixed the logic for evaluating the presence or absence of a wildcard.
Examine file format to provide delimiter, if needed.
Add a pysat version cap for this pull request.
Removed try/except catch that is now handled in the general list_remote_files function.
@rstoneback pinging for a re-review. Looks like the tests are probably going to pass given how long they've taken. |
xarray released a new version https://twitter.com/xarray_dev/status/1550522574672523266 , on Friday, which breaks pysat tests and functionality. |
Bumped the lowest supported pysat version due to an xarray issue.
The madrigal_pandas download isn't working. Getting an internal server error. Rerunning the tests didn't help. |
Hmm... was able to do a download in ipython. I'm going to try re-running just one job and see if it's perhaps the parallel requests that's an issue. |
Well, that didn't work. @rstoneback , what happens if you run the test suite locally? |
Getting at least 1 failure locally so far. inst_dict26. When tests finish I'll decode that. |
In total there were 29 test failures locally. madrigal_pandas was the Instrument with the download error. Some 500 Internal Server Errors and a variety of error message check issues. |
I unfortunately did not run these tests in my normal Python environment. Rerunning in the correct one. |
In the correct environment I get only 2 errors, 1 for download and 1 for the remote_file_list, both for madrigal_pandas or inst_dict26. |
Can you see what the tag is? |
8105 |
tag_dates = {'120': dt.datetime(1963, 11, 27), '170': dt.datetime(1998, 7, 1), | ||
'180': dt.datetime(2000, 1, 1), '210': dt.datetime(1950, 1, 1), | ||
'211': dt.datetime(1978, 1, 1), '212': dt.datetime(1957, 1, 1), | ||
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 1)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 1)} | |
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 7)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is data for Sept 7-11 on Madrigal. I can download data for Sept 7th, but to then load that data I have to load for Sept 1. The filename is van_allen_2017_09.001.hdf5
which looks like it only has year and month in the filename.....
In [19]: inst.download(
...: dt.datetime(2017, 9, 7), user=Redacted, password=Redacted
...: )
In [20]: inst.load(date=dt.datetime(2017, 9, 7))
<ipython-input-20-18f7eb017f3c>:1: DeprecationWarning: Meta now contains a class for global metadata (MetaHeader). Default attachment of global attributes to Instrument will be Deprecated in pysat 3.2.0+. Set `use_header=True` in this load call or on Instrument instantiation to remove this warning.
inst.load(date=dt.datetime(2017, 9, 7))
In [21]: inst.data
Out[21]:
Empty DataFrame
Columns: []
Index: []
In [22]: inst.files.files
Out[22]:
2017-09-01 van_allen_2017_09.001.hdf5
dtype: object
In [23]: inst.load(date=dt.datetime(2017, 9, 1))
pysat WARNING: The generalized Madrigal data Instrument can't support instrument-specific cleaning.
<ipython-input-23-146bd78005dd>:1: DeprecationWarning: Meta now contains a class for global metadata (MetaHeader). Default attachment of global attributes to Instrument will be Deprecated in pysat 3.2.0+. Set `use_header=True` in this load call or on Instrument instantiation to remove this warning.
inst.load(date=dt.datetime(2017, 9, 1))
In [24]: inst.data
Out[24]:
year month day hour min sec recno kindat kinst ut1_unix ut2_unix ch_energy lpeak_orb_el_flux mid_grad_el_flux plasmapause
2017-09-07 00:02:00 2017 9 7 0 2 0 0 10305 8105 1.504743e+09 1.504743e+09 2600000.0 11.563559 NaN NaN
2017-09-07 04:54:00 2017 9 7 4 54 0 1 10305 8105 1.504760e+09 1.504760e+09 2600000.0 11.241525 NaN NaN
2017-09-07 05:54:00 2017 9 7 5 54 0 2 10305 8105 1.504764e+09 1.504764e+09 2600000.0 NaN NaN 4.690049
2017-09-07 09:05:00 2017 9 7 9 5 0 3 10305 8105 1.504775e+09 1.504775e+09 2600000.0 10.631356 NaN NaN
2017-09-07 09:44:00 2017 9 7 9 44 0 4 10305 8105 1.504777e+09 1.504777e+09 2600000.0 NaN 4.037392 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2017-09-11 15:40:00 2017 9 11 15 40 0 70 10305 8105 1.505144e+09 1.505144e+09 2600000.0 NaN NaN 3.500927
2017-09-11 16:33:00 2017 9 11 16 33 0 71 10305 8105 1.505148e+09 1.505148e+09 2600000.0 12.190678 NaN NaN
2017-09-11 18:05:00 2017 9 11 18 5 0 72 10305 8105 1.505153e+09 1.505153e+09 2600000.0 NaN NaN 3.775340
2017-09-11 19:30:00 2017 9 11 19 30 0 73 10305 8105 1.505158e+09 1.505158e+09 2600000.0 NaN 2.992274 NaN
2017-09-11 20:02:00 2017 9 11 20 2 0 74 10305 8105 1.505160e+09 1.505160e+09 2600000.0 12.258475 NaN NaN
[75 rows x 15 columns]
In [25]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it'd need a special download routine to adjust the filename. I guess we should remove this from the general instrument 🐕🦺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.... there are also only 6 days total of data on the site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not our highest priority 🐦
Removed the Van Allen probe data because the file handling requires additional support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Description
Partially addresses #1 by adding general support for pandas-supportable madrigal data sets.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Added new unit tests and examples:
Test Configuration
Checklist:
develop
(notmain
) branchCHANGELOG.md
, summarizing the changesIf this is a release PR, replace the first item of the above checklist with the
release checklist on the pysat wiki:
https://github.com/pysat/pysat/wiki/Checklist-for-Release