Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: General Madrigal instrument #67

Merged
merged 34 commits into from
Aug 3, 2022
Merged

ENH: General Madrigal instrument #67

merged 34 commits into from
Aug 3, 2022

Conversation

aburrell
Copy link
Member

@aburrell aburrell commented Feb 28, 2022

Description

Partially addresses #1 by adding general support for pandas-supportable madrigal data sets.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality or documentation)
  • This change requires a documentation update

How Has This Been Tested?

Added new unit tests and examples:

import pysat
import pysatMadrigal as py_mad

# Download DMSP data from Madrigal
dmsp_abi = pysat.Instrument(inst_module=py_mad.instruments.madrigal_pandas,
                            tag='180', kindat='17110')
dmsp_abi.download(dt.datetime(2015, 12, 30), dt.datetime(2015, 12, 31),
                  user='Firstname+Lastname', password='email@address.com')
dmsp_abi.load(date=dt.datetime(2015, 12, 30))

Test Configuration

  • Operating system: OS X Big Sur
  • Version number: Python 3.8 and 3.9
  • Any details about your local setup that are relevant: develop branch of pysat

Checklist:

  • Make sure you are merging into the develop (not main) branch
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have linted the files updated in this pull request
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • Add a note to CHANGELOG.md, summarizing the changes

If this is a release PR, replace the first item of the above checklist with the
release checklist on the pysat wiki:
https://github.com/pysat/pysat/wiki/Checklist-for-Release

aburrell and others added 16 commits January 6, 2022 10:44
Updated the JRO docstring and file header comments.
Removed the old Madrigal template, which was actually a broken implementation of a general instrument.
Updated the JRO ISR download docstring, removing the unnecessary "+" formatting suggestion.
Added a function to specify Madrigal instrument codes, grouping them by pandas-compatible or xarray-compatible.
Updated the general unit tests by:
- adding a new unit test for the new known Madrigal instrument code function, and
- updated tests for `_check_madrigal_params` to capture new, better evaluation.
Added a general instrument sub-module for time-series Madrigal data.
Updated the instrument init by:
- breaking out the imports to different lines and
- adding the new instrument.
Updated the changelog with a summary of the enhancements so far.
Added a function to return the known madrigal file formats and test these formats for pysat parsing compatibility.  Also updated docstrings and local comments.
Updated the GNSS TEC, JRO ISR, and pandas general instruments to use the new general file format function.  Also updated the pandas general list methods to handle kindat and year formatting.  Improved the pandas general instrument by only including Madrigal instrument codes with parsable file formats and adding download test support for one of the possible instruments.
Added unit tests for `known_madrigal_inst_codes` and updated the version evaluations to use the packaging module.
Added the packaging module as a dependency.  It was already required by other modules, so is not adding additional overhead.
Added a description of the general pandas instrument to the docs.
Updated the changelog to include the new improvements and cycled the year for the next release.
Fixed test method names that weren't updated when copying from an older test.
Copy link
Collaborator

@rstoneback rstoneback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aburrell! When following your example the software downloads data for Jan 1, 2015. I tried to sort out error location but so far it looks like it is coming from Madrigal (?) 🤷

Long list of filename formats you added here!

# If the kindat (madrigal tag) is not known, advise user
self.kindat = kindat
if self.kindat == '':
logger.warning('`inst_id` did not supply KINDAT, all will be returned.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I registered pysatMadrigal modules I got warnings from this line.

In [4]: pysat.utils.registry.register_by_module(pysatMadrigal.instruments)
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.
pysat WARNING: `inst_id` did not supply KINDAT, all will be returned.

I haven't traced the source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That warning is unavoidable for the general instrument, but important to raise. Basically, it means you could potentially be mixing data sets that should be in separate Instruments.

# Warn if file format has multiple '*' wildcards
num_wc = len(fstr.split("*"))
if num_wc >= 3:
msg = "".join(["file format string has multiple '*' ",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also got this,

In [1]: import pysat
import 
In [2]: import pysatMadrigal as py_mad
pysat WARNING: file format string has multiple '*' wildcards, may not be parsable by pysat
pysat WARNING: file format string has '*' between formatting constraints, may not be parsable by pysat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that one is confusing. No idea why that one appears on module import.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this comes from the Jicamarca init. I think I can fix this by adding a "verbose" flag to the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now.

pysatMadrigal/instruments/methods/general.py Show resolved Hide resolved
@rstoneback
Copy link
Collaborator

Data does load for Jan 1,

In [5]: dmsp_abi.load(2015, 1)
pysat WARNING: The generalized Madrigal data Instrument can't support instrument-specific cleaning.
<ipython-input-5-065c33fa990e>:1: DeprecationWarning: Meta now contains a class for global metadata (MetaHeader). Default attachment of global attributes to Instrument will be Deprecated in pysat 3.2.0+. Set `use_header=True` to remove this warning.
  dmsp_abi.load(2015, 1)

In [6]: dmsp_abi.data
Out[6]: 
                     year  month  day  hour  min  sec  recno  kindat  kinst      ut1_unix      ut2_unix  epowf        mlt  gdlat   glon  cgm_lat  cgm_long  eqb_qc_fl  eqb_prv_kp  eqb_prb_kp  sat_id   epowq
2015-01-01 00:00:53  2015      1    1     0    0   53      0   17110    180  1.420070e+09  1.420070e+09   62.9   4.070833  -50.7   80.4    -62.9     135.0        2.0        2.24       0.907      17  2171.0
2015-01-01 00:13:05  2015      1    1     0   13    5      1   17110    180  1.420071e+09  1.420071e+09   64.1  20.991389  -80.2  -27.0    -67.9      26.2        1.0        1.58       0.972      17  2171.0
2015-01-01 00:14:41  2015      1    1     0   14   41      2   17110    180  1.420071e+09  1.420071e+09   63.6   6.001944  -51.2  100.3    -65.8     161.0        2.0        1.87       0.928      18  2181.0
2015-01-01 00:27:21  2015      1    1     0   27   21      3   17110    180  1.420072e+09  1.420072e+09   64.6  21.463333  -79.2  -18.2    -67.6      30.3        1.0        1.28       0.992      18  2181.0
2015-01-01 00:35:49  2015      1    1     0   35   49      4   17110    180  1.420073e+09  1.420073e+09   63.0   4.343611  -51.3   78.2    -63.0     131.6        2.0        2.21       0.907      19  2191.0
.

@aburrell
Copy link
Member Author

aburrell commented Mar 1, 2022

Long list of filename formats you added here!

Was super not happy to need to do that, but couldn't find another solution.

@rstoneback
Copy link
Collaborator

Long list of filename formats you added here!

Was super not happy to need to do that, but couldn't find another solution.

Makes sense. Thanks for the bummer repetitive work!

Updated the general madrigal instrument example.
Updated the test day to be different, and hopefully work.
@aburrell aburrell added this to the 0.1.0 Release milestone Mar 17, 2022
@aburrell aburrell added the enhancement New feature or request label Mar 17, 2022
Added a verbosity flag to the general Madrigal file format function, implemented this flag in the instruments that use the function, and added unit tests for it's success.
Set the general pandas instrument download tests to True for all tags.
Expanded ranges of values that indicate all kindat options should be considered.
Allows Madrigal to store HDF5 files using either the 'h5' or 'hdf5' extensions.  Also changed the date for the testing to comply with the file available for tag 7800.
Use presence of '*' to choose the correct file parsing method.
Updated test dates for current known potential instruments.
@aburrell
Copy link
Member Author

These tests are passing locally, but require the file format parsing changes available on the 'develop' branch of pysat. Because of this, I am adding a version cap for pysat and putting this merge on hold until the next pysat release.

aburrell and others added 6 commits April 13, 2022 11:26
Fixed the logic for evaluating the presence or absence of a wildcard.
Examine file format to provide delimiter, if needed.
Add a pysat version cap for this pull request.
Removed try/except catch that is now handled in the general list_remote_files function.
@aburrell
Copy link
Member Author

@rstoneback pinging for a re-review. Looks like the tests are probably going to pass given how long they've taken.

@rstoneback
Copy link
Collaborator

xarray released a new version https://twitter.com/xarray_dev/status/1550522574672523266 , on Friday, which breaks pysat tests and functionality.

Bumped the lowest supported pysat version due to an xarray issue.
@rstoneback
Copy link
Collaborator

The madrigal_pandas download isn't working. Getting an internal server error. Rerunning the tests didn't help.

@aburrell
Copy link
Member Author

Hmm... was able to do a download in ipython. I'm going to try re-running just one job and see if it's perhaps the parallel requests that's an issue.

@aburrell
Copy link
Member Author

Well, that didn't work. @rstoneback , what happens if you run the test suite locally?

@rstoneback
Copy link
Collaborator

Getting at least 1 failure locally so far. inst_dict26. When tests finish I'll decode that.

@rstoneback
Copy link
Collaborator

In total there were 29 test failures locally. madrigal_pandas was the Instrument with the download error. Some 500 Internal Server Errors and a variety of error message check issues.

@rstoneback
Copy link
Collaborator

In total there were 29 test failures locally. madrigal_pandas was the Instrument with the download error. Some 500 Internal Server Errors and a variety of error message check issues.

I unfortunately did not run these tests in my normal Python environment. Rerunning in the correct one.

@rstoneback
Copy link
Collaborator

In the correct environment I get only 2 errors, 1 for download and 1 for the remote_file_list, both for madrigal_pandas or inst_dict26.

@aburrell
Copy link
Member Author

aburrell commented Aug 2, 2022

Can you see what the tag is?

@rstoneback
Copy link
Collaborator

Can you see what the tag is?

8105

tag_dates = {'120': dt.datetime(1963, 11, 27), '170': dt.datetime(1998, 7, 1),
'180': dt.datetime(2000, 1, 1), '210': dt.datetime(1950, 1, 1),
'211': dt.datetime(1978, 1, 1), '212': dt.datetime(1957, 1, 1),
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 1)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 1)}
'7800': dt.datetime(2009, 11, 10), '8105': dt.datetime(2017, 9, 7)}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is data for Sept 7-11 on Madrigal. I can download data for Sept 7th, but to then load that data I have to load for Sept 1. The filename is van_allen_2017_09.001.hdf5 which looks like it only has year and month in the filename.....

In [19]: inst.download(
    ...:     dt.datetime(2017, 9, 7), user=Redacted, password=Redacted
    ...: )

In [20]: inst.load(date=dt.datetime(2017, 9, 7))
<ipython-input-20-18f7eb017f3c>:1: DeprecationWarning: Meta now contains a class for global metadata (MetaHeader). Default attachment of global attributes to Instrument will be Deprecated in pysat 3.2.0+. Set `use_header=True` in this load call or on Instrument instantiation to remove this warning.
  inst.load(date=dt.datetime(2017, 9, 7))

In [21]: inst.data
Out[21]: 
Empty DataFrame
Columns: []
Index: []

In [22]: inst.files.files
Out[22]: 
2017-09-01    van_allen_2017_09.001.hdf5
dtype: object

In [23]: inst.load(date=dt.datetime(2017, 9, 1))
pysat WARNING: The generalized Madrigal data Instrument can't support instrument-specific cleaning.
<ipython-input-23-146bd78005dd>:1: DeprecationWarning: Meta now contains a class for global metadata (MetaHeader). Default attachment of global attributes to Instrument will be Deprecated in pysat 3.2.0+. Set `use_header=True` in this load call or on Instrument instantiation to remove this warning.
  inst.load(date=dt.datetime(2017, 9, 1))

In [24]: inst.data
Out[24]: 
                     year  month  day  hour  min  sec  recno  kindat  kinst      ut1_unix      ut2_unix  ch_energy  lpeak_orb_el_flux  mid_grad_el_flux  plasmapause
2017-09-07 00:02:00  2017      9    7     0    2    0      0   10305   8105  1.504743e+09  1.504743e+09  2600000.0          11.563559               NaN          NaN
2017-09-07 04:54:00  2017      9    7     4   54    0      1   10305   8105  1.504760e+09  1.504760e+09  2600000.0          11.241525               NaN          NaN
2017-09-07 05:54:00  2017      9    7     5   54    0      2   10305   8105  1.504764e+09  1.504764e+09  2600000.0                NaN               NaN     4.690049
2017-09-07 09:05:00  2017      9    7     9    5    0      3   10305   8105  1.504775e+09  1.504775e+09  2600000.0          10.631356               NaN          NaN
2017-09-07 09:44:00  2017      9    7     9   44    0      4   10305   8105  1.504777e+09  1.504777e+09  2600000.0                NaN          4.037392          NaN
...                   ...    ...  ...   ...  ...  ...    ...     ...    ...           ...           ...        ...                ...               ...          ...
2017-09-11 15:40:00  2017      9   11    15   40    0     70   10305   8105  1.505144e+09  1.505144e+09  2600000.0                NaN               NaN     3.500927
2017-09-11 16:33:00  2017      9   11    16   33    0     71   10305   8105  1.505148e+09  1.505148e+09  2600000.0          12.190678               NaN          NaN
2017-09-11 18:05:00  2017      9   11    18    5    0     72   10305   8105  1.505153e+09  1.505153e+09  2600000.0                NaN               NaN     3.775340
2017-09-11 19:30:00  2017      9   11    19   30    0     73   10305   8105  1.505158e+09  1.505158e+09  2600000.0                NaN          2.992274          NaN
2017-09-11 20:02:00  2017      9   11    20    2    0     74   10305   8105  1.505160e+09  1.505160e+09  2600000.0          12.258475               NaN          NaN

[75 rows x 15 columns]

In [25]: 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it'd need a special download routine to adjust the filename. I guess we should remove this from the general instrument 🐕‍🦺

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.... there are also only 6 days total of data on the site.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not our highest priority 🐦

Removed the Van Allen probe data because the file handling requires additional support.
Copy link
Collaborator

@rstoneback rstoneback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@aburrell aburrell merged commit 25001ad into develop Aug 3, 2022
@aburrell aburrell deleted the general_madrigal_inst branch August 3, 2022 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants