Keep metadata of sample datasets in the xarray objects #184

nshea3 · 2020-08-03T19:53:05Z

Fixes #125

Many global attributes are included in the netcdf files, however they are dropped in the fetch_* functions in sample_data.py with the float64 cast: data = xr.open_dataset(fname, engine="scipy").astype("float64"). This is a known issue in xarray (pydata/xarray#2049) and the current solution is to open the NetCDF file as-is, cast to the new datatype, and then copy the global attributes to the new object.

Also added tests to test_sample_data.py to check that the attrs attribute is not an empty dictionary.

Reminders:

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

santisoler

Thanks @nshea3 for spotting the true problem and solving it!

I would make small changes on you code.
By defining data and cast_data we will be storing the same data twice: one in float64 and another one in the original type. So, what if we cache the attrs on a variable, change the datatype and then set the attrs equal to the cached dictionary. Something like:

    data = xr.open_dataset(fname, engine="scipy")
    attrs = data.attrs.copy()
    data = data.astype("float64")
    data.attrs = attrs

I think this should work just fine, but worth testing it of course.

Besides, I would also test if the attrs, besides not being empty, has some important attributes, like refsysname.

What do you think?

santisoler · 2020-08-06T18:57:12Z

Since we are about to move all datasets to Rockhound (see #179), we should remember to apply the same changes we perform here on fatiando/rockhound#84.

nshea3 · 2020-08-07T19:33:28Z

Excellent point thank you @santisoler I will make that change so we avoid duplicating the xarray.

I'll add a test for the refsysname attribute as well. Are there any others I should include?

santisoler · 2020-08-10T15:09:43Z

I'll add a test for the refsysname attribute as well. Are there any others I should include?

Maybe we can add those attributes of which the data depends on.
For example, the refsysname is crucial because it sets the reference frame (and which normal gravity was removed if we are requesting for gravity anomalies or disturbances).
I would also test the modelname, the max_used_degree and probably the tide_system.
Another things we might want to test if the min and maximum values agree with the ones that are hardcoded in the tests/sample_data.py:

harmonica/harmonica/tests/test_sample_data.py

Lines 35 to 42 in 2980071

    
           def test_gravity_earth(): 
        
               "Sanity checks for the loaded grid" 
        
               grid = fetch_gravity_earth() 
        
               assert grid.gravity.shape == (361, 721) 
        
               npt.assert_allclose(grid.gravity.max(), 9.8018358e05) 
        
               npt.assert_allclose(grid.gravity.min(), 9.7476403e05) 
        
               assert grid.height_over_ell.shape == (361, 721) 
        
               npt.assert_allclose(grid.height_over_ell, 10000)

Feel free to add as many tests as you think, more tests is usually a good thing to have!

…c attrs

nshea3 · 2020-08-17T17:26:23Z

Hi @santisoler, I have made the change you suggested, the attrs are now copied to a dictionary and set after the conversion.

I have also added several tests for the presence of specific attributes: refsysname, max_used_degree, modelname, and tide_system.

I have not yet changed the netcdf files as discussed in #124. I started editing the files with the netcdf4-python library but I realized that the global attributes do not update as changes are made to the files, so the global attributes would be incorrect.

I am currently trying to regenerate the files from the ICGEM website (http://icgem.gfz-potsdam.de/calcgrid) without the repeated longitudes but the only output option is a .gdf file and I do not know how to convert that to a .cdl or .nc file?

santisoler

Hi @nshea3. Thanks for making the changes! I think they are good to go! I will just make small changes on the comment lines, just to update them.

I have not yet changed the netcdf files as discussed in #124. I started editing the files with the netcdf4-python library but I realized that the global attributes do not update as changes are made to the files, so the global attributes would be incorrect.

Don't worry about #124 here. Now that you figured out that the missing attributes weren't in fact loaded by the function (but still in the files), these two issues are somewhat unrelated. So we can solve them in different PRs.

I am currently trying to regenerate the files from the ICGEM website (http://icgem.gfz-potsdam.de/calcgrid) without the repeated longitudes but the only output option is a .gdf file and I do not know how to convert that to a .cdl or .nc file?

Oh, you can use harmonica.load_icgem_gdf to load .gdf files into xarray.Dataset, which can be stored as netCDF files with xr.Dataset.to_netcdf.

santisoler · 2021-03-29T19:46:10Z

@nshea3 Sorry for stalling this PR for too long. I don't know why it kept under my radar for too long.
I'll merge right away if you agree.

We are planning a new Harmonica release for the next week(s). Thanks for the contribution!

nshea3 added 2 commits August 3, 2020 10:38

Missing attributes copied from original xarray before cast

daab1c1

Added tests to check that grid.attrs is not empty

3008109

santisoler requested changes Aug 6, 2020

View reviewed changes

attrs copied to dict object, updated test_sample_data to test specifi…

a2e58ce

…c attrs

nshea3 force-pushed the ICGEM-missing-attr branch from 7140de2 to a2e58ce Compare August 11, 2020 23:24

Update comment lines

6cc2edf

santisoler approved these changes Aug 18, 2020

View reviewed changes

Merge branch 'master' into ICGEM-missing-attr

61c088e

santisoler added this to the v0.2.0 milestone Mar 29, 2021

santisoler changed the title ~~Missing attributes copied from xarray before cast~~ Keep metadata of sample datasets in the xarray objects Apr 7, 2021

santisoler merged commit 62f3e65 into fatiando:master Apr 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep metadata of sample datasets in the xarray objects #184

Keep metadata of sample datasets in the xarray objects #184

nshea3 commented Aug 3, 2020 •

edited

santisoler left a comment

santisoler commented Aug 6, 2020

nshea3 commented Aug 7, 2020

santisoler commented Aug 10, 2020

nshea3 commented Aug 17, 2020

santisoler left a comment

santisoler commented Mar 29, 2021

Keep metadata of sample datasets in the xarray objects #184

Keep metadata of sample datasets in the xarray objects #184

Conversation

nshea3 commented Aug 3, 2020 • edited

santisoler left a comment

Choose a reason for hiding this comment

santisoler commented Aug 6, 2020

nshea3 commented Aug 7, 2020

santisoler commented Aug 10, 2020

nshea3 commented Aug 17, 2020

santisoler left a comment

Choose a reason for hiding this comment

santisoler commented Mar 29, 2021

nshea3 commented Aug 3, 2020 •

edited