Keep metadata of sample datasets in the xarray objects #184
Thanks @nshea3 for spotting the true problem and solving it!
I would make a small change to your code.
By defining both data and cast_data we would be storing the same data twice: once in float64 and once in the original type. So, what if we cache the attrs in a variable, change the data type, and then set the attrs equal to the cached dictionary? Something like:
data = xr.open_dataset(fname, engine="scipy")
attrs = data.attrs.copy()
data = data.astype("float64")
data.attrs = attrs
I think this should work just fine, but it is worth testing, of course.
Besides, I would also test that attrs, in addition to not being empty, contains some important attributes, like refsysname.
What do you think?
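The suggestion above can be tried on a small in-memory dataset before touching the real files. This is only a sketch: the helper name cast_keeping_attrs, the variable and coordinate names, and the "WGS84" value are made up for illustration and are not taken from sample_data.py.

```python
import numpy as np
import xarray as xr


def cast_keeping_attrs(dataset, dtype="float64"):
    """Cast every variable to `dtype` and restore the global attributes."""
    # Cache the global attributes before the cast, which may drop them
    attrs = dataset.attrs.copy()
    result = dataset.astype(dtype)
    # Put the cached attributes back on the new object
    result.attrs = attrs
    return result


# Hypothetical stand-in for one of the sample grids
grid = xr.Dataset(
    {"gravity": (("latitude", "longitude"), np.ones((2, 3), dtype="float32"))},
    coords={"latitude": [0.0, 1.0], "longitude": [0.0, 1.0, 2.0]},
    attrs={"refsysname": "WGS84"},
)
converted = cast_keeping_attrs(grid)
```

Only one full copy of the data is kept after the cast, since the original object is not bound to a second name inside the function.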
Since we are about to move all datasets to Rockhound (see #179), we should remember to apply the same changes we make here to fatiando/rockhound#84.
Excellent point, thank you @santisoler. I will make that change so we avoid duplicating the xarray object. I'll add a test for the
Maybe we can add those attributes on which the data depends (see harmonica/harmonica/tests/test_sample_data.py, lines 35 to 42 in 2980071).
Feel free to add as many tests as you think are needed; more tests are usually a good thing to have!
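One possible shape for such a test is a small helper that reports which expected attributes are absent. This is a sketch, not the code in test_sample_data.py: the helper missing_attrs is hypothetical, and only refsysname comes from this discussion ("modelname" and the "WGS84" value are placeholders).

```python
import xarray as xr


def missing_attrs(dataset, required):
    """Return the names in `required` that are absent from dataset.attrs."""
    return [name for name in required if name not in dataset.attrs]


# Hypothetical dataset standing in for the output of a fetch_* function
sample = xr.Dataset(attrs={"refsysname": "WGS84"})

# The global attributes must not be empty and must include the expected keys
assert sample.attrs
assert missing_attrs(sample, ["refsysname"]) == []
```

Returning the list of missing names, instead of a bare boolean, makes a failing test message point directly at the attribute that was dropped.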
7140de2 to a2e58ce
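Hi @santisoler, I have made the change you suggested. I have also added several tests for the presence of specific attributes.
I have not yet changed the netcdf files as discussed in #124. I started editing the files with the netcdf4-python library but I realized that the global attributes do not update as changes are made to the files, so the global attributes would be incorrect.
I am currently trying to regenerate the files from the ICGEM website (http://icgem.gfz-potsdam.de/calcgrid) without the repeated longitudes but the only output option is a .gdf file and I do not know how to convert that to a .cdl or .nc file?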
Hi @nshea3. Thanks for making the changes! I think they are good to go! I will just make small changes to the comment lines to bring them up to date.
I have not yet changed the netcdf files as discussed in #124. I started editing the files with the netcdf4-python library but I realized that the global attributes do not update as changes are made to the files, so the global attributes would be incorrect.
Don't worry about #124 here. Now that you figured out that the missing attributes weren't in fact loaded by the function (but still in the files), these two issues are somewhat unrelated. So we can solve them in different PRs.
I am currently trying to regenerate the files from the ICGEM website (http://icgem.gfz-potsdam.de/calcgrid) without the repeated longitudes but the only output option is a .gdf file and I do not know how to convert that to a .cdl or .nc file?
Oh, you can use harmonica.load_icgem_gdf to load .gdf files into xarray.Dataset, which can be stored as netCDF files with xr.Dataset.to_netcdf.
@nshea3 Sorry for stalling this PR for so long. I don't know why it stayed under my radar. We are planning a new Harmonica release in the coming week(s). Thanks for the contribution!
Fixes #125
Many global attributes are included in the netcdf files, however they are dropped in the fetch_* functions in sample_data.py with the float64 cast: data = xr.open_dataset(fname, engine="scipy").astype("float64"). This is a known issue in xarray (pydata/xarray#2049) and the current solution is to open the NetCDF file as-is, cast to the new datatype, and then copy the global attributes to the new object.
Also added tests to test_sample_data.py to check that the attrs attribute is not an empty dictionary.
Reminders:
make format and make check to make sure the code follows the style guide.
doc/api/index.rst and the base __init__.py file for the package.
AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.