Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HDF4 support in geocat reader with hardcoded engine #2507

Merged
merged 13 commits into from
Jun 20, 2023

Conversation

joleenf
Copy link
Contributor

@joleenf joleenf commented Jun 12, 2023

Adds an __init__ method to the GEOCATFileHandler to allow the addition of xarray_kwargs to the reader. This has been done so that the engine to the xarray_opendataset can be set as well as setting decode_times=False since the current metadata does not allow xarray to decode the times. (Even when the engine is set, decode_times=True will throw an error because xarray cannot interpret the units metadata as written by GEOCAT).

Relates to the discussion on pydata/xarray#7905
I wanted to isolate this pull request from later work which will rely on being able to read these GEOCAT files with SatPy (i.e. adding aliases to facilitate RGB composites with existing recipes).

Comment on lines 67 to 69
scene = satpy.Scene(filenames,
reader='geocat',
reader_kwargs={'decode_times': True})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong but this isn't how this would work. The user would be doing reader_kwargs={"xarray_kwargs": {"decode_times": True}}.

Copy link
Contributor Author

@joleenf joleenf Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct fixed in 510c24f

Comment on lines 75 to 78
kwargs.setdefault('xarray_kwargs', {}).setdefault(
'decode_times', False)
kwargs.setdefault('xarray_kwargs', {}).setdefault(
'engine', "netcdf4")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, these can just be hardcoded can't they? So kwargs.setdefault("xarray_kwargs", {"decode_times": False, "engine": "netcdf4"})?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I can't think of a use case where they could be anything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the geocat code changes, and the unit metadata becomes compliant, I think it would be nice to be able to interpret the scan line times when reading them. I might be mistaken, but 75-78 as written would allow the user to set the decode_times back to True if it is possible to decode them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks like it would work. My suggestion for doing the entire xarray_kwargs dictionary would also work, but would require users to make sure they include engine and decode_times if they were specifying xarray_kwargs at all. This is such a small thing and not sure it matters too much.

Copy link
Contributor Author

@joleenf joleenf Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I will work on it. However, if I set as you outline above, and the user enters:

a = Scene([fn], reader="geocat", reader_kwargs={"xarray_kwargs": {"decode_times": True}})

That will strip the default engine so that what is passed to xarray.open_dataset is decode_times=True, but no engine. As I am understanding what you are saying, and I think it is necessary, the reader should still pass engine="netcdf4" regardless of what the call from Scene contains. Correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back to this again, sorry.... if the kwargs are set as in a previous commit

Doesn't that leave the engine as "netcdf4" if the user only enters a value for decode_times? If no xarray_kwargs are provided, the values would be {"engine": "netcdf4", "decode_times": False}?

I think I am not understanding your point fully on this requested change to
kwargs.setdefault("xarray_kwargs", {"decode_times": False, "engine": "netcdf4"}).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Yours is more flexible. That's why I said it doesn't matter much. My main reason for my suggestion was that it treated xarray_kwargs as an all-or-nothing keyword argument rather than individual things. Sorry for all the confusion for something so dumb (on my part). Let me know if you want to put it back the way you had it or leave it as is. Either way I'm fine with it and we can merge regardless of the pre-commit issue...which I think is a hiccup on pre-commit's side of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine. I think either way works, so to avoid errors that can be introduced with too much more fiddling, I think I will leave it as is.

@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Merging #2507 (bc8a955) into main (9415ad9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2507   +/-   ##
=======================================
  Coverage   94.85%   94.85%           
=======================================
  Files         337      337           
  Lines       49624    49635   +11     
=======================================
+ Hits        47069    47080   +11     
  Misses       2555     2555           
Flag Coverage Δ
behaviourtests 4.41% <0.00%> (-0.01%) ⬇️
unittests 95.47% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/geocat.py 88.81% <100.00%> (+0.28%) ⬆️
satpy/tests/reader_tests/test_geocat.py 100.00% <100.00%> (ø)

@coveralls
Copy link

coveralls commented Jun 12, 2023

Pull Request Test Coverage Report for Build 5315827401

  • 13 of 13 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 95.42%

Totals Coverage Status
Change from base Build 5268800416: 0.001%
Covered Lines: 47193
Relevant Lines: 49458

💛 - Coveralls

"""Open and perform initial investigation of NetCDF file."""
super(GEOCATFileHandler, self).__init__(
filename, filename_info, filetype_info,
xarray_kwargs={"engine": "netcdf4", "decode_times": False})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only change I'd like to consider is allowing the user to pass xarray_kwargs in a **kwargs. I think we had done something like kwargs.setdefault("xarray_kwargs, {...}).

Comment on lines +80 to +82
super(GEOCATFileHandler, self).__init__(
filename, filename_info, filetype_info,
xarray_kwargs=kwargs["xarray_kwargs"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be **kwargs in case the user is passing other kwargs down to the utility file handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand where you want kwargs, in the __init__ correct? The user would still send the xarray specific kwargs with the key "xarray_kwargs" so the documentation should reflect that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. This call to __init__ should use kwargs to pass everything to the base class. That doesn't change what the user uses.

satpy/readers/geocat.py Outdated Show resolved Hide resolved

**Loading data with decode_times=True**

By default, this reader will use xarray_kwargs={"engine": "netcdf4", "decode_times": False}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By default, this reader will use xarray_kwargs={"engine": "netcdf4", "decode_times": False}.
By default, this reader will use ``xarray_kwargs={"engine": "netcdf4", "decode_times": False}``.

joleenf and others added 3 commits June 15, 2023 08:46
Fix the documentation reference for the Scene class.

Co-authored-by: David Hoese <david.hoese@ssec.wisc.edu>
@joleenf joleenf marked this pull request as ready for review June 15, 2023 20:48
@joleenf joleenf requested a review from mraspaud as a code owner June 15, 2023 20:48
@joleenf joleenf requested a review from djhoese June 15, 2023 20:48
Comment on lines 67 to 69
scene = satpy.Scene(filenames,
reader='geocat',
reader_kwargs={'xarray_kwargs': {'decode_times': True}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to update this to include engine="netcdf" in this example since without it and with my proposed code that you implemented the engine will be the default where xarray tries to match/find one.

@djhoese
Copy link
Member

djhoese commented Jun 16, 2023

pre-commit.ci autofix

@djhoese
Copy link
Member

djhoese commented Jun 19, 2023

Closing and reopening to hopefully pick up any commits from pre-commit.

@djhoese djhoese closed this Jun 19, 2023
@djhoese djhoese reopened this Jun 19, 2023
@djhoese
Copy link
Member

djhoese commented Jun 19, 2023

pre-commit.ci autofix

@djhoese
Copy link
Member

djhoese commented Jun 19, 2023

@joleenf I checked the pre-commit.ci log and it looks like it is having some trouble merging/pushing fixes to your branch. It says to pull/merge with upstream main to fix it. You could also do the fixes yourself (after merging with main) by running pre-commit run -a and that should make the changes pre-commit.ci would have made anyway.

Merge branch 'main' of github.com:pytroll/satpy into geocat_xr_kwargs
@joleenf joleenf requested a review from adybbroe as a code owner June 19, 2023 15:49
user does not have to enter both the engine and decode_times when
changing one of the defaults.
@joleenf joleenf requested a review from djhoese June 19, 2023 20:54
@djhoese djhoese changed the title Add reader keyword argument xarray_kwargs Add xarray_kwargs to geocat reader Jun 20, 2023
@djhoese djhoese changed the title Add xarray_kwargs to geocat reader Fix HDF4 support in geocat reader with hardcoded engine Jun 20, 2023
@djhoese djhoese merged commit 1d37638 into pytroll:main Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add xarray_kwargs capability to the geocat reader
3 participants