Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CFWriter utility into CF directory #2524

Merged
merged 42 commits into from
Nov 28, 2023

Conversation

ghiggi
Copy link
Contributor

@ghiggi ghiggi commented Jun 27, 2023

  • Tests added
  • Fully documented
  • Add your name to AUTHORS.md if not there already

Refactor CFWriter-related functions into separate CF modules.

@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (ed220be) 95.21% compared to head (b4e8fa5) 95.26%.

Files Patch % Lines
satpy/cf/attrs.py 93.33% 8 Missing ⚠️
satpy/cf/coords.py 96.10% 6 Missing ⚠️
satpy/writers/cf_writer.py 60.00% 6 Missing ⚠️
satpy/cf/datasets.py 97.01% 2 Missing ⚠️
satpy/cf/encoding.py 96.00% 2 Missing ⚠️
satpy/cf/area.py 97.36% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2524      +/-   ##
==========================================
+ Coverage   95.21%   95.26%   +0.05%     
==========================================
  Files         356      367      +11     
  Lines       51605    51685      +80     
==========================================
+ Hits        49134    49239     +105     
+ Misses       2471     2446      -25     
Flag Coverage Δ
behaviourtests 4.22% <0.44%> (+<0.01%) ⬆️
unittests 95.89% <97.24%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coveralls
Copy link

coveralls commented Jun 27, 2023

Pull Request Test Coverage Report for Build 7020542918

  • 882 of 907 (97.24%) changed or added relevant lines in 19 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 95.839%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/cf/area.py 37 38 97.37%
satpy/cf/datasets.py 65 67 97.01%
satpy/cf/encoding.py 48 50 96.0%
satpy/cf/coords.py 148 154 96.1%
satpy/writers/cf_writer.py 9 15 60.0%
satpy/cf/attrs.py 112 120 93.33%
Totals Coverage Status
Change from base Build 7019457197: 0.06%
Covered Lines: 49365
Relevant Lines: 51508

💛 - Coveralls

@ghiggi
Copy link
Contributor Author

ghiggi commented Jun 27, 2023

Here it is a first refactor.
I just split the various CFWriter components into various files in the satpy/writers/cf directory.
The associated tests have been copy-pasted and reorganized in the corresponding cf_tests/test_<filename>.py files.

The cf_writer.py content is now very light and can be reused to also i.e. add a ZarrWriter.
The cf_writer.py just depends on a single function of dataset.py and encoding.py.
The dataset.py depends on a single function of dataarray.py and attrs.py and some other stuffs that can still be refactored (see inline comments below)

Decisions to be taken:

  • Code in satpy/writers/cf or satpy/cf
  • Organization of cf files: coords_attrs.py, crs.py, time.py and attrs.py (crs.py in area.py ? | coords_attrs.py and time.py in a new attrs_coords.py ?)
  • Test in test_cf.py often call save_datasets. Should we duplicate/copy relevant content and also test the parent cf methods?
  • In perspective of a ZarrWriter implementation, we aim to run test_cf.py tests twice (for writer='cf' (or writer='netcdf') and writer='zarr')?

satpy/writers/cf/datasets.py Outdated Show resolved Hide resolved
satpy/writers/cf/encoding.py Outdated Show resolved Hide resolved
satpy/writers/cf/time.py Outdated Show resolved Hide resolved
satpy/writers/cf/time.py Outdated Show resolved Hide resolved
@mraspaud
Copy link
Member

Decisions to be taken:

* Code in `satpy/writers/cf` or `satpy/cf`

I vote for satpy/writers/cf

* Organization of `cf` files: `coords_attrs.py`, `crs.py`,  `time.py` and `attrs.py` (`crs.py` in `area.py` ? |  `coords_attrs.py` and `time.py` in a new `attrs_coords.py` ?)

the former maybe?

* Test in `test_cf.py` often call `save_datasets`. Should we duplicate/copy relevant content and also test the parent cf methods?

I don't think we need to explicitely test cf methods as long as the save_datasets exercises the code we are interested in.

* In perspective of a `ZarrWriter` implementation, we aim to run `test_cf.py` tests twice (for `writer='cf'` (or `writer='netcdf'`) and `writer='zarr')`?

Why not. One solution would be to have a base test class that is then inherited by TestCFWriter and TestZarrWriter if parametrization isn't good enough.

@ghiggi
Copy link
Contributor Author

ghiggi commented Jul 6, 2023

Hey @djhoese :) In case you manage to get a quick look in the coming days to this PR, next Tuesday/Wednesday I should have time to address your comments ;)

Since we discussed in a previous PR to add options like serialize=True, cf=True, drop_area=True to to_xarray ... these should be implemented in a separate PR right?

@djhoese djhoese added component:writers cleanup Code cleanup but otherwise no change in functionality labels Jul 6, 2023
@djhoese
Copy link
Member

djhoese commented Jul 6, 2023

I'm fine with kwarg changes being moved to another PR. I haven't reviewed this in-depth because of Martin's comments, but I now realize most of the comments here are your own about decisions that need to be made. I'll see what I can do.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In perspective of a ZarrWriter implementation, we aim to run test_cf.py tests twice (for writer='cf' (or writer='netcdf') and writer='zarr')?

My preference would be to have tests that actually test what the writers are doing and what they are responsible for. The CF tests should test that the proper conversions are happening. The writer tests don't need to make sure that CF conversion for every case happened because the CF tests should have covered that.

In the cases where you want to make sure that something the CF conversion is doing is saved/rendered properly on-disk then I think putting it in a writer test is fine, but you could also use to_netcdf and to_zarr directly...maybe. I'm not sure how much I'll regret saying that later.

Most of this looks fine to me. I left some inline comments trying to address some of the questions you had. I think in general if you think some functions could be extracted and/or shared then go for it.

I think I requested this somewhere else, but I'd really like if we could be more specific with function names and module names regarding what is private (_ prefix) and public. For example, if a module in satpy/writers/cf/ is only used internally to that sub-package/module then I think the module name could start with a _. Similarly, if a function is only used inside its own module then it should get a _ prefix to make sure users/readers don't assume it can be imported into somewhere else. The opposite of this is that you shouldn't be importing a function with a _ from one module to another. Thoughts?

satpy/tests/writer_tests/cf_tests/test_attrs.py Outdated Show resolved Hide resolved
satpy/tests/writer_tests/cf_tests/test_attrs.py Outdated Show resolved Hide resolved
satpy/writers/cf/datasets.py Outdated Show resolved Hide resolved
satpy/writers/cf/encoding.py Outdated Show resolved Hide resolved
satpy/writers/cf/time.py Outdated Show resolved Hide resolved
satpy/writers/cf_writer.py Outdated Show resolved Hide resolved
@djhoese
Copy link
Member

djhoese commented Nov 16, 2023

The readthedocs failure seems like a corrupt download. It should fix itself next run.

@djhoese
Copy link
Member

djhoese commented Nov 16, 2023

Nevermind:

/home/docs/checkouts/readthedocs.org/user_builds/satpy/conda/2524/lib/python3.10/site-packages/satpy/cf/dataarray.py:docstring of satpy.cf.dataarray.make_cf_dataarray:15: ERROR: Unknown target name: "channel".

@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 16, 2023

@djhoese I further simplified all functions that CodeScene was complaining about. Have a last look ... in case tomorrow I can do the last changes ;)

@djhoese
Copy link
Member

djhoese commented Nov 21, 2023

@mraspaud do you want to give this one last review?

@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 22, 2023

Maybe also @TomLav want to have a look in light of pytroll/pyresample#278 ?

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, looks good to me.

@TomLav
Copy link

TomLav commented Nov 22, 2023

A question about _add_xy_projected_coords_attrs. I see that it only allows for projection_x|y_coordinates.

projection_x|y_coordinates is valid for all projections, except for rotated_latitude_longitude and geostationary.

For rotated_latitute_longitude, the names are grid_latitude and grid_longitude(with units degrees, I suppose).

For geostationary, the names are projection_x_angular_coordinate and projection_y_angular_coordinate (since CF-1.9).

In pyresample's load_cf_area() this logic is implemented by _valid_cf_coordinate_standardnames.

For geostationary, this goes beyond the change of standard_name since the CF convention expects the angles to be stored, while the area_def object stores the meters. Again, in pyresample load_cf_area() we have an explicit convertion of the angles (in the CF file) to the meters (in the area_def object): https://github.com/pytroll/pyresample/blob/25d0c7209f15fd57928fbafccbdaa0060530d6ae/pyresample/utils/cf.py#L72

I have not looked at the test suite yet, but do you have tests that use CFWriter to dump an area_def, then re-read it with load_cf_area() and compare the results?

@djhoese
Copy link
Member

djhoese commented Nov 23, 2023

It should be noted that the angles for the CF geostationary case are satellite viewing angles in radians NOT lon/lat degrees. Regardless a NetCDF is still valid with metered coordinates for geos as long as the units attribute states that.

@TomLav
Copy link

TomLav commented Nov 24, 2023

The units of the geostationary projection axis are indeed radians. The CF convention states:

The use of projection_x_coordinate and projection_y_coordinate was deprecated in version 1.9 of the CF Conventions. The initial definition of this projection used these standard names to identify the projection coordinates even though their canonical units (meters) do not match those required for this projection (radians).

So the validity of satpy using projection_x|y_coordinates depends on what "CF-1.X" it writes in :Conventions.

@djhoese
Copy link
Member

djhoese commented Nov 24, 2023

@TomLav I'm a little confused by that. It seems like they are providing part of the explanation and not the whole thing. I also noticed that canonical units are defined as:

canonical units

    Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier (see below and [Appendix C, Standard Name Modifiers](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#standard-name-modifiers)) or by the cell_methods attribute (see [Section 7.3, "Cell Methods"](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#cell-methods) and [Appendix E, Cell Methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#appendix-cell-methods)) or both.

So obviously meters and kilometers are "physically equivalent", but based on the quote you commented they're saying radians and meters are not. So if we're using the CF 1.9 specification and producing a geostationary grid mapping then we should (only) produce x/y variables with standard_name projection_x/y_angular_coordinate and units in radians. Although we only want to produce CF standard files, if we could this would be one place where I'd be OK not following the standard (produce metered x/y instead of radians).

@mraspaud
Copy link
Member

@ghiggi I see that the directory now is satpy/cf rather than satpy/writers/cf as was agreed in the beginning, what motivated the switch?

@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 28, 2023

@mraspaud I don't recall precisely. Maybe we had a discussion with @djhoese on this ... I don't remember.
If I should decide, I prefer to have it in satpy/cf rather than inside a writer folder.
The code in cf does not do any writing: it only ensure that data conform to a convention and are serializable.

@djhoese
Copy link
Member

djhoese commented Nov 28, 2023

I think I like the idea of a generic satpy/cf/ directory for CF related stuff BUT it also made me feel like we shouldn't have this stuff in Satpy and should find a way to use another package like cf-xarray or something. But since this was primarily a refactor I was fine with whatever @ghiggi wanted to do.

@mraspaud mraspaud merged commit 72e2a71 into pytroll:main Nov 28, 2023
17 of 19 checks passed
@mraspaud
Copy link
Member

Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Code cleanup but otherwise no change in functionality component:writers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants