Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to pyproj for projection to CF NetCDF grid mapping #1038

Merged
merged 12 commits into from
Apr 1, 2020

Conversation

djhoese
Copy link
Member

@djhoese djhoese commented Jan 9, 2020

This is the long term solution for fixing #1029 (FYI @RickKohrs).

This switches the CF writer to depend on pyproj 2.0+ for converting the PROJ information to a CF compatible set of attributes. This is a no brainer in the long run as it lets the projection-specific library handling the projection logic (and people who are much more up to date on changes in CF and PROJ).

However, this has some important changes and this PR is kind of 50/50 on deciding how it handles them. I'm not sure whether we want to completely switch to depending on pyproj and remove all previous methods for "unknown" projections or not. Pyproj also doesn't provide any default way of "naming" the grid_mapping variable. I made it default to the same name as the grid_mapping_name attribute.

I think I might make a short term solution for a 0.19.1 release of Satpy to fix #1029 but wanted to make this for now so people could see it and discuss.

@coveralls
Copy link

coveralls commented Jan 9, 2020

Coverage Status

Coverage increased (+0.06%) to 89.535% when pulling 11ae112 on djhoese:bugfix-cf-projections into 8853aac on pytroll:master.

@mraspaud
Copy link
Member

mraspaud commented Jan 9, 2020

I'd like to use the opportunity to add an oblique mercator test. I'll try to provide something during the day.

@djhoese
Copy link
Member Author

djhoese commented Jan 9, 2020

I learned while working on the short term solution last night that I'm pretty sure the projection tests aren't actually comparing any of the .attrs for the grid mapping variable. Meaning, they basically don't test anything.

@mraspaud
Copy link
Member

mraspaud commented Jan 9, 2020

I'll have a look now

@mraspaud
Copy link
Member

mraspaud commented Jan 9, 2020

you're right, the attrs weren't checked...

@codecov
Copy link

codecov bot commented Jan 9, 2020

Codecov Report

Merging #1038 into master will increase coverage by 0.05%.
The diff coverage is 98.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1038      +/-   ##
==========================================
+ Coverage   89.47%   89.53%   +0.05%     
==========================================
  Files         198      200       +2     
  Lines       29210    29356     +146     
==========================================
+ Hits        26136    26284     +148     
+ Misses       3074     3072       -2     
Impacted Files Coverage Δ
satpy/writers/cf_writer.py 93.79% <96.42%> (+1.14%) ⬆️
satpy/tests/writer_tests/test_cf.py 98.53% <100.00%> (ø)
satpy/tests/test_scene.py 99.71% <0.00%> (-0.14%) ⬇️
satpy/readers/hy2_scat_l2b_h5.py 98.00% <0.00%> (ø)
satpy/tests/reader_tests/test_hy2_scat_l2b_h5.py 100.00% <0.00%> (ø)
satpy/scene.py 90.19% <0.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8853aac...11ae112. Read the comment docs.

@djhoese
Copy link
Member Author

djhoese commented Jan 23, 2020

So I've just rebased this on to master and I have a few things I think we need to decide on before merging this:

  1. Do we completely depend on pyproj for all CF projection conversions? Would make the code nice but could delay updates/fixes for our users if we need to wait for a pyproj release (although they come pretty quickly, faster than satpy usually).
  2. Whoever is using the omerc stuff really needs to test what their clients can accept for the grid mapping? There was some stuff that was not CF compliant in the previous conversion functions.
  3. I think we should remove the name handling stuff that is based on the projection definition and instead use the area.area_id as the name of the variable in the NetCDF file. This should make it unique for NetCDF files with multiple areas in the same file (if possible or in the future).

Edit: Also see pyproj4/pyproj#515

@mraspaud
Copy link
Member

  1. If we can delegate it to pyproj, I'm good with it. But we need thorough tests to make things don't break with future releases.
  2. I'm using omerc, I'll check
  3. Fine by me.

@mraspaud
Copy link
Member

Here is the result of a before/after comparison with omerc:

Before

	int64 omerc ;
		omerc:azimuth_of_central_line = 8.14048957652079 ;
		omerc:false_easting = 0. ;
		omerc:false_northing = 0. ;
		omerc:gamma = 0LL ;
		omerc:geographic_crs_name = "unknown" ;
		omerc:grid_mapping_name = "oblique_mercator" ;
		omerc:horizontal_datum_name = "unknown" ;
		omerc:latitude_of_projection_origin = 0LL ;
		omerc:long_name = "omerc" ;
		omerc:longitude_of_projection_origin = 33.8226584093497 ;
		omerc:prime_meridian_name = "Greenwich" ;
		omerc:reference_ellipsoid_name = "sphere" ;

After

	int64 oblique_mercator ;
		oblique_mercator:azimuth_of_central_line = 8.14048957652079 ;
		oblique_mercator:crs_wkt = "PROJCRS[\"unknown\",BASEGEOGCRS[\"unknown\",DATUM[\"Unknown based on Normal Sphere (r=6370997) ellipsoid\",ELLIPSOID[\"Normal Sphere (r=6370997)\",6370997,0,LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8901]]],CONVERSION[\"unknown\",METHOD[\"Hotine Oblique Mercator (variant B)\",ID[\"EPSG\",9815]],PARAMETER[\"Latitude of projection centre\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8811]],PARAMETER[\"Longitude of projection centre\",33.8226584093497,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8812]],PARAMETER[\"Azimuth of initial line\",8.14048957652079,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8813]],PARAMETER[\"Angle from Rectified to Skew Grid\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8814]],PARAMETER[\"Scale factor on initial line\",1,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8815]],PARAMETER[\"Easting at projection centre\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8816]],PARAMETER[\"Northing at projection centre\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8817]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]" ;
		oblique_mercator:false_easting = 0LL ;
		oblique_mercator:false_northing = 0LL ;
		oblique_mercator:grid_mapping_name = "oblique_mercator" ;
		oblique_mercator:latitude_of_projection_origin = 0LL ;
		oblique_mercator:long_name = "oblique_mercator" ;
		oblique_mercator:longitude_of_projection_origin = 33.8226584093497 ;
		oblique_mercator:reference_ellipsoid_name = "sphere" ;
		oblique_mercator:unit = "m" ;

It's basically fine, but there are some parameters missing (both in the before and after), since CF 1.8 says:

reference_ellipsoid_name, prime_meridian_name, horizontal_datum_name and geographic_crs_name must be all defined if any one is defined, and if projected_crs_name is defined then geographic_crs_name must be also.

@mraspaud
Copy link
Member

It's better with pyproj master, but geographic_crs_name is still missing

	int64 oblique_mercator ;
		oblique_mercator:azimuth_of_central_line = 8.14048957652079 ;
		oblique_mercator:crs_wkt = "PROJCRS[\"unknown\",BASEGEOGCRS[\"unknown\",DATUM[\"Unknown based on Normal Sphere (r=6370997) ellipsoid\",ELLIPSOID[\"Normal Sphere (r=6370997)\",6370997,0,LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8901]]],CONVERSION[\"unknown\",METHOD[\"Hotine Oblique Mercator (variant B)\",ID[\"EPSG\",9815]],PARAMETER[\"Latitude of projection centre\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8811]],PARAMETER[\"Longitude of projection centre\",33.8226584093497,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8812]],PARAMETER[\"Azimuth of initial line\",8.14048957652079,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8813]],PARAMETER[\"Angle from Rectified to Skew Grid\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8814]],PARAMETER[\"Scale factor on initial line\",1,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8815]],PARAMETER[\"Easting at projection centre\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8816]],PARAMETER[\"Northing at projection centre\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8817]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]" ;
		oblique_mercator:false_easting = 0. ;
		oblique_mercator:false_northing = 0. ;
		oblique_mercator:grid_mapping_name = "oblique_mercator" ;
		oblique_mercator:horizontal_datum_name = "Unknown based on Normal Sphere (r=6370997) ellipsoid" ;
		oblique_mercator:inverse_flattening = 0. ;
		oblique_mercator:latitude_of_projection_origin = -0. ;
		oblique_mercator:long_name = "oblique_mercator" ;
		oblique_mercator:longitude_of_prime_meridian = 0. ;
		oblique_mercator:longitude_of_projection_origin = 33.8226584093497 ;
		oblique_mercator:prime_meridian_name = "Greenwich" ;
		oblique_mercator:reference_ellipsoid_name = "Normal Sphere (r=6370997)" ;
		oblique_mercator:scale_factor_at_projection_origin = 1. ;
		oblique_mercator:semi_major_axis = 6370997. ;
		oblique_mercator:semi_minor_axis = 6370997. ;

@djhoese
Copy link
Member Author

djhoese commented Feb 14, 2020

geographic_crs_name isn't CF though. Does your application need that for some reason? Even though it is "unknown"?

@mraspaud
Copy link
Member

@djhoese
Copy link
Member Author

djhoese commented Feb 14, 2020

Hm, but for your projection it isn't defined by the OGC WKT spec, so maybe that's ok? The CF docs don't say anything about "unknown" when it isn't defined. I assume that means you just don't include it if it isn't defined which is what pyproj is doing here.

What if you do have a OGC WKT GEOGCS defined? Does pyproj include it then?

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice development! And thanks for getting in touch with the pyproj developers.

  1. Agree with Martin.
  2. Never used that projection.
  3. See my comment.

grid_mapping['name'] = grid_mapping['grid_mapping_name']
except AttributeError:
grid_mapping = mappings[area.proj_dict['proj']](area)
grid_mapping['name'] = area.proj_dict['proj']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that the name attribute would depend on the pyproj version? If so, should we align those names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion for 3 is not implemented. Right now this does not depend on the pyproj version because proj is statically defined and unchanging (merc, geos, etc). However, the name attribute is not required by CF for the grid mapping variable so we can remove this portion and instead use area.area_id as the name of the variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. Using area.area_id sounds good!

@mraspaud
Copy link
Member

@djhoese I don't think there is much more to do here, I'm glad with the result of this actually. If there are improvements to make, I suppose they should go in pyproj, no here, right ?

@mraspaud mraspaud added this to the v0.21.0 milestone Mar 25, 2020
@djhoese
Copy link
Member Author

djhoese commented Mar 25, 2020

There still need to be some changes for what area.area_id is and to use it as the variable name in the NetCDF. Also, I would like to drop the old functions completely and depend on pyproj 2.1+ (I think that's it at least). We could do it during the import of the module and raise an import error if the version isn't new enough. Thoughts?

@mraspaud
Copy link
Member

I'm good with depending in a new pyproj. But maybe we skip this PR for v0.21 then.

@djhoese
Copy link
Member Author

djhoese commented Mar 25, 2020

But maybe we skip this PR for v0.21 then.

I'm OK with that. Are you saying that due to the time it would take me to finish this or because you don't want this writer to depend on pyproj 2+?

@mraspaud
Copy link
Member

I was thinking time constraints, I'd like a release within a week.

@mraspaud mraspaud removed this from the v0.21.0 milestone Mar 26, 2020
# Conflicts:
#	.travis.yml
#	appveyor.yml
#	satpy/tests/writer_tests/test_cf.py
@ghost
Copy link

ghost commented Mar 30, 2020

DeepCode's analysis on #11ae11 found:

1 minor issue. ✔️ 1 issue were fixed.

👉 View analysis in DeepCode’s Dashboard

☺️ If you want to provide feedback on our bot, here is how to contact us.

@djhoese
Copy link
Member Author

djhoese commented Mar 30, 2020

@mraspaud This is ready for re-review. The big change is that the CRS conversion now completely depends on pyproj. The biggest change here on the user side is that there is no longer a warning if a CRS can't be converted "properly". This is because pyproj follows what the current CF standard is (not sure it is fully documented but I saw it on the mailing list) which is to provide a crs_wkt attribute in the grid mapping variable and no other attributes. I believe this is meant as a fallback for clients who say they can read any CF file but data providers' data is not in a CF CRS.

@sfinkens This also implements the grid mapping variable naming talked about above. Basically, it now names the grid mapping variable to area.area_id.

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as afaics, but we need to check how this performs against the cf-checker.
Moreover, @sfinkens are you going to be affected by these changes ?

Comment on lines 786 to 799
with mock.patch('satpy.writers.cf_writer.warnings.warn') as warn:
res, grid_mapping = area2gridmapping(ds)
warn.assert_called()
proj_dict = pyresample.geometry.proj4_str_to_dict(res.attrs['grid_proj4'])
self.assertEqual(proj_dict['lon_0'], 4.535)
self.assertEqual(proj_dict['lat_0'], 46.0)
self.assertEqual(proj_dict['o_lon_p'], -5.465)
self.assertEqual(proj_dict['o_lat_p'], 90.0)
self.assertEqual(proj_dict['proj'], 'ob_tran')
self.assertEqual(proj_dict['o_proj'], 'stere')
self.assertEqual(proj_dict['ellps'], 'WGS84')
self.assertEqual(grid_mapping.attrs['name'], 'proj4')
# with mock.patch('satpy.writers.cf_writer.warnings.warn') as warn:
res, grid_mapping = area2gridmapping(ds)
# warn.assert_called()
# proj_dict = pyresample.geometry.proj4_str_to_dict(res.attrs['grid_proj4'])
# self.assertEqual(proj_dict['lon_0'], 4.535)
# self.assertEqual(proj_dict['lat_0'], 46.0)
# self.assertEqual(proj_dict['o_lon_p'], -5.465)
# self.assertEqual(proj_dict['o_lat_p'], 90.0)
# self.assertEqual(proj_dict['proj'], 'ob_tran')
# self.assertEqual(proj_dict['o_proj'], 'stere')
# self.assertEqual(proj_dict['ellps'], 'WGS84')
self.assertIn('crs_wkt', grid_mapping.attrs)
self.assertEqual(res.attrs['grid_mapping'], 'cosmo7')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loreclem I believe you implemented this, would you mind checking if you are ok with this ?

satpy/tests/writer_tests/test_cf.py Outdated Show resolved Hide resolved
@djhoese
Copy link
Member Author

djhoese commented Mar 30, 2020

The tests were failing previously because reference_ellipsoid_name for pyproj 2.5+ seems to add a space (ex. WGS 84) where it didn't before. I've asked on gitter about this. If it is intended I may need to add a special case to the test to handle this.

@mraspaud mraspaud added this to the v0.21.0 milestone Mar 30, 2020
@djhoese
Copy link
Member Author

djhoese commented Mar 30, 2020

Ok I just pushed a commit with some more big changes after @peters77 pointed out that this PR wasn't actually working in a real world case. Turns out the tests weren't actually testing very much. The lesson from us (the developers) should be "just because you can mock something in a test, doesn't mean you should". There are ways with mocks to declare them with wraps=original_function so that you still call the original code while also being able to check how many calls there were and what the arguments were. However, these tests were also using these checks if a function was called or what arguments it was called with instead of checking the actual result for what that function should be doing. This most recent commit should fix all of this based on what I saw in the tests.

The other big change in this last commit: the da2cf method now removes the .attrs['name'] attribute and instead renames the xarray DataArray's Variable object. This means that my_data_array.name is the name of the Variable and that .attrs['name'] is no longer available after things have been CF-ified.

Lastly, long_name now only defaults to my_var.name if standard_name is not provided instead of always being set. This should go along with the CF conventions based on what @mraspaud told me on slack.

@mraspaud
Copy link
Member

Regarding my comment on geographic_crs_name, an issue has been opened in pyproj:
pyproj4/pyproj#585

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this looks quite good now. I still have a few comments, but nothing serious.

@@ -759,14 +754,14 @@ def _gm_matches(gmapping, expected):
'grid_mapping_name': 'geostationary',
'semi_major_axis': a,
'semi_minor_axis': b,
'sweep_axis': None,
'name': 'geos'})
# 'sweep_angle_axis': None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my offline test, I get 'sweep_angle_axis': 'y'. Should this be added ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sweep angle axis is only guaranteed with pyproj 2.5+ since that version now bases the CF output on the WKT version of the CRS instead of the PROJ string. Ok with requiring pyproj 2.5+ for this writer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it, but it might be too recent. Could we just check that sweep_angle_axis is y if it is there ?

Comment on lines 783 to 791
# warn.assert_called()
# proj_dict = pyresample.geometry.proj4_str_to_dict(res.attrs['grid_proj4'])
# self.assertEqual(proj_dict['lon_0'], 4.535)
# self.assertEqual(proj_dict['lat_0'], 46.0)
# self.assertEqual(proj_dict['o_lon_p'], -5.465)
# self.assertEqual(proj_dict['o_lat_p'], 90.0)
# self.assertEqual(proj_dict['proj'], 'ob_tran')
# self.assertEqual(proj_dict['o_proj'], 'stere')
# self.assertEqual(proj_dict['ellps'], 'WGS84')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't useful anymore and should be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just wanted to make sure it was obvious that this is no longer being done. I can remove it.

# self.assertEqual(proj_dict['proj'], 'ob_tran')
# self.assertEqual(proj_dict['o_proj'], 'stere')
# self.assertEqual(proj_dict['ellps'], 'WGS84')
self.assertIn('crs_wkt', grid_mapping.attrs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we copy the generated wkt and test against it ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not the whole string but I'd be comfortable with checking if certain parts are there. Just in case the WKT changes a bit I don't want to have to update these tests every time.

Comment on lines 198 to 199
# modifies dataarray in-place
area2lonlat(dataarray)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For style, I'd rather avoid in-place modifications. Can we make this return a modified copy of the array and do:

Suggested change
# modifies dataarray in-place
area2lonlat(dataarray)
dataarray = area2lonlat(dataarray)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we had this before, but since we're changing this anyway...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. The way this is set up is that area2lonlat and area2gridmapping are given a copy by area2cf. So these two functions are in-place but are only called with a copy. Still want the change? Not a big deal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'd rather have each function make a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the lonlat case we add coordinates, does that still only need a shallow copy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but don't take my word for it.

Comment on lines 279 to 201
res.extend(area2gridmapping(dataarray))
res.append(area2gridmapping(dataarray))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can we have area2gridmapping also return a modified copy of the array ? In this case we probably just need a shallow copy as we just add an attribute.

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit d12bb7a into pytroll:master Apr 1, 2020
@sfinkens
Copy link
Member

sfinkens commented Apr 1, 2020

Very nice, good work!
@martin I don't think this affects us, because PPS uses the lons/lats directly.
Good point about the tests @djhoese. I tend to patch too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NetCDF (CF) writer doesn't include semi_minor_axis/semi_major_axis for new versions of pyproj
4 participants