Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New representation of crs objects #1225

Closed
edzer opened this issue Dec 22, 2019 · 7 comments
Closed

New representation of crs objects #1225

edzer opened this issue Dec 22, 2019 · 7 comments
Labels
breaking change ☠️ API change likely to affect existing code

Comments

@edzer
Copy link
Member

edzer commented Dec 22, 2019

TL;DR:

  • current crs objects are hard to move to the post-proj4string world we now live in
  • I suggest to have crs objects with two fields: input for user input / short description, and wkt for communicating to/from GDAL
  • this still supports specifying CRS by EPSG ID or proj4string, but also with 8 other CRS formats, widening the interoperability dramatically

Currently, crs objects have two fields: epsg (integer) and proj4string (character). Branch wkt2 added a third, wkt2 (character) with the WKT-2 representation. This tries to merge the PROJ.4 world view. where CRS are expressed as proj4strings, and the PROJ ( >= 6) world, where proj4strings are considered legacy (if not to be deprecated). As expected, this turned out to be very messy; see #1146.

Spawning from branch wkt2, I created a branch SetFromUserInput which rethinks crs objects. In short, in this branch crs objects have two character fields:

  • input: the string with which the object was initialised by the user, or the string that characterizes the CRS briefly when an object was read through GDAL; it is meant to be the human-readable version of the CRS (as opposed to WKT).
  • wkt: the wkt (or wkt2 in GDAL3/PROJ6) description of the CRS, which is meant to be communicate the entire CRS to and from GDAL

Only input is ever to be set by users, e.g. as in

> st_crs(4326)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

(where 4326 is converted to EPSG:4326) and input is passed on to OGRSpatialReference::SetFromUserInput. This means that not only proj4strings and EPSG IDs can be input, but also WKT(2), OGC urns, PROJJSON and even non-EPSG CRS's (in total 10 formats), e.g.:

> st_crs("urn:ogc:def:crs:EPSG::4326")
Coordinate Reference System:
  User input: urn:ogc:def:crs:EPSG::4326 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

This broadens the scope of crs objects in terms of interoperability, can also work with GDAL 2, and allows everyone to move away from proj4strings on their own pace. For convenience, the $ method for crs objects does a bit more than extracing input or wkt, but can generate some fields that help backward compatibility:

 > x = st_crs("urn:ogc:def:crs:EPSG::4326")
> x$epsg # now a character:
[1] "4326"
> x$proj4string # generated on-the-fly
[1] "+proj=longlat +datum=WGS84 +no_defs"

What messes up backward compatibility to reverse dependencies is stored legacy objects, e.g. in package tmap, and possibly data packages such as bcmaps. E.g.

> library(bcmaps)
Loading required package: sf
Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
> bc_bound <- get_layer("bc_bound")
> str(st_crs(bc_bound))
List of 2
 $ epsg       : int 3005
 $ proj4string: chr "+proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +"| __truncated__
 - attr(*, "class")= chr "crs"
> st_crs(bc_bound)
Coordinate Reference System:
  User input: +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
  wkt:
BOUNDCRS[
... # long text skipped
]Warning messages:
1: In `$.crs`(x, "input") : old-style crs object found: please update code
2: In `$.crs`(x, "input") : old-style crs object found: please update code
3: In `$.crs`(x, "wkt") : old-style crs object found: please update code
4: In `$.crs`(x, "wkt") : old-style crs object found: please update code

These should ideally have been generated at package load rather than stored, to be robust against changes like this. I might be able to do an auto-conversion + warning for backward compatibility in packages that I control and that use them (sf, lwgeom).

This is related to #1033, #1146 and #1187.

@edzer
Copy link
Member Author

edzer commented Dec 22, 2019

My plan is to merge this into branch wkt2, merge that into master, and release when CRAN returns, after managing reverse dependencies fallout.

@ateucher
Copy link
Contributor

This looks like a great idea @edzer. What would be the best approach for data packages such as bcmaps? Just recreate the stored objects with the new version of sf?

@edzer
Copy link
Member Author

edzer commented Dec 23, 2019

Thanks Andy! In the end I guess yes; a more flexible solution that may work with both old and new sf could be to have bcmaps::get_layer set the crs with a call to st_set_crs(3005). But I still need to look into all rev deps.

@rsbivand
Copy link
Member

I agree; packages with canned data can revise defensively in sf and sp workflows by re-creating the crs/CRS on load. Please explore how this might be done, it'll be very useful in giving other packages in the same position a working example. I'm unsure how to add an R command to a straight data() or particularly lazy-loaded data, but with a get_...() function, it should be feasible. load() from GADM isn't obvious, as GADM needs to serve to old and new versions of sf (and sp).

@ateucher
Copy link
Contributor

Oh that's a more flexible idea... thanks!

@ateucher
Copy link
Contributor

ateucher commented Dec 24, 2019

Implemented in bcmaps here. Seems simple, but will test with dev sf whenever it's ready...

@edzer
Copy link
Member Author

edzer commented Mar 4, 2020

Branch SetFromUserInput has now been merged into master. CRAN release planned within 2 weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

No branches or pull requests

3 participants