Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geom_sf() #88

Closed
edzer opened this issue Nov 29, 2016 · 49 comments
Closed

geom_sf() #88

edzer opened this issue Nov 29, 2016 · 49 comments

Comments

@edzer
Copy link
Member

edzer commented Nov 29, 2016

We need a geom_sf() to render simple feature objects in ggplots, e.g. where

nc = st_read(system.file("shape/nc.shp", package="sf"))
ggplot() + geom_sf(nc)

would add the polygon boundaries, and further argument can alter the color of polygons and polygon boundaries, as well as point and line geometries. I wrote some building blocks, basically functions that convert sf objects into grid grobs here and am willing to help out, but would appreciate some help getting started from someone who's been here before: @thomasp85, @hadley, @mdsumner , @hrbrmstr?

Issues to keep in mind:

  • can geom_sf control the aspect ratio used, such that 1 km north equals 1 km east?
  • when combining with ggmap(), I think web mercator is assumed: automatically use st_transform?
  • when combining different geom_sfs, can we verify that coordinate reference systems correspond?
@thomasp85
Copy link
Contributor

One thing to keep in mind is that ggplot2 works with data in data.frames. I'm not familiar with sf object and how they are encoded, but this needs to be figured out first (maybe it already is)...

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Thanks - sf objects are data.frames. The geometry is in a list-column, which, since Hadley's keynote during UseR! is considered "tidy".

@thomasp85
Copy link
Contributor

Super - that was a quick solve :)

You are welcome to open an issue in ggforce if you wish, unless you're planning on a separate ggsf package or something...

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Thank you, great idea - I opened the issue there.

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

I think this should either be included in sf itself or in ggplot2.

@thomasp85
Copy link
Contributor

If you want it in ggplot2 I'm fine with moving it over there...

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Maybe develop it in ggforce first, and then decide?

@thomasp85
Copy link
Contributor

I'll leave that up to you and Hadley - ggforce probably has a quicker release cycle than ggplot2 (unless Hadley has hired a new intern:)...

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

I think it's important enough that I'd do a ggplot2 release specifically for it

@thomasp85
Copy link
Contributor

Ok - I'll see if I can get time to make a proof-of-concept implementation of it in a ggplot2 branch instead of ggforce

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

Awesome - thanks!

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Sounds great!

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

Wrt to coordinate systems, I think it might be to transform each layer to lat/lon coordinates, and then assume the user will use coord_map(), coord_quickmap(), or coord_proj() to ensure correct project.

Alternatively it would be possible to have some sort of helper that did:

something_sf <- function(sf, ...) {
  list(
    geom_sf(...),
    coord_map(proj = get_projection(sf))
  )
}

(this is purely imaginary code, but it would mean that the last sf added "wins" the battle of coordinate systems)

Doing anything more automatically would require bigger changes to ggplot2 because there's no automatic guessing of coordinate systems.

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Reading ?mapproject, it looks like mapproj does not have any notions of a datum -- never assume that lat/lon always means the same, or means anything much without knowing the datum! The rest of the (open source) world has essentially settled on GDAL/PROJ.4 as the code base to categorize and convert to/from projections and transform between datums; mapproj was developed independently by a group with different goals. Roger Bivand @rsbivand will be visiting here next week Mon-Wed to work on sf, we will also take up this issue and report back.

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

Datum is what I was missing - thanks!

The easiest way to get this to work with ggplot2 would be to have some sort of universal numeric representation of location, but I'm guessing that that doesn't exist. Alternatively, if there was a special vector class that had an attribute we could do something like what we do for POSIXct + time zones at the scale level.

Hmmm, if coord_map() is suboptimal (since it uses mapproj) maybe it's worth considering ggspatial or similar which would bundle coord_proj() from @hrbrmstr and this geom. (And then in the long-term, deprecate the existing spatial stuff from ggplot2).

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

There is a great deal of interoperability in different coordinate reference systems (CRS), look for instance at http://spatialreference.org/ref/epsg/4326/ for 14 encodings of datum WGS84. Geodesists and national mapping agencies all over the world work continuously to improve this, but ignore package mapproj. GDAL provides automatic conversions, and will read and write data anywhere, converting CRS encodings appropriately.

Package sf has a geometry list-column that carries a crs attribute:

> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.0
> nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
> st_crs(nc)
$epsg
[1] 4267

$proj4string
[1] "+proj=longlat +datum=NAD27 +no_defs"

attr(,"class")
[1] "crs"

having epsg and proj4string has proven to be enough, over the last 15 years, for spatial data in R to interoperate with the rest of the world.

I think this is as close as you can get to a universal numeric representation of location, and to the POSIXct + TZ metaphor.

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

Does this approach sound reasonable?

  • geom_sf() aways re-projects to WGS84/EPS4267 (this ensures that all layers are consistent)
  • coord_proj() allows you to re-project to the coordinate system you actually want

This is a bit inefficient because of needless re-projection, but it would make initial ggplot2 implementation simpler (and then in the longer run we could figure out a system where you'd get a default coord_proj() automatically).

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Not all spatial data have a CRS: it may be missing, local, or non-Earth related (think astrophysics, cell biology or MRT scan data). I think

  • by default geom_sf() should work like geom_points() but with aspect ratio 1, i.e. assume Euclidian coordinates,
  • in case of longlat, we should try to improve by choosing a better aspect ratio, depending on the (mean) latitude, e.g. 1/cos(mean(lat)),
  • if we define a coord_proj() (or e.g. a ggmap background) all geom_sf() should be (re)projected on the fly to the target projection; error raised if they don't have a CRS set

Ideally, the projection should happen by st_transform as that understands the source CRS.

(And as a side node: 4267 is not WGS84, but NAD27: a local (North American) Datum)

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

There's currently no way for a geom to request a coordinate system, so we have to rely on the user supplying either coord_fixed() (1:1), coord_quickmap() (better aspect ratio), or coord_proj().

If we can't always convert to WGS84, then we'll need a new vector class:

projected <- function(x, proj4string) {
  stopifnot(is.numeric(x))
  structure(x, class = "projected", proj4string = proj4string)
}

which will be (redundantly) applied to both x and y. Then we'll need a new spatial_trans() function (along the same lines as time_trans()) and a new ScaleContinuousProjected`: together that will ensure that all layers project to the same coordinate system. That's not exactly elegant, but I think it's the easiest way to get this working properly in ggplot2.

We'll also need some hacks so that a geom_sf() layer can convey its dimensions to the scales. I think the easiest way to do that will be automatically add xmin, xmax, ymin, and ymax aesthetics mapped to the bounding box of each feature.

@edzer
Copy link
Member Author

edzer commented Nov 29, 2016

Much of this is beyond my understanding. Do note that you can't reproject x and y independently.

@hadley
Copy link
Contributor

hadley commented Nov 29, 2016

Yeah, I was just thinking that too. In that case I think the best we can do is give a warning if there are multiple CRSs. Anyway, I think I at least understand the main problems now, and I can keep mulling over what we need to do in ggplot2 to better support spatial data.

It feels like ggspatial (or ggsf or whatever we want to call it) could be a good project for another summer intern.

@thomasp85
Copy link
Contributor

This sounds more and more like an effort of ggraph scope, which is not something I can take upon me right now. I'll be happy to chime in but I think someone else needs to drive it forward

@edzer
Copy link
Member Author

edzer commented Nov 30, 2016

While Hadley is thinking about an utopian way to dynamically reproject maps on the fly, we can do something simple now that will make many users happy, by making clear (documented) assumptions:

  • we plot simple features in Euclidian (metric) space, letting the user take care of the aspect ratio (= 1)
  • when we add simple features to a ggmap(), we assume it is longlat/WGS84 (as is done now)

sf provides the means to verify both assumptions, but we can even leave that up to the user, for now.

How does that sound?

@hadley
Copy link
Contributor

hadley commented Nov 30, 2016

Yeah, we can absolutely start by assuming that all layers are already correctly projected.

@barryrowlingson
Copy link
Contributor

Why do you even need a geom_sf? Can you not overload the existing geom_polygon (or geom_map?) function for sf-class objects? Or is there something in the ggplot2 architecture that makes this hard?

@hadley
Copy link
Contributor

hadley commented Nov 30, 2016

Because geom_polygon expects one row per vertex, geom_sf() would have one row per feature.

@barryrowlingson
Copy link
Contributor

The current geom_polygon expects one row per vertex. But could it not be made so that if the data is an sf object (or indeed inherits from sf) that it draws polygons using the geometry in each row?

I'm not sure putting the name of a class in a geom_name (as geom_sf) is a good idea either. There's no geom_data.frame, and this would seem to violate the grammar inherent in geom function names. Given that a single sf object could contain objects of differing geometries (points, lines and polys) it seems hard to decide on a good name for a geom function that could output such a range of different possible shapes. Also, if I create a class that inherits from sf then I don't want to have to remember that underneath its an sf object.

How about geom_spatial instead? This has no reference to class names, and describes the output as being the spatial component of the data. Users would then expect to see a map of the row-based spatial geometry encoded in whatever object they feed in, be it points, lines, or polygons. This could possibly even be ported to sp classes of it can be made generic.

@thomasp85
Copy link
Contributor

geom_polygon requires at least two rows for a single line and will throw warnings if only a single row is provided.

As i understand it sf data are data.frames containing a column with the sf data (I might be totally wrong - haven't got time to look into sf yet). Anyway I see no violation of the grammar as long as a geom requires a specific column class for some of its operations...

@hadley
Copy link
Contributor

hadley commented Dec 1, 2016

I think you're missing the fundamental difference between sf and sp, and unfortunately there's no way a geom could work in the way you describe.

@mtennekes
Copy link
Contributor

mtennekes commented Dec 2, 2016

First of all @edzer 👍 package. I'm happy to upgrade tmap to deal with sf objects. I'm new to simple features, but it looks like an elegant way to store spatial data.

However, I don't get the fundamental different between sf and sp yet. Is there any (meta) data stored in a sf object that is missing in a sp object, or vise versa? I mean, it seems like I can convert them back and forth:

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc_sp <- as(nc, "Spatial")

Or can a sf object consist of multiple geometry types (since it is vector based)?

@edzer
Copy link
Member Author

edzer commented Dec 2, 2016

@mtennekes - thanks! The metadata is stored in the attributes of the simple feature geometry list-column, but you get them easiest by st_bbox(sf) and st_crs(sf).

Yes, conversion should work for (multi)points, (multi)linestrings and (multi)polygons; in case of GEOMETRYCOLLECTION or other classes you can either give an error or make the additional effort.

@edzer
Copy link
Member Author

edzer commented Dec 2, 2016

Or can a sf object consist of multiple geometry types (since it is vector based)?

yes: an sfc (list-column with geometries) can be of type sfc_GEOMETRYCOLLECTION, in which case each geometry has type GEOMETRYCOLLECTION, and it can be sfc_GEOMETRY, in which case the geometries can be a mix. See vignettes for full longer story.

@mdsumner
Copy link
Member

mdsumner commented Dec 3, 2016

One of my utopian visions would be to have hierarchical spatial data created by expression in a general grammar. There's a sense in which a the gg aesthetic settings are a select, group by, arrange chain and the plotted geoms could alternatively result in persistent spatial layers. I see real value in an integration of complex object structure and aesthetic mappings, in a way that say a layer in GIS incorporates geometry and aesthetics, but is (usually) stripped back to only geometry or only presentation form when serialized, or published.

I don't think that is too utopian, but it requires some deeper integration of "spatial" and "the grammars". Some can already be easily done, for example a table of points measuring animal tracks over time is organizable by straight forward dplyr verbs and the result can be plotted with ggvis, as multi points, or multi-lines, or passed to an sf or sp composer function.

The concepts are not radically different, in fact I think they are exactly the same - but the tools are not easily compatible. I've been exploring areas where the two worlds are brought closer, but a fuller system needs wider discussion and involvement. I think there's a very timely opportunity here.

@edzer edzer mentioned this issue Dec 19, 2016
@mdsumner
Copy link
Member

Is is accurate to say that sf objects (from a single row) come as their own custom stat? Is there any example of geoms like that already? @hadley

I also realized that the fortify model would work, as long as we think about it on a per-feature basis, not over whole dataset, the internal decomposition would essentially be the same as that iiuc, though modified for the extra structure in multipolygons, applied to the grob calls.

@etiennebr
Copy link
Member

I'll repeat a bit from #131, but I think I'd rather have an interface on sfc than sf. Something like :

st_read(system.file("shape/nc.shp", package="sf")) %>% 
  ggplot(aes(sf=geom, fill=SID79)) +
  geom_sf()  # or some other name :)

That would offer free support for every class that inherits from data.frame. Piping is also easier.

More complex stuff is also easier to imagine (at least for me):

st_read(system.file("shape/nc.shp", package="sf")) %>% 
  as_tibble() %>%  # (sfc does not allow multiple geometries)
  mutate(buffer = st_buffer(geometry)) %>% 
  ggplot() +
  geom_sf(geometry, aes(sf=geom, fill=SID79))  + 
  geom_sf(buffer, aes(sf=geom, fill=NA))

It might also be worthwhile to consider that geometries have multiple representations; this is central to the grammar of graphics. A polygon could be represented by its vertex for instance. Would that be geom_sf(option = "point"), overload geom_point or geom_vertex? What about overloading all the geoms that could receive a geometry: geom_point, geom_segment, geom_polygon ? The case of GEOMETRY with multiple geometries being a bit ambiguous.

@edzer
Copy link
Member Author

edzer commented Feb 1, 2017

For those of you interested in what's happening: there is a working geom_sf() in the sf branch of ggplot2. Right now, it uses equirectangular projection, scale true at mean latitude, similar tosp::plot and sf::plot, and for now it will move everything to WGS84 before plotting.

library(sf)
library(maps)
world1 <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
library(ggplot2)
ggplot() + geom_sf(data = world1)

gives
ggplt

and, from the help pages of geom_sf

 nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
ggplot(nc) + geom_sf(aes(fill = AREA))

gives
nc

Comments are welcome.

@hadley
Copy link
Contributor

hadley commented Feb 1, 2017

@edzer It ensures all layers have the same CRS (not necessarily WGS84) - that's just used as the datum for the graticule

@edzer
Copy link
Member Author

edzer commented Feb 1, 2017

Ah, I didn't know you got that far. I now see:

library(sf)
library(maps)
library(ggplot2)
world1 <- sf::st_as_sf(map('world', plot = FALSE, fill = TRUE))
world2 <- st_transform(world1,
    "+proj=laea +y_0=0 +lon_0=155 +lat_0=-90 +ellps=WGS84 +no_defs")
ggplot() + geom_sf(data = world2)

wrld

where we need to revisit the tick labels, but otherwise: nice!

@rsbivand
Copy link
Member

rsbivand commented Feb 1, 2017

Very promising!

edzer added a commit that referenced this issue Feb 2, 2017
In the circular world example of #88 the graticule is no longer clipped by the graph box; the original st_graticule tried to do this. It now returns a global graticule, and transforms that; the axis tick values and labels for that case no longer make sense. This commit changes these into "." for this particular case, so they're not longer overplotted by ggplot (they still show as a ".", though).
@GreenStat
Copy link

GreenStat commented Feb 11, 2017

HI, I tried point and line geometry with (the experimental) 'geom_sf' :

library(sf)
library(maps)
library(ggplot2)

#points
data(meuse,package="sp")
meuse_sf <- st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")

summary(meuse_sf)

meuse_sf %>% ggplot() + geom_sf(aes(size=cadmium))

this gives:
tmp1

I am a bit confused by the behaviour of aes in geom_sf, with aes(size=cadmium) I expected varying size of dots, instead it gives me varying size of line thickness of the circles.
How do I change type and size of dots?

meuse_sf %>% ggplot() + geom_sf(aes(color=cadmium)) gives:
tmp2

line color of the circles changes, I want a fill....

meuse_sf %>% ggplot() + geom_sf(aes(fill=cadmium))
this gives:
tmp3

more confusing, I still want a fill.... :-), how can I do that ?

lines

I took data from the trajectories package

data(storms,package="trajectories")
summary(storms)
plot(storms)
#this coerces a TracksCollection  to a SpatialLinesDataFrame
storms.sp <- as(storms,"SpatialLinesDataFrame") 
# sp to sf
storms.sf <- st_as_sf(storms.sp)

check if file is oke

storms.sf %>% plot()
#Error in `levels<-.factor`(`*tmp*`, value = as.character(if (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) : 
#  number of levels differs

# Ai, this is caused by posix  columns  in the sf, tmin and tmax are POSIXct
#if I drop them with storms.sf[,-c(6:7)] the problem is solved
# bug ?

# get a date column in the sf
storms.sf$date <- format(storms.sf$tmin,"%Y-%m-%d")

# standard sf plot
storms.sf[,-c(6:7)] %>% plot()

this gives :
tmp4


# test some options
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(colour=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(lty=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(size=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(fill=date)) + theme_bw()

resulting in: (only first 2 plots)
tmp5
tmp6

here the legends are confusing and geom_sf(aes(fill=date)) gives me black lines with a legend with colors (not shown ,identical to points example)

to end with a 👍, combining maps is possible and looks great , this:

storms.sf[,-c(6:7)] %>% ggplot() +  geom_sf(aes(colour=date)) +  geom_sf(data=world1) +
 xlim(100,0) +ylim(0,60) + theme_bw()

gives :
tmp9

cudos !

@hadley
Copy link
Contributor

hadley commented Feb 11, 2017

@GreenStat could you please file the line and point issues individually on ggplot2?

@edzer I think we can now close this issue

@adrfantini
Copy link

Sorry to chime in in this now-closed issue (from which I learned a lot), but I have a fast question. I did not exactly understand what ggplot is doing right now. Am I right in assuming that if no projection information is provided in the input sf (could this even happen?), it will assume datum = sf::st_crs(4326)? And that if a datum is otherwise available, it will use that?

What if multiple geom_sf() are provided? Will all the data st_transform()ed to the datum of the last call?

@solomonsg
Copy link

solomonsg commented Oct 7, 2019

Dear all,

I am plotiing values in attribute table of a shapefile and the values in one of the table ( e.g. Duration) ranges from 0-4370 and i want to classfy them into 5 using somthing like breaks,
breaks=c(0, 2776, 3061, 3272, 3558, 4370)

Is there any way to to do this classfication in r;

shp <- read_sf('Duration.shp')
ggplot(shp) + geom_sf(aes(colour = Duration))

Best regards,

Solomon

@barryrowlingson
Copy link
Contributor

barryrowlingson commented Oct 7, 2019

Create a new factor variable by cutting on those breaks and plot that:

# first fake some point data:
shp = data.frame(x=runif(100), y=runif(100), Duration=runif(100,0,4370))
shp = st_as_sf(shp, coords=c(1,2))
# then...
breaks=c(0, 2776, 3061, 3272, 3558, 4370)
shp$DurationLevel = cut(shp$Duration,breaks=breaks)
ggplot(shp) + geom_sf(aes(colour=DurationLevel,fill=DurationLevel),size=4)

I've specified a fill and colour aesthetic otherwise the legend symbols don't get drawn properly. There's probably a way of making the legend draw as dots, but you've probably got polygons so adjust as needed...

Use the scale_*_discrete functions to change the colour scheme.

@phdykd
Copy link

phdykd commented Nov 2, 2020

Error: Object is neither from class sf, stars, Spatial, nor Raster. I have merged two dataframes. One of them had long,lat and the other one had geom for Africa. However, no matter what I do, I got the error saying "Error: Object is neither from class sf, stars, Spatial, nor Raster. I am trying to use geom_sf and tm for mapping. None of them worked.
Any recommendation?
Thanks

@edzer
Copy link
Member Author

edzer commented Nov 2, 2020

Maybe it is a data.frame. Have you tried passing it to st_sf() first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests