Conversion between multipart and singlepart #199
Comments
Could you please provide a minimal reproducible example that supports your issue? I am completely unfamiliar with singlepart or multipart shapefiles - sf reads shapefiles through GDAL, that's it. |
@edzer A minimal example that supports the difference in performance, or that highlights what is meant by singlepart and/or multipart? Or perhaps both? |
both. |
Here is a minimal example, with the shape files mentioned below downloadable from this link. Original shape file links are also given below:
There are three shape files used:
(1) and (2) are polygon files from the cartographic version of Statistics Canada's 2016 Census Federal Electoral Districts (2013 Representation Order). I downloaded the shape file (which is multipart by default) and created a singlepart version using QGIS's native Multipart to singlepart tool. (3) is a point file derived from the cartographic version of Statistics Canada's 2016 Census Dissemination Areas. I downloaded the shape file (which is multipart by default) and extracted the polygon centroids using QGIS. In the above case, the intersection between the two layers ran roughly 40 times faster when working with singlepart polygons (40 seconds compared to ~30 minutes):
Reference: ArcMap help on Multipart to Singlepart |
Thanks, now I understand what you meant by multipart and singlepart shapefiles. I got used to calling these MULTIPOLYGON and POLYGON. You can convert from multipart to singlepart with
x = st_cast(polygonfile.m, "POLYGON")
Interestingly, the difference in timing is much more modest if you do this for 1000, 2000, 3000 points; for 7000 it seems to explode; I have no clue why. I recently added the option for prepared geometries, and this brings down the computing time quite dramatically for this case:
> tic('Singlepart intersection, prepared')
> intersect_output <- st_intersects(pointfile, polygonfile.s, prepared = TRUE)
although coordinates are longitude/latitude, it is assumed that they are planar
> toc()
Singlepart intersection, prepared: 2.242 sec elapsed
> tic('Multipart intersection, prepared')
> intersect_output <- st_intersects(pointfile, polygonfile.m, prepared = TRUE)
although coordinates are longitude/latitude, it is assumed that they are planar
> toc()
Multipart intersection, prepared: 5.084 sec elapsed |
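For readers who want to try the multipart-to-singlepart conversion without the Statistics Canada files, here is a minimal sketch. It assumes the sf package is installed; the two squares, the `id` column, and the object names `multi` and `single` are made up for illustration and are not from the thread.

```r
library(sf)

# Toy data (hypothetical): two disjoint squares stored as ONE multipart feature.
sq1 <- list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))
sq2 <- list(rbind(c(2, 2), c(3, 2), c(3, 3), c(2, 3), c(2, 2)))
multi <- st_sf(id = 1, geometry = st_sfc(st_multipolygon(list(sq1, sq2))))

# Cast to POLYGON: each part becomes its own feature (row).
# sf may warn that attributes are repeated for all sub-geometries.
single <- st_cast(multi, "POLYGON")

nrow(multi)   # 1 multipart feature
nrow(single)  # 2 singlepart features
```

Note that the attribute `id` is copied onto both resulting rows, so the original feature can still be identified after the split.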
By the way, your shapefiles have a wrong projection file: they suggest long/lat WGS84, but the coordinate values suggest otherwise. More dramatic improvements come from swapping the argument order:
> st_crs(polygonfile.m) = NA
> st_crs(polygonfile.s) = NA
> st_crs(pointfile) = NA
>
> # Spatial join: Point file with singlepart polygon file
> tic('Singlepart intersection')
> out.s = intersect_output <- st_intersects(pointfile, polygonfile.s, prepared = TRUE)
> toc()
Singlepart intersection: 2.194 sec elapsed
> # Spatial join: Point file with multipart polygon file
> tic('Multipart intersection')
> out.m = intersect_output <- st_intersects(pointfile, polygonfile.m, prepared = TRUE)
> toc()
Multipart intersection: 5.047 sec elapsed
>
> tic('Singlepart intersection - x')
> x = intersect_output <- st_intersects(polygonfile.s, pointfile, prepared = TRUE)
> toc()
Singlepart intersection - x: 0.843 sec elapsed
> all.equal(out.s, sp:::.invert(x, nrow(polygonfile.s), nrow(pointfile)))
[1] TRUE
> # Spatial join: Point file with multipart polygon file
> tic('Multipart intersection - x')
> x = intersect_output <- st_intersects(polygonfile.m, pointfile, prepared = TRUE)
> toc()
Multipart intersection - x: 0.426 sec elapsed
> all.equal(out.m, sp:::.invert(x, nrow(polygonfile.m), nrow(pointfile)))
[1] TRUE
@rsbivand do you see any disadvantage if I make |
@edzer Thanks for looking into this. I'd imagine argument order is important since the intersection of X with Y might be different from the intersection of Y with X in general. The first argument is usually kept intact with the second argument joined. The result therefore has the same number of rows (or features) as X. Swapping them around will lead to a different result. |
Yes, but as you see in my example above, this different result is easily converted into the result wanted (sp:::.invert is a six-liner), and this would be done without the user knowing. Still, we'd need some heuristics as to when to do this; number of features is clearly not the right thing to look at. |
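To make the conversion step concrete, here is a minimal base-R sketch of inverting a sparse index list such as the one `st_intersects` returns. The function `invert_sparse` and its arguments are hypothetical names chosen for illustration; this is not the code of `sp:::.invert` or of sf, only one way the same idea could be written.

```r
# Hypothetical helper: x is a list of length n_x whose i-th element holds
# the indices (in 1..n_y) that feature i matched. The result is a list of
# length n_y whose j-th element holds the indices i that matched j.
invert_sparse <- function(x, n_x, n_y) {
  stopifnot(length(x) == n_x)
  from <- rep(seq_len(n_x), lengths(x))  # x index, once per match
  to <- unlist(x)                        # matched y index
  # Group the x indices by y index; unmatched y indices get integer(0).
  unname(split(from, factor(to, levels = seq_len(n_y))))
}

# Two polygons against three points: polygon 1 holds points 1 and 2,
# polygon 2 holds points 2 and 3.
poly_to_pts <- list(c(1L, 2L), c(2L, 3L))
pts_to_poly <- invert_sparse(poly_to_pts, n_x = 2, n_y = 3)
str(pts_to_poly)
```

Applying the helper twice, with the dimensions swapped, gives back the original list, which is the property that would let the faster argument order be used internally while returning the result in the order the user asked for.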
Wrote a new issue about which geometry to prepare; closing this one. |
Singlepart shapefiles are known to perform better than multipart shapefiles in complex operations such as intersection. Could conversion between the two be implemented in sf?
Perhaps the following interface: