Refactor overline function for sf objects #273
Comments
The transport ‘flow’ on any particular segment of the transport networks
Creating such a route network, with aggregated values per segment, is
Let’s start simple, with just 2 lines, which have an associated amount
library(stplanr)
routes_fast_sf$value = 1
sl = routes_fast_sf[2:3, ]
sl$value = c(2, 5)
These lines clearly have a decent amount of overlap, which can be
sl_intersection = sf::st_intersection(sl[1, ], sl[2, ])
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
plot(sl$geometry, lwd = 9, col = sf::sf.colors(2, alpha = 0.5))
plot(sl_intersection, add = TRUE)
Furthermore, we can find the aggregated value associated with this new
sl_intersection$value = sum(sl_intersection$value, sl_intersection$value.1)
We still do not have a full route network composed of 3 non-overlapping
sl_seg1 = sf::st_difference(sl[1, ], sl_intersection)
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
sl_seg2 = sf::st_difference(sl[2, ], sl_intersection)
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
plot(sl$geometry, lwd = 9, col = sf::sf.colors(2, alpha = 0.5))
plot(sl_seg1, add = TRUE)
plot(sl_seg2, add = TRUE)
We now have all the geographic components needed for a route network.
rnet = rbind(sl_seg1, sl_seg2, sl_intersection)
#> Error in rbind.data.frame(...): numbers of columns of arguments do not match
Lesson: we need to be more careful in isolating the value to aggregate.
attrib = "value"
attrib1 = paste0(attrib, ".1")
sl_intersection = sf::st_intersection(sl[1, attrib], sl[2, attrib])
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
sl_intersection[[attrib]] = sl_intersection[[attrib]] + sl_intersection[[attrib1]]
sl_intersection[[attrib1]] = NULL
That leaves us with a ‘clean’ object that only has a value (7) for the
On this basis we can proceed to create the other segments, keeping only
sl_seg = sf::st_difference(sl[attrib], sf::st_geometry(sl_intersection))
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
rnet = rbind(sl_intersection, sl_seg)
It worked! Now we’re in a position to plot the resulting route network,
plot(rnet, lwd = rnet$value)
A benchmark
To test that the method is fast, or is at least not slower than the
overline_sf2 = function(sl, attrib = "value") {
  # st_intersection() suffixes the second copy of the attribute with ".1"
  attrib1 = paste0(attrib, ".1")
  # shared part of the two lines, carrying both attribute columns
  sl_intersection = sf::st_intersection(sl[1, attrib], sl[2, attrib])
  # aggregate the two values, then drop the duplicate column
  sl_intersection[[attrib]] = sl_intersection[[attrib]] + sl_intersection[[attrib1]]
  sl_intersection[[attrib1]] = NULL
  # the non-shared remainder of each original line
  sl_seg = sf::st_difference(sl[attrib], sf::st_geometry(sl_intersection))
  rnet = rbind(sl_intersection, sl_seg)
  return(rnet)
}
If you are new to scripts/algorithms/functions, it may be worth taking a
system.time({overline(sl, attrib = "value")})
#> user system elapsed
#> 0.049 0.000 0.049
system.time({overline_sf2(sl, attrib = "value")})
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
#> user system elapsed
#> 0.033 0.000 0.033
The results are not Earth-shattering: the new function seems to be
Dealing with many lines
The above method worked with 2 lines but how can it be used to process
sl3 = routes_fast_sf[4, ]
rnet = overline_sf2(sl)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
plot(rnet$geometry, lwd = rnet$value)
plot(sl3, col = "red", add = TRUE)
In this case the method of adding to rnet is simple: just add the
rnet3 = rbind(rnet, sl3[attrib])
plot(rnet3$geometry, lwd = rnet3$value)
This works fine. In fact it works better than the original
sl1_3 = as(routes_fast_sf[2:4, ], "Spatial")
rnet3_sp = overline(sl1_3, attrib = "value")
plot(rnet3_sp, lwd = rnet3_sp$value)
A question that arises from the previous example is this: what if the
sl4_5 = routes_fast_sf[5:6, ]
plot(rnet3$geometry, lwd = rnet3$value)
plot(sl4_5$geometry, col = "red", add = TRUE)
Both the new lines intersect with the newest part of the route network.
Before we deal with them, it’s worth taking some time to consider what
relations = sf::st_relate(sl4_5, rnet3)
#> although coordinates are longitude/latitude, st_relate assumes that they are planar
relations
#> [,1] [,2] [,3] [,4]
#> [1,] "FF1F00102" "FF1FF0102" "FF1FF0102" "1F1F00102"
#> [2,] "FF1F00102" "FF1FF0102" "FF1FF0102" "1F1F00102"
unique(as.vector(relations))
#> [1] "FF1F00102" "FF1FF0102" "1F1F00102"
This shows us something important: although 2 elements (1 and 4) of
In the simple case of whether to simply bind the next line (4) onto
relate_rnet_3 = sf::st_relate(rnet, sl3, pattern = "1F1F00102")
#> although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
relate_rnet_3
#> Sparse geometry binary predicate list of length 3, where the predicate was `relate_pattern'
#> 1: (empty)
#> 2: (empty)
#> 3: (empty)
any(lengths(relate_rnet_3))
#> [1] FALSE
The
sl4 = sl4_5[1, ]
relate_rnet_4 = sf::st_relate(rnet3, sl4, pattern = "1F1F00102")
#> although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
relate_rnet_4
#> Sparse geometry binary predicate list of length 4, where the predicate was `relate_pattern'
#> 1: (empty)
#> 2: (empty)
#> 3: (empty)
#> 4: 1
any(lengths(relate_rnet_4))
#> [1] TRUE
How to proceed? We need to avoid
sel_overlaps = lengths(relate_rnet_4) > 0
rnet_overlaps = rnet3[sel_overlaps, ]
rnet3_tmp = rnet3[!sel_overlaps, ]
We can check that there is only one overlapping feature as follows:
nrow(rnet_overlaps)
#> [1] 1
And we can proceed to join the two features together using our new
rnet_overlaps4 = overline_sf2(rbind(rnet_overlaps, sl4[attrib]))
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
Adding this back to the
rnet = rbind(rnet3_tmp, rnet_overlaps4)
plot(rnet$geometry, lwd = rnet$value) |
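The steps above can be tried without the stplanr example data. The following is a minimal sketch, not the package's implementation: the toy coordinates, the helper name overlay_pair() and the relate pattern "1********" (any shared linear portion) are assumptions made for illustration, whereas the thread itself matches the specific pattern "1F1F00102". It overlays two overlapping lines, then uses st_relate() to decide whether a third line can simply be bound on.

```r
library(sf)

# Hypothetical helper: overlay exactly two overlapping lines (rows 1 and 2 of sl).
overlay_pair = function(sl, attrib = "value") {
  attrib1 = paste0(attrib, ".1")
  # shared part, with the two attribute values summed
  isec = suppressWarnings(st_intersection(sl[1, attrib], sl[2, attrib]))
  isec[[attrib]] = isec[[attrib]] + isec[[attrib1]]
  isec[[attrib1]] = NULL
  # non-shared remainder of each input line
  seg = suppressWarnings(st_difference(sl[attrib], st_geometry(isec)))
  rbind(isec, seg)
}

# Toy data (assumed): two collinear lines sharing the stretch from x = 1 to x = 2.
sl = st_sf(
  value = c(2, 5),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(2, 0))),
    st_linestring(rbind(c(1, 0), c(3, 0)))
  )
)
rnet = overlay_pair(sl)

# A third line that shares no linear portion with the network so far.
sl3 = st_sf(value = 1, geometry = st_sfc(st_linestring(rbind(c(0, 1), c(3, 1)))))

# Assumed pattern: interiors intersect along a line (dimension 1).
overlaps = lengths(st_relate(rnet, sl3, pattern = "1********")) > 0
if (!any(overlaps)) {
  rnet3 = rbind(rnet, sl3["value"])  # no overlap: just bind the new line on
}
```

With real longitude/latitude data the planar-assumption messages shown in the thread would still appear; the toy data has no CRS, so they do not.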
Profound stuff for a Saturday evening, and immediately prompted this |
Re the "3 non-overlapping lines" part, this is exactly what ARC is meant to do - but it's totally broken for lines atm (it assumes a trick about closing polygons, but it works from a sequential model rather than edges, so needs a revisit overall). (@Robinlovelace I haven't looked in detail at the rest of your post!)
Here's a purely SC approach. We end up with a nested column identifying which object/s each section comes from (I haven't labelled section well, and here the object_ id is the rownames of the original object, which I'm reviewing anyway).
library(stplanr)
routes_fast_sf$value = 1
sl = routes_fast_sf[2:3, ]
sl$value = c(2, 5)
library(silicate)
sc <- SC(sl)
## 3 non-overlapping lines
library(dplyr)
segments <- sc$object_link_edge %>% group_by(edge_) %>% tally() %>%
split(.$n) %>% purrr::map(~inner_join(.x, sc$object_link_edge) %>%
inner_join(sc$edge) %>%
inner_join(sc$vertex, c(".vx0" = "vertex_")) %>%
transmute(object_, .vx1, x0 = x_, y0 = y_, n) %>% ## limit to next vertex, and this vertex values
inner_join(sc$vertex, c(".vx1" = "vertex_")) %>% ## get last vertex values
transmute(object_, x0, y0, x1 = x_, y1 = y_, n)) %>%
bind_rows(.id = "section")
mk_edge <- function(x) {
## one row, one segment
sf::st_linestring(matrix(unlist(x[c("x0", "x1", "y0", "y1")]), ncol = 2))
}
mk_nest_sf <- function(x) {
sf::st_sf(objects = tidyr::nest(x["object_"] %>% distinct(), .key = "object_"),
geometry = sf::st_union(sf::st_sfc(purrr::map(purrr::transpose(x), mk_edge))))
}
## split the 1s by object
## combine the 2s
a <- rbind(mk_nest_sf(segments %>% dplyr::filter(n == 2)),
do.call(rbind, purrr::map(segments %>% dplyr::filter(n == 1) %>% split(.$object_), mk_nest_sf)))
a$section <- seq_len(nrow(a))
plot(a %>% dplyr::select(- object_), lwd = 3)
Each section is separate, and knows what object it came from
This is exactly the same plane partitioning idea used by the spacebucket - it's what Eonfusion called "fusion" and used this simplicial complex approach as silicate does. There's absolutely no dealing with fuzzy data, but that is best done independently IMO). |
it works with minimal_mesh and inlandwaters (I think). An interesting thing with this one is it's essentially grouped any path to the object it came from unless it's a shared boundary, which is fair - in reality we won't be turning back into sf like this until we're done with other tasks. and one more with This is exactly what I need to fix ARC |
I'm only dealing with "occurs once" or "occurs twice" cases, which is the basic deal with polygon layers - for use in what overline is doing it will have to map over n in |
Interesting stuff. Maybe at some time I'll have to refactor the function again, to support |
It's great, I never really had such clear and clean overlaying examples before - this is a perfect test case and made me realize many things. The info is all in the object link table, we don't want to need sf to reconstruct paths from arbitrary edges! |
Regarding next steps on my evolving |
Heads-up @mem48 here's the PR that presents a methodology to do this: https://github.com/ropensci/stplanr/blob/refactor-overline/vignettes/overline.Rmd Will be interesting to see how the 'split into single 2 point lines' method compares. |
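For comparison, the 'split into single 2 point lines' idea can be sketched in base R. This is an illustrative sketch only, not the code from the linked vignette or from any package: the routes list, the helper split_segments() and the toy coordinates are all assumptions. It also assumes overlapping routes share exactly the same vertices, so a route that stops partway along another route's segment would not be matched.

```r
# Each route (assumed toy data): a matrix of vertices, one row each, plus a value.
routes = list(
  list(xy = rbind(c(0, 0), c(1, 0), c(2, 0)), value = 2),
  list(xy = rbind(c(1, 0), c(2, 0), c(3, 0)), value = 5)
)

# Hypothetical helper: split one route into 2-point segments, one row per segment.
split_segments = function(xy, value) {
  n = nrow(xy)
  from = xy[-n, , drop = FALSE]
  to = xy[-1, , drop = FALSE]
  # Order the end points of each segment so direction of travel does not matter.
  swap = from[, 1] > to[, 1] | (from[, 1] == to[, 1] & from[, 2] > to[, 2])
  data.frame(
    x0 = ifelse(swap, to[, 1], from[, 1]),
    y0 = ifelse(swap, to[, 2], from[, 2]),
    x1 = ifelse(swap, from[, 1], to[, 1]),
    y1 = ifelse(swap, from[, 2], to[, 2]),
    value = value
  )
}

segs = do.call(rbind, lapply(routes, function(r) split_segments(r$xy, r$value)))

# Identical segments collapse into one row whose value is the sum over routes.
rnet_segs = aggregate(value ~ x0 + y0 + x1 + y1, data = segs, FUN = sum)
```

Grouping on exact coordinates keeps the aggregation simple, at the cost of producing many short segments that would need merging back into longer lines afterwards.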
That doesn't look like the first 4 lines but all of them! In any case this looks very promising and a good starter for a new implementation, potentially one that is well suited to parallelisation in a lower-level language like C++/Rust. |
|
@Robinlovelace I've done more work on this as part of the PCT Phase 3. https://github.com/mem48/pct-raster-tests/blob/master/R/overline.R The code has changed a lot from my earlier example, around improving performance on large datasets and simplifying the geometry at the end to fewer longer lines. |
Latest: I will test the new function and incorporate from here https://github.com/mem48/pct-raster-tests |
https://github.com/mem48/pct-raster-tests/blob/master/R/overline.R contains the function called overline_malcolm2 (working title) which is the one in need of checking. There is one known bug: when a route stops halfway down a segment, the overlapping segments do not match and the function does not identify the overlap.
Here are some benchmarks for overline btw: Dataset with 60,000 lines in 214s:
|
~ 300 lines per second. Not bad. That suggests for 2 million routes over the UK it will be around 2 hours:
|
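The arithmetic behind that estimate, as a quick check. It assumes run time scales linearly with the number of lines, which the benchmark above does not establish; the variable names are illustrative.

```r
n_lines = 60000      # lines in the benchmark dataset
elapsed_s = 214      # reported run time in seconds

rate = n_lines / elapsed_s            # about 280 lines per second
hours_2m = 2e6 / rate / 3600          # just under 2 hours for 2 million routes
```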
Currently overline.sf() simply wraps overline.sp(). It converts the sf object into a Spatial object before running overline.sp(), then converts it back into an sf object. This is inefficient. Furthermore there are issues with the overline.Spatial() function, e.g. as outlined in #181 and #111.
There is probably a need for aggregation via routing software such as dodgr but I think an sf-based overline() function would be useful as a number of people use it and often you have routes recorded as spatial lines. So this issue is probably overdue.