{{ message }}

# How to estimate shortest-path distance on a network and related issues#966

Closed
opened this issue Jan 28, 2019 · 12 comments
Closed

# How to estimate shortest-path distance on a network and related issues#966

opened this issue Jan 28, 2019 · 12 comments

### agila5 commented Jan 28, 2019

 I'm trying to code a spatial logistic model of car crashes in the municipality of Milan using `R` and, in particular, `sf` ( and some related packages), but I'm facing some problems. I'm not too sure about my approach and I think that the best way to explain the difficulties I'm finding is to provide a small and reproducible example (sorry for this long post). I decided to report this issue here (and not on SO or GIS stackexchange) since it's strictly related to `sf`, tell me if this is not the right place. I have a dataset containing the location of all car accidents that occurred in Milan during 2015 and these are the coordinates of the first 5 events (out of 8000 car crashes approximately). ```library(dplyr) library(sf) milan_car_crashes <- data.frame( ID = 1:5, X = c(1513037, 1513008, 1515473, 1514039, 1515748), Y = c(5034945, 5034750, 5036177, 5036820, 5037396) ) %>% # Transform as sf and set the CRS (Gauss-Boaga) st_as_sf(coords = c("X", "Y"), crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")``` Car crashes are an example of a spatial point pattern constrained to lie on a network of lines and, since I don't have an up to date shapefile of the highways of Milan, I decided to use the package `osmdata` to download it from OpenStreetMap. Looking at the wiki page of the Map Features of OSM, I chose to download the data of primary highways (while, on the real model, I will use also secondary, tertiary, unclassified, residential, motorways and trunk) as a simple feature and extract the LINESTRING geometry, setting the same CRS as my data. ```library(osmdata) milan_primary_highways <- opq("Milan, Italy") %>% add_osm_feature(key = "highway", value = "primary") %>% osmdata_sf() %>% trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) highways_sf <- milan_primary_highways\$osm_lines %>% st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")``` Moreover, since car crashes represent a spatial point pattern that lie on a linear network, I think that the most appropriate way to measure the distance between any two points on the network is the shortest-path distance (and this is the biggest problem I'm facing as explained later). Firstly I want to check if the network of lines that I downloaded is connected or not (otherwise I don't think it's possible to measure a shortest-path distance), i.e. I have to verify that every highway is reachable from every other highway in the network. I coded this check using a combination of `st_touches` and `hclust` since I want to merge all streets touching each other (i.e. not only directly touching ones). This approach is really unsatisfactory since I can't use the sparse property of the matrix generated by `st_touches` (I don't know how to use `hclust()` on a `sgbp` object) therefore it's very memory consuming and it does not work if there are much more highways (i.e. I cannot use this approach if I consider also secondary, tertiary, unclassified and residentia highways) since the resulting matrix is too big. What's a better approach? ```# Create a matrix of touching streets touching_highways <- st_touches(highways_sf, sparse = FALSE) # Merge all streets that touch each other converting all highways_hclust <- hclust(as.dist(!touching_highways), method = "single") # Cut the dendrogram at heigh 0.5 so that all touching # streets stay in the same group highways_groups <- cutree(highways_hclust, h = 0.5) # Result table(highways_groups) ## highways_groups ## 1 2 3 4 5 6 7 8 9 10 11 12 ## 1444 21 18 8 4 5 8 2 3 5 2 2``` There are 12 groups of connected highways and I filter out those streets not belonging to the big group otherwise I cannot calculate the shortest-path distance if two points belong to two distinct networks. `highways_sf <- highways_sf[highways_groups == 1, ]` The second step for the estimate of the shortest-path distance is the projection of the car crashes on the network and I performed this task using `st_nearest_feature` and `st_nearest_point`. This step is necessary since not every car crash belongs to the network due to approximation and rounding errors. Then again I'm not sure this is the right approach but I simply followed the examples reported on the help page of `st_nearest_points`. Is there a better alternative? ```# Estimate the nearest point nearest_point <- milan_car_crashes %>% mutate( index_of_nearest_feature = st_nearest_feature(., highways_sf), nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]), nearest_point = purrr::pmap( list(geometry, nearest_feature), ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2) ) ) %>% pull(nearest_point) %>% st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")``` I can now replace the geometry of the `sf` object with the new geometry of projected points. ```milan_car_crashes <- milan_car_crashes %>% st_drop_geometry() %>% st_as_sf(., geometry = nearest_point) ``` This is a plot of the result The biggest and most important problem is that I have no clue on how to estimate the shortest-path distance between two points belonging to the network of highways. How can I do that using `sf`? My sessionInfo. ```sessionInfo() ## R version 3.5.2 (2018-12-20) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 10 x64 (build 17134) ## ## Matrix products: default ## ## locale: ## [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 ## [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C ## [5] LC_TIME=Italian_Italy.1252 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] bindrcpp_0.2.2 osmdata_0.0.9 sf_0.7-2 dplyr_0.7.8 ## ## loaded via a namespace (and not attached): ## [1] Rcpp_1.0.0 pillar_1.3.1 compiler_3.5.2 bindr_0.1.1 ## [5] class_7.3-14 tools_3.5.2 digest_0.6.18 jsonlite_1.6 ## [9] lubridate_1.7.4 evaluate_0.12 tibble_2.0.1 lattice_0.20-38 ## [13] pkgconfig_2.0.2 rlang_0.3.1 DBI_1.0.0 curl_3.3 ## [17] yaml_2.2.0 xfun_0.4 e1071_1.7-0.1 xml2_1.2.0 ## [21] stringr_1.3.1 httr_1.4.0 knitr_1.21 classInt_0.3-1 ## [25] grid_3.5.2 tidyselect_0.2.5 glue_1.3.0 R6_2.3.0 ## [29] rmarkdown_1.11 sp_1.3-1 purrr_0.3.0 magrittr_1.5 ## [33] htmltools_0.3.6 units_0.6-2 rvest_0.3.2 assertthat_0.2.0 ## [37] stringi_1.2.4 crayon_1.3.4``` The text was updated successfully, but these errors were encountered:

### edzer commented Jan 28, 2019

 Somewhat out of scope: `sf` is good to provide geometry representations for points and lines (nodes and edges in your street network) but not the infrastructure for representing the graph and doing shortest path. Most R packages use `igraph` for this. We did some similar experiments recently with @Robinlovelace , @loreabad6 and @luukvdmeer found here https://github.com/spnethack/spnethack; there is infrastructure in packages stplanr and osmar; older experiments using sp and igraph from me: https://github.com/edzer/spnetwork/ I think a relatively easy way to go with `sf` would be to equip `tidygraph` objects ("tidy" wrappers around igraph objects) with simple feature list-columns for nodes and edges, and work with those. I'm still hoping for a guest blog on r-spatial.org reporting about the spnethack experiments!

### Robinlovelace commented Jan 28, 2019 • edited

 Shortest paths on route networks defined in `sf` objects can be done using the `SpatialLinesNetwork` function from the stplanr package, as shown in the reprex below. As Edzer says, we experimented with other ways of doing this and this seems that `tidygraph/sf` integration is possible and potentially useful here. More soon in the blog post (thanks for the reminder @edzer will look at it later today and push for changes for it to be 'camera ready', promise). Note: may also be worth checking out the dodgr package which is also for routing.

### Robinlovelace commented Jan 28, 2019

 Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with `stplanr`: ```library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(sf) #> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0 library(osmdata) #> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright X = c(1513037, 1513008, 1515473, 1514039, 1515748) Y = c(5034945, 5034750, 5036177, 5036820, 5037396) milan_car_crashes <- data.frame( ID = 1:5, X = X, Y = Y ) %>% # Transform as sf and set the CRS (Gauss-Boaga) st_as_sf(coords = c("X", "Y"), crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") milan_primary_highways <- opq("Milan, Italy") %>% add_osm_feature(key = "highway", value = "primary") %>% osmdata_sf() %>% trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) highways_sf <- milan_primary_highways\$osm_lines %>% st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") # Create a matrix of touching streets touching_highways <- st_touches(highways_sf, sparse = FALSE) # Merge all streets that touch each other converting all highways_hclust <- hclust(as.dist(!touching_highways), method = "single") # Cut the dendrogram at heigh 0.5 so that all touching # streets stay in the same group highways_groups <- cutree(highways_hclust, h = 0.5) # Result table(highways_groups) #> highways_groups #> 1 2 3 4 5 6 7 8 9 10 11 12 #> 1444 21 18 8 4 5 8 2 3 5 2 2 highways_sf <- highways_sf[highways_groups == 1, ] # Estimate the nearest point nearest_point <- milan_car_crashes %>% mutate( index_of_nearest_feature = st_nearest_feature(., highways_sf), nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]), nearest_point = purrr::pmap( list(geometry, nearest_feature), ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2) ) ) %>% pull(nearest_point) %>% st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") milan_car_crashes <- milan_car_crashes %>% st_drop_geometry() %>% st_as_sf(., geometry = nearest_point) plot(highways_sf\$geometry) plot(milan_car_crashes\$geometry, add = TRUE) library(stplanr) rnet = SpatialLinesNetwork(highways_sf) # calculate route from node 1 to 2: simple_od_data = data.frame(start = 1, end = 2) route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data) plot(route_1_2, add = TRUE, col = "red", lwd = 3) #> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring #> all but the first attribute # from 5 random nodes to 5 random nodes n = length(rnet@nb) bigger_od_data = data.frame( start = c(1, mean(n), max(n)), end = c(sample(n, size = 3)) ) route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data) plot(route_3_3\$geometry, col = "blue", add = TRUE, lwd = route_3_3\$count^2)``` ```# from all locations to all locations nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000) plot(highways_sf\$geometry) plot(milan_car_crashes\$geometry[c(1, 3)], add = TRUE) crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3]) route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data) plot(route_1_3_crashes\$geometry, col = "blue", add = TRUE, lwd = 5)``` Created on 2019-01-28 by the reprex package (v0.2.1)

### jsta commented Jan 28, 2019

 An option that hasn't been mentioned thus far is to round-trip your analysis to GRASS using rgrass7.

### agila5 commented Jan 29, 2019

 Thank you very much for your responses, they are extremely helpful. I wasn’t aware of any of the packages and approaches that you mentioned and I’ll study them in greater detail in the next days. Could you also help me with the first question of the first post (i.e. how to determine which cluster of highways are connected) ? I’m not sure if I’m simply using the wrong function or the approach is completely off base. I’m pretty sure that I cannot just skip this part of the analysis because I don’t see how it’s possible to estimate a distance between two points that don’t belong to the same connected network. Moreover I’m not sure how this problem is handled in these packages (maybe the distance is set as Inf?)

### Robinlovelace commented Jan 29, 2019 • edited

 First impression: not sure, an extended reproducible example in which you show your attempts and reasoning building on the above code could help. This issue is better suited for a question and answer system such as stackoverflow, the r-sig-geo email list or RStudio's popular discourse site. This issue tracker is designed to report bugs and request features for the `sf` package so, while I think it's a great place to share code and ideas, I agree with @edzer that this is slightly off topic and suggest that, if you don't have any specific issues with the package, you ask a question in an appropriate forum and close this particular issue. Linking to the new place will allow people to find it and, hopefully, help provide useful answers to your question.

### agila5 commented Jan 29, 2019

 Ok, thank you very much for your help and your patience. It was a really useful post (at least for me). I'll try to create a better example for the problem and post it on SO or somewhere else.

closed this Jan 29, 2019
mentioned this issue Mar 19, 2019

### gert08 commented Jul 4, 2019

 Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with `stplanr`: ```library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(sf) #> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0 library(osmdata) #> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright X = c(1513037, 1513008, 1515473, 1514039, 1515748) Y = c(5034945, 5034750, 5036177, 5036820, 5037396) milan_car_crashes <- data.frame( ID = 1:5, X = X, Y = Y ) %>% # Transform as sf and set the CRS (Gauss-Boaga) st_as_sf(coords = c("X", "Y"), crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") milan_primary_highways <- opq("Milan, Italy") %>% add_osm_feature(key = "highway", value = "primary") %>% osmdata_sf() %>% trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) highways_sf <- milan_primary_highways\$osm_lines %>% st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") # Create a matrix of touching streets touching_highways <- st_touches(highways_sf, sparse = FALSE) # Merge all streets that touch each other converting all highways_hclust <- hclust(as.dist(!touching_highways), method = "single") # Cut the dendrogram at heigh 0.5 so that all touching # streets stay in the same group highways_groups <- cutree(highways_hclust, h = 0.5) # Result table(highways_groups) #> highways_groups #> 1 2 3 4 5 6 7 8 9 10 11 12 #> 1444 21 18 8 4 5 8 2 3 5 2 2 highways_sf <- highways_sf[highways_groups == 1, ] # Estimate the nearest point nearest_point <- milan_car_crashes %>% mutate( index_of_nearest_feature = st_nearest_feature(., highways_sf), nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]), nearest_point = purrr::pmap( list(geometry, nearest_feature), ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2) ) ) %>% pull(nearest_point) %>% st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs") milan_car_crashes <- milan_car_crashes %>% st_drop_geometry() %>% st_as_sf(., geometry = nearest_point) plot(highways_sf\$geometry) plot(milan_car_crashes\$geometry, add = TRUE) library(stplanr) rnet = SpatialLinesNetwork(highways_sf) # calculate route from node 1 to 2: simple_od_data = data.frame(start = 1, end = 2) route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data) plot(route_1_2, add = TRUE, col = "red", lwd = 3) #> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring #> all but the first attribute # from 5 random nodes to 5 random nodes n = length(rnet@nb) bigger_od_data = data.frame( start = c(1, mean(n), max(n)), end = c(sample(n, size = 3)) ) route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data) plot(route_3_3\$geometry, col = "blue", add = TRUE, lwd = route_3_3\$count^2)``` ```# from all locations to all locations nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000) plot(highways_sf\$geometry) plot(milan_car_crashes\$geometry[c(1, 3)], add = TRUE) crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3]) route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data) plot(route_1_3_crashes\$geometry, col = "blue", add = TRUE, lwd = 5)``` Created on 2019-01-28 by the reprex package (v0.2.1) Hi! I tried to reproduce this code and I get an error. milan_primary_highways <- opq("Milan, Italy") %>% add_osm_feature(key = "highway", value = "primary") %>% osmdata_sf() %>% trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) Error in rcpp_osmdata_sf(doc) : node can not be found Also, with bb_poly I can just use the place name to obtain the trimmed polygons but if I have 4 coordinates thar cover a little more than the city, what would you sudgest?

### agila5 commented Jul 4, 2019

 Hi. I'm sure that @Robinlovelace or someone else can answer you much better than me but I don't get any error with the following code. ```library(osmdata) #> Registered S3 method overwritten by 'rvest': #> method from #> read_xml.response xml2 #> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright milan_primary_highways <- opq("Milan, Italy") %>% add_osm_feature(key = "highway", value = "primary") %>% osmdata_sf() %>% trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) #> Loading required namespace: sf milan_primary_highways #> Object of class 'osmdata' with: #> \$bbox : 45.3867381,9.0408867,45.5358482,9.2781103 #> \$overpass_call : The call submitted to the overpass API #> \$meta : metadata including timestamp and version numbers #> \$osm_points : 'sf' Simple Features Collection with 4242 points #> \$osm_lines : 'sf' Simple Features Collection with 1662 linestrings #> \$osm_polygons : 'sf' Simple Features Collection with 0 polygons #> \$osm_multilines : NULL #> \$osm_multipolygons : NULL``` Created on 2019-07-04 by the reprex package (v0.2.1)

### jonoyuan commented Dec 17, 2019

 Thanks for your reprex, @Robinlovelace. But, I run into an issue with these few lines. ```rnet = SpatialLinesNetwork(highways_sf) simple_od_data = data.frame(start = 1, end = 2) route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)``` When `highways_sf` is an `sf` object, I get the error: ```> rnet = SpatialLinesNetwork(highways_sf) Error in validityMethod(object) : nrow(object@sl) == length(igraph::E(object@g)) is not TRUE``` When `highways_sf` is an `sp` object, I get an error running `sum_network_links`: ```> route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data) Error in stplanr::sum_network_links(sln = net, routedata = simple_od_data) : SpatialLinesNetwork not supported. Use newer sfNetwork class instead``` Any clue how to create an `sfNetwork` object or get `sum_network_links` to work on a `SpatialLinesNetwork`?

### edzer commented Dec 17, 2019

 Please open/continue this issue at the right repo.

### Robinlovelace commented Dec 17, 2019

 @jonoyuan this is the repo Edzer refers to: https://github.com/ropensci/stplanr/issues