Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to estimate shortest-path distance on a network and related issues #966

Closed
agila5 opened this issue Jan 28, 2019 · 12 comments
Closed

How to estimate shortest-path distance on a network and related issues #966

agila5 opened this issue Jan 28, 2019 · 12 comments

Comments

@agila5
Copy link
Contributor

agila5 commented Jan 28, 2019

I'm trying to code a spatial logistic model of car crashes in the municipality of Milan using R and, in particular, sf ( and some related packages), but I'm facing some problems. I'm not too sure about my approach and I think that the best way to explain the difficulties I'm finding is to provide a small and reproducible example (sorry for this long post). I decided to report this issue here (and not on SO or GIS stackexchange) since it's strictly related to sf, tell me if this is not the right place.

I have a dataset containing the location of all car accidents that occurred in Milan during 2015 and these are the coordinates of the first 5 events (out of 8000 car crashes approximately).

library(dplyr)
library(sf)
milan_car_crashes <- data.frame(
  ID = 1:5,
  X = c(1513037, 1513008, 1515473, 1514039, 1515748),
  Y = c(5034945, 5034750, 5036177, 5036820, 5037396)
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

Car crashes are an example of a spatial point pattern constrained to lie on a network of lines and, since I don't have an up to date shapefile of the highways of Milan, I decided to use the package osmdata to download it from OpenStreetMap. Looking at the wiki page of the Map Features of OSM, I chose to download the data of primary highways (while, on the real model, I will use also secondary, tertiary, unclassified, residential, motorways and trunk) as a simple feature and extract the LINESTRING geometry, setting the same CRS as my data.

library(osmdata)
milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

Moreover, since car crashes represent a spatial point pattern that lie on a linear network, I think that the most appropriate way to measure the distance between any two points on the network is the shortest-path distance (and this is the biggest problem I'm facing as explained later). Firstly I want to check if the network of lines that I downloaded is connected or not (otherwise I don't think it's possible to measure a shortest-path distance), i.e. I have to verify that every highway is reachable from every other highway in the network. I coded this check using a combination of st_touches and hclust since I want to merge all streets touching each other (i.e. not only directly touching ones). This approach is really unsatisfactory since I can't use the sparse property of the matrix generated by st_touches (I don't know how to use hclust() on a sgbp object) therefore it's very memory consuming and it does not work if there are much more highways (i.e. I cannot use this approach if I consider also secondary, tertiary, unclassified and residentia highways) since the resulting matrix is too big. What's a better approach?

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
# streets stay in the same group
highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
## highways_groups
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 1444   21   18    8    4    5    8    2    3    5    2    2

There are 12 groups of connected highways and I filter out those streets not belonging to the big group otherwise I cannot calculate the shortest-path distance if two points belong to two distinct networks.

highways_sf <- highways_sf[highways_groups == 1, ]

The second step for the estimate of the shortest-path distance is the projection of the car crashes on the network and I performed this task using st_nearest_feature and st_nearest_point. This step is necessary since not every car crash belongs to the network due to approximation and rounding errors. Then again I'm not sure this is the right approach but I simply followed the examples reported on the help page of st_nearest_points. Is there a better alternative?

# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

I can now replace the geometry of the sf object with the new geometry of projected points.

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

This is a plot of the result

The biggest and most important problem is that I have no clue on how to estimate the shortest-path distance between two points belonging to the network of highways. How can I do that using sf?

My sessionInfo.

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
## [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Italian_Italy.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2 osmdata_0.0.9  sf_0.7-2       dplyr_0.7.8   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0       pillar_1.3.1     compiler_3.5.2   bindr_0.1.1     
##  [5] class_7.3-14     tools_3.5.2      digest_0.6.18    jsonlite_1.6    
##  [9] lubridate_1.7.4  evaluate_0.12    tibble_2.0.1     lattice_0.20-38 
## [13] pkgconfig_2.0.2  rlang_0.3.1      DBI_1.0.0        curl_3.3        
## [17] yaml_2.2.0       xfun_0.4         e1071_1.7-0.1    xml2_1.2.0      
## [21] stringr_1.3.1    httr_1.4.0       knitr_1.21       classInt_0.3-1  
## [25] grid_3.5.2       tidyselect_0.2.5 glue_1.3.0       R6_2.3.0        
## [29] rmarkdown_1.11   sp_1.3-1         purrr_0.3.0      magrittr_1.5    
## [33] htmltools_0.3.6  units_0.6-2      rvest_0.3.2      assertthat_0.2.0
## [37] stringi_1.2.4    crayon_1.3.4
@edzer
Copy link
Member

edzer commented Jan 28, 2019

Somewhat out of scope: sf is good to provide geometry representations for points and lines (nodes and edges in your street network) but not the infrastructure for representing the graph and doing shortest path. Most R packages use igraph for this. We did some similar experiments recently with @Robinlovelace , @loreabad6 and @luukvdmeer found here https://github.com/spnethack/spnethack; there is infrastructure in packages stplanr and osmar; older experiments using sp and igraph from me: https://github.com/edzer/spnetwork/

I think a relatively easy way to go with sf would be to equip tidygraph objects ("tidy" wrappers around igraph objects) with simple feature list-columns for nodes and edges, and work with those. I'm still hoping for a guest blog on r-spatial.org reporting about the spnethack experiments!

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Jan 28, 2019

Shortest paths on route networks defined in sf objects can be done using the SpatialLinesNetwork function from the stplanr package, as shown in the reprex below. As Edzer says, we experimented with other ways of doing this and this seems that tidygraph/sf integration is possible and potentially useful here. More soon in the blog post (thanks for the reminder @edzer will look at it later today and push for changes for it to be 'camera ready', promise).

Note: may also be worth checking out the dodgr package which is also for routing.

@Robinlovelace
Copy link
Contributor

Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with stplanr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

X = c(1513037, 1513008, 1515473, 1514039, 1515748)
Y = c(5034945, 5034750, 5036177, 5036820, 5037396)

milan_car_crashes <- data.frame(
  ID = 1:5,
  X = X,
  Y = Y
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
  # streets stay in the same group
  highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
#> highways_groups
#>    1    2    3    4    5    6    7    8    9   10   11   12 
#> 1444   21   18    8    4    5    8    2    3    5    2    2
  
highways_sf <- highways_sf[highways_groups == 1, ]
  
# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

plot(highways_sf$geometry)
plot(milan_car_crashes$geometry, add = TRUE)

library(stplanr)
rnet = SpatialLinesNetwork(highways_sf)

# calculate route from node 1 to 2:
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
plot(route_1_2, add = TRUE, col = "red", lwd = 3)
#> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring
#> all but the first attribute

# from 5 random nodes to 5 random nodes
n = length(rnet@nb)
bigger_od_data = data.frame(
  start = c(1, mean(n), max(n)),
  end = c(sample(n, size = 3))
  )
route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data)
plot(route_3_3$geometry, col = "blue", add = TRUE, lwd = route_3_3$count^2)

# from all locations to all locations
nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000)
plot(highways_sf$geometry)
plot(milan_car_crashes$geometry[c(1, 3)], add = TRUE)
crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3])
route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data)
plot(route_1_3_crashes$geometry, col = "blue", add = TRUE, lwd = 5)

Created on 2019-01-28 by the reprex package (v0.2.1)

@jsta
Copy link
Contributor

jsta commented Jan 28, 2019

An option that hasn't been mentioned thus far is to round-trip your analysis to GRASS using rgrass7.

@agila5
Copy link
Contributor Author

agila5 commented Jan 29, 2019

Thank you very much for your responses, they are extremely helpful. I wasn’t aware of any of the packages and approaches that you mentioned and I’ll study them in greater detail in the next days.

Could you also help me with the first question of the first post (i.e. how to determine which cluster of highways are connected) ? I’m not sure if I’m simply using the wrong function or the approach is completely off base. I’m pretty sure that I cannot just skip this part of the analysis because I don’t see how it’s possible to estimate a distance between two points that don’t belong to the same connected network. Moreover I’m not sure how this problem is handled in these packages (maybe the distance is set as Inf?)

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Jan 29, 2019

First impression: not sure, an extended reproducible example in which you show your attempts and reasoning building on the above code could help.

This issue is better suited for a question and answer system such as stackoverflow, the r-sig-geo email list or RStudio's popular discourse site. This issue tracker is designed to report bugs and request features for the sf package so, while I think it's a great place to share code and ideas, I agree with @edzer that this is slightly off topic and suggest that, if you don't have any specific issues with the package, you ask a question in an appropriate forum and close this particular issue. Linking to the new place will allow people to find it and, hopefully, help provide useful answers to your question.

@agila5
Copy link
Contributor Author

agila5 commented Jan 29, 2019

Ok, thank you very much for your help and your patience. It was a really useful post (at least for me).

I'll try to create a better example for the problem and post it on SO or somewhere else.

@gert08
Copy link

gert08 commented Jul 4, 2019

Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with stplanr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

X = c(1513037, 1513008, 1515473, 1514039, 1515748)
Y = c(5034945, 5034750, 5036177, 5036820, 5037396)

milan_car_crashes <- data.frame(
  ID = 1:5,
  X = X,
  Y = Y
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
  # streets stay in the same group
  highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
#> highways_groups
#>    1    2    3    4    5    6    7    8    9   10   11   12 
#> 1444   21   18    8    4    5    8    2    3    5    2    2
  
highways_sf <- highways_sf[highways_groups == 1, ]
  
# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

plot(highways_sf$geometry)
plot(milan_car_crashes$geometry, add = TRUE)

library(stplanr)
rnet = SpatialLinesNetwork(highways_sf)

# calculate route from node 1 to 2:
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
plot(route_1_2, add = TRUE, col = "red", lwd = 3)
#> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring
#> all but the first attribute

# from 5 random nodes to 5 random nodes
n = length(rnet@nb)
bigger_od_data = data.frame(
  start = c(1, mean(n), max(n)),
  end = c(sample(n, size = 3))
  )
route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data)
plot(route_3_3$geometry, col = "blue", add = TRUE, lwd = route_3_3$count^2)

# from all locations to all locations
nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000)
plot(highways_sf$geometry)
plot(milan_car_crashes$geometry[c(1, 3)], add = TRUE)
crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3])
route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data)
plot(route_1_3_crashes$geometry, col = "blue", add = TRUE, lwd = 5)

Created on 2019-01-28 by the reprex package (v0.2.1)

Hi! I tried to reproduce this code and I get an error.

milan_primary_highways <- opq("Milan, Italy") %>%

  • add_osm_feature(key = "highway", value = "primary") %>%
  • osmdata_sf() %>%
  • trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

Error in rcpp_osmdata_sf(doc) : node can not be found

Also, with bb_poly I can just use the place name to obtain the trimmed polygons but if I have 4 coordinates thar cover a little more than the city, what would you sudgest?

@agila5
Copy link
Contributor Author

agila5 commented Jul 4, 2019

Hi. I'm sure that @Robinlovelace or someone else can answer you much better than me but I don't get any error with the following code.

library(osmdata)
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])
#> Loading required namespace: sf
milan_primary_highways
#> Object of class 'osmdata' with:
#>                  $bbox : 45.3867381,9.0408867,45.5358482,9.2781103
#>         $overpass_call : The call submitted to the overpass API
#>                  $meta : metadata including timestamp and version numbers
#>            $osm_points : 'sf' Simple Features Collection with 4242 points
#>             $osm_lines : 'sf' Simple Features Collection with 1662 linestrings
#>          $osm_polygons : 'sf' Simple Features Collection with 0 polygons
#>        $osm_multilines : NULL
#>     $osm_multipolygons : NULL

Created on 2019-07-04 by the reprex package (v0.2.1)

@jonoyuan
Copy link

Thanks for your reprex, @Robinlovelace. But, I run into an issue with these few lines.

rnet = SpatialLinesNetwork(highways_sf)
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)

When highways_sf is an sf object, I get the error:

> rnet = SpatialLinesNetwork(highways_sf)
Error in validityMethod(object) : 
  nrow(object@sl) == length(igraph::E(object@g)) is not TRUE

When highways_sf is an sp object, I get an error running sum_network_links:

> route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
Error in stplanr::sum_network_links(sln = net, routedata = simple_od_data) : 
  SpatialLinesNetwork not supported. Use newer sfNetwork class instead

Any clue how to create an sfNetwork object or get sum_network_links to work on a SpatialLinesNetwork?

@edzer
Copy link
Member

edzer commented Dec 17, 2019

Please open/continue this issue at the right repo.

@Robinlovelace
Copy link
Contributor

@jonoyuan this is the repo Edzer refers to: https://github.com/ropensci/stplanr/issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants