Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split-out code for handling od data into new 'od' package? #367

Closed
Robinlovelace opened this issue Nov 22, 2019 · 13 comments
Closed

Split-out code for handling od data into new 'od' package? #367

Robinlovelace opened this issue Nov 22, 2019 · 13 comments

Comments

@Robinlovelace
Copy link
Member

Some of the most useful functions in stplanr revolve around handline OD data, like od2line(), od_oneway() and odmatrix_to_od(). I wonder if it would be useful, based on the concept of modularity in software development, to split such functions out into a new package with minimal dependencies, e.g. called od. Thoughts welcome.

Heads-up @mpadge, @richardellison, @rafapereirabr, @mem48, @rCarto and anyone else watching this, feedback welcome.

@mpadge
Copy link
Member

mpadge commented Nov 22, 2019

Absolutely! There's lots of stuff that bikedata does that is also pure OD manipulation with lots of overlap, plus the new dodgr routines (like dodgr_flows_si) are explicitly intended to use OD-type inputs. This would be really useful.

@rafapereirabr
Copy link

Hi all. I haven't been able to keep track of stplanr developments as I would like to. In any case, this suggestions makes sense to me!

@mem48
Copy link
Collaborator

mem48 commented Nov 27, 2019

Sounds like a a good idea in principle, would you inlcude the desire lines or keep the OD package non-spatial?

If you are going to do it I think there need to be a good use case e.g. packages that would use OD but not stplanr, and perhaps some performance gains. I'm thinking og how I wrote od_id_szudzik to replace od_id_order. The new OD package should be lightweight and efficent.

I've had a quick look at https://github.com/ropensci/stplanr/blob/master/R/od-funs.R

I find dependnacy on

  • dplyr
  • sf
  • sp
  • raster
  • rlang
  • stplanr
  • geosphere
  • stats
  • rgeos

Thats quite a lot ...

@rCarto
Copy link

rCarto commented Nov 27, 2019

I'm not sure how it would apply to osrm, but I will keep an eye on this thread.

@Robinlovelace
Copy link
Member Author

Sounds like a a good idea in principle, would you inlcude the desire lines or keep the OD package non-spatial?

Good question @mem48 I would start non spatial but think sf would likely end up being a useful dependency.

@Robinlovelace
Copy link
Member Author

That huge list of dependencies is part of the reason for creating a new one. We can do most of the stuff without them!

@Robinlovelace
Copy link
Member Author

Update on this, I'm working on this and have found that a co-benefit is that we can make od functions in R much faster...

@Robinlovelace
Copy link
Member Author

Preliminary results below, heads-up @mem48 I will open source the code behind this soon.

bench::mark(check = FALSE,
+ od_coordinates_to_linstring(odm_id),
+ stplanr::od2line(od_data_leeds, od_data_zones)
+ )
Creating centroids representing desire line start and end points.
# A tibble: 2 x 13
  expression                                         min  median `itr/sec` mem_alloc
  <bch:expr>                                     <bch:t> <bch:t>     <dbl> <bch:byt>
1 od_coordinates_to_linstring(odm_id)            37.73µs 42.02µs    20986.     3.1KB
2 stplanr::od2line(od_data_leeds, od_data_zones)  4.75ms  5.14ms      104.   105.2KB
# … with 8 more variables: `gc/sec` <dbl>, n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
#   result <list>, memory <list>, time <list>, gc <list>

@Robinlovelace
Copy link
Member Author

A fairer test, with same inputs and output:

bench::mark(check = FALSE,
+    od = od_to_sfc(od_data_leeds, od_data_zones),
+    stplanr = stplanr::od2line(od_data_leeds, od_data_zones)
+  )
Creating centroids representing desire line start and end points.
# A tibble: 2 x 13
  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 od         2.27ms 2.36ms      415.    45.1KB     6.33   197     3      474ms <NULL>
2 stplanr    4.75ms  5.5ms      109.   105.2KB     2.05    53     1      488ms <NULL>
# … with 3 more variables: memory <list>, time <list>, gc <list>

@mem48
Copy link
Collaborator

mem48 commented Feb 10, 2020

Some impressive improvements, definitly intrested to contribute.

@Robinlovelace
Copy link
Member Author

Update, I've created a work-in-progress package here: https://github.com/Robinlovelace/od

@richardellison
Copy link
Collaborator

Agreed on the benefit of reducing dependencies as a clear benefit. One question is how this fits in with the route* functions in stplanr, assuming these stay in stplanr and use od as a dependency?

@Robinlovelace
Copy link
Member Author

Good question @richardellison, my current thinking is to use the od package in existing functions such as od2line() to speed them up. From the user's perspective nothing should change. Another idea I had was to spit-out the route() function into another package but, as discussed with @mem48, that may be a 'split-out too far' at the moment!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants