-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
onewayid performance #321
Comments
Fantastic work Malcolm. Will review asap. |
@Robinlovelace FYI just pushed some key bugfixes for tibbles and factors |
Thanks. Perfect timing for the od vignette, which I hope to have ready in time for the UseR conference next week. |
This is going to need a little more work, the main onewayid function doesn't like stplan.key being a number rather than a character. I've made first push of a new function |
See #323 |
Heads-up @mem48 after merging #328 I'm confident this issue has been addressed. Many thanks for pointing out the issues. This is now addressed in many ways, including your excellent All documented in reproducible benchmark below. Please test on your computer and let me know if you get similarly impressive (10x!) speedups. library(stplanr)
#> Registered S3 method overwritten by 'R.oo':
#> method from
#> throw.default R.methodsS3
od = pct::get_od()
#> No region provided. Returning national OD data.
#> Parsed with column specification:
#> cols(
#> `Area of residence` = col_character(),
#> `Area of workplace` = col_character(),
#> `All categories: Method of travel to work` = col_double(),
#> `Work mainly at or from home` = col_double(),
#> `Underground, metro, light rail, tram` = col_double(),
#> Train = col_double(),
#> `Bus, minibus or coach` = col_double(),
#> Taxi = col_double(),
#> `Motorcycle, scooter or moped` = col_double(),
#> `Driving a car or van` = col_double(),
#> `Passenger in a car or van` = col_double(),
#> Bicycle = col_double(),
#> `On foot` = col_double(),
#> `Other method of travel to work` = col_double()
#> )
#> Parsed with column specification:
#> cols(
#> X = col_double(),
#> Y = col_double(),
#> objectid = col_double(),
#> msoa11cd = col_character(),
#> msoa11nm = col_character()
#> )
create_df <- function(rows, cols) {
od[1:rows, 1:cols]
}
res = bench::press(
rows = c(5000, 10000, 15000),
cols = c(3, 4, 5),
{
d = create_df(rows, cols)
bench::mark(check = FALSE,
onewayid = onewayid(d, attrib = cols[-c(1:2)]),
od_oneway = od_oneway(d, attrib = cols[-c(1:2)]),
od_oneway_char = od_oneway(d, attrib = cols[-c(1:2)], stplanr.key = od_id_character(d[[1]], d[[2]])),
od_oneway_szud = od_oneway(d, attrib = cols[-c(1:2)], stplanr.key = od_id_szudzik(d[[1]], d[[2]])),
od_oneway_max = od_oneway(d, attrib = cols[-c(1:2)], stplanr.key = od_id_max_min(d[[1]], d[[2]]))
)
}
)
#> Running with:
#> rows cols
#> 1 5000 3
#> 2 10000 3
#> 3 15000 3
#> 4 5000 4
#> 5 10000 4
#> 6 15000 4
#> 7 5000 5
#> 8 10000 5
#> 9 15000 5
ggplot2::autoplot(res)
#> Loading required namespace: tidyr Created on 2019-07-29 by the reprex package (v0.3.0) |
onewayid performance is exponentially slower on large datasets, especially if character IDs are used
I've implemented a replacement for od_id_order() that uses Szudzik pairing in https://github.com/ropensci/stplanr/tree/onewayid
There is probably some more performance to be had out of refactoring onewayid to use szudzik_pairing(x, simple = TRUE)
The text was updated successfully, but these errors were encountered: