Parallel line2route #151
Conversation
Great work. The speed-up is impressive on such a small example, and will probably be larger on bigger examples. The Travis build is failing, though, because the foreach library is not imported. You'll need to import it, as here, for it to work: https://github.com/ropensci/stplanr/blob/master/R/stplanr-package.R#L24

P.S. Any thoughts on this @richardellison? Context: we're trying to make hitting the CycleStreets API faster. Another implementation was to split l into ncors and then

Also @nikolai-b try running
```r
  }
}
if(parallel){
  threads <- min(c(parallel:::detectCores() * 10, n_ldf))
```
richardellison
Nov 29, 2016
Collaborator
Why is the number of threads set to parallel:::detectCores() * 10? Why do you want more threads than there are cores?
nikolai-b
Nov 29, 2016
Author
Contributor
I don't know how to get the max threads generically in R. On my machine:

```sh
$ cat /proc/sys/kernel/threads-max
126634
```

The main point being I do not believe this needs to be limited to the number of cores, as it is just sending off HTTP requests, so a few hundred threads would hardly use up the cycles of even one core.
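The I/O-bound argument above can be sketched with plain `parallel` code (this is not the PR's implementation; the cap of 8 workers and the `Sys.sleep()` stand-in are arbitrary choices for illustration):

```r
library(parallel)

# For I/O-bound tasks such as HTTP requests, workers spend most of their
# time waiting on the network, so they can usefully outnumber the cores.
n_cores <- detectCores()
n_workers <- min(n_cores * 10, 8)  # cap chosen arbitrarily for this sketch

cl <- makeCluster(n_workers)
# Sys.sleep() stands in for network latency; a real worker would make an
# HTTP GET and return the parsed route.
res <- parLapply(cl, 1:20, function(i) {
  Sys.sleep(0.1)  # simulated network wait
  i * 2
})
stopCluster(cl)
```

With 8 workers the 20 simulated 0.1 s "requests" overlap, so wall-clock time is well under the 2 s a serial loop would take.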
richardellison
Nov 30, 2016
Collaborator
That's true, although it appears that the speed-up isn't much more than you would expect if you used only one thread per core, although it may be worth running some larger tests.

Perhaps the number of threads and number of cores to use could be specified as parameters to the function instead of being hard-coded (or forcing all cores to be used).
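A minimal sketch of that suggestion, assuming a hypothetical wrapper (`route_all` and its placeholder work are illustrative, not stplanr code): expose the worker count as a parameter with a conservative default, rather than hard-coding `detectCores() * 10`.

```r
# n_processes is user-tunable; default is one worker per core, and the
# count is clamped so we never start more workers than there are tasks.
route_all <- function(l, n_processes = parallel::detectCores()) {
  n_processes <- max(1L, min(n_processes, length(l)))
  cl <- parallel::makeCluster(n_processes)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parLapply(cl, l, function(x) x^2)  # placeholder per-item work
}
```

Usage: `route_all(as.list(1:4), n_processes = 2)`. Making the count a parameter lets callers raise it for I/O-bound routing without baking an aggressive multiplier into the package.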
Robinlovelace
Nov 30, 2016
Member
Yes, I think we should use 1 thread per core, as here: https://github.com/npct/pct-load/blob/master/R/generate_rnet.R#L6 and this Stack Overflow question/response: http://stackoverflow.com/questions/28954991/whether-to-use-the-detectcores-function-in-r-to-specify-the-number-of-cores-for

Note from my own experience: when I set

```r
n_cores <- parallel:::detectCores()
cl <- makeCluster(n_cores)
```

and run a foreach job, I can get 100% on each core no problem.
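For reference, a self-contained version of the one-worker-per-core pattern referred to above, using the standard `foreach`/`doParallel` registration (a generic sketch, not code from this PR; `i^2` is placeholder work):

```r
library(parallel)
library(doParallel)
library(foreach)

# One worker per physical core, registered as the %dopar% backend.
n_cores <- detectCores()
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# Each iteration runs on a worker; .combine = c flattens the results.
out <- foreach(i = 1:8, .combine = c) %dopar% i^2

stopCluster(cl)
```

This is the setup under which a CPU-bound job can drive every core to 100%; for network-bound jobs the earlier point stands that more workers than cores can pay off.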
nikolai-b
Dec 4, 2016
Author
Contributor
@Robinlovelace I am not fully familiar with how parallel works on Windows, but your advice to use detectCores() as a max is not the advice in that Stack Overflow answer:

> However, there are many reasons that you may want or need to start fewer workers, and even some cases where you can reasonably start more.

If our process were CPU intensive then we should not run more processes than cores (or they would be competing for CPU time), but here our processes are not CPU intensive at all; they are simply making an HTTP GET to CycleStreets, and as such we can run many of them per core without slowing down the core.

The example in generate_rnet.R is doing a very CPU-intensive operation, so it is entirely different.
I understand the reasoning for wanting to parallelise this function, and if it were calling your own servers this wouldn't be a problem. My concern is that CycleStreets is a free service and the API has usage limits to avoid people hammering the server, and I don't think we should be facilitating users breaching those limits. Even if it didn't, I would expect many services to impose a limit on requests per second from a single IP address that will result in a proportion of requests failing (when they wouldn't have done if they were being submitted sequentially). We can still make the construction of the SpatialLinesDataFrame parallel without impacting the CycleStreets server, which may be useful but is unlikely to be much faster unless you're dealing with a very large number of routes. The same applies to the other functions in stplanr, which I think might benefit from using foreach as long as it doesn't involve making many requests to a public server.
Thanks for the input Richard. Just to clarify, CycleStreets.net offer a paid service and will happily throttle/block excessive calls, so I don't think this is a problem on their side. We now have a dedicated CycleStreets server for work on the PCT. Perhaps @mvl22 (who is co-founder and web development lead of CycleStreets.net) could comment? Setting it to
If that is the case then making it parallel shouldn't be an issue. I do still think we need to somehow incorporate a method of returning an indicator of some sort if some requests have been blocked, or perhaps build in a delay of some sort once throttling starts occurring.
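One way to sketch the "delay once throttling starts" idea (nothing like this is in the PR; `fetch_with_backoff` and the `fetch` callback are hypothetical names): retry with an exponentially increasing delay whenever the server signals throttling, conventionally HTTP status 429.

```r
# fetch is a zero-argument function standing in for the real HTTP request;
# it is expected to return a list with a $status element.
fetch_with_backoff <- function(fetch, max_tries = 5, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- fetch()
    if (!identical(result$status, 429L)) return(result)
    # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop("request still throttled after ", max_tries, " attempts")
}
```

A helper like this could wrap each worker's request, so blocked requests are retried and surfaced rather than silently failing.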
Yes, parallelising requests to the dedicated PCT server is fine. In our experience, a lot of the delay is simply the HTTP transfer, as the batch routing interface we wrote (which outputs a massive CSV) ploughs through things much more quickly than one thread requesting a route then awaiting the response. NB The batch routing interface uses parallelisation and has a lot of error-scenario handling. Also, all the work is done locally in the CycleStreets network, so there is less HTTP transfer delay.
Many thanks Martin. Makes me think that another useful function would be something to make sense of the massive .csv file that CS kicks out - I know @mem48 has wrestled with these things. Also, if the batch routing functionality of CS is ever 'APIerised' we could write a wrapper for that, a little like the still-not-completed distance matrix wrapper I wrote (Line 63 in 97912c9). Thinking the Distance Matrix API could be a point of departure for developing such a thing: https://developers.google.com/maps/documentation/distance-matrix/
Yes, that is on the roadmap. The system is a proper MVC structure now, so an API will be easier to add. The main issue is handling how to receive a completion callback, as it's obviously an asynchronous operation. Alternatively it could force uploading to GitHub, for instance. Anyway, this is probably for another place to discuss :)
Looking better now. Are the for loops necessary though? I would think using lapply (or mclapply if parallel) would be more efficient?
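The suggested refactor can be illustrated with a toy body (`square` is a stand-in for the real per-route work, not stplanr code); note that `mclapply` uses forking, so on Windows it only works with `mc.cores = 1`:

```r
square <- function(i) i^2  # placeholder for the per-route work

# for-loop version: preallocate, then fill by index
out1 <- numeric(4)
for (i in 1:4) out1[i] <- square(i)

# lapply version: same result, no explicit indexing
out2 <- unlist(lapply(1:4, square))

# forked parallel version; fall back to 1 core on Windows
mc <- if (.Platform$OS.type == "windows") 1L else 2L
out3 <- unlist(parallel::mclapply(1:4, square, mc.cores = mc))
```

All three produce identical results, and the `lapply` form drops straight into `mclapply` when parallelism is wanted.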
Robin and Richard, this is a classic case for I'd be happy to help if you could give a clear use case ...
I agree that the loops would be better as
Hmm, you're right, the loops were in there before. In that case I agree that could be a separate PR. @Robinlovelace, this seems to be ready to merge as all checks have now passed.
Here are some tests on 1000 lines, to discuss with @nikolai-b tomorrow:

```r
# Aim: test the performance of parallel code
# Relies on having large lines dataset
n = 1000 # number of lines to route
ii = round(n / nrow(flowlines))
for(i in 1:ii) {
  if(i == 1)
    l = flowlines else
    l = tmap::sbind(l, flowlines)
}
devtools::install_github(repo = "ropensci/stplanr", ref = "9837766")
system.time({r1 = line2route(l)})
# result1 - rl
#    user  system elapsed
#  55.864   1.384 198.586
# result2 - ...
detach("package:stplanr", unload = TRUE)
devtools::install_github(repo = "nikolai-b/stplanr", ref = "2ecf449")
library(stplanr)
system.time({r2 = line2route(l = l, n_processes = 4)})
# result1 - rl
#    user  system elapsed
#   0.620   0.148  30.679
# tests
identical(r1, r2) # not identical
nrow(r1) == nrow(r2) # identical
identical(raster::geom(r1), raster::geom(r2)) # not identical geometries
plot(r1)
plot(r2) # very different appearance...
# try nikolai's non-parallel version:
system.time({r3 = line2route(l = l)})
```
Same issue with:

```r
line2route(l = flowlines, route_fun = route_graphhopper)
r4 = line2route(l = flowlines, route_fun = route_graphhopper, n_processes = 4)
```
This is now close to 10 times faster on our test machines (Windows and Linux):

```r
# Aim: test the performance of parallel code
library(stplanr)
# Relies on having large lines dataset
n = 100 # number of lines to route
ii = round(n / nrow(flowlines))
for(i in 1:ii) {
  if(i == 1)
    l = flowlines else
    l = tmap::sbind(l, flowlines)
}
devtools::install_github(repo = "ropensci/stplanr", ref = "9837766")
system.time({r1 = line2route(l)})
# result1 - rl
#    user  system elapsed
#  55.864   1.384 198.586
# result2 - rl
#    user  system elapsed
#  44.336   0.392 125.790
detach("package:stplanr", unload = TRUE)
devtools::install_github(repo = "nikolai-b/stplanr")
library(stplanr)
system.time({r2 = line2route(l = l, n_processes = 12)})
# result1 - rl
#    user  system elapsed
#   0.620   0.148  30.679
# result2 - rl n_processes = 10
#    user  system elapsed
#   1.588   0.212  22.789
# rl n_processes = 30
#    user  system elapsed
#  32.264   0.904  43.245
# rl n_processes = 20
#    user  system elapsed
#   1.564   0.332  31.438
# rl n_processes = 15
# windows n_processes = 8
#    user  system elapsed
#    1.33    0.05   16.09
# windows n_processes = 12
#    user  system elapsed
#    1.01    0.10   15.73
# tests
identical(r1, r2) # not identical
nrow(r1) == nrow(r2) # identical
identical(raster::geom(r1), raster::geom(r2)) # not identical geometries
plot(r1)
plot(r2) # very different appearance...
# try nikolai's non-parallel version:
system.time({r3 = line2route(l = l)})
system.time({r2 = line2route(l = l, route_fun = route_graphhopper, n_processes = 10)})
#    user  system elapsed
#    0.84    0.21    8.39
system.time({r2 = line2route(l = l, route_fun = route_graphhopper)})
#    user  system elapsed
#    4.63    0.25   16.40
```


A very basic parallel version of line2route.

This makes a cluster using forking, which is not available on Windows (the R package will presumably just call a stub to make a SOCK cluster on Windows), and I've set the number of processes to 10 x the CPU cores arbitrarily.

With 49 routes the performance increase is OK. I would assume the performance improvement will increase with the number of routes (as the overhead of the forking etc. remains a relatively fixed cost).
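The Windows caveat can be made explicit by selecting the cluster type per platform (a generic `parallel` sketch, not the PR's code; the addition inside the worker is placeholder work):

```r
library(parallel)

# FORK clusters share memory via fork() and are Unix-only;
# PSOCK clusters work everywhere, including Windows.
cluster_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

cl <- makeCluster(2, type = cluster_type)
res <- parLapply(cl, 1:4, function(i) i + 1)  # placeholder per-route work
stopCluster(cl)
```

Choosing the type explicitly avoids relying on `makeCluster`'s default behaviour and documents why Windows takes the slower socket path.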