
Parallel line2route #151

Merged · 5 commits merged Dec 9, 2016

Conversation

@nikolai-b (Contributor) commented Nov 27, 2016

A very basic parallel version of line2route.

This makes a cluster using forking, which is not available on Windows (on Windows the R package will presumably fall back to a SOCK cluster stub), and I've set the number of processes to 10 × the CPU cores, arbitrarily.

With 49 routes the performance increase is OK. I would expect the improvement to grow with the number of routes, as the overhead of forking etc. is a relatively fixed cost.

microbenchmark::microbenchmark( line2route(l = flowlines, parallel = T), line2route(l = flowlines, parallel = F), times = 3)

Unit: seconds
                                    expr      min      lq      mean   median        uq       max neval
 line2route(l = flowlines, parallel = T) 3.738415 3.99689  4.669092 4.255365  5.134431  6.013497     3
 line2route(l = flowlines, parallel = F) 7.744869 8.85792 11.754400 9.970970 13.759165 17.547359     3
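The forking approach described above can be sketched as follows. This is an illustrative pattern only, not stplanr's actual code: the task count and the placeholder work stand in for the real per-route HTTP calls.

```r
library(parallel)

# FORK clusters share memory via fork() and are unavailable on Windows,
# where a socket (PSOCK) cluster is the usual fallback.
cluster_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

n_tasks <- 8  # placeholder: in line2route this would be the number of routes
cl <- makeCluster(min(detectCores() * 10, n_tasks), type = cluster_type)
res <- parLapply(cl, seq_len(n_tasks), function(i) i * 2)  # placeholder work
stopCluster(cl)
```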
@Robinlovelace (Member) left a comment

Great work - the speed-up is impressive on such a small example, and will probably be greater on a larger one. The Travis build is failing, though, because the foreach library is not imported. You'll need to import it, as here, for it to work: https://github.com/ropensci/stplanr/blob/master/R/stplanr-package.R#L24
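For reference, the roxygen2 import convention linked above looks roughly like this. This is a sketch; the exact directives in stplanr-package.R may differ.

```r
# Package-level roxygen block (e.g. in R/stplanr-package.R). roxygen2
# reads these tags and writes the matching importFrom() lines to the
# NAMESPACE file when the documentation is rebuilt.
#' @importFrom foreach foreach %dopar%
NULL
```

This is a package-metadata fragment rather than runnable logic; it only takes effect when roxygen2 regenerates NAMESPACE.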

@Robinlovelace (Member) commented Nov 28, 2016

P.S. any thoughts on this @richardellison? Context: we're trying to make hitting the CycleStreets API faster. Another implementation would be to split l into ncores chunks and then foreach() over each chunk, but @nikolai-b doubts that would be any faster; more benchmarks are needed, I think. I think the downside of another dependency can be offset by the potential utility of foreach in other functions.

@Robinlovelace (Member) commented Nov 28, 2016

Also @nikolai-b, try running devtools::check() to test it locally; it will probably fail there too.

R/od-funs.R Outdated
}
}
if(parallel){
threads <- min(c(parallel:::detectCores() * 10, n_ldf))

@richardellison (Collaborator), Nov 29, 2016

Why is the number of threads set to parallel:::detectCores() * 10? Why do you want more threads than there are cores?

@nikolai-b (Author, Contributor), Nov 29, 2016

I don't know how to get the max threads generically in R. On my machine:

cat /proc/sys/kernel/threads-max
126634

The main point is that I don't believe this needs to be limited to the number of cores, as it is just sending off HTTP requests, so a few hundred threads would hardly use up the cycles of even one core.

@richardellison (Collaborator), Nov 30, 2016

That's true, although it appears that the speed-up isn't much more than you would expect from using only one thread per core; it may be worth running some larger tests.

Perhaps the number of threads and number of cores to use could be specified as parameters to the function instead of being hard-coded (or forcing all cores to be used).

@Robinlovelace (Member), Nov 30, 2016

Yes, I think we should use 1 thread per core, as here https://github.com/npct/pct-load/blob/master/R/generate_rnet.R#L6 and in this Stack Overflow question/response: http://stackoverflow.com/questions/28954991/whether-to-use-the-detectcores-function-in-r-to-specify-the-number-of-cores-for

A note from my own experience: when I set

n_cores <- parallel::detectCores()
cl <- makeCluster(n_cores)

and run a foreach job, I can get 100% on each core no problem.
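A self-contained version of that one-worker-per-core pattern, assuming the doParallel backend (my choice for illustration; any foreach backend would do):

```r
library(parallel)
library(doParallel)
library(foreach)

n_cores <- detectCores()
cl <- makeCluster(n_cores)
registerDoParallel(cl)  # register the cluster as the %dopar% backend

# CPU-bound placeholder work, spread over the workers
res <- foreach(i = 1:8, .combine = c) %dopar% i^2

stopCluster(cl)
```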

@nikolai-b (Author, Contributor), Dec 4, 2016

@Robinlovelace I am not fully familiar with how parallel works on Windows, but your advice to use detectCores() as a maximum is not the advice in that Stack Overflow answer:

"However, there are many reasons that you may want or need to start fewer workers, and even some cases where you can reasonably start more."

If our process were CPU-intensive then we should not run more processes than cores (or they would be competing for CPU time), but here our processes are not CPU-intensive at all: they are simply making an HTTP GET to CycleStreets, and as such we can run many of them per core without slowing down the core.
The example in generate_rnet.R is doing a very CPU-intensive operation, so it is entirely different.
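The point about I/O-bound work can be illustrated with a toy timing comparison (Unix-only, since mclapply forks; the sleep stands in for waiting on an HTTP response):

```r
library(parallel)

# Simulate an I/O-bound task: almost all the time is spent waiting,
# as with an HTTP GET, not computing.
io_task <- function(i) { Sys.sleep(0.2); i }

# With 4 workers, 8 sleeping tasks take ~2 rounds of 0.2 s; with 8
# workers, ~1 round, regardless of how many physical cores exist.
t4 <- system.time(mclapply(1:8, io_task, mc.cores = 4))[["elapsed"]]
t8 <- system.time(mclapply(1:8, io_task, mc.cores = 8))[["elapsed"]]
```

For CPU-bound work the same experiment would show no such gain from oversubscribing the cores.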

@richardellison (Collaborator) commented Nov 29, 2016

I understand the reasoning for wanting to parallelise this function, and if it were calling your own servers this wouldn't be a problem. My concern is that CycleStreets is a free service and the API has usage limits to avoid people hammering the server, and I don't think we should make it easy for users to breach those limits. Even if it didn't have limits, I would expect many services to cap requests per second from a single IP address, which would result in a proportion of requests failing (when they wouldn't have if submitted sequentially).

We can still make the construction of the SpatialLinesDataFrame parallel without impacting the CycleStreets server, which may be useful, but is unlikely to be much faster unless you're dealing with a very large number of routes. The same applies to the other functions in stplanr, which I think might benefit from using foreach as long as it doesn't involve making many requests to a public server.

@Robinlovelace (Member) commented Nov 29, 2016

Thanks for the input Richard. Just to clarify, CycleStreets.net offer a paid service and will happily throttle/block excessive calls, so I don't think this is a problem on their side. We now have a dedicated CycleStreets server for work on the PCT. Perhaps @mvl22 (who is co-founder and web development lead of CycleStreets.net) could comment?

Setting it to FALSE by default should also help overcome some of those issues.

@richardellison (Collaborator) commented Nov 29, 2016

If that is the case then making it parallel shouldn't be an issue. I do still think we need to incorporate some way of returning an indicator if some requests have been blocked, or perhaps build in a delay once throttling starts occurring.
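One way to sketch that idea is a retry helper with exponential backoff. This is a hypothetical helper, not part of stplanr; the name and signature are mine.

```r
# Retry a zero-argument request function with exponential backoff,
# e.g. to ride out temporary throttling by the routing server.
retry_with_backoff <- function(request, max_tries = 3, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(request(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # wait base_wait, 2*base_wait, 4*base_wait, ... between attempts
    if (attempt < max_tries) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  stop("request failed after ", max_tries, " attempts")
}
```

A line2route worker could wrap each routing call in such a helper and record any route that still fails, giving the caller the "indicator" suggested above.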

@mvl22 (Contributor) commented Nov 29, 2016

If that is the case then making it parallel shouldn't be an issue.

Yes, parallelising requests to the dedicated PCT server is fine.

In our experience, a lot of delay is simply the HTTP transfer, as the batch routing interface we wrote (which outputs a massive CSV) ploughs through things much more quickly than one thread requesting a route then awaiting the response.

NB The batch routing interface uses parallelisation and has a lot of error scenario handling. Also, all the work is done locally in the CycleStreets network so there is less HTTP transfer delay.

@Robinlovelace (Member) commented Nov 29, 2016

Many thanks Martin. Makes me think that another useful function would be something to make sense of the massive .csv file that CS kicks out - I know @mem48 has wrestled with these things.

Also, if the batch routing functionality of CS is ever 'APIerised' we could write a wrapper for that, a little like the still-not-completed distance matrix wrapper I wrote:

dist_google <- function(from, to, google_api = Sys.getenv("GOOGLEDIST"),

Thinking the Distance Matrix API could be a point of departure for developing such a thing: https://developers.google.com/maps/documentation/distance-matrix/

@mvl22 (Contributor) commented Nov 29, 2016

if the batch routing functionality of CS is ever 'APIerised' we could write a wrapper for that

Yes, that is on the roadmap. The system is a proper MVC structure now, so an API will be easier to add. The main issue is handling how to receive a completion callback, as it's obviously an asynchronous operation. Alternatively it could force uploading to GitHub, for instance. Anyway, this is probably for another place to discuss :)

nikolai-b added 2 commits Nov 29, 2016
@nikolai-b nikolai-b force-pushed the nikolai-b:master branch from aa64dd9 to 2ecf449 Dec 4, 2016
@richardellison (Collaborator) commented Dec 4, 2016

Looking better now. Are the for loops necessary, though? I would think using lapply (or mclapply if parallel) would be more efficient.
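For comparison, the three forms side by side. This is a toy example; mc.cores is capped at 1 on Windows, where mclapply cannot fork.

```r
# The same computation as a for loop, as lapply(), and as mclapply().
squares_loop <- numeric(5)
for (i in 1:5) squares_loop[i] <- i^2

squares_lapply <- unlist(lapply(1:5, function(i) i^2))

# mclapply forks workers; on Windows only mc.cores = 1 is allowed.
cores <- if (.Platform$OS.type == "windows") 1 else 2
squares_mc <- unlist(parallel::mclapply(1:5, function(i) i^2, mc.cores = cores))
```

All three produce the same result; lapply avoids the explicit pre-allocation, and mclapply gives parallelism with the same call shape.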

@mpadge (Member) commented Dec 4, 2016

Robin and Richard, this is a classic case for Rcppification. Somewhere I've got a speed comparison for the kind of stuff you're doing in line2route, but the closest easy ref is here - it's not an identical situation, but shows how dependent sp speeds can be on how one constructs and fills objects, and that sp object construction and filling is massively faster in Rcpp. I've also done speed comparisons for sp construction using Rcpp parallel versus straight R::parallel, with only minor differences, meaning that I'd expect the biggest advantage to come from Rcppifying these kinds of loops and parallelising within R.

I'd be happy to help if you could give a clear use case ...

@nikolai-b (Author, Contributor) commented Dec 4, 2016

I agree that the loops would be better as lapply, but they were there already (e.g. see the comment by @mpadge) and I'd prefer to change them in a separate PR.

@richardellison (Collaborator) commented Dec 5, 2016

Hmm, you're right, the loops were in there before. In that case I agree that could be a separate PR.

@Robinlovelace, this seems to be ready to merge, as all checks have now passed.
@mpadge I agree that porting these loops to Rcpp (as well as parts of other functions) would significantly improve the speed. It's something I have been meaning to do but haven't had a chance to yet. I'll open a new issue for this.

@Robinlovelace (Member) commented Dec 9, 2016

Here are some tests on 1000 lines, to discuss with @nikolai-b tomorrow:

# Aim: test the performance of parallel code

# Relies on having large lines dataset
n = 1000 # number of lines to route
ii = round(n / nrow(flowlines))
for(i in 1:ii) {
  if(i == 1)
    l = flowlines else
      l = tmap::sbind(l, flowlines)
}

devtools::install_github(repo = "ropensci/stplanr", ref = "9837766")
system.time({r1 = line2route(l)})
# result1 - rl
# user  system elapsed 
# 55.864   1.384 198.586 
# result2 - ...
detach("package:stplanr", unload=TRUE)
devtools::install_github(repo = "nikolai-b/stplanr", ref = "2ecf449")
library(stplanr)
system.time({r2 = line2route(l = l, n_processes = 4)})
# result1 - rl
# user  system elapsed 
# 0.620   0.148  30.679 

# tests
identical(r1, r2) # not identical
nrow(r1) == nrow(r2) # identical
identical(raster::geom(r1), raster::geom(r2)) # not identical geometries
plot(r1)
plot(r2) # very different appearance...
# try nikolai's non-parallel version:
system.time({r3 = line2route(l = l)})

[screenshot from 2016-12-09 00-12-21]

@Robinlovelace (Member) commented Dec 9, 2016

Preliminary conclusion: the parallel version doesn't emit a warning/error message when it's not working; it appears to generate straight lines atm... The non-parallel version works though:

system.time({r3 = line2route(l = l)})
# 58.684   1.384 194.046 
plot(r3)

[screenshot from 2016-12-09 00-17-18]

@Robinlovelace (Member) commented Dec 9, 2016

Same issue with route_graphhopper:

line2route(l = flowlines, route_fun = route_graphhopper)
r4 = line2route(l = flowlines, route_fun = route_graphhopper, n_processes = 4)

@Robinlovelace (Member) commented Dec 9, 2016

This is now close to 10 times faster on our test machines (Windows and Linux):

# Aim: test the performance of parallel code
library(stplanr)
# Relies on having large lines dataset
n = 100 # number of lines to route
ii = round(n / nrow(flowlines))
for(i in 1:ii) {
  if(i == 1)
    l = flowlines else
      l = tmap::sbind(l, flowlines)
}

devtools::install_github(repo = "ropensci/stplanr", ref = "9837766")
system.time({r1 = line2route(l)})
# result1 - rl
# user  system elapsed
# 55.864   1.384 198.586
# result2 - rl
# user  system elapsed
# 44.336   0.392 125.790
detach("package:stplanr", unload=TRUE)
devtools::install_github(repo = "nikolai-b/stplanr")
library(stplanr)
system.time({r2 = line2route(l = l, n_processes = 12)})
# result1 - rl
# user  system elapsed
# 0.620   0.148  30.679
# result2 - rl n_process = 10
# user  system elapsed
# 1.588   0.212  22.789
# rl n_processes = 30
# user  system elapsed
# 32.264   0.904  43.245
# rl n_processes = 20
# user  system elapsed
# 1.564   0.332  31.438
# rl n_processes = 15
# user  system elapsed windows n_processes = 8
# 1.33    0.05   16.09
# user  system elapsed windows n_processes = 12
# user  system elapsed
# 1.01    0.10   15.73
# tests
identical(r1, r2) # not identical
nrow(r1) == nrow(r2) # identical
identical(raster::geom(r1), raster::geom(r2)) # not identical geometries
plot(r1)
plot(r2) # very different appearance...
# try nikolai's non-parallel version:
system.time({r3 = line2route(l = l)})

system.time({r2 = line2route(l = l, route_fun = route_graphhopper, n_processes = 10)})
# user  system elapsed 
# 0.84    0.21    8.39 
system.time({r2 = line2route(l = l, route_fun = route_graphhopper)})
# user  system elapsed 
# 4.63    0.25   16.40 
@Robinlovelace Robinlovelace merged commit feac7ce into ropensci:master Dec 9, 2016

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)