Long time consolidating results after computations are complete #180

Closed
mvpsaraiva opened this issue Jul 13, 2021 · 3 comments
mvpsaraiva (Collaborator) commented Jul 13, 2021

As noticed by @dhersz, r5r can take a long time consolidating the results of accessibility() and travel_time_matrix(): the R console appears stuck, apparently doing nothing, even after the progress indicator reports the process as complete. This is particularly noticeable in large study areas.

The problem comes from a design decision taken in the package's early days. The Java backend processes requests in parallel, running each origin point of the OD matrix on a different thread. Each thread produces a data.frame of results for its origin point, and these are consolidated into a list. That list of data.frames is then passed along to R, where it is combined into a single data.table using data.table::rbindlist(). This pattern works because each thread can safely update its own set of results, so no time is wasted synchronising access to a single shared results data.frame.
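In Java terms, the original pattern looks roughly like the sketch below. This is only an illustration of the per-origin, one-table-per-thread idea, not r5r's actual code: the class and method names (`PerOriginResults`, `computeOrigin`) and the tiny result tables are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PerOriginResults {
    // Hypothetical per-origin computation: each thread fills its own
    // small result table, so no synchronisation is needed while computing.
    static Map<String, int[]> computeOrigin(int origin) {
        int nDest = 3; // illustrative destination count
        int[] travelTimes = new int[nDest];
        for (int d = 0; d < nDest; d++) travelTimes[d] = origin * 10 + d;
        return Map.of("travel_time", travelTimes);
    }

    public static void main(String[] args) {
        // One small result table per origin, produced in parallel.
        // This list of many small tables is what crosses the Java/R
        // boundary, and each one must be converted to an R data.frame.
        List<Map<String, int[]>> perOrigin = IntStream.range(0, 4)
                .parallel()
                .mapToObj(PerOriginResults::computeOrigin)
                .collect(Collectors.toList());
        System.out.println(perOrigin.size());
    }
}
```

With tens of thousands of origins, the conversion cost of all those small tables is what dominates after the computation finishes.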

However, in large study areas, this means that a very large list of data.frames is passed from Java to R, containing tens (or hundreds) of thousands of data.frames that are actually Java objects that need to be converted to proper R data.frames. This conversion process is what halts R's console for a while after the computations are complete.

To speed up this process, I've moved the data.frame consolidation part to Java, so that a single data.frame is returned to R. Tests have shown that converting a single large data.frame is significantly faster than converting a list of many small ones.

Some benchmarks below, using the package's sample Porto Alegre dataset and 40,000 OD points (all times in seconds):

| | Before | After | Difference |
|---|---|---|---|
| Accessibility | 137.2 | 68.1 | 69.1 |
| Travel Time Matrix | 476.3 | 413.2 | 63.1 |

In those tests, we got performance improvements of around 1 minute. In other situations the gains can be even larger: @dhersz ran tests in a very large study area where processing time fell from 6 minutes to around 2 minutes.

This is currently implemented on the dev branch:

devtools::install_github("ipeaGIT/r5r", subdir = "r-package", ref = "dev")
mvpsaraiva (Collaborator) commented:
Added some extra parallelisation, and now the travel time results look like this:

| | Before | After | Parallel | Diff |
|---|---|---|---|---|
| Travel Time Matrix | 476.3 | 413.2 | 369.1 | 107.2 |

mvpsaraiva added a commit that referenced this issue Jul 13, 2021
	- Created mergedDataFrame with initial capacity already set to max;
	- Merging dataframes is made in parallel, by column.
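The merge strategy described in the commit notes — a merged structure preallocated at its final capacity, filled in parallel by column — can be sketched as follows. This is a minimal illustration under assumed names (`MergedDataFrame`, simple `Map<String, int[]>` columns), not r5r's actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergedDataFrame {
    public static void main(String[] args) {
        // Many small per-origin tables (illustrative data).
        List<Map<String, int[]>> parts = List.of(
                Map.of("travel_time", new int[]{1, 2}),
                Map.of("travel_time", new int[]{3, 4}),
                Map.of("travel_time", new int[]{5, 6}));

        // The total row count is known up front, so each merged column
        // can be allocated once at its final capacity.
        int total = parts.stream()
                .mapToInt(p -> p.get("travel_time").length)
                .sum();

        Map<String, int[]> merged = new HashMap<>();
        // Merge column by column; each column can be copied on its own
        // thread because the columns are independent of one another.
        List.of("travel_time").parallelStream().forEach(col -> {
            int[] dest = new int[total];
            int offset = 0;
            for (Map<String, int[]> part : parts) {
                int[] src = part.get(col);
                System.arraycopy(src, 0, dest, offset, src.length);
                offset += src.length;
            }
            synchronized (merged) { merged.put(col, dest); }
        });

        System.out.println(java.util.Arrays.toString(merged.get("travel_time")));
    }
}
```

Returning this single large table to R means only one Java-to-R conversion, which the benchmarks above show is significantly faster than converting many small ones.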
rafapereirabr (Member) commented:

@mvpsaraiva , I guess we can close this one, right?

mvpsaraiva (Collaborator) commented:

I agree. Closing it now.
