Long time consolidating results after computations are complete #180

Closed
mvpsaraiva opened this issue Jul 13, 2021 · 3 comments
mvpsaraiva (Collaborator) commented Jul 13, 2021

As noticed by @dhersz, r5r can take a long time consolidating the results of accessibility() and travel_time_matrix(): the R console appears stuck, apparently doing nothing, even after the progress indicator reports the process as complete. This is particularly noticeable in large study areas.

The problem comes from a design decision taken in the package's early days. The Java backend processes requests in parallel, running each origin point of the OD matrix on a different thread. Each thread produces a data.frame of results for its origin point, and these are consolidated into a list. That list of data.frames is then passed along to R, where it is combined into a single data.table using data.table::rbindlist(). This pattern works because each thread can safely update its own set of results, so no time is wasted synchronising access to a single shared results data.frame.
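In Java terms, the original pattern looks roughly like the sketch below. This is only an illustration of the per-origin, one-table-per-thread idea, not r5r's actual code: the class and method names (`PerOriginResults`, `computeOrigin`) and the tiny result tables are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PerOriginResults {
    // Hypothetical per-origin computation: each thread fills its own
    // small result table, so no synchronisation is needed while computing.
    static Map<String, int[]> computeOrigin(int origin) {
        int nDest = 3; // illustrative destination count
        int[] travelTimes = new int[nDest];
        for (int d = 0; d < nDest; d++) travelTimes[d] = origin * 10 + d;
        return Map.of("travel_time", travelTimes);
    }

    public static void main(String[] args) {
        // One small result table per origin, produced in parallel.
        // This list of many small tables is what crosses the Java/R
        // boundary, and each one must be converted to an R data.frame.
        List<Map<String, int[]>> perOrigin = IntStream.range(0, 4)
                .parallel()
                .mapToObj(PerOriginResults::computeOrigin)
                .collect(Collectors.toList());
        System.out.println(perOrigin.size());
    }
}
```

With tens of thousands of origins, the conversion cost of all those small tables is what dominates after the computation finishes.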

However, in large study areas, this means that a very large list of data.frames is passed from Java to R, containing tens (or hundreds) of thousands of data.frames that are actually Java objects that need to be converted to proper R data.frames. This conversion process is what halts R's console for a while after the computations are complete.

To speed up this process, I've moved the data.frame consolidation part to Java, so that a single data.frame is returned to R. Tests have shown that converting a single large data.frame is significantly faster than converting a list of many small ones.

Some benchmarks below, using the package's sample Porto Alegre dataset and 40,000 OD points (all times in seconds):

| | Before | After | Difference |
|---|---|---|---|
| Accessibility | 137.2 | 68.1 | 69.1 |
| Travel Time Matrix | 476.3 | 413.2 | 63.1 |

In those tests, we got performance improvements of around 1 minute. In other situations the gains can be even larger: @dhersz ran tests in a very large study area where processing time fell from 6 minutes to around 2 minutes.

This is currently implemented on the dev branch:

devtools::install_github("ipeaGIT/r5r", subdir = "r-package", ref = "dev")
mvpsaraiva (Collaborator) commented:
Added some extra parallelisation, and now the travel time results look like this:

| | Before | After | Parallel | Diff |
|---|---|---|---|---|
| Travel Time Matrix | 476.3 | 413.2 | 369.1 | 107.2 |

mvpsaraiva added a commit that referenced this issue Jul 13, 2021
	- Created mergedDataFrame with initial capacity already set to max;
	- Merging dataframes is made in parallel, by column.
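The merge strategy described in the commit notes — a merged structure preallocated at its final capacity, filled in parallel by column — can be sketched as follows. This is a minimal illustration under assumed names (`MergedDataFrame`, simple `Map<String, int[]>` columns), not r5r's actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergedDataFrame {
    public static void main(String[] args) {
        // Many small per-origin tables (illustrative data).
        List<Map<String, int[]>> parts = List.of(
                Map.of("travel_time", new int[]{1, 2}),
                Map.of("travel_time", new int[]{3, 4}),
                Map.of("travel_time", new int[]{5, 6}));

        // The total row count is known up front, so each merged column
        // can be allocated once at its final capacity.
        int total = parts.stream()
                .mapToInt(p -> p.get("travel_time").length)
                .sum();

        Map<String, int[]> merged = new HashMap<>();
        // Merge column by column; each column can be copied on its own
        // thread because the columns are independent of one another.
        List.of("travel_time").parallelStream().forEach(col -> {
            int[] dest = new int[total];
            int offset = 0;
            for (Map<String, int[]> part : parts) {
                int[] src = part.get(col);
                System.arraycopy(src, 0, dest, offset, src.length);
                offset += src.length;
            }
            synchronized (merged) { merged.put(col, dest); }
        });

        System.out.println(java.util.Arrays.toString(merged.get("travel_time")));
    }
}
```

Returning this single large table to R means only one Java-to-R conversion, which the benchmarks above show is significantly faster than converting many small ones.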
rafapereirabr (Member) commented:

@mvpsaraiva , I guess we can close this one, right?

mvpsaraiva (Collaborator) commented:

I agree. Closing it now.
