
[performance] feed_to_graph_path is slow on larger feeds #12

Open
kuanb opened this Issue Dec 21, 2017 · 11 comments

@kuanb
Owner

kuanb commented Dec 21, 2017

test_feed_to_graph_path itself is the slowest test by far. Create benchmarks and identify which steps are slowest. Find ways to speed up operations and get graph creation process to be as fast as possible.

@kuanb kuanb added the performance label Dec 21, 2017

@kuanb


Owner

kuanb commented Dec 25, 2017

Addressed (but still slow) via #14

@kuanb


Owner

kuanb commented Mar 28, 2018

Used snakeviz with cProfile and this is what the breakdown on performance of the operation looks like at present:
[snakeviz profile screenshot]

generate_edge_and_wait_values is the real hog here. It primarily comprises two steps:

  1. generate_wait_times (60% of the runtime of parent function)
  2. linearly_interpolate_infill_times (20% of the runtime of parent function)

Both execute pandas functions, so beneath them are just pandas ops and groupby calls, respectively. To speed this module up, I'll need to manage the pandas operations better and identify optimizations in how I use them in the logic.
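As a rough illustration of the kind of pandas optimization in question (this is a hypothetical stand-in frame, not the real GTFS data or the actual functions): keeping groupby reductions vectorized instead of pushing a Python-level lambda through apply usually pays off on large feeds.

```python
import numpy as np
import pandas as pd

# Hypothetical stop-times frame standing in for the real feed data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "stop_id": np.repeat(np.arange(1000), 10),
    "headway": rng.uniform(60, 900, 10_000),
})

# Slower pattern: a Python-level function called once per group
slow = df.groupby("stop_id")["headway"].apply(lambda s: s.mean() / 2)

# Faster pattern: the reduction stays in pandas' C-level codepath
fast = df.groupby("stop_id")["headway"].mean() / 2

assert np.allclose(slow.values, fast.values)
```

The two produce identical results; the difference is purely where the per-group loop runs.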

For example, since these are all wrapped in a single route iteration, the whole operation is embarrassingly parallelizable.
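A minimal sketch of what that route-level parallelization could look like, assuming a hypothetical process_route worker standing in for the real per-route edge/wait computation:

```python
from multiprocessing import Pool


def process_route(route_id):
    # Placeholder for the per-route work; in the real code this would
    # run the wait-time and interpolation steps for a single route.
    return route_id, route_id * 2


if __name__ == "__main__":
    route_ids = list(range(8))
    # Each route is independent, so the iteration maps cleanly
    # onto a worker pool.
    with Pool(processes=4) as pool:
        results = dict(pool.map(process_route, route_ids))
    print(len(results))
```

The per-route results would then be concatenated back into the combined edge/wait tables.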

@kuanb


Owner

kuanb commented Apr 11, 2018

Parallelization with performant pickling enabled via #12

@kuanb


Owner

kuanb commented Apr 21, 2018

Noticing that the unaccounted-for stop ID management step is taking quite a while:

Some unaccounted for stop ids. Resolving 2457...

^ Example from LA Metro GTFS zip file.

@kuanb


Owner

kuanb commented Jun 30, 2018

On smaller feeds (or even mid-sized feeds, like AC Transit), MP is slower. I need to figure out how to intelligently navigate away from using MP in these situations.

Sigh, this whole performance issue is not good.

Example:

%%time
import time

st = time.time()
G_orig = pt.load_feed_as_graph(feed, start, end)
et = time.time()

# Runtime in seconds
print(round(et - st, 2))

The above was run once with MP set to False and once with it set to True.

No MP:

238.4
CPU times: user 3min 57s, sys: 350 ms, total: 3min 57s
Wall time: 3min 58s

Yes MP:

286.01
CPU times: user 1min 13s, sys: 390 ms, total: 1min 14s
Wall time: 4min 46s
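One way to "intelligently navigate away" from MP on small and mid-sized feeds would be a simple size-based heuristic. This is purely a sketch with a made-up threshold and a hypothetical function name; the cutoff would need empirical tuning against feeds like AC Transit vs. LA Metro:

```python
def should_use_multiprocessing(n_routes, threshold=500):
    # Hypothetical heuristic: only pay the process start-up and
    # pickling overhead when there are enough routes to spread
    # across workers. The threshold value is illustrative only.
    return n_routes >= threshold


# Illustrative feed sizes, not measured values
assert should_use_multiprocessing(2000) is True
assert should_use_multiprocessing(150) is False
```

A more robust version could consider trip count or available CPU count rather than route count alone.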
@kuanb


Owner

kuanb commented Jul 12, 2018

Huge performance gain found right here: #87

(Thank you @yiyange)

@kuanb


Owner

kuanb commented Jul 15, 2018

Updated performance, with the last few updates incorporated (see all commits from Wed to today):

Without MP: 87.5s (63.3% faster)
With MP: 93.97s (67% faster)

cc @yiyange

@yiyange


Collaborator

yiyange commented Jul 16, 2018

I am curious in what cases using multiprocessing is faster; when I played with it, it was much slower than without it.

@kuanb


Owner

kuanb commented Jul 16, 2018

There is a higher initialization cost to using multiprocessing. The gains can be seen primarily on larger datasets, such as LA Metro. I should benchmark that.
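That initialization cost is easy to demonstrate in isolation. A small sketch (trivial toy workload, nothing from the real codebase): for cheap per-item work, pool start-up and argument pickling typically dominate, which is why serial wins on small feeds.

```python
import time
from multiprocessing import Pool


def square(x):
    # Deliberately tiny per-item work, so overhead dominates
    return x * x


if __name__ == "__main__":
    data = list(range(1000))

    t0 = time.time()
    serial = [square(x) for x in data]
    serial_s = time.time() - t0

    t0 = time.time()
    with Pool(processes=2) as pool:
        parallel = pool.map(square, data)
    pool_s = time.time() - t0

    # Identical results; for work this cheap, pool_s is usually
    # larger than serial_s because of start-up and pickling costs.
    assert serial == parallel
```

Only when each unit of work is expensive (e.g. per-route processing on a large feed) does the parallel speedup outweigh that fixed cost.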

@kuanb kuanb closed this Jul 16, 2018

@kuanb kuanb reopened this Jul 16, 2018

@kuanb


Owner

kuanb commented Jul 16, 2018

Whoops sorry didn't mean to close.

@kuanb


Owner

kuanb commented Jul 16, 2018

LA Metro (without digging around for the exact numbers) used to take 12-15 minutes.

It now takes:
Without MP: 231s
With MP: 229s

So, no observable improvement. Of course, it's running in a Docker environment that only has access to 2 CPUs on my 2016 MacBook Pro. A better test would be to use a virtual machine on AWS / GCloud or wherever and see what gains are achieved there.

That said, we can observe that there are pretty limited (essentially no observable) gains to be had by MP for the typical user/use case (local machine, in a notebook-like environment). This is something that should be addressed long term.

@kuanb kuanb added the help wanted label Oct 9, 2018
