Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup startup and toolbox operations #3909

Merged
merged 53 commits into from May 2, 2017

Conversation

Projects
None yet
4 participants
@mvdbeek
Copy link
Member

commented Apr 10, 2017

@bgruening mentioned slow toolbox reloading in gitter a while back, so I had a look at what where major things that could be sped up.

The most effective speedup for reloading the toolbox on my dev instance is rebuilding the search index only for the tools that have changed since the toolbox reloads. The speedup varies with how many tools are installed of course, for me this went from ~2-3 seconds to 0.4 seconds.

In terms of startup times the most effective change is caching the ToolShedRepositories for 1 second in 1b6da12 and always fetching the ToolVersion objects from the cache in 4d5dd2d. This should be long enough to be able to load the complete toolbox with just a single emitted db query. This reduced the time to build the full galaxy app object from ~ 7.7 seconds to ~ 4.7 seconds. The more tools you have the better the speedup, I guess.

Data table reloading was also quite slow. It turns out there were hundreds of os.path.exists() calls, that even on my SSD were taking quite a while to finish. I addressed that in 12f7dca

I think this may even speedup other areas in galaxy, just running the unittests went from 140 seconds to 80 seconds locally. I guess everything that accesses a bunch of tools sequentially will benefit from this.

app.reindex_tool_search(new_toolbox)
app.toolbox = new_toolbox
app.reindex_tool_search()
app.tool_cache.reset_status()
return new_toolbox

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Apr 10, 2017

Member

You can remove the return statement.

if time.time() - self.update_time > 1:
content = os.walk(self.tool_data_path)
try:
self._tool_data_path_files = set([os.path.join(dirpath, fn) for dirpath, _, fn_list in content for fn in fn_list if fn and fn.endswith('.loc') or fn.endswith('.loc.sample')])

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Apr 10, 2017

Member

You don't need the squared parentheses inside set(), i.e. you can just write set(os.path.join...or fn.endswith('.loc.sample'))

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Apr 10, 2017

Author Member

That's a nice one, I didn't know about that!

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Apr 10, 2017

Member

Yeah, you use a generator instead of creating the list first and then converting it to a set.

if path in self.tool_data_path_files:
return True
else:
return False

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Apr 10, 2017

Member
return path in self.tool_data_path_files

?

@jmchilton

This comment has been minimized.

Copy link
Member

commented Apr 10, 2017

A few things:

  • This reduced the time to build the full galaxy app object from ~ 7.7 seconds to ~ 4.7 seconds.

    Wow - that is amazing - very nice!

  • The API test are indicating there is something wrong with the data table loading I think.

  • I opened a PR against your branch to normalize the log messages with other time tracking I do for various tool stuff - I use a fixed pattern to parse out timing information - there are some scripts in Galaxy to parse this stuff out. I think the pattern makes the log statements in Python look a little nicer as well.

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Apr 10, 2017

Thanks @jmchilton @nsoranzo indeed, there are still some things left to be worked out, I just wanted to get some results from jenkins (your early feedback is great, of course!) ... somehow the toolshed tests are failing even on dev. Probably OSX to blame.

@mvdbeek mvdbeek force-pushed the mvdbeek:speedup_toolbox branch from 2e1eea8 to 3549a65 Apr 10, 2017

return add_doc_kwds

def update_index(self, tool_cache, index_help=True):
"""Use `tool_cache` to determine which tools need indexing and which tools should be expired."""

This comment has been minimized.

Copy link
@martenson

martenson Apr 10, 2017

Member

very nice!

@mvdbeek mvdbeek force-pushed the mvdbeek:speedup_toolbox branch 4 times, most recently from 882f815 to 9e853ef Apr 10, 2017

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2017

Thanks @jmchilton ... I think I'm close to the optimum for the things that I can improve without real-life feedback. I've brought the startup time down from 38 +/- 4 seconds on my server to 17+/- 2 seconds (the server does not yet have huge amount of tools installed, it's running in docker and some of the things are stored on a relatively slow network drive ... YMMV). Profiling shows me that most of the time is being spent now in parsing xml, so apart from moving everything to yaml or getting cElementTree to work I don't think there is much left that I can do.

For the toolbox reloading part I strongly suspect that @bgruening's reloading slowness was due to deactivated or outdated entries in the shed_tool_conf.xml file. Those entries would never enter the tool_cache, so every single one of them was triggering a new db query. I hope I solved this with the ToolShedRepository cache. In any case I added a few timing statement here and there, so that we have a better idea of what could be slow.

@mvdbeek mvdbeek changed the title [WIP] Speedup startup and toolbox operations Speedup startup and toolbox operations Apr 17, 2017

@mvdbeek mvdbeek added status/review and removed status/WIP labels Apr 17, 2017

@jmchilton jmchilton closed this Apr 17, 2017

@jmchilton jmchilton reopened this Apr 17, 2017

@jmchilton

This comment has been minimized.

Copy link
Member

commented Apr 17, 2017

Opps sorry about closing that - I meant to cancel my half written comment not close your PR.

@martenson
Copy link
Member

left a comment

I have 91 tools locally and this PR:

Increases my startup toolbox indexing time from ~900ms to ~1.4s.
Decreases my tool installation index rebuilding down to 33ms from ~700ms.
Leaves my startup time at ~8s.

Overall very nice improvement on the installation rebuild time.

@mvdbeek Can we somehow clean the async queue for index rebuilding if Galaxy goes down? It is useless at that point.

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2017

Leaves my startup time at ~8s.

Hmm, that's a bit surprising, because that means the startup actually became slower, since the toolbox indexing doesn't count towards startup time anymore, or did you include the indexing in the timing?

Are you using sqlite locally ? It's possible that the ToolShedRepositoryCache doesn't do much for you if you are not using sqlite.

Decreases my tool installation index rebuilding down to 33ms from ~700ms.

That was the primary objective, so hooray :)

@mvdbeek Can we somehow clean the async queue for index rebuilding if Galaxy goes down? It is useless at that point.

Did you actually manage to queue up indexing form a previous restart?
I think we can check that the toolbox had been reloaded at least once before executing the task.
Which is not the same as cleaning the queue. @dannon do you have a suggestion of how to do that?

@martenson

This comment has been minimized.

Copy link
Member

commented Apr 19, 2017

@mvdbeek I have postgres on a very quick machine with ssd.

I included the index in timing. So ~6.7 without I guess.

And yeah, after one restart I had three queued up indexing tasks.

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2017

And yeah, after one restart I had three queued up indexing tasks.

OK, we should definitely address that before merging. And I have seen an instance on my server where the search index had disappeared (no tools found by typing in the search box) until I triggered a toolbox reload.

@mvdbeek mvdbeek changed the title Speedup startup and toolbox operations WIP: Speedup startup and toolbox operations Apr 19, 2017

@mvdbeek mvdbeek added the status/WIP label Apr 19, 2017

@mvdbeek mvdbeek force-pushed the mvdbeek:speedup_toolbox branch from a2c40a4 to af156a8 May 1, 2017

@mvdbeek mvdbeek changed the title WIP: Speedup startup and toolbox operations Speedup startup and toolbox operations May 1, 2017

@mvdbeek mvdbeek added status/review and removed status/WIP labels May 1, 2017

@mvdbeek

This comment has been minimized.

Copy link
Member Author

commented May 1, 2017

OK, I am building the toolbox search index during startup as we used to (af156a8), because doing that via the task queue didn't work on my production instance, which would start up with an empty tool search and only fill the search once the toolbox had been reloaded once. I am keeping track of how many times the toolbox had been reloaded, and reindex the search only if it has been rebuild less often than the toolbox. This way we ignore queued up rebuild requests. I think this is now ready for a final round of review (ping @martenson).

@martenson
Copy link
Member

left a comment

Thank you Marius, again, a fabulous contribution! 💯

@martenson martenson merged commit a6415b2 into galaxyproject:dev May 2, 2017

5 of 6 checks passed

continuous-integration/travis-ci/push The Travis CI build is in progress
Details
api test Build finished. 275 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 148 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 34 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details

@erasche erasche added this to Open Issues in Performance Sep 3, 2017

@erasche erasche removed this from Open Issues in Performance Sep 3, 2017

@mvdbeek mvdbeek deleted the mvdbeek:speedup_toolbox branch Aug 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.