Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup startup and toolbox operations #3909

Merged
merged 53 commits into from
May 2, 2017

Conversation

mvdbeek
Copy link
Member

@mvdbeek mvdbeek commented Apr 10, 2017

@bgruening mentioned slow toolbox reloading in gitter a while back, so I had a look at what where major things that could be sped up.

The most effective speedup for reloading the toolbox on my dev instance is rebuilding the search index only for the tools that have changed since the toolbox reloads. The speedup varies with how many tools are installed of course, for me this went from ~2-3 seconds to 0.4 seconds.

In terms of startup times the most effective change is caching the ToolShedRepositories for 1 second in 1b6da12 and always fetching the ToolVersion objects from the cache in 4d5dd2d. This should be long enough to be able to load the complete toolbox with just a single emitted db query. This reduced the time to build the full galaxy app object from ~ 7.7 seconds to ~ 4.7 seconds. The more tools you have the better the speedup, I guess.

Data table reloading was also quite slow. It turns out there were hundreds of os.path.exists() calls, that even on my SSD were taking quite a while to finish. I addressed that in 12f7dca

I think this may even speedup other areas in galaxy, just running the unittests went from 140 seconds to 80 seconds locally. I guess everything that accesses a bunch of tools sequentially will benefit from this.

@mvdbeek mvdbeek added area/performance area/tools area/toolshed kind/refactoring cleanup or refactoring of existing code, no functional changes status/WIP labels Apr 10, 2017
@mvdbeek mvdbeek added this to the 17.05 milestone Apr 10, 2017
app.reindex_tool_search(new_toolbox)
app.toolbox = new_toolbox
app.reindex_tool_search()
app.tool_cache.reset_status()
return new_toolbox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the return statement.

if time.time() - self.update_time > 1:
content = os.walk(self.tool_data_path)
try:
self._tool_data_path_files = set([os.path.join(dirpath, fn) for dirpath, _, fn_list in content for fn in fn_list if fn and fn.endswith('.loc') or fn.endswith('.loc.sample')])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the squared parentheses inside set(), i.e. you can just write set(os.path.join...or fn.endswith('.loc.sample'))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice one, I didn't know about that!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you use a generator instead of creating the list first and then converting it to a set.

if path in self.tool_data_path_files:
return True
else:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return path in self.tool_data_path_files

?

@jmchilton
Copy link
Member

A few things:

  • This reduced the time to build the full galaxy app object from ~ 7.7 seconds to ~ 4.7 seconds.

    Wow - that is amazing - very nice!

  • The API test are indicating there is something wrong with the data table loading I think.

  • I opened a PR against your branch to normalize the log messages with other time tracking I do for various tool stuff - I use a fixed pattern to parse out timing information - there are some scripts in Galaxy to parse this stuff out. I think the pattern makes the log statements in Python look a little nicer as well.

@mvdbeek
Copy link
Member Author

mvdbeek commented Apr 10, 2017

Thanks @jmchilton @nsoranzo indeed, there are still some things left to be worked out, I just wanted to get some results from jenkins (your early feedback is great, of course!) ... somehow the toolshed tests are failing even on dev. Probably OSX to blame.

return add_doc_kwds

def update_index(self, tool_cache, index_help=True):
"""Use `tool_cache` to determine which tools need indexing and which tools should be expired."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@mvdbeek mvdbeek force-pushed the speedup_toolbox branch 4 times, most recently from 882f815 to 9e853ef Compare April 16, 2017 17:36
@mvdbeek
Copy link
Member Author

mvdbeek commented Apr 17, 2017

Thanks @jmchilton ... I think I'm close to the optimum for the things that I can improve without real-life feedback. I've brought the startup time down from 38 +/- 4 seconds on my server to 17+/- 2 seconds (the server does not yet have huge amount of tools installed, it's running in docker and some of the things are stored on a relatively slow network drive ... YMMV). Profiling shows me that most of the time is being spent now in parsing xml, so apart from moving everything to yaml or getting cElementTree to work I don't think there is much left that I can do.

For the toolbox reloading part I strongly suspect that @bgruening's reloading slowness was due to deactivated or outdated entries in the shed_tool_conf.xml file. Those entries would never enter the tool_cache, so every single one of them was triggering a new db query. I hope I solved this with the ToolShedRepository cache. In any case I added a few timing statement here and there, so that we have a better idea of what could be slow.

@mvdbeek mvdbeek changed the title [WIP] Speedup startup and toolbox operations Speedup startup and toolbox operations Apr 17, 2017
@jmchilton jmchilton closed this Apr 17, 2017
@jmchilton jmchilton reopened this Apr 17, 2017
@jmchilton
Copy link
Member

Opps sorry about closing that - I meant to cancel my half written comment not close your PR.

Copy link
Member

@martenson martenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 91 tools locally and this PR:

Increases my startup toolbox indexing time from ~900ms to ~1.4s.
Decreases my tool installation index rebuilding down to 33ms from ~700ms.
Leaves my startup time at ~8s.

Overall very nice improvement on the installation rebuild time.

@mvdbeek Can we somehow clean the async queue for index rebuilding if Galaxy goes down? It is useless at that point.

@mvdbeek
Copy link
Member Author

mvdbeek commented Apr 19, 2017

Leaves my startup time at ~8s.

Hmm, that's a bit surprising, because that means the startup actually became slower, since the toolbox indexing doesn't count towards startup time anymore, or did you include the indexing in the timing?

Are you using sqlite locally ? It's possible that the ToolShedRepositoryCache doesn't do much for you if you are not using sqlite.

Decreases my tool installation index rebuilding down to 33ms from ~700ms.

That was the primary objective, so hooray :)

@mvdbeek Can we somehow clean the async queue for index rebuilding if Galaxy goes down? It is useless at that point.

Did you actually manage to queue up indexing form a previous restart?
I think we can check that the toolbox had been reloaded at least once before executing the task.
Which is not the same as cleaning the queue. @dannon do you have a suggestion of how to do that?

@martenson
Copy link
Member

@mvdbeek I have postgres on a very quick machine with ssd.

I included the index in timing. So ~6.7 without I guess.

And yeah, after one restart I had three queued up indexing tasks.

@mvdbeek
Copy link
Member Author

mvdbeek commented Apr 19, 2017

And yeah, after one restart I had three queued up indexing tasks.

OK, we should definitely address that before merging. And I have seen an instance on my server where the search index had disappeared (no tools found by typing in the search box) until I triggered a toolbox reload.

@mvdbeek mvdbeek changed the title Speedup startup and toolbox operations WIP: Speedup startup and toolbox operations Apr 19, 2017
@mvdbeek mvdbeek changed the title WIP: Speedup startup and toolbox operations Speedup startup and toolbox operations May 1, 2017
@mvdbeek
Copy link
Member Author

mvdbeek commented May 1, 2017

OK, I am building the toolbox search index during startup as we used to (af156a8), because doing that via the task queue didn't work on my production instance, which would start up with an empty tool search and only fill the search once the toolbox had been reloaded once. I am keeping track of how many times the toolbox had been reloaded, and reindex the search only if it has been rebuild less often than the toolbox. This way we ignore queued up rebuild requests. I think this is now ready for a final round of review (ping @martenson).

Copy link
Member

@martenson martenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Marius, again, a fabulous contribution! 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance area/tools area/toolshed kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants