Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

41% reduction in freiburg galaxy startup time #4495

Merged
merged 4 commits into from Sep 4, 2017

Conversation

@erasche
Copy link
Member

commented Aug 25, 2017

So it turns out the shed_tool_conf.xml (and every other tool conf) was being re-parsed ahead of every. single. tool shed repository. For large galaxies with large numbers of installed tools.. this sucks. A single (and therefor unscientific datapoint), boot time was decreased from 12 minutes to 7 minutes.

cc @bgruening

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Aug 25, 2017

Isn't get_repository_install_dir() called only for proprietary datatypes (not for "normal" TS tool repositories)?

UniverseApplication.__init__() -> ConfiguresGalaxyMixin._configure_datatypes_registry() -> InstalledRepositoryManager.load_proprietary_datatypes() which executes:

        for tool_shed_repository in self.context.query(self.install_model.ToolShedRepository) \
                                                .filter(and_(self.install_model.ToolShedRepository.table.c.includes_datatypes == true(),
                                                             self.install_model.ToolShedRepository.table.c.deleted == false())) \
                                                .order_by(self.install_model.ToolShedRepository.table.c.id):
            relative_install_dir = self.get_repository_install_dir(tool_shed_repository)
self.tool_trees = []
for tool_config in self.tool_configs:
tree, error_message = xml_util.parse_xml(tool_config)
log.error(error_message)

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Aug 25, 2017

Member
            if error_message:
                log.error(error_message)

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 25, 2017

Member

Could we just store those tool_path attributes ? Seems that is the only thing needed here.

This comment has been minimized.

Copy link
@dannon

dannon Aug 25, 2017

Member

+1 to mvdbeek's suggestion. And, to be certain, these aren't modified after startup, correct?

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Aug 25, 2017

Member

An admin could do this by hand, but those shouldn't be changed in normal operation.

@erasche

This comment has been minimized.

Copy link
Member Author

commented Aug 25, 2017

@nsoranzo absolutely, you're right. Only for proprietary datatypes. On our instance, this seems to be ~300 repos, and parsing of the toolshed XML is quite slow.

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Aug 25, 2017

I'll just throw this out here: I noticed while working on restart performance that restarts tend to be faster when they happen successively on our production server. In my logs I can see that restart time is about 30-40% slower when I haven't restarted recently. I don't exactly know what's going on there (maybe some NFS caching ?). Though the change makes sense to me, could well be that this speeds things up enormously.

@erasche

This comment has been minimized.

Copy link
Member Author

commented Aug 25, 2017

@mvdbeek oh, I'd definitely believe caching. We could be doing a lot more in-app for caching. Tool loading is the next largest time sink for us, I'm curious about caching pre-parsed versions of those (e.g. with messagepack or cPickle or some cheap binary object serialisation) to avoid the overhead of re-parsing every tool on every reboot. As @bgruening will tell you, this is a lower priority for us since switching to zerg mode, but I'd still be interested in it for quality-of-life.

@erasche erasche added this to In Progress in Performance Aug 25, 2017

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Aug 28, 2017

@erasche I think you can address the failing unit tests by making sure that the path to the tool conf file exists, and then log an error if it doesn't instead of failing. I think that would be good behavior anyway.

Yeah, regarding tool loading I've done a lot of profiling, it's the actual XML parsing that is slow, and it doesn't seem to improve by much when switching to cElementTree instead of ElementTree. I wonder if we wouldn't gain something by converting tools to yaml.

erasche added some commits Aug 28, 2017

@erasche erasche moved this from In Progress to Startup in Performance Aug 31, 2017

@jmchilton jmchilton merged commit 8f13032 into galaxyproject:dev Sep 4, 2017

5 of 6 checks passed

api test Build finished. 284 tests run, 0 skipped, 1 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 161 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 44 tests run, 0 skipped, 0 failed.
Details
lgtm analysis: JavaScript No alert changes
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
@jmchilton

This comment has been minimized.

Copy link
Member

commented Sep 4, 2017

Great - thanks for the performance fixes @erasche.

@mvdbeek excellent insights - thanks!

@erasche erasche referenced this pull request Sep 7, 2017

Open

[RFC] Caching Framework #4571

@bgruening bgruening deleted the erasche:parse-once branch Oct 4, 2017

@martenson martenson moved this from Startup to Merged in Performance Feb 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.