New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache depencencies on the fly when first used #3348

Merged
merged 4 commits into from Jan 1, 2017

Conversation

Projects
None yet
6 participants
@abretaud
Copy link
Contributor

abretaud commented Dec 21, 2016

Hi,
A tiny patch to allow caching (conda) dependencies on the fly when a tool is executed the first time (well, anytime the cache is not found in fact).

There is some discussion about this in galaxyproject/planemo#612

Just tested it with planemo, it seems to work fine

ping @mvdbeek @jmchilton

馃巺 馃巺 馃巺

@galaxybot galaxybot added the triage label Dec 21, 2016

@galaxybot galaxybot added this to the 17.01 milestone Dec 21, 2016

"""
resolved_dependencies = self.requirements_to_dependencies(requirements, **kwds)
cacheable_dependencies = [dep for dep in resolved_dependencies.values() if dep.cacheable]
hashed_dependencies_dir = self.get_hashed_dependencies_path(cacheable_dependencies)
if os.path.exists(hashed_dependencies_dir):
if not os.path.exists(hashed_dependencies_dir):

This comment has been minimized.

@mvdbeek

mvdbeek Dec 21, 2016

Member

Could you add a check for conda_auto_install here? I am very much afraid of triggering this at random (whenever these dependencies are being resolved for the first time...) timepoints on production instances.

if not os.path.exists(hashed_dependencies_dir):
# Cache not present, try to create it
self.build_cache(requirements, **kwds)
if os.path.exists(hashed_dependencies_dir): # Check caching was successfull

This comment has been minimized.

@mvdbeek

mvdbeek Dec 21, 2016

Member

You'll need one more space before the comment here.

@mvdbeek

This comment has been minimized.

Copy link
Member

mvdbeek commented Dec 21, 2016

If you could add the check for conda_auto_install you have my 馃憤.
Though ideally I would prefer to see a separate setting for this behavior, i.e. build_cache_on_first_run or sth. like that, given that the cached environments are -- in principle -- not bound to conda, so it's a bit counter-intuitive to couple this to conda_auto_install. We could still make build_cache_on_first_run default to True if conda_auto_install is set to true.

@abretaud

This comment has been minimized.

Copy link
Contributor

abretaud commented Dec 21, 2016

Ok, I added the build_cache_on_first_run option (hope it won't make too many options for conda related features!)
I'm hesitant to make it default to True if conda_auto_install is True. It seems a cool behavior, but I fear it would bring some confusion (ie I set it to False in galaxy.ini but it is magically set to True internally)

For planemo, I guess we could set build_cache_on_first_run to True when --use_cached_dependency_manager?

@mvdbeek

This comment has been minimized.

Copy link
Member

mvdbeek commented Dec 21, 2016

That looks very useful, thanks for looking into this 馃憤

@mvdbeek

This comment has been minimized.

Copy link
Member

mvdbeek commented Dec 21, 2016

I'm hesitant to make it default to True if conda_auto_install is True. It seems a cool behavior, but I fear it would bring some confusion (ie I set it to False in galaxy.ini but it is magically set to True internally)

For planemo, I guess we could set build_cache_on_first_run to True when --use_cached_dependency_manager?

You're right, this might be a bit surprising. I think that for planemo we could make use_cached_dependency_manager and build_cache_on_first_run the default if conda_auto_install is specified.

"""
resolved_dependencies = self.requirements_to_dependencies(requirements, **kwds)
cacheable_dependencies = [dep for dep in resolved_dependencies.values() if dep.cacheable]
hashed_dependencies_dir = self.get_hashed_dependencies_path(cacheable_dependencies)
if not os.path.exists(hashed_dependencies_dir) and self.extra_config['build_cache_on_first_run'] == True:

This comment has been minimized.

@mvdbeek

mvdbeek Dec 21, 2016

Member

Travis is complaining here: comparison to True should be 'if cond is True:' or 'if cond:'

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented Dec 21, 2016

I think I like the direction we are heading here - and I certainly think y'all are correct that planemo should be used to setup these defaults correctly if --conda_auto_install.

I do have two changes I'd like to at least throw out there.

  • I feel like precache_dependencies would be a better name.
  • I also feel like the default should be True, though maybe I'm wrong? If you are using a cached dependency manager you want the dependencies cached right?
@nsoranzo

This comment has been minimized.

Copy link
Member

nsoranzo commented Dec 21, 2016

I think we already have many options for Conda and this one is not needed.

I am very much afraid of triggering this at random (whenever these dependencies are being resolved for the first time...) timepoints on production instances.

@mvdbeek Building the cached env probably takes the same time of creating the Conda env for the job (which is later destroyed), no?

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented Dec 21, 2016

@nsoranzo I disagree a bit - if it defaults to True no one needs to worry about it by default. There are scenarios where you want to disable it - for instance if you want to have your dependencies mounted read-only (very reasonable if your trying to lock things down for security reasons).

@nsoranzo

This comment has been minimized.

Copy link
Member

nsoranzo commented Dec 21, 2016

@nsoranzo I disagree a bit - if it defaults to True no one needs to worry about it by default. There are scenarios where you want to disable it - for instance if you want to have your dependencies mounted read-only (very reasonable if your trying to lock things down for security reasons).

Fair enough, +1 if it defaults to True.

@mvdbeek

This comment has been minimized.

Copy link
Member

mvdbeek commented Dec 21, 2016

Building the cached env probably takes the same time of creating the Conda env for the job (which is later destroyed), no?

That's true ... I guess it's not that big a deal from this angle.
I was just thinking about bioperl, which takes a good 15 minutes to build, but it's true that that would also happen if it's set to false.

@bgruening

This comment has been minimized.

Copy link
Member

bgruening commented Jan 1, 2017

@abretaud great addition!

@bgruening bgruening merged commit 408254f into galaxyproject:dev Jan 1, 2017

4 checks passed

api test Build finished. 244 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 132 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 580 tests run, 0 skipped, 0 failed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment