Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Dependency Resolver Mapping #3444

Merged
merged 12 commits into from Jan 23, 2017

Conversation

Projects
None yet
6 participants
@jmchilton
Copy link
Member

commented Jan 18, 2017

  • 495802d: Implements the ability to map requirement names and/or versions into other names and/or versions for specific sites. This is implemented for the Conda, Galaxy Package, and environment module dependency resolver. I will follow up with more documentation if merged - but the unit tests show off the format. Implements framework #1927.
  • 81d71d2: Implements specification expansion as outlined in the CWL, but per the discussion in #3117 I have stripped the ability for Galaxy tools to actually leverage this functionality in 2474da0. It is still important library functionality for galaxy-lib though so the guts of dependency resolution still allow it when coming from CWL tools.
  • ca9adfd: Refactors how different resolvers can be configured. It is more modular and a better design (it implements an important TODO block from the code and allows some of these other modifications).
  • 6de74a7: Previously in #3391 we added the ability for resolvers to resolve all dependencies for a tool together. This introduced some design technical debt because resolve_all took in ToolRequirement objects whereas resolve took in name, version, and type specifically. Mapping and specification handling is a lot cleaner the interface more cohesive when both methods just consume ToolRequirements - so this change is made in 6de74a7.
  • More minor refactorings are found in 5fc5ab1 and 4a544e9 that generally increase the consistency of this package.

This is labelled as 17.05 but if anyone wants to merge it before #3433 is merged and we branch - I wouldn't complain 馃槈. It will give us greater flexibility to adapt old tools to new Conda packages.

@jxtx

This comment has been minimized.

Copy link
Contributor

commented Jan 18, 2017

image

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jan 18, 2017

This looks excellent! 馃憤 (also for merging before branching off 17.01)
I also wouldn't mind if we do implement the spec part, this should help with perfectly valid arguments like this, right?

It will give us greater flexibility to adapt old tools to new Conda packages.

For this we'd need to ship a template with galaxy, right?

I think we could even strip out the cached_dependency stuff ... this should become a corner case where a tool defines both toolshed and conda dependencies that differ in name or version. So if people do run into locking issues, all they (or us if we ship a template) have to do is map the redundant dependency onto the dependency that can be resolved. We just need to do a set on the requirements.

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 18, 2017

For this we'd need to ship a template with galaxy, right?

Yeah - but I think the pieces to do that are here in this file. I'd consider it to be a bug fix to ship this file after the release myself. I can add an empty variant of such a file if you'd like.

I also wouldn't mind if we do implement the spec part

Obviously I'm on board for that too but lets keep that out of this PR. I feel like a lot of what happened in the failed #3117 was people stepping up and describing use cases and the desire to implement the mapping stuff (now in this PR) - lets get this in as is and then we can think about use cases driving the specification stuff and make an argument for with the mapping stuff as given.

if people do run into locking issues, all they (or us if we ship a template) have to do is map the redundant dependency onto the dependency that can be resolved. We just need to do a set on the requirements.

That is an excellent point.

@jmchilton jmchilton force-pushed the jmchilton:deps_framework_overhaul branch from 495802d to 039b206 Jan 18, 2017

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 18, 2017

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 18, 2017

Rebased so all the links above are invalid. But I've added a variant of #3433 and a default Conda mapping file in 039b206 along with a test case that verifies R is mapped to "r-base" when using the Conda resolver.

Update: Actually forget to git add some stuff - maybe a95eebb is a better attempt at the sample and default stuff.

@jmchilton jmchilton force-pushed the jmchilton:deps_framework_overhaul branch 2 times, most recently from a95eebb to 89b920d Jan 18, 2017

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 19, 2017

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 19, 2017

I'm fairly confident that failing API test is unrelated.

@bgruening

This comment has been minimized.

Copy link
Member

commented Jan 19, 2017

@jmchilton can we also map such things? https://github.com/galaxyproject/galaxy/blob/dev/tools/plotting/bar_chart.xml#L39
it would make a lot of sense to map the tools shipped with Galaxy.

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 19, 2017

@bgruening We can just update those to use the actual conda requirements directly because requirements of type python-module have never done anything functional - there is no worry about breaking backward compatibility.

@bgruening

This comment has been minimized.

Copy link
Member

commented Jan 19, 2017

Fair enough, but then we can remove these deps from the tool and add real conda deps? And you could iterate during startup over all internal tools and install deps?

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 19, 2017

Make requirement tags in bar_chart.xml functional with default depend鈥
鈥ncy resolver.

I'm not sure if we need to update the version since previously these tags were purely for documentation and had no functional purpose.

xref this comment galaxyproject#3444 (comment) from @bgruening
@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 19, 2017

@bgruening I suppose - the PR that adds the script that iterates over all the tools and creates missing Conda environments is #3430 - no one really clamored to have it in the release so it remains as WIP. If we want to have it in 17.01 - I can rebase it on top of this PR and keep working on polishing it up. It might take another day or so. It takes so long though and certain things are always going to fail - that I think it would be better to just document that you can run the script (perhaps each time there is a failed dependency resolution). This would still be a significant step forward in terms of getting a working Galaxy running quickly.

I'm open to working on that - it is up to everyone else how much we delay the release for this stuff. Also it would be nice to have a UI - that just showed all the tools and gave a little install button to run through and do this work.

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jan 21, 2017

It takes so long though and certain things are always going to fail - that I think it would be better to just document that you can run the script (perhaps each time there is a failed dependency resolution).

I agree with you here. Maybe we can collapse identical requirements, so that it's not attempting X times to install the thing that fails, but even then I think this needs a UI and shouldn't delay branching.

I'm open to working on that - it is up to everyone else how much we delay the release for this stuff. Also it would be nice to have a UI - that just showed all the tools and gave a little install button to run through and do this work.

I agree on that as well, but for me this PR is good and ready to go -- I would be in favor of merging.

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jan 21, 2017

@galaxybot test this

@bgruening

This comment has been minimized.

Copy link
Member

commented Jan 22, 2017

I'm a little bit scared by this large PR, but I'm also in favor of merging it now.
@jmchilton the script that iterates over all tools (maybe datatypes) can wait until the next release I think. We might test this more extensively for all tools and also add features maybe to solve datatype dependencies?

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2017

I'm also in favor of merging it now.

Merge it then 馃槈, we can delay the tool installation portion for the next release I'm fine with that.

@bgruening

This comment has been minimized.

Copy link
Member

commented Jan 23, 2017

Sorry, can you merge in dev, I will merge it asap then!

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2017

Doh - that went very stale, very quickly. I'm working on it.

@jmchilton jmchilton force-pushed the jmchilton:deps_framework_overhaul branch from 89b920d to 47e39b0 Jan 23, 2017

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 23, 2017

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2017

Okay - I rebased everything on top of dev again. There was some oddness when merging with differences in #3461. I've looked over the resulting diff and ran the tool dependency related tests though and things seem fine.

return ToolRequirement( name=name, type=type, version=version, specs=specs )

def __eq__(self, other):
return self.name == other.name and self.type == other.type and self.version == other.version

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jan 23, 2017

Member

This is only correct if we assume that specs are the same if name type and version are the same, otherwise we also need to check that the specs are equal (same for __hash__).

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jan 23, 2017

Member

Also it looks like something went wrong (duplication of lines 75-82 41-48?)

This comment has been minimized.

Copy link
@jmchilton

jmchilton Jan 23, 2017

Author Member

Yeah - I messed something up - I had hand copied something are the rebase from around here - let me take a look.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Jan 23, 2017

Author Member

So I figured out what happened - it took these lines and added them to RequirementSpecification (or put the class definition inside of ToolRequirement) and I didn't notice. Then the tests were failing so I added the eq, hash, etc... lines back to ToolRequirement not noticing that RequirementSpecification had the wrong lines.

I've pushed a commit to fix this I think and I think I also handled specs correctly in the eq and hash calculations - but let me know if you think they could be improved or need to be fixed.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jan 23, 2017

Member

Yep, looking good. I'm gonna merge once the tests pass.

jmchilton added some commits Nov 10, 2016

More modular config option handling for dependency resolvers.
The handling of global configuration options for Conda specific options was hacky and a poor design. This is much more modular and allows the elegant cascading option sources for all resolvers.
Allow finer grain specification of requirements.
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve #1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can.

To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this:

```
<requirement type="package" version="2.2.31" name="blast+">
```

Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both.

This introduces the following syntax for tools with profile >= 16.10:

```
<requirement type="package" version="2.2.31" name="blast+">
    <specification uri="https://anaconda.org/bioconda/blast" />
    <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" />
</requirement>
```

This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so).

This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10:

```
<requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/blast@2.2.31,https://packages.debian.org/jessie/ncbi-blast+@2.2.29-3">blast+</requirement>
```

This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies.

For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage.

Conflicts:
	lib/galaxy/tools/deps/__init__.py
	test/functional/tools/samples_tool_conf.xml
Give resolver's with a resolve_all method an interface.
This follows the existing pattern within the module.
Change interface of depependency resolver's resolve for consistency.
If resolve_all takes in requirements, then so should resolve.
Remove tool XML modifications for requirement specifications.
Per discussion on #3117 - some people were in favor of the idea but some were opposed. I think it is still important to implement this for CWL tools and useful from a library perspective. This just removes changes to Galaxy tool XML.
Outline infrastructure for managing Conda installations outside of Ga鈥
鈥axy.

- Construct a DependencyManager according to Galaxy's configuration before Galaxy starts to allow auto installation of Conda.
- Add script for managing tool dependencies outside of a Galaxy environment.
- Refactor to allow this to be done without duplicating logic related to defaults, configuration handling, etc....

Implements #3426.
Swap default for Conda "auto_init" to True.
Everyone gets a Conda when Galaxy starts up.
Samples and defaults for Conda mapping.
- Map R to r-base and blast+ to blast for conda dependencies.
- Add sample environment modules file showing off the new mapping feature.
- Add integration test case to verify the default conda mapping (R -> r-base in particular).
Fix unit test test_tool_deps to not auto-install Conda.
This became the default behavior with conda_auto_init defaulting to True.

@jmchilton jmchilton force-pushed the jmchilton:deps_framework_overhaul branch from 47e39b0 to 3e3d41f Jan 23, 2017

@mvdbeek mvdbeek merged commit 98ee91f into galaxyproject:dev Jan 23, 2017

4 checks passed

api test Build finished. 256 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 135 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 580 tests run, 0 skipped, 0 failed.
Details
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jan 23, 2017

Alright, thanks a lot @jmchilton, I think this is a big step ahead!

@@ -22,10 +25,11 @@ class DependencyResolver(Dictifiable, object):
# resolution.
disabled = False
resolves_simple_dependencies = True
config_options = {}
__metaclass__ = ABCMeta

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jan 25, 2017

Member

FYI this is not Python3-safe, the @six.add_metaclass(ABCMeta) decorator should be used instead. There are a few more instances in this file and elsewhere in lib/galaxy/, I will take care of fixing them all.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jan 25, 2017

Member

Done in #3491.

@@ -98,6 +103,11 @@ if [ -z "$GALAXY_CONFIG_FILE" ]; then
export GALAXY_CONFIG_FILE
fi

if [ $INITIALIZE_TOOL_DEPENDENCIES -eq 1 ]; then
# Install Conda environment if needed.
python ./scripts/manage_tool_dependencies.py -c "$GALAXY_CONFIG_FILE" init_if_needed

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jan 25, 2017

Member

@jmchilton @natefoo Should this be run as part of ./scripts/common_startup.sh (called at line 66) instead of only when Galaxy is started through run.sh ?

This comment has been minimized.

Copy link
@jmchilton

jmchilton Jan 26, 2017

Author Member

I really don't think so - this is of no use to the reports app or the tool shed for instance.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jan 26, 2017

Member

You're right, common_startup.sh is not the right place for the reason you mentioned. But I was concerned about servers started through e.g. supervisor by invoking directly uwsgi or ./scripts/paster.py .

This comment has been minimized.

Copy link
@jmchilton

jmchilton Jan 26, 2017

Author Member

They will need to setup Conda - and it would be best to do that manually before hand but Galaxy will attempt to do it on its own also I think (with the multi-process problems that entails). I guess this becomes a documentation problem in my mind - at least until we get to the point where we have a GUI for managing this stuff (doing it from the GUI when an admin logs in would be ideal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.