Resolve Conda Dependencies All at Once #3391

Merged
bgruening merged 10 commits into galaxyproject:dev from jmchilton:resolve_all_at_once on Jan 10, 2017

Conversation

@jmchilton commented Jan 6, 2017

No real progress yet, but I wanted to continue the conversation and the ideas being discussed in #3299.

@jmchilton jmchilton force-pushed the jmchilton:resolve_all_at_once branch 3 times, most recently from adabdc9 to 5b64bf5 Jan 6, 2017

@jmchilton commented Jan 6, 2017

I've made some significant progress on this this afternoon. I've added a test tool that, according to the bug report in #3299, likely wouldn't have worked before, and it seems to work now.

I think what is left is to redo the docs and re-implement the tool shed interaction stuff.

If anyone knows of some potentially problematic tools and wants to help with testing - that would be great.

@jmchilton jmchilton changed the title [WIP] Start work on resolving Conda recipes all together. [WIP] Resolve Conda Dependencies All at Once Jan 6, 2017

@bgruening commented Jan 6, 2017

@galaxybot test this

@bgruening commented Jan 7, 2017

Great work John. I created a Docker image tracking this branch so we can start testing:

docker run -i -t -p 8080:80 quay.io/bgruening/galaxy:jmc_conda

I will also try some flavors based on this branch. If planemo targets your galaxy branch, it will use this new way to resolve dependencies, correct? I would like to test a few IUC packages if I get some computer resources.

We owe you a few 🍺!

@bgruening commented Jan 7, 2017

@mvdbeek commented Jan 7, 2017

If planemo targets your galaxy branch, it will use this new way to resolve dependencies, correct?

My guess is that if you use --conda_auto_install it will. planemo conda_install followed by planemo test will probably get the fallback "one by one" behavior.

@bgruening commented Jan 7, 2017

@mvdbeek it actually works for testing on tools-iuc. Have a look at this branch: https://github.com/galaxyproject/tools-iuc/compare/jmc_conda?expand=1

@mvdbeek commented Jan 7, 2017

As far as I can tell from https://travis-ci.org/galaxyproject/tools-iuc/jobs/189690515, you're testing deseq2, which only has a single requirement, so this doesn't apply.

@jmchilton commented Jan 7, 2017

I believe I had a fix for the NPE last night but forgot to push it - it would happen when there are unmet dependencies. If I find some time at my laptop today I'll push that out. Thanks for everything, you guys!

@jmchilton jmchilton force-pushed the jmchilton:resolve_all_at_once branch from 5b64bf5 to aafb10f Jan 8, 2017

jmchilton added some commits Jan 6, 2017

Refactor DependencyManager to walk dependency resolvers first, then requirements.

Invert the previous loop to give resolvers a chance to resolve everything at once in subsequent commits.
Implement resolve_all on Conda dependency resolver.
This is a somewhat substantial reversal of the typical way the Conda resolver works. Everything related to caching, copying, linking, and building environments on the fly should now only apply if both of the following conditions are met: (1) there is more than one requirement tag in a tool, and (2) not all of them are resolvable exactly by the Conda resolver. For recipes that don't meet both of these criteria - the normal case going forward, I suspect - Galaxy will just look for a hashed environment built for all of the requirements at once whenever the requirements are installed.

In the new 90% case, such environments should be much less buggy for two primary reasons.

- #3299 is solved - in other words, Conda is deferred to, and if packages have potential conflicts, Conda can choose the right combination of build specifiers to resolve things correctly.
- Environments are no longer built on a per-job basis - this means file system problems related to linking and copying aren't really an issue and complexity related to caching can be safely ignored.

My guess is we should re-write all the Conda docs to make the other use case seem like a corner case - because hopefully it is now.

This commit includes a test tool that I believe wouldn't work without the rewrite.
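
To illustrate the idea behind the merged, hashed environment (this is only a sketch, not Galaxy's actual conda_util code): derive a deterministic environment name from the full set of requirement specs, then create that environment with a single conda create call. The helper names and hashing details below are assumptions; only the mulled-v1-<hash> naming and the single conda create invocation are taken from the log later in this thread.

import hashlib
import subprocess


def merged_env_name(requirements):
    """Map a set of (name, version) requirements to a deterministic env name.

    The exact hashing scheme Galaxy uses may differ; the point is only that
    the same set of requirements always maps to the same "mulled-v1-<hash>"
    environment.
    """
    specs = sorted("%s=%s" % (name, version) for name, version in requirements)
    digest = hashlib.sha256(",".join(specs).encode("utf-8")).hexdigest()
    return "mulled-v1-%s" % digest


def create_merged_env(conda_exe, requirements):
    """Create one environment containing every requirement at once, letting
    conda pick a mutually compatible combination of builds instead of
    building one environment per requirement."""
    specs = ["%s=%s" % (name, version) for name, version in requirements]
    cmd = [conda_exe, "create", "-y", "--name", merged_env_name(requirements)] + specs
    subprocess.check_call(cmd)


# Example (the bowtie2/samtools case from the test log later in this thread):
# create_merged_env("/path/to/miniconda3/bin/conda",
#                   [("bowtie2", "2.3.0"), ("samtools", "1.3.1")])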

@jmchilton jmchilton force-pushed the jmchilton:resolve_all_at_once branch from aafb10f to c2ac192 Jan 8, 2017

@jmchilton commented Jan 8, 2017

Tool shed tests were failing because fixing the NPE broke something else (I added a logic error - my own fault). I've rebased and fixed things up again.

@bgruening commented Jan 8, 2017

@mvdbeek commented Jan 8, 2017

Alright, with auto_install enabled it does what it's supposed to do:

galaxy.tools.deps.conda_util DEBUG 2017-01-08 16:47:43,308 Executing command: /home/mvandenb/miniconda3/bin/conda create -y --name mulled-v1-d95627189237fac550a9b53956b040c455a7f3682aea67e941022bbc115bc78a bowtie2=2.3.0 samtools=1.3.1
Fetching package metadata ...............
127.0.0.1 - - [08/janv./2017:16:47:46 +0200] "GET /api/histories/f597429621d6eb2b/contents?details=33b43b4e7093c91f%2Ca799d38679e985db&order=hid&v=dev&q=update_time-ge&q=deleted&q=purged&qv=2017-01-08T15%3A47%3A42.000Z&qv=False&qv=False HTTP/1.1" 200 - "http://127.0.0.1:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
127.0.0.1 - - [08/janv./2017:16:47:46 +0200] "POST /api/tools/toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.3.0/build HTTP/1.1" 200 - "http://127.0.0.1:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
Solving package specifications: ..........

Package plan for installation in environment /home/mvandenb/miniconda3/envs/mulled-v1-d95627189237fac550a9b53956b040c455a7f3682aea67e941022bbc115bc78a:

The following NEW packages will be INSTALLED:

    bowtie2:       2.3.0-py35_0    bioconda   
    curl:          7.49.0-1                   
    libgcc:        5.2.0-0                    
    openssl:       1.0.2j-0                   
    perl-threaded: 5.22.0-10       bioconda   
    pip:           9.0.1-py35_1               
    python:        3.5.2-0                    
    readline:      6.2-2                      
    samtools:      1.3.1-5         bioconda   
    setuptools:    27.2.0-py35_0              
    sqlite:        3.13.0-0                   
    tbb:           2017_20161128-0 conda-forge
    tk:            8.5.18-0                   
    wheel:         0.29.0-py35_0              
    xz:            5.2.2-1                    
    zlib:          1.2.8-3                    

Linking packages ...
[      COMPLETE      ]|##########################################################################################################################| 100%
#
# To activate this environment, use:
# > source activate mulled-v1-d95627189237fac550a9b53956b040c455a7f3682aea67e941022bbc115bc78a
#
# To deactivate this environment, use:
# > source deactivate mulled-v1-d95627189237fac550a9b53956b040c455a7f3682aea67e941022bbc115bc78a
#

Nice job! We're halfway there to eliminating the cached dependencies :).
One thing that is not yet working is the display of the resolved status in installed_repo_details.
And I guess we also need to call the install_all method at tool install time.

Create merged environments at install time
Currently tested with UI only

@bgruening commented Jan 8, 2017

IUC testing looks good so far: https://github.com/galaxyproject/tools-iuc/compare/jmc_conda?expand=1
Galaxy Docker is also happy.

One of the more complicated flavors is also happy with auto_init=True and others (https://github.com/bgruening/galaxy-rna-workbench/blob/master/Dockerfile#L8): https://travis-ci.org/bgruening/galaxy-rna-workbench/builds/189689669#L536

@mvdbeek commented Jan 8, 2017

@jmchilton
I have added some code for the install-time creation of the merged environments here:
jmchilton#52

mvdbeek and others added some commits Jan 8, 2017

@jmchilton jmchilton changed the title [WIP] Resolve Conda Dependencies All at Once Resolve Conda Dependencies All at Once Jan 9, 2017

@jmchilton jmchilton added status/review and removed status/WIP labels Jan 9, 2017

@jmchilton commented Jan 9, 2017

@mvdbeek I pulled out of my cleanup commit and merged your latest PR - thanks again! Is there anything else in your assessment that needs to be done to the code before merging?

We should probably re-write the docs to mention this and the caching stuff together. I'll update #3248 to include this and be sure to cover both before the next release.

@galaxybot galaxybot added this to the 17.01 milestone Jan 9, 2017

@mvdbeek commented Jan 9, 2017

We should address the regression in the requirement status display soon, but that can be done in another PR if you want to merge now. 👍

@jmchilton commented Jan 9, 2017

@mvdbeek Ahhh... like the GUI would show that the individual requirements don't resolve, because they all need to be resolved together to see this new resolution?

Edit: Ahh - like the screenshot you have just above this question. Sorry.

@mvdbeek commented Jan 9, 2017

@jmchilton commented Jan 9, 2017

Well that stinks - in neither place in the code is it really broken out in a per-tool fashion. These are done on a per-repository basis.

  • We could significantly rewrite these blocks of code to break things out on a per-tool basis.
  • We could install the requirements individually also when installing them together. This seems silly in some ways but it might be "more correct" given how requirements have traditionally worked. And I imagine Conda will probably cache things to make it quicker than fresh installs.

@mvdbeek commented Jan 10, 2017

We could install the requirements individually also when installing them together. This seems silly in some ways but it might be "more correct" given how requirements have traditionally worked. And I imagine Conda will probably cache things to make it quicker than fresh installs.

What speaks against this:

  • At least on large shared file systems it takes just as long, so it would be good if we can avoid that (and if you have to use --copy it also takes a significant amount of storage).
  • What do you do with numpy for python2 and numpy for python3? ...

Why don't we resolve as in this PR, but make sure that the to_dict representation of MergedCondaDependency contains the individual packages that it is composed of?

The downside is that it would be confusing if a tool that uses bowtie2 and samtools works, but you'd need to "install" samtools again for a tool that only requires samtools.

We could significantly rewrite these blocks of code to break things out on a per-tool basis.

In a sense we do resolve on a per-tool basis ... I'm just not sure how to arrange this information in an obvious and easily digestible way for users/admins.
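
To make the to_dict suggestion concrete, here is a minimal sketch of what such a representation could look like; the class shape, attribute names, and dictionary keys are assumptions based on this comment, not the actual Galaxy implementation.

class MergedCondaDependency(object):
    """Illustrative stand-in for a dependency backed by one merged conda env.

    Attribute and key names are assumptions for this sketch; the real class
    in Galaxy may look different.
    """

    def __init__(self, environment_path, requirements):
        self.environment_path = environment_path
        self.requirements = list(requirements)  # (name, version) pairs

    def to_dict(self):
        # Expose the individual packages the environment was built from, so
        # the GUI can still report a per-requirement status even though the
        # resolution happened all at once.
        return {
            "dependency_type": "conda",
            "environment_path": self.environment_path,
            "dependencies": [
                {"name": name, "version": version}
                for name, version in self.requirements
            ],
        }


# Example:
# merged = MergedCondaDependency(
#     "<conda_prefix>/envs/mulled-v1-<hash>",
#     [("bowtie2", "2.3.0"), ("samtools", "1.3.1")],
# )
# merged.to_dict()["dependencies"]  # -> the individual packages, for display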

@jmchilton commented Jan 10, 2017

@mvdbeek You are the second person in 24 hours to ask me to merge a broken PR. I'm fine with it as long as we are confident it can be tackled before "the next release".

I'm going to work on the Qiime hackathon today and I'll dig into this more tomorrow.

In a sense we do resolve on a per-tool basis ... I'm just not sure how to arrange this information in an obvious and easily digestible way for users/admins.

It didn't seem like it to me at first glance, but as far as I got in tracking down this code, it was these lines:

lib/galaxy/webapps/galaxy/controllers/admin_toolshed.py
        requirements = suc.get_unique_requirements_from_repository(repository)
        requirements_status = view.get_requirements_status(requirements, repository.installed_tool_dependencies)
        tool_requirements = suc.get_tool_shed_repo_requirements(app=trans.app,
                                                                tool_shed_url=tool_shed_url,
                                                                repo_info_dicts=repo_info_dicts)

        view = views.DependencyResolversView(self.app)
        requirements_status = view.get_requirements_status(tool_requirements)

I guess, digging into get_unique_requirements_from_repository and get_tool_shed_repo_requirements, it is clear they have the information on a per-tool basis. I think we just need to pass up the dict in the form {"tool_id": <requirements_dict>, **} instead of just the requirements_dict, and then when we display things, show the table with sections for each tool:

| Dependency | Version | Status |
| --- | --- | --- |
| tool_id_0 | | |
| samtools | 0.18 | Installed |
| numpy | 1.0 | Installed |
| tool_id_1 | | |
| samtools | 0.18 | Installed |
| six | 1.0 | Installed |

That could be the first step - a second step might be to collapse the dependencies and require you to click on the tool to see them? So an even more tool-centric view of things - which is probably what admins will want to see anyway.
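
A rough sketch of that restructuring; the Tool stand-in, function names, and plain-text table rendering below are hypothetical illustrations of the proposed {"tool_id": <requirements_dict>} shape, not the actual tool shed code.

from collections import namedtuple

# Minimal stand-in for whatever object carries a tool's requirement tags.
Tool = namedtuple("Tool", ["id", "requirements"])  # requirements: [(name, version), ...]


def requirements_by_tool(tools):
    """Return {"tool_id": <requirements_dict>, ...} instead of one flat
    requirements_dict for the whole repository."""
    return {
        tool.id: [{"name": name, "version": version} for name, version in tool.requirements]
        for tool in tools
    }


def render_status_table(per_tool_requirements, status_lookup):
    """Print one table section per tool, mirroring the sketch above."""
    print("| Dependency | Version | Status |")
    print("| --- | --- | --- |")
    for tool_id, requirements in per_tool_requirements.items():
        print("| %s | | |" % tool_id)
        for req in requirements:
            status = status_lookup(req["name"], req["version"])
            print("| %s | %s | %s |" % (req["name"], req["version"], status))


# Example:
# tools = [Tool("tool_id_0", [("samtools", "0.18"), ("numpy", "1.0")]),
#          Tool("tool_id_1", [("samtools", "0.18"), ("six", "1.0")])]
# render_status_table(requirements_by_tool(tools), lambda name, version: "Installed")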

@bgruening commented Jan 10, 2017

Given the positive feedback from @jmchilton and the 👍 from @mvdbeek, let's merge this - it is a big step forward.

@bgruening bgruening merged commit 542dc40 into galaxyproject:dev Jan 10, 2017

4 checks passed

api test: Build finished. 255 tests run, 0 skipped, 0 failed.
continuous-integration/travis-ci/pr: The Travis CI build passed.
framework test: Build finished. 133 tests run, 0 skipped, 0 failed.
toolshed test: Build finished. 580 tests run, 0 skipped, 0 failed.

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 13, 2017

Fix "exact" property for merged conda dependencies.
This was broken with the initial commit as part of galaxyproject#3391.

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 13, 2017

Comment out integration test case that was broken with galaxyproject#3391.

It is broken in the same way the GUI is broken (see galaxyproject#3398).

@jmchilton jmchilton referenced this pull request Jan 13, 2017: Various conda fixes. #3424 (closed)

mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request Jan 14, 2017

Fix display of repository dependency status
that broke with the introduction of the all-at-once dependency resolution in galaxyproject#3391; this addresses, in part, galaxyproject#3398. I have only restored the (collapsed) display of dependencies for all requirements in a repository, so breaking the information out on a per-tool basis remains on the TODO list.

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 16, 2017

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Feb 22, 2017
