Manifest-driven shed repository definitions #143

Merged
jmchilton merged 5 commits into galaxyproject:master from jmchilton:shed_realizations on Apr 28, 2015

3 participants
@jmchilton
Member

jmchilton commented Apr 27, 2015

Currently there exists a tension between what is best for developers (storing all tools in a single repository - e.g. ncbi_blast_plus or bedtools) and what is best for Galaxy users (storing a single repository per tool and collecting them together with a suite - e.g. samtools or gatk). More discussion here.

This pull request extends the semantics of .shed.yml in an attempt to resolve this tension and make the best practice for Galaxy users trivial for developers to manage. Previously each .shed.yml could only correspond to a single Tool Shed repository and it would collect all files in a directory (except an optional list of ignored files). This pull request extends the shed_create and shed_upload commands to allow .shed.yml files to correspond to any number of actual Tool Shed repositories, each with fully customizable file includes and excludes.

While a great deal of customization is allowed, two new keys, auto_tool_repositories and suite, provide shortcuts to quickly and implicitly define a repository for each individual tool in the directory and build a suite for those. Consider the following (admittedly idealized) samtools example:

owner: "devteam"
remote_repository_url: "https://github.com/galaxyproject/tools-devteam/tool_collections/samtools"
homepage_url: "https://github.com/galaxyproject/tools-devteam/"
categories:
  - "SAM"
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for samtools application {{ tool_name }}."
suite:
  name: "suite_samtools_1_2"
  description: "A suite of Galaxy tools designed to work with version 1.2 of the SAMtools package."
  long_description: >
    SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence
    alignments. This repository suite associates selected repositories containing Galaxy utilities that require
    version 1.2 of the SAMtools package. These associated Galaxy utilities consist of a Galaxy Data
    Manager contained in the repository named data_manager_sam_fasta_index_builder and Galaxy tools
    contained in several separate repositories.

This example assumes the .shed.yml file is placed in a "flat" directory alongside each samtools tool wrapper; planemo will create and update a repository for each individual tool using the templates specified in auto_tool_repositories. The suite key here will auto-generate a suite repository for all of these tools and will automatically create the corresponding repository_dependencies.xml to populate it with (this is generated during shed_upload and never needs to exist in your repository).
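As a rough illustration of the template expansion described above, the following Python sketch builds one repository definition per tool. It is not planemo's implementation: the helper names (render_template, auto_tool_repositories), the plain `{{ key }}` substitution, and the tool metadata dictionaries are all assumptions made for this example.

```python
import re


def render_template(template, **values):
    """Replace each ``{{ key }}`` placeholder with the matching value.

    Assumption: plain placeholder substitution is enough for this sketch;
    the real template engine may support more than this.
    """
    def substitute(match):
        return str(values[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def auto_tool_repositories(config, tools):
    """Build one repository definition per tool (hypothetical helper)."""
    templates = config["auto_tool_repositories"]
    repos = {}
    for tool in tools:
        name = render_template(templates["name_template"], **tool)
        description = render_template(templates["description_template"], **tool)
        # Each generated repository contains just its own tool file here.
        repos[name] = {"description": description, "include": [tool["path"]]}
    return repos


config = {
    "auto_tool_repositories": {
        "name_template": "{{ tool_id }}",
        "description_template": "Wrapper for samtools application {{ tool_name }}.",
    }
}

# Hypothetical tool metadata; in practice it would be read from the tool XML files.
tools = [
    {"tool_id": "samtools_bedcov", "tool_name": "BedCov", "path": "samtools_bedcov.xml"},
    {"tool_id": "sam_to_bam", "tool_name": "SAM-to-BAM", "path": "sam_to_bam.xml"},
]

repos = auto_tool_repositories(config, tools)
for name, repo in sorted(repos.items()):
    print(name, "->", repo["description"])
```

The point of the sketch is only that a single pair of templates is enough to name and describe every repository in a flat directory of tools.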

Again this example is admittedly idealized, but if auto_tool_repositories is not specified, a repositories list can be specified instead. There are some examples of this in the test data included with this pull request:

  • This .shed.yml is a simple example of specifying custom repositories for individual tools.
  • This .shed.yml demonstrates complex inclusion of files from sub-directories and renaming.

The test data also includes some more advanced usages of the suite key: using it without auto_tool_repositories as a generic replacement for repository_dependencies.xml, and adding dependent repositories beyond the ones defined by the .shed.yml file.

Implements #26.

jmchilton added some commits Apr 25, 2015

Allow specification of multiple repositories per .shed.yml file.
```
repositories:
  cs-cat1:
    include:
      - cat1.xml
      - macros.xml
      - test-data
  cs-cat2:
    include:
      - cat2.xml
      - macros.xml
      - test-data
```

Adding tests for tar ball and repository creation to verify this.

``exclude`` now works in addition to ``ignore`` in ``.shed.yml`` for consistency with ``include``.
Add flag to ``.shed.yml`` to auto de-multiplex tools into repos.
An example might look like:

```
owner: "iuc"
remote_repository_url: "https://github.com/galaxyproject/planemo/tree/master/tests/data/repos/multi_repos_flat_flag"
homepage_url: "http://planemo.readthedocs.org/en/latest/"
categories:
  - "Text Manipulation"
auto_tool_repositories:
  name_template: "cs-{{ tool_id }}"
  description_template: "The tool {{ tool_name }} from the cat tool suite."
```
Allow creation of suites from .shed.yml.
An example of creating a .shed.yml that produces just a single repository with one suite in it might be:

```
owner: devteam
suite:
  name: suite_1
  description: "A suite of Galaxy tools designed to work with version 1.2 of the SAMtools package."
  include_repositories:
  - name: data_manager_sam_fasta_index_builder
    owner: devteam
  - name: bam_to_sam
    owner: devteam
  - name: sam_to_bam
    owner: devteam
  - name: samtools_bedcov
    owner: devteam
```

In this case we are defining explicit dependent repositories, but the suite key can also be used with .shed.yml files that define other repositories. For instance, if used with ``auto_tool_repositories`` these will automatically be included in the suite.

```
owner: "iuc"
remote_repository_url: "https://github.com/galaxyproject/planemo/tree/master/tests/data/repos/multi_repos_flat_flag"
homepage_url: "http://planemo.readthedocs.org/en/latest/"
categories:
  - "Text Manipulation"
auto_tool_repositories:
  name_template: "cs-{{ tool_id }}"
  description_template: "The tool {{ tool_name }} from the cat tool suite."
suite:
  name: "suite_cat"
  description: "A suite of Cat tools."
  long_description: "A longer description of all the cat tools."
```
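To make the suite behaviour above more concrete, here is a minimal Python sketch of how a suite's repository_dependencies.xml content might be assembled from the repositories a .shed.yml defines plus any include_repositories entries. The function names and the repository names cs-cat1 and cs-cat2 are assumptions for illustration, and the XML is reduced to owner and name attributes; the file planemo actually generates may carry more information.

```python
import xml.etree.ElementTree as ET


def suite_repo_pairs(config, defined_repo_names):
    """Collect (owner, name) pairs for a suite (hypothetical helper).

    Repositories defined by the .shed.yml itself share the file's owner;
    entries under include_repositories carry their own owner.
    """
    owner = config["owner"]
    pairs = [(owner, name) for name in defined_repo_names]
    extra = config["suite"].get("include_repositories", [])
    extra_pairs = [(entry["owner"], entry["name"]) for entry in extra]
    return pairs + extra_pairs


def build_repository_dependencies(description, repo_pairs):
    """Render a simplified repository_dependencies.xml document as a string."""
    root = ET.Element("repositories", description=description)
    for owner, name in repo_pairs:
        ET.SubElement(root, "repository", owner=owner, name=name)
    return ET.tostring(root, encoding="unicode")


config = {
    "owner": "iuc",
    "suite": {
        "name": "suite_cat",
        "description": "A suite of Cat tools.",
        # An extra dependency from another owner, added for illustration.
        "include_repositories": [{"owner": "devteam", "name": "bam_to_sam"}],
    },
}

# Assumed result of expanding auto_tool_repositories for two tools.
pairs = suite_repo_pairs(config, ["cs-cat1", "cs-cat2"])
xml_text = build_repository_dependencies(config["suite"]["description"], pairs)
print(xml_text)
```

Because the XML is derived entirely from the manifest, it can be regenerated on every upload and does not have to be checked into the source repository.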
Improved glob handling with glob2.
Think these semantics are a little better.
Allow more complex includes.
Previously custom include statements had to be plain strings: files, directories, or globs relative to the .shed.yml file. This has now been extended to allow more complex source and destination selection.

The following is taken from the added test data and demonstrates pulling in and renaming a single file from outside the .shed.yml directory and copying a whole directory into a new directory.

```
repositories:
  cs-cat1:
    description: "The tool Cat 1 from the cat tool suite."
    include:
      - cat1.xml
      - macros.xml
      - test-data
      - source: ../shared_files/CITATION
        destination: CITATION.txt
      - source: ../shared_files/extra_test_data/**
        strip_components: 3  # drop "..", "shared_files", "extra_test_data" from source
        destination: test-data
```
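One way to read the include entries above is as a mapping from source paths to destination paths. The Python sketch below works through that reading. The function plan_copies is hypothetical, the list of matched files is passed in rather than globbed from disk, and the rule that a destination without strip_components is a direct rename is an assumption, not a statement of planemo's exact semantics.

```python
import posixpath


def plan_copies(include, matched_files):
    """Map each matched source path to a destination path (hypothetical helper).

    ``include`` is either a plain string or a dict with ``source``,
    ``destination`` and an optional ``strip_components`` key.
    """
    if isinstance(include, str):
        # Plain entries keep their path relative to the .shed.yml directory.
        return {path: path for path in matched_files}

    destination = include["destination"]
    strip = include.get("strip_components")
    if strip is None:
        # Assumed: without strip_components the destination is the new file name.
        return {path: destination for path in matched_files}

    plan = {}
    for path in matched_files:
        # Drop the leading path components, then re-root under the destination.
        remainder = "/".join(path.split("/")[strip:])
        plan[path] = posixpath.join(destination, remainder)
    return plan


rename = plan_copies(
    {"source": "../shared_files/CITATION", "destination": "CITATION.txt"},
    ["../shared_files/CITATION"],
)
copy_tree = plan_copies(
    {
        "source": "../shared_files/extra_test_data/**",
        "strip_components": 3,
        "destination": "test-data",
    },
    # Assumed glob result for the source pattern.
    ["../shared_files/extra_test_data/extra/1.txt"],
)
print(rename)
print(copy_tree)
```

With strip_components set to 3, the components "..", "shared_files" and "extra_test_data" are dropped, so only the path below extra_test_data is re-created under test-data.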
repos[name] = repo


def find_repository(tsi, owner, name):

@erasche

erasche Apr 27, 2015

Member

Does this belong in shed util?

@jmchilton

jmchilton Apr 27, 2015

Author Member

What is shed util?

@erasche

erasche Apr 27, 2015

Member

Oh, I meant the shed.py file in planemo where a bunch of TS interactivity
was stuck

On Mon, Apr 27, 2015 at 08:32, John Chilton notifications@github.com wrote:

In planemo/shed.py, #143 (comment):

+ repository_dependencies.repo_pairs = list(repo_pairs) + list(extra_pairs)
+ repo = {
+    "_files": {
+        REPO_DEPENDENCIES_CONFIG_NAME: str(repository_dependencies)
+    },
+    "include": [],
+    "name": name,
+    "description": description,
+ }
+ if long_description:
+    repo["long_description"] = long_description
+ repos[name] = repo
+
+def find_repository(tsi, owner, name):

What is shed util?



@jmchilton

jmchilton Apr 27, 2015

Author Member

That is this file :). Though I would like to break it up (#135).

@erasche

erasche Apr 27, 2015

Member

I...uh...how could I possibly be this blind. So sorry @jmchilton

@jmchilton

Member Author

jmchilton commented Apr 27, 2015

I would like to do a planemo release tonight - preferably including this pull request. Let me know if the .shed.yml additions in here rub anyone the wrong way and I will do the release without this.

@erasche

Member

erasche commented Apr 27, 2015

+1

@erasche

Member

erasche commented Apr 27, 2015

@jmchilton

Member Author

jmchilton commented Apr 28, 2015

Awesome - thanks for the review @erasche.

jmchilton added a commit that referenced this pull request Apr 28, 2015

Merge pull request #143 from jmchilton/shed_realizations
Manifest-driven shed repository definitions

@jmchilton jmchilton merged commit 79c5c93 into galaxyproject:master Apr 28, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@jmchilton jmchilton deleted the jmchilton:shed_realizations branch Apr 28, 2015

@peterjc

Contributor

peterjc commented on tests/test_utils.py in fd298aa May 7, 2015

Is there a reason for having the slash-dot outside the single quotes? i.e. Why not:

io.shell("cp -r '%s/.' '%s'" % (repo, dest))
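For reference, a POSIX shell joins a quoted string and adjacent unquoted text into a single word, so both spellings pass the same arguments to cp. The short Python check below uses the standard library's shlex module to show this; the repo and dest values are made up for illustration.

```python
import shlex

# Made-up paths containing spaces, so the quoting actually matters.
repo = "my repo"
dest = "dest dir"

# Slash-dot outside the single quotes.
outside = "cp -r '%s'/. '%s'" % (repo, dest)
# Slash-dot inside the single quotes, as suggested above.
inside = "cp -r '%s/.' '%s'" % (repo, dest)

# shlex.split applies POSIX-style word splitting and quote removal.
outside_args = shlex.split(outside)
inside_args = shlex.split(inside)

print(outside_args)
print(inside_args)
print(outside_args == inside_args)
```

Neither form protects against a path that itself contains a single quote; shlex.quote is the usual way to handle that case.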

Member Author

jmchilton replied May 7, 2015

... ... I don't know :). I'll admit it looks sort of ... odd... the way it is now - feel free to change it.

Contributor

peterjc replied May 7, 2015

Done in 6bcf699
