New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nesting include settings in .shed.yml (source as list)? #180

Closed
peterjc opened this Issue May 5, 2015 · 9 comments

Comments

Projects
None yet
2 participants
@peterjc
Copy link
Contributor

peterjc commented May 5, 2015

Consider a moderately complex include setting like this which is quite repetitive:

include:
- README.rst
- venn_list.py
- venn_list.xml
- tool_dependencies.xml
- source: ../../test-data/magic.pdf
  strip_components: 2
- source: ../../test-data/venn_list.tabular
  strip_components: 2
- source: ../../test-data/rhodopsin_proteins.fasta
  strip_components: 2

It seems that currently include is a flat list of elements (defaulting to source entries if not a mini-dict), but allowing each source entry to be a list would allow this kind of bundling:

include:
- README.rst
- venn_list.py
- venn_list.xml
- tool_dependencies.xml
- strip_components: 2
  source:
  - ../../test-data/magic.pdf
  - ../../test-data/venn_list.tabular
  - ../../test-data/rhodopsin_proteins.fasta

e.g. Something like this hack:

$ git diff
diff --git a/planemo/shed/__init__.py b/planemo/shed/__init__.py
index 8bacfc8..e108bc6 100644
--- a/planemo/shed/__init__.py
+++ b/planemo/shed/__init__.py
@@ -882,7 +882,10 @@ class RealizedFile(object):
     def realized_files_for(path, include_info):
         if not isinstance(include_info, dict):
             include_info = {"source": include_info}
-        source = include_info.get("source")
+        source_list = include_info.get("source")
+        if not isinstance(source_list, list):
+            # A single filename or pattern
+            source_list = [source_list]
         destination = include_info.get("destination", None)
         strip_components = include_info.get("strip_components", 0)
         if destination is None:
@@ -890,14 +893,16 @@ class RealizedFile(object):
             destination_specified = False
         else:
             destination_specified = True
-        abs_source = os.path.join(path, source)
-        dest_is_file = destination_specified and os.path.isfile(abs_source)
         realized_files = []
-        for globbed_file in _glob(path, source):
-            src = os.path.relpath(globbed_file, path)
-            realized_files.append(
-                RealizedFile(path, src, destination, dest_is_file, strip_components)
-            )
+        for source in source_list:
+            abs_source = os.path.join(path, source)
+            dest_is_file = destination_specified and os.path.isfile(abs_source)
+            for globbed_file in _glob(path, source):
+                src = os.path.relpath(globbed_file, path)
+                realized_files.append(
+                    RealizedFile(path, src, destination, dest_is_file, strip_components)
+                )
+        info("Found %i files from source %r" % (len(realized_files), source_list))
         return realized_files

     def __str__(self)

More work would be needed to preserve the work in #158 for spotting missing includes.

Also, harking back to #160 this would be possible too (but requires a few more tweaks to planemo/shed/__init__.py in realize_to to deal with implicit destinations of . being used internally):

include:
- strip_components: 2
  source:
  - ../../tools/venn_list/README.rst
  - ../../tools/venn_list/venn_list.py
  - ../../tools/venn_list/venn_list.xml
  - ../../tools/venn_list/tool_dependencies.xml
  - ../../test-data/magic.pdf
  - ../../test-data/venn_list.tabular
  - ../../test-data/rhodopsin_proteins.fasta

Having played with this a bit, it would be an alternative solution to my proposal in #160, and might have some other future usecases - but overall I think it is too complicated?

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented May 5, 2015

It seems to make sense to me - I like it. I don't think it is too complicated.

An alternative implementation that might work to preserve the work in #158 is to preprocess the include_info entries such that each entry with a source list gets cloned n times with a single item and stuck back into the configuration. So modifying the code that calls realized_files_for but keeping realized_files_for the same. Not sure if this is any cleaner but it is one idea.

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 5, 2015

The preprocessing idea should work - are there any benefits to doing the wildcard glob expansion earlier as well?

@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented May 5, 2015

Not that I can think of - and that would probably require reworking the logic for detecting if there are no matches. If it turns out to be cleaner to that earlier to though - that is fine.

peterjc added a commit to peterjc/planemo that referenced this issue May 6, 2015

Allow include source to be a list of filenames/patterns.
This allows nesting within the include setting of .shed.yml
for example if you have a batch of files with the same
strip_components setting. See GitHub issue galaxyproject#180.
@jmchilton

This comment has been minimized.

Copy link
Member

jmchilton commented May 6, 2015

Addressed with #185 - thanks again @peterjc!

@jmchilton jmchilton closed this May 6, 2015

@peterjc peterjc reopened this May 7, 2015

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 7, 2015

Apologies - I didn't check my own work carefully enough yesterday. This example works: peterjc/pico_galaxy@226ce53

include:
- README.rst
- tool_dependencies.xml
- venn_list.py
- venn_list.xml
- strip_components: 2
  source:
  - ../../test-data/magic.pdf
  - ../../test-data/venn_list.tabular
  - ../../test-data/rhodopsin_proteins.fasta

Giving:

$ planemo shed_upload --tar_only  ~/repositories/pico_galaxy/tools/venn_list/
...
$ tar -tzf shed_upload.tar.gz
README.rst
test-data/magic.pdf
test-data/rhodopsin_proteins.fasta
test-data/venn_list.tabular
tool_dependencies.xml
venn_list.py
venn_list.xml

However this fails: peterjc/pico_galaxy@1029aa9

include:
- strip_components: 2
  source:
  - ../../tools/venn_list/README.rst
  - ../../tools/venn_list/tool_dependencies.xml
  - ../../tools/venn_list/venn_list.py
  - ../../tools/venn_list/venn_list.xml
  - ../../test-data/magic.pdf
  - ../../test-data/venn_list.tabular
  - ../../test-data/rhodopsin_proteins.fasta

The files under tools/venn_list/ are still present at the root of the tar-ball (as above). I'll try to get a fix for this ready...

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 7, 2015

I think the problem (which manifests as a destination of "./") stems from the inconsistent way that strip_components gets applied to the source versus the (implicit) destination.

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 7, 2015

Progress: Have to avoid os.path.relpath(...) before applying strip_components as otherwise lose the .. directory entries.

@peterjc peterjc removed the question label May 7, 2015

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 7, 2015

@peterjc

This comment has been minimized.

Copy link
Contributor Author

peterjc commented May 14, 2015

Fixed with the merging of #196, closing issue.

@peterjc peterjc closed this May 14, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment