Add tool for exporting individual files to galaxy file source plugins. #11613

Merged: 14 commits merged into galaxyproject:dev on Jun 23, 2021

jmchilton
Member

What did you do?

  • Stole tool from EU for exporting individual files to file source plugins, working on making it a bit more general.

Why did you make this change?

  • It is a frequently requested change, and it requires some support in Galaxy core for both testing and the tool framework, so I don't think it belongs in the tool shed. This is a bit like the cloud send tool we already have, so I think there is precedent for adding data export tools that require Galaxy support.

How to test the changes?

#else:
#set file_ext = $name.ext
#end if
'${name.element_identifier}.${file_ext}'
Member Author

@jmchilton Add tool wrapper support for doing this natively. If the extension is already included don't duplicate it.
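
A minimal sketch of the de-duplication rule requested above, written in plain Python rather than Cheetah. The function name `export_filename` is hypothetical and is not part of Galaxy's tool wrapper API; it only illustrates appending the extension unless the identifier already ends with it.

```python
def export_filename(element_identifier: str, file_ext: str) -> str:
    """Return 'identifier.ext' without repeating an extension that is already present."""
    suffix = f".{file_ext}"
    if element_identifier.endswith(suffix):
        # The identifier already carries the extension, so keep it unchanged.
        return element_identifier
    return f"{element_identifier}{suffix}"


print(export_filename("sample1", "vcf.gz"))
print(export_filename("sample1.vcf.gz", "vcf.gz"))
```

Both calls produce `sample1.vcf.gz`. The comparison is case-sensitive, which is an assumption of this sketch.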

Member

Something like #3220 ? I was going to pick this back up, but let me know if you have other plans.

#if $name.ext == 'vcf_bgzip':
#set file_ext = 'vcf.bz'
#else:
#set file_ext = $name.ext
Member Author

@jmchilton Add datatype support for this. Special-case this in the datatype registry, not in this tool.

@bgruening
Member

@jmchilton one question that was coming up today is if we can add "create-folder" support. In the sense that we can select a folder from the files plugin or provide a path in a text param?

For the FTP case we could do:

from pathlib import Path

galaxy_files = "...."
ftp_path = "./foobra/foo/bar/foo.txt"
folder_path = Path(ftp_path).parent

# `a` stands for an FTP / file source client object; it is not defined in this snippet
a.makedirs(folder_path)
a.upload(ftp_path, galaxy_files)

But how does the user set the ftp_path?
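
For illustration only, the create-folder step could be sketched with the standard library's `ftplib`. This is an assumption about one possible FTP-only approach, not how Galaxy's file source plugins work, and it does not answer how the user would supply `ftp_path`. The helper names `ensure_remote_dirs` and `upload` are hypothetical.

```python
from ftplib import error_perm
from pathlib import PurePosixPath


def ensure_remote_dirs(ftp, remote_path: str) -> list:
    """Try to create every parent directory of remote_path, outermost first.

    `ftp` is any object exposing ftplib.FTP's `mkd` method.
    Returns the directories that were attempted, in order.
    """
    parents = list(PurePosixPath(remote_path).parents)[::-1]
    attempted = []
    for parent in parents:
        if str(parent) in (".", "/"):
            continue  # nothing to create for the current or root directory
        try:
            ftp.mkd(str(parent))
        except error_perm:
            # Many servers reject mkd for a directory that already exists.
            pass
        attempted.append(str(parent))
    return attempted


def upload(ftp, local_path: str, remote_path: str) -> None:
    """Create missing parent folders, then upload the local file in binary mode."""
    ensure_remote_dirs(ftp, remote_path)
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_path}", handle)
```

With a connected `ftplib.FTP` instance, `upload(ftp, galaxy_files, ftp_path)` would first attempt `foobra`, `foobra/foo` and `foobra/foo/bar` before storing the file.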

@bgruening
Member

@jmchilton @mvdbeek we need something like this soon on Main. Should I talk to Nate and get our tool into Main for the time being?

@mvdbeek mvdbeek force-pushed the export_file branch 2 times, most recently from a955607 to f71bffd Compare May 25, 2021 13:02
@mvdbeek
Member

mvdbeek commented May 25, 2021

@jmchilton one question that was coming up today is if we can add "create-folder" support. In the sense that we can select a folder from the files plugin or provide a path in a text param?

That already works if the folder exists, you have to use the arrow icon on the right:
[Image: Screenshot 2021-05-25 at 15 11 15]

What would be nice is if we can support creating the folder in the UI (and improve the navigation).

@bgruening
Member

That already works if the folder exists, you have to use the arrow icon on the right:

Does that also work in a workflow? I couldn't get this working. But super cool if it does!

@mvdbeek
Member

mvdbeek commented May 25, 2021

Does that also work in a workflow? I couldn't get this working. But super cool if it does!

It should, but maybe the editor doesn't play nice. I'll add some tests.

@mvdbeek mvdbeek changed the title [WIP] Add tool for exporting individual files to galaxy file source plugins. Add tool for exporting individual files to galaxy file source plugins. May 27, 2021
@mvdbeek mvdbeek added this to the 21.09 milestone May 27, 2021
@mvdbeek
Member

mvdbeek commented May 27, 2021

The tool now does what I'd like it to do:

  • it exports collections, where collections are folders and dataset elements are files
  • filenames are truncated at 255 characters, unicode is allowed, except for unicode control characters, / and leading dots
  • MetadataFile / FileParameter are optionally exported (e.g. bam.bai, vcf.gz.tbi, etc.)
  • parameters are not passed on the command line; instead a JSON file is built. That works around command line length limits and issues with safe characters
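
The filename rules in the list above can be sketched as follows. This is an illustrative approximation, not the code merged in this PR: the name `filesystem_safe_name` is hypothetical, and it assumes that offending characters are dropped (not replaced) and that the 255 limit counts characters, not bytes.

```python
import unicodedata

MAX_FILENAME_LENGTH = 255


def filesystem_safe_name(name: str, max_length: int = MAX_FILENAME_LENGTH) -> str:
    """Strip control characters, slashes and leading dots, then truncate."""
    # Drop Unicode control characters (general category "Cc").
    cleaned = "".join(ch for ch in name if unicodedata.category(ch) != "Cc")
    # A slash would be interpreted as a directory separator.
    cleaned = cleaned.replace("/", "")
    # Leading dots would produce hidden files or relative path components.
    cleaned = cleaned.lstrip(".")
    return cleaned[:max_length]


print(filesystem_safe_name("../evil\x00name.txt"))
```

Other Unicode characters, such as accented letters, pass through unchanged.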

I did add a couple of new things that are available when templating commands or configfile sections.
These are available on wrapped datasets:

  • name_and_ext is element_identifier.extensions
  • name_and_ext_filesystem_safe is the same as name_and_ext but removes unicode control characters, / and leading dots
  • all_metadata_files is [(file_ext, path)]

These are available on wrapped collections:

  • all_metadata_files like for datasets but contains all metadata files for all datasets in collection
  • all_paths all dataset paths in collection as list
  • all_element_identifiers_and_extensions_filesystem_safe like name_and_ext_filesystem_safe, but for all datasets in collection

I'm not sure I'm really happy with the added attributes on collections. Maybe for tools like this it'd be better to work with json_wrapped and then let the tool figure out what it wants to do with the collection?

@mvdbeek
Member

mvdbeek commented May 28, 2021

Alright … I did enhance json_wrap, although I ended up not really using it for the tool. Instead I added a serialize method to the input wrapper classes. Both produce the same results, but as a generic way to write such tools it’s probably better to just pass in the actual data inputs, not all of the inputs of the tool, as we’d be doing with <inputs name="inputs" data_style="staging_path_and_source_path" />. I also renamed the *filesystem_safe things to staging_path(s), which seems more succinct, and which we could use for actual working directory staging in the future (xref #3220).

paths for data parameters set the ``data_style`` attribute to ``paths`` (see [inputs_as_json_with_paths.xml](https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/inputs_as_json_with_paths.xml) for an example).
paths for data or collection inputs set the ``data_style`` attribute to ``paths`` (see [inputs_as_json_with_paths.xml](https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/inputs_as_json_with_paths.xml) for an example).
To include a dictionary with staging paths, paths and metadata files set the ``data_style`` attribute to ``staging_path_and_source_path``.
An example tool that uses ``staging_path_and_source_path`` is [inputs_as_json_with_staging_path_and_source_path.xml](https://github.com/galaxyproject/galaxy/blobl/dev/test/functional/tools/inputs_as_json_with_staging_path_and_source_path.xml)
Member

blobl looks wrong.

@@ -0,0 +1,67 @@
<tool id="export_remote" name="Export datasets" version="0.1.0" profile="21.05">
<description>to remote files source</description>
<!-- TODO
Member

There is still a TODO here; is that correct? Do we need to mark this tool as a galaxy-env-tool?

@bgruening
Member

I did add a couple of new things that are available when templating commands or configfile sections.
These are available on wrapped datasets:

name_and_ext is element_identifier.extensions
name_and_ext_filesystem_safe is the same as name_and_ext but removes unicode control characters, / and leading dots
all_metadata_files is [(file_ext, path)]

These are available on wrapped collections:

all_metadata_files like for datasets but contains all metadata files for all datasets in collection
all_paths all dataset paths in collection as list
all_element_identifiers_and_extensions_filesystem_safe like name_and_ext_filesystem_safe, but for all datasets in collection

This is not part of the PR anymore, right? This seems to be super useful, but IUC should look it over. For example, the naming seems inconsistent.

@mvdbeek
Member

mvdbeek commented Jun 7, 2021

This is not part of the PR anymore, right?

It's just renamed to staging_path where appropriate.
So available now for datasets are:

  • name_and_ext: element_identifier.extensions
  • staging_path: same as name_and_ext but removes unicode control characters, / and leading dots
  • all_metadata_files: a list of tuples with extension (e.g. .bai) and path

For collections:

  • all_paths: all dataset paths in the collection as a list
  • all_staging_paths: a list of staging paths
  • all_metadata_files: a list of lists of tuples with extension (e.g. .bai) and path

On top of that there is the serialize() function that returns a dict (dataset) or list of dicts (collection) whose definition is

        {'staging_path': staging_path,
         'source_path': source_path,
         'metadata_files': [{'staging_path': f"{staging_path}.{mf[0]}", 'source_path': mf[1]} for mf in metadata_files]
         }
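
As a hedged illustration of how a tool script might consume this structure, the sketch below walks a list of such dicts and copies each dataset and its metadata files to the staging path. The helper names `iter_copy_pairs` and `stage` are hypothetical and not part of Galaxy; only the dict keys are taken from the definition above.

```python
import shutil
from pathlib import Path


def iter_copy_pairs(entry: dict):
    """Yield (source_path, staging_path) for a dataset and each of its metadata files."""
    yield entry["source_path"], entry["staging_path"]
    for metadata_file in entry.get("metadata_files", []):
        yield metadata_file["source_path"], metadata_file["staging_path"]


def stage(entries: list, target_dir: str) -> list:
    """Copy every file described by `entries` below target_dir; return the written paths."""
    written = []
    for entry in entries:
        for source, staging in iter_copy_pairs(entry):
            destination = Path(target_dir) / staging
            # Staging paths may contain sub-folders, so create them on demand.
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source, destination)
            written.append(str(destination))
    return written
```

For a single dataset, the dict returned by serialize() would be wrapped in a list before calling `stage`; for a collection, the returned list could be passed directly.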

@bgruening
Member

Restarted the test and it's green now!

@mvdbeek mvdbeek merged commit 7177d8e into galaxyproject:dev Jun 23, 2021
@nsoranzo nsoranzo deleted the export_file branch August 25, 2021 14:53
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request Sep 9, 2021
Include a test tool that fails like this without the fix:

```
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/template.py", line 80, in fill_template
    return unicodify(t, log_exception=False)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/__init__.py", line 1060, in unicodify
    value = str(value)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.9/site-packages/Cheetah/Template.py", line 1053, in __unicode__
    return getattr(self, mainMethName)()
  File "cheetah_DynamicallyCompiledCheetahTemplate_1631180564_759418_48584.py", line 89, in respond
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/wrappers.py", line 342, in is_of_type
    datatype = self.datatypes_registry.get_datatype_by_extension(e)
AttributeError: 'NoneType' object has no attribute 'get_datatype_by_extension'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/runners/__init__.py", line 237, in prepare_job
    job_wrapper.prepare()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/__init__.py", line 1160, in prepare
    self.command_line, self.extra_filenames, self.environment_variables = tool_evaluator.build()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/evaluation.py", line 463, in build
    raise e
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/evaluation.py", line 459, in build
    self.__build_command_line()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/evaluation.py", line 484, in __build_command_line
    command_line = fill_template(command, context=param_dict, python_template_version=self.tool.python_template_version)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/template.py", line 118, in fill_template
    return fill_template(template_text=template_text,
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/template.py", line 126, in fill_template
    raise first_exception or e
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/template.py", line 80, in fill_template
    return unicodify(t, log_exception=False)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/util/__init__.py", line 1060, in unicodify
    value = str(value)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.9/site-packages/Cheetah/Template.py", line 1053, in __unicode__
    return getattr(self, mainMethName)()
  File "cheetah_DynamicallyCompiledCheetahTemplate_1631180564_725492_78593.py", line 87, in respond
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/wrappers.py", line 342, in is_of_type
    datatype = self.datatypes_registry.get_datatype_by_extension(e)
AttributeError: 'NoneType' object has no attribute 'get_datatype_by_extension'
```

Broken in galaxyproject#11613
3 participants