New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved home and temp directory handling. #5193

Merged
merged 4 commits into from Jan 19, 2018

Conversation

@jmchilton
Member

jmchilton commented Dec 11, 2017

Home Directory Handling

  • For profile < 18.01 tools:
    • If use_shared_home="false" is explicitly set on the command block - the tool will be given a clean $HOME directory.
    • If shared_home_dir is set in Galaxy's config or the job destination configuration, Galaxy will set $HOME in the tool's job environment to point at this directory.
  • For profile >= 18.01 tools, jobs will be given a clean home directory by default unless use_shared_home="true" is set. If that is set, the profile < 18.01 behavior is used.

In addition to these changes for Galaxy tools, the tool framework itself has been updated to allow other potential behaviors including the CWL defaults.

Integration tests for each of these cases has been added.

Upgrading to 18.01 - while nothing should break by default I guess we should recommend dropping overriding HOME in any Galaxy environment configuration (e.g. env directives in job_conf.xml). If any such configuration is found, we should recommend instead setting "shared_home_dir" for destinations to the that cluster destination's shared HOME directory. If all destinations share a single HOME directory, this can be set in galaxy.ini instead of job_conf.xml.

Temp Directory Handling

There were serious disagreements with how to proceed here (#4606). I felt we should aggressively isolate tool TMP directories and provide deployers options to tweak these through structured arguments. @nsoranzo felt we should defer to the environment variables already being set job_conf.xml - but tweak their meanings by default. @natefoo fell somewhere in between. I think all sides have merit and could make sense depending on what you want to improve (easing cognitive load, reducing out-of-box errors, easing advanced deployer configs, enhancing structured reasoning of paths by Galaxy, etc...).

I've navigated a path here that doesn't particularly reduce out of the box errors as I wanted by setting up a per-job temp directory by default but does make it possible and shouldn't break any existing configurations and doesn't make it anything more difficult to manage these things via environment variables.

The new approach doesn't change any thing by default for Galaxy tools (regardless of tool profile version). The only way to achieve new behaviors is for the deployer to set a job_conf parameter called tmp_dir. If this is set, it can be set to True (to default to a new, clean directory below the job directory) or to a shell expression to allow setting this dynamically at runtime to paths or paths beneath directories as needed on a per-destination basis. If a temp directory is set this way all three variables TMP, TMPDIR, and TEMP are set.

In addition to these changes for Galaxy tools, the tool framework itself has been updated to allow other potential behaviors including the CWL defaults.

Integration tests for each of these cases has been added.

Upgrading to 18.01 - no changes needed.

Docker Environment Handling

  • Introduce integration tests for Docker jobs.
  • Improve Docker job execution to set group id for running Docker in addition user id.
  • Mirror the CWL behavior of mounting an external /tmp by default.
  • For newer tools, pass through the HOME, TMP, TMPDIR, and TEMP environment variables into the Docker container in addition to GALAXY_SLOTS.
  • Fix the pass through of environment variables into Docker containers.
@dannon

This comment has been minimized.

Member

dannon commented Dec 18, 2017

@nsoranzo, @bgruening, and @natefoo ping, maybe?

@@ -21,6 +22,16 @@ def skip_if_jenkins(cls):
return cls
def skip_unless_executable(executable):
if which(executable):
return lambda func: func

This comment has been minimized.

@bgruening

bgruening Dec 20, 2017

Member

This is a wired syntax. For everyone that is searching for it: https://docs.python.org/2/library/unittest.html#skipping-tests-and-expected-failures

# explicit processing by Galaxy.
for environment_variable in self.environment_variables:
if environment_variable.get("name") == "HOME":
home_target = None

This comment has been minimized.

@bgruening

bgruening Dec 20, 2017

Member

If it is HOME you can continue I guess.

This comment has been minimized.

@natefoo

natefoo Jan 8, 2018

Member

+1. It also seems weird that tools should be able to set these vars but I guess this a feature for local tool devs or Galaxies?

This comment has been minimized.

@jmchilton

jmchilton Jan 8, 2018

Member

There was a CWL tool in the GA4GH that set this I believe. It is not so common in the GalaxyXML/BioConda world - but if you take a container-first approach to development I guess one can see why someone might want to do this. If we want to disallow this edge case we maybe could - but the backend should support it - just not the Galaxy tool parser IMO.

I'll rebase with the continue and break requested.

This comment has been minimized.

@natefoo

natefoo Jan 9, 2018

Member

No, I think this is ok, just wanted to make sure I understood what was happening. I appreciate the explanation.

home_target = None
for tmp_directory in self.tmp_directories:
if environment_variable.get("name") == tmp_directory:
tmp_target = None

This comment has been minimized.

@bgruening

bgruening Dec 20, 2017

Member

If tmp_target is None you don't need this for loop anymore.

This comment has been minimized.

@natefoo
@bgruening

This comment has been minimized.

Member

bgruening commented Dec 20, 2017

I'm fine with what I understood :).
Leaving it to @natefoo and @nsoranzo.

@natefoo

natefoo approved these changes Jan 8, 2018

tmp_dir = self.get_destination_configuration("tmp_dir", None)
if not tmp_dir or tmp_dir.lower() == "true":
working_directory = self.working_directory
return '$(mktemp -d "%s/tmp.XXXXXXXXX")' % working_directory

This comment has been minimized.

@natefoo

natefoo Jan 8, 2018

Member

Since this is in the working dir, I think something like $([ ! -e %s/tmp ] || mv %s/tmp %s/tmp.$(date +%Y%m%d-%H%M%S) ; mkdir %s/tmp) may be better so the temp dir name is constant. I've found when working with jobs as an admin, this deliberate way (Pulsar does this) is more useful.

# explicit processing by Galaxy.
for environment_variable in self.environment_variables:
if environment_variable.get("name") == "HOME":
home_target = None

This comment has been minimized.

@natefoo

natefoo Jan 8, 2018

Member

+1. It also seems weird that tools should be able to set these vars but I guess this a feature for local tool devs or Galaxies?

home_target = None
for tmp_directory in self.tmp_directories:
if environment_variable.get("name") == tmp_directory:
tmp_target = None

This comment has been minimized.

@natefoo
@@ -155,7 +155,7 @@ def build_docker_run_command(
if terminal:
command_parts.append("-t")
for env_directive in env_directives:
command_parts.extend(["-e", shlex_quote(env_directive)])
command_parts.extend(["-e", env_directive])

This comment has been minimized.

@natefoo

natefoo Jan 8, 2018

Member

Is removing quoting intentional here?

This comment has been minimized.

@jmchilton

jmchilton Jan 8, 2018

Member

Yes - this was broken without that change. I'll rebase with a comment.

@@ -622,6 +622,21 @@ def parse(self, tool_source, guid=None):
self.parse_command(tool_source)
self.environment_variables = self.parse_environment_variables(tool_source)
self.tmp_directories = tool_source.parse_tmp_directories()

This comment has been minimized.

@natefoo

natefoo Jan 8, 2018

Member

Is this more accurately tmp_directory_vars or tmp_directory_varnames?

This comment has been minimized.

@jmchilton

jmchilton Jan 8, 2018

Member

Yes that would clarify this a lot I think.

jmchilton added some commits Mar 13, 2017

Dockerized job integration testing.
Add simple test cases for both mulled containers and explicit Docker image resolution.
Set group id in addition to user id when running docker containers by…
… default.

Noticed cwltool does this and it is a good idea. It wasn't the bug I was tracking but it is a solid improvement.
Improved home and temp directory handling.
Home Directory Handling
-----------------------

- For profile < 18.01 tools:
  - If use_shared_home="false" is set on the command block - the tool will be given a clean $HOME directory.
  - If ``shared_home_dir`` is set in Galaxy's config or the job destination configuration, Galaxy will set $HOME in the tool's job environment to point at this directory.
- For profile >= 18.01 tools, jobs will be given a clean home directory by default unless ``use_shared_home="true"`` is set. If that is set, the profile < 18.01 behavior is used.

In addition to these changes for Galaxy tools, the tool framework itself has been updated to allow other potential behaviors including the CWL defaults.

Integration tests for each of these cases has been added.

Upgrading to 18.01 - we recommend dropping overriding ``HOME`` in any Galaxy environment configuration (e.g. env directives in job_conf.xml). If any such configuration is found, we recommend instead setting "shared_home_dir" for destinations to the that cluster destination's shared HOME directory. If all destinations share a single HOME directory, this can be set in galaxy.ini instead of job_conf.xml.

Temp Directory Handling
-----------------------

There were serious disagreements with how to proceed here. I felt we should aggressively isolate tool TMP directories and provide deployers options to tweak these through structured arguments. Nicola felt we should defer to the environment variables already being set job_conf.xml - but tweak their meanings by default. Nate fell somewhere in between. I think all sides have merit and could make sense depending on what you want to improve (easing cognative load, reducing out-of-box errors, easing advanced deployer configs, structured reasonsing of paths by Galaxy, etc...).

I've navigated a path here that doesn't particularly reduce out of the box errors as I wanted by setting up a per-job temp directory by default but does make it possible and shouldn't break any existing configurations and doesn't make it anything more difficult to manage these things via environment variables.

The new approach doesn't change any thing by default for Galaxy tools (regardless of tool profile version). The only way to achieve new behaviors is for the deployer to set a job_conf parameter called "tmp_dir". If this is set, it can be set to "True" (to default to a new, clean directory below the job directory) or to a shell expression to allow setting this dynamically at runtime to paths or paths beneath directories as needed on a per-destination basis. If a temp directory is set this way all three variables TMP, TMPDIR, and TEMP are set.

Docker Environment Handling
---------------------------

- Mirror the CWL behavior of mounting an external /tmp by default.
- For newer tools, pass through the HOME, TMP, TMPDIR, and TEMP environment variables into the Docker container in addition to GALAXY_SLOTS.
- Fix the pass through of environment variables into Docker containers.

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 9, 2018

@jmchilton

This comment has been minimized.

Member

jmchilton commented Jan 9, 2018

I guess I think this is ready to go again - thanks for the comments @bgruening and @natefoo. There was a failing API test but I think it is the one marked flakey as of https://github.com/galaxyproject/galaxy/pull/5286/files but I'm re-running the tests to make sure.

@martenson martenson requested a review from nsoranzo Jan 17, 2018

@nsoranzo

This comment has been minimized.

Member

nsoranzo commented Jan 19, 2018

Unfortunately I won't have time to properly review this PR until 2018-01-25, but the description looks good to me, so feel free to merge it to have it in before 18.01 is branched. Sorry about that!

@natefoo natefoo merged commit 080148a into galaxyproject:dev Jan 19, 2018

6 checks passed

api test Build finished. 343 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 167 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 79 tests run, 4 skipped, 0 failed.
Details
selenium test Build finished. 118 tests run, 3 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment