Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wreck: better integration of global lwj.environ environment for wreck jobs #1405

Merged
merged 4 commits into from Apr 1, 2018

Conversation

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented Mar 31, 2018

I was thinking that perhaps reducing content in per-job environment lwj.x.y.ID.environ might reduce the size of content store with many jobs. Turns out in most cases of submitting a long series of jobs, the environment in the KVS is identical and squashed by the content-store, but I thought I'd push up the work I did in support of better integration of the "global" job environment lwj.environ, which might be useful if each job has a slightly different environment, or just for better control of the environment.

This PR adds support to wreckrun/submit to first check lwj.environ table by default and only submit any different variables as the per job environment. A set of utility commands flux wreck setenv,getenv,unsetenv are provided to set and manipulate this global environment. A new option -S, --skip-env is also added to flux-wreckrun and submit to skip exporting the environment completely, so that the global environment is always used.

To export the current environment to all future jobs, use flux wreck setenv all. If you want to run all jobs using this environment (FLUX_URI and other Flux env vars are set explicitly per-job by wrexecd), then pass -S, --skip-env in flux-submit or flux-wreckrun and avoid the extra KVS fetch of the global lwj.environ, otherwise any environment differences are exported to the job in the job-specific env table.

E.g.:

(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv FOO=bar
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
FOO=bar
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck unsetenv FOO
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv all    
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv | wc -l
84
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreckrun -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.1.environ
{ }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ FOO=bar flux wreckrun -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.2.environ
{ "FOO": "bar" }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ FOO=bar flux wreckrun -S -n4 /bin/true
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux kvs get lwj.0.0.3.environ
{ }
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv LANG
LANG=C
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck setenv LANG=en_US
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreck getenv LANG
LANG=en_US
(flux-19528-) grondo@ipa15:~/git/flux-core.git$ flux wreckrun -S -n4 printenv LANG
en_US
en_US
en_US
en_US

@grondo grondo force-pushed the grondo:wreck-setenv branch from e9390bb to 4f79732 Mar 31, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Mar 31, 2018

Excellent! Thanks @grondo. I am sure this in combination with R_lite instead of rdl emitted from the scheduler will make the memory size and growth day and night difference. At some point, it seems measuring the actual differences would provide a good data point towards the final design!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 31, 2018

Coverage Status

Coverage increased (+0.05%) to 78.928% when pulling 42b982e on grondo:wreck-setenv into 1e7d621 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Mar 31, 2018

This is great! Will merge once travis finishes.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 31, 2018

This may not make as much difference as expected, unless for some reason each flux submit has a different environment. I tried out the soak test and discovered that the environment for all jobs was squashed by content-store (so that was working as designed!)

In addition to .rdl key, I'm now thinking that another source of content growth might be the job "lwj" directories themselves, given how they are built up piecemeal, so each job directory may have several versions as new directory entries are created and committed. I'm wondering if another short-term hack would be to precreate all know keys and directories in the lwj.x.y.z dir at job.create time by the job module...

@grondo grondo force-pushed the grondo:wreck-setenv branch from 4f79732 to 8faa637 Mar 31, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 31, 2018

Had to force-push a fix for the new tests under --chain-lint. Hopefully should pass Travis now.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Mar 31, 2018

I'm wondering if another short-term hack would be to precreate all know keys and directories in the lwj.x.y.z dir at job.create time by the job module...

Do you mean have the commands stop writing to the KVS, and instead send their info to the job module to be put in (like request parameters, environment, etc?)

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Mar 31, 2018

Do you mean have the commands stop writing to the KVS, and instead send their info to the job module to be put in (like request parameters, environment, etc?)

That's too big a change for this implementation I think.

I think actually I retract my previous statement. I was at first thinking that if we precreate all directory entries for the job (e.g. rank 0-n dirs, exit_status, etc), then the directory object itself would not have as many revisions, but I forgot for a second how the kvs works, sorry...

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Mar 31, 2018

restarted clang build that hung here

PASS: t4000-issues-test-driver.t 2 - t0505-msg-handler-reg.lua
PASS: t4000-issues-test-driver.t 3 - t0821-kvs-segfault.sh
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Apr 1, 2018

I think this just needs a rebase? I can take care of it for you @grondo if you're already heading out of town.

grondo added some commits Mar 31, 2018

wreck: don't ignore lwj.environ in wreckrun/submit
wreck execution first reads a global lwj.environ table and uses
that as the default environment for all jobs, overridden by any
environment variables encoded in the per-job `environ` table.

Currently, flux-wreckrun and flux-submit do not check for lwj.environ
and instead export the entire (filtered) current environment to each
job, so the lwj.environ is largely going ignored.

Fix this by first reading a "default environment" from lwj.environ
in wreckrun/submit, and only push variables that are not already
the same as the default env to the job. This should cut down on
KVS size for large numbers of jobs with largely the same, but slightly
different, environments. (If all environment tables are the same, then
this change does nothing since all environ entries in KVS will be
squashed by CAS)
wreck: Add -S, --skip-env option to wreckrun/submit
Add an option -S, --skip-env to submit and wreckrun to skip the
export of current environment to the job. This can be used to speed
up submit/wreckrun when the global lwj.environ is sufficient, since
that default table doesn't need to be fetched from the KVS before
submitting the job to the instance.
wreck: add flux wreck environment manipulation commands
Add flux-wreck setenv, getenv, unsetenv commands to set, get, and
manipulate the global wreck environment under lwj.environ.
t/t2000-wreck-env.t: add tests for wreck global environment
Add tests for wreck global environment lwj.environ, and corresponding
commands to manipulate and use that environemnt.

Note: the test is split from t2000-wreck.t but still prefixed with
t2000 for easier test devlepment, parallel tests, etc. Since this is
short-lived code, it is probably ok for now.

@grondo grondo force-pushed the grondo:wreck-setenv branch from 8faa637 to 42b982e Apr 1, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Apr 1, 2018

Rebased. Thanks!

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Apr 1, 2018

Restarted another build hung in the same t4000-issues test as above.

@garlick garlick merged commit 4a51fe1 into flux-framework:master Apr 1, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.05%) to 78.928%
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Apr 1, 2018

Thanks, @garlick!

@grondo grondo deleted the grondo:wreck-setenv branch Apr 26, 2018

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.