Allow users to write to any volume #200

Merged

jakepruitt merged 6 commits into container-recycling from any-volume on Jun 16, 2018

Conversation

@jakepruitt

Adds user-defined volumes on top of the tmp-directory PR, #199.

Things I'm not sure about:

  • performing chmod 777 in the watcher before messages begin. Should this be per-worker instead? I figured we might as well avoid unnecessary work.
  • I feel like my tests are missing an important integration piece, but I don't want to actually make chmod system calls in the tests. Not sure how I feel there.
  • Does this actually work? I'm going to test it out on a quick stack that performs writes to user-defined volumes.
  • Code quality: I have one pyramid of doom and a few long lines in the test code. Should I make those prettier?

Since this isn't a painful implementation (barring any testing surprises), I'm happy to go with this over #199.

cc/ @rclark @mapbox/platform-engine-room

@rclark (Contributor) left a comment

So as a consequence of not actually creating a user in #199, you have gained a confusing step here, where you have to make the other volumes available to anyone.

You'd need to chmod either way -- if you define a user you'd have to give that user permission to /tmp and to the other mounted volumes.

> I don't want to actually make chmod system calls in the tests

I thought you put all the tests inside a docker container. Are you worried that the chmod will impact files mounted from outside the container?

lib/watcher.js Outdated
async chmod(volumes) {
  return Promise.all(volumes.map((volume) => {
    return new Promise((resolve, reject) => {
      fs.chmod(volume, 0o777, (err) => (err ? reject(err) : resolve()));
    });
  }));
}
Contributor:

Funtime! try util.promisify()!
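A sketch of what that could look like (assuming fs and util are required at the top of lib/watcher.js; the class shape is abbreviated):

const util = require('util');
const fs = require('fs');

const chmodAsync = util.promisify(fs.chmod);

class Watcher {
  // ...

  async chmod(volumes) {
    // One promisified chmod per volume, run in parallel.
    return Promise.all(volumes.map((volume) => chmodAsync(volume, 0o777)));
  }
}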

lib/watcher.js Outdated
@@ -36,6 +48,8 @@ class Watcher {
       setImmediate(loop);
     };

+    await this.chmod(this.workerOptions.volumes);
Contributor:

Since the order of these operations is

  1. chmod
  2. then start polling

... it would be clearer if you do the await chmod() on L24, before the return new Promise() step.
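In sketch form (the surrounding listen() shape is an assumption; only chmod() and workerOptions.volumes appear in this diff):

async listen() {
  // 1. Make the mounted volumes world-writable first...
  await this.chmod(this.workerOptions.volumes);

  // 2. ...then kick off the polling loop.
  return new Promise((resolve) => {
    const loop = async () => {
      // ... receive messages, run workers ...
      setImmediate(loop);
    };
    setImmediate(loop);
  });
}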

@jakepruitt (Author)

Things are looking good on the test side! Non-volume writes fail, but /tmp and user-defined mounts succeed!

Going to modify some tests

lib/watcher.js Outdated
@@ -20,8 +24,14 @@ class Watcher {
     this.messages = Messages.create({ queueUrl: options.queueUrl });
   }

+  async chmod(volumes) {
Contributor:

Could you rename this function to indicate how the permissions are changing here? Without looking up the permissions code 0o777 it's hard to know what this is doing.

Edit: Also, is this recursive by default, or do you need to indicate that all files are modified recursively with a -R?

@jakepruitt (Author)

@vsmart both of the volumes modified here should be empty at the time when the container starts up, so I do not think -R will be necessary.
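(For reference: Node's fs.chmod only changes the single path it is given; there is no -R equivalent in the fs API. If a recursive pass were ever needed, it would have to walk the tree explicitly — a hypothetical sketch:)

const path = require('path');
const util = require('util');
const fs = require('fs');

const chmodAsync = util.promisify(fs.chmod);
const readdirAsync = util.promisify(fs.readdir);
const statAsync = util.promisify(fs.stat);

// Hypothetical helper: chmod a directory, then everything underneath it.
async function chmodRecursive(dir, mode) {
  await chmodAsync(dir, mode);
  for (const entry of await readdirAsync(dir)) {
    const full = path.join(dir, entry);
    const stats = await statAsync(full);
    if (stats.isDirectory()) {
      await chmodRecursive(full, mode);
    } else {
      await chmodAsync(full, mode);
    }
  }
}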

lib/worker.js Outdated
@@ -53,8 +54,8 @@ class Worker {
     return await this.message.retry();
   }

-  async cleanTmp() {
-    return fsExtra.emptyDir('/tmp');
+  async clean(volumes) {
Contributor:

Great that you used async/await style! It's very nice to read. 💯

@@ -7,6 +7,7 @@ const Watcher = require('../lib/watcher');
 const Message = require('../lib/message');
 const Messages = require('../lib/messages');
 const Worker = require('../lib/worker');
+const fs = require('fs');
Contributor:

Could you add a test on how this behaves when volumes is empty []? Is there a default volume or does it error out?

@jakepruitt (Author)

This is one part that I'm not sure about - the defaults are set in the CloudFormation template to always include the /tmp directory:

const mounts = {
  mountPoints: [{ ContainerPath: '/tmp', SourceVolume: 'tmp' }],
  volumes: [{ Name: 'tmp' }]
};

I'm not sure if there's any other default setting that I should do - should I add defaulting in the watcher? In the workers?
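A minimal test for the empty-volumes question could look like this (a sketch, assuming a jest-style suite; it calls the method via the prototype since chmod doesn't use this):

const Watcher = require('../lib/watcher');

test('chmod resolves with an empty volume list', async () => {
  // Promise.all over an empty array resolves to [] without making any fs calls.
  await expect(Watcher.prototype.chmod([])).resolves.toEqual([]);
});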

@jakepruitt (Author)

🤔 Just realized the implications here about persistent volumes. Should we remove the ability to create persistent volumes in the ./lib/template.js file, and from all of the documentation?

@jakepruitt (Author)

Note: we could keep persistent volumes around and only clean ephemeral volumes.

@jakepruitt (Author)

After digging through the 22 repositories where mounts are defined for watchbot stacks, the only stack that uses persistent volumes is ecs-conex: https://github.com/mapbox/ecs-conex/blob/8014ea9d2e791d54a930c8ad861173d10e911586/cloudformation/ecs-conex.template.js#L22

mounts: '/mnt/data:/mnt/data,/var/run/docker.sock:/var/run/docker.sock',

While I know the /var/run/docker.sock link needs to be there for connecting the two containers, I do not believe the /mnt/data needs to be a persistent volume.

For this use case, could we change the /mnt/data to a temporary volume?

Could we also change the /var/run/docker.sock to a read-only ContainerDefinition.LinuxParameters.Device?

I'll try out ecs-conex with this branch and see what's needed for those modifications.
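For reference, the device mapping mentioned above would look roughly like this in the container definition (a sketch; Devices is the CloudFormation property name for the list, and read is the only permission granted):

const linuxParameters = {
  Devices: [{
    HostPath: '/var/run/docker.sock',
    ContainerPath: '/var/run/docker.sock',
    Permissions: ['read']
  }]
};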

@jakepruitt changed the base branch from writes-to-tmp to container-recycling on June 4, 2018
@jakepruitt force-pushed the any-volume branch 3 times, most recently from f5a62b6 to a9b904b on June 12, 2018
@jakepruitt (Author)

I spent yesterday trying this PR out with https://github.com/mapbox/ecs-conex to see if the restrictions were too limiting for that repository, which is kind of a permissions edge case. After many different attempts, it seems like the non-sudo user for the child process gets explicitly prevented from using the /var/run/docker.sock connection. So I think we'll need to use fresh: true (refs #196) for ecs-conex.

A few other notes about this PR:

  • In the process of getting ecs-conex set up on watchbot 4, I noticed a few other features in lib/template that needed to be implemented, like returning an object with refs. I went ahead and added that here.
  • I switched the Logs resource to LogGroup to maintain backwards compatibility
  • Since OSX doesn't have a /tmp directory, I switched the tests to run in a Linux container. This makes the tests a little bit slower to run.
  • I'm not 100% certain about how I've created the defaults. I am automatically adding the /tmp directory as a volume for all watchbot projects. As far as I know, this shouldn't cause any problems.

I'm opening this up to review for anyone on @mapbox/platform-engine-room that wants to review it before it merges into #184.

@rclark (Contributor) commented Jun 13, 2018

> I think we'll need to use fresh: true

I'm worried about this -- both from the perspective of the performance implications that have been raised during benchmarking of #196, and from the fact that root permission feels like a side-effect: you want to run the child process as sudo, and that just so happens to be the case in "fresh mode", where the primary goal is to turn containers off after every run.

On the latter point I may feel more satisfied if it was sudo: true instead of fresh: true. At least that makes the contract more clear. "You can run the child process as root but you can't run very fast".

@jakepruitt (Author)

> On the latter point I may feel more satisfied if it was sudo: true instead of fresh: true. At least that makes the contract more clear. "You can run the child process as root but you can't run very fast".

Just to be extra clear - you're talking about changing the name of fresh mode to sudo mode, not adding an option in the normal mode where the containers are re-used but you get to run as sudo?

If so, I think you're probably right; the fresh distinction is probably confusing, since normal mode does "freshen" the filesystem. The main difference is whether you have permission to write to the entire filesystem or not.

@rclark (Contributor) commented Jun 13, 2018

I'm teeter-tottering on that distinction.

The feature that you need to make conex work is sudo mode -- where the child process is run as root. The feature that you have is fresh mode, where you allow the child process to write anywhere on the filesystem and you turn off the container after every job. It just so happens that the branch is written such that the child process runs as root, though that doesn't seem to be the feature's reason-for-being.

If you rename fresh -> sudo, the language makes more sense for the conex use case you're solving. However, you're adding the baggage of poor scheduler performance and excessive container start/stop overhead where you really don't need it. Conex doesn't want to write anywhere on the filesystem, it just wants to run as root.

A performant sudo-mode could still use the default logic to clean up a mounted /tmp volume and you'd have a performant solution to the conex problem. While I'm not a big fan of the proliferation of "modes" here, I think that sudo-mode on its own is definitely a better solution to the conex problem.

@arunasank (Contributor)

> A performant sudo-mode could still use the default logic to clean up a mounted /tmp volume and you'd have a performant solution to the conex problem. While I'm not a big fan of the proliferation of "modes" here, I think that sudo-mode on its own is definitely a better solution to the conex problem.

Is there a reason why sudo mode cannot be the default in both fresh mode and turbo mode, considering it already is the default in fresh mode?

@jakepruitt (Author)

> Is there a reason why sudo mode cannot be the default in both fresh mode and turbo mode, considering it already is the default in fresh mode?

The mechanism that default (turbo) mode uses to restrict writes to volumes only is to change to a non-sudo user and restrict that user's permissions. So as long as we move forward with the restriction approach, we won't implement any sudo behavior in the default mode.

> Conex doesn't want to write anywhere on the filesystem, it just wants to run as root.

> I think that sudo-mode on its own is definitely a better solution to the conex problem

I'm really interested in this kind of use case, and think it may bite us in other projects. It would be nice to be able to say "you're running as root, but root can't write anywhere." Maybe we could achieve this by performing multiple chmods on all non-volume directories, preventing the root user from ever writing there, but still allowing execution and reading? No matter how we implement it, I think we want to make sure that any modes that re-use the container have restrictions preventing them from writing anywhere on the file system.

@jakepruitt (Author)

After chatting today, someone brought up the --read-only docker option (refs https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only and ReadOnlyRootFilesystem https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-containerdefinitions.html)

This does exactly what this PR does manually. We should maybe try to do that instead.
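In the task definition, that would be roughly (a sketch; ReadOnlyRootFilesystem and MountPoints are from the ECS ContainerDefinition spec, while the surrounding template structure is assumed):

const container = {
  // Lock down the whole root filesystem...
  ReadOnlyRootFilesystem: true,
  // ...and mount only the volumes that should stay writable.
  MountPoints: [{ ContainerPath: '/tmp', SourceVolume: 'tmp', ReadOnly: false }]
};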

@jakepruitt (Author)

Alright, after testing this out on the read-only file system it looks good for a test stack. Going to try on conex.

@jakepruitt (Author)

Successfully got the container recycling set up with ecs-conex at https://github.com/mapbox/ecs-conex/tree/watchbot4!

I think the approach here is nice and succinct. Going to merge it into the #184 PR now!

@jakepruitt jakepruitt merged commit 3a196da into container-recycling Jun 16, 2018
@jakepruitt jakepruitt deleted the any-volume branch June 16, 2018 00:44
jakepruitt pushed a commit that referenced this pull request Jun 16, 2018
* Restrict writes to volumes and clean them after every job

* Try out the `ReadOnlyRootFilesystem` option

* Capitalization

* Add watchbot-log

* use strict

* No need to chmod now
jakepruitt pushed a commit that referenced this pull request Jun 16, 2018
* ♻ that container

* Add logging (#185)

* adds logging of watcher-level errors, worker receives, and completion status

* prefixed logs from child processes

* fixes logger factory to accept a message

* --> false for legibility

* move binary split to dependency

* package lock changes

* Scale down threshold (#187)

* change scale-down MetricIntervalLowerBound to MetricIntervalUpperBound

* exit main loop after workers finish

* resolve() after all workers return instead of exiting

* fix tests and mocks

* cleanup

* logs

* another log

* use logger and process.stdout for logs

* more logs

* edit logs

* Add alarms to "♻️ that container" PR  (#198)

* Add alarms and alarm docs

* Add failedPlacementAlarmPeriods

* Add CloudWatch Alarms snapshots

* Update template jest snapshots

* Add CloudWatch Alarms snapshots

* Add failedworker and failedworkerplacement metric

* Typo r/LogGroup/Logs

* Change metric name

* Metric Filter of worker errors to "[failure]"

* Have current published version instead of undefined

* Jake's Review

* uh update-jest

* Update alarms.md

* Add template validation tests (#215)

* Add travis user

* Ensure this fails

* Add validation for notificationEmail or notificationTopic

* Add minSize and maxSize of service scaleup and scaledown, deadletter queue threshold, info to doc (#211)

* Closes #208, #207, #206, #182, #149, #72, #15

(cherry picked from commit 8de328df79ccf52b8d612c625891555808c2fa0e)

* Add minSize as option

* update jest tests

* Change MinSize to 0

* update jest

* identation and minSize to 0

* Add deadletterThreshold info in Worker-retry-cycle

* Update tests with maxSize property

* remove superfluous logging

* add fresh mode as a watchbot option

* if else

* freshMode

* console log

* typeof

* true

* concise

* add fresh

* fix tests

* fix binary test

* update snapshots

* Allow users to write to any volume (#200)

* Restrict writes to volumes and clean them after every job

* Try out the `ReadOnlyRootFilesystem` option

* Capitalization

* Add watchbot-log

* use strict

* No need to chmod now