Allow users to write to any volume #200
Conversation
So as a consequence of not actually creating a user in #199, you have gained a confusing step here, where you have to make the other volumes available to anyone.
You'd need to chmod either way -- if you define a user you'd have to give that user permission to /tmp and to the other mounted volumes.
I don't want to actually make chmod system calls in the tests
I thought you put all the tests inside a docker container. Are you worried that the chmod will impact files mounted from outside the container?
lib/watcher.js
  async chmod(volumes) {
    return Promise.all(volumes.map((volume) => {
      return new Promise((resolve, reject) => {
        fs.chmod(volume, 0o777, (err) => {
Funtime! try util.promisify()!
lib/watcher.js
@@ -36,6 +48,8 @@ class Watcher {
      setImmediate(loop);
    };

    await this.chmod(this.workerOptions.volumes);
Since the order of these operations is

- chmod
- then start polling

... it would be clearer if you do the `await chmod()` on L24, before the `return new Promise()` step.
Things are looking good on the test side! Non-volume writes fail, but … Going to modify some tests.
lib/watcher.js
@@ -20,8 +24,14 @@ class Watcher {
    this.messages = Messages.create({ queueUrl: options.queueUrl });
  }

  async chmod(volumes) {
Could you rename this function to indicate how the permissions are changing here? Without looking up the permissions code `0o777`, it's hard to know what this is doing.
Edit: Also, is this recursive by default, or do you need to indicate that all files are modified recursively with a `-R`?
@vsmart both of the volumes modified here should be empty at the time when the container starts up, so I do not think `-R` will be necessary.
lib/worker.js
@@ -53,8 +54,8 @@ class Worker {
     return await this.message.retry();
   }

-  async cleanTmp() {
-    return fsExtra.emptyDir('/tmp');
+  async clean(volumes) {
Great that you used async/await style! It's very nice to read. 💯
test/watcher.test.js
@@ -7,6 +7,7 @@ const Watcher = require('../lib/watcher');
 const Message = require('../lib/message');
 const Messages = require('../lib/messages');
 const Worker = require('../lib/worker');
+const fs = require('fs');
Could you add a test on how this behaves when `volumes` is empty (`[]`)? Is there a default volume or does it error out?
This is one part that I'm not sure about -- the defaults are set in the cloudformation template to always include the `/tmp` directory:
Lines 65 to 68 in a2126d2
const mounts = {
  mountPoints: [{ ContainerPath: '/tmp', SourceVolume: 'tmp' }],
  volumes: [{ Name: 'tmp' }]
};
I'm not sure if there's any other default setting that I should do - should I add defaulting in the watcher? In the workers?
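If defaulting did end up in the watcher, one hypothetical shape (the helper name is illustrative, not from this PR) would be a small fallback that mirrors the template's `/tmp` default:

```javascript
// Hypothetical helper mirroring the cloudformation default above:
// if the stack supplied no volumes, fall back to '/tmp'
function resolveVolumes(volumes) {
  return Array.isArray(volumes) && volumes.length > 0 ? volumes : ['/tmp'];
}

console.log(resolveVolumes([]));            // ['/tmp']
console.log(resolveVolumes(['/mnt/data'])); // ['/mnt/data']
```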
🤔 Just realized the implications here about persistent volumes. Should we remove the ability to create persistent volumes in the …?

Note: we could keep persistent volumes around and only clean ephemeral volumes.
After digging through the 22 repositories where mounts are defined for watchbot stacks, the only stack that uses persistent volumes is ecs-conex: https://github.com/mapbox/ecs-conex/blob/8014ea9d2e791d54a930c8ad861173d10e911586/cloudformation/ecs-conex.template.js#L22
While I know the …

For this use case, could we change the …

Could we also change the …

I'll try out ecs-conex with this branch and see what's needed for those modifications.
(force-pushed from f5a62b6 to a9b904b, and from 1cdfdf9 to 63cba8b)
I spent yesterday trying this PR out with https://github.com/mapbox/ecs-conex to see if the restrictions were too limiting for that repository, which is kind of a permissions edge case. After many different attempts, it seems like the non-sudo user for the child process gets explicitly prevented from using the …

A few other notes about this PR:
I'm opening this up to review for anyone on @mapbox/platform-engine-room who wants to review it before it merges into #184.
I'm worried about this -- both from the perspective of performance implications that have been raised during benchmarking of #196, and from the fact that root permission feels like a side-effect: you want to run the child process as sudo, and that just so happens to be the case in "fresh mode", where the primary goal is to turn containers off after every run. On the latter point I may feel more satisfied if it was …
Just to be extra clear -- you're talking about changing the name of …? If so, I think you're probably right; the …
I'm teeter-tottering on that distinction. The feature that you need to make conex work is sudo mode -- where the child process is run as root. The feature that you have is fresh mode, where you allow the child process to write anywhere on the filesystem and you turn off the container after every job. It just so happens that the branch is written such that the child process runs as root, though that does not seem to be the feature's reason-for-being. If you rename fresh -> sudo, the language makes more sense for the conex use case you're solving. However, you're adding the baggage of poor scheduler performance and excessive container start/stop overhead where you really don't need it. Conex doesn't want to write anywhere on the filesystem, it just wants to run as root. A performant sudo mode could still use the default logic to clean up a mounted …
Is there a reason why sudo mode cannot be the default in both fresh mode and turbo mode, considering it already is the default in fresh mode?
The mechanism that default (turbo) mode uses to restrict writes to only volumes is by changing the user to a non-sudo user and restricting that user's permissions. So as long as we move forward with the restriction approach, we won't implement any sudo behavior in the default mode.
I'm really interested in this kind of use case, and think it may bite us in other projects. It would be nice to be able to say "you're running as root, but root can't write anywhere." Maybe we could achieve this by performing multiple chmods on all non-volume directories, preventing the root user from ever writing there, but still allowing execution and reading? No matter how we implement it, I think we want to make sure that any modes that re-use the container have restrictions preventing them from writing anywhere on the file system.
After chatting today, someone brought up the `ReadOnlyRootFilesystem` option. This does exactly what this PR does manually. We should maybe try to do that instead.
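For reference, a sketch of what that might look like in the task definition (property names follow the ECS CloudFormation schema; the container name here is illustrative, not from this repo's template):

```javascript
// Illustrative container definition fragment: ReadOnlyRootFilesystem
// makes the whole root filesystem read-only, while MountPoints remain
// writable -- the restriction this PR otherwise builds by hand
const containerDefinition = {
  Name: 'watchbot-worker', // hypothetical container name
  ReadOnlyRootFilesystem: true,
  MountPoints: [{ ContainerPath: '/tmp', SourceVolume: 'tmp' }]
};

console.log(containerDefinition.ReadOnlyRootFilesystem); // true
```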
Alright, after testing this out on the read-only file system, it looks good for a test stack. Going to try on conex.
Successfully got the container recycling set up with ecs-conex at https://github.com/mapbox/ecs-conex/tree/watchbot4! I think the approach here is nice and succinct. Going to merge it into the #184 PR now!
* Restrict writes to volumes and clean them after every job
* Try out the `ReadOnlyRootFilesystem` option
* Capitalization
* Add watchbot-log
* use strict
* No need to chmod now
* ♻ that container
* Add logging (#185)
* adds logging of watcher-level errors, worker receives, and completion status
* prefixed logs from child processes
* fixes logger factory to accept a message
* --> false for legibility
* move binary split to dependency
* package lock changes
* Scale down threshold (#187)
* change scale-down MetricIntervalLowerBound to MetricIntervalUpperBound
* exit main loop after workers finish
* resolve() after all workers return instead of exiting
* fix tests and mocks
* cleanup
* logs
* another log
* use logger and process.stdout for logs
* more logs
* edit logs
* Add alarms to "♻️ that container" PR (#198)
* Add alarms and alarm docs
* Add failedPlacementAlarmPeriods
* Add CloudWatch Alarms snapshots
* Update template jest snapshots
* Add CloudWatch Alarms snapshots
* Add failedworker and failedworkerplacement metric
* Typo r/LogGroup/Logs
* Change metric name
* Metric Filter of worker errors to "[failure]"
* Have current published version instead of undefined
* Jake's Review
* uh update-jest
* Update alarms.md
* Add template validation tests (#215)
* Add travis user
* Ensure this fails
* Add validation for notificationEmail or notificationTopic
* Add minSize and maxSize of service scaleup and scaledown, deadletter queue threshold, info to doc (#211)
* Closes #208, #207, #206, #182, #149, #72, #15 (cherry picked from commit 8de328df79ccf52b8d612c625891555808c2fa0e)
* Add minSize as option
* update jest tests
* Change MinSize to 0
* update jest
* identation and minSize to 0
* Add deadletterThreshold info in Worker-retry-cycle
* Update tests with maxSize property
* remove superfluous logging
* add fresh mode as a watchbot option
* if else
* freshMode
* console log
* typeof
* true
* concise
* add fresh
* fix tests
* fix binary test
* update snapshots
* Allow users to write to any volume (#200)
* Restrict writes to volumes and clean them after every job
* Try out the `ReadOnlyRootFilesystem` option
* Capitalization
* Add watchbot-log
* use strict
* No need to chmod now
Adds user-defined volumes, as an alternative to the tmp directory PR #199.
Things I'm not sure about:

* `chmod 777` in the watcher before messages begin. Should this be per-worker instead? I figured we might as well save unnecessary work.
* `chmod` system calls in the tests. Not sure how I feel there.

Since this isn't a painful implementation (barring any testing surprises), I'm happy to go with this over #199.
cc/ @rclark @mapbox/platform-engine-room