tests: add smoke test for app sandbox #3371
Conversation
Force-pushed from b234bd9 to c6be790
First iteration of this uncovered a bug on kvm, where …
Second iteration of this uncovered a bug on non-overlayfs pods, where …
Force-pushed from c6be790 to f3f8a2a
Third iteration of this uncovered an issue in environment handling: the entrypoints are not receiving a proper …
Force-pushed from f3f8a2a to 6797373
Fourth iteration of this hit a race in KVM, due to a too-low thread limit. Tracked as #3382. This unfortunately happens from time to time at …
Force-pushed from 6797373 to fb5b06d
Force-pushed from 0d6f833 to 64af74f
Fifth iteration of this hit a problem on rm, as the unmounting tasks are still TODO. As the mounts are done in stage1, this unfortunately means that systemd+journald in stage1 still hold references to the application mounts (custom volumes and ancillary mounts like procfs, sysfs, devs, etc.). Coupled with #1922 (semaphore is running on an old 3.13 kernel), it means that stage1 pod processes are holding those rootfs paths busy as bind-mount targets, and thus stage0 can't remove the rootfs directory. This requires proper cleaning of all mountpoints under the app rootfs by some stage1 helper before proceeding.
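As an illustration of what such a stage1 helper would need to do first, here is a hypothetical sketch that collects mount points under an app rootfs from a pod's `/proc/<pid>/mountinfo`, deepest-first so they can be unmounted child-before-parent. The helper name and sample paths are illustrative, not rkt's actual code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mountsUnder parses mountinfo content (as read from /proc/<pid>/mountinfo)
// and returns every mount point under prefix, deepest paths first, so that
// callers can unmount children before their parents.
func mountsUnder(mountinfo, prefix string) []string {
	var mounts []string
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 5 {
			continue
		}
		mp := fields[4] // the 5th mountinfo field is the mount point
		if strings.HasPrefix(mp, prefix) {
			mounts = append(mounts, mp)
		}
	}
	// longest (deepest) paths first
	sort.Slice(mounts, func(i, j int) bool { return len(mounts[i]) > len(mounts[j]) })
	return mounts
}

func main() {
	sample := `36 35 98:0 / /opt/stage2/app/rootfs rw - ext4 /dev/sda1 rw
37 36 0:3 / /opt/stage2/app/rootfs/proc rw - proc proc rw
38 36 0:5 / /opt/stage2/app/rootfs/dev rw - devtmpfs dev rw`
	for _, m := range mountsUnder(sample, "/opt/stage2/app/rootfs") {
		fmt.Println(m)
	}
}
```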
Sixth iteration hit misbehavior in mounting/unmounting, due to the lack of /etc/mtab in the stage1 rootfs.
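A minimal sketch of the conventional fix for a missing mtab, assuming the stage1 rootfs can simply symlink /etc/mtab to the kernel's own mount table (the rootfs path here is a throwaway directory for illustration):

```shell
#!/bin/sh
# Sketch: give a stage1 rootfs an /etc/mtab symlink so that mount/umount
# inside the pod can consult the kernel's view of the mount table.
set -e
rootfs="$(mktemp -d)"
mkdir -p "$rootfs/etc"
# Relative target keeps the link valid after chroot into the rootfs.
ln -sf ../proc/self/mounts "$rootfs/etc/mtab"
readlink "$rootfs/etc/mtab"
```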
Still WIP, hence bumping to the next release.
Force-pushed from 64af74f to ae339d7
Force-pushed from e88667f to eb7b760
Force-pushed from b23d941 to c50a873
```go
args = enterCmd
args = append(args, "/usr/bin/systemctl")
args = append(args, "daemon-reload")
// TODO(sur): find all RW cgroups exposed for this app and clean them up
```
@s-urbaniak: this TODO is in place here as a reminder to re-visit the rm steps once #3389 is merged.
ack 👍
just a few nits
```go
)

func init() {
	flag.StringVar(&flagApp, "app", "", "Application name")
	flag.BoolVar(&debug, "debug", false, "Run in debug mode")

	// `--phase` is not part of stage1 contract
	flag.IntVar(&flagPhase, "phase", 0, "Removal phase, defaults to 0 when called from the outside")
```
"phase 0" and "phase 1" are very abstract. We have established and documented terms for stages, where "stage 0" happens on the host, and "stage 1" happens in the context of the pod.
Can we either make the context of phases more explicit in the documentation, or even call this flag stage
?
I kind of agree with this comment, but I also fear that some degree of confusion will remain anyway: app-rm is a stage1 entrypoint, yet half of it (phase0/stage0) runs on the host and half of it (phase1/stage1) in the pod context.
I'll perform a s/phase/stage/ because it helps reduce the number of concepts and reflects where this is being run.
```go
case 1:
	// phase1: app-rm:phase0 -> app-rm:phase1
	err = cleanupPhase1(appName, enterCmd)
default:
```
just a nit, but I suggest making the set of supported cleanup phases (or cleanup stages) explicit and failing for an unsupported phase:

```go
switch flagPhase {
case 0:
	// cleanup things in stage0
case 1:
	// cleanup things in stage1
default:
	log.Fatalf("unsupported phase %d", flagPhase)
}
```
```go
}

// cleanupPhase0 is the default phase for the rm entrypoint, performing
// initial cleaning steps which don't need custom logic in pod context.
```
let's document here that all of this explicitly happens in stage0, i.e.:
This cleanup phase happens in stage 0.
It removes the app service files and calls:
1. `systemctl daemon-reload` in stage1
2. itself in stage1 for phase 1 cleanups
```go
args = enterCmd
args = append(args, "/usr/bin/systemctl")
args = append(args, "daemon-reload")
// TODO(sur): find all RW cgroups exposed for this app and clean them up
```
ack 👍
```go
// TODO unmount all the volumes
// cleanupPhase1 inspects pod systemd-pid1 mountinfo to find all remaining
```
// cleanupPhase1 is executed in stage1 and inspects the pod's systemd-pid1 mountinfo ....
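For illustration, the unmount side of such a stage1 cleanup could look like the sketch below, assuming the mount points were already collected deepest-first from mountinfo. MNT_DETACH lazily detaches mounts still held busy by stage1 processes, so stage0 can then remove the app rootfs directory (hypothetical helper, not rkt's implementation):

```go
package main

import (
	"fmt"
	"syscall"
)

// unmountAll lazily detaches each mount point in order. Callers are
// expected to pass children before parents (deepest paths first).
func unmountAll(mounts []string) error {
	for _, m := range mounts {
		// MNT_DETACH detaches the mount even if it is still busy,
		// clearing the way for removing the rootfs directory later.
		if err := syscall.Unmount(m, syscall.MNT_DETACH); err != nil {
			return fmt.Errorf("unmounting %s: %v", m, err)
		}
	}
	return nil
}

func main() {
	// no real mounts to detach in this sketch; demonstrate the error path
	if err := unmountAll([]string{"/nonexistent/mount"}); err != nil {
		fmt.Println("got expected error")
	}
}
```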
Force-pushed from 82a21ef to 2ad36c1
@s-urbaniak Thanks for the review, much appreciated! Rebased to address your comments.
LGTM 👍
```go
}

// cleanupStage0 is the default initial step for rm entrypoint, which takes
// cares of cleaning up resources in stage0 and calling into stage1 by:
```
// takes care
Ah, good catch.
```diff
@@ -59,7 +69,31 @@ func main() {
 	}

 	enterCmd := stage1common.PrepareEnterCmd(false)
+	switch flagStage {
```
Calling this stage0 vs stage1 is a bit confusing - it only refers to the mount namespace (stage0 vs. stage1) - even the "stage 0" cleanup code is removing things created by the mass of code we call "the stage1".
Perhaps a better naming scheme is in order.
Those were called phase0/phase1, then renamed to stage0/stage1 to reflect the context they are running in. See #3371 (comment). Another option would be --context=parent and --context=pod, to also make explicit that parts of this run in a different context. @s-urbaniak @jonboulle opinions?
I find stage0 less confusing than phase0 ;-)
I've left this as is for the moment. This is an internal implementation detail anyway; perhaps we can make this clearer down the road if we have other similar cases to document/classify.
Seems reasonable to me.
@casey LGTY?
Sure, NBD.
Force-pushed from 2ad36c1 to 4ad8256
Part of #3349