cmd/snap/quota: refactor quota CLI as per new design #10333

anonymouse64 · 2021-06-01T19:23:52Z

* Make quota command only responsible for showing/displaying quota group
  information.
* Introduce new set-quota command which creates or updates quota groups.
* Move the display of memory limit under a map section "constraints" which then
  has a memory key to allow future resource types to live under this section.
* Display current memory usage for quota groups under a new section "current",
  which like "constraints" can be expanded for future resource types too.
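The layout described above can be pictured with a small sketch. This is only an illustration under assumptions: the function name `render_quota_group`, the group name, and the exact text format are hypothetical and are not taken from the snapd code. It shows how one map per section ("constraints", "current") leaves room for resource types other than memory.

```python
def render_quota_group(name, constraints, current):
    """Render a quota group as indented text, one section per resource map.

    Hypothetical helper: the real `snap quota` formatting may differ.
    """
    lines = [f"name: {name}"]
    for section, values in (("constraints", constraints), ("current", current)):
        if not values:
            # Skip a section entirely when it has no entries.
            continue
        lines.append(f"{section}:")
        for resource, value in sorted(values.items()):
            lines.append(f"  {resource}: {value}")
    return "\n".join(lines)


# Only "memory" exists today; new resource types would be new keys in each map.
print(render_quota_group("group-one", {"memory": "1000B"}, {"memory": "400B"}))
```

Because each section is a plain map, adding another resource type would mean adding a key, not changing the shape of the output.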

Also update the spread test to check that the memory usage reported by snapd is
approximately what the kernel reports.

THE SPREAD TESTS ARE FINALLY HAPPY 🎉 🦖 🎉 🌮

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* Make quota command only responsible for showing/displaying quota group information.
* Introduce new set-quota command which creates or updates quota groups
* Move the display of memory limit under a map section "constraints" which then has a memory key to allow future resource types to live under this section.
* Display current memory usage for quota groups under a new section "current", which like "constraints" can be expanded for future resource types too.

Also update the spread test and unit tests for the new output formats.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

This too is a race condition, thanks to Maciej for pointing this out. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

…kernel This check ensures that what the kernel says the current memory usage is and what snapd says the current memory usage is don't differ by more than 10%. In practice they should be exactly equal if the program is doing nothing, as the go-example-webserver should be, but if something does happen to change it, it shouldn't change drastically. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

This ensures that we can check the `snap quota` and `snap quotas` output for the "current" section with the memory usage in it. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

These limits were too low so the service was killed and no memory usage was reported in the output, but we want to see some usage from the output to check that it works, so increase the limits so the server is not killed due to OOM. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

Also fix the defer statement which was missing the snap to unset the config on. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

…roups We should only be checking what snapd says about memory usage for these empty groups; we don't need to compare with what the kernel says. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

…ystems On the following systems:

- Arch Linux
- Fedora 33
- Fedora 34
- Debian sid
- Ubuntu 21.04

The memory usage of an "empty" but active cgroup ends up being 4K for some reason, so handle this in the expected output. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

mvo5

Thank you, looks very good. One small idea about the test but fine for a followup.

mvo5 · 2021-06-02T19:55:19Z

cmd/snap/cmd_quota.go

+
+var shortSetQuotaHelp = i18n.G(`Create or update a quota group.`)
+var longSetQuotaHelp = i18n.G(`
+The set-quota command updates or creates a quota group with the specified set of


Thanks for this very detailed description!

mvo5 · 2021-06-02T19:58:07Z

daemon/api_quotas.go

@@ -93,7 +85,7 @@ func getQuotaGroups(c *Command, r *http.Request, _ *auth.UserState) Response {
 	}
 	sort.Strings(names)

-	results := make([]quotaGroupResultJSON, len(quotas))
+	results := make([]client.QuotaGroupResult, len(quotas))


Thanks for this

mvo5 · 2021-06-02T19:59:48Z

tests/main/snap-quota-groups/task.yaml

+
+  percentChg="$(python3 -c "import math; print(math.ceil(abs($snapdSaysMemUsage - $kernelSaysMemUsage) / $snapdSaysMemUsage * 100))")"
+
+  if [ "$percentChg" -gt 10 ]; then


It's a bit funny that the margin (needs to be) this big but that's fine of course

I just kinda randomly picked 10% as a reasonable value, not much reason to it. Though also, it's probably a bit artificial given that systemd queries the exact same bit of information from the kernel that we are checking here, but I wanted to avoid the race anyway by giving it a margin of error here. We can start with a smaller margin of error if you like and increase it if we see it fail with differences above the margin.
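For reference, the margin check from the diff above can be written as a standalone sketch. The function names and the zero-usage guard are assumptions added for illustration; the spread test itself runs the calculation inline through `python3 -c` and has no such guard.

```python
import math


def percent_difference(snapd_usage, kernel_usage):
    """Percentage by which the two readings differ, relative to snapd's value,
    rounded up to a whole percent (same formula as the spread test)."""
    if snapd_usage == 0:
        # Assumption for this sketch: avoid dividing by zero for empty groups.
        return 0 if kernel_usage == 0 else math.inf
    return math.ceil(abs(snapd_usage - kernel_usage) / snapd_usage * 100)


def within_margin(snapd_usage, kernel_usage, margin_percent=10):
    """True when the readings agree to within the margin (10% in the test)."""
    return percent_difference(snapd_usage, kernel_usage) <= margin_percent


print(within_margin(1000, 950))  # readings 5% apart
print(within_margin(1000, 800))  # readings 20% apart
```

Lowering `margin_percent` is the only change needed to try the tighter margin suggested in the reply.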

mvo5 · 2021-06-02T20:04:33Z

tests/main/snap-quota-groups/task.yaml

+    exit 1
+  fi
+
+  snapdSaysMemUsage="$(sudo snap run http --body GET snapd:///v2/quotas/group-four | jq -r '.result."current-memory"')"


I wonder if we should add a test here that checks that the size that the group reports is roughly in line with /proc/$(pidof go-webserver)/status | grep VmSize or VmRSS or similar. Not sure what exactly we need to correlate there though.

I really prefer to avoid this because memory accounting at this level is rather complicated and there are too many different values that are roughly "how much memory is this thing using", and I fear on different kernels we may see different values reported by systemd for the group versus the individual process. I'm already a bit frustrated with writing this bit of the spread test as it's all very finicky and prone to silly errors requiring re-runs of the spread test.

mvo5 · 2021-06-02T20:05:57Z

cmd/snap/cmd_quota_test.go

+ccc      aaa     memory=400B   
+ddd      aaa     memory=400B   
+bbb      zzz     memory=1000B  memory=400B
+`[1:])


Nice trick!

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

non capturing groups are not a thing in grep -E because of course they aren't Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 · 2021-06-02T21:55:05Z

I'm like 80% sure that the spread tests are 65% likelier to be happy now than they were before, with confidence level of 2%

anonymouse64 · 2021-06-03T04:18:32Z

somehow the quota groups without any services in them are causing oom message 🙃

mardy

Looks good to me!

Just a note, mainly for future discussions: I think that this is one of those cases in which the spread tests should be allowed to use a mocked systemd: this would guarantee consistent behaviour and make it possible to easily test more corner cases.

Or actually: it's fine for the spread tests to continue to be this way, since they ensure that our functionality works fine across all distros; but I see that we do not only use them for that, but also for testing stuff which could be more easily (and more reliably, and faster!) tested at component level. So, maybe we should start considering the idea of stripping the spread tests to that bare minimum which gives us confidence that every feature is supported in every distro, and create a new testing level for functional tests (pytest could be a good candidate for them), where we run our components unmodified, but with everything mocked around them.

mardy · 2021-06-03T06:28:40Z

cmd/snap/cmd_quota.go

+
+All snaps provided are appended to the group; to remove a snap from a
+quota group the entire group must be removed with remove-quota and recreated 
+without the quota group. To remove a sub-group from the quota group, the 


"without the quota group" -> "without the undesired snap"

Unless, of course, I misunderstood something :-)

Yes, good catch, this was a typo. I will adjust the wording but not in this PR since it is finally green and I would love to land it :-)

Also this reminds me when @degville is back it would be great to have him look at the help texts here as well.

pedronis

thanks

…least 4K On some systems, an empty cgroup with just cgroups nested inside it (but not necessarily any processes) will have 4K memory usage, so on these systems, we should make sure that the nested cgroups have enough space in them. In a follow-up, we will adjust the minimum usable memory limit for a given cgroup to be 4K to prevent this situation from happening in practice. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 · 2021-06-03T18:37:43Z

after much ado, the spread tests here are finally green (well at least the snap-quota ones are green, there are some unrelated failures)

anonymouse64 · 2021-06-03T18:40:04Z

> I think that this is one of those cases in which the spread tests should be allowed to use a mocked systemd: this would guarantee consistent behaviour and make it possible to easily test more corner cases.

@mardy As frustrating as it is to get these spread tests to work for all the corner cases on the different distros we support, I do actually think it's quite important that we test with the real systemd from real distros in the spread tests. The sorts of bugs and weirdness we are seeing in the spread tests here are the same kind of weirdness that real users would see, so it's important we iron out the experience, and using the real systemd with all its faults and features is important for this. I will be the first to tell you how cool it is that we run such thorough spread tests, and so many of them, that are as real as we can get, even though I will also be the first to loudly complain about how annoying these spread tests can be 😄

anonymouse64 · 2021-06-03T20:14:08Z

The only spread failures here are google:debian-10-64:tests/main/interfaces-many-core-provided and google:ubuntu-18.04-64:tests/regression/lp-1867193 (and all the tumbleweed ones that have been failing since forever), which are unrelated so this PR should be good to go in now if @pedronis wants to force merge it in the AM

mardy

Thanks for the explanations, LGTM! :-)

Merge pull request #10346 from anonymouse64/feature/quota-groups-remastered-deluxe-diamond-edition-2.5 This is to support nesting and to avoid confusing situations like being able to create empty but nested quota groups which trigger oom-killer to be invoked on the empty quota group because newer systems require at least 4K for the accounting of a sub-group. This came about during investigations of the spread test failures in #10333.
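A minimal sketch of the kind of check described in that follow-up, under stated assumptions: "4K" is read here as 4096 bytes, and the constant and function names are hypothetical. The actual change lives in o/servicestate/quota_control.go and may be structured differently.

```python
# Assumption: "4K" is interpreted as 4096 bytes for this sketch.
MIN_QUOTA_MEMORY_BYTES = 4 * 1024


def validate_memory_limit(limit_bytes):
    """Reject limits too small to cover the accounting overhead of a nested
    group, which would otherwise get the oom-killer invoked on an empty group."""
    if limit_bytes < MIN_QUOTA_MEMORY_BYTES:
        raise ValueError(
            f"memory limit {limit_bytes}B is below the minimum of "
            f"{MIN_QUOTA_MEMORY_BYTES}B"
        )
    return limit_bytes


print(validate_memory_limit(8192))
```

Rejecting the limit when the group is created or updated keeps the failure visible to the user instead of surfacing later as an OOM kill.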

As requested by Alberto a long time ago: snapcore#10333 (comment) Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 added 2 commits June 1, 2021 14:15

client/quota.go: support current-memory key

a70b06b

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 added the ✨quota rebooted✨ label Jun 1, 2021

anonymouse64 force-pushed the feature/quota-groups-remastered-deluxe-diamond-edition-2 branch 3 times, most recently from 3de904f to 13aa951 Compare June 2, 2021 00:34

anonymouse64 added 2 commits June 1, 2021 20:42

tests/main/snap-quota-groups: check the cgroup procs file in a loop

123e316

This too is a race condition, thanks to Maciej for pointing this out. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
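One way to picture checking a procs file in a loop is the retry sketch below. It is an assumption-laden illustration: the function names, the retry count, the delay, and the choice to wait for at least one PID are all hypothetical, and the spread test itself is a shell script whose exact condition is not shown here.

```python
import time


def read_pids(procs_path):
    """Return the PIDs listed in a cgroup.procs-style file (one per line)."""
    try:
        with open(procs_path) as f:
            return [int(token) for token in f.read().split()]
    except FileNotFoundError:
        # The cgroup may not exist yet; treat that as "no processes so far".
        return []


def wait_for_pids(procs_path, attempts=10, delay=0.5):
    """Poll the file until it lists at least one PID or the attempts run out."""
    for _ in range(attempts):
        pids = read_pids(procs_path)
        if pids:
            return pids
        time.sleep(delay)
    return []
```

Polling with a bounded number of attempts avoids failing on a single early read while still letting the test fail if the processes never show up.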

anonymouse64 force-pushed the feature/quota-groups-remastered-deluxe-diamond-edition-2 branch from 13aa951 to d4139c9 Compare June 2, 2021 01:42

pedronis self-requested a review June 2, 2021 07:58

anonymouse64 added 7 commits June 2, 2021 07:56

tests/snap-quota: add a snap with a service to one of the groups

55908d6

This ensures that we can check the `snap quota` and `snap quotas` output for the "current" section with the memory usage in it. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

tests/main/snap-quota: fix trusty check

7f1252e

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

tests/main/snap-quota: fix group-one line in output

b20c603

Also fix the defer statement which was missing the snap to unset the config on. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

tests/main/snap-quota-groups: refactor memory check for empty quota g…

58206b6

…roups We should only be checking what snapd says about memory usage for these empty groups; we don't need to compare with what the kernel says. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

tests/snap-quota-groups: fix typo

fad038c

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

mvo5 approved these changes Jun 2, 2021


anonymouse64 added 2 commits June 2, 2021 15:53

tests/snap-quota-groups: silly typos

a09494e

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

tests/main/snap-quota: fix more typos

35420e6

non capturing groups are not a thing in grep -E because of course they aren't Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 added the Squash-merge Please squash this PR when merging. label Jun 2, 2021

tests/main/snap-quota-groups: use python if python3 is not available

1259480

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

mardy reviewed Jun 3, 2021


pedronis reviewed Jun 3, 2021


anonymouse64 mentioned this pull request Jun 3, 2021

o/servicestate/quota_control.go: enforce minimum of 4K for quota groups #10346

Merged

mardy approved these changes Jun 4, 2021


mvo5 merged commit d750b7b into snapcore:master Jun 7, 2021

anonymouse64 deleted the feature/quota-groups-remastered-deluxe-diamond-edition-2 branch June 7, 2021 12:51

anonymouse64 added a commit to anonymouse64/snapd that referenced this pull request Jan 11, 2022

cmd/snap/quota: fix typo in the message

1c8ebe9

As requested by Alberto a long time ago: snapcore#10333 (comment) Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

anonymouse64 mentioned this pull request Jan 11, 2022

cmd/snap/quota: fix typo in the help message #11238

Merged

anonymouse64 added a commit to anonymouse64/snapd that referenced this pull request Jan 11, 2022

cmd/snap/quota: fix typo in the help message

93337f7

As requested by Alberto a long time ago: snapcore#10333 (comment) Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
