New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/snap/quota: refactor quota CLI as per new design #10333
cmd/snap/quota: refactor quota CLI as per new design #10333
Conversation
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
* Make quota command only responsible for showing/displaying quota group information. * Introduce new set-quota command which creates or updates quota groups * Move the display of memory limit under a map section "constraints" which then has a memory key to allow future resource types to live under this section. * Display current memory usage for quota groups under a new section "current", which like "constraints" can be expanded for future resource types too. Also update the spread test and unit tests for the new output formats. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
3de904f
to
13aa951
Compare
This too is a race condition, thanks to Maciej for pointing this out. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…kernel This check ensures that what the kernel says the current memory usage is and what snapd says the current memory usage is don't differ by more than 10%. In practice they should be exactly equal if the program is doing nothing as the go-example-webserver should be, but something does happen to change it \ shouldn't change drastically. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
13aa951
to
d4139c9
Compare
This ensures that we can check the `snap quota` and `snap quotas` output for the "current" section with the memory usage in it. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
These limits were too low so the service was killed and no memory usage was reported in the output, but we want to see some usage from the output to check that it works, so increase the limits so the server is not killed due to OOM. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
Also fix the defer statement which was missing the snap to unset the config on. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…roups We only should be checking what snapd says about memory usage for these empty groups, we don't need to compare with what the kernel says. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…ystems On the following systems: - Arch Linux - Fedora 33 - Fedora 34 - Debian sid - Ubuntu 21.04 The memory usage of an "empty" but active cgroup ends up being 4K for some reason, so handle this in the expected output. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, looks very good. One small idea about the test but fine for a followup.
|
||
var shortSetQuotaHelp = i18n.G(`Create or update a quota group.`) | ||
var longSetQuotaHelp = i18n.G(` | ||
The set-quota command updates or creates a quota group with the specified set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this very detailed description!
@@ -93,7 +85,7 @@ func getQuotaGroups(c *Command, r *http.Request, _ *auth.UserState) Response { | |||
} | |||
sort.Strings(names) | |||
|
|||
results := make([]quotaGroupResultJSON, len(quotas)) | |||
results := make([]client.QuotaGroupResult, len(quotas)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this
|
||
percentChg="$(python3 -c "import math; print(math.ceil(abs($snapdSaysMemUsage - $kernelSaysMemUsage) / $snapdSaysMemUsage * 100))")" | ||
|
||
if [ "$percentChg" -gt 10 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit funny that the margin (needs to be) this big but that's fine of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just kinda random picked 10% as a reasonable value, not much reason to it. Though also, it's probably a bit artificial given that systemd queries the exact same bit of information from the kernel that we are checking here, but I wanted to avoid the race anyways by giving it a margin of error here. We can start with a smaller margin of error if you like and increase it if we see it fail with differences above the margin.
exit 1 | ||
fi | ||
|
||
snapdSaysMemUsage="$(sudo snap run http --body GET snapd:///v2/quotas/group-four | jq -r '.result."current-memory"')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add a test here that checks that the size that the groups reports is roughly in line with /proc/$(pidof go-webserver)/status | grep VmSize or VmRSS or similar. Not sure what exactly we need to correlate there though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really prefer to avoid this because memory accounting at this level is rather complicated and there are too many different values that are roughly "how much memory is this thing using", and I fear on different kernels we may see different values reported by systemd for the group versus the individual process. I'm already a bit frustrated with writing this bit of the spread test as it's all very finicky and prone to silly errors requiring re-runs of the spread test.
ccc aaa memory=400B | ||
ddd aaa memory=400B | ||
bbb zzz memory=1000B memory=400B | ||
`[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice trick!
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
non capturing groups are not a thing in grep -E because of course they aren't Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
I'm like 80% sure that the spread tests are 65% likelier to be happy now than they were before, with confidence level of 2% |
somehow the quota groups without any services in them are causing oom message 🙃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Just a note, mainly for future discussions: I think that this is one of those cases in which the spread tests should be allowed to use a mocked systemd: this would guarantee consistent behaviour and make it possible to easily test more corner cases.
Or actually: it's fine for the spread tests to continue be this way, since they ensure that our functionality works fine across all distros; but I see that we do not only use them for that, but also for testing stuff which could be more easily (and more reliably, and faster!) tested at component level. So, maybe we should start considering the idea of stripping the spread tests to that bare minimum which gives us confidence that every features is supported in every distro, and create a new testing level for functional tests (pytest could be a good candidate for them), where we run our components unmodified, but with everything mocked around them.
|
||
All snaps provided are appended to the group; to remove a snap from a | ||
quota group the entire group must be removed with remove-quota and recreated | ||
without the quota group. To remove a sub-group from the quota group, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"without the quota group" -> "without the undesired snap"
Unless, of course, I misunderstood something :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good catch this was a typo. I will adjust the wording but not in this PR since it is finally green and I would love to land it :-)
Also this reminds me when @degville is back it would be great to have him look at the help texts here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
…least 4K On some systems, an empty cgroup with just cgroups nested inside it (but not necessarily any processes) will have 4K memory usage, so on these systems, we should make sure that the nested cgroups have enough space in them. In a follow-up, we will adjust the minimum usable memory limit for a given cgroup to be 4K to prevent this situation from happening in practice. Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
after much ado, the spread tests here are finally green (well at least the snap-quota ones are green, there are some unrelated failures) |
@mardy As frustrating as it is to get these spread tests to work for all the corner cases on the different distros we support, I do actually think it's quite important we test with the real systemd's from real distros in the spread tests, since the sorts of bugs and weirdness we are seeing in spread tests here is actually indeed the same kind of weirdness that real users would see so it's important we iron out the experience and using the real systemd with all its faults and features is important for this. I will be the first to tell you how cool it is that we run so thorough spread tests on so many tests that are as real as we can get essentially, even though I will also be the first to loudly complain about how annoying these spread tests can be 😄 |
The only spread failures here are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations, LGTM! :-)
Merge pull request #10346 from anonymouse64/feature/quota-groups-remastered-deluxe-diamond-edition-2.5 This is to support nesting and to avoid confusing situations like being able to create empty but nested quota groups which trigger oom-killer to be invoked on the empty quota group because newer systems require at least 4K for the accounting of a sub-group. This came about during investigations of the spread test failures in #10333.
As requested by Alberto a long time ago: snapcore#10333 (comment) Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
As requested by Alberto a long time ago: snapcore#10333 (comment) Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
information.
has a memory key to allow future resource types to live under this section.
which like "constraints" can be expanded for future resource types too.
Also update the spread test to check that memory usage reported by snapd is
approximately that of what the kernel reports.
THE SPREAD TESTS ARE FINALLY HAPPY 🎉 🦖 🎉 🌮