systemd: allow only a single daemon-reload at the same time #6331
This is an RFC PR to see if the "mount protocol error" reported in systemd/systemd#10872 can be worked around by serializing the adding and removal of mount units. Proposing to get full spread runs. This is similar to canonical#6243 but it goes further by ensuring a single daemon-reload at the level of the systemd Go package. Note that there is still a chance that the protocol error happens if something else (like dpkg or the user) runs "systemctl daemon-reload" while we write a mount unit. But the risk should be hugely smaller.
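To illustrate the idea of serializing at the package level, here is a minimal sketch, not the code from this PR. The names `daemonReloadLock`, `systemctlFunc`, `writeUnitAndReload` and `maxConcurrentReloads` are hypothetical, and the systemctl call is replaced by a fake so the example runs without systemd. It assumes that writing the unit and reloading happen under one package-level mutex.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// daemonReloadLock is a hypothetical package-level lock. Every code path
// that writes or removes a mount unit and then reloads systemd takes it.
var daemonReloadLock sync.Mutex

// systemctlFunc stands in for whatever actually runs systemctl.
type systemctlFunc func(args ...string) error

// writeUnitAndReload holds the lock across both steps, so no other
// goroutine in this process can reload while a unit is being written.
func writeUnitAndReload(systemctl systemctlFunc, writeUnit func() error) error {
	daemonReloadLock.Lock()
	defer daemonReloadLock.Unlock()

	if err := writeUnit(); err != nil {
		return err
	}
	return systemctl("daemon-reload")
}

// maxConcurrentReloads starts several workers and reports the highest
// number of fake daemon-reload calls that were in flight at once.
func maxConcurrentReloads(workers int) int32 {
	var active, maxSeen int32
	fakeSystemctl := func(args ...string) error {
		cur := atomic.AddInt32(&active, 1)
		// Record the high-water mark of concurrent calls.
		for {
			old := atomic.LoadInt32(&maxSeen)
			if cur <= old || atomic.CompareAndSwapInt32(&maxSeen, old, cur) {
				break
			}
		}
		time.Sleep(time.Millisecond) // pretend the reload takes a while
		atomic.AddInt32(&active, -1)
		return nil
	}

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = writeUnitAndReload(fakeSystemctl, func() error { return nil })
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&maxSeen)
}

func main() {
	fmt.Println("max concurrent reloads:", maxConcurrentReloads(20))
}
```

A lock like this only covers callers inside the same process, which is why a reload triggered by dpkg or by the user can still race with the unit write.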
Codecov Report
@@            Coverage Diff             @@
##           master    #6331      +/-   ##
==========================================
- Coverage   78.98%   78.96%   -0.02%
==========================================
  Files         561      561
  Lines       43623    43637      +14
==========================================
+ Hits        34454    34460       +6
- Misses       6371     6384      +13
+ Partials     2798     2793       -5
@bboozzoo Could you please run your reproducer script that installs snaps and triggers the protocol error bug against a snapd build with this PR?
// can be unmounted.
// note that the long option --lazy is not supported on trusty.
// the explicit -d is only needed on trusty.
isMounted, err := osutil.IsMounted(mountedDir)
While this is not new logic, I would like to understand the motivation behind this particular arrangement of actions:
- we unmount the snap ourselves (MNT_DETACH)
- we stop the mount unit (which could do the lazy unmount for us if we set LazyUnmount=true in the mount unit)
- we disable the mount unit (what for?)
- we remove the mount unit
- we reload systemd
Why is the logic not simply:
- we run systemctl disable --now on the mount unit
- we remove the mount unit by removing the file
- we reload systemd
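The shorter sequence proposed above can be sketched as follows. This is an illustration only, not snapd's removal code: `removeMountUnit`, `systemctlFunc`, `recordedSteps` and the unit path are hypothetical, and the systemctl and file-removal steps are injected as functions so the order of operations can be inspected without touching a real system.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// systemctlFunc stands in for whatever actually runs systemctl.
type systemctlFunc func(args ...string) error

// removeMountUnit performs the three proposed steps in order:
// disable --now, remove the unit file, daemon-reload.
func removeMountUnit(systemctl systemctlFunc, removeFile func(string) error, unitPath string) error {
	unit := filepath.Base(unitPath)
	if err := systemctl("disable", "--now", unit); err != nil {
		return fmt.Errorf("cannot disable %s: %w", unit, err)
	}
	if err := removeFile(unitPath); err != nil {
		return fmt.Errorf("cannot remove %s: %w", unitPath, err)
	}
	return systemctl("daemon-reload")
}

// recordedSteps runs removeMountUnit against fakes and returns the
// operations it would have performed, in order.
func recordedSteps(unitPath string) []string {
	var steps []string
	systemctl := func(args ...string) error {
		steps = append(steps, "systemctl "+strings.Join(args, " "))
		return nil
	}
	removeFile := func(path string) error {
		steps = append(steps, "rm "+path)
		return nil
	}
	if err := removeMountUnit(systemctl, removeFile, unitPath); err != nil {
		steps = append(steps, "error: "+err.Error())
	}
	return steps
}

func main() {
	// The unit path is made up for the example.
	for _, s := range recordedSteps("/etc/systemd/system/snap-example-1.mount") {
		fmt.Println(s)
	}
}
```

Whether this works everywhere depends on the systemd version in use, which is what the follow-up question about trusty is probing.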
does trusty have a working disable --now?
The locking looks sensible. I asked a question about the specific way we handle the removal operation but it is unrelated to the main issue.
This adds a regression test for the mount protocol error that systemd sometimes throws.
@bboozzoo Actually - silly me, no need to run your script, I added it as a spread test. I guess we need to see how slow it is to determine if we want to keep it enabled (and if we should limit it to e.g. only arch or ubuntu-18.04 amd64 or something). But for now it's probably a good idea. Thanks also to @sergiocazzolato for initially writing the spread test for this.
One more interesting observation - this spread test does not fail on master with ubuntu-18.04-64, but it does on arch-linux-64, so we can probably limit it to that to ensure we catch regressions.
As a side note, systemd in Arch recently got updated to 240. I wonder if that will make any difference.
Looks good, but I'm a bit confused/surprised that the tests don't need many more checks now that we moved from just writing the unit to also enabling it, starting it, etc.?
LGTM, small suggestion about the test
execute: |
    for _ in $(seq 50); do
        snap install test-snapd-tools test-snapd-public
We could use instance names to install more snaps at a time:
names=(test-snapd-tools)
for n in $(seq 9); do
    names+=("test-snapd-tools_$n")
done
for _ in $(seq 50); do
    snap install "${names[@]}"
    snap install "${names[@]}"
done
@mvo5 even if we know it wasn't failing we still want to run the test at least on our main current target distro
execute: |
    for _ in $(seq 50); do
        snap install test-snapd-tools test-snapd-public
@mvo5 Is it possible to speed up the test by downloading the snaps and then installing them with --dangerous? This test takes about 7 minutes to run in a VM on my machine, and I guess it will take much longer on a board like the pi2.
See: https://paste.ubuntu.com/p/tjscnkfk2w/
Because of the download cache we implemented recently, it should not download the snap multiple times. However, I agree we need to target this test much more narrowly, i.e. we never want this to run on a Pi :)
I am playing around with this now: I limited the number of tries to 10 and also limited the number of systems this will run on (only 18.10, arch and fedora-28 for now). I am now checking how long this takes.
+1
Thanks to Maciej and Sergio!
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
This is a follow-up to #6243 with a different approach, because #6243 did not tackle the issue deeply enough.