systemd: allow only a single daemon-reload at the same time #6331

mvo5 · 2019-01-07T14:00:19Z

This is an RFC PR to see if the "mount protocol error" reported in
systemd/systemd#10872 can be worked around by serializing the adding and
removal of mount units. Proposing to get full spread runs.

This is similar to #6243, but it goes further by ensuring that only a
single daemon-reload runs at a time, enforced at the level of the systemd
Go package. Note that there is still a chance that the protocol error
happens if something else (like dpkg or the user) runs
"systemctl daemon-reload" while we write a mount unit, but the risk
should be much smaller.

This is a follow-up to #6243 that takes a different approach, because #6243 did not tackle the issue deeply enough.

codecov-io · 2019-01-07T14:46:22Z

Codecov Report

Merging #6331 into master will decrease coverage by 0.01%.
The diff coverage is 34.09%.

@@            Coverage Diff             @@
##           master    #6331      +/-   ##
==========================================
- Coverage   78.98%   78.96%   -0.02%     
==========================================
  Files         561      561              
  Lines       43623    43637      +14     
==========================================
+ Hits        34454    34460       +6     
- Misses       6371     6384      +13     
+ Partials     2798     2793       -5

Impacted Files	Coverage Δ
wrappers/core18.go	`49.47% <ø> (ø)`	⬆️
overlord/snapstate/backend/mountunit.go	`100% <100%> (+61.29%)`	⬆️
systemd/systemd.go	`73.56% <27.5%> (-8.66%)`	⬇️
overlord/hookstate/hookmgr.go	`74.51% <0%> (+0.96%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 35421f9...abf6252.

mvo5 · 2019-01-08T07:52:25Z

@bboozzoo Could you please run your reproducer script that installs snaps and triggers the protocol error bug against a snapd build with this PR?

zyga · 2019-01-08T09:01:28Z

systemd/systemd.go

+	// can be unmounted.
+	// note that the long option --lazy is not supported on trusty.
+	// the explicit -d is only needed on trusty.
+	isMounted, err := osutil.IsMounted(mountedDir)


While this is not new logic, I would like to understand the motivation behind this particular sequence of actions:

1. we unmount the snap ourselves (MNT_DETACH)
2. we stop the mount unit (which would give us the lazy unmount anyway if we set LazyUnmount=true in the mount unit)
3. we disable the mount unit (what for?)
4. we remove the mount unit
5. we reload systemd

Why is the logic not simply:

1. we stop and disable the mount unit with systemctl disable --now
2. we remove the mount unit file
3. we reload systemd

Does trusty have a working disable --now?

zyga

The locking looks sensible. I asked a question about the specific way we handle the removal operation but it is unrelated to the main issue.

mvo5 · 2019-01-08T13:56:40Z

@bboozzoo Actually, silly me, there is no need to run your script: I added it as a spread test. We need to see how slow it is to decide whether we want to keep it enabled (and whether we should limit it to e.g. only Arch or ubuntu-18.04 amd64 or similar). But for now it's probably a good idea. Thanks also to @sergiocazzolato for initially writing the spread test for this.

mvo5 · 2019-01-08T20:43:50Z

One more interesting observation: this spread test does not fail on master with ubuntu-18.04-64, but it does on arch-linux-64, so we can probably limit it to that system to catch regressions.

bboozzoo · 2019-01-09T06:09:15Z

As a side note, systemd in Arch was recently updated to 240. I wonder if that will make any difference.

pedronis

Looks good, but I'm a bit confused/surprised that the tests don't need many more checks now that we moved from just writing the unit to also enabling and starting it, etc.

bboozzoo

LGTM, small suggestion about the test

bboozzoo · 2019-01-09T12:20:27Z

tests/main/mount-protocol-error/task.yaml

+
+execute: |
+   for _ in $(seq 50); do
+       snap install test-snapd-tools test-snapd-public


We could use instance names to install more snaps at a time:

    names=(test-snapd-tools)
    for n in $(seq 9); do
        names+=("test-snapd-tools_$n")
    done
    for _ in $(seq 50); do
        snap install "${names[@]}"
        snap remove "${names[@]}"
    done

pedronis · 2019-01-09T12:30:13Z

> One more interesting observation: this spread test does not fail on master with ubuntu-18.04-64, but it does on arch-linux-64, so we can probably limit it to that system to catch regressions.

@mvo5 Even if we know it wasn't failing there, we still want to run the test at least on our main current target distro.

sergiocazzolato · 2019-01-09T18:30:41Z

tests/main/mount-protocol-error/task.yaml

+
+execute: |
+   for _ in $(seq 50); do
+       snap install test-snapd-tools test-snapd-public


@mvo5 Is it possible to speed up the test by downloading the snaps once and then installing them with --dangerous? This test takes about 7 minutes to run in a VM on my machine, and I guess it will take much longer on a board like the Pi 2.
See: https://paste.ubuntu.com/p/tjscnkfk2w/

Because of the download cache we implemented recently, it should not download the snap multiple times. However, I agree we need to target this test much more narrowly, e.g. we never want this to run on a Pi :)

I'm playing around with this now: I limited the number of iterations to 10 and also limited the systems it runs on (only 18.10, Arch and fedora-28 for now). I'm now checking how long this takes.

pedronis

+1

mvo5 added this to the 2.37 milestone Jan 7, 2019

zyga reviewed Jan 8, 2019

zyga approved these changes Jan 8, 2019

pedronis self-requested a review January 8, 2019 09:30

tests: add regression test to reproduce systemd mount protocol error

e9e69c9

This adds a regression test for the mount protocol error that systemd sometimes throws.

improve mount-protocol-error spread test description

afe7a89

pedronis reviewed Jan 9, 2019

bboozzoo reviewed Jan 9, 2019

systemd: add unit tests for new {Add,Remove}MountUnitFile

2114554

sergiocazzolato reviewed Jan 9, 2019

pedronis approved these changes Jan 9, 2019

mvo5 and others added 2 commits January 9, 2019 20:12

tests: improve mount-protocol-error spread test

f87d600

Thanks to Maciej and Sergio!

tests/main/mount-protocol-error: shellchecks

2161fa8

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

mvo5 merged commit cc6acb6 into canonical:master Jan 10, 2019

mvo5 added a commit to mvo5/snappy that referenced this pull request Jan 31, 2019

Merge pull request canonical#6331 from mvo5/snapd-daemon-reload-gil

e14cf44

systemd: allow only a single daemon-reload at the same time
