Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd: allow only a single daemon-reload at the same time #6331

Merged
merged 6 commits into from
Jan 10, 2019

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Jan 7, 2019

This is an RFC PR to see if the "mount protocol error" reported in
systemd/systemd#10872 can be worked around by serializing the mount unit
adding/removal. Proposing to get full spread runs.

This is similar to #6243 but it goes further by ensuring a single daemon
reload on the systemd go package level. Note that there is still a
chance that the protocol error happens if something else (like dpkg or
the user) runs "systemd daemon-reload" while we write a mount unit.
But the risk should be hugely smaller.

This is a followup/different approach to #6243 which was tackling the issue not deep enough.

This is an RFC PR to see if the "mount protocol error" reported in
systemd/systemd#10872 can be worked around by serializing the mount unit
adding/removal. Proposing to get full spread runs.

This is similar to canonical#6243 but it goes further by ensuring a single daemon
reload on the systemd go package level. Note that there is still a
chance that the protocol error happens if something else (like dpkg or
the user) runs "systemd daemon-reload" while we write a mount unit.
But the risk should be hughely smaller.
@codecov-io
Copy link

Codecov Report

Merging #6331 into master will decrease coverage by 0.01%.
The diff coverage is 34.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6331      +/-   ##
==========================================
- Coverage   78.98%   78.96%   -0.02%     
==========================================
  Files         561      561              
  Lines       43623    43637      +14     
==========================================
+ Hits        34454    34460       +6     
- Misses       6371     6384      +13     
+ Partials     2798     2793       -5
Impacted Files Coverage Δ
wrappers/core18.go 49.47% <ø> (ø) ⬆️
overlord/snapstate/backend/mountunit.go 100% <100%> (+61.29%) ⬆️
systemd/systemd.go 73.56% <27.5%> (-8.66%) ⬇️
overlord/hookstate/hookmgr.go 74.51% <0%> (+0.96%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35421f9...abf6252. Read the comment docs.

@mvo5 mvo5 added this to the 2.37 milestone Jan 7, 2019
@mvo5
Copy link
Contributor Author

mvo5 commented Jan 8, 2019

@bboozzoo Could you please run your reproducer script that installs snaps and triggers the protocol error bug against a snapd build with this PR?

// can be unmounted.
// note that the long option --lazy is not supported on trusty.
// the explicit -d is only needed on trusty.
isMounted, err := osutil.IsMounted(mountedDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is not new logic I would like to understand the motivation behind this particular arrangement of actions:

  • we unmount the snap ourselves (MNT_DETACH)
  • we stop the mount unit (which we could have by using LazyUnmount=true in the mount unit)
  • we disable the mount unit (what for?)
  • we remove the mount unit
  • we reload systemd

Why is the logic simply not:

  • we disable --now the mount unit via systemctl
  • we remove the mount unit by removing the file
  • we reload systemd

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does trusty have a working disable --now?

Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locking looks sensible. I asked a question about the specific way we handle the removal operation but it is unrelated to the main issue.

@pedronis pedronis self-requested a review January 8, 2019 09:30
This adds a regression test for the mount protocol error that systemd
sometimes throws.
@mvo5
Copy link
Contributor Author

mvo5 commented Jan 8, 2019

@bboozzoo Actually - silly me, no need to run your script, I added it as a spread test, I guess we need to see how slow it is to determine if we want to keep it enabled (and if we should limit it to e.g. only arch or ubuntu-18.04 amd64 or something). But for now its probably a good idea. Thanks also to @sergiocazzolato for initially writing the spread test for this.

@mvo5
Copy link
Contributor Author

mvo5 commented Jan 8, 2019

One more interessting observation - this spread test does not fail on master with ubuntu-18.04-64 - but it does on arch-liunux-64 so we can probably limit it to that to ensure we deal with regressions.

@bboozzoo
Copy link
Contributor

bboozzoo commented Jan 9, 2019

As a side note, systemd in Arch got recently updated to 240. Wonder if that will make any difference.

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but I'm bit confused/surprised that the tests don't need much more checks now that we moved from writing unit, to enabling starting etc?

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, small suggestion about the test


execute: |
for _ in $(seq 50); do
snap install test-snapd-tools test-snapd-public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use instance names to install more snaps at time

names=(test-snapd-tools)
for n in $(seq 9); do
    names+=(test-snapd-tools_$n)
done
for _ in $(seq 50); do
    snap install ${names[@]}
    snap install ${names[@]}
done

@pedronis
Copy link
Collaborator

pedronis commented Jan 9, 2019

One more interessting observation - this spread test does not fail on master with ubuntu-18.04-64 - but it does on arch-liunux-64 so we can probably limit it to that to ensure we deal with regressions.

@mvo5 even if we know it wasn't failing we still want to run the test at least on our main current target distro


execute: |
for _ in $(seq 50); do
snap install test-snapd-tools test-snapd-public
Copy link
Collaborator

@sergiocazzolato sergiocazzolato Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvo5 Is it possible to speed up the test by downloading the snaps and then installing them with --dangerous? This test is taking about 7 minutes to run from a vm in my machine and I guess it will take much more from a board like the pi2.
See: https://paste.ubuntu.com/p/tjscnkfk2w/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the download cache we implemented recently it should not download the snap multiple times. However I agree we need to target this test much stronger. I.e. we never want this to run on a PI :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I play around with this now, I limited the number of tries to 10 now and also limited the number of systems this will run on (only 18.10, arch and fedora-28 for now). I am checking now how long this takes.

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

mvo5 and others added 2 commits January 9, 2019 20:12
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@mvo5 mvo5 merged commit cc6acb6 into canonical:master Jan 10, 2019
mvo5 added a commit to mvo5/snappy that referenced this pull request Jan 31, 2019
systemd: allow only a single daemon-reload at the same time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants