Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/snap-confine: handle CURRENT_TAGS on systems that support it #10540

Merged
merged 11 commits into from Sep 14, 2021

Conversation

bboozzoo
Copy link
Collaborator

Try to determine whether the libudev/libsystemd supports CURRENT_TAGS which has
been introduced in systemd v247:
https://github.com/systemd/systemd/blob/f6278558da0304ec6b646bb172ce4688c7f162a5/NEWS#L1037-L1119
This comes in pair with TAGS becoming sticky, what means that our checks no
longer behave correctly as the tags do not go away.

If the library supports current tags, use the dynamically collected symbol to
double check that the device is really currently tagged for a snap. The
implementation introduced in v247 correctly checks whether the running systemd
supports current tags and otherwise falls back to looking at TAGS property.

Try to determine whether the libudev/libsystemd supports CURRENT_TAGS which has
been introduced in systemd v247:
https://github.com/systemd/systemd/blob/f6278558da0304ec6b646bb172ce4688c7f162a5/NEWS#L1037-L1119
This comes in pair with TAGS becoming sticky, what means that our checks no
longer behave correctly as the tags do not go away.

If the library supports current tags, use the dynamically collected symbol to
double check that the device is really currently tagged for a snap. The
implementation introduced in v247 correctly checks whether the running systemd
supports current tags and otherwise falls back to looking at TAGS property.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
…sticky udev TAGS

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@bboozzoo bboozzoo added the Needs security review Can only be merged once security gave a :+1: label Jul 16, 2021
@bboozzoo bboozzoo marked this pull request as draft July 16, 2021 12:12
@anonymouse64 anonymouse64 self-requested a review July 16, 2021 14:39
Copy link
Collaborator

@alexmurray alexmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a security perspective, this looks fine - snap-confine is setuid root and dlopen does not consult LD_LIBRARY_PATH for setuid binaries so this should be safe from someone trying to supply their own udev_device_has_current_tag in place of the real function from libudev.

Also the use of udev_device_has_current_tag looks fine from what I can see too.

@alexmurray alexmurray removed the Needs security review Can only be merged once security gave a :+1: label Jul 21, 2021
@bboozzoo bboozzoo marked this pull request as ready for review July 27, 2021 07:20
Instead of dlopen() and looking for the symbol ourselves, define a weak symbol
and let the dynamic linker do the work.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Copy link
Contributor

@mardy mardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just comments about comments :-)

* and on slow devices we may indeed observe a device that no longer
* exists.
*
* Similar debug + continue pattern repeats in all the udev calls in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment! I was exactly wondering if all those debug() calls in sc_udev_allow_assigned_device() should be something more serious, but this clarifies it.

if [ "$SPREAD_REBOOT" = 0 ]; then
echo "Given a snap is installed"
"$TESTSTOOLS"/snaps-state install-local test-snapd-sh
echo "Given a snap is installed"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you didn't originally write this, but maybe it's a good occasion to fix this comment (I guess that it should be "given snap is installed").

if [ "$(systemctl --version | awk '/systemd [0-9]+/ { print $2 }')" -ge 247 ]; then
REBOOT
fi
# Reboot needed just on systemd 247+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now we are not rebooting anymore. Should this comment be removed?

if [ "$SPREAD_REBOOT" = 0 ]; then
echo "Given a snap is installed"
"$TESTSTOOLS"/snaps-state install-local test-snapd-sh
echo "Given a snap is installed"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as for the other test

Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, one comment about how we can expand the comments in the code and also I agree with @mardy on the spread tests

debug("cannot find device from syspath %s", path);
continue;
}
if (udev_device_has_current_tag != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves a comment too, something like

Suggested change
if (udev_device_has_current_tag != NULL) {
// If we are able to query if the device has a current tag,
// do so and if there are no current tags, continue to prevent
// allowing assigned devices to the cgroup - this has the net
// desired effect of not re-creating device cgroups that were
// previously created/setup but should no longer be setup due
// to interface disconnection, etc.
if (udev_device_has_current_tag != NULL) {

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@bboozzoo
Copy link
Collaborator Author

Add a blocked label again. The testing with a simple binary was showing that the 'weak' symbol is getting properly resolved. However, when I tried with the snap artifact from our github job, the symbol isn't even listed in readelf output:

root@ubuntu:/home/guest# readelf -a -W /snap/snapd/x1/usr/lib/snapd/snap-confine |grep udev_device
000000000061b108  0000002000000007 R_X86_64_JUMP_SLOT     0000000000000000 udev_device_get_devnode@LIBUDEV_183 + 0
000000000061b1b0  0000003500000007 R_X86_64_JUMP_SLOT     0000000000000000 udev_device_new_from_syspath@LIBUDEV_183 + 0
000000000061b2e8  0000005d00000007 R_X86_64_JUMP_SLOT     0000000000000000 udev_device_get_syspath@LIBUDEV_183 + 0
000000000061b390  0000007200000007 R_X86_64_JUMP_SLOT     0000000000000000 udev_device_get_devnum@LIBUDEV_183 + 0
000000000061b460  0000008d00000007 R_X86_64_JUMP_SLOT     0000000000000000 udev_device_unref@LIBUDEV_183 + 0
    32: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND udev_device_get_devnode@LIBUDEV_183 (7)
    53: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND udev_device_new_from_syspath@LIBUDEV_183 (7)
    93: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND udev_device_get_syspath@LIBUDEV_183 (7)                                                                                                                                                                                                                                                                                             
   114: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND udev_device_get_devnum@LIBUDEV_183 (7)
   141: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND udev_device_unref@LIBUDEV_183 (7)

while building locally:

$ readelf -a -W snap-confine/snap-confine|grep udev_device_has
0000000000018fe8  0000009900000006 R_X86_64_GLOB_DAT      0000000000000000 udev_device_has_current_tag@LIBUDEV_247 + 0
   153: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND udev_device_has_current_tag@LIBUDEV_247 (12)
   328: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND udev_device_has_current_tag@LIBUDEV_247

@bboozzoo
Copy link
Collaborator Author

Another data point. A binary built on 20.04 has the weak symbol correctly resolved on Arch, and correctly does not find the symbol on 16.04, although I can clearly see that ld.so tries to look it up:

guest@ubuntu:~/udevtest$ LD_DEBUG=all ./a.out |& grep udev_dev
      6942:     symbol=udev_device_has_current_tag;  lookup in file=./a.out [0]
      6942:     symbol=udev_device_has_current_tag;  lookup in file=/lib/x86_64-linux-gnu/libudev.so.1 [0]
      6942:     symbol=udev_device_has_current_tag;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
      6942:     symbol=udev_device_has_current_tag;  lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
      6942:     symbol=udev_device_has_current_tag;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
      6942:     symbol=udev_device_has_current_tag;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]

Now a binary built on 16.04, has the weak symbol:

guest@ubuntu:~/udevtest$ nm -a ./a.out |grep udev_dev
                 w udev_device_has_current_tag

but it is not correctly resolved anywhere, neither on 20.04 or Arch, see:

guest@ubuntu:~/udevtest$ ./a.out 
no symbol
guest@ubuntu:~/udevtest$ LD_DEBUG=all ./a.out |& grep udev_dev
guest@ubuntu:~/udevtest$ LD_DEBUG=all ./a.out |& grep udev_
     11685:     symbol=udev_new;  lookup in file=./a.out [0]
     11685:     symbol=udev_new;  lookup in file=/lib/x86_64-linux-gnu/libudev.so.1 [0]
     11685:     binding file ./a.out [0] to /lib/x86_64-linux-gnu/libudev.so.1 [0]: normal symbol `udev_new' [LIBUDEV_183]
     11685:     symbol=udev_unref;  lookup in file=./a.out [0]
     11685:     symbol=udev_unref;  lookup in file=/lib/x86_64-linux-gnu/libudev.so.1 [0]
     11685:     binding file ./a.out [0] to /lib/x86_64-linux-gnu/libudev.so.1 [0]: normal symbol `udev_unref' [LIBUDEV_183]
guest@ubuntu:~/udevtest$ 

So somehow, the Xenial toolchain breaks how symbols are resolved, and that's why I was not able to observe current tags being checked when I used the snapd snap artifact from our testing.

Note that we could try to define udev_device_has_current_tag with a weak
attribute, which should in the normal case be the filled by ld.so when loading
snap-confined. However this was observed to work in practice only when the
binary itself is build with recent enough toolchain (eg. gcc & binutils on
Ubuntu 20.04). In case when the snapd snap needs to be built on 16.04, we have
observed that the weak symbol was indeed produced, but when resolving libraries
and symbols by ld.so on a recent host, the symbol would not be populated.

The patch revers to the original proposed behavior.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@bboozzoo
Copy link
Collaborator Author

I will restore the original proposed behavior to unblock this PR.

@bboozzoo
Copy link
Collaborator Author

Pushed an update restoring dlopen() and dropped the 'blocked' label.

@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2021

Codecov Report

Merging #10540 (76e8529) into master (cfea6ca) will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #10540    +/-   ##
========================================
  Coverage   78.34%   78.35%            
========================================
  Files         888      888            
  Lines       99871   100091   +220     
========================================
+ Hits        78247    78425   +178     
- Misses      16718    16748    +30     
- Partials     4906     4918    +12     
Flag Coverage Δ
unittests 78.35% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
store/cache.go 69.23% <0.00%> (-1.93%) ⬇️
systemd/systemd.go 85.94% <0.00%> (-0.46%) ⬇️
overlord/snapstate/storehelpers.go 79.22% <0.00%> (-0.36%) ⬇️
overlord/snapstate/handlers.go 70.80% <0.00%> (-0.01%) ⬇️
dirs/dirs.go 94.62% <0.00%> (+0.02%) ⬆️
overlord/ifacestate/helpers.go 77.52% <0.00%> (+0.49%) ⬆️
overlord/hookstate/ctlcmd/refresh.go 71.64% <0.00%> (+0.76%) ⬆️
systemd/emulation.go 40.18% <0.00%> (+3.34%) ⬆️
overlord/snapstate/backend/mountunit.go 88.88% <0.00%> (+8.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfea6ca...76e8529. Read the comment docs.

@mvo5 mvo5 added this to the 2.52 milestone Sep 13, 2021
…n 14.04

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@mvo5 mvo5 added the Squash-merge Please squash this PR when merging. label Sep 13, 2021
Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question about the check in the spread test, but still lgtm overall

@mvo5
Copy link
Contributor

mvo5 commented Sep 13, 2021

Fwiw, looks good overall from a (not too deep) look.

Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still lgtm

@mvo5 mvo5 merged commit a125a37 into snapcore:master Sep 14, 2021
@mvo5 mvo5 removed this from the 2.52 milestone Sep 14, 2021
@mvo5
Copy link
Contributor

mvo5 commented Sep 14, 2021

I removed the 2.52 milestone, I think we want this in 2.53 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Squash-merge Please squash this PR when merging.
Projects
None yet
6 participants