Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

travis: also rebuild everything with ASan and UBSan and install dependencies necessary for running parse-hwdb.py #10678

Merged
merged 8 commits into from
Nov 8, 2018

Conversation

evverx
Copy link
Member

@evverx evverx commented Nov 7, 2018

No description provided.

@evverx evverx changed the title travis: install dependencies necessary for running parse-hwdb.py travis: also rebuild everything with ASan and install dependencies necessary for running parse-hwdb.py Nov 7, 2018
@evverx
Copy link
Member Author

evverx commented Nov 7, 2018

The most important part here is the second commit where systemd is rebuilt with ASan. It should help to catch issues like #10677, #10267, #10157 and countless others. I'm doing it mostly because the cluster where I ran the unit tests under ASan (among many other things) is being phased out and is likely to disappear at the end of the month.

@evverx
Copy link
Member Author

evverx commented Nov 7, 2018

I ran into https://docs.travis-ci.com/user/common-build-problems/#log-length-exceeded :-) I'll move it into a separate stage.

So test-execute and test-path are failing due #10677 as expected. test-sd-hwdb surprised me:

SYSTEMD_KBD_MODEL_MAP='/build/src/locale/kbd-model-map' PATH='/build/build:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' SYSTEMD_LANGUAGE_FALLBACK_MAP='/build/src/locale/language-fallback-map' /build/build/test-sd-hwdb
--- stderr ---
Found container virtualization container-other.
/* test_failed_enumerate */
hwdb.bin does not exist, please run 'systemd-hwdb update'
=================================================================
==8268==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000000140 at pc 0x7fed4822ac27 bp 0x7ffcdc756c30 sp 0x7ffcdc756c20
WRITE of size 4 at 0x606000000140 thread T0
    #0 0x7fed4822ac26 in sd_hwdb_unref ../src/libsystemd/sd-hwdb/sd-hwdb.c:373
    #1 0x55a5a493e290 in sd_hwdb_unrefp ../src/systemd/sd-hwdb.h:41
    #2 0x55a5a493e61e in test_failed_enumerate ../src/test/test-sd-hwdb.c:8
    #3 0x55a5a493ee09 in main ../src/test/test-sd-hwdb.c:67
    #4 0x7fed47c5e412 in __libc_start_main (/lib64/libc.so.6+0x24412)
    #5 0x55a5a493e18d in _start (/build/build/test-sd-hwdb+0x218d)

0x606000000140 is located 0 bytes inside of 64-byte region [0x606000000140,0x606000000180)
freed by thread T0 here:
    #0 0x7fed48667480 in free (/lib64/libasan.so.5+0xef480)
    #1 0x7fed480f201e in freep ../src/basic/alloc-util.h:65
    #2 0x7fed480f2726 in proc_cmdline_parse ../src/basic/proc-cmdline.c:86
    #3 0x7fed480d7406 in log_parse_environment_realm ../src/basic/log.c:1121
    #4 0x7fed480180af in test_setup_logging ../src/shared/tests.c:104
    #5 0x55a5a493ee04 in main ../src/test/test-sd-hwdb.c:65
    #6 0x7fed47c5e412 in __libc_start_main (/lib64/libc.so.6+0x24412)

previously allocated by thread T0 here:
    #0 0x7fed48667c88 in realloc (/lib64/libasan.so.5+0xefc88)
    #1 0x7fed48029796 in greedy_realloc ../src/basic/alloc-util.c:55
    #2 0x7fed480f4824 in get_process_cmdline ../src/basic/process-util.c:147
    #3 0x7fed480f21a4 in proc_cmdline ../src/basic/proc-cmdline.c:37
    #4 0x7fed480f26bb in proc_cmdline_parse ../src/basic/proc-cmdline.c:91
    #5 0x7fed480d7406 in log_parse_environment_realm ../src/basic/log.c:1121
    #6 0x7fed480180af in test_setup_logging ../src/shared/tests.c:104
    #7 0x55a5a493ee04 in main ../src/test/test-sd-hwdb.c:65
    #8 0x7fed47c5e412 in __libc_start_main (/lib64/libc.so.6+0x24412)

SUMMARY: AddressSanitizer: heap-use-after-free ../src/libsystemd/sd-hwdb/sd-hwdb.c:373 in sd_hwdb_unref
Shadow bytes around the buggy address:
  0x0c0c7fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff8000: fa fa fa fa 00 00 00 00 00 00 05 fa fa fa fa fa
  0x0c0c7fff8010: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
=>0x0c0c7fff8020: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c0c7fff8030: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==8268==ABORTING
-------

@evverx evverx force-pushed the more-travis-tweaks branch 2 times, most recently from 0c98ac9 to c570084 Compare November 7, 2018 22:19
evverx added a commit to evverx/systemd that referenced this pull request Nov 7, 2018
@evverx evverx changed the title travis: also rebuild everything with ASan and install dependencies necessary for running parse-hwdb.py travis: also rebuild everything with ASan and UBSan and install dependencies necessary for running parse-hwdb.py Nov 8, 2018
@evverx
Copy link
Member Author

evverx commented Nov 8, 2018

@mrc0mmand could you take a look? I made it work in the sense that the code compiles, the tests are run and pretty much all known issues have been successfully detected. But I wouldn't say that it's perfect :-) Feel free to push commits on top of this branch if you think that it would be better to rearrange stages (or, maybe, run two jobs in parallel), rename things and anything else you think could be improved. By the way, I also think that it would be useful to add something like that to CentOS CI.

@mrc0mmand
Copy link
Member

@mrc0mmand could you take a look?

Will do!

Feel free to push commits on top of this branch if you think that it would be better to rearrange stages (or, maybe, run two jobs in parallel), rename things and anything else you think could be improved.

I guess that's going to be an issue, as that works only for maintainers (i.e. people with write access to systemd/systemd)

By the way, I also think that it would be useful to add something like that to CentOS CI.

Definitely.

@evverx
Copy link
Member Author

evverx commented Nov 8, 2018

I guess that's going to be an issue, as that works only for maintainers

I thought that write access to the "systemd-centos-ci" repository you have somehow transferred to this one. @poettering @keszybz would it be possible to invite @mrc0mmand as a collaborator?

evverx added a commit to evverx/systemd that referenced this pull request Nov 8, 2018
@evverx
Copy link
Member Author

evverx commented Nov 8, 2018

Anyway, I rebased the branch on top of master where as far as I can see all the issues are fixed so Travis CI should be green and, in theory, the PR is ready to roll :-)

$DOCKER_EXEC git clean -dxff
$DOCKER_EXEC meson -Db_sanitize=address,undefined build
$DOCKER_EXEC ninja -v -C build
$DOCKER_EXEC sh -c "printf '#!/bin/sh\necho The test is failing for unknown reason, skipping; exit 77' >/build/build/test-capability"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be removed after #10687 is merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the PR, so we should be fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the second copy of the workaround. Kludges tend to sprawl :-) I'll wait for your PR to be merged to remove it altogether and to make sure that the test passes under ASan+UBsan too just in case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see. Thanks!

.travis.yml Outdated
@@ -36,6 +36,25 @@ jobs:
after_script:
- $CI_MANAGERS/fedora.sh CLEANUP

- stage: Build & test with ASan
Copy link
Member

@mrc0mmand mrc0mmand Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it would be great if both standard and ASAN runs ran in parallel - dropping the stage definition here should do the trick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried something like that but it didn't work out. Travis just ran the ASan part skipping the "basic" part altogether. Maybe, I got the indentation wrong. I'll try again to see how it goes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify what I meant by "later". My idea was to make Travis work and merge it as soon as possible so that the CI could start catching issues that are easily detectable by ASan+UBsan and the stages/jobs could be rearranged later in a way that wouldn't require catching up with new bugs lurking around :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a working fix for the parallel runs (e.g. https://travis-ci.org/mrc0mmand/systemd/jobs/452344355)

From a084480b15b817b95f5db4e1605d88635c1fdd99 Mon Sep 17 00:00:00 2001
From: Frantisek Sumsal <frantisek@sumsal.cz>
Date: Thu, 8 Nov 2018 12:47:07 +0100
Subject: [PATCH] travis: run ASan job in parallel with the standard one

---
 .travis.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index add0576d4..eaeb95051 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -36,8 +36,7 @@ jobs:
           after_script:
               - $CI_MANAGERS/fedora.sh CLEANUP
 
-        - stage: Build & test with ASan
-          name: Fedora Rawhide
+        - name: Fedora Rawhide (ASan)
           language: bash
           env:
               - FEDORA_RELEASE="rawhide"
-- 
2.17.2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Now I'm a bit puzzled about what went wrong when I tried to do the same thing :-) Anyway, I applied the patch. Let's see how it goes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the info. I'm looking into the test-capability fail right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, I'm going to put the kludge there again and use meson test -t 3 to run the tests with 3 times the normal timeout. The PR is already useful as it is even without test-capability, which can be fixed later.

Copy link
Member

@mrc0mmand mrc0mmand Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, this is, once again, some black magic stuff.

Let's take a look at a more verbose version of the log:

==5649==AddressSanitizer Init done
have ambient caps: yes
Capabilities:= cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip
Capabilities:= cap_dac_override,cap_net_raw+ep
==5656==Could not attach to thread 5655 (errno 1).
==5656==Failed suspending threads.
==5655==LeakSanitizer has encountered a fatal error.
==5655==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==5655==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

From the issues and documentation I found so far it means that Docker is missing the SYS_PTRACE capability. However, as shown in the output above, the capability is there (and docker run --cap-add SYS_PTRACE makes no difference).

And to make matters worse, it, of course, works as expected on both CentOS and Fedora...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after another painfully long debug session I finally found a fix (or workaround, still not sure). The fail is caused by test_drop_privileges_fail, as it drops all privileges, even SYS_PTRACE, which is necessary for ASan. I have no idea why it works on Fedora/CentOS, but the proposed patch below seems to fix the issue:

diff --git a/src/test/test-capability.c b/src/test/test-capability.c
index 730dbf8cb..fc0d347bb 100644
--- a/src/test/test-capability.c
+++ b/src/test/test-capability.c
@@ -21,7 +21,7 @@ static uid_t test_uid = -1;
 static gid_t test_gid = -1;
 
 /* We keep CAP_DAC_OVERRIDE to avoid errors with gcov when doing test coverage */
-static uint64_t test_flags = 1ULL << CAP_DAC_OVERRIDE;
+static uint64_t test_flags = (1ULL << CAP_DAC_OVERRIDE) | (1ULL << CAP_SYS_PTRACE);

@evverx - If it's a relatively sane solution, I'll open another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrc0mmand I've just opened #10692. Let's continue the discussion there.

evverx and others added 7 commits November 8, 2018 13:49
It should help to catch issues like systemd#10677.
The test is currently failing when run under ASan in a docker container:
```
--- command ---
SYSTEMD_KBD_MODEL_MAP='/build/src/locale/kbd-model-map' PATH='/build/build:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' SYSTEMD_LANGUAGE_FALLBACK_MAP='/build/src/locale/language-fallback-map' /build/build/test-capability
--- stderr ---
have ambient caps: yes
Capabilities:= cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip
Capabilities:= cap_dac_override,cap_net_raw+ep
==7021==LeakSanitizer has encountered a fatal error.
==7021==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==7021==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
Assertion 'WIFEXITED(status) && WEXITSTATUS(status) == 0' failed at ../src/test/test-capability.c:71, function fork_test(). Aborting.
-------
```

https://api.travis-ci.org/v3/job/452349948/log.txt
@evverx
Copy link
Member Author

evverx commented Nov 8, 2018

@poettering @keszybz @yuwata I think the PR is ready to merge. Could anyone press the merge button?

@evverx
Copy link
Member Author

evverx commented Nov 8, 2018

I'm going to merge the PR so that the CI can start to catch issues easily detectable by ASan+UBSan. I could always revert it if anyone thinks that it's a bad idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants