Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: fix sign-compare warning #12856

Merged
merged 1 commit into from
Jul 9, 2019

Conversation

yuwata
Copy link
Member

@yuwata yuwata commented Jun 21, 2019

@yuwata
Copy link
Member Author

yuwata commented Jun 21, 2019

cc @mrc0mmand

@yuwata
Copy link
Member Author

yuwata commented Jun 21, 2019

Ugh...

======================================================================
FAIL: test_dhcp_client_route_remove_on_renew (__main__.NetworkdDHCPClientTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 2461, in test_dhcp_client_route_remove_on_renew
    self.assertRegex(output, 'inet 192.168.5.2[0-9]*/24 brd 192.168.5.255 scope global dynamic veth99')
AssertionError: Regex didn't match: 'inet 192.168.5.2[0-9]*/24 brd 192.168.5.255 scope global dynamic veth99' not found in '69: veth99@veth-peer: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000\n    inet 192.168.5.230/24 brd 192.168.5.255 scope global secondary dynamic veth99\n       valid_lft 97sec preferred_lft 97sec'

======================================================================
FAIL: test_dhcp_client_with_ipv4ll_fallback_with_dhcp_server (__main__.NetworkdDHCPClientTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 2405, in test_dhcp_client_with_ipv4ll_fallback_with_dhcp_server
    self.assertRegex(output, 'inet 192.168.5.[0-9]*/24 brd 192.168.5.255 scope global dynamic veth99')
AssertionError: Regex didn't match: 'inet 192.168.5.[0-9]*/24 brd 192.168.5.255 scope global dynamic veth99' not found in '78: veth99@veth-peer: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000\n    inet 192.168.5.181/24 brd 192.168.5.255 scope global secondary dynamic veth99\n       valid_lft 108sec preferred_lft 108sec'

======================================================================
FAIL: test_ip_link_unmanaged (__main__.NetworkdNetworkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 1464, in test_ip_link_unmanaged
    self.check_operstate('dummy98', 'off', setup_state='unmanaged')
  File "./test/test-network/systemd-networkd-tests.py", line 305, in check_operstate
    self.assertRegex(get_operstate(link, show_status, setup_state), expected)
AssertionError: Regex didn't match: 'off' not found in 'degraded'

======================================================================
FAIL: test_link_local_addressing (__main__.NetworkdNetworkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 1498, in test_link_local_addressing
    self.assertNotRegex(output, 'inet6* .* scope link')
AssertionError: Regex matched: 'inet6 fe80::9bdd:ceea:4d2f:4f87/64 scope link' matches 'inet6* .* scope link' in '220: dummy98: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000\n    link/ether 5e:48:d8:8c:93:94 brd ff:ff:ff:ff:ff:ff\n    inet6 fe80::9bdd:ceea:4d2f:4f87/64 scope link \n       valid_lft forever preferred_lft forever'

The first and second errors might be ok. On renew, we first assign a new address, then remove the old address. But I have no idea why the new address still has secondary flag after the old address is removed.

The third one may also ok, but not sure. No. Who does bring up the interface???

The last one looks critical, as LinkLocalAddressing=no seems broken again.

@yuwata
Copy link
Member Author

yuwata commented Jun 21, 2019

@mrc0mmand Is the kernel updated? Or, some default sysctl value is changed?

@mrc0mmand
Copy link
Member

@mrc0mmand Is the kernel updated? Or, some default sysctl value is changed?

Since the box got recently updated, I'd say the first one.

@yuwata
Copy link
Member Author

yuwata commented Jun 21, 2019

@mrc0mmand Indeed. The compiler is also updated. Previously

Linux version 5.1.9-arch1-1-ARCH (builduser@heftig-25959) (gcc version 8.3.0 (GCC)) #1 SMP PREEMPT Tue Jun 11 16:18:09 UTC 2019

but now

Linux version 5.1.12-arch1-1-ARCH (builduser@heftig-24809) (gcc version 9.1.0 (GCC)) #1 SMP PREEMPT Wed Jun 19 09:16:00 UTC 2019

I use

$ uname -r
5.1.12-200.fc29.x86_64
$ rpm -q gcc
gcc-8.3.1-2.fc29.x86_64

and everything is fine. So, at least the sign compare issue seems to be introduced (or investigated?) by gcc-9.

@mrc0mmand
Copy link
Member

Apparently, something else has changed with the newer gcc, as the hwdb test and update service started failing pretty reliably.

@mrc0mmand
Copy link
Member

The runtime difference is now quite significant:

$ gcc --version
gcc (GCC) 9.1.1 20190605 (Red Hat 9.1.1-2)
$ CC=cc CXX=gcc meson  build-Db_sanitize=address,undefined  -Dc_args='-g -O0 -ftrapv
$ time build/test-conf-parser                                                                                                                                                                                                                                                                  
...                                    
== test_config_parse[7] ==                                                                                                                       
== test_config_parse[8] ==                                  
== test_config_parse[9] ==                                  
== test_config_parse[10] ==                                 
== test_config_parse[11] ==                                 
== test_config_parse[12] ==                                                       
== test_config_parse[13] ==                      
== test_config_parse[14] ==                                        
/tmp/test-conf-parser.U4nrMy:1: Line too long     
== test_config_parse[15] ==                    
/tmp/test-conf-parser.vry7fp:1: Continuation line too long       
                                                                                                                                                 
real    0m30.607s                                            
user    0m29.165s                                           
sys     0m0.268s
$ clang --version
clang version 7.0.1 (Fedora 7.0.1-4.fc29)
$ CC=clang CXX=clang++ meson  build-clang -Db_sanitize=address,undefined -Db_lundef=false -Dc_args='-g -O0 -ftrapv'
$ time build-clang/test-conf-parser
...
== test_config_parse[7] ==
== test_config_parse[8] ==
== test_config_parse[9] ==
== test_config_parse[10] ==
== test_config_parse[11] ==
== test_config_parse[12] ==
== test_config_parse[13] ==
== test_config_parse[14] ==
/tmp/test-conf-parser.GGsqcG:1: Line too long
== test_config_parse[15] ==
/tmp/test-conf-parser.HmcG2B:1: Continuation line too long

real    0m4.303s
user    0m3.516s
sys     0m0.262s

@mrc0mmand
Copy link
Member

Also, looking at results from older CI runs with gcc 8.3.0, the performance issue is indeed caused by newer gcc:

...
276/526 hwdb-test                               OK      99.68 s 
...
411/526 test-conf-parser                        OK       5.68 s 
...

@mrc0mmand
Copy link
Member

It seems the performance hit is caused by detect_stack_use_after_return=1:

$ export ASAN_OPTIONS=strict_string_checks=1:check_initialization_order=1:strict_init_order=1:detect_stack_use_after_return=1
$ time build/test-conf-parser
...
== test_config_parse[12] ==
== test_config_parse[13] ==
== test_config_parse[14] ==
/tmp/test-conf-parser.fbGlaS:1: Line too long
== test_config_parse[15] ==
/tmp/test-conf-parser.lBGEXc:1: Continuation line too long

real    0m29.998s
user    0m29.604s
sys     0m0.254s
$ export ASAN_OPTIONS=strict_string_checks=1:check_initialization_order=1:strict_init_order=1:detect_stack_use_after_return=0  
$ time build/test-conf-parser  
...
== test_config_parse[12] ==
== test_config_parse[13] ==
== test_config_parse[14] ==
/tmp/test-conf-parser.QrPBEX:1: Line too long
== test_config_parse[15] ==
/tmp/test-conf-parser.t39GvA:1: Continuation line too long

real    0m2.952s
user    0m2.693s
sys     0m0.242s

I guess I could switch the detect_stack_use_after_return option off for a while to not disrupt other PRs, until we find a solution (cc @evverx).

mrc0mmand added a commit to mrc0mmand/systemd-centos-ci that referenced this pull request Jun 21, 2019
With gcc9 the detect_stack_use_after_return introduces a severe
performance overhead causing timing issues in the testsuite. Let's
disable it (only for gcc run) until a proper solution is found

See systemd/systemd#12856
mrc0mmand added a commit to mrc0mmand/systemd-centos-ci that referenced this pull request Jun 21, 2019
With gcc9 the detect_stack_use_after_return introduces a severe
performance overhead causing timing issues in the testsuite. Let's
disable it (only for gcc run) until a proper solution is found

See systemd/systemd#12856
@mrc0mmand
Copy link
Member

Well, I prepared a temporary workaround in systemd/systemd-centos-ci#135, but looking & comparing further results from older CI runs, everything with the gcc9 takes at least twice as much time. Given that, I'd rather downgrade the gcc than started stockpiling workarounds, and tried to investigate the performance issue in the meanwhile.

@evverx, @yuwata, @keszybz any ideas?

@evverx
Copy link
Member

evverx commented Jun 21, 2019

@mrc0mmand I agree it would probably be better to downgrade gcc (if it's not too complicated).

everything with the gcc9 takes at least twice as much time

Just to be sure, do you mean ASan only?

@mrc0mmand
Copy link
Member

Just to be sure, do you mean ASan only?

Not really, it applies even to the standard vagrant job without sanitizers:

gcc-8:

[TASK] ninja-test
Waiting for PID 573 to finish
.......
PID 573 finished with EC 0 in 70s
[RESULT] ninja-test - PASS (log file: /build/vagrant-arch-testsuite.p7F/ninja-test_PASS.log)

...

[TASK] TEST-25-IMPORT
Waiting for PID 411947 to finish
..........
PID 411947 finished with EC 0 in 100s
[RESULT] TEST-25-IMPORT - PASS (log file: /build/vagrant-arch-testsuite.p7F/TEST-25-IMPORT_PASS.log)

gcc-9:

[TASK] ninja-test
Waiting for PID 573 to finish
...............
PID 573 finished with EC 0 in 150s
[RESULT] ninja-test - PASS (log file: /build/vagrant-arch-testsuite.27W/ninja-test_PASS.log)

...

[TASK] TEST-25-IMPORT
Waiting for PID 434535 to finish
......................
PID 434535 finished with EC 0 in 220s
[RESULT] TEST-25-IMPORT - PASS (log file: /build/vagrant-arch-testsuite.27W/TEST-25-IMPORT_PASS.log)

@evverx
Copy link
Member

evverx commented Jun 21, 2019

Since I'm not particularly familiar with Arch, @eworm-de @falconindy I'm wondering what is the best way to pin down gcc? Though it can't be it affects systemd only. Maybe Arch will roll it back shortly. I don't know.

@mrc0mmand
Copy link
Member

mrc0mmand commented Jun 21, 2019

So far I'm trying to follow https://wiki.archlinux.org/index.php/Arch_Linux_Archive#How_to_downgrade_one_package, hopefully it's going to work

Edit: I meant the section below, which replaces the repositories with a snapshot: https://wiki.archlinux.org/index.php/Arch_Linux_Archive#How_to_restore_all_packages_to_a_specific_date

@evverx
Copy link
Member

evverx commented Jun 21, 2019

I meant the section below, which replaces the repositories with a snapshot

As far as I understand, it will hide those networkd failures (which @yuwata says are legit) if we roll everything back.

@evverx
Copy link
Member

evverx commented Jun 21, 2019

Anyway, it if helps to make the CI green, I'm all for it. Those failures are new and need looking into, which might take a while.

@mrc0mmand
Copy link
Member

@evverx the issue with sanitizers is reproducible on Fedora as well, so I filed a bug in RH Bugzilla. I considered using the official gcc Bugzilla, but according to the guidelines they don't like bug reports from gcc snapshots bundled with distributions and I wasn't in a mood to compile my own gcc :-)

Also, the CI run in systemd/systemd-centos-ci#136 passed, so I'll merge it and slowly start re-triggering failed CI jobs. However, the tests still take a slightly more time than usual, so there's maybe something else going on...

@yuwata
Copy link
Member Author

yuwata commented Jun 22, 2019

I am not sure whether the error is false positive or not. So, I added 'postponed' label. But ok to close this if it is confirmed that the error is false positive.

I hope the errors in networkd are also false positive.

@@ -267,7 +267,7 @@ static char *format_timestamp_internal(

assert(buf);

if (l <
if (l < (size_t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i wonder if it wouldn't be better to cast the whole addition instead of just the first value of it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Done.

@yuwata yuwata removed the postponed label Jul 8, 2019
@yuwata
Copy link
Member Author

yuwata commented Jul 8, 2019

@mrc0mmand Updated.

@keszybz keszybz added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Jul 8, 2019
@yuwata
Copy link
Member Author

yuwata commented Jul 9, 2019

BTW, without this commit, gcc-9.1.1 can build systemd successfully. So, I think the issue is a bug of gcc-9.1.0.

@keszybz
Copy link
Member

keszybz commented Jul 9, 2019

OK, this change is very small and not particularly ugly, let's just merge it.

@keszybz keszybz merged commit 8164e30 into systemd:master Jul 9, 2019
@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Jul 9, 2019
@mrc0mmand
Copy link
Member

@yuwata

BTW, without this commit, gcc-9.1.1 can build systemd successfully. So, I think the issue is a bug of gcc-9.1.0.

Ah, I just thought it has moved to some optimization level, but that's not the case, thanks for the explanation. The -O2 issues are indeed different:

$ ninja -C build                                                                                                       
ninja: Entering directory `build'                                                                                      
[295/1513] Compiling C object 'src/shared/5afaae1@@systemd-shared-242@sta/dm-util.c.o'.                                
../src/shared/dm-util.c: In function ‘dm_deferred_remove’:                                                             
../src/shared/dm-util.c:35:9: warning: ‘strncpy’ specified bound 128 equals destination size [-Wstringop-truncation]
   35 |         strncpy(dm.name, name, sizeof(dm.name));                                                               
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~    
[539/1513] Compiling C object 'src/network/70b1c79@@networkd-core@sta/networkd-link.c.o'.                              
FAILED: src/network/70b1c79@@networkd-core@sta/networkd-link.c.o                                                       
cc -Isrc/network/70b1c79@@networkd-core@sta -Isrc/network -I../src/network -Isrc/basic -I../src/basic -Isrc/shared -I../src/shared -Isrc/systemd -I../src/systemd -Isrc/journal -I../src/journal -Isrc/journal-remote -I../src/journal-remote c
In file included from ../src/basic/macro.h:558,                                                                        
                 from ../src/basic/alloc-util.h:9,      
                 from ../src/network/networkd-link.c:7:
../src/network/networkd-link.c: In function ‘link_sysctl_ipv6_enabled’:                                                
../src/basic/log.h:104:9: error: ‘%s’ directive argument is null [-Werror=format-overflow=]                            
  104 |         log_internal_realm(LOG_REALM_PLUS_LEVEL(LOG_REALM, (level)), __VA_ARGS__)                              
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                              
../src/shared/log-link.h:21:25: note: in expansion of macro ‘log_internal’                                             
   21 |                         log_internal(level, error, __FILE__, __LINE__, __func__, ##__VA_ARGS__); \             
      |                         ^~~~~~~~~~~~                                                                           
../src/shared/log-link.h:33:50: note: in expansion of macro ‘log_link_full’                                            
   33 | #define log_link_warning_errno(link, error, ...) log_link_full(link, LOG_WARNING, error, ##__VA_ARGS__)        
      |                                                  ^~~~~~~~~~~~~                                                 
../src/network/networkd-link.c:77:24: note: in expansion of macro ‘log_link_warning_errno’                             
   77 |                 return log_link_warning_errno(link, r,                                                         
      |                        ^~~~~~~~~~~~~~~~~~~~~~      
../src/network/networkd-link.c:78:77: note: format string is defined here                                              
   78 |                                               "Failed to read net.ipv6.conf.%s.disable_ipv6 sysctl property: %m",                                                                                                                      
      |                                                                             ^~                                 
cc1: some warnings being treated as errors         
[544/1513] Compiling C object 'src/machine/bdfdaa9@@machine-core@sta/machine-dbus.c.o'.                                
ninja: build stopped: subcommand failed.           

And I think you already opened a PR for the error.

@yuwata yuwata deleted the fix-sign-compare-warning branch July 9, 2019 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

5 participants