The test_limiter_node test is hanging indefinitely on CMAKE_BUILD_TYPE=Release (gcc-9, gcc-8) #342

Closed
phprus opened this issue Jan 30, 2021 · 26 comments
@phprus
Contributor

phprus commented Jan 30, 2021

Commit: 2dba207

With CMAKE_BUILD_TYPE=RelWithDebInfo the test_limiter_node test works fine:

42: Test command: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/build/gcc8/gnu_8.2_cxx11_64_relwithdebinfo/test_limiter_node "--force-colors=1"
42: Test timeout computed to be: 9.99988e+06
42: [doctest] doctest version is "2.3.5"
42: [doctest] run with "--help" for options
42: ===============================================================================
42: [doctest] test cases:      6 |      6 passed |      0 failed |      0 skipped
42: [doctest] assertions: 217212 | 217212 passed |      0 failed |
42: [doctest] Status: SUCCESS!
 42/129 Test  #42: test_limiter_node ........................   Passed    0.33 sec

With CMAKE_BUILD_TYPE=Release and gcc-9 or gcc-8, the test_limiter_node test hangs indefinitely:

42: Test command: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/build/gcc9/gnu_9.3_cxx11_64_release/test_limiter_node "--force-colors=1"
42: Test timeout computed to be: 9.99988e+06
42: [doctest] doctest version is "2.3.5"
42: [doctest] run with "--help" for options
42: ===============================================================================
42: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/tbb/test_limiter_node.cpp:515:
42: TEST CASE:  Message is released if successor does not accept
42:
42: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/tbb/test_limiter_node.cpp:515: FATAL ERROR: test case CRASHED: SIGTERM - Termination request signal
42:
42: ===============================================================================
42: [doctest] test cases:      4 |      3 passed |      1 failed |      2 skipped
42: [doctest] assertions: 217181 | 217181 passed |      0 failed |
42: [doctest] Status: FAILURE!
 42/129 Test  #42: test_limiter_node ........................***Exception: Child terminated  8.75 sec
@phprus
Contributor Author

phprus commented Feb 1, 2021

Microsoft Visual Studio 2017:

The following tests FAILED:
         21 - test_task_group (Failed)
         23 - test_task_arena (Failed)
         37 - test_flow_graph_whitebox (Failed)
         50 - test_async_node (Failed)
         60 - test_global_control (Failed)
        122 - test_malloc_compliance (Failed)
        126 - test_malloc_whitebox (Failed)
Errors while running CTest

@phprus
Contributor Author

phprus commented Feb 2, 2021

Release build with CXXFLAGS="-fno-ipa-cp-clone" (gcc 8, 9):

100% tests passed, 0 tests failed out of 129

gcc-7:

test 90
        Start  90: conformance_global_control

90: Test command: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/build/gcc7/gnu_7.5_cxx11_64_release/conformance_global_control "--force-colors=1"
90: Test timeout computed to be: 9.99988e+06
90: [doctest] doctest version is "2.3.5"
90: [doctest] run with "--help" for options
90: ===============================================================================
90: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:   user exception
90:
90: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/conformance/conformance_global_control.cpp:241: ERROR: test case THREW exception: exception thrown in subcase - will translate later when the whole test case has been exited (cannot translate while there is an active exception)
90:
90: ===============================================================================
90: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:
90: /home/phprus/tmp/4/6/tbb/oneTBB-2dba2072869a189b9fdab3ffa431d3ea49059a19/test/conformance/conformance_global_control.cpp:285: FATAL ERROR: The exception is not expected
90:
90: ===============================================================================
90: [doctest] test cases:      6 |      5 passed |      1 failed |      0 skipped
90: [doctest] assertions:   4333 |   4332 passed |      1 failed |
90: [doctest] Status: FAILURE!
 90/129 Test  #90: conformance_global_control ...............***Failed    0.08 sec

@alexey-katranov
Contributor

Thank you for the findings. We build the libraries with RelWithDebInfo, which differs slightly from Release in its optimization flags. As I understand it, Release enables -O3 (instead of -O2), which turns on the ipa-cp-clone optimization. Why ipa-cp-clone breaks the tests is still not clear.

As for conformance_global_control, is it sporadic or does it always fail with gcc7?

@phprus
Contributor Author

phprus commented Feb 3, 2021

RelWithDebInfo (-O2) + CXXFLAGS="-fipa-cp-clone" => fail (gcc 7,8,9).

conformance_global_control always fails with gcc 7 (Release or RelWithDebInfo).
openSUSE 15.1,
gcc version 7.5.0 (SUSE Linux),
glibc-2.26-lp151.19.19.1.x86_64,
Kernel: 4.12.14-lp151.28.67-default.

@diablodale

Perhaps related: today I changed from my join-token based limiter to flow::limiter_node. I had a consistent failure where the limiter_node accepted 10-20 messages and then accepted no more. I used the default limiter_node<T> and sent continue_msg() via an edge to limiter.decrementer(). I was unable to relate the number of messages accepted before the failure to the node's threshold or the arena thread count.

I made a single code change to limiter_node<T, int> and sent (int)1 via the edges. Now I have no issues. The limiter_node works as I expect. 🤔

I encountered this issue in Windows 64-bit, Debug and Release compiles via Microsoft VS2019 v16.8.4 Community with oneapi-tbb-2021.1.1

Is there an issue in the specializations of threshold_regulator? For example, the specialization for continue_msg has execute() but doesn't have try_put_task() with a decrement like its sibling specialization.

@alexey-katranov
Contributor

@diablodale, at first glance the issues are different, because the issue discussed here is caused by the optimization level of the gcc compiler. In your case, however, you are experiencing issues even in Debug with the Microsoft compiler.

@aleksei-fedotov , can you please look at the issue related to limiter_node reported by @diablodale?

@diablodale

@aleksei-fedotov I can open a separate issue at your request.

@phprus
Contributor Author

phprus commented Mar 2, 2021

Current master https://github.com/oneapi-src/oneTBB/tree/4523a7615eaed49c4ed75654b349ca41b92c6381

gcc7 + Release + -DTBB_STRICT=OFF:

The following tests FAILED:
	 43 - test_limiter_node (Child terminated)
	 91 - conformance_global_control (Failed)

gcc8 + Release + -DTBB_STRICT=OFF:

The following tests FAILED:
	 43 - test_limiter_node (Child terminated)

gcc9 + Release + -DTBB_STRICT=OFF:

The following tests FAILED:
	 43 - test_limiter_node (Child terminated)

@phprus
Contributor Author

phprus commented Apr 4, 2021

Current master 9e15720 and release https://github.com/oneapi-src/oneTBB/releases/tag/v2021.2.0: no changes.

RelWithDebInfo + gcc-7 (7.5.0) + openSUSE 15.2 + glibc-2.26-lp152.26.6.1.x86_64:

The following tests FAILED:
	 90 - conformance_global_control (Failed)
90: ===============================================================================
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:   user exception
90:
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241: ERROR: test case THREW exception: exception thrown in subcase - will translate later when the whole test case has been exited (cannot translate while there is an active exception)
90:
90: ===============================================================================
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:285: FATAL ERROR: The exception is not expected
90:
90: ===============================================================================
90: [doctest] test cases:      6 |      5 passed |      1 failed |      0 skipped
90: [doctest] assertions:   4333 |   4332 passed |      1 failed |
90: [doctest] Status: FAILURE!

@phprus
Contributor Author

phprus commented Apr 4, 2021

Release 2021.2.0

Debian 9 + gcc-6.3 + RelWithDebInfo:

The following tests FAILED:
	 90 - conformance_global_control (Failed)

Debian 9 + gcc-6.3 + Release:

The following tests FAILED:
	 26 - test_resumable_tasks (SEGFAULT)
	 42 - test_limiter_node (SEGFAULT)
	 90 - conformance_global_control (Failed)
26: ===============================================================================
26: /root/oneTBB-2021.2.0/test/tbb/test_resumable_tasks.cpp:425:
26: TEST CASE:  Nested arena
26:
26: /root/oneTBB-2021.2.0/test/tbb/test_resumable_tasks.cpp:425: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal
26:
26: ===============================================================================
26: [doctest] test cases:      2 |      1 passed |      1 failed |      3 skipped
26: [doctest] assertions:  14013 |  14013 passed |      0 failed |
26: [doctest] Status: FAILURE!


42: ===============================================================================
42: /root/oneTBB-2021.2.0/test/tbb/test_limiter_node.cpp:515:
42: TEST CASE:  Message is released if successor does not accept
42:
42: /root/oneTBB-2021.2.0/test/tbb/test_limiter_node.cpp:515: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal
42:
42: ===============================================================================
42: [doctest] test cases:      4 |      3 passed |      1 failed |      2 skipped
42: [doctest] assertions: 217181 | 217181 passed |      0 failed |
42: [doctest] Status: FAILURE!

@phprus
Contributor Author

phprus commented Apr 4, 2021

openSUSE 15.2 + gcc-10 + Release:
test_limiter_node - SEGFAULT

@phprus
Contributor Author

phprus commented Apr 10, 2021

gcc-7 + Release (-O3) or RelWithDebInfo (-O2) + -DTBB_USE_ASSERT=1:

The following tests FAILED:
42 - test_limiter_node (SEGFAULT) - only Release
90 - conformance_global_control (Child aborted)

Release and RelWithDebInfo:
test 90 conformance_global_control failed with assertion:
Assertion !continue_execution() failed on line 138 of file /.../oneTBB-2021.2.0/src/tbb/../../include/oneapi/tbb/detail/_task.h

test 42
        Start  42: test_limiter_node

42: Test command: /.../oneTBB-2021.2.0/build/target_opensuse151-gcc7-assert/gnu_7.5_cxx11_64_release/test_limiter_node "--force-colors=1"
42: Test timeout computed to be: 10000000
42: [doctest] doctest version is "2.3.5"
42: [doctest] run with "--help" for options
42: ===============================================================================
42: /.../oneTBB-2021.2.0/test/tbb/test_limiter_node.cpp:515:
42: TEST CASE:  Message is released if successor does not accept
42:
42: /.../oneTBB-2021.2.0/test/tbb/test_limiter_node.cpp:515: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal
42:
42: ===============================================================================
42: [doctest] test cases:      4 |      3 passed |      1 failed |      2 skipped
42: [doctest] assertions: 217181 | 217181 passed |      0 failed |
42: [doctest] Status: FAILURE!
 42/129 Test  #42: test_limiter_node ........................***Exception: SegFault  0.38 sec
test 90
        Start  90: conformance_global_control

90: Test command: /.../oneTBB-2021.2.0/build/target_opensuse151-gcc7-assert/gnu_7.5_cxx11_64_release/conformance_global_control "--force-colors=1"
90: Test timeout computed to be: 10000000
90: Assertion !continue_execution() failed on line 138 of file /.../oneTBB-2021.2.0/src/tbb/../../include/oneapi/tbb/detail/_task.h
90: [doctest] doctest version is "2.3.5"
90: [doctest] run with "--help" for options
90: ===============================================================================
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:   user exception
90:
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241: FATAL ERROR: test case CRASHED: SIGABRT - Abort (abnormal termination) signal
90:
90: ===============================================================================
90: /.../oneTBB-2021.2.0/test/conformance/conformance_global_control.cpp:241:
90: TEST CASE:  terminate_on_exception: enabled
90:
90: ===============================================================================
90: [doctest] test cases:      5 |      4 passed |      1 failed |      1 skipped
90: [doctest] assertions:   4330 |   4330 passed |      0 failed |
90: [doctest] Status: FAILURE!
 90/129 Test  #90: conformance_global_control ...............Child aborted***Exception:   0.08 sec

@phprus
Contributor Author

phprus commented Apr 10, 2021

If the TBB libraries are built with -O2 and the tests are built with -O3 (gcc-7):

The following tests FAILED:
	 42 - test_limiter_node (SEGFAULT)
	 90 - conformance_global_control (Child aborted)

@alexey-katranov RelWithDebInfo is not a solution, because the error is in the header files: code that uses oneTBB cannot be built with -O3.

@alexey-katranov
Contributor

We could reproduce the test_limiter_node failure with gcc 9.3. However, it cannot be reproduced with gcc 10.2. It is currently not clear what is going wrong; for example, marking make_edge as noinline fixes the segfault. We will continue the investigation.

@phprus
Contributor Author

phprus commented May 7, 2021

Commit https://github.com/oneapi-src/oneTBB/tree/6caecf9630a66fa08512eda86086aa25a4764504

CC=gcc-7 CXX=g++-7 cmake -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_BUILD_TYPE=Release ../..:

100% tests passed, 0 tests failed out of 131

CC=gcc-8 CXX=g++-8 cmake -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_BUILD_TYPE=Release ../..:

test 92
        Start  92: conformance_global_control

92: Test command: /.../oneTBB-6caecf9630a66fa08512eda86086aa25a4764504/build/target_opensuse151-gcc8/gnu_8.2_cxx11_64_release/conformance_global_control "--force-colors=1"
92: Test timeout computed to be: 10000000
92: [doctest] doctest version is "2.3.5"
92: [doctest] run with "--help" for options
92: ===============================================================================
92: /.../oneTBB-6caecf9630a66fa08512eda86086aa25a4764504/test/conformance/conformance_global_control.cpp:242:
92: TEST CASE:  terminate_on_exception: enabled
92:   user exception
92:
92: /.../oneTBB-6caecf9630a66fa08512eda86086aa25a4764504/test/conformance/conformance_global_control.cpp:242: ERROR: test case THREW exception: exception thrown in subcase - will translate later when the whole test case has been exited (cannot translate while there is an active exception)
92:
92: ===============================================================================
92: /.../oneTBB-6caecf9630a66fa08512eda86086aa25a4764504/test/conformance/conformance_global_control.cpp:242:
92: TEST CASE:  terminate_on_exception: enabled
92:
92: /.../oneTBB-6caecf9630a66fa08512eda86086aa25a4764504/test/conformance/conformance_global_control.cpp:286: FATAL ERROR: The exception is not expected
92:
92: ===============================================================================
92: [doctest] test cases:      6 |      5 passed |      1 failed |      0 skipped
92: [doctest] assertions:   4333 |   4332 passed |      1 failed |
92: [doctest] Status: FAILURE!
 92/131 Test  #92: conformance_global_control ...............***Failed    0.06 sec



99% tests passed, 1 tests failed out of 131

Total Test time (real) =  94.49 sec

The following tests FAILED:
     92 - conformance_global_control (Failed)
Errors while running CTest

CC=gcc-9 CXX=g++-9 cmake -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_BUILD_TYPE=Release ../..:

100% tests passed, 0 tests failed out of 131

Unexpected failure: conformance_global_control + gcc-8. In previous versions, this test ran successfully in gcc-8 (failure in gcc-7).

@alexey-katranov
Contributor

It seems similar to the issue with gcc-7. Can you try to extend the workaround in exception.cpp:51 for gcc-8?

@phprus
Contributor Author

phprus commented May 7, 2021

After

-#if __GNUC__ == 7
+#if defined(__GNUC__) && __GNUC__ <= 8

100% tests passed, 0 tests failed out of 131

@alexey-katranov
Contributor

@phprus , thank you. We will apply the fix.

@diablodale

@aleksei-fedotov I did not get a reply from you. I can open a separate issue at your request for the issue above I reported. Is that what you want?

@alexey-katranov
Contributor

@diablodale, thank you for the reminder. It seems your issue was lost because of the first issue. I think it makes sense to open a new issue to avoid confusion with the current one.
Aleksei is on vacation right now; I'll contact him when he is back (in a week or two).

@aleksei-fedotov
Contributor

@diablodale, for some reason this issue dropped off my radar, sorry about that.

Regarding the comment, you said you used the default limiter_node<T>. This node does not have a default value for the threshold parameter. What value did you pass as the argument?

As far as I understand from the description you provided, limiter_node should behave correctly, since it has corresponding tests in place.

Please let me know how these tests differ from your use case, or simply provide a concrete reproducer of the issue.

@phprus
Contributor Author

phprus commented Jun 30, 2021

Commit 4a23d00

#if __GNUC__ && __GNUC__ < 10 && !TBB_USE_DEBUG

@alexey-katranov, GCC-10 from openSUSE Leap 15.2 is also affected :(

g++-10 -v:

Using built-in specs.
COLLECT_GCC=g++-10
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d --enable-offload-targets=nvptx-none=/usr/nvptx-none,amdgcn-amdhsa=/usr/amdgcn-amdhsa, --without-cuda-driver --enable-checking=release --disable-werror --with-gxx-include-dir=/usr/include/c++/10 --enable-ssp --disable-libssp --disable-libvtv --enable-cet=auto --disable-libcc1 --disable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-libphobos --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-10 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c] (SUSE Linux)

@phprus
Contributor Author

phprus commented Sep 4, 2021

@alexey-katranov ping

> It seems similar to the issue with gcc-7. Can you try to extend the workaround in exception.cpp:51 for gcc-8?

If this error is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82081, then it is fixed in GCC versions 8.4/9.2/10.

@phprus
Contributor Author

phprus commented Sep 5, 2021

The test_limiter_node.cpp test requires a similar replacement (tbb::flow::make_edge -> auto make_edge_ptr = tbb::flow::make_edge<int>; make_edge_ptr(...)) in the lines:

tbb::flow::make_edge(bn, ln);

tbb::flow::make_edge(ln, fn);

Without these changes, the test_limiter_node test hangs in a GCC 7.4 Release build.

@phprus phprus mentioned this issue Nov 13, 2021
14 tasks
@anton-potapov
Contributor

@phprus, now that #647 has landed, can we close this?

@phprus
Contributor Author

phprus commented Dec 14, 2021

@anton-potapov, yes, we can close this issue. A 12-hour test run passed without errors.

@phprus phprus closed this as completed Dec 14, 2021