Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Add Cygwin support #156

Open
zmajeed opened this issue Jun 22, 2019 · 40 comments
Open

Feature Request: Add Cygwin support #156

zmajeed opened this issue Jun 22, 2019 · 40 comments

Comments

@zmajeed
Copy link

zmajeed commented Jun 22, 2019

This issue is to track support for Cygwin - the Linux environment on Windows

I've added makefiles support to for a cygwin build that I'd like to send as a PR - should it be for the tbb_2019 branch or would you like to create a new tbb_2019_cygwin branch?

My changes are at tbb_2019...zmajeed:cygwin_build

Many tests still fail - I hope others will be able to contribute fixes to get the cygwin build fully working

@zmajeed zmajeed changed the title Add Cygwin support Feature Request: Add Cygwin support Jun 24, 2019
@anton-potapov
Copy link
Contributor

@zmajeed oneTBB now uses CMake. If this still applicable, please update your patch and open a pull request.

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

My fork https://github.com/zmajeedforks/oneTBB/tree/cygwin_build is based on latest master - it has changes for building on cygwin - master...zmajeedforks:cygwin_build - I'll give you a PR when tests pass - and update my progress here

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

First I got these cmake errors

-- HWLOC target HWLOC::hwloc_1_11 doesn't exist. The tbbbind target cannot be created
-- HWLOC target HWLOC::hwloc_2 doesn't exist. The tbbbind_2_0 target cannot be created
-- HWLOC target HWLOC::hwloc_2_5 doesn't exist. The tbbbind_2_5 target cannot be created

I didn't know what tbbbind or hwloc is - found it's related to MPI which I'm not interested in though seems supported on cygwin - could look into later - it seems optional and unnecessary to see something that feels like nothing might build

I decided to ignore it and proceeded with building

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

Next

src/tbb/dynamic_link.cpp: In function ‘void tbb::detail::r1::init_ap_data()’:
src/tbb/dynamic_link.cpp:250:9: error: ‘Dl_info’ was not declared in this scope
  250 |         Dl_info dlinfo;

Turns out Dl_info and dladdr have custom implementations on cygwin, need #define __GNU_VISIBLE 1 to get them

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

Very happy to see -Wall -Wextra -Werror - I remember opening an issue for some warnings a while back!
Not a fan of -Wfatal-errors though

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

btw I'm building with gcc 11.2 and no problems so far

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

then - need -D_POSIX_C_SOURCE=200809L for strdup

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

then - rml_tbb.cpp has

#pragma weak __TBB_make_rml_server

but weak symbols have some subtleties on cygwin - I don't yet know why weak symbols are needed at all - so disabled them in _config.h

#define __TBB_WEAK_SYMBOLS_PRESENT ( !_WIN32 && !__APPLE__ && !__sun && !__CYGWIN__ && (__TBB_GCC_VERSION >= 40000 || __INTEL_COMPILER ) )

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

next - reusing the linux proxy def file for cygwin - in GNU.cmake

set(TBB_DEF_FILE_PREFIX lin${TBB_ARCH})

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

cygwin has no RLIMIT_NPROC for getrlimit, setrlimit

exclude cygwin for the same reason as windows

// On Windows there is no real thread number limit beside available memory.
// Therefore, the test for thread limit is unreasonable.
#if TBB_USE_EXCEPTIONS && !_WIN32 && !__ANDROID__ && !__CYGWIN__

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big

this is due to heavy use of templates - arguably this is a gcc issue but simplifying templates could be worthwhile

only -O3 optimization seems to prevent these errors in all cases, -Wa,-mbig-obj and -Os don't always work

the problem with these options is they increase compile times for files that don't need them - so I did not make -O3 default for all tests

instead the option is added to only those files that broke in test/CmakeLists.txt

# these files need -O3 to fix assembler errors for too many sections, file too big
if (CYGWIN)
  set_source_files_properties(
    tbb/test_concurrent_unordered_map.cpp
    conformance/conformance_concurrent_unordered_map.cpp
...
    PROPERTIES COMPILE_OPTIONS -O3
  )
endif()

@zmajeed
Copy link
Author

zmajeed commented Oct 7, 2021

so build works on cygwin - have not run tests yet

@zmajeed
Copy link
Author

zmajeed commented Oct 8, 2021

5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects - I hoped cmake would just do the right thing but looks like it needs the tweaking I did for the older makefiles build

96% tests passed, 5 tests failed out of 133
Total Test time (real) = 614.80 sec
The following tests FAILED:
          4 - test_dynamic_link (Failed)
         51 - test_overwrite_node (BAD_COMMAND)
        125 - test_malloc_compliance (Failed)
        126 - test_malloc_used_by_lib (Subprocess aborted)
        127 - test_malloc_lib_unload (Failed)

Errors

Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
test/tbbmalloc/test_malloc_used_by_lib.cpp:98, assertion lib: Can't load lib_test_malloc_used_by_lib_debug.so
The executable doesn't export its symbols. Is the -rdynamic switch set during linking?

@alexey-katranov
Copy link
Contributor

5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects

I think you need to update tests (perhaps, the library as well) that they use dll suffix under cygwin.

@alexey-katranov
Copy link
Contributor

finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big

Can you please specify which tests cause the issue?

@zmajeed
Copy link
Author

zmajeed commented Oct 12, 2021

My original changes to the old makefiles build included cygwin dll support - I'll make similar changes for the cmake build

The note above about assembler errors includes a snippet from test/CMakeLists. txt that lists the offending files

@zmajeed
Copy link
Author

zmajeed commented Nov 2, 2021

I've gone through the 5 test failures - I'll note what's happening with each below - couple of these are hairy and need a fair amount of debugging to sort out

@zmajeed
Copy link
Author

zmajeed commented Nov 2, 2021

Here's how to run all tests

$ ctest --test-dir build/cyg_gcc_debug --output-on-failure

and run a single test, test_dynamic_link

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link                 

and build a single test

$ make -C build/cyg_gcc_debug/test test_dynamic_link                

@zmajeed
Copy link
Author

zmajeed commented Nov 2, 2021

The easiest was 51 - test_overwrite_node (BAD_COMMAND) - turned out to be a transient build issue that did not set the execute bits on the test executable - the test passed on a rebuild

$ ctest --test-dir build/cyg_gcc_debug -V -R overwrite_node               
51 - test_overwrite_node (BAD_COMMAND)
Test project build/cyg_gcc_debug
    Start  51: test_overwrite_node
Process not started
build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_overwrite_node.exe
[permission denied]
1/2 Test  #51: test_overwrite_node ..............***Not Run   0.00 sec
    Start 106: conformance_overwrite_node
2/2 Test #106: conformance_overwrite_node .......   Passed    0.33 sec

@zmajeed
Copy link
Author

zmajeed commented Nov 2, 2021

Next is 4 - test_dynamic_link (Failed) - caused by symbols not being exported from the test executable - fixed with dllexport attribute on the symbols in test/tbb/test_dynamic_link.cpp

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
    Start 4: test_dynamic_link
4: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_dynamic_link.exe "--force-colors=1"
4: test/tbb/test_dynamic_link.cpp:94:
4: TEST CASE:  Test dynamic_link with non-existing library
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4:   values: REQUIRE( NULL )
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4:           Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: test/tbb/test_dynamic_link.cpp:100:
4: TEST CASE:  Test dynamic_link
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4:   values: REQUIRE( NULL )
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4:           Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: [doctest] test cases: 2 | 0 passed | 2 failed | 0 skipped
4: [doctest] assertions: 4 | 0 passed | 4 failed |
4: [doctest] Status: FAILURE!
1/1 Test #4: test_dynamic_link ................***Failed    0.48 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) =   0.84 sec
The following tests FAILED:
          4 - test_dynamic_link (Failed)
Errors while running CTest

Make the change

diff --git a/test/tbb/test_dynamic_link.cpp b/test/tbb/test_dynamic_link.cpp
index 2856db37..684bb435 100644
--- a/test/tbb/test_dynamic_link.cpp
+++ b/test/tbb/test_dynamic_link.cpp
@@ -30,6 +30,8 @@ enum FOO_TYPE {

 #if _WIN32 || _WIN64
 #define TEST_EXPORT
+#elif __CYGWIN__
+  #define TEST_EXPORT extern "C"  __attribute__((dllexport))
 #else
 #define TEST_EXPORT extern "C"
 #endif /* _WIN32 || _WIN64 */

Rebuild and rerun test

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
    Start 4: test_dynamic_link
4: [doctest] test cases: 2 | 2 passed | 0 failed | 0 skipped
4: [doctest] assertions: 9 | 9 passed | 0 failed |
4: [doctest] Status: SUCCESS!
1/1 Test #4: test_dynamic_link ................   Passed    0.44 sec

The following tests passed:
        test_dynamic_link

100% tests passed, 0 tests failed out of 1

@zmajeed
Copy link
Author

zmajeed commented Nov 4, 2021

The third test to fail is 125 - test_malloc_compliance (Failed) - because cygwin does not support RLIMIT_AS for address space or memory limits and getrlimit returns EINVAL errno 22

Luckily cygwin can use the Windows version of the test - this works!

But it is not straightforward to get cygwin to use Windows API the way TBB tests are currently structured

The first big hurdle is the plethora of platform feature test macros in the common header files for the tests.

For cygwin to use Windows API, it has to #include <windows.h>. This is particularly difficult to manage in a surgical manner in a shared header file because cygwin by default needs conditional compiles for Linux and Unix

A second problem is the common functions used in tests are all implemented with various platform-specific blocks in header files. It would be nice if common functions were in a separate library with implementations separated from the interfaces. Since this is not the case, refactoring is trickier.

The third problem is the shared headers have hard dependencies on the doctest unit test framework used by TBB. doctest is a single header library. This means it cannot be built into a separate library and used by a test executable at the same time because of multiple definition errors for doctest symbols when linking

Again it would be nice if shared functions that are not actually tests themselves did not depend on doctest - and the only dependencies were in the unit test .cpp files

My temporary fix to get the test to pass is to place a copy of the shared function in test_malloc_compliance.cpp just for cygwin. I'll see if a better solution is possible for the PR.

Here's the original failure

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
    Start 125: test_malloc_compliance

125: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: Can't set limits: errno 22
1/1 Test #125: test_malloc_compliance ...........***Failed    0.45 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   1.38 sec
The following tests FAILED:
        125 - test_malloc_compliance (Failed)

The test passes after moving some code around in test_malloc_compliance.cpp to prevent #include <windows.h> from bleeding into non-Windows code blocks - and using a copy of the Windows version of the shared GetMemoryUsage function from test/common/memory_usage.h

It's very slow though which could be cygwin overhead of using Win32

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
    Start 125: test_malloc_compliance

125: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: ===============================================================================
125: [doctest] test cases:       1 |       1 passed | 0 failed | 0 skipped
125: [doctest] assertions: 1511136 | 1511136 passed | 0 failed |
125: [doctest] Status: SUCCESS!
1/1 Test #125: test_malloc_compliance ...........   Passed   34.76 sec

The following tests passed:
        test_malloc_compliance
100% tests passed, 0 tests failed out of 1
Total Test time (real) =  35.36 sec

@zmajeed
Copy link
Author

zmajeed commented Nov 4, 2021

The last two tests to fail 126 - test_malloc_used_by_lib (Subprocess aborted) and 127 - test_malloc_lib_unload (Failed) are particularly difficult to debug and fix because they seem to involve race conditions with C++ static data, symbols from dynamically loaded libraries using dlopen and dlsym and multiple threads in pthreads.

Both tests crash with segv faults that trash the stack so pinpointing the culprit is tedious

The segv in both tests is from a dynamically loaded function from a dll in one thread that is accessed after the library has been unloaded with dlclose. Test 126 - test_malloc_used_by_lib (Subprocess aborted) has a custom barrier with 4 threads and 127 - test_malloc_lib_unload (Failed) has just one thread - of course these threads are in addition to the main thread for the test framework - there are always multiple threads involved. On the face of it test source code looks fine - the segv is in internal standard library destructor or cleanup code - it could be an instruction reordering issue that somehow does not appear in Linux

I'm trying to reproduce the failures in small standalone programs to see if the bugs are in TBB tests or in the doctest framework or in cygwin

@zmajeed
Copy link
Author

zmajeed commented Nov 8, 2021

The description above for the fifth test failure 127 - test_malloc_lib_unload (Failed) is not right - the crash may have been from some of my experimental changes

The actual failure is two-fold - fixed by using a right library name and version - and by using the Win32 version of a utility function

Initially the test fails because of a shared library/dll name and version mismatch. The top cmakefile has set(TBBMALLOC_BINARY_VERSION 2) - this builds cygtbbmalloc_debug-2.dll on cygwin. But test_malloc_lib_unload tries to dlopen libtbbmalloc_debug.so.2 or libtbbmalloc.so.2

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
1/1 Test #127: test_malloc_lib_unload ...........***Failed    0.45 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   1.28 sec
The following tests FAILED:
        127 - test_malloc_lib_unload (Failed)

This is fixed by correctly setting various macros for cygwin in test/common/utils_dynamic_libs.h

diff --git a/test/common/utils_dynamic_libs.h b/test/common/utils_dynamic_libs.h
index 22927fe1..a503e724 100644
--- a/test/common/utils_dynamic_libs.h
+++ b/test/common/utils_dynamic_libs.h
@@ -32,7 +32,13 @@ namespace utils {

 #if TBB_USE_DEBUG
 #define SUFFIX1 "_debug"
-#define SUFFIX2
+
+#  if __CYGWIN__
+#    define SUFFIX2 "_debug-2"
+#  else
+#    define SUFFIX2
+#  endif
+
 #else
 #define SUFFIX1
 #define SUFFIX2 "_debug"
@@ -45,19 +51,24 @@ namespace utils {
     #define PREFIX
 #endif
 #define EXT ".dll"
+#elif __CYGWIN__
+#  define PREFIX "cyg"
 #else
 #define PREFIX "lib"
+#endif
+
 #if __APPLE__
 #define EXT ".dylib"
 // Android SDK build system does not support .so file name versioning
 #elif __FreeBSD__ || __NetBSD__ || __sun || _AIX || __ANDROID__
 #define EXT ".so"
+#elif __CYGWIN__
+#  define EXT ".dll"
 #elif __unix__  // Order of these elif's matters!
 #define EXT __TBB_STRING(.so.2)
 #else
 #error Unknown OS
 #endif
-#endif

 // Form the names of the TBB memory allocator binaries.
 #define MALLOCLIB_NAME1 PREFIX "tbbmalloc" SUFFIX1 EXT
@@ -71,6 +82,8 @@ using LIBRARY_HANDLE = void*;

 #if _WIN32 || _WIN64
 #define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
+#elif __CYGWIN__
+#  define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
 #elif __APPLE__
 #define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dylib"
 #else

With the correct library loading, a test assertion fails

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:188:
127: TEST CASE:  test unload lib
127:
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:204: FATAL ERROR: REQUIRE( memory_in_use == memory_check ) is NOT correct!
127:   values: REQUIRE( 11730944 == 11796480 )
127:   logged: Memory consumption should not increase after 1st GetMemoryUsage() call
127:
127: ===============================================================================
127: [doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
127: [doctest] assertions: 7 | 6 passed | 1 failed |
127: [doctest] Status: FAILURE!
1/1 Test #127: test_malloc_lib_unload ...........***Failed    0.57 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   0.70 sec
The following tests FAILED:
        127 - test_malloc_lib_unload (Failed)

memory_in_use and memory_check are determined with GetMemoryUsage - for Linux (and cygwin by default) this returns the VmSize stat from /proc/self/status

As it happens this same function GetMemoryUsage was also why a previous test, test_malloc_compliance, failed - the reason was cygwin has yet to add support for the VmPeak stat in /proc/self/status - I made test_malloc_compliance pass by making cygwin use the Win32 version of GetMemoryUsage

Back to test_malloc_lib_unload - I did not debug why VmSize returns different values on cygwin - instead I employed the same approach and made cygwin use the Windows version of GetMemoryUsage - this returns the PagefileUsage stat using GetProcessMemoryInfo from psapi.h

Lo and behold the test passes

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: [doctest] test cases:  1 |  1 passed | 0 failed | 0 skipped
127: [doctest] assertions: 57 | 57 passed | 0 failed |
127: [doctest] Status: SUCCESS!
1/1 Test #127: test_malloc_lib_unload ...........   Passed    0.48 sec

The following tests passed:
        test_malloc_lib_unload
100% tests passed, 0 tests failed out of 1
Total Test time (real) =   0.89 sec

@zmajeed
Copy link
Author

zmajeed commented Nov 8, 2021

Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.

I'm thinking a destructor or some kind of cleanup code runs at program exit and not when the library is unloaded. This would be a bug in the test.

@zmajeed
Copy link
Author

zmajeed commented Nov 8, 2021

Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?

@zmajeed
Copy link
Author

zmajeed commented Nov 9, 2021

Sync to upstream master added interprocedural optimization options by default - introduced in PR #432

These gcc options -flto -fno-fat-lto-objects break almost all tests on cygwin

1: Cygwin runtime failure: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_tick_count.exe: Invalid relocation.  Offset 0x41aa20cc8 at address 0x100418c6e doesn't fit into 32 bits

Disabling IPO for cygwin in top cmakefile

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4dbdadb9..63fa3afe 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -185,7 +185,7 @@ endforeach()
 # - Enabling LTO on Android causes the NDK bug.
 #   NDK throws the warning: "argument unused during compilation: '-Wa,--noexecstack'"
 # - For some reason GCC does not instrument code with Thread Sanitizer when lto is enabled and C linker is used.
-if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT TBB_SANITIZE MATCHES "thread")
+if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT CYGWIN AND NOT TBB_SANITIZE MATCHES "thread")
     if (NOT CMAKE_VERSION VERSION_LESS 3.9)
         cmake_policy(SET CMP0069 NEW)
         include(CheckIPOSupported)

It could be -fno-fat-lto-objects automatically set by cmake is the real problem.

@alexey-katranov
Copy link
Contributor

Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?

We do not use Cygwin :) so it is up to you. TBBBind is responsible for NUMA support in task_arena that will work without TBBBind considering any multi-socket machine as the machine with one big socket.

@zmajeed
Copy link
Author

zmajeed commented Nov 9, 2021

That's great - now I just need to fix this last test failure and come up with a nicer way for cygwin to use Win32 code in the tests

@alexey-katranov
Copy link
Contributor

Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.

Can it be related: https://github.com/oneapi-src/oneTBB/blob/master/test/common/doctest.h#L470-L504?

@zmajeed
Copy link
Author

zmajeed commented Nov 12, 2021

Thank you for the tip - enabling the VC++ version of doctest_thread_local_wrapper for cygwin does not fix the problem - but it did lead me to cygwin wrappers for pthread that seem to use thread-local storage - I'll dig deeper when I get a chance

@zmajeed
Copy link
Author

zmajeed commented Nov 13, 2021

Well - pthread_key_create is "key" to TBB scalable_malloc - so no surprise test_malloc_used_by_lib crashes and burns!

scalable_malloc runs from the dynamically loaded dll in this test - when the library is unloaded scalable_malloc and all associated data is invalid

pthread_key_create associates thread-local data with a key - the data is opaque so you can give a destructor callback to pthread_key_create that runs to free any data when a thread exits

scalable_malloc uses pthread_key_create in its MemoryPool with mallocThreadShutdownNotification as the destructor callback

MemoryPool says in src/tbbmalloc/frontend.cpp

// list of all active pools is used to release
// all TLS data on thread termination or library unload

this is not happening in test_malloc_used_by_lib

dlclose of the scalable_malloc library is done by the last thread to reach a barrier in the test - this means the library is unloaded before pthread_key destructors run - this in itself is not bad if there is no actual data associated with a key - but in this case there is non-null data for a key and the invalid function pointer to mallocThreadShutdownNotification destructor is called

Why does this test work on other platforms? The MemoryPool must be getting cleared at or before library unload. I don't yet know how that is supposed to happen in general - is there some library unload callback like DllMain that does this or is the client supposed to explicitly destroy the pool and the test has been passing by luck on other platforms?

Any help from TBB devs would be appreciated

@zmajeed
Copy link
Author

zmajeed commented Nov 14, 2021

Found DllMain in src/tbbmalloc/tbbmalloc.cpp - got test_malloc_used_by_lib to pass but some questions remain

else if (callReason==DLL_PROCESS_DETACH)
{
    __TBB_mallocProcessShutdownNotification(lpvReserved != NULL);
}

This is exactly what's needed for cygwin - __TBB_mallocProcessShutdownNotification frees the memory pool so no pthread key data remains

Enabling it for cygwin opens a small can of worms because this DllMain is exclusive from another function RegisterProcessShutdownNotification used for pthread platforms like cygwin - I enabled both for cygwin

Also memory pool cleanup in __TBB_mallocProcessShutdownNotification is only enabled for __TBB_SOURCE_DIRECTLY_INCLUDED when the core tbbmalloc library is built

#if __TBB_SOURCE_DIRECTLY_INCLUDED
/* Pthread keys must be deleted as soon as possible to not call key dtor
   on thread termination when then the tbbmalloc code can be already unloaded.
*/
    defaultMemPool->destroy();
    destroyBackRefMain(&defaultMemPool->extMemPool.backend);
    ThreadId::destroy();      // Delete key for thread id
    hugePages.reset();
    // new total malloc initialization is possible after this point
    mallocInitialized.store(0, std::memory_order_release);
#endif // __TBB_SOURCE_DIRECTLY_INCLUDED

The purpose of __TBB_SOURCE_DIRECTLY_INCLUDED is explained in include/oneapi/tbb/detail/_config.h

/** Internal TBB features & modes **/

/** __TBB_SOURCE_DIRECTLY_INCLUDED is a mode used in whitebox testing when
    it's necessary to test internal functions not exported from TBB DLLs
**/

Now test_malloc_used_by_lib only tests public functions like scalable_malloc - so I don't get why memory pool cleanup is guarded by __TBB_SOURCE_DIRECTLY_INCLUDED in __TBB_mallocProcessShutdownNotification - the cleanup is always needed on cygwin and the same should be true on Windows

So it's not as simple as adding #define __TBB_SOURCE_DIRECTLY_INCLUDED 1 to test_malloc_used_by_lib.cpp because the macro must also be defined when building the tbbmalloc library

Anyway all tests now pass for cygwin - all that remains for a PR are cleaner ways for cygwin to sometimes use Win32 code

$ ctest --test-dir build/cyg_gcc_debug_clean2 -V -R malloc_used_by_lib
Internal ctest changing into directory: onetbb/build/cyg_gcc_debug
test 126
    Start 126: test_malloc_used_by_lib

126: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_used_by_lib.exe "--force-colors=1"
126: ===============================================================================
126: [doctest] test cases: 1 | 1 passed | 0 failed | 0 skipped
126: [doctest] assertions: 6 | 6 passed | 0 failed |
126: [doctest] Status: SUCCESS!
1/1 Test #126: test_malloc_used_by_lib ..........   Passed    0.56 sec

The following tests passed:
        test_malloc_used_by_lib
100% tests passed, 0 tests failed out of 1
Total Test time (real) =   2.02 sec

@alexey-katranov
Copy link
Contributor

__TBB_SOURCE_DIRECTLY_INCLUDED is workaround for the static build. If tbbmalloc is dynamic there is a special handling of thread destructors: https://github.com/oneapi-src/oneTBB/blob/master/src/tbbmalloc/tbbmalloc.cpp#L75-L105. On Windows, we rely on DLL_THREAD_DETACH; while on POSIX we prevent tbbmalloc to be unloaded. Or do you really build the static version?

@zmajeed
Copy link
Author

zmajeed commented Nov 15, 2021

Thanks for clearing up the intended use of __TBB_SOURCE_DIRECTLY_INCLUDED - I'm not building static libraries

I found the true reason why pthread key destructors were crashing after a closer look at RegisterProcessShutdownNotification for pthread - it was a simple dll name mismatch

The name of the tbbmalloc library built by cmake on cygwin is cygtbbmalloc_debug-2.dll - I've had to account for this versioned name in couple places before - it bit me here again - because RegisterProcessShutdownNotification is

RegisterProcessShutdownNotification() {
  // prevents unloading, POSIX case

  // We need better support for the library pinning
  // when dlopen can't find TBBmalloc library.
  // for example: void *ret = dlopen(MALLOCLIB_NAME, RTLD_NOW);
  // MALLOC_ASSERT(ret, "Allocator can't load itself.");
  dlopen(MALLOCLIB_NAME, RTLD_NOW);
}

My initial changes for cygwin missed adding the version to MALLOCLIB_NAME - so dlopen was silently failing and this library "pinning" was not working

With MALLOCLIB_NAME fixed, test_malloc_used_by_lib passes without having to rely on DllMain or code in __TBB_SOURCE_DIRECTLY_INCLUDED blocks

@zmajeed
Copy link
Author

zmajeed commented Nov 15, 2021

There is a working cygwin port at https://github.com/zmajeedforks/oneTBB/tree/cygwin_build

@alexey-katranov
Copy link
Contributor

There is a working cygwin port at https://github.com/zmajeedforks/oneTBB/tree/cygwin_build

Great! Do you have to contribute the port?

@zmajeed
Copy link
Author

zmajeed commented Nov 17, 2021

I plan to - only one item needs cleaning up for a PR

In a couple tests I had to make a copy of the Windows implementation of GetMemoryUsage private to the test because the Linux version uses an unsupported feature like VmPeak or the test fails with the Linux version - see test_malloc_compliance.cpp and test_malloc_lib_unload.cpp in master...zmajeedforks:cygwin_build

I want to get rid of the copy in the PR - the original GetMemoryUsage is in test/common/memory_usage.h - currently all shared test code is implemented in these headers - that's fine except it includes platform-specific code that is better kept in a .cpp file - this is being discussed in a separate open issue

So I tried placing GetMemoryUsage in a new test/common/get_memory_usage.cpp file and building a new testcommon library for the tests - my first attempt resulted in multiple definition errors from doctest because code in test/common includes doctest.h and symbols conflicted between my testcommon library and the test executable

I'll try again - any suggestions are also welcome

@zmajeed
Copy link
Author

zmajeed commented Nov 20, 2021

Success! I refactored test/common/memory_usage.h for cygwin to use the Windows version of GetMemoryUsage

The refactoring turns memory_usage.h into a new static library memory_usage in a subdirectory of test/common. This way the platform-specific implementation of GetMemoryUsage moves to a new memory_usage.cpp file and removes the dependency of the library header on windows.h which pollutes the global namespace with Windows-specific macros - something particularly detrimental in a platform like cygwin that normally uses Unix and Linux APIs but can use Win32 APIs in an isolated way when necessary

My first attempt failed with link errors for multiple definitions of symbols from the doctest unit testing framework. Couple examples (from dozens) were

ld: libmemory_usage_debug.a(memory_usage.cpp.o): in function `doctest::detail::my_memcpy(void*, void const*, unsigned int)':
test/common/memory_usage/../../common/doctest.h:3041: multiple definition of `doctest::detail::my_memcpy(void*, void const*, unsigned int)'; CMakeFiles/test_malloc_compliance.dir/tbbmalloc/test_malloc_compliance.cpp.o:test/common/doctest.h:3041: first defined here

ld: libmemory_usage_debug.a(memory_usage.cpp.o): in function `utils::internal::Tracer::trace(char const*, ...)':
test/common/memoryusage/../../common/doctest.h:3043: multiple definition of `doctest::detail::rawMemoryToString(void const*, unsigned int)'; CMakeFiles/test_malloc_compliance.dir/tbbmalloc/test_malloc_compliance.cpp.o:test/common/doctest.h:3043: first defined here

The reason is doctest is a single-header library. All functions are defined in doctest.h. This is fine but many of these functions are at namespace level or are methods defined outside a class. This is a well-known problem in C++ that results in multiple definition errors. The obvious fix is to declare such functions inline - so obvious that I was surprised why doctest did not do this in the first place. However I did not want to spend time trying to understand doctest internals and just duplicated GetMemoryUsage for the two tests that failed on cygwin

Once the tests passed, I revisited the linker errors in order to get a refactored memory_usage library for the PR

It turns out doctest deliberately eschews inline functions in the header in favor of configuration macros for applications to use doctest in a library. These macros are documented at https://github.com/onqtam/doctest/blob/master/doc/markdown/configuration.md.

These macros include DOCTEST_CONFIG_IMPLEMENT, DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN and DOCTEST_CONFIG_IMPLEMENTATION_IN_DLL - these control whether function definitions are exposed in the header file

In particular, DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN is defined by default in the TBB common test header test/common/test.h - this caused the multiple definitions errors because my memory_usage library was built with test/common/test.h and the test executable naturally also includes test/common/test.h

#if !defined(DOCTEST_CONFIG_IMPLEMENT)
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#endif

This led me to the solution, which is to make the define of DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN in test/common/test.h conditional on a new macro TBB_TEST_NO_DOCTEST_CONFIG being UNDEFINED

#if !defined(DOCTEST_CONFIG_IMPLEMENT) && !TBB_TEST_NO_DOCTEST_CONFIG
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#endif

This way existing test code remains unaffected - and the new memory_usage library simply defines TBB_TEST_NO_DOCTEST_CONFIG=1 to exclude duplicate definitions from doctest.h

This works out nicely. There are 9 tests that depend on memory_usage.h. These have been switched to use the memory_usage library in test/CMakeLists.txt

@alexey-katranov
Copy link
Contributor

Can you please summarize the problem scope with cygwin and Win32 API to document design decisions?

As you mentioned,

For cygwin to use Windows API, it has to #include <windows.h>. This is particularly difficult to manage in a surgical manner in a shared header file because cygwin by default needs conditional compiles for Linux and Unix

It does not seem that we need to solve the problem in a general way (at first glance a separate module and related issues seem overcomplicated). Is it possible to introduce the cygwin-related macro, e.g. __TBB_TEST_CYGWIN to rely on it in GetMemoryUsage?

@arunparkugan
Copy link

@alexey-katranov is this still relevant?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants