Tests don't pass with ThreadSanitizer #2123

Closed
syyyr opened this issue Sep 2, 2020 · 14 comments

Comments

@syyyr
Contributor

syyyr commented Sep 2, 2020

Hi,
I tried running the tests with ThreadSanitizer, and some of them produce warnings. It seems that most (if not all) of them fail on this kind of error (there is some variation, but all of them are "unlock of an unlocked mutex"):

WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=1196429)
    #0 pthread_mutex_unlock <null> (test_process+0x63228)
    #1 sr_rwunlock /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/common.c:2456:5 (libsysrepo.so.5+0x1d4d7)

  Location is global '??' at 0x7f2729f53000 (srsub_ops.rpc.2fa8c985+0x000000000000)

  Mutex M268 (0x7f2729f53000) created at:
    #0 pthread_mutex_unlock <null> (test_process+0x63228)
    #1 sr_rwunlock /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/common.c:2456:5 (libsysrepo.so.5+0x1d4d7)

The entire log is very long, so I uploaded it here: https://gist.github.com/syyyr/3f3640cbfbb4daa4617d18f5d5b2dfcd. Would it be possible to fix this? We'd like to run the tests as part of our CI.

Thanks

@michalvasko
Collaborator

Hi,
sadly, this cannot be fixed. The sanitizers simply remember the address at which a mutex was locked and expect the unlock at the same address. However, sysrepo keeps its locks in shared memory, and when that memory is remapped to another address, the checkers lose track of the lock and see an unlock at a wrong address. That is why you see so many errors.
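To make the address problem concrete, here is a minimal, Linux-only sketch. It rests on a few assumptions: it uses memfd_create() instead of the named SHM files sysrepo actually uses, and the helper name lock_unlock_via_two_mappings() is hypothetical. It maps one shared-memory object at two different addresses, locks a process-shared mutex through the first mapping and unlocks it through the second. Without a sanitizer this works, because both addresses refer to the same memory. A checker that keys mutexes on their virtual address would, we expect, see an unlock of a mutex it never saw locked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lock a process-shared mutex through one mapping of a shared-memory
 * object and unlock it through a second mapping of the same object.
 * Returns 0 if the unlock through the second address really released
 * the mutex, -1 otherwise. */
int lock_unlock_via_two_mappings(void)
{
    int ret = -1;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("mutex-demo", 0);
    if (fd < 0)
        return -1;

    void *m1 = MAP_FAILED, *m2 = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) == 0) {
        /* Two mappings of the same memory at two different virtual addresses. */
        m1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        m2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    if (m1 != MAP_FAILED && m2 != MAP_FAILED) {
        pthread_mutex_t *via_first = m1;
        pthread_mutex_t *via_second = m2;
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (pthread_mutex_init(via_first, &attr) == 0) {
            /* An address-keyed checker records a lock at m1 ... */
            if (pthread_mutex_lock(via_first) == 0) {
                /* ... and then sees an unlock at m2, which it never saw locked. */
                pthread_mutex_unlock(via_second);
                /* The mutex really was released: it can be taken again via m1. */
                if (pthread_mutex_trylock(via_first) == 0) {
                    pthread_mutex_unlock(via_first);
                    ret = 0;
                }
            }
            pthread_mutex_destroy(via_first);
        }
        pthread_mutexattr_destroy(&attr);
    }

    if (m1 != MAP_FAILED)
        munmap(m1, len);
    if (m2 != MAP_FAILED)
        munmap(m2, len);
    close(fd);
    return ret;
}
```

In sysrepo, the "second mapping" would correspond to the segment being remapped after it grows, rather than being mapped twice at once, but the checker's view is the same.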

@jktjkt
Contributor

jktjkt commented Sep 7, 2020

I wonder whether moving mutexes is portable, and I wasn't able to find a definitive answer. Jonathan Wakely of GCC/glibc fame said that pthread_mutex_t is not guaranteed to be movable at all, but that is not stated in the docs. Which doesn't mean that it's allowed either :).

Moving locked mutexes is documented to be forbidden on IBM i, which claims to be POSIX-compliant. On Linux, I think that it is not possible to move robust mutexes (pthread_mutexattr_setrobust) because of how futexes are implemented between the kernel and glibc. I know that sysrepo currently does not use robust mutexes, but I think they will be needed to support recovery from crashes within libsysrepo.
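For reference, the robust-mutex recovery path looks roughly like this, following the pattern in the Linux pthread_mutexattr_setrobust(3) man page. This is a sketch only: the helper names are hypothetical, a thread that exits while holding the lock stands in for a crashed client (on Linux/glibc that counts as the owner dying), and a mutex placed in SHM would additionally need PTHREAD_PROCESS_SHARED. As we understand it, glibc keeps each thread's held robust mutexes on a list of addresses registered with the kernel, which is why moving one while it is locked is a problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Thread body: acquire the mutex and exit without releasing it,
 * standing in for a client that crashes while holding a lock. */
static void *die_holding(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Returns 1 if the dead owner was reported (EOWNERDEAD) and the mutex was
 * made consistent again, 0 if no dead owner was reported, -1 on error. */
int robust_recovery_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    pthread_t t;
    int recovered = 0;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&t, NULL, die_holding, &m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    pthread_join(t, NULL);

    int rc = pthread_mutex_lock(&m);
    if (rc == EOWNERDEAD) {
        /* Repair whatever state the lock protects, then mark it consistent;
         * otherwise the mutex becomes unusable once it is unlocked. */
        if (pthread_mutex_consistent(&m) != 0)
            return -1;
        recovered = 1;
    } else if (rc != 0) {
        return -1;
    }

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return recovered;
}
```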

How high is that on the TODO list, and is it "doable", or would it require reworking half of the SHM code? TSAN has helped us in cla-sysrepo many, many times in the past, and disabling its features is a rather high cost for our product.

@michalvasko
Collaborator

Firstly, the theoretical reason given in the Stack Overflow answer for why pthread_mutex_t may not be movable is probably exactly why mutexes are not movable on IBM i. They implemented some smart (lazy) mutex initialization and also some global table of initialized mutexes (precisely what was mentioned there) to enable additional checks. The cost of this feature is unmovable mutexes. Whether that is POSIX-compliant or not I cannot say.

Regarding robust mutexes, the plan was to make only the "recovery" locks robust, which could be put into a static SHM segment. However, that was just the initial idea, which may turn out to be wrong for whatever reason.

TLDR; I would have to delve into the overall SHM usage in sysrepo once again to be able to come up with a plausible design that would 1) solve all the recovery problems and 2) optionally put all the mutexes into separate static SHM segments (that do not change their size/are not remapped for the lifetime of a process). I do not know when I will be able to do that.
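A sketch of the layout from point 2), under our own assumptions: the names lock_seg, LOCK_COUNT and grow_data_keep_locks() are hypothetical, and anonymous/memfd mappings stand in for sysrepo's SHM files. The locks live in a fixed-size segment that is mapped once, while the data segment is grown with ftruncate() and mremap(MREMAP_MAYMOVE). A lock taken before the data segment moves is released at the same address afterwards, which is what address-keyed checkers need.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define LOCK_COUNT 16 /* hypothetical number of locks */

/* Hypothetical fixed-size segment that holds nothing but locks. */
struct lock_seg {
    pthread_mutex_t locks[LOCK_COUNT];
};

/* Grows a separate data segment to new_len bytes while holding a lock
 * from the fixed segment. Returns 0 on success, -1 on error. */
int grow_data_keep_locks(size_t new_len)
{
    int ret = -1;
    size_t data_len = (size_t)sysconf(_SC_PAGESIZE);
    pthread_mutexattr_t attr;

    /* Mapped once, never resized, so every mutex keeps its address.
     * (MAP_ANONYMOUS memory is only shared with forked children; a real
     * design would use a named SHM file.) */
    struct lock_seg *ls = mmap(NULL, sizeof *ls, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ls == MAP_FAILED)
        return -1;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    for (int i = 0; i < LOCK_COUNT; i++)
        pthread_mutex_init(&ls->locks[i], &attr);
    pthread_mutexattr_destroy(&attr);

    /* Variable-size data segment: free to grow and move. */
    int fd = memfd_create("data-seg", 0);
    void *data = MAP_FAILED;
    if (fd >= 0 && ftruncate(fd, (off_t)data_len) == 0)
        data = mmap(NULL, data_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (data != MAP_FAILED) {
        pthread_mutex_t *lock = &ls->locks[0];

        pthread_mutex_lock(lock);
        /* Resize the backing object, then the mapping; it may move. */
        if (ftruncate(fd, (off_t)new_len) == 0) {
            void *grown = mremap(data, data_len, new_len, MREMAP_MAYMOVE);
            if (grown != MAP_FAILED) {
                data = grown;
                data_len = new_len;
                ret = 0;
            }
        }
        /* Released through the same pointer, at the same address. */
        pthread_mutex_unlock(lock);
        munmap(data, data_len);
    }

    if (fd >= 0)
        close(fd);
    for (int i = 0; i < LOCK_COUNT; i++)
        pthread_mutex_destroy(&ls->locks[i]);
    munmap(ls, sizeof *ls);
    return ret;
}
```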

@jktjkt
Contributor

jktjkt commented Sep 8, 2020

Thanks, we'll be happy to test this once you have patches. In the meantime, @syyyr put together these sanitizer suppressions to keep TSAN working for our code.

@syyyr
Contributor Author

syyyr commented Sep 9, 2020

Some more info:
TSan also gets confused about the number of locked mutexes. When I run a netconf-cli datastore test (which connects to and disconnects from sysrepo multiple times in the same program), the number of locks (according to TSan) exceeds TSan's limit of 64. If I comment out some of the tests (starting from the first one), the test eventually passes. This error probably comes from TSan not being able to deal with the mutex remapping (or whatever the problem is). It can be worked around by completely disabling deadlock detection, which is unfortunate, because that disables it for everything, not just sysrepo.

TSAN_OPTIONS="detect_deadlocks=0"

Log:

FATAL: ThreadSanitizer CHECK failed: ../lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)
    #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) <null> (test_datastore_access_sysrepo+0x2f911a)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) <null> (test_datastore_access_sysrepo+0x26614a)
    #2 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, __sanitizer::DDMutex*, bool, bool) <null> (test_datastore_access_sysrepo+0x25836d)
    #3 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int, int) <null> (test_datastore_access_sysrepo+0x2f7269)
    #4 pthread_mutex_timedlock <null> (test_datastore_access_sysrepo+0x28fd93)
    #5 sr_shmmod_lock /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/shm_mod.c:59:11 (libsysrepo.so.5+0x8afc8)
    #6 sr_shmmod_modinfo_wrlock /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/shm_mod.c:466:25 (libsysrepo.so.5+0x8b7ed)
    #7 sr_shmmod_oper_stored_del_conn /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/shm_mod.c:1029:21 (libsysrepo.so.5+0x8eeda)
    #8 sr_disconnect /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/src/sysrepo.c:419:15 (libsysrepo.so.5+0x899a)
    #9 sysrepo::Connection::~Connection() /home/vk/git/netconf-cli/submodules/dependencies/sysrepo/bindings/cpp/src/Connection.cpp:53:9 (libsysrepo-cpp.so.5+0x29088)
    #10 void __gnu_cxx::new_allocator<sysrepo::Connection>::destroy<sysrepo::Connection>(sysrepo::Connection*) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/ext/new_allocator.h:156:10 (test_datastore_access_sysrepo+0x3aa584)
    #11 void std::allocator_traits<std::allocator<sysrepo::Connection> >::destroy<sysrepo::Connection>(std::allocator<sysrepo::Connection>&, sysrepo::Connection*) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/alloc_traits.h:531:8 (test_datastore_access_sysrepo+0x3aa4eb)
    #12 std::_Sp_counted_ptr_inplace<sysrepo::Connection, std::allocator<sysrepo::Connection>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/shared_ptr_base.h:560:2 (test_datastore_access_sysrepo+0x3aa232)
    #13 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/shared_ptr_base.h:158:6 (test_datastore_access_sysrepo+0x3b0220)
    #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/shared_ptr_base.h:733:11 (test_datastore_access_sysrepo+0x3b00e8)
    #15 std::__shared_ptr<sysrepo::Connection, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/shared_ptr_base.h:1183:31 (test_datastore_access_sysrepo+0x3b6162)
    #16 std::shared_ptr<sysrepo::Connection>::~shared_ptr() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/shared_ptr.h:121:11 (test_datastore_access_sysrepo+0x3a7e0b)
    #17 SysrepoAccess::~SysrepoAccess() /home/vk/git/netconf-cli/src/sysrepo_access.cpp:159:31 (test_datastore_access_sysrepo+0x42c369)
    #18 _DOCTEST_ANON_FUNC_4() /home/vk/git/netconf-cli/tests/datastore_access.cpp:809:1 (test_datastore_access_sysrepo+0x324e66)
    #19 doctest::Context::run() /opt/cesnet-t/doctest/include/doctest/doctest.h:5907:21 (test_datastore_access_sysrepo+0x3f3397)
    #20 main /opt/cesnet-t/doctest/include/doctest/doctest.h:5991:71 (test_datastore_access_sysrepo+0x3f4f6a)
    #21 __libc_start_main <null> (libc.so.6+0x28151)
    #22 _start <null> (test_datastore_access_sysrepo+0x253bcd)

@michalvasko
Collaborator

Hi,
this is most likely caused by TSan registering a new mutex on every lock (because each one has a unique address) and keeping it as held even after it was unlocked (reporting an error instead, because the unlock happened at a different address). You will reach 64 locks quite fast.
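A toy model of that explanation, for illustration only: it is not TSan's real data structure, and the table, its names and the addresses are made up; only the capacity of 64 is taken from the CHECK failure above (0x40). Each connect/disconnect cycle locks at one address and, after a remap, unlocks at another, so the unlock never matches and the entry stays in the table.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HELD 64 /* mirrors the 0x40 limit in the CHECK failure */

struct held_table {
    uintptr_t addr[MAX_HELD];
    int n;
};

/* Record a lock; -1 means the table is full (the CHECK failure analogue). */
static int on_lock(struct held_table *t, uintptr_t addr)
{
    if (t->n >= MAX_HELD)
        return -1;
    t->addr[t->n++] = addr;
    return 0;
}

/* Forget a lock; -1 means "unlock of an unlocked mutex" at this address. */
static int on_unlock(struct held_table *t, uintptr_t addr)
{
    for (int i = 0; i < t->n; i++) {
        if (t->addr[i] == addr) {
            t->addr[i] = t->addr[--t->n];
            return 0;
        }
    }
    return -1;
}

/* Simulates connect/disconnect cycles; each cycle maps the segment at a new
 * address. With remapped != 0, the unlock is seen at a different address
 * than the lock. Returns how many cycles complete before the table fills. */
int cycles_before_overflow(int max_cycles, int remapped)
{
    struct held_table t = {{0}, 0};

    for (int c = 0; c < max_cycles; c++) {
        uintptr_t lock_addr = (uintptr_t)0x10000000u + (uintptr_t)c * 0x10000u;
        uintptr_t unlock_addr = remapped ? lock_addr + 0x1000u : lock_addr;

        if (on_lock(&t, lock_addr) != 0)
            return c;
        (void)on_unlock(&t, unlock_addr); /* fails when remapped; entry stays */
    }
    return max_cycles;
}
```

With matching addresses, cycles_before_overflow(1000, 0) runs all 1000 cycles, while cycles_before_overflow(1000, 1) stops after 64.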

@syyyr
Contributor Author

syyyr commented Oct 8, 2020

Hi,
running tests with TSAN has gotten a little bit worse, particularly test_process. The test started timing out again. I bisected and found that it started happening in 401bcea (which makes sense, because that commit touches this test). The previous commit works fine. I tried increasing the timeout on sr_rpc_send_tree further, as I did in one of my PRs, but it didn't really have any effect. I'm able to reproduce this on a VM in our CI (it happens in about 1 of 6 runs). I'm not sure what exactly the problem is...

log from one of the failed runs:

[56521][INF]: Libyang internal module "yang" was installed.
[56521][INF]: File "ietf-datastores@2018-02-14.yang" was installed.
[56521][INF]: Sysrepo internal dependency module "ietf-datastores" was installed.
libyang[1]: Missing status in deprecated subtree (/ietf-yang-library:{grouping}[module-list]/{grouping}[schema-leaf]/schema), inheriting.
[56521][INF]: File "ietf-yang-library@2019-01-04.yang" was installed.
[56521][INF]: Sysrepo internal module "ietf-yang-library" was installed.
[56521][INF]: File "sysrepo-monitoring@2020-04-17.yang" was installed.
[56521][INF]: Sysrepo internal module "sysrepo-monitoring" was installed.
[56521][INF]: File "ietf-netconf@2011-06-01.yang" was installed.
[56521][INF]: Sysrepo internal dependency module "ietf-netconf" was installed.
[56521][INF]: File "ietf-netconf-with-defaults@2011-06-01.yang" was installed.
[56521][INF]: Sysrepo internal module "ietf-netconf-with-defaults" was installed.
[56521][INF]: File "ietf-netconf-notifications@2012-02-06.yang" was installed.
[56521][INF]: Sysrepo internal module "ietf-netconf-notifications" was installed.
[56521][INF]: File "ietf-origin@2018-02-14.yang" was installed.
[56521][INF]: Sysrepo internal module "ietf-origin" was installed.
[56521][INF]: Datastore copied from <startup> to <running>.
[56521][INF]: Module "ops-ref" scheduled for installation.
[56521][INF]: Module "ops" scheduled for installation.
[56521][INF]: File "ops-ref.yang" was installed.
[56521][INF]: Module "ietf-interfaces" scheduled for installation.
[56521][INF]: Module "iana-if-type" scheduled for installation.
[56521][INF]: File "ietf-interfaces@2014-05-08.yang" was installed.
[56532][INF]: Applying scheduled changes.
[56532][INF]: Module "ops-ref" will be installed as "ops" module dependency.
[56532][INF]: File "ops.yang" was installed.
[56532][INF]: Module "ops" was installed.
[56532][INF]: Dependency module "ops-ref" was installed.
[56532][INF]: Module "ietf-interfaces" will be installed as "iana-if-type" module dependency.
[56532][INF]: File "iana-if-type@2014-05-08.yang" was installed.
[56532][INF]: Module "iana-if-type" was installed.
[56532][INF]: Dependency module "ietf-interfaces" was installed.
[56532][INF]: Scheduled changes applied.
[56532][INF]: Session 1 (user "ci") created.
[56521][INF]: Scheduled changes not applied because of other existing connections.
[56521][INF]: Session 2 (user "ci") created.
[56521][INF]: Ext SHM was defragmented and 4424 B were saved.
[56521][INF]: Ext SHM was defragmented and 4120 B were saved.
[56521][INF]: Ext SHM was defragmented and 4136 B were saved.
[56521][INF]: Ext SHM was defragmented and 4296 B were saved.
[56521][INF]: Ext SHM was defragmented and 4192 B were saved.
[56532][INF]: Ext SHM was defragmented and 4664 B were saved.
[===========] Running 3 test(s).
[ RUN       ] test rpc sub
[56521][INF]: Applying scheduled changes.
[56521][INF]: No scheduled changes.
[56521][INF]: Module "iana-if-type" scheduled for deletion.
[56521][INF]: Module "ietf-interfaces" scheduled for deletion.
[56521][INF]: Module "ops" scheduled for deletion.
[56521][INF]: Module "ops-ref" scheduled for deletion.
[56521][INF]: Applying scheduled changes.
[56521][INF]: File "ops.yang" was removed.
[56521][INF]: Module "ops" was removed.
[56521][INF]: File "ops-ref.yang" was removed.
[56521][INF]: Module "ops-ref" was removed.
[56521][INF]: File "iana-if-type@2014-05-08.yang" was removed.
[56521][INF]: Module "iana-if-type" was removed.
[56521][INF]: File "ietf-interfaces@2014-05-08.yang" was removed.
[56521][INF]: Module "ietf-interfaces" was removed.
[56521][INF]: Scheduled changes applied.
[56521][INF]: Module "ops-ref" scheduled for installation.
[56521][INF]: Module "ops" scheduled for installation.
[56521][INF]: File "ops-ref.yang" was installed.
[56521][INF]: Module "ietf-interfaces" scheduled for installation.
[56521][INF]: Module "iana-if-type" scheduled for installation.
[56521][INF]: File "ietf-interfaces@2014-05-08.yang" was installed.
[56521][INF]: Applying scheduled changes.
[56521][INF]: Module "ops-ref" will be installed as "ops" module dependency.
[56521][INF]: File "ops.yang" was installed.
[56521][INF]: Module "ops" was installed.
[56521][INF]: Dependency module "ops-ref" was installed.
[56521][INF]: Module "ietf-interfaces" will be installed as "iana-if-type" module dependency.
[56521][INF]: File "iana-if-type@2014-05-08.yang" was installed.
[56521][INF]: Module "iana-if-type" was installed.
[56521][INF]: Dependency module "ietf-interfaces" was installed.
[56521][INF]: Scheduled changes applied.
[56521][INF]: Session 3 (user "ci") created.
[56609][INF]: Scheduled changes not applied because of other existing connections.
[56609][INF]: Session 4 (user "ci") created.
[56521][INF]: Published event "rpc" "/ops:rpc3" with ID 1 priority 0 for 1 subscribers.
LLVMSymbolizer: error reading file: The file was not recognized as a valid object file
[56609][INF]: Processing "/ops:rpc3" "rpc" event with ID 1 priority 0 (remaining 1 subscribers).
[===========] Running 3 test(s).
[ RUN       ] test rpc sub
[     OK OK ] test rpc sub
[ RUN       ] test rpc crash
[56521][ERR]: Callback event "rpc" with ID 1 processing timed out.
[56521][WRN]: Event "rpc" with ID 1 priority 0 failed (Timeout expired).
[56521][INF]: Scheduled changes not applied because of other existing connections.
[56521][WRN]: Cleaning up after a non-existent sysrepo client with PID 56609.
[56521][INF]: Applying scheduled changes.
[56521][INF]: No scheduled changes.
[56521][INF]: Module "iana-if-type" scheduled for deletion.
[56521][INF]: Module "ietf-interfaces" scheduled for deletion.
[56521][INF]: Module "ops" scheduled for deletion.
[56521][INF]: Module "ops-ref" scheduled for deletion.
[56521][INF]: Applying scheduled changes.
[56521][INF]: File "ops.yang" was removed.
[56521][INF]: Module "ops" was removed.
[56521][INF]: File "ops-ref.yang" was removed.
[56521][INF]: Module "ops-ref" was removed.
[56521][INF]: File "iana-if-type@2014-05-08.yang" was removed.
[56521][INF]: Module "iana-if-type" was removed.
[56521][INF]: File "ietf-interfaces@2014-05-08.yang" was removed.
[56521][INF]: Module "ietf-interfaces" was removed.
[56521][INF]: Scheduled changes applied.
[56521][INF]: Module "ops-ref" scheduled for installation.
[56521][INF]: Module "ops" scheduled for installation.
[56521][INF]: File "ops-ref.yang" was installed.
[56521][INF]: Module "ietf-interfaces" scheduled for installation.
[56521][INF]: Module "iana-if-type" scheduled for installation.
[56521][INF]: File "ietf-interfaces@2014-05-08.yang" was installed.
[56521][INF]: Applying scheduled changes.
[56521][INF]: Module "ops-ref" will be installed as "ops" module dependency.
[56521][INF]: File "ops.yang" was installed.
[56521][INF]: Module "ops" was installed.
[56521][INF]: Dependency module "ops-ref" was installed.
[56521][INF]: Module "ietf-interfaces" will be installed as "iana-if-type" module dependency.
[56521][INF]: File "iana-if-type@2014-05-08.yang" was installed.
[56521][INF]: Module "iana-if-type" was installed.
[56521][INF]: Dependency module "ietf-interfaces" was installed.
[56521][INF]: Scheduled changes applied.
[56521][INF]: Session 5 (user "ci") created.
[56521][INF]: There are no subscribers for changes of the module "ietf-interfaces" in running DS.
[56731][INF]: Scheduled changes not applied because of other existing connections.
[56731][INF]: Session 6 (user "ci") created.
[56521][INF]: Published event "notif" "ops" with ID 1 priority 0 for 1 subscribers.
[56731][INF]: Published event "change" "ietf-interfaces" with ID 1 priority 0 for 1 subscribers.
LLVMSymbolizer: error reading file: The file was not recognized as a valid object file
[56731][ERR]: Callback event "change" with ID 1 processing timed out.
[56731][WRN]: Event "change" with ID 1 priority 0 failed (Timeout expired).
LLVMSymbolizer: error reading file: The file was not recognized as a valid object file
[56521][INF]: Processing "notif" "ops" event with ID 1.
[56521][INF]: Successful processing of "notif" event with ID 1 priority 0 (remaining 0 subscribers).
[56521][INF]: Published event "notif" "ops" with ID 2 priority 0 for 1 subscribers.

@michalvasko
Collaborator

Okay, I am really confused by the output you pasted, though it is hard to say whether it is relevant. What exactly am I looking at? Are you running two instances of test_process in parallel? The output does not make sense otherwise, and I do not see any other explanation for

[===========] Running 3 test(s).

being printed twice.

@syyyr
Contributor Author

syyyr commented Oct 8, 2020

The command I use to reproduce the deadlock is this:

while { rm -rf /dev/shm/* test_repositories/; TSAN_OPTIONS="suppressions=tsan.supp" ctest  --output-on-failure --timeout 20 -R test_process; }; do
   :
done

which basically runs test_process over and over until it fails. The "running 3 tests" message being printed twice is VERY weird, because I'm only running one test_process at a time, but that's what it does... I'll try running the loop on my laptop to see whether it appears when run for longer. Also, the error message is sometimes different (it says something about not being able to acquire a lock), but I don't have a copy of it, because which kind of log appears is quite random.

I'll try to find out on which exact line it hangs.

The tsan.supp file is this:

mutex:sr_rwunlock
mutex:sr_shmsub_notify_finish_wrunlock
race:shm_sub.c
race:shm_mod.c
race:shm_main.c
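If exporting TSAN_OPTIONS in every CI job is inconvenient, the same suppressions can, as far as we know, be compiled into the test executable through the weak hooks __tsan_default_suppressions() and __tsan_default_options(), which the compiler-rt TSan runtime looks up at startup. Treat this as a sketch to verify against your toolchain: the hooks have to be defined in the executable itself (not in libsysrepo), TSAN_OPTIONS from the environment is parsed after the defaults and can still override them, and detect_deadlocks=0 turns off deadlock detection for the whole process, which is exactly the downside mentioned earlier in this thread.

```c
#include <assert.h>

/* Picked up by the TSan runtime at startup, in addition to any
 * suppressions file given via TSAN_OPTIONS. */
const char *__tsan_default_suppressions(void)
{
    return "mutex:sr_rwunlock\n"
           "mutex:sr_shmsub_notify_finish_wrunlock\n"
           "race:shm_sub.c\n"
           "race:shm_mod.c\n"
           "race:shm_main.c\n";
}

/* Default runtime flags; note this disables deadlock detection for
 * everything in the process, not just sysrepo. */
const char *__tsan_default_options(void)
{
    return "detect_deadlocks=0";
}
```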

@syyyr
Contributor Author

syyyr commented Oct 8, 2020

Some new notes:

  • The double "running three tests" message goes away when I print it to stderr. That doesn't say much, unfortunately, but there is probably some buffering issue too.
  • The first test can be commented out and the bug still happens.
  • This is the log where the "Locking a mutex failed" error appears: https://gist.github.com/syyyr/565e2e21bedb9d3f11fdc8629504b089
  • Inside test_notif_instid2, there are two sr_apply_changes calls. If I change both of them to have a 2 second timeout and set the wait argument to 1, the issue appears every time, even on my laptop. This is the log: https://gist.github.com/syyyr/ce01e1164db6109f9b8f801d3bc2b5f2 It might be that I'm changing the nature of the test too much, so feel free to say that this is not a valid "reproducer". It does give [ERR] messages, but I'm not sure whether that's a bug or not.
  • When the test hangs, it does so right after the first notification is sent. If I put a printf inside the for loop that sends them, only one message appears.

@syyyr
Contributor Author

syyyr commented Oct 9, 2020

Some more stuff:

  • If I comment out the first two tests, then the issue doesn't appear (the test didn't hang in 200 tries). So it's connected to some shared state left behind by the previous tests. Also, the issue only appears when the rpc_crash test is run. Running rpc_sub and rpc_sub works fine. I would think this issue has something to do with the actual "crashing" of the forked process. Maybe there are some races/deadlocks after the cleanup procedures of the process that "crashes"?

@michalvasko
Collaborator

Okay, I finally managed to take a look at this. I have tried changing test_notif_instid2 the way you described, with no luck; it worked fine for me. Also, all the logs you are providing are quite useless, because the order of the messages is completely haphazard. Because of that, I cannot be sure that you are running the test correctly. The nature of this particular test is quite specific, especially the one testing "a crash".

@syyyr
Contributor Author

syyyr commented Oct 13, 2020

These are the steps I took:
0) Start with a clean repo.
1) Apply this patch:
diff --git a/tests/test_process.c b/tests/test_process.c
index b4fb0cbf..810dbc39 100644
--- a/tests/test_process.c
+++ b/tests/test_process.c
@@ -444,14 +444,14 @@ test_notif_instid2(int rp, int wp)
         sr_assert_int_equal(ret, SR_ERR_OK);
         ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/description", "desc", NULL, 0);
         sr_assert_int_equal(ret, SR_ERR_OK);
-        ret = sr_apply_changes(sess, 0, 0);
+        ret = sr_apply_changes(sess, 5000, 1);
         sr_assert_int_equal(ret, SR_ERR_OK);
 
         ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/enabled", "true", NULL, 0);
         sr_assert_int_equal(ret, SR_ERR_OK);
         ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/description", "desc2", NULL, 0);
         sr_assert_int_equal(ret, SR_ERR_OK);
-        ret = sr_apply_changes(sess, 0, 0);
+        ret = sr_apply_changes(sess, 5000, 1);
         sr_assert_int_equal(ret, SR_ERR_OK);
     }
 
@@ -463,7 +463,7 @@ int
 main(void)
 {
     struct test tests[] = {
-        {"rpc sub", test_rpc_sub, test_rpc_sub, setup, teardown},
+        /* {"rpc sub", test_rpc_sub, test_rpc_sub, setup, teardown}, */
         {"rpc crash", test_rpc_crash1, test_rpc_crash2, setup, teardown},
         {"notif instid", test_notif_instid1, test_notif_instid2, setup, teardown},
     };
2) Compile sysrepo with ThreadSanitizer enabled.
3) Go into the build directory.
4) Run the test with:
ctest -R test_process --output-on-failure -V
5) The test hangs indefinitely. I've tried this a few times and SOMETIMES it doesn't hang on the first try, but most of the time it does, and it is reproducible on my laptop. Hopefully it will hang for you too.

Thank you for looking at this. Sorry that it's such a weird thing :D

@michalvasko
Collaborator

Like I said, this will not be the way for me to fix it.

Applied your patch, configured with cmake -DCMAKE_C_FLAGS=-fsanitize=thread .., using your suppression file and running

while { make test_clean; TSAN_OPTIONS="suppressions=tsan.supp" ctest  --output-on-failure --timeout 20 -R test_process; }; do
   :
done

with output into the file out.txt.

jktjkt pushed a commit to CESNET/CzechLight-dependencies that referenced this issue Oct 24, 2020
Major changes in building:

- sysrepo no longer needs libredblack.

- sysrepo tests no longer operate on the global datastore. This makes
the build simpler (it is no longer needed to build sysrepo twice) and
also enables parallelization.

- new sysrepo needs some new TSan suppressions, because of how it's
implemented. More details here:
sysrepo/sysrepo#2123

- Netopeer2 now no longer has tests.

- libnetconf2 uses a custom patch so that it ignores what kind of libssh
version is installed. Upstream probably won't merge this patch (and
that's fine). When libssh 0.9.5 comes, we can switch back to upstream
again.

Change-Id: Ibb6932f11bed10eddb173bec0459c33e85072f02
jktjkt pushed a commit to CESNET/netconf-cli that referenced this issue Oct 24, 2020
Changes from old sysrepo:

- sysrepod and sysrepo-plugind are no longer required, so references to
those were removed.

- New Netopeer now uses NACM - the tests needed to be changed so that
they disable NACM.

- Some TSan suppressions needed to be added, because of
sysrepo/sysrepo#2123

- sysrepo now provides easy access to a libyang context with all
modules, which means we no longer have to manually fetch them to fill out
YangSchema

- sysrepo now uses a different datastore model
(https://tools.ietf.org/html/rfc8342). This changes the way sysrepo
behaves in the datastore tests; most notably, the running config is no
longer reset when closing subscriptions. Because of that, the running
config is always reset at the start of a test. Also, getItems had to be
changed to use the "operational" datastore so that state data is still
fetched.

- sysrepo is now more parallelized and uses non-blocking mechanisms to
defer some actions. The most notable example is committing changes (the
new function is called `apply_changes()`). It is still possible to mimic
the old behavior, because all the "defer-able" functions have a `wait`
argument.

- As sysrepo now uses libyang internally, data can be fetched as libyang
nodes. I replaced `sr_val_t` (and friends) with this mechanism. This
also unifies some stuff in datastore_access test (mainly the containers,
that had to be #ifdef'd).

- Inside RPC callbacks, libyang context is available. Use this context
to do libyang stuff, instead of injecting a YangSchema.

Depends-on: https://cesnet-gerrit-czechlight/c/CzechLight/dependencies/+/2884
Depends-on: https://cesnet-gerrit-public/c/CzechLight/dependencies/+/2884
Depends-on: https://gerrit.cesnet.cz/c/CzechLight/dependencies/+/2884
Change-Id: Iaf4281bc3bd6cda64ab7d8727c28b9b9d132050a
michalvasko closed this as not planned on May 27, 2022.