-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests don't pass with ThreadSanitizer #2123
Comments
Hi, |
I wonder if moving mutexes is portable, and I wasn't to find a definitive answer. Jonathan Wakely of the GCC/glibc fame said that Moving locked mutexes is documented to be forbidden on IBM i, which claims to be POSIX-compliant. On Linux, I think that it is not possible to move robust mutexes ( How high is that on the TODO list, and is it "doable" or would it require reworking half of the SHM code? TSAN has helped us in cla-sysrepo in the past many, many times, and disabling its features is a rather high cost for our product. |
Firstly, the theoretical answer why Regarding robust mutexes, the plan was to make robust only the "recovery" locks, which could be put into a static SHM segment. However, that was just the initial idea that may turn out to be wrong for whatever reason. TLDR; I would have to delve into the overall SHM usage in sysrepo once again to be able to come up with a plausible design that would 1) solve all the recovery problems and 2) optionally put all the mutexes into separate static SHM segments (that do not change their size/are not remapped for the lifetime of a process). I do not know when I will be able to do that. |
Thanks, we'll be happy to test this once you have patches. In the meanwhile, @syyyr put together these sanitizer suppressions to keep TSAN working for our code. |
Some more info:
Log:
|
Hi, |
Hi, log from one of the failed runs:
|
Okay, I am really confused by the output you pasted even though it is hard to say whether it is relevant. So what exactly am I looking at? Running 2
being printed twice. |
The command I use to reproduce the deadlock is this: while { rm -rf /dev/shm/* test_repositories/; TSAN_OPTIONS="suppressions=tsan.supp" ctest --output-on-failure --timeout 20 -R test_process; }; do
:
done which basically just runs I'll try to find out on which exact line it hangs. The tsan.supp file is this:
|
Some new notes:
|
Some more stuff:
|
Okay, I finally managed to take a look at this. I have tried changing |
These are the steps I took:
diff --git a/tests/test_process.c b/tests/test_process.c
index b4fb0cbf..810dbc39 100644
--- a/tests/test_process.c
+++ b/tests/test_process.c
@@ -444,14 +444,14 @@ test_notif_instid2(int rp, int wp)
sr_assert_int_equal(ret, SR_ERR_OK);
ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/description", "desc", NULL, 0);
sr_assert_int_equal(ret, SR_ERR_OK);
- ret = sr_apply_changes(sess, 0, 0);
+ ret = sr_apply_changes(sess, 5000, 1);
sr_assert_int_equal(ret, SR_ERR_OK);
ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/enabled", "true", NULL, 0);
sr_assert_int_equal(ret, SR_ERR_OK);
ret = sr_set_item_str(sess, "/ietf-interfaces:interfaces/interface[name='eth0']/description", "desc2", NULL, 0);
sr_assert_int_equal(ret, SR_ERR_OK);
- ret = sr_apply_changes(sess, 0, 0);
+ ret = sr_apply_changes(sess, 5000, 1);
sr_assert_int_equal(ret, SR_ERR_OK);
}
@@ -463,7 +463,7 @@ int
main(void)
{
struct test tests[] = {
- {"rpc sub", test_rpc_sub, test_rpc_sub, setup, teardown},
+ /* {"rpc sub", test_rpc_sub, test_rpc_sub, setup, teardown}, */
{"rpc crash", test_rpc_crash1, test_rpc_crash2, setup, teardown},
{"notif instid", test_notif_instid1, test_notif_instid2, setup, teardown},
};
Thank you for looking at this. Sorry, that it's such a weird thing :D |
Like I said, this will not be the way for me to fix it. Applied your patch, configured with while { make test_clean; TSAN_OPTIONS="suppressions=tsan.supp" ctest --output-on-failure --timeout 20 -R test_process; }; do
:
done with output into the file out.txt. |
Major changes in building: - sysrepo no longer needs libredblack. - sysrepo tests no longer operate on the global datastore. This makes the build simpler (it is no longer needed to build sysrepo twice) and also enables parallelization. - new sysrepo needs some new TSan suppressions, because of how it's implemented. More details here: sysrepo/sysrepo#2123 - Netopeer2 now no longer has tests. - libnetconf2 uses a custom patch so that it ignores what kind of libssh version is installed. Upstream probably won't merge this patch (and that's fine). When libssh 0.9.5 comes, we can switch back to upstream again. Change-Id: Ibb6932f11bed10eddb173bec0459c33e85072f02
Changes from old sysrepo: - sysrepod and sysrepo-plugind are no longer required, so references to those were removed. - New Netopeer now uses NACM - the tests neeeded to be changed, so that they disable NACM. - Some TSan suppressions needed to be added, because of sysrepo/sysrepo#2123 - sysrepo now provides easy access to a libyang context with all modules, which means we no longer have manually fetch them to fill out YangSchema - sysrepo now uses a different datastore model (https://tools.ietf.org/html/rfc8342). This changes the way sysrepo behaves in the datastore tests, especially that running config is no longer reset, when closing subscriptions. Because of that, the running config is always reset at the start of a test. Also, getItems had to be changed to use the "operational" datastore so that state data is still fetched. - sysrepo is now more parallelized and uses non-blocking mechanisms to defer some actions. The most notable example is committing changes (the new function is called `apply_changes()`). It is still possible to mimic the old behavior, because all the "defer-able" functions have a `wait` argument. - As sysrepo now uses libyang internally, data can be fetched as libyang nodes. I replaced the `sr_val_t` (and friends) to this mechanism. This also unifies some stuff in datastore_access test (mainly the containers, that had to be #ifdef'd). - Inside RPC callbacks, libyang context is available. Use this context to do libyang stuff, instead of injecting a YangSchema. Depends-on: https://cesnet-gerrit-czechlight/c/CzechLight/dependencies/+/2884 Depends-on: https://cesnet-gerrit-public/c/CzechLight/dependencies/+/2884 Depends-on: https://gerrit.cesnet.cz/c/CzechLight/dependencies/+/2884 Change-Id: Iaf4281bc3bd6cda64ab7d8727c28b9b9d132050a
Hi,
I tried running the tests with ThreadSanitizer and some of them get some warnings. It seems that most (if not all) of them fail on this kind of error (there is maybe some variation, but all of them are "unlock of an unlocked mutex"):
The entire log is very long so I uploaded it here https://gist.github.com/syyyr/3f3640cbfbb4daa4617d18f5d5b2dfcd. Would it be possible to fix this? We'd like to run the tests as part of our CI.
Thanks
The text was updated successfully, but these errors were encountered: