changelog: A brick process is getting crash due to SIGSEGV in changelog #3522
/run regression |
What about changelog_rpc_clnt_unref(), which calls list_del(&crpc->list)? (I must say there are so many locks, some of them taken while holding others, that it's hard to work out the lock order and verify correctness there!) |
changelog_rpc_clnt_unref calls list_del only when the crpc ref count is 0, which means no one is using the crpc object, so we don't need to call |
IIUC, a brick process crashes while accessing crpc->list, so to make it thread safe I have used crpc->lock wherever |
Can't someone ref it in between? Looks racy to me, but I don't know the code well enough. |
That should not happen; otherwise it is a bug. The last reference is dropped only when a client wants to destroy the rpc connection. |
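For readers following the "can someone ref it in between?" question: one common way to close that window is to take the new reference and drop the last one under the same lock that guards the list, so an object is unlinked in the same critical section in which its count reaches zero. The sketch below illustrates that general pattern only. It is not the changelog xlator code, and `struct registry`, `registry_add`, `registry_get`, `entry_put` and `refcount_demo` are hypothetical names introduced for the example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical refcounted object kept on a singly linked list. */
struct entry {
    struct entry *next;
    int id;
    int ref; /* protected by registry.lock */
};

struct registry {
    pthread_mutex_t lock; /* guards head and every entry's ref */
    struct entry *head;
};

/* Create an entry with one reference and link it into the list. */
static struct entry *registry_add(struct registry *r, int id)
{
    struct entry *e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->id = id;
    e->ref = 1;
    pthread_mutex_lock(&r->lock);
    e->next = r->head;
    r->head = e;
    pthread_mutex_unlock(&r->lock);
    return e;
}

/* Look up by id and take a reference while still holding the lock,
 * so the entry cannot be unlinked between the lookup and the ref. */
static struct entry *registry_get(struct registry *r, int id)
{
    struct entry *e;
    pthread_mutex_lock(&r->lock);
    for (e = r->head; e; e = e->next) {
        if (e->id == id) {
            e->ref++;
            break;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return e;
}

/* Drop a reference. If it was the last one, unlink under the same lock
 * and free afterwards. Returns 1 if the entry was freed, 0 otherwise. */
static int entry_put(struct registry *r, struct entry *e)
{
    int gone = 0;
    pthread_mutex_lock(&r->lock);
    if (--e->ref == 0) {
        struct entry **pp = &r->head;
        while (*pp && *pp != e)
            pp = &(*pp)->next;
        if (*pp)
            *pp = e->next;
        gone = 1;
    }
    pthread_mutex_unlock(&r->lock);
    if (gone)
        free(e);
    return gone;
}

/* Small walk-through; returns 0 when every step behaves as described. */
static int refcount_demo(void)
{
    struct registry r;
    struct entry *e, *again;
    int ok;

    pthread_mutex_init(&r.lock, NULL);
    r.head = NULL;

    e = registry_add(&r, 7);             /* ref == 1 */
    again = e ? registry_get(&r, 7) : NULL; /* ref == 2 */
    ok = e && again == e
         && entry_put(&r, e) == 0        /* ref == 1, still linked */
         && entry_put(&r, again) == 1    /* last ref: unlinked and freed */
         && registry_get(&r, 7) == NULL; /* no longer findable */

    pthread_mutex_destroy(&r.lock);
    return ok ? 0 : -1;
}
```

Calling `refcount_demo()` from a test program (built with pthread support) exercises the add, get, and put paths. Because lookup and unlink share one lock, a reader either finds the entry and bumps its count before the final put, or does not find it at all.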
/run regression |
A brick process crashes while the glusterfind tool is in use. glusterfind uses the changelog xlator, and the xlator has a race condition in how it handles the crpc object list, so the brick crashes when the ev_connector thread runs.

Solution: The xlator does not use the correct lock to synchronize the list in the crpc object, so use crpc->lock to protect crpc->list.

Fixes: gluster#3521
Change-Id: I13ec8603dc06ecba4cd293cb48012a2ebef55749
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
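The idea in the commit message, that every access to a shared list goes through the one lock that belongs to it, can be sketched in plain C with pthreads. This is a minimal illustration under stated assumptions, not the actual patch: `struct conn`, `conn_add`, `conn_del`, `worker` and `run_demo` are hypothetical names, and the real xlator uses its own list and lock helpers.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; stands in for an object linked on crpc->list. */
struct node {
    struct node *prev, *next;
};

/* Hypothetical owner of the list. `lock` is the only lock guarding `list`. */
struct conn {
    pthread_mutex_t lock;
    struct node list; /* circular list head */
    int count;        /* number of linked nodes, also guarded by lock */
};

static void conn_init(struct conn *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->list.prev = c->list.next = &c->list;
    c->count = 0;
}

/* Link a node at the head of the list while holding the list's lock. */
static void conn_add(struct conn *c, struct node *n)
{
    pthread_mutex_lock(&c->lock);
    n->next = c->list.next;
    n->prev = &c->list;
    c->list.next->prev = n;
    c->list.next = n;
    c->count++;
    pthread_mutex_unlock(&c->lock);
}

/* Unlink a node while holding the same lock used by conn_add. */
static void conn_del(struct conn *c, struct node *n)
{
    pthread_mutex_lock(&c->lock);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
    c->count--;
    pthread_mutex_unlock(&c->lock);
}

#define N_THREADS 4
#define N_ITERS 10000

static struct conn g_conn;

/* Each thread repeatedly links and unlinks its own node. */
static void *worker(void *arg)
{
    struct node n;
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        conn_add(&g_conn, &n);
        conn_del(&g_conn, &n);
    }
    return NULL;
}

/* Returns 0 if the list is empty and intact after all threads finish. */
static int run_demo(void)
{
    pthread_t t[N_THREADS];
    int ok;

    conn_init(&g_conn);
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);

    ok = g_conn.count == 0
         && g_conn.list.next == &g_conn.list
         && g_conn.list.prev == &g_conn.list;
    pthread_mutex_destroy(&g_conn.lock);
    return ok ? 0 : -1;
}
```

If `conn_add` and `conn_del` took different locks, or one of them took none, two threads could rewrite the same `prev`/`next` pointers at once and leave the list corrupted, which is the kind of failure that typically surfaces as a SIGSEGV on a later traversal.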
/run regression |
@aravindavk Can you please review it? |
/run brick-mux regression |
7 test(s) failed, 6 test(s) generated core, 18 test(s) needed retry, 4 flaky test(s) marked as success even though they failed |
The brick_mux crash is not specific to this patch so we can ignore it now. |
/run regression |
Are you sure the crashes are not related? The latest regression has also had several crashes. |
Yes, I am sure: there is no crash for centos-regression (https://build.gluster.org/job/gh_centos7-regression/2525/console). |
I will run a brick_mux regression without applying the patch and share the link. |
Below is the link; as we can see, multiple test cases crash under brick_mux. |
Any idea why? brick-mux should be stable. |
The brick_mux regression is crashing because of this change (2e825b3); we have to revert it. It is not actually a leak: the vol_opt object is added to xlator->volume_options while auth_options is validated, which is why the list is not deleted in the gf_auth_fini function. The object is deleted when xlator->volume_options is cleaned up during xlator cleanup. |