Fix for lock order inversion between uri_map_mutex and conn_mutex #185
Conversation
- removing the connection mutex lock in sendToOne() to remove the order dependency
- protecting the reference vectors using a new mutex instead of uri_map_mutex

Signed-off-by: Gautam Venkataramanan <gautam.chennai@gmail.com>
Adding more details on the locks with the ordering:

T14 (policy processing/deserializing thread):
  B) Mutex M1775 acquired here while holding mutex M1599 in thread T14
  A) Mutex M1599 previously acquired by the same thread here
     <-- called to give the policy update to each agent connection

T16 (thread to update policies for every agent connection):
  B) Mutex M1599 (conn_mutex) acquired here while holding mutex M1775 in thread T16:
     185   void OpflexListener::sendToOne(OpflexServerConnection* conn, OpflexMessage* message) {
  A) Mutex M1775 (uri_map_mutex) previously acquired by the same thread here:
     233   void OpflexServerConnection::on_policy_update_async(uv_async_t* handle) {

Thread T14 (tid=7642, running) created by main thread
Thread T16 (tid=7644, running) created by thread T14
@@ -184,7 +184,6 @@ void OpflexListener::sendToAll(OpflexMessage* message) {

 void OpflexListener::sendToOne(OpflexServerConnection* conn, OpflexMessage* message) {
     std::unique_ptr<OpflexMessage> messagep(message);
-    const std::lock_guard<std::recursive_mutex> lock(conn_mutex);
This needs a lock when we have multiple agents.
Just checked the code; it needs the lock even with a single agent, since we could be closing the conn while sendToOne is in progress (similar to sendToAll). I roughly follow what TSAN is complaining about. Will see how to fix it and get back.
Discussing this with Madhu. If we need to protect access to conn, then we need to take the lock outside sendToOne(), where conn continues to be accessed - in OpflexServerConnection::on_policy_update_async(). I will also enforce the lock order of conn_mutex before ref_vec_mutex to address the deadlock. However, the critical section will be longer here. We are going to discuss further and see how to address this cleanly. Currently we are passing connection pointers in the handle; maybe we should pass a conn ID instead, check that it is still a valid connection on every access, and thereby reduce the critical section. Will raise an issue to track the cleanup and the large critical section.
@@ -234,7 +234,7 @@ void OpflexServerConnection::on_policy_update_async(uv_async_t* handle) {
     OpflexServerConnection* conn = (OpflexServerConnection *)handle->data;
     GbpOpflexServerImpl* server = dynamic_cast<GbpOpflexServerImpl*>
         (conn->listener->getHandlerFactory());
-    std::lock_guard<std::mutex> lock(conn->uri_map_mutex);
+    std::lock_guard<std::mutex> lock(conn->ref_vec_mutex);
This is fine, but I am not sure why a single lock is not good enough. Is it due to your other commit?
This just avoids the inconsistency of using uri_map_mutex for the reference vector. It was changed from uri_update_mutex (original commit) to uri_map_mutex in 3351d64, so this change just restores the original behavior. I used ref_vec_mutex instead of uri_update_mutex for clarity. This won't address the lock ordering issue, though.
- I am enforcing the ordering now: conn_mutex is obtained before ref_vec_mutex. This takes care of the deadlock.
- The critical section for conn_mutex is larger. We will discuss further and optimize the critical section, and maybe move to a connID-->connPtr map for more safety.

Signed-off-by: Gautam Venkataramanan <gautam.chennai@gmail.com>
…ar to OpflexServerConnection::on_policy_update_async()

Signed-off-by: Gautam Venkataramanan <gautam.chennai@gmail.com>
Opflex-cni-test passed as well.
Signed-off-by: Gautam Venkataramanan gautam.chennai@gmail.com