QAT Engine segfault issue in multi-threaded use #110
Hi dhkim88
Hi paulturx
Hi @dhkim88, I have been thinking about the issue you raised. Interestingly, if you look in the file qat_callback.c, you'll see the callback that is used for the asymmetric crypto operations (qat_crypto_callbackFn()). It has been written in the following way:
It is purposely written this way so that opDone->job is checked before the flag is set. This means that in the synchronous use case the race condition does not happen, and in the asynchronous use case it cannot happen anyway, so it does not matter that job is used in the qat_wake_job() call. Would this be acceptable to you? Kind regards, Steve.
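A minimal C sketch of the ordering described above, assuming a simplified op_done structure that holds only a completion flag and a job pointer. The names op_done_sketch_t, callback_sketch(), wake_job_sketch() and sync_request_sketch() are hypothetical stand-ins, not the engine's actual code. In the synchronous case the flag store is the callback's last access to op_done, so the waiting thread may return as soon as it sees the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the engine's op_done structure. */
typedef struct {
    atomic_int flag; /* set to 1 by the callback when the request completes */
    void *job;       /* non-NULL only when the request came from an async job */
} op_done_sketch_t;

/* Counts calls to the hypothetical wake function (stand-in for qat_wake_job()). */
static atomic_int wake_count;

static void wake_job_sketch(void *job)
{
    assert(job != NULL);
    atomic_fetch_add(&wake_count, 1);
}

/* Polling-thread callback: op_done->job is tested before the flag is set. */
static void callback_sketch(op_done_sketch_t *op_done)
{
    if (op_done->job != NULL) {
        /* Asynchronous case: assumed safe because the paused job keeps
         * op_done alive until it is woken. */
        atomic_store(&op_done->flag, 1);
        wake_job_sketch(op_done->job);
    } else {
        /* Synchronous case: this store is the last access to op_done. */
        atomic_store(&op_done->flag, 1);
    }
}

static void *polling_thread_sketch(void *arg)
{
    callback_sketch(arg);
    return NULL;
}

/* Synchronous request: op_done lives on this thread's stack and the thread
 * busy-waits for the flag. Returns the flag value it observed, or -1. */
static int sync_request_sketch(void)
{
    op_done_sketch_t op_done;
    atomic_init(&op_done.flag, 0);
    op_done.job = NULL;

    pthread_t poller;
    if (pthread_create(&poller, NULL, polling_thread_sketch, &op_done) != 0)
        return -1;
    while (atomic_load(&op_done.flag) == 0)
        ; /* spin until the callback completes the request */
    int seen = atomic_load(&op_done.flag);
    pthread_join(poller, NULL); /* demo cleanup only */
    return seen;
}
```

The join at the end exists only so the demo exits cleanly; the point is that the synchronous branch never reads op_done after publishing the flag.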
Hi Steve, thanks for your update.
Packet-handling thread:
Polling thread:
Please just follow the numbered sequence.
Packet-handling thread side:
Polling thread side:
There seem to be some similar synchronization issues in asynchronous mode. Thank you.
Hi @dhkim88, I agree that in the failure case you mention, the asynchronous path effectively becomes like the synchronous path and spins waiting for the flag to be set. In that case the race condition could occur. The simplest fix would be to refactor the callback to use a local variable that caches the value of job. The opdone structure could then still be cleaned up before the callback has finished if the thread is interrupted, but this would have no ill effect, because nothing in the callback would still be using it. An example of the fix would be:
It would also need to be fixed in the same way in qat_crypto_callbackFn(). Kind regards, Steve.
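A minimal C sketch of the fix described above, under the same simplifying assumptions. Every name here (op_done_sketch_t, callback_fixed_sketch(), wake_job_sketch(), spin_with_job_sketch(), dummy_job) is hypothetical and does not come from the engine. The callback copies job into a local variable before it publishes the flag, so nothing reads op_done after the waiting thread may have released it. The demo covers the case discussed here, where a job pointer is set but the caller still spins on the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the engine's op_done structure. */
typedef struct {
    atomic_int flag; /* set to 1 by the callback when the request completes */
    void *job;       /* async job pointer, or NULL */
} op_done_sketch_t;

static atomic_int wake_count; /* calls to the hypothetical wake function */
static void *last_woken_job;  /* job pointer the wake function received */

/* Hypothetical stand-in for qat_wake_job(). */
static void wake_job_sketch(void *job)
{
    assert(job != NULL);
    last_woken_job = job;
    atomic_fetch_add(&wake_count, 1);
}

/* Fixed callback: cache job first; op_done is not touched after the flag. */
static void callback_fixed_sketch(op_done_sketch_t *op_done)
{
    void *job = op_done->job;        /* read before publishing completion */
    atomic_store(&op_done->flag, 1); /* op_done may be released after this */
    if (job != NULL)
        wake_job_sketch(job);        /* uses only the local copy */
}

static void *polling_thread_sketch(void *arg)
{
    callback_fixed_sketch(arg);
    return NULL;
}

static int dummy_job; /* hypothetical job object; only its address is used */

/* A job pointer is set, but the caller spins on the flag and could return
 * (releasing op_done) as soon as it sees it. Returns 0 on success. */
static int spin_with_job_sketch(void)
{
    op_done_sketch_t op_done;
    atomic_init(&op_done.flag, 0);
    op_done.job = &dummy_job;

    pthread_t poller;
    if (pthread_create(&poller, NULL, polling_thread_sketch, &op_done) != 0)
        return -1;
    while (atomic_load(&op_done.flag) == 0)
        ; /* spin, as the degraded asynchronous path does */
    pthread_join(poller, NULL); /* demo cleanup only */
    return 0;
}
```

Because the flag store is the last access to op_done in every branch, this one ordering covers both the synchronous and the degraded asynchronous case without relying on which path the caller took.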
Hi Steve, I agree with your suggestion. Thank you for your consideration. I have another question: can I get performance comparison reports for multi-threaded versus multi-process use? Could you tell me roughly? Thank you.
Hi,
I was checking the maximum performance of the QAT Engine using IXIA in a multi-threaded environment, and I found a problem that caused a segfault. While debugging, I found the source code responsible for the problem.
gdb trace
#0 0x00007f3fe71fed20 in ASYNC_get_wait_ctx () from /secui/lib/QAT/libcrypto.so.1.1
#1 0x00007f3fd8aa042d in qat_wake_job () from /secui/lib/QAT/lib/engines-1.1/qat.so
#2 0x00007f3fd8aa489c in qat_chained_callbackFn () from /secui/lib/QAT/lib/engines-1.1/qat.so
#3 0x00007f3fd8800485 in LacSymCb_ProcessCallback () from /secui/lib/QAT/libqat_s.so
#4 0x00007f3fd8816532 in adf_user_notify_msgs_poll () from /secui/lib/QAT/libqat_s.so
#5 0x00007f3fd88126ac in adf_pollRing () from /secui/lib/QAT/libqat_s.so
#6 0x00007f3fd88129da in icp_adf_pollInstance () from /secui/lib/QAT/libqat_s.so
#7 0x00007f3fd880d436 in icp_sal_CyPollInstance () from /secui/lib/QAT/libqat_s.so
#8 0x00007f3fd8aa08e9 in timer_poll_func () from /secui/lib/QAT/lib/engines-1.1/qat.so
#9 0x00007f3fe22ee50b in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3fe0b0038f in clone () from /lib64/libc.so.6
The code below is in the qat_chained_callbackFn() function.
I don't use asynchronous mode, but execution seems to enter the asynchronous path.
This seems to be a synchronization problem between the polling thread and the packet-handling thread in the QAT Engine. The opdone variable used by the polling thread is a local variable declared in the packet-handling thread. When the callback function runs, if it sets opdone->opDone.flag = 1 and is then interrupted, the packet-handling thread can see the flag, return, and release opdone before the callback has finished using it. This problem may also occur in async mode.
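The interleaving described above can be reproduced deterministically in a minimal C sketch. To stay free of undefined behaviour, the waiting thread's stack frame is simulated by a static object that is overwritten rather than freed, and the unlucky interruption is forced with an extra flag. All names (op_done_sketch_t, frame, frame_reused, REUSED_GARBAGE, run_race_demo()) are hypothetical. The callback sets the flag first and reads job afterwards, so it picks up whatever now occupies the old frame. In the real engine a stale job pointer like this would then be dereferenced, which would be consistent with the crash in ASYNC_get_wait_ctx() shown in the trace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the engine's op_done structure. */
typedef struct {
    atomic_int flag; /* set to 1 by the callback when the request completes */
    void *job;       /* async job pointer, or NULL */
} op_done_sketch_t;

/* Stands in for the packet-handling thread's stack frame. It stays valid
 * here; "returning" is simulated by overwriting it, as stack reuse would. */
static op_done_sketch_t frame;
static atomic_int frame_reused;    /* forces the unlucky interruption */
static void *job_seen_by_callback; /* what the buggy callback read */

static int real_job; /* hypothetical job object; only its address is used */
#define REUSED_GARBAGE ((void *)(uintptr_t)0xdeadbeef)

/* Buggy order: publish the flag first, read op_done->job afterwards. */
static void *buggy_callback_thread(void *arg)
{
    op_done_sketch_t *op_done = arg;
    atomic_store(&op_done->flag, 1);
    while (atomic_load(&frame_reused) == 0)
        ; /* "interrupted" until the waiter has moved on */
    job_seen_by_callback = op_done->job; /* stale read */
    return NULL;
}

/* Waiting (packet-handling) thread side. Returns the job pointer the
 * callback read, or NULL if the polling thread could not be created. */
static void *run_race_demo(void)
{
    atomic_init(&frame.flag, 0);
    atomic_init(&frame_reused, 0);
    frame.job = &real_job;

    pthread_t poller;
    if (pthread_create(&poller, NULL, buggy_callback_thread, &frame) != 0)
        return NULL;
    while (atomic_load(&frame.flag) == 0)
        ; /* synchronous-style spin on the flag */
    frame.job = REUSED_GARBAGE; /* the frame is reused after "returning" */
    atomic_store(&frame_reused, 1);
    pthread_join(poller, NULL);
    return job_seen_by_callback;
}
```

Reading job into a local variable before the flag store would make the callback see the original job pointer regardless of when the waiter returns.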
I did some testing after adding the following defensive code and confirmed that the problem no longer occurred.
In the qat_chained_callbackFn() function:
In the qat_chained_ciphers_do_cipher() function:
Please check this.
Thank you.