Segfault during SGX-LKL shutdown #733
Perhaps of interest: the draft PR that sets PROT_NONE when unmapping memory causes a large number of core and LTP tests to fail on shutdown. Might be related: #720 https://dev.azure.com/sgx-lkl/sgx-lkl/_build/results?buildId=1705&view=results
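One plausible reason such a change surfaces new segfaults is that it turns silent use-after-unmap accesses into hard faults. The following is a minimal, self-contained sketch of that effect on a host (SW-mode-like) process. It assumes that "unmapping" keeps the range reserved and only revokes access with mprotect(PROT_NONE). The helper names read_faults and demo_prot_none are hypothetical and are not taken from SGX-LKL or the draft PR.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back out of the faulting read instead of letting the process crash. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if reading *p raises SIGSEGV or SIGBUS, 0 if the read succeeds. */
static int read_faults(volatile const char *p) {
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int faulted = 0;
    if (sigsetjmp(fault_env, 1) == 0) {
        (void)*p;      /* the access under test (volatile, so not elided) */
    } else {
        faulted = 1;   /* reached via siglongjmp from on_fault */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}

/* Hypothetical use-after-unmap: the page stays reserved but becomes
 * PROT_NONE, so a stale pointer into it now faults instead of reading
 * leftover data. Returns 0 on success, -1 if setup fails. */
static int demo_prot_none(int *before, int *after) {
    long page = sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 'x';
    *before = read_faults(buf);   /* still accessible: no fault */

    if (mprotect(buf, (size_t)page, PROT_NONE) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }
    *after = read_faults(buf);    /* access revoked: the read faults */

    munmap(buf, (size_t)page);
    return 0;
}
```

Without PROT_NONE, a shutdown path that still reads a freed region would usually just see stale bytes. With it, the same bug becomes a deterministic segfault, which is consistent with (though not proof of) the failures only showing up once the draft PR is applied.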
Interesting. All of the segfaults for #720 happen in SW mode. Do you have a gdb stack trace of the segfaults in #720? It may be easy to fix.
Here's one:
I'm checking to see if other tests look similar.
@prp I checked the stack traces for 8 different failing LTP tests. They are all the same.
@prp, my guess from looking briefly at the code is that we have this in console_task:
and I suspect that after the shutdown check passes, we start shutting down while we are still trying to use the queue. But I don't know this part of the code at all, so that is pure conjecture at this point. I believe you are more familiar with the virtio code and its interactions. Do you have any insights?
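If the conjecture is right, this is a classic check-then-use race: the task tests a shutdown flag, then uses the queue, and shutdown can tear the queue down in between. Below is a minimal pthreads sketch of that pattern and one common fix, which performs the check and the use under the same lock that teardown takes. All names here (struct queue, console_q, console_task_step, begin_shutdown, console_init) are hypothetical stand-ins and do not reflect SGX-LKL's actual virtio console code or whatever fix was eventually merged.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the virtio console queue. */
struct queue {
    int pending;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool shutting_down = false;
static struct queue *console_q = NULL;

/* Racy pattern (for contrast): the flag is read without the lock, so
 * shutdown may free console_q between the check and the use.
 *
 *     if (!shutting_down)
 *         console_q->pending--;   // may touch freed or unmapped memory
 */

static int console_init(int pending) {
    console_q = malloc(sizeof *console_q);
    if (console_q == NULL)
        return -1;
    console_q->pending = pending;
    return 0;
}

/* One iteration of a hypothetical console task. Returns false once
 * shutdown has started, true otherwise. */
static bool console_task_step(void) {
    pthread_mutex_lock(&lock);
    if (shutting_down) {
        pthread_mutex_unlock(&lock);
        return false;
    }
    /* console_q cannot be freed while this thread holds the lock. */
    if (console_q->pending > 0)
        console_q->pending--;
    pthread_mutex_unlock(&lock);
    return true;
}

/* Sets the flag and frees the queue under the same lock, so no task can
 * be between its check and its use when the memory goes away. */
static void begin_shutdown(void) {
    pthread_mutex_lock(&lock);
    shutting_down = true;
    free(console_q);
    console_q = NULL;
    pthread_mutex_unlock(&lock);
}

static void *console_task(void *arg) {
    (void)arg;
    while (console_task_step())
        sched_yield();   /* give begin_shutdown a chance to take the lock */
    return NULL;
}
```

Holding one lock across every queue access is the simplest way to close the window, but it serializes the task against shutdown. Alternatives include reference-counting the queue, or having shutdown signal and join the console task before freeing anything it touches.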
I think you're right. By the way, this appears to be a duplicate of #158.
Here's another that might be related: https://dev.azure.com/sgx-lkl/ff25f828-9f87-48e4-94f0-7449609f7e8f/_apis/build/builds/1723/logs/148 during 'report/TEST-basic-global_vars_test-(nonrelease)-(run-hw)-(8-ethreads)-junit.xml'
This may get fixed by PR #770. |
I observed the following failure during a CI run of kernel-syscalls-mmap-mmap11-(nonrelease)-(run-hw)-(8-ethreads):
This happens to one of the ethreads after 6 of the 8 ethreads have already exited the enclave. Note that the signal handler can no longer retrieve the tid.
I suspect that the test itself is irrelevant and that this is a rare race condition or memory corruption during shutdown. It would be good to reproduce this with a DEBUG build and get a stack trace. It seems different from the other issues we have seen.