-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for listing tasks to gdb in gdbstub (IDFGH-729) #2828
Conversation
components/esp32/gdbstub.c
Outdated
if (!taskCount) { | ||
unsigned runTime = 0; | ||
taskCount = uxTaskGetNumberOfTasks(); | ||
taskCount = uxTaskGetSystemState(tasks, 32, &runTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of the task lists are corrupted, is it possible that this causes an exception? It would be desirable for GDB stub to not crash, even if FreeRTOS state is invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's almost impossible to ensure the state of FreeRTOS task's chained list (unless validating each pointer in FreeRTOS but this would be a huge cost for a very sparsely used feature).
I wonder if it would be better to let it crash instead and:
- Set a boolean to true the first time we are walking the FreeRTOS task list,
- If we re-enter the exception handler, check the boolean. If it's true, simply return the previous exception since it's saved from the first entry. It would then appear in gdb as a single (the running) task and the previous/initial exception frame.
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing this idea right now, and I'll update the PR when I'm done with this. I've also added a conditional compilation of the FreeRTOS part of the gdbstub (based on the presence of uxTaskGetSystemState
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igrr, @projectgus
uxTaskGetSystemState
tries to get xTaskQueueMutex
. If it is busy at the moment of crash (taken by the other CPU or corrupted in some special way) it can lead to freezing when that API is called. May be it is better to use uxTaskGetSnapshotAll
. It makes use of TaskSnapshot_t
which provides pointer to TCB, but it has type void *
, so gdbstub has to know TCB structure to get task name and core ID (X-Ryl669's DumpTCB
is the candidate for this). To get task handle TCB address can be converted to TaskHandle_t
.
eb7d3f1
to
ba010db
Compare
Here it is. This should keep it working as a best effort forensic analysis. |
A bit of description of the workflow, to help understanding the changes
So to sum up, if you have completely corrupted your memory, this code will act like the previous code, that is giving you the current exception frame and that's it. |
components/esp32/gdbstub.c
Outdated
@@ -285,12 +388,84 @@ static int gdbHandleCommand(unsigned char *cmd, int len) { | |||
gdbPacketEnd(); | |||
} else if (cmd[0]=='?') { //Reply with stop reason | |||
sendReason(); | |||
} else { | |||
#if configUSE_TRACE_FACILITY == 1 | |||
} else if (cmd[0]=='H' && reenteredHandler != -1) { //Continue with task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reenteredHandler != -1
is used in several conditions checks. Re-writing this a bit could improve code readability.
For example:
} else if (cmd[0]=='H' && reenteredHandler != -1) { //Continue with task | |
} else if (reenteredHandler != -1) { | |
if (cmd[0]=='H') { //Continue with task | |
} else if (cmd[0]=='T') { //Task alive check | |
} else if (cmd[0]=='q') { //Extended query | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. That makes sense.
components/esp32/gdbstub.c
Outdated
@@ -211,13 +213,22 @@ STRUCT_FIELD (long, 4, XT_STK_OVLY, ovly) | |||
STRUCT_END(XtExcFrame) | |||
*/ | |||
|
|||
static inline bool isValidStack(long sp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use esp_stack_ptr_is_sane instead of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. The isValidStack is extracted from coredump code esp_task_stack_start_is_sane
and it's not checking that the stack pointer is aligned on 16 bytes boundary. Is this guaranteed (stack pointer & 0xF == 0) under all normal conditions ?
components/esp32/gdbstub.c
Outdated
@@ -228,7 +239,7 @@ static void dumpHwToRegfile(XtExcFrame *frame) { | |||
gdbRegFile.windowstart=0x1; //1 | |||
gdbRegFile.configid0=0xdeadbeef; //ToDo | |||
gdbRegFile.configid1=0xdeadbeef; //ToDo | |||
gdbRegFile.ps=frame->ps-PS_EXCM_MASK; | |||
gdbRegFile.ps=(frame->ps & (1U<<5)) ? (frame->ps & ~(1U<<4)) : frame->ps; //Replicate correction from espcoredump.py:546 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you replace (1U<<5)
and (1U<<4)
with PS_UM
and PS_EXCM
respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
components/esp32/gdbstub.c
Outdated
for (i=0; i<16; i++) gdbRegFile.a[i]=frameAregs[i]; | ||
for (i=16; i<64; i++) gdbRegFile.a[i]=0xDEADBEEF; | ||
if (gdbRegFile.a[0] & 0x8000000U) gdbRegFile.a[0] = (gdbRegFile.a[0] & 0x3fffffffU) | 0x40000000U; //Replicate correction from espcoredump.py:560 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to remove comments like //Replicate correction from espcoredump.py:560
in final version, since they seem to be useful for review only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
components/esp32/gdbstub.c
Outdated
//Fetch the task status | ||
static unsigned getAllTasksHandle(unsigned index, unsigned * handle, const char ** name, unsigned * coreId) { | ||
static unsigned taskCount = 0; | ||
static TaskStatus_t tasks[32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to use macro for max tasks number supported by GDB stub. The same in the code below which calls uxTaskGetSystemState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I use CONFIG_ESP32_CORE_DUMP_MAX_TASKS_NUM
or simply make my own ?
components/esp32/gdbstub.c
Outdated
if (!taskCount) { | ||
unsigned runTime = 0; | ||
taskCount = uxTaskGetNumberOfTasks(); | ||
taskCount = uxTaskGetSystemState(tasks, 32, &runTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igrr, @projectgus
uxTaskGetSystemState
tries to get xTaskQueueMutex
. If it is busy at the moment of crash (taken by the other CPU or corrupted in some special way) it can lead to freezing when that API is called. May be it is better to use uxTaskGetSnapshotAll
. It makes use of TaskSnapshot_t
which provides pointer to TCB, but it has type void *
, so gdbstub has to know TCB structure to get task name and core ID (X-Ryl669's DumpTCB
is the candidate for this). To get task handle TCB address can be converted to TaskHandle_t
.
I think the task name is really useful to locate task. Loosing this is really a big drawback.
Another option would be to modify FreeRTOS declaration of What do you prefer ? EDIT: Bah... Forget about that, I can already use |
…ve task and fetching each task's stack
…eck at multiple input point
56bec3e
to
fe4f20c
Compare
Summary of the changes:
|
@igrr @projectgus @X-Ryl669 Changes seem to be OK. One more thing... It should not be implemented in this PR. It is planned as one of ongoing core dump improvements/fixes. Registers for the task currently running on other CPU are not saved onto the stack, so GDB will show invalid values. Special mechanism needs to be implemented to retrieve them from other CPU upon entering panic handler. |
The only way to do that would be to trigger an interrupt on the other CPU. The handler role would be to save the exception frame to the memory and set a flag so the other cpu can busy loop on it. I'm still thinking about how to allow live debugging with this and using software interrupt might be a solution, except that it's not clearly explained how it's implemented underneath. |
Yes. ESP32 supports interrupts between xtensa cores. The idea is to use |
Would it sum up to adding a new reason in In that case, please consider allowing more action in this (new) handler, like:
That would permit one CPU to debug the other CPU in live within GDB directly from UART or WIFI! |
This will likely be implemented by extending the existing level 4 interrupt handler to allow doing other "stuff", such as saving the interrupt frame somewhere. This has to be at least level 4 interrupt, so you can interrupt the other CPU even if it is inside a FreeRTOS critical section.
Not to disencourage you from trying this, but there are other issues to solve. For example, with this approach the "debuggee" CPU can not be allowed to halt inside a critical section. If it does, then the "debugger" CPU can immediately deadlock if it tries to enter a critical section with the same spinlock. Also in the current FreeRTOS CPU0 is responsible for advancing tick count, so debuggee CPU can only be CPU1. |
I guess such debugging stub shouldn't be using FreeRTOS. It should be only interrupt based. Then GDB can ask to set up a debug break/watchpoint (we should be able to deal with this as interrupt too when they happen) Then GDB can ask to continue/resume. In that case, we should get out of the ISR and let the program resume. Provided the user has set up a debug break/watchpoint, an interrupt will happen when it's hit, and we're back to the GDB stub code. We could also have a FreeRTOS task that's listening to GDB command on the serial link to catch "Ctrl+C" (not sure how it's done) and in that case, it would trigger the interrupt to re-enter GDB stub. I don't see in this scheme when FreeRTOS is in the way or where it could deadlock. Also, at first, the GDB stub handler should talk using serial port, but in the future it could be using TCP/IP too since GDB supports this. |
…and changing the active task and fetching each task's stack Merges #2828
Is the current change in master finished and OK ? |
@X-Ryl669 yes, changes were cherry-picked in the commit above. Thanks for the contribution, and please let us know if you have another suggestions or contributions for debugging improvements. |
This is supposed to fix #2818
This adds all the required commands in the gdbstub (via UART) to:
To tease the current support, you can now do this:
This is a large change with many redundant code since I could not access the other components static's / internal stuff.
Mainly, I've duplicated the
dumpHwRegfile
function to support Xtensa'sXtSolFrame
, made a backup of the current panic handler (since as soon as gdb detects there are threads, it switches them, so thegdbregfile
is lost), and dealt with FreeRTOS's TCB structure.I didn't want to copy all the TCB structure declaration (I only needed a the top of the stack which is guaranteed to be the first element in the structure). So I declared a dummy structure with just a single element, and I'm using this. I'm not using
uxTaskSnapshotAll
here since this does not give the task name and CPU id. Instead, I'm usinguxTaskGetSystemState
that gives everything (name, TCB/handle, cpu id).I've added all protections for address validity checking (copied from espcoredump.py and coredump.c)
There is still a bug within gdb when it's receiving a dump with a code that's built with -Os (with inlining) but it has already been reported in your forum, and it's unrelated to this work (it also happens when doing coredump).
Please let me know if you need more changes.