How to debug a multithread application [ Debugger / gdb ] #1069

Closed
avafinger opened this issue Mar 3, 2021 · 24 comments · Fixed by #1170

Comments

@avafinger

I am not sure whether this issue belongs here or in geany-plugins.

I need to debug a multithreaded app in Geany with the Debugger plugin (gdb), but Geany hangs inside the thread.
I have been using Geany for a long time, but I can't find a way to debug a thread.

To reproduce the problem, build the sample thread.c, set breakpoints at lines 22 and 34, and then run it in the debugger.

  • Build

     gcc -g -O0 -o thread thread.c -lpthread  
    
  • thread.c

#include<stdio.h>
#include<string.h>
#include<pthread.h>
#include<stdlib.h>
#include<unistd.h>

/*
 * Build with:
 *
 * gcc -g -O0 -o thread thread.c -lpthread
 *
*/

pthread_t tid[2];

void* doSomeThing(void *arg)
{
    unsigned long x = 0;
    int j = 0;
    int i = (int)arg;

    pthread_t id = pthread_self();

    printf("Thread %d processing...\n", i);
    if(pthread_equal(id,tid[0])) {
        printf("Inside First thread\n");
    } else  {
        printf("Inside Second thread\n");
    }

    for(x=0; x<(0xFFFFFFFF);x++) {
        j++;
    }
    printf("Thread %d: 0x%lx [ x = 0x%X ]\n",i, id, j);

    return NULL;
}

int main(void)
{
    int i = 0;
    int err;

    while (i < 2) {
        err = pthread_create(&(tid[i]), NULL, &doSomeThing, (void*)i);
        if (err != 0)
            printf("can't create thread :[%s]", strerror(err));
        else
            printf("Thread %d created successfully\n", i);
        i++;
    }

    sleep(10);
    return 0;
}

Geany correctly stops at the first breakpoint, but if I click "Step over", Geany switches to the second thread; clicking "Step over" again hangs Geany. Eventually, a crash occurs if you try to close Geany.

The correct behavior should be to stay in the same thread and walk through the code line by line each time "Step over" is clicked.

@elextr
Member

elextr commented Mar 3, 2021

This isn't anything to do with Geany; it's definitely to do with the GDB plugin, and things hanging looks like a bug in either the plugin or GDB.

Will move to geany-plugins.

@elextr elextr transferred this issue from geany/geany Mar 3, 2021
@avafinger avafinger changed the title How to debug a multithread application How to debug a multithread application [ Debugger / gdb ] Mar 3, 2021
@avafinger
Author

avafinger commented Mar 3, 2021

Additional info:

GDB version(s):


gdb --version
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

The desired behavior:

When the breakpoint is hit the first time (Thread 0), the debugger should ignore all hits from any other thread (in this sample, Thread 1) until this thread has terminated (exited), and it should not switch to any other thread.

An interesting question, How do you debug Geany?

@avafinger
Author

Please disregard the hang problem: the main thread most likely ends before Thread 0 or Thread 1 during debugging, which causes the issue.

I debugged this on another PC with a newer kernel. Can you suggest how to get rid of these errors:

Could not open file /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c (Error when getting information for file “/build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c”: No such file or directory)
Could not open file /build/glibc-S9d2JN/glibc-2.27/sysdeps/unix/sysv/linux/x86_64/clone.S (Error when getting information for file “/build/glibc-S9d2JN/glibc-2.27/sysdeps/unix/sysv/linux/x86_64/clone.S”: No such file or directory)
Could not open file /build/glibc-S9d2JN/glibc-2.27/sysdeps/posix/sleep.c (Error when getting information for file “/build/glibc-S9d2JN/glibc-2.27/sysdeps/posix/sleep.c”: No such file or directory)

@elextr
Member

elextr commented Mar 3, 2021

An interesting question, How do you debug Geany?

Geany is not a multi-threaded application. It's mostly GUI-activated code, and GTK is single threaded, so Geany is single threaded. There may be some threads used in the libraries Geany uses, but they are hidden there, and not having their source means it's unlikely that breakpoints will be set in those.

@avafinger
Author

avafinger commented Mar 4, 2021

Right, I fixed the errors by installing the glibc source code, but the debugger still wants to debug the thread_start in clone.S. I don't know how to disable this "feature". I am on Geany 1.38 built on 2021-02-08 (git code).

Anyway, I came up with a better example and tested it with gdb to check whether it is a bug in gdb. It worked as expected. Trying to debug with Geany hangs; there seems to be a race condition, or it is waiting for some gdb info.

Try to reproduce this inside Geany and it hangs:

gdb ./thread 
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./thread...done.
(gdb) break 26
Breakpoint 1 at 0x916: file thread.c, line 26.
(gdb) break 38
Breakpoint 2 at 0x9b8: file thread.c, line 38.
(gdb) break 62
Breakpoint 3 at 0xac8: file thread.c, line 62.
(gdb) break 65
Breakpoint 4 at 0xb22: file thread.c, line 65.
(gdb) run
Starting program: /home/alex/Download/apps/appsgw/thread/thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77c2700 (LWP 20962)]
Thread 0 created successfully
Thread 0 starting...
[New Thread 0x7ffff6fc1700 (LWP 20963)]
Thread 1 starting...
Thread 1 created successfully
[Switching to Thread 0x7ffff77c2700 (LWP 20962)]

Thread 2 "thread" hit Breakpoint 1, doSomeThing (arg=0x0) at thread.c:26
26	    if(pthread_equal(id,tid[0])) {
(gdb) n
[Switching to Thread 0x7ffff6fc1700 (LWP 20963)]

Thread 3 "thread" hit Breakpoint 1, doSomeThing (arg=0x1) at thread.c:26
26	    if(pthread_equal(id,tid[0])) {
(gdb) n
Inside First thread
Thread 0 processing...
29	        printf("Inside Second thread\n");
(gdb) n
Inside Second thread
32	    printf("Thread %d processing...\n", i);
(gdb) n
Thread 1 processing...
33	    for(x=0; x<(0xFFFFFFFF);x++) {
(gdb) c
Continuing.
Thread 0: 0x7ffff77c2700 [ x = 0xFFFFFFFF ]
[Switching to Thread 0x7ffff77c2700 (LWP 20962)]

Thread 2 "thread" hit Breakpoint 2, doSomeThing (arg=0x0) at thread.c:38
38	    printf("Thread %d: 0x%lx exit\n",i, id);
(gdb) c
Continuing.
Thread 0: 0x7ffff77c2700 exit
[Thread 0x7ffff77c2700 (LWP 20962) exited]
[Switching to Thread 0x7ffff7fd6740 (LWP 20958)]

Thread 1 "thread" hit Breakpoint 3, main () at thread.c:62
62	    printf("Thread %d returned: %d\n", i, *ptr[i]);
(gdb) n
Thread 0 returned: 1
63	    i++;
(gdb) n
64	    pthread_join(tid[i], (void**)&(ptr[i]));
(gdb) n
Thread 1: 0x7ffff6fc1700 [ x = 0xFFFFFFFF ]
[Switching to Thread 0x7ffff6fc1700 (LWP 20963)]

Thread 3 "thread" hit Breakpoint 2, doSomeThing (arg=0x1) at thread.c:38
38	    printf("Thread %d: 0x%lx exit\n",i, id);
(gdb) c
Continuing.
Thread 1: 0x7ffff6fc1700 exit
[Thread 0x7ffff6fc1700 (LWP 20963) exited]
[Switching to Thread 0x7ffff7fd6740 (LWP 20958)]

Thread 1 "thread" hit Breakpoint 4, main () at thread.c:65
65	    printf("Thread %d returned: %d\n", i, *ptr[i]);
(gdb) n
Thread 1 returned: 2
66	    i++; // just to show on gdb next command
(gdb) n
67	    return 0;
(gdb) c
Continuing.
[Inferior 1 (process 20958) exited normally]
(gdb) 


Breakpoints are at lines 26, 38, 62 and 65.

Sample program:

#include<stdio.h>
#include<string.h>
#include<pthread.h>
#include<stdlib.h>
#include<unistd.h>

/*
 * Build with:
 *
 * gcc -g -O0 -o thread thread.c -lpthread
 *
*/
pthread_t tid[2];
int ret[2];

void* doSomeThing(void *arg)
{
    unsigned long x = 0;
    int j = 0;
    int i = (int)arg;

    printf("Thread %d starting...\n", i);
    pthread_t id = pthread_self();


    if(pthread_equal(id,tid[0])) {
        printf("Inside First thread\n");
    } else  {
        printf("Inside Second thread\n");
    }

    printf("Thread %d processing...\n", i);
    for(x=0; x<(0xFFFFFFFF);x++) {
        j++;
    }
    ret[i] = i + 1;
    printf("Thread %d: 0x%lx [ x = 0x%X ]\n",i, id, j);
    printf("Thread %d: 0x%lx exit\n",i, id);

    pthread_exit(&ret[i]);

    return NULL;
}

int main(void)
{
    int i = 0;
    int err;
    int *ptr[2];

    while (i < 2) {
        err = pthread_create(&(tid[i]), NULL, &doSomeThing, (void*)i);
        if (err != 0)
            printf("can't create thread :[%s]", strerror(err));
        else
            printf("Thread %d created successfully\n", i);
        i++;
    }

    i = 0;
    pthread_join(tid[i], (void**)&(ptr[i]));
    printf("Thread %d returned: %d\n", i, *ptr[i]);
    i++;
    pthread_join(tid[i], (void**)&(ptr[i]));
    printf("Thread %d returned: %d\n", i, *ptr[i]);
    i++; // just to show on gdb next command
    return 0;
}

PS: Geany will hang on line 62 after you click [Continue]

@elextr
Member

elextr commented Mar 4, 2021

I don't use or know anything about debugger, but I note in your example that there are thread creation messages for threads 0 and 1, yet thread 2 hits the breakpoint first. Perhaps that's just a human UI issue; you need to show the output from the GDB/MI interface that debugger uses. But if it is the same, I would expect that having a thread it doesn't know about hit the breakpoint may confuse the debugger plugin.

Unfortunately the MAINTAINERS file does not have the debugger plugin maintainer's github username so can't ping them.

@avafinger
Author

@elextr

Thank you for your reply.
Thread execution order is not predictable.

I hope they're still around and can give some input here... I need to learn how to debug the Geany plugin and understand how it works.

@nomadbyte
Contributor

nomadbyte commented Apr 11, 2021

Not a solution, but you may try to use Scope Debugger -- that's another GDB plugin for Geany (to see the 'scope.html' simply click the plugin's Help in Plugin Manager). It also supports threads and seems to cope somewhat better with following the thread context.

In your example, Scope will even show two execution-line pointers for each of the worker thread states (in All-stop GDB mode), as these share the same work-function.

Minor Scope quirks:

  • to "reveal" the breakpoint markers I sometimes need to do 'Debug>More>Reset Markers';
  • also, when seeing Autos tab, sometimes it needs a Refresh from the context-menu.

BTW, in your example code, j variable rolls-over to negative (-1) as it's a signed int, while you're looping on unsigned long beyond the positive int range. Just FYI.

@elextr
Member

elextr commented Apr 11, 2021

Also note that both plugins are just user interfaces for GDB, which does the actual debugging, so things like "wants to debug thread_start" are actually referring to GDB, and you could investigate that by running GDB from the command line.

@nomadbyte
Contributor

nomadbyte commented Apr 11, 2021

...so things like "wants to debug thread_start" are actually referring to GDB

Well, Scope does not seem to have this issue (i.e. trying to break in start_thread and clone). These functions are properly reported in the Stack tab. This must be an issue which is specific to thread-handling code in Debugger plugin.

@elextr
Member

elextr commented Apr 11, 2021

This must be an issue, which is specific to thread-handling code in Debugger plugin.

Quite possibly, IIRC debugger plugin is pretty old and predates current extensive thread usage in every application.

Or as the OP observed, thread execution is non-deterministic, scope may just change the timing on your computer enough that whatever the problem is doesn't occur, ahh the joys of thread debugging 😈.

@nomadbyte
Contributor

Looking closer, it's clearly a problem with the way Debugger processes the thread's call-stack. Somehow it removes the current function frame from the stack on stepping (doSomeThing < start_thread < clone). This triggers the UI to process the next item in the stack (start_thread), so it tries to load the source for that frame. Looks like this plugin needs a deep dive to fix this, or at least to figure out what's going on. At least there's a test case, which is a good thing.

@elextr
Member

elextr commented Apr 11, 2021

Maybe it's something that changed with newer versions of GDB and debugger hasn't been updated; the last substantive change to debugger (excluding the GTK3 port and minor things) seems to have been in 2016 AFAICT.

@nomadbyte
Contributor

nomadbyte commented Apr 3, 2022

@avafinger: I revisited this issue, and wanted to point out that the original expectation that the stepping from the breakpoint while in the thread-2 context should not be switching to thread-3 context is rather unwarranted in this case.

In your example, both threads share the same task-function, so the breakpoint is also common, it's hit by thread-2, but the step-over continues the execution of the whole process. Thus the same breakpoint is then swiftly hit by the thread-3, as it should.

If you'd like to continue stepping in thread-2 context, you'd need to explicitly switch to thread-2 in the Call Stack pane. The subsequent stepping should indeed preserve the selected thread context (unless another thread hits its own breakpoint someplace). This also aligns with how it's handled in the interactive GDB session.

Anyway, I did some digging and I believe I've got the intended thread-switching/stepping behavior working (at least with your test code). I'll be pushing the changes soon.

@avafinger
Author

Hi @nomadbyte , thank you, and nice you are working on this.

both threads share the same task-function

In a real-world scenario, the same function is called by many threads, hundreds or thousands. Can you imagine a web server with thousands of users connecting at the same time? It would be impossible to debug a single thread if the user had to switch the context back manually at every step.

I am basing my assumptions on what Visual Studio does: once the breakpoint is hit in a thread (the first time), all other hits from other threads on the same function are ignored or halted, and you can debug that specific thread (the first hit) step by step without worrying about the context switching to another thread. If I am not mistaken, Eclipse does the same, but I am not an Eclipse user; I'm just mentioning it.

Once you push your code I can test with a more complex sample. But I think the sample code should be as simple as possible.

Thank you and Cheers

@nomadbyte
Contributor

Here it is (#1170). Mind that this does not change the original design; it simply enforces the intended behavior when debugging multi-threaded programs. This aligns with GDB's All-Stop Mode (default) flow, as also shown in your GDB interactive session above.

This should also take care of the "annoying" (I would guess, unintended) error-dialogs popping when trying to step from the thread's breakpoint, which complained about missing system sources for the pthread-frames etc.

@avafinger
Author

avafinger commented Apr 4, 2022

@nomadbyte

I tested your fix with the sample thread2.c (attached). Here are my findings:

  1. No more hangs (fixed!)
  2. no more popups
  3. It now debugs the thread in the same context, well, that depends on the concept, let me try to explain below:

Threads are not predictable, but in the sample, Thread[0] will always finish first, because it is fired first, in theory.

First run:

./thread2
Thread[0] created successfully
Thread[0] processing...
Inside First thread: Thread[0]
Thread[1] created successfully
Thread[1] processing...
Inside Second thread: Thread[1]
Thread[0]: 0x7f102b58b700 [ x = 0xFFFFFFFF ]
Thread[0] finished successfully with status: 0
Thread[1]: 0x7f102ad8a700 [ x = 0xFFFFFFFF ]
Thread[1] finished successfully with status: 1

Second run:

./thread2
Thread[0] processing...
Inside First thread: Thread[0]
Thread[0] created successfully
Thread[1] created successfully
Thread[1] processing...
Inside Second thread: Thread[1]
Thread[0]: 0x7f6cbc155700 [ x = 0xFFFFFFFF ]
Thread[0] finished successfully with status: 0
Thread[1]: 0x7f6cbb954700 [ x = 0xFFFFFFFF ]
Thread[1] finished successfully with status: 1

So we can assume that. Maybe printf is not the best thing to use in the example.

Now let's build and put a breakpoint on lines 25 and 34.
Add variable i to Watch and debug.
My assumption is this: when the first thread (Thread[0] or Thread[1], it does not matter which one) hits the breakpoint on line 25, you can watch the value of i. If you then do a [Step Over], we should still be in the same context, so variable i should not change until we exit the thread. Why? Because we want to debug this thread (the first hit).

But it switches to the new context when the next thread hits the breakpoint (line 25), and if you keep pushing [Step Over] until the end of the thread, you can see that variable i no longer changes, but we are now in the next thread's context.

I think the next breakpoint hit on line 25 should be ignored and we should stay on the previous context until we exit from the thread.

I will port this sample to Visual C and compare the results and mark this Closed or make any new comments.

If anyone would like to comment on the assumption, please feel free to do so.

Anyway, @nomadbyte thank you for your work.
thread2.c.zip

@nomadbyte
Contributor

nomadbyte commented Apr 4, 2022

@avafinger: Thanks for the quick turnaround with the testing. I'm glad that your results seem to show that the mentioned issues are gone.

If I understand it correctly, your assumption about disregarding the breakpoints in peer threads is not consistent with the GDB All Stop mode.

Just to reiterate this, in GDB All Stop mode:

  • when hitting a breakpoint, GDB will halt all running threads and switch to the context of the thread that has hit the breakpoint
  • stepping/continuing from the Stopped state will resume all concurrent threads
  • in the absence of subsequent breakpoints in the running threads, GDB will maintain the currently selected thread context
  • if a breakpoint is hit by a concurrent thread, GDB will halt all running threads and switch to that thread's context

Not sure how this works for your practical cases, but this GDB behavior does make sense, especially when threads do not share the task-function. When such threads hit breakpoints in their task-function code, it's reasonable -- and convenient too -- to expect thread context to switch to that breaking thread.

As for how to achieve your desired thread context switching (or rather non-switching) behavior -- one simple and common way is by making the breakpoints conditional on the intended thread-id (or some surrogate). In your example, the watched variable i could be used in the shared breakpoint condition (e.g. i==0, for making the breakpoint effective only for thread 0). The condition can be added in the Breakpoints pane.

P.S. pushed some updates (#1170) to prevent the "annoying popups" also on a premature Stop/End of the debugging run.

@avafinger
Author

the shared breakpoint condition (e.g. i==0, for making the breakpoint effective only for thread 0)

The example ( variable i ) was used just to show the context had changed when you clicked [Step over].

I agree with the "stepping/continuing from the Stopped state will resume all concurrent threads" and "if a breakpoint is hit by a concurrent thread, GDB will halt all running threads and switch to that thread's context", maybe the design should be changed to support my thinking (or the user). The user wants to debug thread-id 0x123456 and not 0x999999 or 0x777777 which share the same task function and same breakpoints.

I haven't tested this on Visual C yet.
I will test your changes and report back asap.
It's nice to have a stable debugger in Geany!

@elextr
Member

elextr commented Apr 5, 2022

@avafinger also just to comment on a couple of things you said above which may indicate a misunderstanding of threads.

Threads are not predictable, but in the sample, Thread[0] will always finish first, because it is fired first, in theory.

No, theory says exactly the opposite: nothing guarantees that. A thread can be scheduled to run or not at the OS's convenience, depending on what the heaps of other threads in the desktop and other apps want to do at the same time; no current CPU has enough cores to run them all at once, even with hyperthreading. So whether a newly created thread gets to run right away depends entirely on scheduling.

Maybe printf is not the best thing to use in the example.

IIUC in practice [1] printf is fine for threads: each printf will be atomic, so individual characters won't mix, and therefore the order of output shows the order of execution, which is useful for debugging.

Therefore the output of both your runs is perfectly reasonable. Remember that main is also a thread and subject to scheduling. Looking at the first few lines: in run 1, main returned from pthread_create and ran its "successful" printf before thread[0] got to its first printf, but in the second run thread[0] got to run two printfs before the main thread ran its "successful" printf. See the comment above about scheduling.

[Edit: further interpretation of execution order is left as an exercise for the reader 😄]

Footnotes

  1. POSIX requires printf to take a lock on the file. Some argue it doesn't explicitly require the lock to be held while all characters are output, but in practice most implementations do hold it.

@avafinger
Author

avafinger commented Apr 5, 2022

Let's add fuel to the fire. :)

https://it-qa.com/why-is-thread-behavior-considered-unpredictable/

PS: Just for reading... I agree with you all. :)

@avafinger
Author

Today I ported the code to Windows and tested it on Visual C++ Studio, and I found the same behavior as with the proposed fix.
So my assumption that the next step should stay in the same context is wrong: when the next thread hits the same breakpoint, the Visual C++ debugger switches to the new context. I could sometimes do 6 steps without changing the context, but that only means the following thread's breakpoint hadn't been hit yet. It was all about the scheduler.

I also tested your latest update; so far so good.

Thank you @nomadbyte and @elextr

If you have any suggestions on how to switch to previous context within the Geany debugger would be nice.

@nomadbyte
Contributor

nomadbyte commented Apr 6, 2022

If you have any suggestions on how to switch to previous context within the Geany debugger would be nice.

@avafinger: I assume, by that you mean switching to a previous frame's context, so that you could also inspect the frame's local vars in addition to pointing at the frame's entry source. In such case -- click the frame's arrow (in Call Stack pane), it will turn yellow instead of gray, and the Debugger will load the frame's local vars in the Autos pane.

@avafinger
Author

Good. When opening the Thread ID in the Call stack panel, the debugger switches to this context, which is exactly what I wanted.
Thank you @nomadbyte .
