Stack dump failure with some clang-compiled 32-bit binaries #29

Closed
vit9696 opened this issue Sep 4, 2016 · 9 comments

Comments

@vit9696

vit9696 commented Sep 4, 2016

Hello,

I ran into an issue when debugging clang-compiled binaries. Under certain conditions I get an empty stack trace, even though gdb does manage to produce something:

Dr. Mingw report

-------------------

Error occured on Sunday, September 4, 2016 at 17:07:31.

sample.exe caused an Illegal Instruction at location 00401674 in module sample.exe.

Registers:
eax=00004823 ebx=00000438 ecx=00000780 edx=00000000 esi=005ec034 edi=00000029
eip=00401674 esp=0088fe40 ebp=00000000 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202

AddrPC   Params

sample.exe
ntdll.dll       6.2.10586.306
KERNEL32.DLL    6.2.10586.0
KERNELBASE.dll  6.2.10586.494
GDI32.dll       6.2.10586.420
USER32.dll      6.2.10586.306
IMM32.DLL       6.2.10586.0
msvcrt.dll      7.0.10586.0
ole32.dll       6.2.10586.545
combase.dll     6.2.10586.103
RPCRT4.dll      6.2.10586.306
SspiCli.dll     6.2.10586.0
CRYPTBASE.dll   6.2.10586.0
bcryptPrimitives.dll    6.2.10586.420
sechost.dll     6.2.10586.0
OLEAUT32.dll    6.2.10586.0
SHELL32.dll     6.2.10586.545
cfgmgr32.dll    6.2.10586.0
OPENGL32.DLL    6.2.10586.0
windows.storage.dll 6.2.10586.494
advapi32.dll    6.2.10586.63
shlwapi.dll     6.2.10586.0
DDRAW.dll       6.2.10586.0
GLU32.dll       6.2.10586.0
kernel.appcore.dll  6.2.10586.0
shcore.dll      6.2.10586.494
powrprof.dll    6.2.10586.0
DCIMAN32.dll    6.2.10586.3
profapi.dll     6.2.10586.0
VERSION.dll     6.2.10586.0
WINMM.DLL       6.2.10586.0
WINMMBASE.dll   6.2.10586.0
exchndl.dll     0.8.0.0
PSAPI.DLL       6.2.10586.0
mgwhelp.dll     0.8.0.0
dbghelp.dll     6.3.9600.17029
WS2_32.dll      6.2.10586.420
WININET.dll     11.0.10586.545
iertutil.dll    11.0.10586.545
ondemandconnroutehelper.dll 6.2.10586.212
IPHLPAPI.DLL    6.2.10586.0
winhttp.dll     6.2.10586.420
mswsock.dll     6.2.10586.420
NSI.dll         6.2.10586.0
WINNSI.DLL      6.2.10586.0
DNSAPI.dll      6.2.10586.212
dhcpcsvc6.DLL   6.2.10586.420
urlmon.dll      11.0.10586.545
dhcpcsvc.DLL    6.2.10586.420
rasadhlp.dll    6.2.10586.71
clbcatq.dll     2001.12.10941.16384
fwpuclnt.dll    6.2.10586.212
bcrypt.dll      6.2.10586.0

Windows 6.2.9200
DrMingw 0.8.0

gdb output

(gdb) c
Continuing.

Thread 1 received signal SIGILL, Illegal instruction.
0x00401674 in B::myMethod(void*, unsigned int, unsigned int) ()
(gdb) bt
#0  0x00401674 in B::myMethod(void*, unsigned int, unsigned int) ()
#1  0x00401646 in SDL_main ()
#2  0x00501683 in WinMain@16 ()
#3  0x00000001 in ?? ()
#4  0x00000001 in ?? ()
#5  0x00912fc8 in ?? ()
#6  0x5c766564 in ?? ()
#7  0x63736e6f in ?? ()
#8  0x74706972 in ?? ()
#9  0x6c2d7265 in ?? ()
#10 0x5c6d766c in ?? ()
#11 0x706d6173 in ?? ()
#12 0x652e656c in ?? ()
#13 0x20006578 in ?? ()
#14 0xbaadf000 in ?? ()
#15 0xbaadf00d in ?? ()
#16 0xbaadf00d in ?? ()
#17 0xbaadf00d in ?? ()
#18 0xbaadf00d in ?? ()
#19 0xbaadf00d in ?? ()
#20 0xbaadf00d in ?? ()
#21 0xbaadf00d in ?? ()
#22 0xbaadf00d in ?? ()
#23 0xababfeee in ?? ()
#24 0xabababab in ?? ()
#25 0xfeeeabab in ?? ()
#26 0xfeeefeee in ?? ()
#27 0x0000
I suspect the issue is caused by SDL, because I failed to make a sample without it. However, I guess that is not very relevant. I [uploaded](https://github.com/jrfonseca/drmingw/files/453959/sample.zip) a binary, the source code, the compilation arguments, and related files. Could anything be done about it?

Regards,
Vit

@jrfonseca
Owner

I'll look into this. For my reference, which version of LLVM/Clang are you using?

If you want symbols, you should try passing -g to clang.

I think the fact that SDL is required to reproduce this is not a coincidence. Probably the LLVM/Clang linker is having trouble merging the DWARF debugging info from the SDL static libraries.

$ i686-w64-mingw32-objdump -d sample.exe | grep -i ud2
  401674:   0f 0b                   ud2    
$ i686-w64-mingw32-addr2line -e sample.exe 401674
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 1324.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 26.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 34.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 26.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 205.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 377.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 106.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 74.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 27.
i686-w64-mingw32-addr2line: Dwarf Error: Could not find abbrev number 26.
cygming-crtbegin.c:?
$ i686-w64-mingw32-nm  sample.exe  | sort | grep '^004016..' | i686-w64-mingw32-c++filt  | md-quote 
00401600 T ___gcc_deregister_frame
00401630 T _SDL_main
00401630 t .text
00401649 t setupCrashReporter()
00401656 t .text
00401656 T B::myMethod(void*, unsigned int, unsigned int)
004016a0 T _GPU_GetLinkedVersion
004016a0 t .text
004016c0 t _GPU_GetCompiledVersion

So my initial reading is that LLVM/Clang is producing invalid DWARF, but the symbols are still there in CODEVIEW format. GDB might be ignoring all DWARF, whereas DrMingw assumes the DWARF is OK, so it never looks at the CODEVIEW symbols.

@vit9696
Author

vit9696 commented Sep 5, 2016

I did use clang 3.9 rc3 downloaded from llvm.org if I remember correctly. I do remember reproducing the issue with 3.8 as well, however.
There is something strange about the issue. If I change myMethod in any way, even while preserving the function calls, the issue no longer reproduces and the stack is reconstructed correctly. That is why I expect the issue to be much broader than just DWARF generation.

@jrfonseca
Owner

jrfonseca commented Sep 7, 2016

The problem here is that you used -fomit-frame-pointer whereas https://github.com/jrfonseca/drmingw/blob/master/README.md#which-options-should-i-pass-to-gcc-when-compiling states you need to use -g and -fno-omit-frame-pointer.

This is because DrMingw/MgwHelp are not currently capable of unwinding the stack unless there is PDB information or a frame pointer is used. Furthermore, by coincidence ebp=00000000, which completely confuses StackWalk64.
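
To illustrate why a zero ebp leaves nothing to walk, here is a minimal sketch of frame-pointer unwinding over a simulated 32-bit stack. It is not DrMingw's or StackWalk64's actual code: `StackImage` and `walkFrameChain` are hypothetical names, and the stack contents in the usage note are invented, with only the addresses borrowed from the report above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 32-bit stack: words[i] is the 4-byte value stored
// at address base + 4 * i. Illustration only, not DrMingw's code.
struct StackImage {
    std::uint32_t base;
    std::vector<std::uint32_t> words;

    // Reads the word at addr; returns false if addr is outside the image
    // or not 4-byte aligned relative to base.
    bool read(std::uint32_t addr, std::uint32_t &out) const {
        if (addr < base || (addr - base) % 4 != 0) return false;
        const std::size_t index = (addr - base) / 4;
        if (index >= words.size()) return false;
        out = words[index];
        return true;
    }
};

// Follows the saved-EBP chain: [ebp] holds the caller's EBP and [ebp + 4]
// holds the return address into the caller.
std::vector<std::uint32_t> walkFrameChain(const StackImage &stack,
                                          std::uint32_t ebp,
                                          std::size_t maxFrames) {
    assert(maxFrames > 0);
    std::vector<std::uint32_t> returnAddresses;
    while (returnAddresses.size() < maxFrames) {
        std::uint32_t savedEbp = 0;
        std::uint32_t returnAddress = 0;
        if (!stack.read(ebp, savedEbp) || !stack.read(ebp + 4, returnAddress))
            break;  // EBP does not point into the stack: the chain ends here.
        returnAddresses.push_back(returnAddress);
        if (savedEbp <= ebp) break;  // Caller frames must sit at higher addresses.
        ebp = savedEbp;
    }
    return returnAddresses;
}
```

With a well-formed chain, starting from a valid ebp yields one return address per frame. Starting from ebp=0 fails on the very first read and yields an empty list, which matches the empty AddrPC table in the report.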

Using DWARF debug info to unwind the stack (like gdb does) would be nice, and is mentioned on https://github.com/jrfonseca/drmingw/blob/master/TODO.md but there's no ETA.

The only thing I could do is make the code a bit more forgiving towards ebp=00000000, which should allow printing at least the top of the stack.

@vit9696
Author

vit9696 commented Sep 7, 2016

You are correct, I completely forgot about -fomit-frame-pointer because it did not affect recent gcc builds at all (I guess gcc failed to omit the frame pointer in most cases). As far as I understood, -g has always been optional, because by default gcc preserves the symbol table, which was enough to reconstruct the stack unless the exact function was optimised away; frame pointer omission, however, broke that. Thanks for the hint.

However, although -fno-omit-frame-pointer fixes things, it will slow certain things down due to the extra register usage, which is rather undesirable. Would it be much trouble to perform a raw stack dump for later analysis (e.g. the way IDA does, from esp onwards with 4-byte alignment)? I feel that this would be better than special-casing things.
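
As a rough illustration of such a raw dump, the sketch below formats 4-byte words starting at esp into hex rows. The function name `formatStackDump` and the row layout are assumptions made for this example; nothing here reflects an existing DrMingw feature, and the words are passed in as a plain vector rather than read from a live stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats a sequence of 4-byte stack words as hex rows, each row prefixed
// with the address of its first word. Hypothetical helper for illustration.
std::string formatStackDump(std::uint32_t esp,
                            const std::vector<std::uint32_t> &words,
                            std::size_t wordsPerLine = 4) {
    assert(wordsPerLine > 0);
    std::string out;
    char buffer[16];
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i % wordsPerLine == 0) {
            // Start each row with the address of its first word.
            std::snprintf(buffer, sizeof buffer, "%08x:",
                          static_cast<unsigned>(esp + 4 * i));
            out += buffer;
        }
        std::snprintf(buffer, sizeof buffer, " %08x",
                      static_cast<unsigned>(words[i]));
        out += buffer;
        // End the row when it is full or when the data runs out.
        if ((i + 1) % wordsPerLine == 0 || i + 1 == words.size()) out += '\n';
    }
    return out;
}
```

A dump in this shape could be post-processed offline, for instance by looking up each word in the symbol table to see which ones fall inside code.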

Also, I tried generating a PDB file with -g and cv2pdb, but it did not seem to work unless -fno-omit-frame-pointer was present as well. Perhaps it still expects the frame pointer to be present.

@jrfonseca
Owner

I agree that for the ExcHndl case, requiring -fno-omit-frame-pointer is not ideal.

We could indeed either dump a few bytes or do some sort of small analysis like http://www.hexblog.com/?p=104 . The big difficulty is telling whether StackWalk failed for lack of a frame pointer or merely because it reached the bottom of the stack.

If it gets too complicated, it might be better to spend that time implementing stack unwinding via .eh_frame/.debug_frame information, which can be left in release binaries without affecting performance or requiring full debug info.

@vit9696
Author

vit9696 commented Sep 7, 2016

It might be my own ignorance, but aren't .debug_frame/.eh_frame only generated when the -g argument is passed? In that case the overall binary may be slower due to -g causing heavier stack use for argument passing.

CallStackWalk is an interesting idea, and I feel that it might work rather reliably. I am not fully positive but perhaps generating the call stack could be done according to user preference? Or this method could be used in parallel with the general stack reconstruction.

@jrfonseca
Owner

My understanding is that -g does not change the executable code (just the presence/absence of debugging info), so it should have no significant runtime impact.

Also, it seems that nowadays .debug_* is generated even without -g. Unless one goes out of one's way to strip symbols via -s or binutils' strip, it should be there.

@vit9696
Author

vit9696 commented Sep 7, 2016

Hmmm, it looks like you are right regarding gcc at least. Like gcc, LLVM does not change its optimisation when the -g flag is present, and furthermore it promises to produce accurate debug info. Perhaps my information was outdated or a common misconception.

As for stripping, I think that's what most people do because of size, so I would expect symbol names to simply be missing in the general case, which makes .debug_frame parsing a little useless.

I spent a few minutes writing IDA's algo in C++, and it seems to produce relatively decent addresses for me. If you find time to integrate it into Dr. Mingw, I would appreciate it.

Sample

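#include <windows.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Returns true if the bytes just before addr look like a call instruction,
// i.e. addr is plausibly a return address.
bool isPrevInstrCall(void *addr) {
    struct CallPattern {
        ptrdiff_t delta;  // offset of the opcode byte relative to addr
        uint8_t op;       // expected opcode byte
    };

    const CallPattern callPatterns[] {
        {-2, 0xFF},  // call reg / call [reg]
        {-3, 0xFF},  // call [reg+disp8]
        {-5, 0xE8},  // call rel32
        {-6, 0xFF}   // call [disp32] / call [reg+disp32]
    };

    uint8_t *ptr = static_cast<uint8_t *>(addr);
    for (auto &call : callPatterns) {
        if (!IsBadReadPtr(ptr+call.delta, sizeof(uint8_t)) &&
            call.op == *(ptr+call.delta))
            return true;
    }
    return false;
}

void walkStack() {
    // The address of a local variable approximates the current stack pointer.
    uintptr_t *stackVar {nullptr};
    uintptr_t **sp = &stackVar;

    SYSTEM_INFO sysInfo {};
    GetSystemInfo(&sysInfo);

    // Widen to pointer size so the page mask below keeps the high address bits.
    auto pageSize = static_cast<uintptr_t>(sysInfo.dwPageSize);
    fprintf(stderr, "stack start\n");
    //TODO: get stack segment end address
    while (!IsBadReadPtr(sp, sizeof(uintptr_t))) {
        if (!IsBadReadPtr(*sp, sizeof(uintptr_t))) {
            // Round the candidate address down to the start of its page.
            auto pagePointer = reinterpret_cast<uintptr_t *>(reinterpret_cast<uintptr_t>(*sp) & (~(pageSize-1)));
            MEMORY_BASIC_INFORMATION memInfo {};

            // Report the value only if it points into executable memory and
            // is preceded by something that looks like a call.
            if (VirtualQuery(pagePointer, &memInfo, sizeof(MEMORY_BASIC_INFORMATION)) == sizeof(MEMORY_BASIC_INFORMATION) &&
                (memInfo.Protect & (PAGE_EXECUTE|PAGE_EXECUTE_READ|PAGE_EXECUTE_READWRITE|PAGE_EXECUTE_WRITECOPY)) &&
                isPrevInstrCall(*sp)) {
                fprintf(stderr, "[0x%p]\n", static_cast<void *>(*sp));
            }
        }
        sp++;
    }
    fprintf(stderr, "stack end\n");
}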
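
The opcode check in the sample accepts any 0xFF byte, but 0xFF also encodes inc, dec, jmp and push. One possible refinement, not part of the sample above, is to also require the ModR/M reg field to be 2, which is what selects the near indirect call (FF /2). The sketch below is a portable variant that works on a plain byte buffer instead of live memory; the name `precededByCall` is hypothetical, and the result remains a heuristic because it does not verify that the instruction length matches the addressing mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the bytes ending just before code[offset] could encode a
// near call, so that offset is plausibly a return address.
bool precededByCall(const std::uint8_t *code, std::size_t size, std::size_t offset) {
    assert(code != nullptr);
    if (offset > size) return false;

    // call rel32: opcode E8 followed by a 4-byte displacement.
    if (offset >= 5 && code[offset - 5] == 0xE8) return true;

    // call r/m32: opcode FF with ModR/M reg field equal to 2. The candidate
    // lengths are the same ones the sample checks (2, 3 and 6 bytes).
    const std::size_t indirectLengths[] = {2, 3, 6};
    for (std::size_t length : indirectLengths) {
        if (offset < length) continue;
        const std::uint8_t opcode = code[offset - length];
        const std::uint8_t modrm = code[offset - length + 1];
        if (opcode == 0xFF && ((modrm >> 3) & 7) == 2) return true;
    }
    return false;
}
```

For example, the bytes FF D0 (call eax) are accepted, while FF E0 (jmp eax) are rejected because their reg field is 4 rather than 2.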

@jrfonseca
Owner

Thanks @vit9696. I didn't have time this week, and I'm not sure when I will, but to avoid forgetting this, I've filed it as a separate feature request, issue #31.
