Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance heap with for static-linked binaries & remove typeinfo bloat #1173

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

disconnect3d
Copy link
Member

This commit enhances the heap commands UX for statically linked binaries and removes typeinfo module bloat.

The typeinfo module had this typeinfo.load function that was looking up a given type. If it didn't find the type, it fallbacked to compiling many many system headers in order to add a symbol for a given type into the program. This was supposed to be used for missing glibc malloc symbols like malloc_chunk.

However, the exact reason it was used: the struct malloc_chunk was never defined in a header file and was always defined in a malloc.c or another .c file in glibc sources.

Another place the typeinfo.load logic of compiling headers was/is used is the dt command, which is a windbg alias for getting struct layout type information, e.g.:

pwndbg> dt 'struct malloc_chunk'
struct malloc_chunk
    +0x0000 mchunk_prev_size     : size_t
    +0x0008 mchunk_size          : size_t
    +0x0010 fd                   : struct malloc_chunk *
    +0x0018 bk                   : struct malloc_chunk *
    +0x0020 fd_nextsize          : struct malloc_chunk *
    +0x0028 bk_nextsize          : struct malloc_chunk *
pwndbg>

However, the whole big issue with typeinfo.load compilation of headers was that most of the time it didn't work because e.g. some headers defined in other paths were missing or that two different headers used the same struct/function name and the compilation failed.

Since this logic almost never gave good results, I am removing it.

Regarding UX for statically linked binaries: we use info dll command to see if a binary is statically linked. While this method is not robust, as it may give us wrong results if the statically linked binary used dlopen(...) it is probably good enough.

Now, if a heap related command is executed on statically linked binaries, it will inform the user and set the resolving of libc heap symbols via heuristics. Then, it also says to the user they have to set the glibc version and re-run the command.

This commit enhances the heap commands UX for statically linked binaries
and removes typeinfo module bloat.

The typeinfo module had this typeinfo.load function that was looking up a given type.
If it didn't find the type, it fallbacked to compiling many many system
headers in order to add a symbol for a given type into the program. This was
supposed to be used for missing glibc malloc symbols like malloc_chunk.

However, the exact reason it was used: the struct malloc_chunk was never
defined in a header file and was always defined in a malloc.c or another
.c file in glibc sources.

Another place the typeinfo.load logic of compiling headers was/is used
is the `dt` command, which is a windbg alias for getting struct layout
type information, e.g.:
```
pwndbg> dt 'struct malloc_chunk'
struct malloc_chunk
    +0x0000 mchunk_prev_size     : size_t
    +0x0008 mchunk_size          : size_t
    +0x0010 fd                   : struct malloc_chunk *
    +0x0018 bk                   : struct malloc_chunk *
    +0x0020 fd_nextsize          : struct malloc_chunk *
    +0x0028 bk_nextsize          : struct malloc_chunk *
pwndbg>
```

However, the whole big issue with typeinfo.load compilation of headers
was that most of the time it didn't work because e.g. some headers
defined in other paths were missing or that two different headers used
the same struct/function name and the compilation failed.

Since this logic almost never gave good results, I am removing it.

Regarding UX for statically linked binaries: we use `info dll` command
to see if a binary is statically linked. While this method is not
robust, as it may give us wrong results if the statically linked binary
used `dlopen(...)` it is probably good enough.

Now, if a heap related command is executed on statically linked binaries, it
will inform the user and set the resolving of libc heap symbols via
heuristics. Then, it also says to the user they have to set the glibc
version and re-run the command.
@disconnect3d
Copy link
Member Author

@CptGibbon @lebr0nli pls review that if u have time

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2022

Codecov Report

Merging #1173 (0838426) into dev (4ad2d76) will increase coverage by 0.10%.
The diff coverage is 12.50%.

@@            Coverage Diff             @@
##              dev    #1173      +/-   ##
==========================================
+ Coverage   52.66%   52.76%   +0.10%     
==========================================
  Files         178      178              
  Lines       20037    20004      -33     
  Branches     1835     1830       -5     
==========================================
+ Hits        10552    10555       +3     
+ Misses       9082     9041      -41     
- Partials      403      408       +5     
Impacted Files Coverage Δ
pwndbg/gdblib/typeinfo.py 89.09% <0.00%> (+35.24%) ⬆️
pwndbg/commands/__init__.py 67.09% <7.14%> (-2.59%) ⬇️
pwndbg/heap/ptmalloc.py 38.44% <100.00%> (ø)
pwndbg/net.py 32.95% <0.00%> (ø)
tests/test_attachp.py 88.13% <0.00%> (ø)
pwndbg/commands/elf.py 26.31% <0.00%> (ø)
pwndbg/commands/rop.py 39.39% <0.00%> (ø)
pwndbg/commands/argv.py 51.25% <0.00%> (ø)
pwndbg/commands/vmmap.py 43.47% <0.00%> (ø)
pwndbg/symbol.py 41.62% <0.00%> (+0.28%) ⬆️
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@CptGibbon
Copy link
Contributor

Yeah this is so much better ❤️

Only snag is that heap still raises an error after printing the help message.

heap_error

@lebr0nli
Copy link
Contributor

I have the same opinion as @CptGibbon, this enhancement looks great!

Btw the snag @CptGibbon mentioned is my fault, I'm sorry.

I did that before because I think raising an exception here will make it easier to debug, but seems that it's not a good idea to show that exception to the user.
(#1075 (comment))

Maybe we should still raise it but somehow we change the message of that exception to a more user-friendly message?
Or maybe there is a better way to stop all the functions without using exception?
(I'm not sure, I didn't have a good idea about handling this exception yet.)

@disconnect3d disconnect3d merged commit 1ca4d2d into dev Sep 28, 2022
@disconnect3d
Copy link
Member Author

@lebr0nli I believe that exception is thrown due to a glibc version check.

We should refactor the code so that the glibc version is checked earlier on, and, if it is not set, a proper message is displayed to the user to set it.

We could also......... fallback and set glibc version to the glibc installed in the user system and warn them that we did exactly this but the question is: can we retrieve the version the user's system glibc is compiled with? Probably through reading symbolic links?

@disconnect3d disconnect3d deleted the enhance-heap-with-no-dbgsyms branch September 28, 2022 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants