Basic jemalloc command for printing arenas info with bin #2176

Draft: wants to merge 11 commits into base: dev

jetchirag
Contributor

For #2174

First jemalloc command, which prints basic arena details. It is not modularized yet and is currently only for basic testing purposes.

pwndbg> arenas_info
Number of arenas:  1
Arena Address:  0x7ffff78010c0
Index   Address Slabcur
0 0x7ffff7814528 0x7ffff7816980
1 0x7ffff7814608 0x0
2 0x7ffff78146e8 0x0
...

Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
@jetchirag jetchirag marked this pull request as draft May 20, 2024 21:34
@CptGibbon
Collaborator

Awesome!
I'll get jemalloc installed and take a look at this 👍

@CptGibbon CptGibbon self-requested a review May 24, 2024 16:15
@CptGibbon
Collaborator

CptGibbon commented May 24, 2024

@jetchirag
Huh I'm seeing the below error when I try the new command whilst debugging the example program from #2174
What am I doing wrong?

Screenshot 2024-05-24 at 09 47 34

My jemalloc version is 5.3.0-182-gf9c0b5f7f8a917661db39289e38ec94d9d198f11

@jetchirag
Contributor Author

@CptGibbon Are you on the dev branch? It was renamed; I'm using 5.3.0. Btw, to update on progress: I'm currently working on parsing the rtree, which stores extent metadata, and will add a command for that soon.

@CptGibbon
Collaborator

CptGibbon commented May 24, 2024

Are you on dev branch?

I'm on the jemalloc_1 branch that this PR is based on.

EDIT: Oh I guess you meant dev branch of jemalloc?
In which case: yes.
Which branch should I use? I see master, stable-3 & stable-4, but they're all considered "stale" with their last update at least a couple of years ago.

@jetchirag
Contributor Author

jetchirag commented May 24, 2024

Are you on dev branch?

I'm on the jemalloc_1 branch that this PR is based on.

EDIT: Oh I guess you meant dev branch of jemalloc? In which case: yes. Which branch should I use? I see master, stable-3 & stable-4, but they're all considered "stale" with their last update at least a couple of years ago.

The latest release, version 5.3.0.

Edit: you can download it from https://github.com/jemalloc/jemalloc/releases/tag/5.3.0

@CptGibbon
Collaborator

Ah okay thanks 👍
Screenshot 2024-05-24 at 14 25 52

@CptGibbon
Collaborator

Great stuff, just a couple of notes for this PR:

  1. I think renaming the command to something with "jemalloc" in the title makes it clear which allocator it's for (this will be cumbersome for now but if we implement automatic allocator detection later it could change to something more terse)
  2. Specify somewhere which version of jemalloc this is for (perhaps just print it in the command output for now, we can do version detection later) that way people won't be confused like I was if it doesn't work
  3. Amend setup-dev.sh so that reviewers don't have to manually install jemalloc and build your test program

Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
@jetchirag
Contributor Author

@CptGibbon Added prefixes to command, and also installation script. Also pushed some WIP code for rtree parsing.

@CptGibbon
Collaborator

@CptGibbon Added prefixes to command, and also installation script. Also pushed some WIP code for rtree parsing.

Okay thanks, great work 👍
I would recommend creating tests for these new jemalloc commands before adding any more to this PR.
Let me know if you want help writing them.

@CptGibbon
Collaborator

I'm seeing a few lint commits; I find that working in an IDE that supports devcontainers can alleviate this.
pwndbg has a devcontainer setup that will automatically install the appropriate formatting & linting extensions and run them each time you save a file.

…nd extent data

Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
@jetchirag
Contributor Author

So I have removed the old commands and added a proper class for generating jemalloc's memory mapping. There are lots of TODOs: I'll be adding one or two classes for the rtree (including one for finding extent information from a pointer returned by malloc()) and for other structures, then move on to the actual command, followed by tests. I'd also like to get started on tests early, since there's the previously discussed problem of testing memory addresses.

Things were slow over the past 2 weeks. I did read a lot of documentation on this and will try to pick up the pace now.

Also, thank you for the devcontainers suggestion, though it seems it didn't fix linting on setup-dev.sh, or maybe I forgot to run the linter on it. It's just whitespace; I'll fix that in the next commit, hopefully.

Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
Signed-off-by: Chirag Aggarwal <thechiragaggarwal@gmail.com>
@jetchirag
Contributor Author

@CptGibbon I think we now have something that resembles the malloc_chunk and heap commands, which is the primary goal. The structure's still not there yet and tests aren't done, but it would be great if I can get feedback on this so we can make changes early.

@CptGibbon
Collaborator

@CptGibbon ... it would be great if I can get feedback on this so we can make changes early.

No problem, I'll review today.
Thanks for the update 🙏

@CptGibbon
Collaborator

@jetchirag
I loaded your test program into GDB and ran your new commands.
Screens of my experience below:

jemalloc_heap

Screenshot 2024-06-20 at 15 56 46

jemalloc_heap appears to work, though some of those addresses look spurious, can you confirm this is correct behavior?

find_extent

Screenshot 2024-06-20 at 16 02 27

find_extent appears to work, again could you confirm this is correct behavior?

extent_info

Used by the jemalloc_heap and find_extent commands.

jemalloc_base_info

Screenshot 2024-06-20 at 16 06 04

No dice.

get_extent

Screenshot 2024-06-20 at 16 08 48

Not sure if I'm using the correct address as an argument here but nothing I try works...

jemalloc_test

Screenshot 2024-06-20 at 16 09 45

One last note; if you want your test program to get linked with jemalloc you'll need to add a corresponding line to the makefile that builds tests.

@jetchirag
Contributor Author

Thank you for testing @CptGibbon
Currently only jemalloc_heap, jemalloc_find_extent and jemalloc_extent_info are meant to be used; the rest are just for testing purposes. Since I've changed the structure a lot, some of the old test commands will have broken, which is fine.

Your output for heap and find extent looks as expected. Yes, there are some false addresses; some of them seem to be due to cache_oblivious. I'll investigate that and either fix it or add a note in the output if needed. Any comment on the code itself?

@CptGibbon
Collaborator

CptGibbon commented Jun 21, 2024

Any comment on the code itself?

I'm usually of the mind that if it works it works, but I can take a closer look 👍

So far my only concern aside from the address issue is that jemalloc_heap was rather slow to complete, at around 3 seconds. This was running on a free codespaces instance of the smallest variety, but that's still rather slow compared to most commands.

Great work 👌

@jetchirag
Contributor Author

So far my only concern aside from the address issue is that jemalloc_heap was rather slow to complete, at around 3 seconds. This was running on a free codespaces instance of the smallest variety, but that's still rather slow compared to most commands.

Yes, that has been my concern as well. I've mentioned this in a comment: it's happening due to the size of the rtree. I was trying to find out if there's an alternative way; perhaps it maintains caches or thread-specific information. I'll check.
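
To illustrate the cost difference being discussed, here is a toy model of a two-level radix tree keyed by address bits. It is only a sketch: the class name, the bit widths and the dict-based storage are assumptions made for illustration and do not match jemalloc's real rtree constants or layout. It shows that looking up one pointer touches two slots, while listing everything has to visit every populated slot (and in a debugger each visit is a memory read).

```python
# Toy two-level radix tree keyed by address bits.
# All bit widths below are illustrative assumptions, not jemalloc's constants.
PAGE_BITS = 12   # low bits ignored (offset within a page)
LEAF_BITS = 18   # bits that index into a leaf
ROOT_BITS = 18   # bits that index into the root


def split_key(addr):
    """Split an address into (root_index, leaf_index)."""
    page = addr >> PAGE_BITS
    leaf_index = page & ((1 << LEAF_BITS) - 1)
    root_index = (page >> LEAF_BITS) & ((1 << ROOT_BITS) - 1)
    return root_index, leaf_index


class ToyRtree:
    def __init__(self):
        # Sparse storage: only populated slots are kept.
        self.root = {}

    def insert(self, addr, metadata):
        root_index, leaf_index = split_key(addr)
        self.root.setdefault(root_index, {})[leaf_index] = metadata

    def lookup(self, addr):
        """Single lookup: one root slot, one leaf slot."""
        root_index, leaf_index = split_key(addr)
        return self.root.get(root_index, {}).get(leaf_index)

    def walk(self):
        """Full walk: visits every populated slot, yielding (page_address, metadata)."""
        for root_index in sorted(self.root):
            leaf = self.root[root_index]
            for leaf_index in sorted(leaf):
                page = (root_index << LEAF_BITS) | leaf_index
                yield page << PAGE_BITS, leaf[leaf_index]


tree = ToyRtree()
tree.insert(0x7ffff7816980, "extent A")
print(tree.lookup(0x7ffff7816000))  # same page as the inserted address
print([(hex(addr), meta) for addr, meta in tree.walk()])
```

If the real tree stores its levels as dense arrays rather than sparse maps, a full walk would also have to read the empty slots, which would be consistent with the slowdown described above.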

@CptGibbon
Collaborator

Just 1 note and 1 question:

A current development goal is to uncouple as much code from the gdb library as possible, hopefully this makes testing easier and makes room for different debugger backends.

One step towards this is gdblib.memory.fetch_struct_as_dictionary()

def fetch_struct_as_dictionary(
    struct_name: str,
    struct_address: int,
    include_only_fields: Set[str] | None = None,
    exclude_fields: Set[str] | None = None,
) -> GdbDict:
    struct_type = gdb.lookup_type("struct " + struct_name)
    fetched_struct = get_typed_pointer_value(struct_type, struct_address)
    return pack_struct_into_dictionary(fetched_struct, include_only_fields, exclude_fields)

Which will eventually become something like debugger.fetch_struct_as_dictionary().

An example use can be found in gdblib.heap.ptmalloc.fetch_chunk_metadata().

The transition to debugger-agnostic code is still underway, so if using the above function would slow you down too much it's not a show-stopper, but it's something to consider.
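
As a rough illustration of the decoupling idea, the sketch below reads a struct into a dictionary through a small reader interface instead of calling gdb directly. Everything in it is hypothetical: MemoryReader, FakeMemory, fetch_struct_as_dict and the two-field layout are made-up names for this example and are not pwndbg APIs. Only the include/exclude field parameters mirror the function quoted above.

```python
import struct
from typing import Dict, Optional, Protocol, Sequence, Set, Tuple


class MemoryReader(Protocol):
    """Hypothetical backend interface: anything that can read raw bytes."""

    def read(self, address: int, size: int) -> bytes: ...


class FakeMemory:
    """In-memory backend, so the parsing code can run without a debugger."""

    def __init__(self, base: int, data: bytes) -> None:
        self.base = base
        self.data = data

    def read(self, address: int, size: int) -> bytes:
        start = address - self.base
        if start < 0 or start + size > len(self.data):
            raise ValueError("read outside fake memory")
        return self.data[start:start + size]


# A layout is a list of (field_name, struct-module format code) pairs.
# Each code must describe exactly one value, e.g. "Q" or "I".
Layout = Sequence[Tuple[str, str]]


def fetch_struct_as_dict(
    reader: MemoryReader,
    layout: Layout,
    address: int,
    include_only_fields: Optional[Set[str]] = None,
    exclude_fields: Optional[Set[str]] = None,
) -> Dict[str, int]:
    # "<" means little-endian with no implicit padding (an assumption here).
    fmt = "<" + "".join(code for _, code in layout)
    raw = reader.read(address, struct.calcsize(fmt))
    values = struct.unpack(fmt, raw)
    result: Dict[str, int] = {}
    for (name, _), value in zip(layout, values):
        if include_only_fields is not None and name not in include_only_fields:
            continue
        if exclude_fields is not None and name in exclude_fields:
            continue
        result[name] = value
    return result


# Made-up two-field struct, used only to exercise the function.
EXAMPLE_LAYOUT = [("size", "Q"), ("addr", "Q")]
memory = FakeMemory(0x1000, struct.pack("<QQ", 0x20, 0x7ffff7816980))
print(fetch_struct_as_dict(memory, EXAMPLE_LAYOUT, 0x1000))
```

Because the parsing code only depends on read(), tests can feed it canned bytes, and a different debugger backend would only need to provide its own reader.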

I see you credit jegdb in the source, which code specifically is taken from jegdb?

@jetchirag
Contributor Author

@CptGibbon I'll check that function. I don't think there's much code taken directly from that script. The code for bit-shifting the bitfield was used directly:

https://github.com/pwndbg/pwndbg/pull/2176/files#diff-c96fafbf3d980a07d7f741d99185b64e1a09dd7371ac617dc6f7bffb25b7f68bR380-R389

and so was rtree_leaf_elm_bits_extent_get.

Primarily I used the script as a base to correlate with the jemalloc source code and understand the structures.
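
For readers unfamiliar with this kind of bit manipulation, here is a minimal sketch of recovering a pointer from a packed 64-bit word. It is not the code from this PR or from jegdb, and the layout is an assumption: the pointer is taken to occupy the low 48 bits, the upper 16 bits hold unrelated metadata, and the lowest bit is a flag. The real layout depends on the jemalloc version and build, and the names below are hypothetical.

```python
# Assumed layout, for illustration only:
#   bits 63..48  unrelated metadata
#   bits 47..1   pointer bits
#   bit  0       flag bit
WORD_BITS = 64
ADDRESS_BITS = 48
WORD_MASK = (1 << WORD_BITS) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def unpack_pointer(packed, low_flag_bits=1):
    """Drop the metadata bits, restore sign-extended high bits, clear the flags."""
    low = packed & ((1 << ADDRESS_BITS) - 1)
    extended = sign_extend(low, ADDRESS_BITS) & WORD_MASK
    flag_mask = (1 << low_flag_bits) - 1
    return extended & ~flag_mask & WORD_MASK


# Pack made-up metadata and a flag bit around an address, then recover it.
packed = (0x1C << ADDRESS_BITS) | 0x7ffff7816980 | 1
print(hex(unpack_pointer(packed)))
```

Python integers are unbounded, so the sign extension is done arithmetically and then masked back to 64 bits, instead of relying on a shift left followed by an arithmetic shift right as C code typically would.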
