Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added support for remotely specifiying debugee and re-running targets #50

Merged
merged 3 commits into from Aug 31, 2016

Conversation

jr64
Copy link
Contributor

@jr64 jr64 commented Aug 22, 2016

Hi, I've added a new option to gef-remote so you can tell the gdbserver which executable you want to run. Additionally this patch allows you to restart processes and specify arguments (when you are connected in extended-remote mode).

Example (server started with gdbserver --multi 0.0.0.0:1337):

gef➤  gef-remote -f /bin/ls 10.0.0.83:1337
[+] Connected to '10.0.0.83:1337'
Reading /bin/ls from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /bin/ls from remote target...
[+] Target is not running, waiting for start
gef➤  run "/proc"
`target:/bin/ls' has disappeared; keeping its symbols.
Starting program: target:/bin/ls "/proc"
Reading /lib/ld-linux.so.2 from remote target...
Reading /lib/ld-linux.so.2 from remote target...
Reading /lib/ld-2.11.2.so from remote target...
Reading /lib/.debug/ld-2.11.2.so from remote target...
[+] Targeting PID=3742
[+] Downloading remote information
[+] Remote information loaded, remember to clean '/tmp/gef/3742' when your session is over
Reading /lib/libselinux.so.1 from remote target...
Reading /lib/librt.so.1 from remote target...
Reading /lib/libacl.so.1 from remote target...
Reading /lib/libc.so.6 from remote target...
Reading /lib/libdl.so.2 from remote target...
Reading /lib/libpthread.so.0 from remote target...
Reading /lib/libattr.so.1 from remote target...
Reading /lib/librt-2.11.2.so from remote target...
Reading /lib/.debug/librt-2.11.2.so from remote target...
Reading /lib/libc-2.11.2.so from remote target...
Reading /lib/.debug/libc-2.11.2.so from remote target...
Reading /lib/libdl-2.11.2.so from remote target...
Reading /lib/.debug/libdl-2.11.2.so from remote target...
Reading /lib/libpthread-2.11.2.so from remote target...
Reading /lib/.debug/libpthread-2.11.2.so from remote target...
[Inferior 1 (process 3742) exited normally]
[*] No debugging session active

And (gdbserver 0.0.0.0:1337 /bin/ls):

gef➤  gef-remote -E 10.0.0.83:1337
warning: Could not load vsyscall page because no executable was specified
try using the "file" command first.
0xb7fe3850 in ?? ()
[+] Connected to '10.0.0.83:1337'
[+] Targeting PID=3747
[+] Downloading remote information
[+] Remote information loaded, remember to clean '/tmp/gef/3747' when your session is over
gef➤  r
Starting program:  
warning: Could not load vsyscall page because no executable was specified
try using the "file" command first.
[+] Targeting PID=3750
[+] Downloading remote information
[+] Remote information loaded, remember to clean '/tmp/gef/3750' when your session is over
[Inferior 1 (process 3750) exited normally]
[*] No debugging session active

This works by adding a hook to gdb that downloads the remote info each time "continue" is executed and the PID has changed since the last time.

One thing I had to change to make this work is that in extended-remote mode, setup_remote_environment no longer uses the file command on the local, downloaded file because then the next run would start the local file instead of the remote one. I hope this doesn't break anything in the other commands.

@@ -3246,22 +3246,25 @@ class RemoteCommand(GenericCommand):
def __init__(self):
super(RemoteCommand, self).__init__(prefix=False)
self.add_setting("proc_directory", "/proc")
gdb.events.cont.connect(self.continue_handler)
Copy link
Owner

@hugsy hugsy Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason you've chosen to add a GDB continue event handler instead of gdb.events.new_objfile?
Not that I care so much about performance, but your handler will be executed every time GDB continues an execution. By the look of your patch, I believe new_objfile would be more appropriate, but maybe there is a reason you did not use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right, new_objfile is the better solution, I simply overlooked it when reading the docs. I'll fix it today if I have time or else tomorrow.

@hugsy
Copy link
Owner

hugsy commented Aug 23, 2016

No worries at all.

I will review your code when you re-submit.

And also, since we are adding a new feature, I would ask you to add the proper documentation about it in the file docs/features.md (search for gef-remote).

Cheers,

@jr64
Copy link
Contributor Author

jr64 commented Aug 24, 2016

I played around with gef-remote a bit more today and I think there are way too many combinations and features (for instance with remote/extended-remote, file and set remote exec-file) to support them all with gef-remote, so I chose another approach.

This new patch makes it so that all remote features should work even if you manually connect with the target command. Of course I left gef-remote in as a convenience function and for backwards compatibility.

What I implemented is dynamic detection of remote debugging and lazy loading + local caching of remote files. For instance if gef tries to access /proc/1234/exe and remote debugging is active, the file will be downloaded and cached in /tmp/gef from where it is reused on the next access. In my opinion this is a better approach because it doesn't force the use of gef-remote and is a lot more flexible. The downside is that the cache checking costs a bit more performance than the old method.

Additionally, I expanded the "download all libs" feature (-A): It previously only downloaded the libraries currently loaded at the time you attach, now it listens on new_objfile events and also grabs libraries loaded later.

However there is still one major problem which existed before and I didn't fix: /proc/$pid/maps is only fetched once and never updated afterwards. With my patch you could change the "True" in line 1519 to "False" to disable the cache, but I found that get_process_maps is called a lot so this might not be the best approach. I thought about invalidating the maps cache each time the target is continued, but it still might be modified in the background by another thread. Please let me know what you think the most appropriate solution would be, thanks!

@hugsy
Copy link
Owner

hugsy commented Aug 25, 2016

I really like your ideas, but your diff is actually hard to read because it appears that you've manually merged some newly pushed commits, instead of git-rebasing.

Can you rebase against the HEAD of the master branch and resubmit a "clean" patch ?

Regarding your note on /proc/$pid/maps, I rely on the function get_process_maps() which is cached for improving performance (with the @memoize decorator). But this cache can also be flushed by calling the reset_all_caches() function. This way your next call to get_process_maps() will force gef to fetch the updated values. Hope this helps, and looking forward to read your final patch.

@jr64
Copy link
Contributor Author

jr64 commented Aug 25, 2016

Looks like I got the version of the script I based the changes on mixed up the second time - sorry for that!

Regarding the @memoize decorator, unless I'm missing something, I don't think it is currently working as it is supposed to be.

def __call__(self, *args):
    if args not in self.cache:
        value = self.func(*args)
        self.cache[args] = value
            return value
    return self.func(*args)

The last line is probably supposed to be return self.cache[args] or func will be called regardless of the cache status. Maybe I'm missing something but if you add a print to get_process_maps you see it called multiple times when a breakpoint is hit.

Is the user supposed to reset the cache manually with reset-cache? I didn't find any other calls to reset().

hugsy pushed a commit that referenced this pull request Aug 25, 2016
@hugsy
Copy link
Owner

hugsy commented Aug 25, 2016

I've updated the memoize mechanism thanks to functools. It works as expected, better and more pythonic code 😄

For the moment yes, the user manually resets the cache, but in real life I never had to call it. I believe a better approach would be also to add a gdb.events on new_objfiles. It would probably cleaner and less confusing... I'll investigate that on the weekend.

Thanks for your feedbacks to make gef better!

Cheers,

@Grazfather
Copy link
Collaborator

Why not use the built in memoizer, lru_cache?

It is possible to use `gef` in a remote debugging environment.
It is possible to use `gef` in a remote debugging environment. Required files
will be automatically downloaded and cached in a temporary directory (`/tmp/gef` on most
Unix systems). Remember to manually delete the cache if you change the target file or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it use the info file information or something to at least get the binary name, and not sure an old binary ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is better to rely on gdb.events instead. That is the only way to trigger some actions when the target binary has changed, and only when it has changed.

@hugsy
Copy link
Owner

hugsy commented Aug 25, 2016

lru_cache was introduced in Python 3.2 and I want to preserve the Python2/Python3 compatibility since I still happen to find quite a lot of GDB compiled with Python2.

@Grazfather
Copy link
Collaborator

Ah that makes sense.

Hopefully the memoization fix will speed gef up noticeably.

@hugsy
Copy link
Owner

hugsy commented Aug 25, 2016

@Grazfather : Yeah me too 😄 I don't know when the first version of the memoize stopped working as expected, but yeah, it is better to cache some info since they are expensive to get, and they will not change as long as the debug session is running.

@jr64 : I'll definitely have a look in the next few days at how to use a gdb.events.new_objfile so that user never have to reset the cache manually, but it does not seem that hard to add.

@jr64
Copy link
Contributor Author

jr64 commented Aug 26, 2016

@hugsy : Great to hear, I'm looking forward to it.

hugsy pushed a commit that referenced this pull request Aug 29, 2016
… new_objfile to detect when cache should be resetted
@hugsy
Copy link
Owner

hugsy commented Aug 29, 2016

Hi @jr64

From all the tests I did, now gef is fully capable of detecting when a new file is being loaded, and will reset the memoized versions of any function that must be (see commit 63d5b49).
Feel free to try it on your end, and if it works, rebase to this version to create your patch. Looking forward to know how turns out 😉

Cheers,

@jr64
Copy link
Contributor Author

jr64 commented Aug 29, 2016

Hey @hugsy
Your patch works great, but I can still think of some situations where the cache would be outdated.

For instance, if you have a breakpoint before and after mmaping a file, those new changes will not be reflected in vmmap because that doesn't trigger the new_objfiles event and it still uses the cached version.

In my opinion, the best solution would be to explicitly invalidate the /proc/$pid/maps cache when the vmmap command is executed. Additionally, I'd add the option to automatically invalidate the maps cache each time a breakpoint is hit. In most scenarios you are probably debugging a reasonably fast machine on a reasonably fast (most likely local) connection and the overhead of fetching /proc/$pid/maps on each breakpoint is negligible. If not, it could still be turned off. The information might still be outdated because some background thread maps something but I guess that shouldn't happen too often.

Imho accurately displaying the memory-layout at all times is really important, especially for exploit development and well worth the little bit of overhead.

@hugsy
Copy link
Owner

hugsy commented Aug 29, 2016

Hey @jr64

I totally agree, having an accurate view of the process memory layout is primordial.
But your "vmmap" solution is dodgy, although I agree with your point. vmmap is more or less the GDB command to call the gef function get_process_maps(). This is the function that is cached.

Waiting for the user to re-type vmmap to invalidate the cache means that all other functions and commands that depends on the memory mapping might still be using outdated values. So I won't do it, especially since there is also the more explicit command reset-cache. In addition, performance is a bit of an issue especially for QEMU VMs, which are very slow already (that is why I added caches in a first place).

Refreshing the cache at breakpoints might do the trick. Need more tests...

@jr64
Copy link
Contributor Author

jr64 commented Aug 29, 2016

@hugsy I meant resetting the cache when executing vmmaponly as an additional measure just so that the info displayed is always up-to-date when the user explicitly requests the information (for instance in the case where a background thread modifies something while execution is paused or if refreshing the cache each breakpoint isn't possible for performance reasons). I totally agree with you that only invalidating on execution of the vmmap command isn't enough, maybe I should have been more clear.

If you don't want to do that, I think documentation should be more clear - most users probably wouldn't expect that vmmapmight display outdated information if you don't run reset-cache first.

@hugsy
Copy link
Owner

hugsy commented Aug 30, 2016

It's not so much that I'm against implementing it, I just don't think it is the best approach. Need more time to think about it though. I would prefer an automated approach, preferably based on events or something similar.

In any case, yes I'll add a note on the vmmap documentation.

@jr64
Copy link
Contributor Author

jr64 commented Aug 30, 2016

I think you are right, hopefully you can find a more "complete" solution to the problem.

Anyways, I fixed the merge conflict, looking forward to your review.

@hugsy
Copy link
Owner

hugsy commented Aug 31, 2016

iz all guud 😄

thanks for your contrib !

@hugsy hugsy merged commit 9dce8bf into hugsy:master Aug 31, 2016
@hugsy
Copy link
Owner

hugsy commented Aug 31, 2016

And the note regarding the cache was added in commit c842e4c.

@jr64
Copy link
Contributor Author

jr64 commented Aug 31, 2016

Great to hear, thank you very much!

SakiiR pushed a commit to SakiiR/gef that referenced this pull request Jul 1, 2019
SakiiR pushed a commit to SakiiR/gef that referenced this pull request Jul 1, 2019
…d and new_objfile to detect when cache should be resetted
SakiiR pushed a commit to SakiiR/gef that referenced this pull request Jul 1, 2019
added support for remotely specifiying debugee and re-running targets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants