BridgedIterables are inherently slow #24

Closed
fmagin opened this issue Jun 18, 2019 · 6 comments


@fmagin
Contributor

fmagin commented Jun 18, 2019

Due to all the round trips, BridgedIterables are slow to the point of being unusable for interactive use, at least with large binaries (~6-7k functions).
%time [ f for f in fm.getFunctions(True)] takes nearly 2 minutes for me in this case.

My idea would be to extend BridgedIterables with a special method (list() maybe) that converts the iterable to a list on the server side of the bridge (which should be fairly fast) and then transfers it over the bridge as one message. I am not sure this will work well, given that the result will just be a list of references to objects on the other side of the bridge, so accessing them while iterating over the list will be slow again. A more general special function that evaluates arbitrary expressions on the server side and transfers only the result might be an option too. It would lead to uglier code, but the speedup should justify it IMO. E.g. getting the list of all function names ([ f.name for f in fm.getFunctions(True)]) should take roughly a few hundred ms this way vs. over two minutes.
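To make the round-trip argument concrete, here is a toy model in Python. It is only a sketch: FakeBridge, next_item, get_attr and the func_N names are made up for illustration and are not part of ghidra_bridge, and a real bridged iterator will need more messages per element than this model counts.

```python
from collections import namedtuple

# Hypothetical stand-in for a remote function object with a name attribute.
Function = namedtuple("Function", ["name"])


class FakeBridge:
    """Toy bridge that counts messages instead of talking over a socket."""

    def __init__(self, remote_objects):
        self.remote_objects = remote_objects  # objects living on the "server" side
        self.round_trips = 0

    def next_item(self, index):
        # One request/response per element, as with a proxied iterator.
        self.round_trips += 1
        return self.remote_objects[index]

    def get_attr(self, obj, name):
        # One more request/response for every attribute access on a proxy.
        self.round_trips += 1
        return getattr(obj, name)

    def remote_eval(self, func):
        # The whole expression runs server-side; only the result is transferred.
        self.round_trips += 1
        return func(self.remote_objects)


functions = [Function("func_%d" % i) for i in range(7000)]

# Per-item access: fetch each element, then fetch its name.
bridge = FakeBridge(functions)
names_slow = [bridge.get_attr(bridge.next_item(i), "name") for i in range(len(functions))]
per_item_trips = bridge.round_trips

# Server-side evaluation: a single message carries the finished list of names.
bridge = FakeBridge(functions)
names_fast = bridge.remote_eval(lambda funcs: [f.name for f in funcs])
batched_trips = bridge.round_trips

print(per_item_trips, batched_trips)
```

In this model the cost of the per-item version grows with the number of functions, while the server-side version stays at one message regardless of size.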

Another approach might be to improve the speed of the bridge itself (e.g. an option to use OS-specific IPC instead of TCP). A quick benchmark on my machine (using the code from https://stackoverflow.com/questions/14973942/tcp-loopback-connection-vs-unix-domain-socket-performance) shows ~5 times higher throughput and about half the latency when using Unix sockets vs. TCP. This does not account for any Python overhead though.
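For anyone who wants to repeat the latency comparison from Python, a rough sketch follows. It is not the benchmark from the Stack Overflow link; it is a simple 1-byte ping-pong over a TCP loopback connection and over a Unix domain socket pair. All function names are made up for this example, it needs a POSIX system (socket.AF_UNIX), and the numbers will vary between machines.

```python
import socket
import threading
import time


def echo_server(conn):
    """Echo bytes back until the peer closes the connection."""
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def ping_pong_latency(client, rounds):
    """Return the mean round-trip time in seconds for 1-byte messages."""
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"x")
        client.recv(1)
    return (time.perf_counter() - start) / rounds


def tcp_pair():
    """Create a connected (client, server) pair over TCP loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    # Disable Nagle's algorithm so small messages are sent immediately.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    server.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return client, server


def unix_pair():
    """Create a connected pair of Unix domain stream sockets (POSIX only)."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def measure(make_pair, rounds=1000):
    """Run the ping-pong against an echo thread and return the mean latency."""
    client, server = make_pair()
    thread = threading.Thread(target=echo_server, args=(server,), daemon=True)
    thread.start()
    try:
        return ping_pong_latency(client, rounds)
    finally:
        client.close()  # the echo thread sees EOF and exits
        thread.join()


tcp_latency = measure(tcp_pair)
unix_latency = measure(unix_pair)
print("TCP loopback: %.1f us per round trip" % (tcp_latency * 1e6))
print("Unix socket:  %.1f us per round trip" % (unix_latency * 1e6))
```

This measures latency only, and it includes the Python overhead of the send/recv calls, so it should not be expected to reproduce the ratios quoted above.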

@fmagin
Contributor Author

fmagin commented Jun 18, 2019

I have code for this eval command btw; I just haven't managed to access the special variables like currentProgram on the remote side yet. Can push this later.

@justfoxing
Owner

The list() idea is the simplest, but as you note, may run into problems down the line. I'd be interested in seeing your eval suggestion.

Unix sockets aren't something that I'm keen on - I do a lot of my work on Windows as well, so ghidra_bridge needs to be cross-platform.

@fmagin
Contributor Author

fmagin commented Jun 19, 2019

Currently trying to create a PoC with UnixStreamSockets, which should be trivial, but it fails because the SocketServer module in Ghidra's Jython doesn't provide UnixStreamSockets...
This would obviously sit behind some flag or OS detection; I agree with you that cross-platform support should be the goal. But the potential order-of-magnitude speedup would still justify some extra logic for this IMO, especially for headless analysis tasks, which probably run on some Linux server anyway.

The eval code I have is fairly simple: I just added a remote_eval function somewhere that sends a string, which gets evaluated inside eval() on the server, and the result gets sent back. But I don't understand how to access the namespace and context yet. Maybe I am greatly underestimating the complexity needed for this.
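One possible way to handle the namespace question, sketched in plain Python 3: build the server-side handler around a dictionary of names that expressions may use. This is not the code from the linked commit. make_remote_eval and the SimpleNamespace stand-ins for currentProgram are hypothetical, and the real server side runs under Jython, where details may differ.

```python
from types import SimpleNamespace


def make_remote_eval(namespace):
    """Build a server-side handler that evaluates expressions against namespace."""

    def remote_eval(expression, **extra):
        # Copy so one call cannot leak names into the next.
        scope = dict(namespace)
        scope.update(extra)
        # Pass the scope as globals: names used inside comprehensions are
        # looked up in globals, so passing it only as locals could fail.
        return eval(expression, scope)

    return remote_eval


# Hypothetical stand-ins for the objects Ghidra would provide on the server.
fake_functions = [SimpleNamespace(name="main"), SimpleNamespace(name="helper")]
function_manager = SimpleNamespace(getFunctions=lambda forward: iter(fake_functions))
current_program = SimpleNamespace(functionManager=function_manager)

remote_eval = make_remote_eval({"currentProgram": current_program})
names = remote_eval("[f.name for f in currentProgram.functionManager.getFunctions(True)]")
print(names)
```

Since eval() runs arbitrary code, a handler like this only makes sense when the bridge is reachable by trusted clients only.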

That feature could then be wrapped in some IPython magic so:

%ghidra_eval
[ f.name for f in currentProgram.functionManager.getFunctions(True)]

Would send the entire cell to the server to be evaluated and only return the result.
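A sketch of how such a magic could be wired up as an IPython extension. The magic name ghidra_eval and the evaluator parameter are assumptions for illustration. The default evaluator is the local built-in eval so the example runs on its own; in real use it would be replaced by whatever function sends the string over the bridge.

```python
def ghidra_eval(line, cell=None, evaluator=eval):
    """Evaluate the cell body (or the line, for a line magic) and return the result.

    evaluator is a placeholder: it defaults to the local eval here and would
    be swapped for a function that evaluates the string on the server.
    """
    source = cell if cell is not None else line
    return evaluator(source.strip())


def load_ipython_extension(ipython):
    """Called by %load_ext; registers ghidra_eval as a line and cell magic."""
    ipython.register_magic_function(
        ghidra_eval, magic_kind="line_cell", magic_name="ghidra_eval"
    )


# The function can be exercised without IPython by passing line/cell directly.
print(ghidra_eval("", "\n[x * 2 for x in range(3)]\n"))
```

With the extension loaded, the cell form would be invoked with a double percent sign (%%ghidra_eval) and the single-line form with one.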

@fmagin
Contributor Author

fmagin commented Jun 20, 2019

The basic idea around eval can be seen in fmagin@3584047

I am not sure how to support accessing the relevant variables, or whether it is possible to access them from the client side too somehow.

@fmagin
Contributor Author

fmagin commented Jun 20, 2019

I managed a trick to at least benchmark this, so I can illustrate what kind of speed up this would yield:

In [28]: %time len([f.name for f in currentProgram.functionManager.getFunctions(True)])
CPU times: user 20.6 s, sys: 3.78 s, total: 24.4 s
Wall time: 3min 12s
Out[28]: 9118

In [29]: %time b.bridge.remote_eval("len([f.name for f in self.handle_dict['%s'].local_obj.functionManager.getFunctions(True)])" % currentProgram._bridge_handle)
CPU times: user 1.31 ms, sys: 206 µs, total: 1.51 ms
Wall time: 83.3 ms
Out[29]: 9118

So roughly a factor of 2000 in wall time (3min 12s vs. 83.3 ms).

This might also be the hint I need to make this work. I will open a PR for this once I can at least access the currentProgram variable (which would probably be the most important variable anyway).

@justfoxing
Owner

Cheers for that, pulled most of it in with some changes, releasing as v0.0.7
