
Running r2pipe Python in batch #123

Closed
zavalyshyn opened this issue Nov 30, 2020 · 2 comments

Comments

zavalyshyn commented Nov 30, 2020

Describe the issue

I'm using r2pipe to extract call-graph info from all the binaries in a given folder. For each binary I open it, run the "aaa" command, and then extract the call graph in r2 commands format with the "agC*" command. There is no specific bug per se: r2pipe works as intended, but it takes quite a lot of time to run through all the binaries.

I've checked the examples folder for how to use r2pipe in batch, but the code there is somewhat simplified.
What would you suggest to improve the runtime?
For instance, do I really need to quit r2 after each file?

How to reproduce?

Here is my code:

import os
import hashlib
from multiprocessing import Pool

import r2pipe

binaries_list = os.listdir(binaries_dir)
batchsize = 1000  # process files in batches of 1000
total_count = len(binaries_list)
hash_db = set()   # md5 digests of call graphs seen so far

def parseglobalcallgraph(filename):
    filepath = os.path.join(binaries_dir, filename)
    r2 = r2pipe.open(filepath, ["-e", "io.cache=true"])
    r2.cmd('aaa')
    gcg = r2.cmd("agC*")  # extract global call graph in r2 commands format
    r2.quit()
    hash_value = hashlib.md5(gcg.encode()).hexdigest()
    return {'hash': hash_value, 'filename': filename}

for i in range(0, total_count, batchsize):
    batch = binaries_list[i:i + batchsize]
    with Pool(processes=10) as pool:
        for res in pool.imap(parseglobalcallgraph, batch):
            if res['hash'] not in hash_db:
                hash_db.add(res['hash'])
                print(res['hash'])
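As an aside, the Pool-plus-dedup pattern in the snippet above can be exercised without radare2 at all. A minimal, self-contained sketch, where the r2pipe parse step is replaced by a stand-in that just hashes a string (so it runs anywhere Python does):

```python
import hashlib
from multiprocessing import Pool

def parse_stub(data):
    # Stand-in for the r2pipe call: hash the "call graph" text directly.
    return hashlib.md5(data.encode()).hexdigest()

if __name__ == "__main__":
    inputs = ["graph-a", "graph-b", "graph-a"]  # one duplicate on purpose
    hash_db = set()
    with Pool(processes=2) as pool:
        # imap yields results lazily, in input order, as workers finish
        for digest in pool.imap(parse_stub, inputs):
            if digest not in hash_db:
                hash_db.add(digest)
    print(len(hash_db))  # 2 unique call graphs
```

The set makes the `else: continue` branch in the original unnecessary; membership testing alone handles the deduplication.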

Expected behavior

I'd expect it to be much faster, but it seems like I'm missing something.


trufae (Contributor) commented Aug 24, 2021

r2pipe is slow, partly because of Python and partly because of the way it reads data from the pipe. You can use the native r2pipe by prefixing the filepath with ccall://, so it will use dlopen(r_core) and make direct C API calls. That will make the script at least 10 times faster.

You can help by improving the r2pipe module and profiling this issue. Other languages don't have this problem.
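A minimal sketch of that suggestion, assuming radare2 and the r2pipe Python module are installed (the helper name native_uri is just for illustration, not part of the r2pipe API):

```python
def native_uri(filepath):
    """Return an r2pipe URI using the ccall:// prefix.

    As described above, the prefix makes r2pipe dlopen() the r_core
    library and issue direct C API calls instead of spawning an r2
    process and talking to it over a pipe.
    """
    return "ccall://" + filepath

# In parseglobalcallgraph() above, the only change would be:
#   r2 = r2pipe.open(native_uri(filepath))
```

Whether command-line flags such as -e io.cache=true are honored the same way in native mode is worth verifying before relying on them.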

zavalyshyn (Author) replied:

Many thanks! I didn't know you could do that with prefixes.
