In [1]:
from darter.file import parse_elf_snapshot, parse_appjit_snapshot

# Loading and parsing the snapshot file

Here we open the file to inspect. It actually contains *two* snapshots: one is the common base and the other contains the actual user code.  
`parse_elf_snapshot` extracts the two blobs of each snapshot and parses them.
It returns only the second snapshot, which is the interesting one.

By default we are inspecting `sample-app.so`, an included sample file which results from building the default Flutter app.  
To inspect another file, **place the filename here:**

In [3]:
fname = 'sample-app.so'
s = parse_elf_snapshot(fname)

------- PARSING VM SNAPSHOT --------

[Header]
  length = 4733
  kind = 2 ('kFullAOT', 'Full + AOT code')

[Snapshot header]
  version = 'c8562f0ee0ebc38ba217c7955956d1cb'
  features = {'product': True, 'use_bare_instructions': True, 'asserts"': False, 'causal_async_stacks': True, 'bytecode': False, 'arm-eabi': True, 'softfp': True}

  base objects: 95
  objects: 935
  clusters: 5
  code order length = 69

[002c1094]: INFO: Reading allocation clusters...
[002c13a9]: INFO: Reading fill clusters...
[002c2215]: INFO: Reading roots...
[002c2281]: INFO: Snasphot parsed.

------- PARSING ISOLATE SNAPSHOT --------

[Header]
  length = 836159
  kind = 2 ('kFullAOT', 'Full + AOT code')

[Snapshot header]
  version = 'c8562f0ee0ebc38ba217c7955956d1cb'
  features = {'product': True, 'use_bare_instructions': True, 'asserts"': False, 'causal_async_stacks': True, 'bytecode': False, 'arm-eabi': True, 'softfp': True}

  base objects: 935
  objects: 74247
  clusters: 222
  code order length = 7228

[00

If your snapshot is AppJIT instead of AppAOT, you can use `parse_appjit_snapshot`:

In [3]:
fname = 'appjit-sample.dart.snapshot'
s = parse_appjit_snapshot(fname)

FileNotFoundError: [Errno 2] No such file or directory: 'appjit-sample.dart.snapshot'

If the parsing was successful, then you are good to go.  
The parsed data is in `s`; we will now analyze it further.

# Analyzing parsed data

We will start by defining some basic functions, tables and stats to help us analyze the data:

In [4]:
from darter.constants import *
from darter.other import *
from collections import defaultdict

def is_relevant(src):
    if src[0] == s.refs['root'].x['global_object_pool'] or src[0] == s.refs['root'].x['symbol_table']:
        return False
    return True

def show_rev_tree(ref, depth=4, max_srcs=5, i_step=4, hide_irrelevant=True):
    ''' Shows a tree of back-references to an object; that is, things pointing to it. '''
    def show_src(src, depth, roots=set(), indent=0):
        if src[0] in roots: return
        if hide_irrelevant and not is_relevant(src): return
        print(" "*indent + ", ".join(str(x) for x in src))
        if depth > 0:
            roots, indent = roots | {src[0]}, indent + i_step
            srcs = src[0].src
            if hasattr(src[0], 'nsrc'): srcs = srcs + src[0].nsrc
            for csrc in srcs[:max_srcs]: show_src(csrc, depth-1, roots, indent)
            if len(srcs) > max_srcs: print(" "*(indent) + '... {} more'.format(len(srcs)-max_srcs))
    show_src((ref,), depth)
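
To make the traversal above easier to follow without a parsed snapshot, here is a minimal standalone sketch of the same idea. The `Node` class, `link` and `rev_tree_lines` are hypothetical stand-ins invented for this illustration, not part of darter; they only mimic the `src` list of `(source, field)` tuples that `show_rev_tree` walks.

```python
class Node:
    """Hypothetical stand-in for a reference object: a label plus back-references."""
    def __init__(self, label):
        self.label = label
        self.src = []  # list of (source_node, field_name) tuples

    def __str__(self):
        return self.label

def link(source, field, target):
    """Record that source.field points at target."""
    target.src.append((source, field))

def rev_tree_lines(node, depth=2, seen=frozenset(), indent=0, via=()):
    """Collect the lines of a back-reference tree, skipping nodes already on the path."""
    if node in seen:
        return []
    lines = [" " * indent + ", ".join([str(node), *via])]
    if depth > 0:
        for source, *path in node.src:
            lines += rev_tree_lines(source, depth - 1, seen | {node},
                                    indent + 4, tuple(str(p) for p in path))
    return lines

cls = Node("Class Widget")
func = Node("Function build")
code = Node("Code #1")
link(code, "owner", func)     # code.owner -> func
link(cls, "functions", func)  # cls.functions -> func
link(func, "owner", cls)      # func.owner -> cls (creates a cycle)

lines = rev_tree_lines(func)
print("\n".join(lines))
```

The cycle between the class and the function is cut because the function is already on the current path, which is the role the `roots` set plays in `show_rev_tree`.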

# TODO: show some basic stats

# Play!

You are now free to inspect the parsed data as you wish. Some examples:

In [5]:
# Print the first 5 functions of the app. They are 'reference objects':
for ref in s.getrefs('Function')[:5]:
    print(ref)

Function name='megamorphic_miss'->2861
Function name='<anonymous closure>'->2862
Function name='<anonymous signature>'->2863
Function name='addWithPaintTransform'->2864
Function name='hitTestChildren'->2865


In [6]:
# You can use 'ref.x' to access the object data dictionary
ref = s.getrefs('Function')[4]
print(ref.x)

{'name': 'hitTestChildren'->64784, 'owner': Class name='RenderFractionalTranslation'->1132, 'result_type': Type->48984, 'parameter_types': Array[3]->54905, 'parameter_names': Array[3]->54904, 'type_parameters': <base Null>null, 'data': <base Null>null, 'code': Code->11870, 'packed_fields': 1310743, 'kind_tag': 142082048}


In [7]:
# Print the usage tree for a reference
show_rev_tree(ref)

Function name='hitTestChildren'->2865
    ClosureData->10072, parent_function
        Function name='<anonymous closure>'->2862, data
            Code->11868, owner
            Array[511]->63682, value, 387
                GrowableObjectArray->54862, data
    Code->11870, owner
    Array[6]->55688, value, 3
        Class name='RenderFractionalTranslation'->1132, functions
            Function name='<anonymous closure>'->2862, owner
                Code->11868, owner
                Array[511]->63682, value, 387
            Function name='applyPaintTransform'->3377, owner
                Code->12331, owner
            Function name='paint'->3378, owner
                Code->12332, owner
            Function name='hitTest'->3379, owner
                Code->12333, owner
            ... 5 more


In [8]:
# Using 'refs', we can access the reference object for an ID (for instance, the Class above)
s.refs[1132]

Class name='RenderFractionalTranslation'->1132

In [8]:
# Using 'strings', we can look up the reference object for a certain string
s.strings['Flutter Demo']

'Flutter Demo'->69305

# Finding references to VM objects from native code

On AOT snapshots, no metadata records which objects a function references.

This experimental code disassembles the native code and looks for loads relative to `r5`, the register through which the code below resolves entries of the global object pool.
It collects those references into a `native_refs` dictionary.  
**Note:** This does *not* track calls made to other functions, only objects loaded into a register, and it currently works only for AOT ARM.

It takes a while (up to a few minutes) and is not bulletproof: I have seen it miss some references.

In [10]:
from time import time
import r2pipe, re
r2 = r2pipe.open(fname)

MG = re.compile(r'[^a-z0-9A-Z_]r5([^a-z0-9A-Z_]|$)')
M1 = re.compile(r'add (\w+), (\w+), (\w+)')
M2 = re.compile(r'ldr (\w+), \[(\w+), (\w+)\]')

def extract_references(code):
    if 'instructions' not in code.x: return
    instr = code.x['instructions']
    r2.cmd('s ' + str(instr['data_addr']))
    ops = r2.cmdj('pdj ' + str(len(instr['data']) // 4))

    def read_op():
        m = ops.pop(0)
        return m['offset'], m['opcode']

    result = []
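    def process(pc, reg, offset):
        # Offsets are expected to sit one byte short of a word boundary:
        # add 1, then turn the byte offset into a pool entry index (4 bytes each)
        offset += 1
        if offset % 4 != 0:
            raise Exception('Offset not aligned: {}'.format(offset))
        result.append((offset // 4, pc, reg))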
    
    def parse_pline(op, exp_source='r5'):
        m = re.fullmatch(M1, op)
        if m:
            target = m.group(1)
            source = m.group(2)
            if exp_source != source or (target == 'r5'):
                raise Exception('Source / target not matching!')
            offset = int(m.group(3), 0)
            pc, op = read_op()
            target2, offset2 = parse_pline(op, target)
            return target2, offset + offset2
        m = re.fullmatch(M2, op)
        if m:
            target = m.group(1)
            source = m.group(2)
            if exp_source != source or (target == 'r5'):
                raise Exception('Source / target not matching!')
            offset = int(m.group(3), 0)
            return target, offset
        raise Exception('Unknown op line: ' + op)

    while len(ops):
        pc, op = read_op()
        if not re.search(MG, op): continue
        try:
            target, offset = parse_pline(op)
            process(pc, target, offset)
        except Exception as e:
            print(pc, e)
    return result

native_refs = {}
start = time()
for r in s.getrefs('Code'):
    native_refs[r.ref] = extract_references(r)
print('Elapsed: {}s'.format(time() - start))

20944 Unknown op line: push {r0, r1, r2, r3, r5, fp, lr}
20996 Unknown op line: pop {r0, r1, r2, r3, r5, fp, lr}
21092 Unknown op line: push {r0, r1, r2, r3, r5, fp, lr}
21144 Unknown op line: pop {r0, r1, r2, r3, r5, fp, lr}
21244 Unknown op line: push {r0, r1, r2, r3, r5, fp, lr}
21292 Unknown op line: pop {r0, r1, r2, r3, r5, fp, lr}
Elapsed: 37.2733108997345s
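
To make the pattern matching in the cell above concrete, here is a small standalone sketch that applies the same two regular expressions to hand-written opcode strings. The opcodes and the `resolve_pool_index` helper are made up for illustration; they are not taken from `sample-app.so`, and the sketch omits the extra sanity checks of `parse_pline`.

```python
import re

ADD = re.compile(r'add (\w+), (\w+), (\w+)')
LDR = re.compile(r'ldr (\w+), \[(\w+), (\w+)\]')

def resolve_pool_index(opcodes, base='r5'):
    """Follow an optional chain of `add` ops ending in an `ldr`.

    Returns (destination register, pool entry index).
    """
    total = 0
    source = base
    for op in opcodes:
        m = ADD.fullmatch(op)
        if m and m.group(2) == source:
            # Partial offset: remember the intermediate register and keep going
            source = m.group(1)
            total += int(m.group(3), 0)
            continue
        m = LDR.fullmatch(op)
        if m and m.group(2) == source:
            # Final load: add 1 and convert the byte offset to a word index
            total += int(m.group(3), 0) + 1
            if total % 4:
                raise ValueError('offset not aligned: {}'.format(total))
            return m.group(1), total // 4
        raise ValueError('unexpected op: ' + op)
    raise ValueError('no ldr found')

direct = resolve_pool_index(['ldr r0, [r5, 0x1f]'])
chained = resolve_pool_index(['add r1, r5, 0x1000', 'ldr r2, [r1, 0x1f]'])
print(direct, chained)
```

Instructions such as `push {..., r5, ...}` mention `r5` but match neither pattern, which is why they show up as `Unknown op line` in the output above.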


Now store the parsed references as `nrefs` on the instructions of each Code reference, and as `nsrc` on the objects they point to:

In [11]:
for i in range(1, s.refs['next']):
    s.refs[i].nsrc = []
global_entries = s.refs['root'].x['global_object_pool'].x['entries']

for r, nrefs in native_refs.items():
    if nrefs is None: continue
    instr = s.refs[r].x['instructions']
    instr['nrefs'] = []
    for entry, pc, reg in nrefs:
        # FIXME: also look at patchable. track refs *at entry* and not ref
        if not (0 <= entry < len(global_entries)):
            print('Ref outside entries:', s.refs[r], entry, pc, reg)
            continue
        entry = global_entries[entry]
        if 'raw_obj' not in entry:
            #print('Not an object:', s.refs[r], entry, pc, reg)
            continue
        entry['raw_obj'].nsrc.append((s.refs[r], pc))
        instr['nrefs'].append((entry['raw_obj'], pc))

Ref outside entries: Code->11868 17895 22644 r3
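
Once the cell above has run, the `nrefs` lists can also be read in the forward direction, from a Code reference to the objects it loads. The following is a minimal sketch; `native_targets` is a hypothetical helper, and the `SimpleNamespace` object is only a mock that imitates the `x['instructions']['nrefs']` layout built above.

```python
from types import SimpleNamespace

def native_targets(code_ref):
    """Objects loaded by a Code reference, ordered by instruction address."""
    instr = code_ref.x.get('instructions') or {}
    return [obj for obj, pc in sorted(instr.get('nrefs', []), key=lambda pair: pair[1])]

# Mock Code reference with two recorded loads, listed out of address order
fake_code = SimpleNamespace(
    x={'instructions': {'nrefs': [('obj_b', 0x5880), ('obj_a', 0x5874)]}})
targets = native_targets(fake_code)
print(targets)
```

With real data, something like `native_targets(s.refs[11870])` should list the pool objects loaded by that Code reference, assuming the references were populated as shown.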


Now go back to the previous section and try `show_rev_tree` on the `Flutter Demo` string.