Let's start by opening the file and analyzing native references:

In [1]:
from darter.file import parse_elf_snapshot, parse_appjit_snapshot
from darter.asm.base import populate_native_references

s = parse_elf_snapshot('samples/arm-app.so')
populate_native_references(s)

------- PARSING VM SNAPSHOT --------

[Header]
  length = 4733
  kind = 2 ('kFullAOT', 'Full + AOT code')

[Snapshot header]
  version = 'c8562f0ee0ebc38ba217c7955956d1cb'
  features = {'product': True, 'use_bare_instructions': True, 'asserts"': False, 'causal_async_stacks': True, 'bytecode': False, 'arm-eabi': True, 'softfp': True}

  base objects: 95
  objects: 935
  clusters: 5
  code order length = 69

[002c1094]: INFO: Reading allocation clusters...
[002c13a9]: INFO: Reading fill clusters...
[002c2215]: INFO: Reading roots...
[002c2281]: INFO: Snasphot parsed.

------- PARSING ISOLATE SNAPSHOT --------

[Header]
  length = 836159
  kind = 2 ('kFullAOT', 'Full + AOT code')

[Snapshot header]
  version = 'c8562f0ee0ebc38ba217c7955956d1cb'
  features = {'product': True, 'use_bare_instructions': True, 'asserts"': False, 'causal_async_stacks': True, 'bytecode': False, 'arm-eabi': True, 'softfp': True}

  base objects: 935
  objects: 74247
  clusters: 222
  code order length = 7228

[00

We will import the constants and define some basic functions to help us analyze the data:

In [5]:
import re
from darter.constants import *
from collections import defaultdict

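def is_irrelevant(src):
    ''' Back-references from the global object pool or the symbol table are treated as noise. '''
    if src[0] == src[0].s.refs['root'].x['global_object_pool']: return True
    if src[0] == src[0].s.refs['root'].x['symbol_table']: return True
    return False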

def show_rev_tree(ref, depth=4, max_srcs=5, i_step=4, hide_irrelevant=True, hide_location=True):
    ''' Shows a tree of back-references to an object; that is, things pointing to it. '''
    def show_src(src, depth, roots=set(), indent=0):
        x, *rest = src
        location = x.locate()
        location_str = ' {{ {} }}'.format(' '.join(map(str, location))) if location else ''
        print(' '*indent + '{} ({})'.format(x, ' '.join(map(str, rest))) + location_str)
        if depth > 0:
            roots, indent = roots | {x}, indent + i_step
            if hide_location and location: roots |= set(location)
            srcs = x.src + getattr(x, 'nsrc', [])
            filter_out = lambda src: (src[0] in roots) or (hide_irrelevant and is_irrelevant(src))
            srcs = [ src for src in srcs if not filter_out(src) ]
            for csrc in srcs[:max_srcs]: show_src(csrc, depth-1, roots, indent)
            if len(srcs) > max_srcs: print(" "*(indent) + '... {} more'.format(len(srcs)-max_srcs))
    show_src((ref,), depth)

# Export R2 metadata

This exports part of the parsed data as metadata that can be loaded into [Radare2](https://www.radare.org/r/), which is useful for close manual analysis of the assembled code, or for Darter development.

Flags are created for every code section (describing the code object), so you can seek / disassemble them more easily.  
Comments are placed at every native reference, describing the loaded object.

In [16]:
from collections import defaultdict
from base64 import b64encode
do_b64 = lambda x: 'base64:' + b64encode(x.encode('utf-8')).decode('ascii')

def produce_metadata(snapshot):
    out = []
    comments = defaultdict(list)
    out.append('fs functions')
    for code in snapshot.getrefs('Code'):
        instr = code.x['instructions']
        name = 'c_{}'.format(code.ref)
        comment = ' '.join(map(str, code.locate()))
        out.append('f {name} {len} {addr} {c}'.format( name=name, len=len(instr['data']), addr=instr['data_addr'], c=do_b64(comment) ))
        for target, pc, kind, *args in code.x.get('nrefs', []):
            if kind == 'load':
                comments[pc].append( 'load: {reg} = {tg}'.format(tg=target.describe(), reg=args[0]) )
    for addr, lines in comments.items():
        out.append('CCu {} @ {}'.format( do_b64("\n".join(lines)), addr ))
    return ''.join(x + '\n' for x in out)

with open('metadata.r2', 'w') as f: f.write(produce_metadata(s))
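
To make the output format concrete, here is a small standalone sketch of the two kinds of lines that `produce_metadata` emits. The flag name, length, addresses and comment texts below are made-up values for illustration; they are not taken from the sample.

```python
from base64 import b64encode, b64decode

def do_b64(text):
    # Same encoding as in the cell above; the base64 form contains
    # no spaces or newlines, so it fits in a single r2 command argument
    return 'base64:' + b64encode(text.encode('utf-8')).decode('ascii')

# Hypothetical values, for illustration only
flag_line = 'f {name} {len} {addr} {c}'.format(
    name='c_123', len=64, addr=0x1000, c=do_b64('Code example'))
comment_line = 'CCu {} @ {}'.format(do_b64('load: r0 = example'), 0x1010)

print(flag_line)
print(comment_line)

# The payload decodes back to the original text
payload = flag_line.split(' ')[-1][len('base64:'):]
print(b64decode(payload).decode('utf-8'))
```

Note that the addresses are written in decimal, because `format` is applied to plain integers.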

To load the metadata, open the file in Radare2 and run: `. metadata.r2`  
Then, seek to a certain code object (`s c_18052`) and disassemble it (`pD $(fl)`).  
The color of the annotations can be changed through something like: `ec comment rgb:f0f000`

# Print string table

In order to do an initial reconnaissance, we can dump all strings in the application.  
The following code does that, excluding strings that look like obfuscated identifiers:

In [23]:
ID_PATTERN = r'(([sg]et|init|dyn)\:)?_?[a-zA-Z]{1,3}(\@\d+)?.?'   # pattern to ignore
REF_ORDER = True    # False sorts strings alphabetically

with open('app-strings.txt', 'w') as f:
    key = (lambda x: x.ref) if REF_ORDER else (lambda x: x.x['value'])
    for x in sorted(s.strings_refs, key=key):
        if re.fullmatch(ID_PATTERN, x.x['value']): continue
        print('{} {}'.format(x.ref, repr(x.x['value'])), file=f)
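
Before relying on the filter, it helps to check what `ID_PATTERN` actually matches. The sketch below applies the same pattern to a few made-up strings; `looks_obfuscated` is a hypothetical helper name, not part of Darter.

```python
import re

# Same pattern as in the cell above
ID_PATTERN = r'(([sg]et|init|dyn)\:)?_?[a-zA-Z]{1,3}(\@\d+)?.?'

def looks_obfuscated(value):
    # fullmatch: the whole string has to fit the pattern, not just a prefix
    return re.fullmatch(ID_PATTERN, value) is not None

# Made-up sample strings, for illustration only
samples = ['abc', 'get:xY', '_ab@123', 'http', 'https', 'Hello, world']
for value in samples:
    print(repr(value), looks_obfuscated(value))
```

Because of the trailing `.?`, any string of up to four letters matches, so a short real string such as `'http'` is dropped along with the obfuscated identifiers, while `'https'` is kept. Tighten the pattern if short strings matter for your analysis.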

# Find relevant libraries

Most of the application code, objects, strings, etc. will often be open-source dependencies rather than useful code.  
Dart code is separated into *libraries*, so making a list of libraries that contain useful code is a big help.

In [26]:
for lib in sorted(s.getrefs('Library'), key=lambda x: x.x['url'].x['value']):
    print(lib)

Library('dart:_http')->11838
Library('dart:_internal')->11853
Library('dart:_vmservice')->11845
Library('dart:async')->11859
Library('dart:collection')->11857
Library('dart:convert')->11856
Library('dart:core')->11858
Library('dart:developer')->11855
Library('dart:ffi')->11854
Library('dart:io')->11708
Library('dart:isolate')->11852
Library('dart:math')->11851
Library('dart:mirrors')->11850
Library('dart:nativewrappers')->11849
Library('dart:profiler')->11848
Library('dart:typed_data')->11846
Library('dart:ui')->11626
Library('dart:vmservice_io')->11843
Library('package:collection/src/priority_queue.dart')->11830
Library('package:flutter/src/animation/animation.dart')->11638
Library('package:flutter/src/animation/animation_controller.dart')->11744
Library('package:flutter/src/animation/animations.dart')->11642
Library('package:flutter/src/animation/curves.dart')->11625
Library('package:flutter/src/animation/listener_helpers.dart')->11643
Library('package:flutter/src/animation/tween.dar
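
When the names are readable, a first pass can drop the Dart and Flutter libraries by URL prefix. This is a minimal sketch on plain strings; the prefix list and the helper name `is_platform_library` are assumptions for illustration. With Darter, the URLs would come from `lib.x['url'].x['value']` as in the cell above.

```python
# Assumed prefixes for libraries that are rarely interesting; adjust as needed
PLATFORM_PREFIXES = ('dart:', 'package:flutter/')

def is_platform_library(url, prefixes=PLATFORM_PREFIXES):
    # str.startswith accepts a tuple and returns True if any prefix matches
    return url.startswith(prefixes)

# A few URLs in the style of the listing above
urls = [
    'dart:core',
    'dart:ui',
    'package:collection/src/priority_queue.dart',
    'package:flutter/src/animation/curves.dart',
    'package:myapp/main.dart',
]

candidates = [u for u in urls if not is_platform_library(u)]
for u in candidates:
    print(u)
```

What remains is a mix of third-party packages and the app's own code, which still has to be sorted by hand.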

However, if the app is obfuscated, the names will be gibberish. In this case, we need...

# Deobfuscation

The Dart snapshotter has support for obfuscation. It's easy to enable, and it renames identifiers (i.e. library URLs, class names, function names, field names) to random three-letter strings. The renaming is applied per string, so two functions with the same name also get the same obfuscated identifier. Some identifiers that are special to the VM (such as the top-level class name, `::`) are left untouched. Obfuscation is applied to everything, including dependencies and internal (i.e. Dart or Flutter) code.

This brings up a need for deobfuscation. The process can be summarized as:

 1. Find open-source dependencies used in the app.
 2. Build a *reference snapshot* that contains these open-source dependencies.
 3. Attempt to match functions in both snapshots, so we can unmask their real names.

This deobfuscates most of the identifiers, helps us separate useful code from dependencies, and helps us see what the useful code is doing.  
In my experience, the fastest way to do this is:

 1. Build an empty app. This will be the reference snapshot.
 2. Print a string table, as we did before, but excluding strings that are present in the reference snapshot.
 3. Look at the strings, google them, find the package they're defined in.
 4. When you have found a package, import it in the app and rebuild it.
 5. Go to step (2); this time there'll be fewer strings.
 6. When only useful strings are left, the reference snapshot is ready. Use it to deobfuscate the names.
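
Step (2) boils down to a set difference between the two string tables. The sketch below works on plain Python strings so it can run on its own; `unknown_strings` is a hypothetical helper and the sample strings are made up. With Darter, each list would be built from `x.x['value'] for x in s.strings_refs`, once for the app snapshot and once for the reference snapshot.

```python
def unknown_strings(app_values, reference_values):
    ''' Returns the app strings that do not appear in the reference snapshot, in their original order. '''
    known = set(reference_values)
    return [v for v in app_values if v not in known]

# Made-up string tables, for illustration only
app_values = ['No element', 'Bad state', 'Invalid API key', 'Retrying in a few seconds']
reference_values = ['No element', 'Bad state']

for value in unknown_strings(app_values, reference_values):
    print(repr(value))
```

Each time a package is added to the reference app, the reference table grows and the remaining list shrinks.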

However, Dart has a powerful **tree shaker** that removes any dependencies we are not using. Importing these dependencies in our empty app is therefore not enough: we would have to call every function in their public API for them to be included in the reference snapshot. Because this is long and tedious, it's better to patch Dart to disable the tree shaker.

TODO: continue explanation; publish nref-based deobfuscation code; add obfuscated sample

# Print library structure

Given a certain library, we can print all classes, functions, fields and constants in it.

In [27]:
lib = s.strings['package:myapp/main.dart'].src[1][0]
print(lib)

Library('package:myapp/main.dart')->11847
