## Lesson 2: A closer look into OFRAK unpacking and the resource tree

**Objectives**: unpack a resource; learn about OFRAK components; learn about auto-analysis, tags, and the resource tree; filter the resource tree

Let's dive a bit deeper into the binary.

In [1]:
from ofrak import OFRAK
from ofrak_tutorial.helper_functions import create_hello_world_binary

create_hello_world_binary()

ofrak = OFRAK()
basic_context = await ofrak.create_ofrak_context()
root_resource = await basic_context.create_root_resource_from_file("hello_world")

Using OFRAK Community License.


To get more information about this binary, we want to identify what type of file it is, analyze it for attributes, and unpack it into child resources. OFRAK can do all of this with a single function call.

In [2]:
unpack_result = await root_resource.unpack_recursively()

The result of calling `unpack_recursively()` is a `ComponentRunResult` object. The details of this object are not important, but it contains two significant pieces of information:

- A number of resources were created and modified:

In [3]:
print(f"{len(unpack_result.resources_created)} resources created")
print(f"{len(unpack_result.resources_modified)} resources modified")

170 resources created
171 resources modified


- Several **components** were run on our resource, including `MagicMimeIdentifier` and `ElfUnpacker`:

In [4]:
print(f"components run: {sorted(unpack_result.components_run)}")

components run: [b'ApkIdentifier', b'DeviceTreeBlobIdentifier', b'ElfDynamicSectionUnpacker', b'ElfPointerArraySectionUnpacker', b'ElfRelaUnpacker', b'ElfSymbolUnpacker', b'ElfUnpacker', b'MagicDescriptionIdentifier', b'MagicMimeIdentifier', b'OpenWrtIdentifier', b'UbiIdentifier', b'UbifsIdentifier', b'Uf2FileIdentifier']


Components in OFRAK are the objects that perform actions on resources. Every component is one of the following: an **unpacker**, a **packer**, a **modifier**, an **identifier**, or an **analyzer**.

Here's a typical OFRAK workflow in terms of these components:

- create an OFRAK resource from something, typically a file on disk
  + **unpack** the resource (this step uses **identifiers**)
    - **modify** the resource (possibly using **analyzers**)
  + re-**pack** the resource
- export the modified and repacked resource, typically to a file on disk

(Note: [Lesson 1](1_simple_string_modification.ipynb) presented a simpler workflow: as we only needed to access and modify the binary data of the file, unpacking and repacking weren't necessary, so we only created the resource from a file, modified its binary data, and flushed the result to disk.)

Back to what happened when we unpacked our hello world binary. Here, the `MagicMimeIdentifier` used `libmagic` on the binary to try to determine what type of file it is; in this case, it's an ELF executable. Based on this, OFRAK knew it needed to run the `ElfUnpacker`, which unpacked the binary into sections based on the known ELF file structure.
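
To make the identification step more concrete, here is a minimal standard-library sketch of signature-based identification. It is not how `MagicMimeIdentifier` or `libmagic` is implemented (`libmagic` consults a large database of rules); the helper name `looks_like_elf` is hypothetical, and the sample bytes are the first 16 bytes of this lesson's `hello_world` binary, as they appear in the tree summary further down.

```python
# Illustration only: a hand-rolled signature check, not OFRAK's or libmagic's code.
ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes


def looks_like_elf(data: bytes) -> bool:
    """Return True if `data` begins with the ELF magic number (hypothetical helper)."""
    return data.startswith(ELF_MAGIC)


# First 16 bytes of hello_world (the e_ident field of the ELF header).
hello_world_start = bytes.fromhex("7f454c46020101000000000000000000")

print(looks_like_elf(hello_world_start))  # True
print(looks_like_elf(b"#!/bin/sh\n"))  # False
```

An identifier's job ends there: it only attaches a type to the resource. Deciding which unpacker to run based on that type is a separate step.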

Components can be manually selected and run on resources, or they can be run automatically, as we did here.

Since we started with a single resource and many more were created, it seems like the ELF unpacker has unpacked this file into some children! What does the resource tree look like now?

In [5]:
info = await root_resource.summarize_tree()
print(info)

┌3d0dba226977427db9da67ef90b8a6d6: [caption=(File: hello_world, Elf), attributes=(AttributesType[FilesystemEntry], Magic), global_offset=(0x0-0x4020), parent_offset=(0x0-0x0), data_hash=f5cd0893]
├────983096a34da149daa8c65d378e109ae3: [caption=(ElfBasicHeader), attributes=(Data, AttributesType[ElfBasicHeader]), global_offset=(0x0-0x10), parent_offset=(0x0-0x10), data_hex=7f454c46020101000000000000000000]
├────bff4ac13993b44b9ab74c93b7a6735ce: [caption=(ElfHeader), attributes=(Data, AttributesType[ElfHeader]), global_offset=(0x10-0x40), parent_offset=(0x10-0x40), data_hex=02003e000100000040104000000000004000000000000000e03800000000000000000000400038000b0040001d001c00]
├────eec4b8662f584c758dc95684f0c5b542: [caption=(ElfSectionHeader), attributes=(Data, AttributesType[ElfSectionStructure], AttributesType[ElfSectionHeader]), global_offset=(0x38e0-0x3920), parent_offset=(0x38e0-0x3920), data_hex=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
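
The `data_hex` values in this summary are raw bytes, so they can be decoded by hand as a sanity check on what the unpacker found. The following sketch uses only Python's `struct` module and the standard ELF64 header layout; it does not use OFRAK's own analyzers, and the variable names are chosen for illustration.

```python
import struct

# Bytes copied from the ElfBasicHeader and ElfHeader lines of the tree summary.
basic_header = bytes.fromhex("7f454c46020101000000000000000000")
elf_header = bytes.fromhex(
    "02003e00010000004010400000000000"
    "4000000000000000e038000000000000"
    "00000000400038000b0040001d001c00"
)

# e_ident: byte 4 is the class (1 = 32-bit, 2 = 64-bit),
# byte 5 is the data encoding (1 = little-endian, 2 = big-endian).
bits = {1: 32, 2: 64}[basic_header[4]]
byte_order = {1: "<", 2: ">"}[basic_header[5]]

# Remaining ELF64 header fields, in the order they appear after e_ident.
(
    e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
    e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx,
) = struct.unpack(byte_order + "HHIQQQIHHHHHH", elf_header)

print(f"{bits}-bit ELF, entry point {e_entry:#x}")
print(f"{e_shnum} section headers of {e_shentsize:#x} bytes each, starting at {e_shoff:#x}")
print(f"section header table ends at {e_shoff + e_shnum * e_shentsize:#x}")
```

The decoded section header offset (`0x38e0`) matches the `global_offset` of the first `ElfSectionHeader` in the summary, and the end of the section header table (`0x4020`) matches the end of the root resource's range.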

This tree has a lot more information than before. Which kinds of children were created? Let's have a look at their *tags*, which describe the types that OFRAK knows a resource to have.

In [6]:
async def get_descendants_tags(resource):
    """Return an alphabetically sorted list of all the tags of the descendants of `resource`."""
    all_tags = set()
    for child_resource in await resource.get_descendants():
        all_tags |= set(child_resource.get_tags())
    return sorted(all_tags, key=str)


for tag in await get_descendants_tags(root_resource):
    print(tag)

Addressable
CodeRegion
ElfBasicHeader
ElfDynSymbolSection
ElfDynamicEntry
ElfDynamicSection
ElfFiniArraySection
ElfHeader
ElfInitArraySection
ElfPointerArraySection
ElfProgramHeader
ElfRelaEntry
ElfRelaSection
ElfSection
ElfSectionHeader
ElfSectionNameStringSection
ElfSectionStructure
ElfSegmentStructure
ElfStringSection
ElfSymbol
ElfSymbolSection
ElfSymbolStructure
ElfVirtualAddress
MemoryRegion
NamedProgramSection
ProgramSection


OFRAK has analyzed the binary down to its ELF sections, headers, and symbols. No instructions? That's because we haven't told OFRAK which analysis backend to use. Available backends are Ghidra and BinaryNinja; we'll introduce them later, when we need them.

How would we get all the resources in the tree with a `CodeRegion` tag? We can use one of OFRAK's filtering capabilities, `ResourceFilter.with_tags`:

In [7]:
from ofrak.core import CodeRegion
from ofrak import ResourceFilter

list(await root_resource.get_descendants(r_filter=ResourceFilter.with_tags(CodeRegion)))

[Resource(resource_id=06806edc25954d3fa764e019556cd2de, tag=[CodeRegion,NamedProgramSection,ProgramSection,ElfSection,MemoryRegion,Addressable,ElfSectionStructure], data=06806edc25954d3fa764e019556cd2de),
 Resource(resource_id=b18cd7cd86c14b2286bc6c2dde7cefd6, tag=[CodeRegion,NamedProgramSection,ProgramSection,ElfSection,MemoryRegion,Addressable,ElfSectionStructure], data=b18cd7cd86c14b2286bc6c2dde7cefd6),
 Resource(resource_id=864b2bbf22524fe7b686ae41e234518d, tag=[CodeRegion,NamedProgramSection,ProgramSection,ElfSection,MemoryRegion,Addressable,ElfSectionStructure], data=864b2bbf22524fe7b686ae41e234518d),
 Resource(resource_id=30bb22f4cc284590a2ff5f47ff31a5ba, tag=[CodeRegion,NamedProgramSection,ProgramSection,ElfSection,MemoryRegion,Addressable,ElfSectionStructure], data=30bb22f4cc284590a2ff5f47ff31a5ba)]
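
Conceptually, a tag filter keeps the resources whose tag set contains the requested tags. The sketch below models that idea with plain Python sets. It is a toy stand-in, not OFRAK's implementation: `ToyResource` and `filter_by_tags` are hypothetical names, the tags are strings copied from the outputs above rather than real tag objects, and requiring every requested tag to be present is an assumption about the matching rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ToyResource:
    """Hypothetical stand-in for a resource: a label plus a set of tag names."""

    name: str
    tags: FrozenSet[str]


def filter_by_tags(resources: Iterable[ToyResource], *wanted: str) -> List[ToyResource]:
    """Keep the resources that carry every tag in `wanted` (assumed matching rule)."""
    return [r for r in resources if set(wanted) <= r.tags]


# Tag set printed for each CodeRegion resource in the output above.
code_tags = frozenset({
    "CodeRegion", "NamedProgramSection", "ProgramSection", "ElfSection",
    "MemoryRegion", "Addressable", "ElfSectionStructure",
})

# Simplified, made-up tree contents for illustration.
toy_tree = [
    ToyResource("basic header", frozenset({"ElfBasicHeader"})),
    ToyResource("header", frozenset({"ElfHeader"})),
    ToyResource("code section", code_tags),
]

print([r.name for r in filter_by_tags(toy_tree, "CodeRegion")])
print([r.name for r in filter_by_tags(toy_tree, "MemoryRegion", "ElfBasicHeader")])
```

The first call selects only the code section. The second selects nothing, because no toy resource carries both tags. Note that a resource tagged `CodeRegion` also carries broader tags such as `MemoryRegion`, so filtering on a broader tag would return it as well.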

[Next page](3_binary_format_modification.ipynb)