
yaml2obj fails to roundtrip Mach-O dSYM files #166993

@rjmansfield

Description


yaml2obj fails with section offset alignment errors when converting Mach-O dSYM YAML (generated by obj2yaml) back to a binary.

This appears to happen with essentially any dSYM file. For example:

```c
#include <stdio.h>

int add(int a, int b) { return a + b; }
int multiply(int x, int y) { return x * y; }

struct Point { int x, y; const char *label; };

struct Point make_point(int x, int y, const char *label) {
    struct Point p = {x, y, label};
    return p;
}

int main(void) {
    int a = add(5, 3);
    struct Point p = make_point(10, 20, "test");
    printf("%d, %d\n", a, p.x);
    return 0;
}
```

```console
$ clang repro.c -o repro -g
$ ~/llvm/llvm-project/build/bin/obj2yaml repro.dSYM/Contents/Resources/DWARF/repro -o repro.yaml
$ ~/llvm/llvm-project/build/bin/yaml2obj repro.yaml -o repro_roundtrip.out
yaml2obj: error: wrote too much data somewhere, section offsets in section __debug_aranges for segment __DWARF don't line up: [cursor=0x210c], [fileStart=0x0], [sectionOffset=0x20b7]
$ ~/llvm/llvm-project/build/bin/obj2yaml --version
LLVM (http://llvm.org/):
  LLVM version 22.0.0git
  Optimized build.
```

Using HEAD as of this report (917d815).
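The error message can be read as a simple offset check. The C sketch below is a hypothetical model, not yaml2obj's actual code: it assumes that before writing a section, the writer compares the bytes already written (cursor minus fileStart) with the file offset recorded for that section, and that only overshooting is fatal. The names `offset_state`, `overshoot_bytes`, and `offsets_line_up` are illustrative inventions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the offset check in the error above; not yaml2obj's code.
 * Assumption: the writer compares bytes already written (cursor - fileStart)
 * with the file offset recorded for the next section. */
typedef struct {
    uint64_t cursor;         /* current write position */
    uint64_t file_start;     /* position where the Mach-O image begins */
    uint64_t section_offset; /* file offset recorded for the next section */
} offset_state;

/* Number of bytes written past the recorded section offset, or 0 if the
 * writer is at or before it (a shortfall could be filled with zero padding). */
static uint64_t overshoot_bytes(offset_state s) {
    assert(s.cursor >= s.file_start);
    uint64_t written = s.cursor - s.file_start;
    return written > s.section_offset ? written - s.section_offset : 0;
}

/* True when the writer has not passed the recorded offset, so the section
 * can still be placed there (after any zero padding). */
static bool offsets_line_up(offset_state s) {
    return overshoot_bytes(s) == 0;
}
```

With the values from the log (cursor=0x210c, fileStart=0x0, sectionOffset=0x20b7), `overshoot_bytes` returns 0x210c - 0x20b7 = 0x55, i.e. 85 more bytes were written before __debug_aranges than the original layout leaves room for.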

It looks like the problem is that obj2yaml copies the DWARF compilation unit's Length field verbatim from the binary (e.g. 347 bytes), but when yaml2obj re-emits the DIEs, ULEB128 encoding and other details can make the emitted size differ (e.g. 363 bytes). That size difference then pushes the next section past its recorded offset. It might make sense for obj2yaml not to emit the Length field at all, and to let yaml2obj compute the Length from what it actually emits.
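A minimal C sketch of why a copied Length can go stale, and of the alternative proposed above (computing it from the emitted bytes). It assumes the 32-bit DWARF format, where unit_length is a 4-byte field counting the bytes after itself, and a little-endian target. The function names (`encode_uleb128`, `encode_uleb128_padded`, `decode_uleb128`, `write_unit`) are hypothetical illustrations, not LLVM APIs, and the sketch does not reflect how yaml2obj is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal (canonical) ULEB128 encoding; returns the number of bytes written. */
static size_t encode_uleb128(uint64_t value, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* ULEB128 padded to exactly `width` bytes. Decoders accept this form and
 * return the same value, but it is longer than the minimal encoding. */
static size_t encode_uleb128_padded(uint64_t value, size_t width, uint8_t *out) {
    for (size_t i = 0; i < width; i++) {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (i + 1 < width)
            byte |= 0x80;
        out[i] = byte;
    }
    assert(value == 0); /* width must be at least the minimal size */
    return width;
}

/* Decode one ULEB128 value; stores the number of bytes consumed in *len. */
static uint64_t decode_uleb128(const uint8_t *in, size_t *len) {
    uint64_t value = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = in[n++];
        value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *len = n;
    return value;
}

/* Emit a DWARF32 unit: a 4-byte little-endian unit_length, then the body.
 * unit_length excludes the length field itself and is computed from the
 * body actually written, rather than copied from the input file. */
static size_t write_unit(const uint8_t *body, uint32_t body_len, uint8_t *out) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(body_len >> (8 * i));
    memcpy(out + 4, body, body_len);
    return 4 + (size_t)body_len;
}
```

For example, 300 encodes minimally as 2 bytes (0xac 0x02) but as 4 bytes in the padded form, and both decode to 300. If the input used one form and the re-emitted body used the other, a copied Length would be off by two bytes, whereas the length written by `write_unit` always matches the body. This is only one possible source of size drift; the report mentions ULEB128 encoding "and other things".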

Metadata


Assignees

No one assigned

    Labels

    llvm-tools (All llvm tools that do not have corresponding tag)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
