Add support for Metal-produced universal binaries #38

Merged
merged 13 commits into from Apr 7, 2023

maleadt
Collaborator

@maleadt maleadt commented Feb 17, 2023

This PR adds the necessary pieces of functionality and bug fixes I need to parse universal MachO binaries as produced by Apple's Metal compiler/driver, so that I can extract the native GPU code and disassemble it (for the purpose of implementing a @code_native in Metal.jl).

In summary:

  • rebase "Add rudimentary support for fat MachO files" (#7), updating it for changes in ObjectFile.jl and Julia, and additionally adding support for 64-bit universal binaries
  • add magic numbers for universal Metal binaries, even though they can just be treated as ordinary universal binaries
  • add rudimentary support for parsing Metallib headers such that iterating a Metal universal binary doesn't error
  • a bunch of small bug fixes (some of them breaking, like the findfirst change[1], so we need to check that this doesn't break dependents)

[1]: in fact, if we're breaking the interface anyway, maybe we should make it so that findfirst returns an index, and not an object. That's how findfirst generally works in Base.
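
For reference, a quick illustration of the Base convention the footnote refers to: on arrays, findfirst returns the index of the first match (or nothing), and the element is then fetched by indexing. This uses only Base; nothing here is ObjectFile.jl API.

```julia
xs = [1, 3, 4, 6]

# Base.findfirst returns the index of the first match, not the element itself
idx = findfirst(iseven, xs)
idx === nothing && error("no even element")
value = xs[idx]

# when nothing matches, the result is `nothing`
missing_idx = findfirst(>(10), xs)
```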

With all this in place, I can process Metal universal binaries as follows:

using ObjectFile

fat_handle = readmeta(open("metal_fat_macho.bin"))
fat_handle isa FatMachOHandle || error("Expected a universal binary")

# the universal binary contains several architectures; extract the GPU one
@enum GPUMachineType::UInt32 begin
    AppleGPU = 0x1000013
    AMDGPU   = 0x1000014
    IntelGPU = 0x1000015
    AIR64    = 0x1000017
end
arch = findfirst(fat_handle) do arch
    arch.header isa MachO.MachOHeader64 && GPUMachineType(arch.header.cputype) == AppleGPU
end
arch === nothing && error("Could not find GPU architecture in universal binary")

# the GPU binary contains several sections (metallib, descriptor, reflection, compute?,
# fragment?, vertex?); extract the compute section, which is another Mach-O binary
compute_section = findfirst(Sections(fat_handle[arch]), "__TEXT,__compute")
compute_section === nothing && error("Could not find __compute section in GPU binary")
compute_binary = read(compute_section)
native_handle = readmeta(IOBuffer(compute_binary))

# within the native GPU binary, isolate the section containing code
section = findfirst(Sections(native_handle), "__TEXT,__text")
isnothing(section) && error("Could not find __TEXT,__text section")

function extract_function(handle, section, code, fn)
    # find the symbol
    symbol = findfirst(Symbols(handle), fn)
    symbol === nothing && return nothing

    # extract the function from the section contents passed in as `code`
    size = if symbol_number(symbol) < length(Symbols(handle))
        # up until the next symbol
        symbol_value(Symbols(handle)[symbol_number(symbol) + 1])
    else
        # up until the end of the section
        section_size(section)
    end - symbol_value(symbol)
    return code[symbol_value(symbol) + 1 : symbol_value(symbol) + size]
end

# extract relevant functions
code = read(section)
main_code = extract_function(native_handle, section, code, "_agc.main")
main_code === nothing && error("Could not find main function")
write("/tmp/new.bin", main_code)
prolog_code = extract_function(native_handle, section, code, "_agc.main.constant_program")
if prolog_code !== nothing
    # XXX: what to do with the kernel prologue?
    write("/tmp/prolog.bin", prolog_code)
end
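
The size computation in extract_function can also be sketched on plain data. This is only an illustration: the symbol_extents helper and the offsets below are made up, and it assumes symbol values are section-relative offsets listed in ascending order, which the snippet above appears to rely on as well.

```julia
# Hypothetical helper: map each symbol name to the 1-based byte range it
# occupies, given (name => offset) pairs sorted by offset and the section size.
function symbol_extents(symbols::Vector{Pair{String,Int}}, section_size::Int)
    extents = Dict{String,UnitRange{Int}}()
    for (i, (name, offset)) in enumerate(symbols)
        # a symbol extends up to the next symbol, or to the end of the section
        stop = i < length(symbols) ? symbols[i + 1].second : section_size
        extents[name] = (offset + 1):stop
    end
    return extents
end

# made-up offsets, for illustration only
symbols = ["_agc.main.constant_program" => 0, "_agc.main" => 64]
extents = symbol_extents(symbols, 256)

code = rand(UInt8, 256)                  # stand-in for read(section)
main_code = code[extents["_agc.main"]]   # bytes 65 through 256
```

If the symbol table were not ordered by address, the pairs would need to be sorted by offset first.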

@staticfloat Please review, and let me know what this needs to get merged. I'd like to rely on this from Metal.jl as soon as possible :-)

Closes #37, closes #7

@staticfloat
Owner

Overall, this looks quite good to me.

Regarding making this consistent across all platforms, we could change the API to always return a collection of handles: on ELF/COFF we just return a single-element vector, but on MachO we sometimes return multi-element vectors of handles. That way we can continue to build workflows that are truly agnostic to the underlying object file type.

With regard to breaking the API, that's totally fine with me. I say do whatever you need to do to make the package better; let's bump the major version number and be done with it.

@giordano
Contributor

Maybe the test libraries could be lazy artifacts downloaded during the tests (I warmly recommend using ArtifactUtils.jl), instead of pushing binary blobs to the repo?

@maleadt
Collaborator Author

maleadt commented Feb 17, 2023

What would we gain by that? The binaries are tiny; doesn't seem worth the overhead of managing artifacts.

@maleadt
Collaborator Author

maleadt commented Apr 5, 2023

Regarding making this consistent across all platforms, we could change the API to always return a collection of handles: on ELF/COFF we just return a single-element vector, but on MachO we sometimes return multi-element vectors of handles. That way we can continue to build workflows that are truly agnostic to the underlying object file type.

Fine by me! I've implemented that change; every readmeta call now returns a single-element vector, except for fat Mach-O binaries, which return an iterable object that gives access to the contained Mach-O binaries.
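
To illustrate the shape of that design without relying on the actual ObjectFile.jl API, here is a toy model. All names (ToyHandle, ToyFatHandle, toy_readmeta) are hypothetical stand-ins for the real types; the point is only that callers can use the same loop regardless of the file format.

```julia
# Hypothetical stand-ins; none of these are ObjectFile.jl types.
struct ToyHandle
    format::Symbol
    arch::Symbol
end

# A "fat" container that can be iterated to reach the handles it contains.
struct ToyFatHandle
    members::Vector{ToyHandle}
end
Base.iterate(f::ToyFatHandle, state...) = iterate(f.members, state...)
Base.length(f::ToyFatHandle) = length(f.members)
Base.eltype(::Type{ToyFatHandle}) = ToyHandle

# Single-architecture files yield a one-element vector; fat files an iterable.
function toy_readmeta(kind::Symbol)
    kind === :elf   && return [ToyHandle(:ELF, :x86_64)]
    kind === :macho && return [ToyHandle(:MachO, :arm64)]
    kind === :fat   && return ToyFatHandle([ToyHandle(:MachO, :x86_64),
                                            ToyHandle(:MachO, :arm64)])
    error("unknown kind: $kind")
end

# Callers handle every format with the same loop.
archs(kind) = [oh.arch for oh in toy_readmeta(kind)]
```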

@codecov

codecov bot commented Apr 5, 2023

Codecov Report

Patch coverage: 60.71% and project coverage change: -1.04 ⚠️

Comparison is base (3180efb) 71.25% compared to head (a639d84) 70.22%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   71.25%   70.22%   -1.04%     
==========================================
  Files          45       47       +2     
  Lines        1249     1313      +64     
==========================================
+ Hits          890      922      +32     
- Misses        359      391      +32     
Impacted Files                 Coverage Δ
src/Abstract/Symbol.jl         62.06% <0.00%> (-32.67%) ⬇️
src/MachO/MachO.jl             100.00% <ø> (ø)
src/MachO/MetalLibrary.jl      0.00% <0.00%> (ø)
src/MachO/MachOSection.jl      80.48% <57.14%> (-4.81%) ⬇️
src/MachO/MachOFat.jl          73.33% <73.33%> (ø)
src/Abstract/ObjectHandle.jl   80.39% <100.00%> (ø)
src/Abstract/Section.jl        69.04% <100.00%> (+1.60%) ⬆️
src/COFF/COFFHandle.jl         91.30% <100.00%> (-6.53%) ⬇️
src/ELF/ELFHandle.jl           97.95% <100.00%> (ø)
src/MachO/MachOHandle.jl       93.75% <100.00%> (+0.64%) ⬆️
... and 2 more

@maleadt
Collaborator Author

maleadt commented Apr 7, 2023

julia> Sys.ARCH
:x86_64

julia> sizeof(ObjectFile.MachO.MachOSection64)
80

julia> Sys.ARCH
:i686

julia> sizeof(ObjectFile.MachO.MachOSection64)
76

That seems bad.

@staticfloat
Owner

Agreed

@maleadt
Collaborator Author

maleadt commented Apr 7, 2023

Field offsets are identical, so I guess it's just the padding at the end:

julia> structinfo(T) = [(fieldoffset(T,i), fieldname(T,i), fieldtype(T,i)) for i = 1:fieldcount(T)];

julia> structinfo(ObjectFile.MachO.MachOSection64{MachOHandle{IOStream}})
11-element Vector{Tuple{UInt64, Symbol, DataType}}:
 (0x0000000000000000, :sectname, fixed_string{UInt128})
 (0x0000000000000010, :segname, fixed_string{UInt128})
 (0x0000000000000020, :addr, UInt64)
 (0x0000000000000028, :size, UInt64)
 (0x0000000000000030, :offset, UInt32)
 (0x0000000000000034, :align, UInt32)
 (0x0000000000000038, :reloff, UInt32)
 (0x000000000000003c, :nreloc, UInt32)
 (0x0000000000000040, :flags, UInt32)
 (0x0000000000000044, :reserved1, UInt32)
 (0x0000000000000048, :reserved2, UInt32)

# vs

julia> structinfo(ObjectFile.MachO.MachOSection64{MachOHandle{IOStream}})
11-element Vector{Tuple{UInt32, Symbol, DataType}}:
 (0x00000000, :sectname, fixed_string{UInt128})
 (0x00000010, :segname, fixed_string{UInt128})
 (0x00000020, :addr, UInt64)
 (0x00000028, :size, UInt64)
 (0x00000030, :offset, UInt32)
 (0x00000034, :align, UInt32)
 (0x00000038, :reloff, UInt32)
 (0x0000003c, :nreloc, UInt32)
 (0x00000040, :flags, UInt32)
 (0x00000044, :reserved1, UInt32)
 (0x00000048, :reserved2, UInt32)

EDIT: oh, this should probably be using StructIO's unpack instead of plain read.

EDIT2: actually, that gets it wrong too:

julia> section_header_size(ohs[1])
80

julia> StructIO.packed_sizeof(section_header_type(ohs[1]))
80

EDIT3: the struct was missing a reserved field (reserved3):

struct section_64 {
  char        sectname[16];   /* name of this section */
  char        segname[16];    /* segment this section goes in */
  uint64_t    addr;           /* memory address of this section */
  uint64_t    size;           /* size in bytes of this section */
  uint32_t    offset;         /* file offset of this section */
  uint32_t    align;          /* section alignment (power of 2) */
  uint32_t    reloff;         /* file offset of relocation entries */
  uint32_t    nreloc;         /* number of relocation entries */
  uint32_t    flags;          /* flags (section type and attributes) */
  uint32_t    reserved1;      /* reserved (for offset or index) */
  uint32_t    reserved2;      /* reserved (for count or sizeof) */
  uint32_t    reserved3;      /* reserved */
};
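
The effect of the missing field can be reproduced with plain Julia structs. This is a sketch, not the ObjectFile.jl definition: the struct names are hypothetical, and the 16-byte name fields are modelled as NTuple{16,UInt8} rather than fixed_string{UInt128}. Summing the field sizes gives the on-disk size independently of how the host pads the struct.

```julia
# Hypothetical mirror of section_64 as it was, without reserved3.
struct Section64Missing
    sectname::NTuple{16,UInt8}
    segname::NTuple{16,UInt8}
    addr::UInt64
    size::UInt64
    offset::UInt32
    align::UInt32
    reloff::UInt32
    nreloc::UInt32
    flags::UInt32
    reserved1::UInt32
    reserved2::UInt32
end

# Hypothetical mirror of section_64 with all three reserved fields.
struct Section64Fixed
    sectname::NTuple{16,UInt8}
    segname::NTuple{16,UInt8}
    addr::UInt64
    size::UInt64
    offset::UInt32
    align::UInt32
    reloff::UInt32
    nreloc::UInt32
    flags::UInt32
    reserved1::UInt32
    reserved2::UInt32
    reserved3::UInt32
end

# size of the fields alone, ignoring any padding the host ABI adds
packed_size(T) = sum(sizeof(fieldtype(T, i)) for i in 1:fieldcount(T))

packed_size(Section64Missing)  # 76: four bytes short of the 80-byte header
packed_size(Section64Fixed)    # 80
```

With the 8-byte struct alignment used on x86_64, sizeof of the incomplete struct is rounded up to 80, which would be consistent with the problem only showing up on i686.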

@staticfloat staticfloat merged commit b00ab22 into staticfloat:master Apr 7, 2023
16 of 18 checks passed
@staticfloat
Owner

Thanks so much, Tim!

@maleadt maleadt deleted the metal branch April 7, 2023 16:24
@maleadt
Collaborator Author

maleadt commented Apr 11, 2023

Is there anything else you want to put in this release, now that it's breaking? If not, could you tag a version so that I can depend on this?

@staticfloat
Owner

JuliaRegistries/General#81419

@giordano
Contributor

giordano commented May 2, 2023

What would we gain by that?

Sounds like we'd gain something (at least for users not running the tests): #40 🙃

Successfully merging this pull request may close these issues.

Support for universal binaries containing Metal code
3 participants