
Add support for split dyld shared cache. #398

Merged: 1 commit, merged Nov 27, 2021


@mstange (Contributor) commented Nov 7, 2021

Fixes #358.

This adds support for the dyld cache format that is used on macOS 12 and
iOS 15. The cache is split over multiple files, with a "root" cache
and one or more subcaches, for example:

```
/System/Library/dyld/dyld_shared_cache_x86_64
/System/Library/dyld/dyld_shared_cache_x86_64.1
/System/Library/dyld/dyld_shared_cache_x86_64.2
/System/Library/dyld/dyld_shared_cache_x86_64.3
```

Additionally, on iOS, there is a separate .symbols subcache, which
contains local symbols.
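
As a rough illustration of the naming scheme above, here is a hypothetical helper (`split_cache_paths`, not part of the object crate) that lists the files making up a split cache, given the root cache path. It assumes the subcaches are numbered consecutively from `.1` and that the iOS local-symbols file is the root name plus a `.symbols` suffix; the subcache count would presumably come from the root cache header, which this sketch does not parse.

```rust
/// Hypothetical helper (not part of the object crate): list the files that make
/// up a split dyld shared cache, given the root cache path.
/// Assumes subcaches are numbered `.1` through `.N` and that the optional
/// local-symbols file (iOS only) is named `<root>.symbols`.
fn split_cache_paths(root: &str, subcache_count: u32, has_symbols: bool) -> Vec<String> {
    // The root cache always comes first.
    let mut paths = vec![root.to_string()];
    // Numbered subcaches: `.1`, `.2`, ...
    for i in 1..=subcache_count {
        paths.push(format!("{}.{}", root, i));
    }
    // Separate local-symbols subcache, if the root cache expects one.
    if has_symbols {
        paths.push(format!("{}.symbols", root));
    }
    paths
}

fn main() {
    // Reproduces the macOS x86_64 example list: the root plus `.1` to `.3`.
    for path in split_cache_paths("/System/Library/dyld/dyld_shared_cache_x86_64", 3, false) {
        println!("{}", path);
    }
}
```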

Each file has a set of mappings. For each image in the cache, the
segments of that image can be distributed over multiple files. For
example, on macOS 12.0.1, the image for libsystem_malloc.dylib for the
arm64e architecture has its __TEXT segment in the root cache and its
__LINKEDIT segment in the .1 subcache; there's a single __LINKEDIT
segment which is shared between all images across both files. The
remaining libsystem_malloc.dylib segments are in the same file as the
__TEXT segment.
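
To make the cross-file layout concrete, here is a simplified sketch of resolving a virtual address to the file that backs it and the offset within that file. The `Mapping` and `CacheFile` types are hypothetical stand-ins, loosely modeled on the address/size/file-offset triple of a dyld cache mapping, not the object crate's types. Because each file carries its own mappings, the lookup scans every file, which is how two segments of one image can resolve to different files.

```rust
/// Simplified, hypothetical view of one mapping in a cache file: the virtual
/// address range `[address, address + size)` is backed by the bytes starting
/// at `file_offset` in that same file.
#[derive(Debug, Clone, Copy)]
struct Mapping {
    address: u64,
    size: u64,
    file_offset: u64,
}

/// Each cache file (the root cache or a subcache) has its own mappings.
struct CacheFile {
    mappings: Vec<Mapping>,
}

/// Resolve a virtual address to (index of the file that backs it, offset
/// within that file). Returns None if no file maps the address.
fn locate(files: &[CacheFile], address: u64) -> Option<(usize, u64)> {
    for (file_index, file) in files.iter().enumerate() {
        for m in &file.mappings {
            // Written as a subtraction to avoid overflow in `address + size`.
            if address >= m.address && address - m.address < m.size {
                return Some((file_index, m.file_offset + (address - m.address)));
            }
        }
    }
    None
}

fn main() {
    // Made-up addresses: the root cache maps an image's __TEXT, and the `.1`
    // subcache maps the shared __LINKEDIT region.
    let files = vec![
        CacheFile { mappings: vec![Mapping { address: 0x1_8000_0000, size: 0x1_0000, file_offset: 0 }] },
        CacheFile { mappings: vec![Mapping { address: 0x1_9000_0000, size: 0x2_0000, file_offset: 0x4000 }] },
    ];
    println!("{:?}", locate(&files, 0x1_8000_0100)); // Some((0, 256))
    println!("{:?}", locate(&files, 0x1_9000_0010)); // Some((1, 16400))
}
```

The linear scan is chosen for clarity; with only a few files and a few mappings per file it is unlikely to matter.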

The DyldCache API now requires the data for all subcaches to be supplied
to the constructor.
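
The constructor requirement above can be pictured with the hypothetical `SplitCache::new` below: it takes the root cache plus every subcache up front and rejects the input if the number of subcaches or their UUIDs do not match what the root cache expects. The input representation (`CacheFileData` with an already-extracted UUID, and a slice of expected subcache UUIDs standing in for what the root header lists) is an assumption of this sketch; it is not the signature of the object crate's DyldCache::parse.

```rust
/// A 16-byte UUID as stored in Mach-O and dyld cache headers.
type Uuid = [u8; 16];

/// Hypothetical, simplified input: a cache file's bytes plus its UUID, which a
/// real parser would read from the file's header.
struct CacheFileData<'data> {
    uuid: Uuid,
    bytes: &'data [u8],
}

#[derive(Debug, PartialEq)]
enum SplitCacheError {
    WrongSubcacheCount { expected: usize, found: usize },
    SubcacheUuidMismatch { index: usize },
}

/// Hypothetical split cache: the root file plus all of its subcaches, all of
/// which must be supplied when the cache is constructed.
struct SplitCache<'data> {
    root: CacheFileData<'data>,
    subcaches: Vec<CacheFileData<'data>>,
}

impl<'data> SplitCache<'data> {
    /// `expected_uuids` stands in for the subcache UUIDs listed in the root
    /// header (an assumption of this sketch).
    fn new(
        root: CacheFileData<'data>,
        expected_uuids: &[Uuid],
        subcaches: Vec<CacheFileData<'data>>,
    ) -> Result<Self, SplitCacheError> {
        // Every subcache the root expects must be present.
        if subcaches.len() != expected_uuids.len() {
            return Err(SplitCacheError::WrongSubcacheCount {
                expected: expected_uuids.len(),
                found: subcaches.len(),
            });
        }
        // Each supplied subcache must be the one the root expects, in order.
        for (index, (sub, expected)) in subcaches.iter().zip(expected_uuids).enumerate() {
            if sub.uuid != *expected {
                return Err(SplitCacheError::SubcacheUuidMismatch { index });
            }
        }
        Ok(SplitCache { root, subcaches })
    }

    /// Total size of all files, just to show that every file is retained.
    fn total_len(&self) -> usize {
        self.root.bytes.len() + self.subcaches.iter().map(|s| s.bytes.len()).sum::<usize>()
    }
}

fn main() {
    let (root_bytes, sub_bytes) = ([0u8; 64], [0u8; 32]);
    let root = CacheFileData { uuid: [0; 16], bytes: &root_bytes };
    let sub = CacheFileData { uuid: [1; 16], bytes: &sub_bytes };
    match SplitCache::new(root, &[[1; 16]], vec![sub]) {
        Ok(cache) => println!("opened {} bytes across all files", cache.total_len()),
        Err(e) => println!("error: {:?}", e),
    }
}
```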

The parse_at methods have been removed and replaced with a
parse_dyld_cache_image method.

With this patch, the following command outputs correct symbols for
libsystem_malloc.dylib:

```
cargo run --release --bin objdump -- /System/Library/dyld/dyld_shared_cache_arm64e /usr/lib/system/libsystem_malloc.dylib
```

Support for local symbols is not implemented yet. As a first step,
however, DyldCache::parse requires the .symbols subcache to be supplied (if the
root cache expects one to be present) and checks that its UUID is correct.
MachOFile doesn't do anything with ilocalsym and nlocalsym yet, and we
don't yet have the struct definitions for dyld_cache_local_symbols_info
and dyld_cache_local_symbols_entry.
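
The .symbols check described above could be sketched as the hypothetical function below. It assumes the root cache's expectation has already been decoded into an `Option<[u8; 16]>`; how the real header signals that no .symbols file exists is not covered here. Rejecting a .symbols file that the root cache does not expect is a choice made by this sketch, not something stated in the PR.

```rust
type Uuid = [u8; 16];

#[derive(Debug, PartialEq)]
enum SymbolsCheckError {
    /// The root cache expects a .symbols file but none was supplied.
    Missing,
    /// A .symbols file was supplied although the root cache does not expect
    /// one (rejecting this is a choice made by this sketch).
    Unexpected,
    /// The supplied .symbols file has a different UUID than expected.
    UuidMismatch,
}

/// `expected`: the .symbols UUID recorded by the root cache, if any.
/// `supplied`: the UUID read from the .symbols file the caller passed, if any.
fn check_symbols_subcache(expected: Option<Uuid>, supplied: Option<Uuid>) -> Result<(), SymbolsCheckError> {
    match (expected, supplied) {
        (None, None) => Ok(()),
        (Some(_), None) => Err(SymbolsCheckError::Missing),
        (None, Some(_)) => Err(SymbolsCheckError::Unexpected),
        (Some(e), Some(s)) if e == s => Ok(()),
        (Some(_), Some(_)) => Err(SymbolsCheckError::UuidMismatch),
    }
}

fn main() {
    // macOS-style cache: no .symbols file expected or supplied.
    println!("{:?}", check_symbols_subcache(None, None));
    // iOS-style cache where the caller forgot the .symbols file.
    println!("{:?}", check_symbols_subcache(Some([7; 16]), None));
}
```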

@mstange force-pushed the monterey-dyld branch 2 times, most recently from 100473f to 638ff07 (November 25, 2021 21:26)

@mstange (Contributor, Author) commented Nov 25, 2021

@philipc This is now almost ready, but I have a few questions on how to proceed:

  • Should we keep the parse_at methods or are you fine with removing them? I don't know of any other use case for them.
  • I've changed MachOFile to store a Vec of MachOSegments, so that it can keep track of each segment's data. To do this, I had to remove the file reference in MachOSegment and store the endian field directly. This affects MachOSegment's lifetime parameters - it no longer has a 'file lifetime parameter. Is this the right way to go?
  • I need some solution to do something similar for the sections. My current patch gets each section's data by looking up the section's corresponding segment, and getting the data for that. But this assumes we have a segment with that name. During some of the roundtrip tests, this assumption does not hold. It looks like object's "write" support for macho does not write out any segments, only sections. This is what's causing the test failures. How should I proceed? I could do the same for sections as I did for segments: For each section, instead of storing a MachOSectionInternal, store a MachOSection and give that a data field; and remove the file reference field (and the 'file lifetime) from MachOSection and store the endian field directly. Does that sound ok?
  • There is some code duplication with the load command stuff in MachOFile::parse and MachOFile::parse_dyld_cache_image. Should I try harder to share this code or is what I have now ok?

@philipc (Contributor) commented Nov 26, 2021

> • Should we keep the parse_at methods or are you fine with removing them? I don't know of any other use case for them.

I'm fine with removing them.

> • I've changed MachOFile to store a Vec of MachOSegments, so that it can keep track of each segment's data. To do this, I had to remove the file reference in MachOSegment and store the endian field directly. This affects MachOSegment's lifetime parameters - it no longer has a 'file lifetime parameter. Is this the right way to go?

Can we store a MachOSegmentInternal in MachOFile instead, and then segment iteration will create a MachOSegment from that which still has a file reference. It's a bit more complicated, but I prefer to avoid the public API change for this.

> • I need some solution to do something similar for the sections. My current patch gets each section's data by looking up the section's corresponding segment, and getting the data for that. But this assumes we have a segment with that name. During some of the roundtrip tests, this assumption does not hold. It looks like object's "write" support for macho does not write out any segments, only sections. This is what's causing the test failures. How should I proceed? I could do the same for sections as I did for segments: For each section, instead of storing a MachOSectionInternal, store a MachOSection and give that a data field; and remove the file reference field (and the 'file lifetime) from MachOSection and store the endian field directly. Does that sound ok?

Object files always have a single unnamed segment to contain the sections. I think that MachOSectionInternal should store a segment index, and then we simply look up the segment by index rather than by name.

> • There is some code duplication with the load command stuff in MachOFile::parse and MachOFile::parse_dyld_cache_image. Should I try harder to share this code or is what I have now ok?

Looks ok to me.
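
A minimal sketch of the index-based lookup suggested above, using hypothetical, simplified types (`SegmentInternal`, `SectionInternal`, `File`) rather than the object crate's real ones. Each section records the index of its containing segment, so an object file whose single segment has an empty name still resolves correctly. The section's offset is assumed here to be relative to the start of its segment's data.

```rust
/// Hypothetical simplified segment record: just the bytes that back it (which
/// may come from the root cache or from a subcache).
struct SegmentInternal<'data> {
    data: &'data [u8],
}

/// Hypothetical simplified section record. Instead of the segment's name, it
/// stores the index of the segment that contains it, so lookups also work for
/// object files whose single segment has an empty name.
struct SectionInternal {
    segment_index: usize,
    /// Offset of the section from the start of its segment's data (assumed).
    offset: usize,
    size: usize,
}

struct File<'data> {
    segments: Vec<SegmentInternal<'data>>,
    sections: Vec<SectionInternal>,
}

impl<'data> File<'data> {
    /// Returns the bytes of section `index`, or None if the section, its
    /// segment, or its byte range does not exist.
    fn section_data(&self, index: usize) -> Option<&'data [u8]> {
        let section = self.sections.get(index)?;
        // Look the segment up by index rather than by name.
        let segment = self.segments.get(section.segment_index)?;
        let data: &'data [u8] = segment.data;
        let end = section.offset.checked_add(section.size)?;
        data.get(section.offset..end)
    }
}

fn main() {
    let bytes = [0x10u8, 0x20, 0x30, 0x40, 0x50, 0x60];
    let file = File {
        segments: vec![SegmentInternal { data: &bytes }],
        sections: vec![SectionInternal { segment_index: 0, offset: 2, size: 3 }],
    };
    println!("{:?}", file.section_data(0)); // Some([48, 64, 80])
}
```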

@mstange (Contributor, Author) commented Nov 26, 2021

Thanks!

> > • I've changed MachOFile to store a Vec of MachOSegments, so that it can keep track of each segment's data. To do this, I had to remove the file reference in MachOSegment and store the endian field directly. This affects MachOSegment's lifetime parameters - it no longer has a 'file lifetime parameter. Is this the right way to go?

> Can we store a MachOSegmentInternal in MachOFile instead, and then segment iteration will create a MachOSegment from that which still has a file reference. It's a bit more complicated, but I prefer to avoid the public API change for this.

So we'd store a data: ReadRef field in the MachOSegmentInternal, together with the Mach::Segment? Sounds good to me.

Which exact public API change are you referring to? The loss of the 'file lifetime parameter on MachOSegment?

> > • I need some solution to do something similar for the sections. My current patch gets each section's data by looking up the section's corresponding segment, and getting the data for that. But this assumes we have a segment with that name. During some of the roundtrip tests, this assumption does not hold. It looks like object's "write" support for macho does not write out any segments, only sections. This is what's causing the test failures. How should I proceed? I could do the same for sections as I did for segments: For each section, instead of storing a MachOSectionInternal, store a MachOSection and give that a data field; and remove the file reference field (and the 'file lifetime) from MachOSection and store the endian field directly. Does that sound ok?

> Object files always have a single unnamed segment to contain the sections. I think that MachOSectionInternal should store a segment index, and then we simply look up the segment by index rather than by name.

Ok, that sounds good, I'll try that.

@philipc (Contributor) commented Nov 26, 2021

> So we'd store a data: ReadRef field in the MachOSegmentInternal, together with the Mach::Segment? Sounds good to me.

> Which exact public API change are you referring to? The loss of the 'file lifetime parameter on MachOSegment?

Yes to both.
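
The agreed design could look roughly like the following simplified, hypothetical model: the file stores private per-segment records holding the segment header together with the data of whichever cache file contains it, and iteration builds public wrappers that borrow the file. Because the wrapper borrows the file, it keeps its 'file lifetime parameter, which is the public API property the discussion wanted to preserve. `RawSegment` stands in for the parsed segment load command; none of these types are the object crate's actual definitions.

```rust
/// Stand-in for a parsed segment load command (hypothetical, heavily trimmed).
#[derive(Debug, Clone, Copy)]
struct RawSegment {
    vmaddr: u64,
    vmsize: u64,
}

/// Private per-segment record kept by the file: the segment header together
/// with the data of whichever cache file (root or subcache) contains it.
struct SegmentInternal<'data> {
    segment: RawSegment,
    data: &'data [u8],
}

struct File<'data> {
    segments: Vec<SegmentInternal<'data>>,
}

/// Public wrapper created during iteration. It still borrows the file, so it
/// keeps both the 'data and 'file lifetime parameters.
struct Segment<'data, 'file> {
    file: &'file File<'data>,
    internal: &'file SegmentInternal<'data>,
}

impl<'data> File<'data> {
    fn segments<'file>(&'file self) -> impl Iterator<Item = Segment<'data, 'file>> + 'file {
        self.segments
            .iter()
            .map(move |internal| Segment { file: self, internal })
    }
}

impl<'data, 'file> Segment<'data, 'file> {
    fn address(&self) -> u64 {
        self.internal.segment.vmaddr
    }
    fn size(&self) -> u64 {
        self.internal.segment.vmsize
    }
    /// Bytes of the cache file this segment lives in.
    fn data(&self) -> &'data [u8] {
        self.internal.data
    }
    /// Example use of the file reference: how many segments the file has.
    fn segment_count(&self) -> usize {
        self.file.segments.len()
    }
}

fn main() {
    let root = [0u8; 16];
    let subcache = [0u8; 32];
    let file = File {
        segments: vec![
            SegmentInternal { segment: RawSegment { vmaddr: 0x1000, vmsize: 0x10 }, data: &root },
            SegmentInternal { segment: RawSegment { vmaddr: 0x2000, vmsize: 0x20 }, data: &subcache },
        ],
    };
    for seg in file.segments() {
        println!("{:#x} {:#x} {} of {}", seg.address(), seg.size(), seg.data().len(), seg.segment_count());
    }
}
```

Compared with storing public MachOSegment values directly, as first proposed in the thread, this keeps the public type's lifetime parameters unchanged at the cost of an extra internal type.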

@mstange force-pushed the monterey-dyld branch 4 times, most recently from 7d81395 to af1780b (November 26, 2021 04:19)
@mstange marked this pull request as ready for review on November 26, 2021 04:25
@mstange (Contributor, Author) commented Nov 26, 2021

This is now ready for review. My last struggle was with the Debug implementation on the DyldCacheHeader struct. I had to split up the 288-byte padding array into 9 x 32-byte arrays so that Debug could still be derived with the MSRV compiler. I briefly tried supplying a manually written Debug implementation, but it was only compiled when the macho feature was enabled, while the struct is defined even without that feature, so it broke the "require debug" lint in the no-feature configuration.
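
For context on the padding split: before Rust 1.47, the standard library implemented Debug only for arrays of up to 32 elements, so deriving Debug on a struct with a single [u8; 288] field would fail on an older MSRV. The hypothetical struct below (not the actual DyldCacheHeader definition) shows one way to express the split; the layout and size are unchanged because 9 x 32 = 288.

```rust
/// Hypothetical stand-in for a header struct with 288 bytes of reserved
/// padding (not the real DyldCacheHeader). On compilers older than Rust 1.47,
/// `Debug` was only implemented for arrays of up to 32 elements, so a single
/// `[u8; 288]` field would prevent `#[derive(Debug)]`. Nine 32-byte chunks keep
/// the same 288-byte layout while every array stays within that limit.
#[derive(Debug, Clone, Copy)]
#[repr(C)]
struct PaddedTail {
    reserved: [[u8; 32]; 9],
}

fn main() {
    let tail = PaddedTail { reserved: [[0u8; 32]; 9] };
    // Deriving Debug works because each array has at most 32 elements.
    let debug = format!("{:?}", tail);
    println!("{} bytes, {} chars of Debug output", std::mem::size_of::<PaddedTail>(), debug.len());
}
```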

@philipc (Contributor) left a review


Looks good, just some trivial comments.

@philipc (Contributor) commented Nov 27, 2021

> My last struggle was with the Debug implementation on the DyldCacheHeader struct. I had to split up the 288-byte padding array into 9 x 32-byte arrays so that Debug could still be derived with the MSRV compiler. I briefly tried supplying a manually written Debug implementation, but it was only compiled when the macho feature was enabled, while the struct is defined even without that feature, so it broke the "require debug" lint in the no-feature configuration.

I would have been fine with disabling the lint for this struct.
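
The alternative mentioned here, keeping a single array and silencing the lint for just this struct, might look like the following. It assumes the "require debug" lint is rustc's built-in missing_debug_implementations lint; the module and struct names are hypothetical.

```rust
// Hypothetical module standing in for a crate that rejects public types
// without a Debug implementation.
#[deny(missing_debug_implementations)]
mod header {
    // Opt this one struct out of the lint instead of splitting its padding.
    #[allow(missing_debug_implementations)]
    #[repr(C)]
    pub struct RawTail {
        pub reserved: [u8; 288],
    }
}

fn main() {
    let tail = header::RawTail { reserved: [0u8; 288] };
    println!("{} reserved bytes", tail.reserved.len());
}
```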

@mstange (Contributor, Author) commented Dec 3, 2021

I'd be interested in a release with this fix. What do your current release plans look like?

@philipc (Contributor) commented Dec 4, 2021

I'll do a release in the near future. I don't have any further work planned before a release.

@jrmuizel (Contributor) commented:

@philipc: gentle ping about the release. Having a release will let us fix system symbols when profiling Firefox.

@philipc (Contributor) commented Dec 12, 2021

Released 0.28.0, thanks for the ping.

However, I've noticed there are some warnings being generated. Some of these already existed and I'll fix them, but could you look at these two:

```
warning: field is never read: `symbols_subcache`
  --> src/read/macho/dyld_cache.rs:17:5
   |
17 |     symbols_subcache: Option<DyldSubCache<'data, E, R>>,
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: field is never read: `header`
  --> src/read/macho/dyld_cache.rs:18:5
   |
18 |     header: &'data macho::DyldCacheHeader<E>,
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

It seems strange that symbols_subcache is never used.

@mstange (Contributor, Author) commented Dec 12, 2021

Thanks, Philip! I think the warnings are new because the compiler got smarter recently. I'll make a PR that removes both unused fields. The symbols subcache is currently unused because I haven't added support for it beyond requiring it in the API and verifying its UUID. Only iOS uses a symbols subcache; macOS doesn't, so I don't need it.

mcbegamerxx954 pushed a commit to mcbegamerxx954/object that referenced this pull request Jun 15, 2024
Successfully merging this pull request may close these issues.

Support macOS 12 dyld shared cache format
3 participants