-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KeyPath performance improvement: Omit projection across trivially-typed memory. #60758
Conversation
@swift-ci please benchmark |
Performance (x86_64): -O
Code size: -OPerformance (x86_64): -Osize
Code size: -OsizePerformance (x86_64): -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please test |
Thanks for looking into this! I agree that precomputing a fixed offset for projecting keypaths made of purely struct/tuple stored property components is a good idea. However, instead of doing the computation on an already-instantiated key path object, I think it'd be cleaner to compute the fixed offset, if any, during key path instantiation, since as we're traversing the key path pattern, we know what components are "struct" components, and then key path objects don't need to carry extra state or do awkward reflection of the already-instantiated types and components to compute it lazily. That approach should also automatically cover |
stdlib/public/core/KeyPath.swift
Outdated
@@ -35,6 +35,8 @@ internal func _abstract( | |||
/// type. | |||
@_objcRuntimeName(_TtCs11_AnyKeyPath) | |||
public class AnyKeyPath: Hashable, _AppendKeyPath { | |||
internal var _isPureStructKeyPath: Bool? | |||
internal var _pureStructValueOffset: Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the AnyKeyPath
type layout is strictly speaking private, I am concerned that there are clients in practice who interpret the current object layout for reflection purposes, so changing the object layout by adding fields might be a binary compatibility issue. Memory usage is also a concern for key paths—there might be a lot of key path objects in an app, and adding a field to AnyKeyPath increases the memory usage of every one in the system.
If we compute the struct value offset during key path pattern instantiation, then the tribool shouldn't be necessary. It should also be true that no key path with a struct value offset ever has a KVC compatibility string (since only ObjC objects have those). So maybe we can overload the _kvcCompatibilityString
pointer field to store the struct offset for pure struct key paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean _kvcKeyPathStringPtr
? I've grep'ed the entire compiler tree and I haven't found _kvcKeyPathStringPtr
outside of KeyPath.swift.
Is the proposal to store the value of the offset as if it were a pointer, like this?
_kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: 0x08)
. This would represent an 8-byte offset. Any other proposal I can think of at the moment uses additional memory, including storing the pointer to a string (or Int) representing the actual offset, or using an enum/case where one case is an UnsafePointer<CChar>
and the other is an Int offset. The latter happens to have a MemoryLayout size of 9 and offset of 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way of computing the offset without using extra memory is:
let offset = UnsafePointer<CChar>(bitPattern: 0x32) // _kvcKeyPathStringPtr
let offsetBase = UnsafePointer<CChar>(bitPattern: 0x01) //constant offset, can't be 0.
print(offsetBase!.distance(to: offset!) + 1) //50 (bytes)
Edit: Should be + 1, not - 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant _kvcKeyPathStringPtr
, sorry. My thinking was that, on 64-bit platforms, we can take advantage of the fact that valid pointers are always positive, and overload the word as follows:
- if it's a KVC string pointer, store the pointer as is, or
- if it's an inline offset, store the negative offset + 1, or
INT64_MIN | offset
, or something else with the sign bit set
and then we can tell which is which by testing the high bit of the word.
On 32 bit platforms, we don't quite have that luxury. We could however still take advantage of the fact that valid
pointers are always greater than or equal to 4096 (because of the null pointer page), and still store a small offset as is, and consider it to be a KVC string pointer if the value is larger than 4096. That would mean we wouldn't be able to use the optimization if the inline offset is greater than 4096 on those platforms, but maybe that's OK?
Hi, thanks for taking a look! |
My thinking is to do it in the runtime, inside of |
I see that; yes, we could compute the offset there, and if we do, then |
stdlib/public/core/KeyPath.swift
Outdated
@@ -173,7 +173,8 @@ public class AnyKeyPath: Hashable, _AppendKeyPath { | |||
} | |||
|
|||
internal func isClass(_ item: Any.Type) -> Bool { | |||
// Displays "warning: 'is' test is always true" at compile time, but that's not actually the case. | |||
// Displays "warning: 'is' test is always true" at compile time, | |||
// but that's not actually the case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d thought that this has to do with Obj-C bridging shenanigans, whereby anything can be wrapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the warning is true, modulo runtime bugs. But I think we can remove all of these accessors (isPureStructKeyPath, isClass, isTuple) when the inline offset is computed during pattern instantiation.
stdlib/public/core/KeyPath.swift
Outdated
@@ -2551,7 +2497,7 @@ internal func _appendingKeyPaths< | |||
return unsafeDowncast(result, to: Result.self) | |||
} | |||
} | |||
_processAppendingKeyPathType(root: &returnValue, leaf: leaf) | |||
_processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting nit:
_processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf) | |
_processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf) |
@@ -3450,6 +3425,12 @@ internal struct InstantiateKeyPathBuffer: KeyPathPatternVisitor { | |||
mutable: Bool, | |||
offset: KeyPathPatternStoredOffset) { | |||
let previous = updatePreviousComponentAddr() | |||
switch kind { | |||
case .struct: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think autoformatting nudged these when they were copy-pasted (here and below)
case .struct: | |
case .struct: |
@swift-ci please test |
The next set of changes are almost ready! I have two questions before the next commit goes in:
When I compute/store the offset in
Thanks! |
Unless things have changed recently, stdlib style is max width of 80. |
I dug into formatting discrepancies a bit more and I found that the max line width is specified as 80 here: I'll use 80 for my changes. The other discrepancy is my use of I've found and fixed the issue in |
@swift-ci please test |
@swift-ci please benchmark |
*/ | ||
|
||
func assignOffsetToStorage(offset: Int) { | ||
let maximumOffsetOn32BitArchitecture = 4094 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 4096?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I'm storing this is:
0 -> Actual nil pointer
1 -> Valid offset of 1
...
4094 -> Valid offset of 4094
4095 -> Valid offset of 0
4096 would be a pointer to the first value of the next page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the comment below that I could store it as "offset + 1", which would mean:
0 -> Actual nil pointer
1 -> Valid offset of 0
...
4094 -> Valid offset of 4093
4095 -> Valid offset of 4094
4096 -> pointer to the first value of the next page.
stdlib/public/core/KeyPath.swift
Outdated
_kvcKeyPathStringPtr = | ||
UnsafePointer<CChar>(bitPattern: maximumOffsetOn32BitArchitecture + 1) | ||
} else if offset < maximumOffsetOn32BitArchitecture { | ||
_kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could save a conditional here by always storing offset + 1
, so we don't need to treat the zero case specially.
stdlib/public/core/KeyPath.swift
Outdated
|
||
// TODO: Maybe we can get a pointer's raw bits instead of doing | ||
// a distance calculation. Note: offsetBase can't be unwrapped | ||
// forcefully if its bitPattern is 0x00. Hence the 0x01. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it would be better to get the bitPattern
of the pointer and do integer arithmetic here, since arithmetic between unrelated pointers is UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'm looking for a way to get the bit pattern out of a pointer like this (without using distance(to: )
) but I'm not finding any methods or computed properties to do that. Is this something that might be added in an extension to UnsafePointers or was this omitted intentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You get the bit pattern via [U]Int(bitPattern: pointer)
.
stdlib/public/core/KeyPath.swift
Outdated
@@ -1980,7 +2096,9 @@ func _setAtWritableKeyPath<Root, Value>( | |||
value: value) | |||
} | |||
// TODO: we should be able to do this more efficiently than projecting. | |||
let (addr, owner) = keyPath._projectMutableAddress(from: &root) | |||
let (addr, owner) = _withUnprotectedUnsafePointer(to: &root) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is _withUnprotectedUnsafePointer
for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an update I'm pulling in from #60933.
(Rather than dealing with a formatting nightmare I downloaded KeyPath.swift from scratch and added in the offset calculations into the Walker).
Maybe I should've waited to rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. It might be good to rebase before we commit to make the history clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope that rebase went through okay, I think it was complaining about me adding a TODO in one of the commits.
@swift-ci please benchmark |
1 similar comment
@swift-ci please benchmark |
@swift-ci please test |
1 similar comment
@swift-ci please test |
There is no need to distinguish the kind of type for |
Interesting. So I'm guessing we can remove the
And then |
It should be sufficient to do:
since if either |
Ok, I see the problem now. |
This means we no longer need to check for empty KeyPath Walker results.
@swift-ci please test |
@swift-ci please test macOS platform |
@swift-ci please benchmark |
stdlib/public/core/KeyPath.swift
Outdated
appendedKeyPath: &returnValue, | ||
root: root, | ||
leaf: leaf | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to do this part inside of open
, so that we don't lose the type information and have to re-cast the result in the code below. Alternatively, it should be impossible for returnValue as? Result
to fail, so we can unsafeDowncast
it instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I move this into open3
, it can look like this:
var result:AnyKeyPath = _appendingKeyPaths(root: typedRoot, leaf: typedLeaf)
_processOffsetForAppendedKeyPath(
appendedKeyPath: &result,
root: root,
leaf: leaf
)
return unsafeDowncast(result, to: Result.self)
case .struct: | ||
structOffset += value | ||
default: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a .class
component kind, then the struct offset should be invalidated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have, around line 3464:
switch kind {
case .struct:
isPureStruct.append(true)
default:
isPureStruct.append(false)
}
So that looks like it's being taken care of there just so I don't have to do isPureStruct.append(false)
for every single case. That said, I could remove isPureStruct.append(false)
on lines 3493 and 3518 since it's being taken care of by the switch statement right at the top of visitStoredComponent().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. As a follow-up, it would be interesting to try to do the optimization for the outOfLine
and unresolvedIndirectOffset
cases as well once the final offset is resolved, but we don't need to do that right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I do have a couple ideas for additional optimizations that'll go in subsequent MRs:
- The optimization listed in the
Alternative Considered
section at the top. - Omit projection in the case where the final N elements are pure structs. For example, if the root is a reference type, and all you have are nested structs down below, we just project from the root to the first struct and then jump to the requisite offset.
If we include the unresolvedIndirectOffset
/ outOfLine
optimization in the list of ideas, where do you see the best bang-for-the-buck being at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being able to also optimize a tail of struct-offset components sounds promising; you could combine that with a class stored property component either as the final component or as the component before the struct-offset suffix. But if I were going to look into optimizing key path applications myself, I would look at the generic traversal loops and see how we could improve their performance. I suspect that the way they're written with heavy reliance on nested functions and _openExistential
is not very friendly to the SIL optimizer, and there's a lot of unneeded overhead that could be avoided by an optimal implementation that eliminated unnecessary copies and retain/release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I have the benchmarks for the struct-offset-tail complete, and the actual implementation is in the debugging phase. Getting these up is something I'll be working on next.
What other methods of getting concrete type information inside a scope, other than _openExistential
exist, if any? I did at one point float the idea of grabbing the concrete type information via _openExistential
and storing it for subsequent read and write operations, but now it looks like that would have a bigger impact on ABI stability than we might like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be any ABI concerns with changing the internal implementation of swift_getKeyPath
or its variants, since their implementations are private. A blunt approach would be to rewrite those functions in runtime C++, where we have finer control over access to type metadata and value copying. _openExistential
shouldn't really incur any overhead directly, but I suspect that the functions we open into aren't always getting inlined, and that may be introducing overhead. I would look at the quality of the generated code and see what combination of compiler improvements and/or code changes we can make to reduce overhead there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a plan, thanks!
….append() that weren't needed.
@swift-ci please test |
1 similar comment
@swift-ci please test |
@swift-ci please test macOS platform |
@fibrechannelscsi @BradLarson any further changes you want to make before merging this? |
This is good to go, thanks!
|
…ed by non-trivially-typed memory.
Whoops, that should've been pushed into the |
@swift-ci please test |
@swift-ci please test macOS platform |
This would have prevented explicitly specified KeyPaths through pure structs, e.g., \A.b.c, from taking the optimized path.
I found this unnecessary check while working on the second KeyPath optimization today. |
@swift-ci please test |
@swift-ci please test Linux platform |
This is looking good to go in now! |
Thanks @fibrechannelscsi ! |
The main assumption associated with this performance improvement is that any contiguous region of (
KeyPathComponentKind
)struct
s is trivially-typed, and as such, can be traversed via simple pointer arithmetic.In the general case, in order to read or write to a value referred to by a KeyPath, a projection is required to be performed from the root to the requested value. During a projection, all intermediate nodes between the root and value are visited. An example of this involves structs within structs containing trivial types only. The premise behind this optimization is to precompute, and use, the offset required to reach the value from the root if only trivially-typed memory is traversed. In the case of a KeyPath
append()
operation, the offset is recomputed, and used only if the appended KeyPath ends up only traversing trivially-typed memory.Tuples are explicitly excluded from this optimization, for the time being.
Upon construction of a new KeyPath instance, the byte offset from the root to the value is precomputed. Further, a Boolean value is precomputed to determine if only trivially-typed memory is traversed. (The function checks to see if all of the
KeyPathComponentKinds
are structs from the root to the value.)During a projection involving a read or write operation, we check the aforementioned Boolean value. If it's true, then we use simple pointer arithmetic to skip to the final value and perform the cast to the value's type before returning it.
Alternative Considered
This alternative can potentially be included in a future PR to improve performance. The currently-proposed performance improvement was found to provide the best speedup with the fewest changes.
Avoid overwriting
Any
inside_projectReadOnly()
.In order to avoid an implicit
realloc()
of the object associated with thecurBase
of typeAny
, we break the storage up into two pieces. The first piece uses anAnyObject
to store a reference type, if the current item happens to be one. The second piece uses a buffer that grows to the maximum size of any struct encountered during any projection step. This prevents an explicitrealloc()
during any subsequent projections, at the expense of memory that is only relinquished when the KeyPath goes out of scope. Due to the extra memory requirement of these KeyPaths, one could argue that these deserve their own type, perhaps namedBufferedKeyPath
. Additional layers of projection (via_openExistential()
) may be needed to carry the requisite type information from the root to the final value with this approach.