
KeyPath performance improvement: Omit projection across trivially-typed memory. #60758

Merged (14 commits) on Nov 1, 2022

Conversation

@fibrechannelscsi (Contributor)

The main assumption behind this performance improvement is that any contiguous region of memory spanned by struct components (KeyPathComponentKind.struct) is trivially typed and can therefore be traversed with simple pointer arithmetic.

In the general case, reading or writing a value referred to by a KeyPath requires performing a projection from the root to the requested value, visiting every intermediate node along the way. An example is a chain of structs within structs containing only trivial types. The premise behind this optimization is to precompute, and use, the byte offset required to reach the value from the root whenever only trivially-typed memory is traversed. In the case of a KeyPath append() operation, the offset is recomputed and used only if the appended KeyPath still traverses only trivially-typed memory.
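As a concrete illustration of the idea (this example is not code from the PR), the standard library's MemoryLayout.offset(of:) already exposes the fixed byte offset that such a pure-struct key path resolves to:

struct Inner { var x: Int; var y: Int }
struct Outer { var a: Int; var inner: Inner }

// A pure-struct key path such as \Outer.inner.y always lands at a fixed
// byte offset from the root, so the projection can become a single add.
let offset = MemoryLayout<Outer>.offset(of: \.inner.y)! // 16 on 64-bit platforms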

Tuples are explicitly excluded from this optimization, for the time being.

Upon construction of a new KeyPath instance, the byte offset from the root to the value is precomputed, along with a Boolean indicating whether only trivially-typed memory is traversed (the check verifies that every KeyPathComponentKind from the root to the value is a struct).

During a projection involving a read or write operation, we check the aforementioned Boolean. If it is true, we use simple pointer arithmetic to skip directly to the final value, cast it to the value's type, and return it.
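A minimal sketch of that read fast path (hypothetical names, not the PR's exact code), assuming the offset was precomputed and the value type is trivial (UnsafeRawBufferPointer.loadUnaligned, Swift 5.7+, requires a trivial type):

func projectReadOnlyFast<Root, Value>(
  _ root: Root,
  pureStructOffset offset: Int,
  as _: Value.Type
) -> Value {
  withUnsafeBytes(of: root) { rawRoot in
    // Trivially-typed memory can be copied bitwise, so one offset load
    // replaces the component-by-component projection.
    rawRoot.loadUnaligned(fromByteOffset: offset, as: Value.self)
  }
}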

Alternative Considered

This alternative can potentially be included in a future PR to improve performance. The currently-proposed performance improvement was found to provide the best speedup with the fewest changes.

Avoid overwriting Any inside _projectReadOnly().

To avoid an implicit realloc() of the object associated with the curBase of type Any, we break the storage into two pieces. The first piece uses an AnyObject to store a reference type, if the current item happens to be one. The second piece uses a buffer that grows to the maximum size of any struct encountered during any projection step. This avoids reallocation during subsequent projections, at the expense of memory that is relinquished only when the KeyPath goes out of scope. Due to the extra memory requirement of these KeyPaths, one could argue that they deserve their own type, perhaps named BufferedKeyPath. Additional layers of projection (via _openExistential()) may be needed to carry the requisite type information from the root to the final value with this approach.
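A rough sketch of that storage split (all names hypothetical; this alternative was not implemented here):

// Two-piece projection storage: a slot for a reference-type base plus a
// raw buffer that only ever grows, so repeated projections never reallocate.
struct ProjectionScratch {
  var objectSlot: AnyObject?
  var valueBuffer = UnsafeMutableRawBufferPointer.allocate(
    byteCount: 64, alignment: 16)

  mutating func reserveCapacity(_ byteCount: Int) {
    guard valueBuffer.count < byteCount else { return }
    valueBuffer.deallocate()
    valueBuffer = .allocate(byteCount: byteCount, alignment: 16)
  }
}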

@BradLarson (Contributor)

@swift-ci please benchmark

@BradLarson (Contributor)

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 122 161 +32.0% 0.76x (?)
UTF8Decode_InitDecoding 122 160 +31.1% 0.76x (?)
UTF8Decode_InitFromCustom_noncontiguous 241 280 +16.2% 0.86x
CStringLongNonAscii 144 160 +11.1% 0.90x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 176 195 +10.8% 0.90x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 179 198 +10.6% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 194 3 -98.5% 64.65x
KeyPathReadPerformance 211 11 -94.8% 19.18x
KeyPathNestedStructs 203 12 -94.1% 16.92x
KeyPathWritePerformance 762 55 -92.8% 13.85x
FlattenListFlatMap 4553 3200 -29.7% 1.42x (?)
Data.hash.Medium 29 24 -17.2% 1.21x
ArrayLiteral2 87 76 -12.6% 1.14x (?)
Set.isStrictSubset.Int.Empty 44 39 -11.4% 1.13x (?)
StringToDataLargeUnicode 2450 2250 -8.2% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 90 83 -7.8% 1.08x (?)
DataToStringEmpty 700 650 -7.1% 1.08x (?)
ObjectiveCBridgeToNSDictionary 10450 9750 -6.7% 1.07x (?)

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 2 3 +50.0% 0.67x (?)
FlattenListFlatMap 2510 3402 +35.5% 0.74x (?)
UTF8Decode_InitDecoding 121 164 +35.5% 0.74x (?)
UTF8Decode_InitFromCustom_contiguous 122 162 +32.8% 0.75x (?)
UTF8Decode_InitFromCustom_noncontiguous 245 287 +17.1% 0.85x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 176 195 +10.8% 0.90x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 179 198 +10.6% 0.90x (?)
SequenceAlgosRange 2170 2390 +10.1% 0.91x (?)
SequenceAlgosArray 2180 2400 +10.1% 0.91x (?)
UTF8Decode_InitFromCustom_contiguous_ascii 169 185 +9.5% 0.91x (?)
NormalizedIterator_ascii 97 106 +9.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 195 3 -98.5% 64.98x
KeyPathReadPerformance 212 11 -94.8% 19.27x
KeyPathNestedStructs 208 12 -94.2% 17.33x
KeyPathWritePerformance 719 55 -92.4% 13.07x
Data.hash.Medium 28 24 -14.3% 1.17x (?)
Set.isDisjoint.Seq.Int.Empty 53 49 -7.5% 1.08x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 127 168 +32.3% 0.76x
UTF8Decode_InitFromCustom_contiguous 130 170 +30.8% 0.76x (?)
SIMDReduce.Int8 6748 7624 +13.0% 0.89x (?)
RandomDoubleLCG 30237 33393 +10.4% 0.91x (?)
RandomDoubleOpaqueLCG 30524 33450 +9.6% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 202 9 -95.5% 22.44x
KeyPathWritePerformance 765 71 -90.7% 10.77x
KeyPathNestedStructs 218 23 -89.4% 9.48x
KeyPathReadPerformance 229 26 -88.6% 8.81x
Data.hash.Medium 33 28 -15.2% 1.18x
Breadcrumbs.MutatedUTF16ToIdx.ASCII 15 13 -13.3% 1.15x (?)
ArrayAppendLatin1Substring 26208 22968 -12.4% 1.14x (?)
RC4 13042 11442 -12.3% 1.14x (?)
ArrayAppendAsciiSubstring 25920 22752 -12.2% 1.14x (?)
ArrayAppendUTF16Substring 25848 22716 -12.1% 1.14x
CharacterLiteralsLarge 446 398 -10.8% 1.12x (?)

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 32 GB

@BradLarson (Contributor)

@swift-ci please test

2 similar comments
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

cc @jckarter, @Azoy - We're not sure who might be the right person to review this, and also if there is an alternative approach to achieve similar performance gains. For our applications, this has been a significant win, but we're definitely open to other designs.

@jckarter (Contributor)

Thanks for looking into this! I agree that precomputing a fixed offset for projecting key paths made purely of struct/tuple stored-property components is a good idea. However, instead of doing the computation on an already-instantiated key path object, I think it'd be cleaner to compute the fixed offset, if any, during key path instantiation: as we traverse the key path pattern, we know which components are "struct" components, so key path objects don't need to carry extra state or do awkward reflection of the already-instantiated types and components to compute it lazily. That approach should also automatically cover self and tuple components, since those also look like "struct" components in the pattern.

@@ -35,6 +35,8 @@ internal func _abstract(
/// type.
@_objcRuntimeName(_TtCs11_AnyKeyPath)
public class AnyKeyPath: Hashable, _AppendKeyPath {
  internal var _isPureStructKeyPath: Bool?
  internal var _pureStructValueOffset: Int = 0
Contributor:

Although the AnyKeyPath type layout is strictly speaking private, I am concerned that there are clients in practice who interpret the current object layout for reflection purposes, so changing the object layout by adding fields might be a binary compatibility issue. Memory usage is also a concern for key paths—there might be a lot of key path objects in an app, and adding a field to AnyKeyPath increases the memory usage of every one in the system.

If we compute the struct value offset during key path pattern instantiation, then the tribool shouldn't be necessary. It should also be true that no key path with a struct value offset ever has a KVC compatibility string (since only ObjC objects have those). So maybe we can overload the _kvcCompatibilityString pointer field to store the struct offset for pure struct key paths?

@fibrechannelscsi (Contributor, Author), Sep 19, 2022:

Did you mean _kvcKeyPathStringPtr? I've grepped the entire compiler tree and haven't found _kvcKeyPathStringPtr outside of KeyPath.swift.
Is the proposal to store the value of the offset as if it were a pointer, like this?
_kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: 0x08) // represents an 8-byte offset
Any other proposal I can think of at the moment uses additional memory, whether storing a pointer to a string (or Int) representing the actual offset, or using an enum where one case is an UnsafePointer<CChar> and the other is an Int offset. The latter happens to have a MemoryLayout size of 9 and stride of 16.

@fibrechannelscsi (Contributor, Author), Sep 19, 2022:

One way of computing the offset without using extra memory is:

let offset = UnsafePointer<CChar>(bitPattern: 0x32) // _kvcKeyPathStringPtr
let offsetBase = UnsafePointer<CChar>(bitPattern: 0x01) //constant offset, can't be 0.
print(offsetBase!.distance(to: offset!) + 1) //50 (bytes)

Edit: Should be + 1, not - 1.

Contributor:

Yeah, I meant _kvcKeyPathStringPtr, sorry. My thinking was that, on 64-bit platforms, we can take advantage of the fact that valid pointers are always positive, and overload the word as follows:

  • if it's a KVC string pointer, store the pointer as is, or
  • if it's an inline offset, store the negative offset + 1, or INT64_MIN | offset, or something else with the sign bit set

and then we can tell which is which by testing the high bit of the word.

On 32-bit platforms, we don't quite have that luxury. We could, however, still take advantage of the fact that valid pointers are always greater than or equal to 4096 (because of the null pointer page), still store a small offset as is, and consider it to be a KVC string pointer if the value is larger than 4096. That would mean we wouldn't be able to use the optimization if the inline offset is greater than 4096 on those platforms, but maybe that's OK?
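A sketch of the 64-bit tagging scheme described here (helper names are hypothetical):

// Valid pointers are positive on 64-bit platforms, so the sign bit can
// distinguish an inline struct offset from a real KVC string pointer.
func encodeInlineOffset(_ offset: Int) -> UInt {
  UInt(bitPattern: Int.min | offset) // set the sign bit
}

func decodeWord(_ word: UInt) -> Int? {
  let signed = Int(bitPattern: word)
  return signed < 0 ? signed & Int.max : nil // clear the sign bit; else it's a pointer
}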

@fibrechannelscsi (Contributor, Author)

Hi, thanks for taking a look!
Yes, I see where the KeyPath is being fully constructed before we call _computeOffsetForPureStructKeypath() just to compute the offset. Is the idea to compute the offset at the SIL level, potentially in projectTailElems()?

@jckarter (Contributor)

My thinking is to do it in the runtime, inside of InstantiateKeyPathBuffer. As the key path is instantiated, we can sum up the resolved offsets of all the StoredComponents, giving up if we encounter any non-struct or non-stored components along the way.
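Sketched in simplified form (the enum and visitor hook below are stand-ins for the real pattern-visitor machinery):

// Running state kept while the key path pattern is instantiated.
var structOffset = 0
var isPureStruct = true

enum ResolvedComponentKind { case `struct`, `class`, computed }

func visitResolvedComponent(kind: ResolvedComponentKind, byteOffset: Int) {
  switch kind {
  case .struct:
    structOffset += byteOffset // stored struct property: just accumulate
  default:
    isPureStruct = false // anything else breaks the fixed-offset chain
  }
}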

@fibrechannelscsi (Contributor, Author)

I see that; yes, we could compute the offset there, and if we do, then _computeOffsetForPureStructKeypath() could go away (and potentially _recalculateOffsetForPureStructKeyPath() as well).

@@ -173,7 +173,8 @@ public class AnyKeyPath: Hashable, _AppendKeyPath {
}

internal func isClass(_ item: Any.Type) -> Bool {
-  // Displays "warning: 'is' test is always true" at compile time, but that's not actually the case.
+  // Displays "warning: 'is' test is always true" at compile time,
+  // but that's not actually the case.
Collaborator:

I’d thought that this has to do with Obj-C bridging shenanigans, whereby anything can be wrapped?

Contributor:

Yeah, the warning is true, modulo runtime bugs. But I think we can remove all of these accessors (isPureStructKeyPath, isClass, isTuple) when the inline offset is computed during pattern instantiation.

@@ -2551,7 +2497,7 @@ internal func _appendingKeyPaths<
return unsafeDowncast(result, to: Result.self)
}
}
-  _processAppendingKeyPathType(root: &returnValue, leaf: leaf)
+  _processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf)
Collaborator:

Formatting nit:

Suggested change
-    _processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf)
+    _processOffsetForAppendedKeyPath(
+      appendedKeyPath: &returnValue,
+      root: root,
+      leaf: leaf
+    )

@@ -3450,6 +3425,12 @@ internal struct InstantiateKeyPathBuffer: KeyPathPatternVisitor {
mutable: Bool,
offset: KeyPathPatternStoredOffset) {
let previous = updatePreviousComponentAddr()
switch kind {
case .struct:
Collaborator:

I think autoformatting nudged these when they were copy-pasted (here and below)

Suggested change (indentation only)
-        case .struct:
+    case .struct:

@BradLarson (Contributor)

@swift-ci please test

@fibrechannelscsi (Contributor, Author)

The next set of changes are almost ready! I have two questions before the next commit goes in:

  1. The last failing test I ought to fix is in capture_propagate_keypath.swift. We have:
struct Str {
  var i: Int = 27
  var j: Int = 28
  @inline(never)
  var c: Int { i + 100 }
} 

When I compute/store the offset in visitStoredComponent(), in InstantiateKeyPathBuffer, the information passed into the Header constructor for c looks the same as it does for i and j. It gets treated as a valid "optimized" offset and the value I get back is 100 and not 127 as expected. I'm looking for additional information, if any, in the InstantiateKeyPathBuffer, that can help distinguish cases like this.

  2. I see there are formatting issues that crept in during the last copy/paste. I'm looking for the recommended SwiftFormat flags that'll keep everything in line. I have a max width of 100 and an indent of 2. Am I missing anything?

Thanks!

@xwu (Collaborator), Sep 28, 2022

Unless things have changed recently, stdlib style is max width of 80.

@fibrechannelscsi (Contributor, Author), Sep 28, 2022

I dug into formatting discrepancies a bit more and I found that the max line width is specified as 80 here:
https://www.swift.org/contributing/#contributing-code
but I got the number of 100 from here:
https://google.github.io/swift/

I'll use 80 for my changes.

The other discrepancy is my use of swiftformat (which is what you get when you brew install it) rather than swift-format. Is there a JSON configuration for the latter that I ought to be using? If I download KeyPath.swift without my changes and format it with swift-format and no configuration file, it changes many instances of spacing and indentation.

I've found and fixed the issue in capture_propagate_keypath.swift. The issue was not with testSimple; I had removed the computed property attribute and @inline(never) from c in the playground I was using. The issue was with testGeneric. In this case, adding isPureStruct.append(false) to each case in visitStoredComponent() that did not explicitly involve .struct and then .inline resolved the issue.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please benchmark

func assignOffsetToStorage(offset: Int) {
  let maximumOffsetOn32BitArchitecture = 4094
Contributor:

Should this be 4096?

Contributor (Author):

The way I'm storing this is:
0 -> Actual nil pointer
1 -> Valid offset of 1
...
4094 -> Valid offset of 4094
4095 -> Valid offset of 0
4096 would be a pointer to the first value of the next page.

Contributor (Author):

I agree with the comment below that I could store it as "offset + 1", which would mean:
0 -> Actual nil pointer
1 -> Valid offset of 0
...
4094 -> Valid offset of 4093
4095 -> Valid offset of 4094
4096 -> pointer to the first value of the next page.

    _kvcKeyPathStringPtr =
      UnsafePointer<CChar>(bitPattern: maximumOffsetOn32BitArchitecture + 1)
  } else if offset < maximumOffsetOn32BitArchitecture {
    _kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: offset)
Contributor:

I think you could save a conditional here by always storing offset + 1, so we don't need to treat the zero case specially.
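Combining that suggestion with the 32-bit scheme discussed above, a sketch of the biased encoding (helper names hypothetical; simplified relative to the PR's final code):

// Store offset + 1 so a zero offset is distinguishable from a nil pointer;
// values at or beyond the 4096-byte null page are treated as KVC strings.
let maximumOffsetOn32BitArchitecture = 4094

func encodedPointer(forOffset offset: Int) -> UnsafePointer<CChar>? {
  guard (0...maximumOffsetOn32BitArchitecture).contains(offset) else { return nil }
  return UnsafePointer<CChar>(bitPattern: offset + 1)
}

func decodedOffset(from pointer: UnsafePointer<CChar>?) -> Int? {
  guard let pointer = pointer else { return nil }
  let word = UInt(bitPattern: pointer)
  return word <= UInt(maximumOffsetOn32BitArchitecture + 1) ? Int(word) - 1 : nil
}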


// TODO: Maybe we can get a pointer's raw bits instead of doing
// a distance calculation. Note: offsetBase can't be unwrapped
// forcefully if its bitPattern is 0x00. Hence the 0x01.
Contributor:

Yeah, it would be better to get the bitPattern of the pointer and do integer arithmetic here, since arithmetic between unrelated pointers is UB.

Contributor (Author):

Interesting. I'm looking for a way to get the bit pattern out of a pointer like this (without using distance(to:)), but I'm not finding any methods or computed properties to do that. Is this something that might be added in an extension to UnsafePointer, or was it omitted intentionally?

Contributor:

You get the bit pattern via [U]Int(bitPattern: pointer).
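For example (the address value here is illustrative):

let pointer = UnsafePointer<CChar>(bitPattern: 0x32)!
let bits = UInt(bitPattern: pointer) // 0x32: integer arithmetic from here on, no pointer UB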

@@ -1980,7 +2096,9 @@ func _setAtWritableKeyPath<Root, Value>(
value: value)
}
// TODO: we should be able to do this more efficiently than projecting.
let (addr, owner) = keyPath._projectMutableAddress(from: &root)
let (addr, owner) = _withUnprotectedUnsafePointer(to: &root) {
Contributor:

What is _withUnprotectedUnsafePointer for here?

Contributor (Author):

This is an update I'm pulling in from #60933. (Rather than dealing with a formatting nightmare, I downloaded KeyPath.swift from scratch and added the offset calculations into the Walker.) Maybe I should've waited to rebase?

Contributor:

Ah, I see. It might be good to rebase before we commit to make the history clean.

Contributor (Author):

I hope that rebase went through okay; I think it was complaining about me adding a TODO in one of the commits.

@BradLarson (Contributor)

@swift-ci please benchmark

1 similar comment
@BradLarson (Contributor)

@swift-ci please benchmark

@BradLarson (Contributor)

@swift-ci please test

1 similar comment
@BradLarson (Contributor)

@swift-ci please test

@jckarter (Contributor), Oct 24, 2022

There is no need to distinguish the kind of type for \.self. The key path doesn't dereference any object references and behaves like a "pure struct" key path no matter what. The interesting property that enables the optimization here is whether the key path only does in-place offsets into the root value, not what types are involved when doing so. In your example \C.a is not a value type key path, it'll be either a computed property key path or class ivar key path depending on the finality of a, and that will still break the "pure struct" property of the composed key path when you do (\C.self).appending(path: \C.a).appending(path: \A.a).

@fibrechannelscsi (Contributor, Author), Oct 24, 2022

Interesting. So I'm guessing we can remove the if walker.isPureStruct.count == 0 { check and simply check, during an append operation, if the resulting KeyPath will yield one that is considered a pure struct? Inside _processOffsetForAppendedKeyPath it may just end up looking like:

  if let rootOffset = root.getOffsetFromStorage(),
    let leafOffset = leaf.getOffsetFromStorage(),
    root.isPureStructAppendable(to: leaf)
  {
    appendedKeyPath.assignOffsetToStorage(offset: rootOffset + leafOffset)
  }

And then isPureStructAppendable(to:) will do the heavy lifting in terms of determining if the resulting KeyPath is still a pure struct / identity one or not.

@jckarter (Contributor)

It should be sufficient to do:

if let rootOffset = root.getOffsetFromStorage(),
   let leafOffset = leaf.getOffsetFromStorage() {
  appendedKeyPath.assignOffsetToStorage(offset: rootOffset + leafOffset)
}

since if either root or leaf is lacking an offset, you can't compute the combined offset, and they'll have offsets if and only if each key path traversal is equivalent to a pointer offset, in which case, the composed key path is equivalent to the sum of the offsets.

@fibrechannelscsi (Contributor, Author)

Ok, I see the problem now.
The issue isn't with walker.isPureStruct.count per se; it's that there are still a few cases where isPureStruct.append() ought to be called. In essence, we have an empty Walker where we ought not to, and this is causing some kinds of KeyPaths to be treated as pure structs when they ought not to be.
For example, the dynamically-typed application section in the KeyPath.swift unit test goes into the case .pointer: section of visitComputedComponent. We ought to add isPureStruct.append(false) there. In the end, the Walker should be sufficiently populated that we no longer need the if walker.isPureStruct.count == 0 { check.
Let me find those last remaining cases and get that up soon!

This means we no longer need to check for empty KeyPath Walker results.
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

@BradLarson (Contributor)

@swift-ci please benchmark

appendedKeyPath: &returnValue,
root: root,
leaf: leaf
)
Contributor:

I think it would be better to do this part inside of open, so that we don't lose the type information and have to re-cast the result in the code below. Alternatively, it should be impossible for returnValue as? Result to fail, so we can unsafeDowncast it instead.

Contributor (Author):

If I move this into open3, it can look like this:

var result: AnyKeyPath = _appendingKeyPaths(root: typedRoot, leaf: typedLeaf)
_processOffsetForAppendedKeyPath(
  appendedKeyPath: &result,
  root: root,
  leaf: leaf
)
return unsafeDowncast(result, to: Result.self)

case .struct:
  structOffset += value
default:
  break
Contributor:

If this is a .class component kind, then the struct offset should be invalidated too.

Contributor (Author):

I have, around line 3464:

switch kind {
case .struct:
  isPureStruct.append(true)
default:
  isPureStruct.append(false)
}

So that looks like it's being taken care of there just so I don't have to do isPureStruct.append(false) for every single case. That said, I could remove isPureStruct.append(false) on lines 3493 and 3518 since it's being taken care of by the switch statement right at the top of visitStoredComponent().

Contributor:

That sounds good. As a follow-up, it would be interesting to try to do the optimization for the outOfLine and unresolvedIndirectOffset cases as well once the final offset is resolved, but we don't need to do that right away.

@fibrechannelscsi (Contributor, Author), Oct 26, 2022:

Great! I do have a couple ideas for additional optimizations that'll go in subsequent MRs:

  1. The optimization listed in the Alternative Considered section at the top.
  2. Omit projection in the case where the final N elements are pure structs. For example, if the root is a reference type, and all you have are nested structs down below, we just project from the root to the first struct and then jump to the requisite offset.

If we include the unresolvedIndirectOffset / outOfLine optimization in the list of ideas, where do you see the best bang-for-the-buck being at this point?

Contributor:

Being able to also optimize a tail of struct-offset components sounds promising; you could combine that with a class stored property component either as the final component or as the component before the struct-offset suffix. But if I were going to look into optimizing key path applications myself, I would look at the generic traversal loops and see how we could improve their performance. I suspect that the way they're written with heavy reliance on nested functions and _openExistential is not very friendly to the SIL optimizer, and there's a lot of unneeded overhead that could be avoided by an optimal implementation that eliminated unnecessary copies and retain/release.

Contributor (Author):

Sounds good! I have the benchmarks for the struct-offset-tail complete, and the actual implementation is in the debugging phase. Getting these up is something I'll be working on next.

What other methods of getting concrete type information inside a scope exist, other than _openExistential, if any? I did at one point float the idea of grabbing the concrete type information via _openExistential and storing it for subsequent read and write operations, but now it looks like that would have a bigger impact on ABI stability than we might like.

Contributor:

There should not be any ABI concerns with changing the internal implementation of swift_getKeyPath or its variants, since their implementations are private. A blunt approach would be to rewrite those functions in runtime C++, where we have finer control over access to type metadata and value copying. _openExistential shouldn't really incur any overhead directly, but I suspect that the functions we open into aren't always getting inlined, and that may be introducing overhead. I would look at the quality of the generated code and see what combination of compiler improvements and/or code changes we can make to reduce overhead there.

Contributor (Author):

Sounds like a plan, thanks!

@BradLarson (Contributor)

@swift-ci please test

1 similar comment
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

@jckarter (Contributor)

@fibrechannelscsi @BradLarson any further changes you want to make before merging this?

@fibrechannelscsi (Contributor, Author)

This is good to go, thanks!
The next two steps are:

  1. Getting the benchmarks in for the struct-offset-tail optimization (this will be a separate MR, involving benchmark/single-source/KeyPathPerformanceTests.swift)
  2. Adding those optimizations into KeyPath.swift (another MR).

@fibrechannelscsi (Contributor, Author)

Whoops, that should've been pushed into the more-keypath-benchmarks branch. Let me go revert that.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

This would have prevented explicitly specified KeyPaths through pure structs, e.g., \A.b.c, from taking the optimized path.
@fibrechannelscsi (Contributor, Author), Oct 31, 2022

I found this unnecessary check while working on the second KeyPath optimization today.
The benchmarks didn't catch this because the one nested-struct KeyPath benchmark generates its KeyPaths via append() each step of the way. This would've prevented explicitly specified KeyPaths to nested structs from taking the optimized path. Looks like all pertinent KeyPath and stdlib tests pass on my end without this line.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test Linux platform

@fibrechannelscsi (Contributor, Author)

This is looking good to go in now!

@jckarter merged commit 121adf6 into swiftlang:main on Nov 1, 2022
@jckarter (Contributor), Nov 1, 2022

Thanks @fibrechannelscsi !
