
KeyPath performance improvement: Omit projection across trivially-typed memory. #60758

Merged (14 commits) on Nov 1, 2022

Conversation

@fibrechannelscsi (Contributor)

The main assumption behind this performance improvement is that any contiguous region of memory spanned by struct components (KeyPathComponentKind.struct) is trivially typed and can therefore be traversed with simple pointer arithmetic.

In the general case, reading or writing a value referred to by a KeyPath requires performing a projection from the root to the requested value, visiting every intermediate node along the way. An example is a chain of structs within structs containing only trivial types. The premise behind this optimization is to precompute, and use, the byte offset required to reach the value from the root whenever only trivially-typed memory is traversed. In the case of a KeyPath append() operation, the offset is recomputed and used only if the appended KeyPath still traverses only trivially-typed memory.
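As a concrete illustration of the idea (this example is not code from the PR), the standard library's MemoryLayout.offset(of:) already exposes the fixed byte offset that such a pure-struct key path resolves to:

struct Inner { var x: Int; var y: Int }
struct Outer { var a: Int; var inner: Inner }

// A pure-struct key path such as \Outer.inner.y always lands at a fixed
// byte offset from the root, so the projection can become a single add.
let offset = MemoryLayout<Outer>.offset(of: \.inner.y)! // 16 on 64-bit platforms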

Tuples are explicitly excluded from this optimization, for the time being.

Upon construction of a new KeyPath instance, the byte offset from the root to the value is precomputed, along with a Boolean indicating whether only trivially-typed memory is traversed (the check verifies that every KeyPathComponentKind from the root to the value is a struct).

During a projection involving a read or write operation, we check the aforementioned Boolean. If it is true, we use simple pointer arithmetic to skip directly to the final value, cast it to the value's type, and return it.
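A minimal sketch of that read fast path (hypothetical names, not the PR's exact code), assuming the offset was precomputed and the value type is trivial (UnsafeRawBufferPointer.loadUnaligned, Swift 5.7+, requires a trivial type):

func projectReadOnlyFast<Root, Value>(
  _ root: Root,
  pureStructOffset offset: Int,
  as _: Value.Type
) -> Value {
  withUnsafeBytes(of: root) { rawRoot in
    // Trivially-typed memory can be copied bitwise, so one offset load
    // replaces the component-by-component projection.
    rawRoot.loadUnaligned(fromByteOffset: offset, as: Value.self)
  }
}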

Alternative Considered

This alternative can potentially be included in a future PR to improve performance. The currently-proposed performance improvement was found to provide the best speedup with the fewest changes.

Avoid overwriting Any inside _projectReadOnly().

To avoid an implicit realloc() of the object associated with the curBase of type Any, we break the storage into two pieces. The first piece uses an AnyObject to store a reference type, if the current item happens to be one. The second piece uses a buffer that grows to the maximum size of any struct encountered during any projection step. This avoids reallocation during subsequent projections, at the expense of memory that is relinquished only when the KeyPath goes out of scope. Due to the extra memory requirement of these KeyPaths, one could argue that they deserve their own type, perhaps named BufferedKeyPath. Additional layers of projection (via _openExistential()) may be needed to carry the requisite type information from the root to the final value with this approach.
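A rough sketch of that storage split (all names hypothetical; this alternative was not implemented here):

// Two-piece projection storage: a slot for a reference-type base plus a
// raw buffer that only ever grows, so repeated projections never reallocate.
struct ProjectionScratch {
  var objectSlot: AnyObject?
  var valueBuffer = UnsafeMutableRawBufferPointer.allocate(
    byteCount: 64, alignment: 16)

  mutating func reserveCapacity(_ byteCount: Int) {
    guard valueBuffer.count < byteCount else { return }
    valueBuffer.deallocate()
    valueBuffer = .allocate(byteCount: byteCount, alignment: 16)
  }
}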

@BradLarson (Contributor)

@swift-ci please benchmark

@BradLarson (Contributor)

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 122 161 +32.0% 0.76x (?)
UTF8Decode_InitDecoding 122 160 +31.1% 0.76x (?)
UTF8Decode_InitFromCustom_noncontiguous 241 280 +16.2% 0.86x
CStringLongNonAscii 144 160 +11.1% 0.90x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 176 195 +10.8% 0.90x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 179 198 +10.6% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 194 3 -98.5% 64.65x
KeyPathReadPerformance 211 11 -94.8% 19.18x
KeyPathNestedStructs 203 12 -94.1% 16.92x
KeyPathWritePerformance 762 55 -92.8% 13.85x
FlattenListFlatMap 4553 3200 -29.7% 1.42x (?)
Data.hash.Medium 29 24 -17.2% 1.21x
ArrayLiteral2 87 76 -12.6% 1.14x (?)
Set.isStrictSubset.Int.Empty 44 39 -11.4% 1.13x (?)
StringToDataLargeUnicode 2450 2250 -8.2% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 90 83 -7.8% 1.08x (?)
DataToStringEmpty 700 650 -7.1% 1.08x (?)
ObjectiveCBridgeToNSDictionary 10450 9750 -6.7% 1.07x (?)

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 2 3 +50.0% 0.67x (?)
FlattenListFlatMap 2510 3402 +35.5% 0.74x (?)
UTF8Decode_InitDecoding 121 164 +35.5% 0.74x (?)
UTF8Decode_InitFromCustom_contiguous 122 162 +32.8% 0.75x (?)
UTF8Decode_InitFromCustom_noncontiguous 245 287 +17.1% 0.85x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 176 195 +10.8% 0.90x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 179 198 +10.6% 0.90x (?)
SequenceAlgosRange 2170 2390 +10.1% 0.91x (?)
SequenceAlgosArray 2180 2400 +10.1% 0.91x (?)
UTF8Decode_InitFromCustom_contiguous_ascii 169 185 +9.5% 0.91x (?)
NormalizedIterator_ascii 97 106 +9.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 195 3 -98.5% 64.98x
KeyPathReadPerformance 212 11 -94.8% 19.27x
KeyPathNestedStructs 208 12 -94.2% 17.33x
KeyPathWritePerformance 719 55 -92.4% 13.07x
Data.hash.Medium 28 24 -14.3% 1.17x (?)
Set.isDisjoint.Seq.Int.Empty 53 49 -7.5% 1.08x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 127 168 +32.3% 0.76x
UTF8Decode_InitFromCustom_contiguous 130 170 +30.8% 0.76x (?)
SIMDReduce.Int8 6748 7624 +13.0% 0.89x (?)
RandomDoubleLCG 30237 33393 +10.4% 0.91x (?)
RandomDoubleOpaqueLCG 30524 33450 +9.6% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
KeyPathsSmallStruct 202 9 -95.5% 22.44x
KeyPathWritePerformance 765 71 -90.7% 10.77x
KeyPathNestedStructs 218 23 -89.4% 9.48x
KeyPathReadPerformance 229 26 -88.6% 8.81x
Data.hash.Medium 33 28 -15.2% 1.18x
Breadcrumbs.MutatedUTF16ToIdx.ASCII 15 13 -13.3% 1.15x (?)
ArrayAppendLatin1Substring 26208 22968 -12.4% 1.14x (?)
RC4 13042 11442 -12.3% 1.14x (?)
ArrayAppendAsciiSubstring 25920 22752 -12.2% 1.14x (?)
ArrayAppendUTF16Substring 25848 22716 -12.1% 1.14x
CharacterLiteralsLarge 446 398 -10.8% 1.12x (?)

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 32 GB

@BradLarson (Contributor)

@swift-ci please test

2 similar comments
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

cc @jckarter, @Azoy - We're not sure who might be the right person to review this, and also if there is an alternative approach to achieve similar performance gains. For our applications, this has been a significant win, but we're definitely open to other designs.

@jckarter (Contributor)

Thanks for looking into this! I agree that precomputing a fixed offset for projecting key paths made purely of struct/tuple stored-property components is a good idea. However, instead of doing the computation on an already-instantiated key path object, I think it'd be cleaner to compute the fixed offset, if any, during key path instantiation: as we traverse the key path pattern, we know which components are "struct" components, so key path objects don't need to carry extra state or do awkward reflection of the already-instantiated types and components to compute it lazily. That approach should also automatically cover self and tuple components, since those also look like "struct" components in the pattern.

@@ -35,6 +35,8 @@ internal func _abstract(
/// type.
@_objcRuntimeName(_TtCs11_AnyKeyPath)
public class AnyKeyPath: Hashable, _AppendKeyPath {
  internal var _isPureStructKeyPath: Bool?
  internal var _pureStructValueOffset: Int = 0
Contributor:

Although the AnyKeyPath type layout is strictly speaking private, I am concerned that there are clients in practice who interpret the current object layout for reflection purposes, so changing the object layout by adding fields might be a binary compatibility issue. Memory usage is also a concern for key paths—there might be a lot of key path objects in an app, and adding a field to AnyKeyPath increases the memory usage of every one in the system.

If we compute the struct value offset during key path pattern instantiation, then the tribool shouldn't be necessary. It should also be true that no key path with a struct value offset ever has a KVC compatibility string (since only ObjC objects have those). So maybe we can overload the _kvcCompatibilityString pointer field to store the struct offset for pure struct key paths?

@fibrechannelscsi (Contributor, Author), Sep 19, 2022:

Did you mean _kvcKeyPathStringPtr? I've grepped the entire compiler tree and haven't found _kvcKeyPathStringPtr outside of KeyPath.swift.
Is the proposal to store the value of the offset as if it were a pointer, like this?
_kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: 0x08) // represents an 8-byte offset
Any other proposal I can think of at the moment uses additional memory, whether storing a pointer to a string (or Int) representing the actual offset, or using an enum where one case is an UnsafePointer<CChar> and the other is an Int offset. The latter happens to have a MemoryLayout size of 9 and stride of 16.

@fibrechannelscsi (Contributor, Author), Sep 19, 2022:

One way of computing the offset without using extra memory is:

let offset = UnsafePointer<CChar>(bitPattern: 0x32) // _kvcKeyPathStringPtr
let offsetBase = UnsafePointer<CChar>(bitPattern: 0x01) //constant offset, can't be 0.
print(offsetBase!.distance(to: offset!) + 1) //50 (bytes)

Edit: Should be + 1, not - 1.

Contributor:

Yeah, I meant _kvcKeyPathStringPtr, sorry. My thinking was that, on 64-bit platforms, we can take advantage of the fact that valid pointers are always positive, and overload the word as follows:

  • if it's a KVC string pointer, store the pointer as is, or
  • if it's an inline offset, store the negative offset + 1, or INT64_MIN | offset, or something else with the sign bit set

and then we can tell which is which by testing the high bit of the word.

On 32-bit platforms, we don't quite have that luxury. We could, however, still take advantage of the fact that valid pointers are always greater than or equal to 4096 (because of the null pointer page), still store a small offset as is, and consider it to be a KVC string pointer if the value is larger than 4096. That would mean we wouldn't be able to use the optimization if the inline offset is greater than 4096 on those platforms, but maybe that's OK?
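A sketch of the 64-bit tagging scheme described here (helper names are hypothetical):

// Valid pointers are positive on 64-bit platforms, so the sign bit can
// distinguish an inline struct offset from a real KVC string pointer.
func encodeInlineOffset(_ offset: Int) -> UInt {
  UInt(bitPattern: Int.min | offset) // set the sign bit
}

func decodeWord(_ word: UInt) -> Int? {
  let signed = Int(bitPattern: word)
  return signed < 0 ? signed & Int.max : nil // clear the sign bit; else it's a pointer
}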

@fibrechannelscsi (Contributor, Author)

Hi, thanks for taking a look!
Yes, I see where the KeyPath is being fully constructed before we call _computeOffsetForPureStructKeypath() just to compute the offset. Is the idea to compute the offset at the SIL level, potentially in projectTailElems()?

@jckarter (Contributor)

My thinking is to do it in the runtime, inside of InstantiateKeyPathBuffer. As the key path is instantiated, we can sum up the resolved offsets of all the StoredComponents, giving up if we encounter any non-struct or non-stored components along the way.
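Sketched in simplified form (the enum and visitor hook below are stand-ins for the real pattern-visitor machinery):

// Running state kept while the key path pattern is instantiated.
var structOffset = 0
var isPureStruct = true

enum ResolvedComponentKind { case `struct`, `class`, computed }

func visitResolvedComponent(kind: ResolvedComponentKind, byteOffset: Int) {
  switch kind {
  case .struct:
    structOffset += byteOffset // stored struct property: just accumulate
  default:
    isPureStruct = false // anything else breaks the fixed-offset chain
  }
}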

@fibrechannelscsi (Contributor, Author)

I see that; yes, we could compute the offset there, and if we do, then _computeOffsetForPureStructKeypath() could go away (and potentially _recalculateOffsetForPureStructKeyPath() as well).

@@ -173,7 +173,8 @@ public class AnyKeyPath: Hashable, _AppendKeyPath {
}

internal func isClass(_ item: Any.Type) -> Bool {
-  // Displays "warning: 'is' test is always true" at compile time, but that's not actually the case.
+  // Displays "warning: 'is' test is always true" at compile time,
+  // but that's not actually the case.
Collaborator:

I’d thought that this has to do with Obj-C bridging shenanigans, whereby anything can be wrapped?

Contributor:

Yeah, the warning is true, modulo runtime bugs. But I think we can remove all of these accessors (isPureStructKeyPath, isClass, isTuple) when the inline offset is computed during pattern instantiation.

@@ -2551,7 +2497,7 @@ internal func _appendingKeyPaths<
return unsafeDowncast(result, to: Result.self)
}
}
-  _processAppendingKeyPathType(root: &returnValue, leaf: leaf)
+  _processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf)
Collaborator:

Formatting nit:

Suggested change
-    _processOffsetForAppendedKeyPath(appendedKeyPath: &returnValue, root: root, leaf: leaf)
+    _processOffsetForAppendedKeyPath(
+      appendedKeyPath: &returnValue,
+      root: root,
+      leaf: leaf
+    )

@@ -3450,6 +3425,12 @@ internal struct InstantiateKeyPathBuffer: KeyPathPatternVisitor {
mutable: Bool,
offset: KeyPathPatternStoredOffset) {
let previous = updatePreviousComponentAddr()
switch kind {
case .struct:
Collaborator:

I think autoformatting nudged these when they were copy-pasted (here and below)

Suggested change (indentation only)
-        case .struct:
+    case .struct:

@BradLarson (Contributor)

@swift-ci please test

@fibrechannelscsi (Contributor, Author)

The next set of changes are almost ready! I have two questions before the next commit goes in:

  1. The last failing test I ought to fix is in capture_propagate_keypath.swift. We have:
struct Str {
  var i: Int = 27
  var j: Int = 28
  @inline(never)
  var c: Int { i + 100 }
} 

When I compute/store the offset in visitStoredComponent(), in InstantiateKeyPathBuffer, the information passed into the Header constructor for c looks the same as it does for i and j. It gets treated as a valid "optimized" offset and the value I get back is 100 and not 127 as expected. I'm looking for additional information, if any, in the InstantiateKeyPathBuffer, that can help distinguish cases like this.

  2. I see there are formatting issues that crept in during the last copy/paste. I'm looking for the recommended SwiftFormat flags that'll keep everything in line. I have a max width of 100 and an indent of 2. Am I missing anything?

Thanks!

@xwu (Collaborator), Sep 28, 2022

Unless things have changed recently, stdlib style is max width of 80.

@fibrechannelscsi (Contributor, Author), Sep 28, 2022

I dug into formatting discrepancies a bit more and I found that the max line width is specified as 80 here:
https://www.swift.org/contributing/#contributing-code
but I got the number of 100 from here:
https://google.github.io/swift/

I'll use 80 for my changes.

The other discrepancy is my use of swiftformat (which is what you get when you brew install it) rather than swift-format. Is there a JSON configuration for the latter that I ought to be using? If I download KeyPath.swift without my changes and format it with swift-format and no configuration file, it changes many instances of spacing and indentation.

I've found and fixed the issue in capture_propagate_keypath.swift. The issue was not with testSimple; I had removed the computed property attribute and @inline(never) from c in the playground I was using. The issue was with testGeneric. In this case, adding isPureStruct.append(false) to each case in visitStoredComponent() that did not explicitly involve .struct and then .inline resolved the issue.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please benchmark

func assignOffsetToStorage(offset: Int) {
  let maximumOffsetOn32BitArchitecture = 4094
Contributor:

Should this be 4096?

Contributor (Author):

The way I'm storing this is:
0 -> Actual nil pointer
1 -> Valid offset of 1
...
4094 -> Valid offset of 4094
4095 -> Valid offset of 0
4096 would be a pointer to the first value of the next page.

Contributor (Author):

I agree with the comment below that I could store it as "offset + 1", which would mean:
0 -> Actual nil pointer
1 -> Valid offset of 0
...
4094 -> Valid offset of 4093
4095 -> Valid offset of 4094
4096 -> pointer to the first value of the next page.

    _kvcKeyPathStringPtr =
      UnsafePointer<CChar>(bitPattern: maximumOffsetOn32BitArchitecture + 1)
  } else if offset < maximumOffsetOn32BitArchitecture {
    _kvcKeyPathStringPtr = UnsafePointer<CChar>(bitPattern: offset)
Contributor:

I think you could save a conditional here by always storing offset + 1, so we don't need to treat the zero case specially.
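Combining that suggestion with the 32-bit scheme discussed above, a sketch of the biased encoding (helper names hypothetical; simplified relative to the PR's final code):

// Store offset + 1 so a zero offset is distinguishable from a nil pointer;
// values at or beyond the 4096-byte null page are treated as KVC strings.
let maximumOffsetOn32BitArchitecture = 4094

func encodedPointer(forOffset offset: Int) -> UnsafePointer<CChar>? {
  guard (0...maximumOffsetOn32BitArchitecture).contains(offset) else { return nil }
  return UnsafePointer<CChar>(bitPattern: offset + 1)
}

func decodedOffset(from pointer: UnsafePointer<CChar>?) -> Int? {
  guard let pointer = pointer else { return nil }
  let word = UInt(bitPattern: pointer)
  return word <= UInt(maximumOffsetOn32BitArchitecture + 1) ? Int(word) - 1 : nil
}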


// TODO: Maybe we can get a pointer's raw bits instead of doing
// a distance calculation. Note: offsetBase can't be unwrapped
// forcefully if its bitPattern is 0x00. Hence the 0x01.
Contributor:

Yeah, it would be better to get the bitPattern of the pointer and do integer arithmetic here, since arithmetic between unrelated pointers is UB.

Contributor (Author):

Interesting. I'm looking for a way to get the bit pattern out of a pointer like this (without using distance(to:)), but I'm not finding any methods or computed properties to do that. Is this something that might be added in an extension to UnsafePointer, or was it omitted intentionally?

Contributor:

You get the bit pattern via [U]Int(bitPattern: pointer).
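For example (the address value here is illustrative):

let pointer = UnsafePointer<CChar>(bitPattern: 0x32)!
let bits = UInt(bitPattern: pointer) // 0x32: integer arithmetic from here on, no pointer UB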

@@ -1980,7 +2096,9 @@ func _setAtWritableKeyPath<Root, Value>(
value: value)
}
// TODO: we should be able to do this more efficiently than projecting.
let (addr, owner) = keyPath._projectMutableAddress(from: &root)
let (addr, owner) = _withUnprotectedUnsafePointer(to: &root) {
Contributor:

What is _withUnprotectedUnsafePointer for here?

Contributor (Author):

This is an update I'm pulling in from #60933. (Rather than dealing with a formatting nightmare, I downloaded KeyPath.swift from scratch and added the offset calculations into the Walker.) Maybe I should've waited to rebase?

Contributor:

Ah, I see. It might be good to rebase before we commit to make the history clean.

Contributor (Author):

I hope that rebase went through okay; I think it was complaining about me adding a TODO in one of the commits.

@BradLarson (Contributor)

@swift-ci please benchmark

1 similar comment
@BradLarson (Contributor)

@swift-ci please benchmark

@BradLarson (Contributor)

@swift-ci please test

1 similar comment
@BradLarson (Contributor)

@swift-ci please test

@jckarter (Contributor), Oct 24, 2022

There is no need to distinguish the kind of type for \.self. The key path doesn't dereference any object references and behaves like a "pure struct" key path no matter what. The interesting property that enables the optimization here is whether the key path only does in-place offsets into the root value, not what types are involved when doing so. In your example \C.a is not a value type key path, it'll be either a computed property key path or class ivar key path depending on the finality of a, and that will still break the "pure struct" property of the composed key path when you do (\C.self).appending(path: \C.a).appending(path: \A.a).

@fibrechannelscsi (Contributor, Author), Oct 24, 2022

Interesting. So I'm guessing we can remove the if walker.isPureStruct.count == 0 { check and simply check, during an append operation, if the resulting KeyPath will yield one that is considered a pure struct? Inside _processOffsetForAppendedKeyPath it may just end up looking like:

  if let rootOffset = root.getOffsetFromStorage(),
    let leafOffset = leaf.getOffsetFromStorage(),
    root.isPureStructAppendable(to: leaf)
  {
    appendedKeyPath.assignOffsetToStorage(offset: rootOffset + leafOffset)
  }

And then isPureStructAppendable(to:) will do the heavy lifting in terms of determining if the resulting KeyPath is still a pure struct / identity one or not.

@jckarter (Contributor)

It should be sufficient to do:

if let rootOffset = root.getOffsetFromStorage(),
   let leafOffset = leaf.getOffsetFromStorage() {
  appendedKeyPath.assignOffsetToStorage(offset: rootOffset + leafOffset)
}

since if either root or leaf is lacking an offset, you can't compute the combined offset, and they'll have offsets if and only if each key path traversal is equivalent to a pointer offset, in which case, the composed key path is equivalent to the sum of the offsets.

@fibrechannelscsi (Contributor, Author)

Ok, I see the problem now.
The issue isn't with walker.isPureStruct.count per se; it's that there are still a few cases where isPureStruct.append() ought to be called. In essence, we have an empty Walker where we ought not to, and this is causing some kinds of KeyPaths to be treated as pure structs when they ought not to be.
For example, the dynamically-typed application section in the KeyPath.swift unit test goes into the case .pointer: section of visitComputedComponent. We ought to add isPureStruct.append(false) there. In the end, the Walker should be sufficiently populated that we no longer need the if walker.isPureStruct.count == 0 { check.
Let me find those last remaining cases and get that up soon!

This means we no longer need to check for empty KeyPath Walker results.
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

@BradLarson (Contributor)

@swift-ci please benchmark

appendedKeyPath: &returnValue,
root: root,
leaf: leaf
)
Contributor:

I think it would be better to do this part inside of open, so that we don't lose the type information and have to re-cast the result in the code below. Alternatively, it should be impossible for returnValue as? Result to fail, so we can unsafeDowncast it instead.

Contributor (Author):

If I move this into open3, it can look like this:

var result: AnyKeyPath = _appendingKeyPaths(root: typedRoot, leaf: typedLeaf)
_processOffsetForAppendedKeyPath(
  appendedKeyPath: &result,
  root: root,
  leaf: leaf
)
return unsafeDowncast(result, to: Result.self)

case .struct:
  structOffset += value
default:
  break
Contributor:

If this is a .class component kind, then the struct offset should be invalidated too.

Contributor (Author):

I have, around line 3464:

switch kind {
case .struct:
  isPureStruct.append(true)
default:
  isPureStruct.append(false)
}

So that looks like it's being taken care of there just so I don't have to do isPureStruct.append(false) for every single case. That said, I could remove isPureStruct.append(false) on lines 3493 and 3518 since it's being taken care of by the switch statement right at the top of visitStoredComponent().

Contributor:

That sounds good. As a follow-up, it would be interesting to try to do the optimization for the outOfLine and unresolvedIndirectOffset cases as well once the final offset is resolved, but we don't need to do that right away.

@fibrechannelscsi (Contributor, Author), Oct 26, 2022:

Great! I do have a couple ideas for additional optimizations that'll go in subsequent MRs:

  1. The optimization listed in the Alternative Considered section at the top.
  2. Omit projection in the case where the final N elements are pure structs. For example, if the root is a reference type, and all you have are nested structs down below, we just project from the root to the first struct and then jump to the requisite offset.

If we include the unresolvedIndirectOffset / outOfLine optimization in the list of ideas, where do you see the best bang-for-the-buck being at this point?

Contributor:

Being able to also optimize a tail of struct-offset components sounds promising; you could combine that with a class stored property component either as the final component or as the component before the struct-offset suffix. But if I were going to look into optimizing key path applications myself, I would look at the generic traversal loops and see how we could improve their performance. I suspect that the way they're written with heavy reliance on nested functions and _openExistential is not very friendly to the SIL optimizer, and there's a lot of unneeded overhead that could be avoided by an optimal implementation that eliminated unnecessary copies and retain/release.

Contributor (Author):

Sounds good! I have the benchmarks for the struct-offset-tail complete, and the actual implementation is in the debugging phase. Getting these up is something I'll be working on next.

What other methods of getting concrete type information inside a scope exist, other than _openExistential, if any? I did at one point float the idea of grabbing the concrete type information via _openExistential and storing it for subsequent read and write operations, but now it looks like that would have a bigger impact on ABI stability than we might like.

Contributor:

There should not be any ABI concerns with changing the internal implementation of swift_getKeyPath or its variants, since their implementations are private. A blunt approach would be to rewrite those functions in runtime C++, where we have finer control over access to type metadata and value copying. _openExistential shouldn't really incur any overhead directly, but I suspect that the functions we open into aren't always getting inlined, and that may be introducing overhead. I would look at the quality of the generated code and see what combination of compiler improvements and/or code changes we can make to reduce overhead there.

Contributor (Author):

Sounds like a plan, thanks!

@BradLarson (Contributor)

@swift-ci please test

1 similar comment
@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

@jckarter (Contributor)

@fibrechannelscsi @BradLarson any further changes you want to make before merging this?

@fibrechannelscsi (Contributor, Author)

This is good to go, thanks!
The next two steps are:

  1. Getting the benchmarks in for the struct-offset-tail optimization (this will be a separate MR, involving benchmark/single-source/KeyPathPerformanceTests.swift)
  2. Adding those optimizations into KeyPath.swift (another MR).

@fibrechannelscsi (Contributor, Author)

Whoops, that should've been pushed into the more-keypath-benchmarks branch. Let me go revert that.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test macOS platform

This would have prevented explicitly specified KeyPaths through pure structs, e.g., \A.b.c, from taking the optimized path.
@fibrechannelscsi (Contributor, Author), Oct 31, 2022

I found this unnecessary check while working on the second KeyPath optimization today.
The benchmarks didn't catch this because the one nested-struct KeyPath benchmark generates its KeyPaths via append() each step of the way. This would've prevented explicitly specified KeyPaths to nested structs from taking the optimized path. Looks like all pertinent KeyPath and stdlib tests pass on my end without this line.

@BradLarson (Contributor)

@swift-ci please test

@BradLarson (Contributor)

@swift-ci please test Linux platform

@fibrechannelscsi (Contributor, Author)

This is looking good to go in now!

@jckarter merged commit 121adf6 into swiftlang:main on Nov 1, 2022
@jckarter (Contributor), Nov 1, 2022

Thanks @fibrechannelscsi !
