rjmccall
Contributor

This is essentially a long-belated follow-up to Arnold's #12606. The key observation here is that the enum-tag-single-payload witnesses are strictly more powerful than the XI witnesses: you can simulate the XI witnesses by using an extra case count that's <= the XI count. Of course the result is less efficient than the XI witnesses when actually running generic code, but that's less important than overall code size, and we can work on fast-paths in the future.
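
To make the simulation concrete, here is a minimal C++ sketch, not the runtime's actual code. It assumes a hypothetical one-byte type whose valid values are 0 through 199, so that 200 through 255 act as 56 extra inhabitants. The two toy enum-tag-single-payload functions only handle the situation described above, where the empty-case count is <= the XI count. `getExtraInhabitantIndex` and `storeExtraInhabitant` stand in for the removed XI witnesses; their signatures here are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy type: one byte, valid payload values 0...199,
// so the bit patterns 200...255 are its 56 extra inhabitants.
constexpr unsigned kNumExtraInhabitants = 56;
constexpr uint8_t kFirstInvalid = 200;

// Toy enum-tag-single-payload "witnesses" (assumed shape).
// Case 0 means "payload present"; cases 1...numEmptyCases are empty cases.
// This sketch only supports numEmptyCases <= kNumExtraInhabitants.
unsigned getEnumTagSinglePayload(const uint8_t *value, unsigned numEmptyCases) {
  assert(numEmptyCases <= kNumExtraInhabitants);
  if (*value < kFirstInvalid)
    return 0; // a valid payload value
  return static_cast<unsigned>(*value - kFirstInvalid) + 1;
}

void storeEnumTagSinglePayload(uint8_t *value, unsigned whichCase,
                               unsigned numEmptyCases) {
  assert(numEmptyCases <= kNumExtraInhabitants);
  assert(whichCase <= numEmptyCases);
  if (whichCase == 0)
    return; // the payload is already stored in place
  *value = static_cast<uint8_t>(kFirstInvalid + (whichCase - 1));
}

// XI-style operations expressed through the functions above:
// use the XI count as the empty-case count and shift the result by one.
// Returns -1 for a valid value, otherwise the extra-inhabitant index.
int getExtraInhabitantIndex(const uint8_t *value) {
  return static_cast<int>(
             getEnumTagSinglePayload(value, kNumExtraInhabitants)) - 1;
}

void storeExtraInhabitant(uint8_t *value, int index) {
  storeEnumTagSinglePayload(value, static_cast<unsigned>(index) + 1,
                            kNumExtraInhabitants);
}
```

The only work the XI-style wrappers do is the off-by-one adjustment, which is why a single pair of witnesses can cover both uses.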

The extra inhabitant count is stored in a 32-bit field (always present) following the ValueWitnessFlags, which now occupy a fixed 32 bits. This inflates non-XI VWTs on 32-bit targets by a word, but the net effect on XI VWTs is to shrink them by two words, which is likely to be the more important change. Also, being able to access the XI count directly should be a nice win.
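
As a rough illustration of the size arithmetic, the sketch below models only the data fields at the start of a value witness table. The struct name, the field names, and the assumption that size and stride are each one pointer-sized word are mine, not taken from this PR. Only the 32-bit flags field followed by the 32-bit extra inhabitant count comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the layout fields, parameterized on the target's
// word type so both 64-bit and 32-bit targets can be checked on any host.
template <typename Word>
struct ToyTypeLayout {
  Word size;                     // assumed: one word
  Word stride;                   // assumed: one word
  uint32_t flags;                // fixed 32 bits, per the description
  uint32_t extraInhabitantCount; // always present, 32 bits

  // With the count stored directly, "has extra inhabitants" is a plain
  // comparison rather than a flag test plus a call through a witness.
  bool hasExtraInhabitants() const { return extraInhabitantCount != 0; }
};

// 64-bit target: flags and count share one word, so three words in total.
static_assert(sizeof(ToyTypeLayout<uint64_t>) == 3 * sizeof(uint64_t),
              "flags + XI count pack into a single 64-bit word");

// 32-bit target: the count needs a word of its own, so four words in total.
static_assert(sizeof(ToyTypeLayout<uint32_t>) == 4 * sizeof(uint32_t),
              "the XI count adds one word on 32-bit targets");
```

Under these assumptions the always-present count costs nothing on 64-bit targets and one word on 32-bit targets, which matches the one-word inflation mentioned for non-XI VWTs.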

@rjmccall
Contributor Author

@swift-ci Please benchmark.

@rjmccall
Contributor Author

Note that a number of IRGen tests are still failing.

@rjmccall rjmccall requested a review from jckarter December 11, 2018 09:21
@swift-ci
Contributor

Build comment file:

Performance: -O

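| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Regression | | | | |
| Breadcrumbs.CopyUTF16CodeUnits.Mixed | 55 | 66 | +20.0% | 0.83x |
| Improvement | | | | |
| SortAdjacentIntPyramids | 1259 | 1007 | -20.0% | 1.25x |
| LessSubstringSubstring | 44 | 41 | -6.8% | 1.07x |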

Code size: -O

| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Improvement | | | | |
| ProtocolDispatch.o | 781 | 765 | -2.0% | 1.02x |
| Hanoi.o | 3601 | 3537 | -1.8% | 1.02x |
| ErrorHandling.o | 3125 | 3077 | -1.5% | 1.02x |
| OpenClose.o | 3282 | 3234 | -1.5% | 1.01x |
| ObserverUnappliedMethod.o | 5266 | 5202 | -1.2% | 1.01x |
| Codable.o | 36479 | 36047 | -1.2% | 1.01x |

Performance: -Osize

| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Improvement | | | | |
| LessSubstringSubstringGenericComparable | 44 | 41 | -6.8% | 1.07x |

Code size: -Osize

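| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Improvement | | | | |
| ProtocolDispatch.o | 862 | 846 | -1.9% | 1.02x |
| Hanoi.o | 3810 | 3746 | -1.7% | 1.02x |
| ErrorHandling.o | 3093 | 3045 | -1.6% | 1.02x |
| OpenClose.o | 3728 | 3680 | -1.3% | 1.01x |
| Codable.o | 34855 | 34407 | -1.3% | 1.01x |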

Performance: -Onone

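| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Regression | | | | |
| ArrayOfPOD | 777 | 858 | +10.4% | 0.91x (?) |
| Improvement | | | | |
| EqualSubstringSubstring | 50 | 46 | -8.0% | 1.09x |
| LessSubstringSubstring | 50 | 46 | -8.0% | 1.09x |
| EqualSubstringSubstringGenericEquatable | 50 | 46 | -8.0% | 1.09x |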

Code size: -swiftlibs

| TEST | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| Improvement | | | | |
| libswiftSwiftPrivate.dylib | 45056 | 40960 | -9.1% | 1.10x |

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
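
For readers checking the tables by hand, the DELTA and RATIO columns appear to be consistent with the arithmetic sketched below. This is inferred from the numbers in the rows above, not from the benchmark tooling itself, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition: DELTA is the relative change from OLD to NEW,
// in percent. Positive means NEW is larger (slower, or bigger code).
double deltaPercent(double oldValue, double newValue) {
  return (newValue - oldValue) / oldValue * 100.0;
}

// Assumed definition: RATIO is OLD divided by NEW,
// so values above 1.0 are improvements.
double ratio(double oldValue, double newValue) {
  return oldValue / newValue;
}
```

For example, the Breadcrumbs.CopyUTF16CodeUnits.Mixed row (55 to 66) gives a delta of +20.0% and a ratio of about 0.83, and the LessSubstringSubstring row (44 to 41) gives about -6.8% and 1.07.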

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

Contributor

@jckarter jckarter left a comment

Looks good! For a follow up, can we kill the SinglePayloadGeneric runtime functions entirely?

@rjmccall
Contributor Author

It's possible, but we'd have to do a lot of work inline in the cases that are currently relying on them (dynamic-layout single-payload enums and dynamic-layout structs).

@rjmccall
Contributor Author

@swift-ci test compiler performance

@rjmccall rjmccall force-pushed the remove-xi-witnesses branch from e6b39e0 to 53e0362 Compare December 11, 2018 22:18
@rjmccall
Contributor Author

@swift-ci Please test.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - e6b39e0d9db089c21ddca3049d6264f6c7fe7141

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - e6b39e0d9db089c21ddca3049d6264f6c7fe7141

@rjmccall
Contributor Author

I might change the dynamic multi-payload implementation to use the fixed-style pattern, since the extra-inhabitants logic is totally static and the only dynamic thing is the offset of the tag.

@jckarter
Contributor

I wanted to make it so that single-payload enums with an added tag gave extra inhabitants from the tag like multi-payload enums do too, but ran out of time.

@rjmccall
Contributor Author

Yeah, I saw the FIXMEs there. I didn't want to touch that, either.

@rjmccall
Contributor Author

...the multi-payload tagging system, even in the purely dynamic, no-spare-bits case, is way weirder than I gave it credit for. I do not understand all this stuff with rotation, but I'm basically going to have to leave the current practice intact.

This allows callers to avoid needing to reload these tags in common cases.
@jckarter
Contributor

jckarter commented Dec 12, 2018

The rotation stuff should only impact spare-bits-using multi-payload enums, not any place where we're falling back to the dynamic implementation. For a dynamically laid out multipayload enum, it really should just be a tag at a byte offset.
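
A minimal sketch of what "just a tag at a byte offset" could look like follows. The function names, the little-endian encoding, and the 1, 2, or 4 byte tag widths are assumptions made for illustration. The real runtime may encode the tag differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy format (assumed): the tag occupies tagBytes bytes (1, 2, or 4),
// stored little-endian, starting tagOffset bytes into the value.
// For a dynamically laid out enum, tagOffset is only known at run time.
uint32_t loadTagAtOffset(const uint8_t *value, size_t tagOffset,
                         size_t tagBytes) {
  assert(tagBytes == 1 || tagBytes == 2 || tagBytes == 4);
  uint32_t tag = 0;
  for (size_t i = 0; i < tagBytes; ++i)
    tag |= static_cast<uint32_t>(value[tagOffset + i]) << (8 * i);
  return tag;
}

void storeTagAtOffset(uint8_t *value, size_t tagOffset, size_t tagBytes,
                      uint32_t tag) {
  assert(tagBytes == 1 || tagBytes == 2 || tagBytes == 4);
  for (size_t i = 0; i < tagBytes; ++i)
    value[tagOffset + i] = static_cast<uint8_t>(tag >> (8 * i));
}
```

In this model the only dynamic input is `tagOffset`; no spare-bit masking or rotation is involved.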

@rjmccall rjmccall force-pushed the remove-xi-witnesses branch from 53e0362 to 724c192 Compare December 12, 2018 03:19
@rjmccall
Contributor Author

@swift-ci Please test.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 53e0362d51a603c3bc2ff8876e93718884180ac6

@rjmccall rjmccall changed the title from "[DNM] Remove the extra-inhabitant value witness functions." to "Remove the extra-inhabitant value witness functions" Dec 12, 2018
@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 53e0362d51a603c3bc2ff8876e93718884180ac6

@swift-ci
Contributor

Build comment file:

Summary for master full

Unexpected test results, excluded stats for NonEmpty, Tagged, Wordy, GRDB

Regressions found (see below)

Debug-batch

debug-batch brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 27,082,149,775,134 27,225,040,835,774 142,891,060,640 0.53%
LLVM.NumLLVMBytesOutput 855,560,228 852,805,022 -2,755,206 -0.32%
time.swift-driver.wall 2713.1s 2719.2s 6.1s 0.22%

debug-batch detailed

Regressed (6)
name old new delta delta_pct
AST.NumASTBytesAllocated 40,508,085,082 41,333,847,765 825,762,683 2.04% ⛔
Sema.NumConformancesDeserialized 3,593,357 3,636,469 43,112 1.2% ⛔
Sema.NumDeclsDeserialized 28,994,229 29,348,411 354,182 1.22% ⛔
Sema.OverriddenDeclsRequest 3,517,623 3,636,066 118,443 3.37% ⛔
Sema.SelfBoundsFromWhereClauseRequest 46,578,493 47,136,807 558,314 1.2% ⛔
Sema.USRGenerationRequest 4,994,237 5,239,151 244,914 4.9% ⛔
Improved (2)
name old new delta delta_pct
Driver.NumDriverPipePolls 305,865 296,687 -9,178 -3.0% ✅
Driver.NumDriverPipeReads 342,245 329,750 -12,495 -3.65% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (87)
name old new delta delta_pct
AST.NumDecls 61,350 61,350 0 0.0%
AST.NumDependencies 147,620 147,613 -7 -0.0%
AST.NumImportedExternalDefinitions 917,167 917,167 0 0.0%
AST.NumInfixOperators 22,247 22,247 0 0.0%
AST.NumLinkLibraries 0 0 0 0.0%
AST.NumLoadedModules 174,735 174,735 0 0.0%
AST.NumLocalTypeDecls 112 112 0 0.0%
AST.NumObjCMethods 12,563 12,563 0 0.0%
AST.NumPostfixOperators 13 13 0 0.0%
AST.NumPrecedenceGroups 12,067 12,067 0 0.0%
AST.NumPrefixOperators 70 70 0 0.0%
AST.NumReferencedDynamicNames 101 101 0 0.0%
AST.NumReferencedMemberNames 2,790,339 2,790,339 0 0.0%
AST.NumReferencedTopLevelNames 196,565 196,565 0 0.0%
AST.NumSourceBuffers 273,817 273,817 0 0.0%
AST.NumSourceLines 2,035,368 2,035,368 0 0.0%
AST.NumSourceLinesPerSecond 727,995 726,844 -1,151 -0.16%
AST.NumTotalClangImportedEntities 3,355,442 3,366,607 11,165 0.33%
AST.NumUsedConformances 168,313 168,313 0 0.0%
Driver.ChildrenMaxRSS 66,625,419,264 66,751,127,552 125,708,288 0.19%
Driver.DriverDepCascadingDynamic 0 0 0 0.0%
Driver.DriverDepCascadingExternal 0 0 0 0.0%
Driver.DriverDepCascadingMember 0 0 0 0.0%
Driver.DriverDepCascadingNominal 0 0 0 0.0%
Driver.DriverDepCascadingTopLevel 0 0 0 0.0%
Driver.DriverDepDynamic 0 0 0 0.0%
Driver.DriverDepExternal 0 0 0 0.0%
Driver.DriverDepMember 0 0 0 0.0%
Driver.DriverDepNominal 0 0 0 0.0%
Driver.DriverDepTopLevel 0 0 0 0.0%
Driver.NumDriverJobsRun 12,834 12,834 0 0.0%
Driver.NumDriverJobsSkipped 0 0 0 0.0%
Driver.NumProcessFailures 0 0 0 0.0%
Frontend.MaxMallocUsage 342,419,261,680 345,113,767,896 2,694,506,216 0.79%
Frontend.NumInstructionsExecuted 27,082,149,775,134 27,225,040,835,774 142,891,060,640 0.53%
Frontend.NumProcessFailures 0 0 0 0.0%
IRModule.NumIRAliases 90,975 90,975 0 0.0%
IRModule.NumIRBasicBlocks 3,168,818 3,155,147 -13,671 -0.43%
IRModule.NumIRComdatSymbols 0 0 0 0.0%
IRModule.NumIRFunctions 1,529,429 1,520,037 -9,392 -0.61%
IRModule.NumIRGlobals 1,752,117 1,752,117 0 0.0%
IRModule.NumIRIFuncs 0 0 0 0.0%
IRModule.NumIRInsts 40,047,064 39,962,569 -84,495 -0.21%
IRModule.NumIRNamedMetaData 62,817 62,817 0 0.0%
IRModule.NumIRValueSymbols 2,934,288 2,924,896 -9,392 -0.32%
LLVM.NumLLVMBytesOutput 855,560,228 852,805,022 -2,755,206 -0.32%
Parse.NumFunctionsParsed 2,107,329 2,107,329 0 0.0%
Parse.NumIterableDeclContextParsed 837,746 837,746 0 0.0%
SILModule.NumSILGenDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILGenFunctions 1,232,778 1,232,778 0 0.0%
SILModule.NumSILGenGlobalVariables 23,603 23,603 0 0.0%
SILModule.NumSILGenVtables 10,134 10,134 0 0.0%
SILModule.NumSILGenWitnessTables 33,710 33,710 0 0.0%
SILModule.NumSILOptDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILOptFunctions 1,101,798 1,101,798 0 0.0%
SILModule.NumSILOptGlobalVariables 24,285 24,285 0 0.0%
SILModule.NumSILOptVtables 16,285 16,285 0 0.0%
SILModule.NumSILOptWitnessTables 66,048 66,048 0 0.0%
Sema.AccessLevelRequest 1,802,769 1,806,400 3,631 0.2%
Sema.DefaultAndMaxAccessLevelRequest 43,420 43,420 0 0.0%
Sema.EnumRawTypeRequest 12,383 12,383 0 0.0%
Sema.ExtendedNominalRequest 2,601,810 2,613,583 11,773 0.45%
Sema.InheritedDeclsReferencedRequest 80,330,125 80,698,326 368,201 0.46%
Sema.InheritedTypeRequest 436,697 436,666 -31 -0.01%
Sema.IsDynamicRequest 1,442,770 1,442,770 0 0.0%
Sema.IsObjCRequest 1,244,108 1,245,450 1,342 0.11%
Sema.NamedLazyMemberLoadFailureCount 17,439 17,488 49 0.28%
Sema.NamedLazyMemberLoadSuccessCount 11,937,571 11,937,197 -374 -0.0%
Sema.NominalTypeLookupDirectCount 23,278,016 23,320,761 42,745 0.18%
Sema.NumConstraintScopes 10,900,519 10,905,697 5,178 0.05%
Sema.NumConstraintsConsideredForEdgeContraction 20,173,850 20,174,679 829 0.0%
Sema.NumDeclsValidated 1,540,705 1,540,705 0 0.0%
Sema.NumFunctionsTypechecked 790,555 790,555 0 0.0%
Sema.NumGenericSignatureBuilders 833,318 837,368 4,050 0.49%
Sema.NumLazyGenericEnvironments 5,941,402 5,996,432 55,030 0.93%
Sema.NumLazyGenericEnvironmentsLoaded 162,944 162,916 -28 -0.02%
Sema.NumLazyIterableDeclContexts 4,733,796 4,749,623 15,827 0.33%
Sema.NumLeafScopes 7,684,676 7,689,219 4,543 0.06%
Sema.NumTypesDeserialized 10,721,085 10,782,483 61,398 0.57%
Sema.NumTypesValidated 1,030,130 1,030,131 1 0.0%
Sema.NumUnloadedLazyIterableDeclContexts 3,339,084 3,333,427 -5,657 -0.17%
Sema.RequirementRequest 54,264 54,264 0 0.0%
Sema.SetterAccessLevelRequest 98,823 98,823 0 0.0%
Sema.SuperclassDeclRequest 63,403,043 63,486,643 83,600 0.13%
Sema.SuperclassTypeRequest 30,156 30,156 0 0.0%
Sema.TypeDeclsFromWhereClauseRequest 25,755 25,755 0 0.0%
Sema.UnderlyingTypeDeclsReferencedRequest 2,333,195 2,333,851 656 0.03%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 21,530,281,116,542 21,509,179,728,071 -21,101,388,471 -0.1%
LLVM.NumLLVMBytesOutput 787,347,934 785,876,920 -1,471,014 -0.19%
time.swift-driver.wall 4043.8s 4035.2s -8.6s -0.21%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 170,174 170,174 0 0.0%
AST.NumLoadedModules 10,893 10,893 0 0.0%
AST.NumTotalClangImportedEntities 580,402 580,402 0 0.0%
AST.NumUsedConformances 169,030 169,030 0 0.0%
IRModule.NumIRBasicBlocks 2,750,837 2,742,496 -8,341 -0.3%
IRModule.NumIRFunctions 1,268,487 1,259,849 -8,638 -0.68%
IRModule.NumIRGlobals 1,399,736 1,399,736 0 0.0%
IRModule.NumIRInsts 26,686,582 26,587,273 -99,309 -0.37%
IRModule.NumIRValueSymbols 2,497,464 2,488,826 -8,638 -0.35%
LLVM.NumLLVMBytesOutput 787,347,934 785,876,920 -1,471,014 -0.19%
SILModule.NumSILGenFunctions 536,776 536,776 0 0.0%
SILModule.NumSILOptFunctions 660,551 660,551 0 0.0%
Sema.NumConformancesDeserialized 1,496,721 1,496,721 0 0.0%
Sema.NumConstraintScopes 9,602,768 9,602,768 0 0.0%
Sema.NumDeclsDeserialized 3,900,982 3,900,982 0 0.0%
Sema.NumDeclsValidated 811,910 811,910 0 0.0%
Sema.NumFunctionsTypechecked 344,933 344,933 0 0.0%
Sema.NumGenericSignatureBuilders 141,348 141,348 0 0.0%
Sema.NumLazyGenericEnvironments 804,925 804,925 0 0.0%
Sema.NumLazyGenericEnvironmentsLoaded 15,034 15,034 0 0.0%
Sema.NumLazyIterableDeclContexts 515,603 515,603 0 0.0%
Sema.NumTypesDeserialized 2,109,443 2,109,443 0 0.0%
Sema.NumTypesValidated 389,671 389,671 0 0.0%

@rjmccall rjmccall merged commit 570840f into swiftlang:master Dec 12, 2018
@rjmccall rjmccall deleted the remove-xi-witnesses branch December 12, 2018 06:46