Skip to content

Fix KnownSafety optimization bugs in ARCSequenceOpts #33722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 4, 2020

Conversation

meg-gupta
Copy link
Contributor

@meg-gupta meg-gupta commented Aug 31, 2020

ARCSequenceOpts does KnownSafety optimization where 'KnownSafe' retain/release pairs are deleted.
A retain is 'KnownSafe' if during top down dataflow, it is preceded by another retain or non-arc use on the same RCIdentity.
A release is 'KnownSafe' if during bottom up dataflow, it is preceded by another release or non-arc use on the same RCIdentity.
This PR fixes the following two bugs in KnownSafety optimization.

a) KnownSafety bug in ARCSequenceOpts without loop support

1: retain_value S
2: retain_value S

3: strong_release S.subobject
4: strong_retain S.subobject
5: release_value S
6: release_value S

2 and 5 are marked as 'KnownSafe'. And currently they get paired and optimized away. 2 and 5 pairing is incorrect, this can be seen clearly expanding retain_value and release_value.

1e: strong_retain S.subobject
2e: strong_retain S.subobject
3: strong_release S.subobject
4: strong_retain S.subobject
5e: strong_release S.subobject
6e: strong_release S.subobject

The correct pairings are 2e and 3, 4 and 5e. Since RCIdentity distinguishes between S and S.subobject we can end
up with incorrect pairings and optimization with KnownSafety. This is true for any aliasing RCIdentities.

b) KnownSafety bug in ARCSequenceOpts with loop support

bb0:
   1: strong_retain S
   2: strong_retain S
loop:
    3: strong_release S
    4: strong_retain S
    5: cond_br undef, loop, bb2
bb2:
   6: strong_release S
   7: strong_release S

2 and 6 are marked as 'KnownSafe'. And currently get paired and optimized away. This is incorrect and the correct pairings are 2 and 3 and 4 and 6. ARCLoopOpts computes loop summaries and uses it to only compute refcount states for CodeMotionSafe optimization in ARCSequenceOpts. The instructions in the loop have no effect on the already computed 'KnownSafe' states outside the loop.

With this PR, while doing bottom up dataflow, if we encounter an unmatched retain instruction, that can pair with a 'KnownSafe' already visited release instruction, we turn off KnownSafety if the two
RCIdentities mayAlias.
Similarly, during top down dataflow, if we encounter an unmatched release instruction, that can pair with a 'KnownSafe' already visited retain instruction, we turn off KnownSafety if the two RCIdentities mayAlias. This fixes the bugs in both ARCSequenceOpts without loop support and with loop support.

Fixes rdar://67720198 and rdar://27775427

@meg-gupta meg-gupta requested review from gottesmm and atrick August 31, 2020 18:25
Copy link
Contributor

@gottesmm gottesmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bunch of refactoring going on here. Can you split that out in a separate PR and land that separately?

SetFactory);

auto II = this->summarizedinterestinginsts_rbegin();
auto IE = this->summarizedinterestinginsts_rend();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the usage of this.

@gottesmm
Copy link
Contributor

Just reading the description, this is incorrect. We only require bottom up that there is a post-dominating use that requires liveness that will not be removed.

@meg-gupta
Copy link
Contributor Author

@gottesmm I edited the description of 'KnownSafety' to also include non-arc use.

@meg-gupta meg-gupta force-pushed the asochanges branch 3 times, most recently from 7cd553e to 52bb0ef Compare September 1, 2020 21:58
@meg-gupta
Copy link
Contributor Author

@swift-ci test

@meg-gupta
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Sep 1, 2020

Build failed
Swift Test Linux Platform
Git Sha - 1540e28ff8a4584406b30d44195490ff11bea9ef

@swift-ci
Copy link
Contributor

swift-ci commented Sep 1, 2020

Build failed
Swift Test OS X Platform
Git Sha - 1540e28ff8a4584406b30d44195490ff11bea9ef

@swift-ci
Copy link
Contributor

swift-ci commented Sep 1, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 2556 3859 +51.0% 0.66x (?)
UTF8Decode_InitFromCustom_contiguous 143 169 +18.2% 0.85x
UTF8Decode_InitDecoding 143 168 +17.5% 0.85x (?)
ArrayLiteral2 67 78 +16.4% 0.86x
Set.isStrictSubset.Int.Empty 45 52 +15.6% 0.87x (?)
Set.isSubset.Int.Empty 43 49 +14.0% 0.88x (?)
Set.isStrictSubset.Seq.Int.Empty 108 120 +11.1% 0.90x (?)
BucketSort 137 148 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x (?)
SuffixAnySequence 1691 1572 -7.0% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
Set.subtracting.Empty.Box 7 9 +28.6% 0.78x
UTF8Decode_InitFromCustom_contiguous 143 169 +18.2% 0.85x (?)
UTF8Decode_InitDecoding 145 168 +15.9% 0.86x
Set.isStrictSubset.Int.Empty 47 53 +12.8% 0.89x (?)
DataCreateEmptyArray 800 900 +12.5% 0.89x (?)
StringUTF16Builder 210 230 +9.5% 0.91x (?)
UTF8Decode_InitFromCustom_noncontiguous 263 288 +9.5% 0.91x (?)
Set.isStrictSubset.Seq.Int.Empty 112 122 +8.9% 0.92x (?)
ArrayLiteral2 98 106 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 2903 2466 -15.1% 1.18x (?)
RemoveWhereFilterInts 26 24 -7.7% 1.08x (?)
DistinctClassFieldAccesses 200 186 -7.0% 1.08x (?)
Chars2 3650 3400 -6.8% 1.07x (?)
StringBuilderLong 1060 990 -6.6% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 154 174 +13.0% 0.89x (?)
UTF8Decode_InitFromCustom_contiguous 157 177 +12.7% 0.89x
DictionaryKeysContainsCocoa 27 30 +11.1% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringFromLongString 10 9 -10.0% 1.11x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to prove to myself it's correct and efficient.

I was initially worried that this would fail to optimize a more "normal" case like this:

1: retain_value S
2: retain_value S
3: strong_retain S.subobject
4: strong_release S.subobject
5: release_value S
6: release_value S

Maybe that case will actually sill be optimized with your change? Anyway, I won't argue strongly that this pass needs to handle all cases since SemanticARCOpts can handle it eventually.

Will we correctly handle a case like this?

1: retain_value S
2: retain_value S
3: strong_release S.subobject
4: release_value S
5: strong_retain S.subobject
6: release_value S

@meg-gupta
Copy link
Contributor Author

@atrick We will still optimize the "normal" case. Test $matched_rr_subobject in arcsequenceopts_knownsafetybugs.sil has this pattern.

@meg-gupta
Copy link
Contributor Author

meg-gupta commented Sep 2, 2020

In:

1: retain_value S
2: retain_value S
3: strong_release S.subobject
4: release_value S
5: strong_retain S.subobject
6: release_value S

2 and 4 get paired and optimized away due to CodeMotionSafe. This fix turns off KnownSafety on 2 when it sees 3.

@atrick
Copy link
Contributor

atrick commented Sep 2, 2020

I'm trying to understand why this known safety fix is looking for retains. Let's say code-motion safe doesn't apply. So what about this?:

1: retain_value S
2: retain_value S
3: strong_release S.subobject
4: use(S)
5: release_value S
6: strong_retain S.subobject
7: release_value S

@meg-gupta
Copy link
Contributor Author

@atrick Similar to previous example, 3 turns off KnownSafety on 2. And nothing gets optimized if we consider only 'KnownSafety'.
In bottom up traversal, the fix only looks at unmatched retains that could potentially match with already visited KnownSafe RCIdentity.

@atrick
Copy link
Contributor

atrick commented Sep 3, 2020

For efficiency, RefCountInstToMatchedMap can just be a set of unmatched instructions.

@atrick
Copy link
Contributor

atrick commented Sep 3, 2020

When we summarize the loop in the RefCountInstToMatchedMap. It assumes that the IncToDecStateMap finds any unmatched pair on any paths through the loop.

LoopARCSequenceDataflowEvaluator::mergeSuccessors clears the data flow state at any non-local successor, so it should be conservative.

This only time this doesn't happen is with "allowsleaks" exist, which reach an unreachable before any side-effect instructions.

To be safe, add tests cases for when the release is on the loop tail; add a test:

loop:
  retain
  cond_br exit
  release
  br loop

loop:
  retain
  cond_br exit-to-unreachable
  release
  br loop

@atrick
Copy link
Contributor

atrick commented Sep 3, 2020

When computing the loop summary, the code assumes that dataflow can be applied to each "interesting" instruction in order to produce a summary. In general this is not sound because loops iterate and the order is meaningless. This means that we're relying on updateForDifferentLoopInst to apply data flow so conservatively that the order doesn't matter. It's not clear that's actually happening. We should clearly comment that requirement and verify the assumption under some verification flag:

In ARCRegionState::processLoopBottomUp

  // For each state that we are currently tracking, apply our summarized
  // instructions to it.
  for (auto &OtherState : getBottomupStates()) {
    if (!OtherState.hasValue())
      continue;

    for (auto *I : State->getSummarizedInterestingInsts()) {
      OtherState->second.updateForDifferentLoopInst(I, AA);
      OtherState->second.checkAndResetKnownSafety(
          I, OtherState->first, checkIfRefCountInstIsMatched, RCIA, AA);
    }
#ifndef NDEBUG
    if (verifyLoopARCSummary) {
      auto newOtherState = OtherState
      for (auto *I : State->getSummarizedInterestingInsts()) {
        OtherState->second.updateForDifferentLoopInst(I, AA);
        OtherState->second.checkAndResetKnownSafety(
            I, OtherState->first, checkIfRefCountInstIsMatched, RCIA, AA);
      }
      assert(newOtherState == OtherState);
    }
#endif
  }

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked for some test cases and comments. Otherwise this looks good to me. Thanks!

Regarding the loop data-flow verification, that's a separate issue from this PR so can be done in a follow-up.

While doing bottom up dataflow, if we encounter an
unmatched retain instruction, that can pair with a 'KnownSafe'
already visited release instruction, we turn off KnownSafety if the two
RCIdentities mayAlias.
This is done in BottomUpRefCountState::checkAndResetKnownSafety.
In order to determine if a retain is umatched, we look at
IncToDecStateMap. If a retain was matched during bottom up dataflow, it
is always found in IncToDecStateMap with value of the matched release's
BottomUpRefCountState.

Similarly, during top down dataflow, if we encounter an unmatched
release instruction, that can pair with a 'KnownSafe' already
visited retain instruction, we turn off KnownSafety if the two RCIdentities
mayAlias.
This is done in TopDownRefCountState::checkAndResetKnownSafety.
In order to determine if a release is umatched, we look at
DecToIncStateMap. If a release was matched during top down dataflow, it
is always found in DecToIncStateMap with value of the matched retain's
TopDownRefCountState.

For ARCLoopOpts, during bottom up and top down traversal of a region with
a nested loop, we find if the retain/release in the loop summary was
matched or not by looking at the persistent RefCountInstToMatched map.
This map is populated when processing the nested loop region from the
IncToDecStateMap/DecToStateMap which gets thrown away after the loop
region is processed.

This fixes the bugs in both ARCSequenceOpts without loop
support and with loop support.
@meg-gupta
Copy link
Contributor Author

@swift-ci test

@meg-gupta
Copy link
Contributor Author

@swift-ci Please Test Source Compatibility

@swift-ci
Copy link
Contributor

swift-ci commented Sep 3, 2020

Build failed
Swift Test Linux Platform
Git Sha - 52bb0ef94cf8633a0c6c1120a751dfdfbcd90e61

@swift-ci
Copy link
Contributor

swift-ci commented Sep 3, 2020

Build failed
Swift Test OS X Platform
Git Sha - 52bb0ef94cf8633a0c6c1120a751dfdfbcd90e61

@meg-gupta
Copy link
Contributor Author

@swift-ci Please Test Source Compatibility

@meg-gupta
Copy link
Contributor Author

Addressed review comments.

@meg-gupta
Copy link
Contributor Author

sck debug failures are pre-existing

@meg-gupta meg-gupta merged commit 1372414 into swiftlang:master Sep 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants