
Generation of tier 2 traces is overly optimistic about the accuracy of profiling for branches. #115727

Open
markshannon opened this issue Feb 20, 2024 · 3 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

markshannon (Member) commented Feb 20, 2024

Bug report

We have a test for our confidence that the trace is going the right way:

def test_confidence_score(self):
    def testfunc(n):
        bits = 0
        for i in range(n):
            if i & 0x01:
                bits += 1
            if i & 0x02:
                bits += 1
            if i & 0x04:
                bits += 1
            if i & 0x08:
                bits += 1
            ...

But consider this:

def test_confidence_score(self):
    def testfunc(n):
        bits = 0
        for i in range(n):
            if i & 0x10:
                bits += 1
            if i & 0x20:
                bits += 1
            if i & 0x40:
                bits += 1
            if i & 0x80:
                bits += 1
            ...

The direction of the branch changes only every 16 iterations, so it looks perfectly predictable to profiling, but it is just as pathological as the first version.
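To make the contrast concrete, here is a rough model of what the profiling sees -- assuming a 16-bit taken/not-taken history whose popcount gives the taken fraction, which matches the bitmask behavior described in the comments below but is not the actual interpreter code:

def taken_history(mask_bit, iterations=64, window=16):
    """Shift the outcome of `if i & mask_bit` into a `window`-bit history."""
    bits = 0
    for i in range(iterations):
        taken = 1 if i & mask_bit else 0
        bits = ((bits << 1) | taken) & ((1 << window) - 1)
    return bits

# i & 0x01 alternates every iteration: the history holds 8 ones out of 16,
# a taken fraction of 0.5, i.e. maximally unpredictable.
# i & 0x10 flips only every 16 iterations: the history is usually all ones or
# all zeros, a fraction of 1.0 or 0.0, so the branch looks perfectly predictable.
for mask in (0x01, 0x10):
    bits = taken_history(mask)
    print(f"mask {mask:#04x}: history {bits:016b}, taken fraction {bin(bits).count('1') / 16:.2f}")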

How to fix this

We should lower our confidence a bit, even if profiling claims 100% predictability.

I don't think there is a perfect way to do this, but it shouldn't be too hard to come up with something good enough.

Here is a possible scheme (a rough sketch in code follows the lists below):

Use lower confidence and threshold when generating the initial trace:

  • 0.8 * profile_fraction + 0.1 for bool checks (maybe higher for is None checks)
  • 0.95 - 0.05 * misses recorded for type checks
  • Stop trace at a lower confidence than before, say < 0.2

Reassess confidence in optimizer:

  • Type information may raise confidence (if we eliminate a guard, we have 100% confidence)
  • Stop trace at confidence < 0.33 (maybe allow a lower confidence if it completes a loop)
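For concreteness, a rough sketch of how these numbers could fit together; the function names, the event representation, and the loop structure are illustrative assumptions, not the actual (C) trace-projection code:

INITIAL_TRACE_THRESHOLD = 0.2   # proposed: stop projecting the trace below this
OPTIMIZER_THRESHOLD = 1 / 3     # proposed: the optimizer later stops below this when reassessing

def bool_check_confidence(profile_fraction):
    # 0.8 * profile_fraction + 0.1: even a branch that went the same way in
    # every recorded sample contributes at most 0.9, never 1.0.
    return 0.8 * profile_fraction + 0.1

def type_check_confidence(misses):
    # 0.95 - 0.05 * misses recorded for the type check.
    return 0.95 - 0.05 * misses

def project_trace(events):
    # events: ('bool', profile_fraction) or ('type', misses) pairs, one per guard.
    confidence = 1.0
    kept = []
    for kind, value in events:
        multiplier = (bool_check_confidence(value) if kind == 'bool'
                      else type_check_confidence(value))
        if confidence * multiplier < INITIAL_TRACE_THRESHOLD:
            break                # end the trace at this guard
        confidence *= multiplier
        kept.append((kind, value))
    return kept, confidence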

Linked PRs

@markshannon markshannon added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes labels Feb 20, 2024
@Eclips4 Eclips4 added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 20, 2024
markshannon (Member, Author) commented

@gvanrossum this might be of interest to you

markshannon added a commit that referenced this issue Feb 20, 2024
gvanrossum (Member) commented

I tried to use lltrace debugging to understand better what's happening here (you probably already know this). A few issues contribute to the seemingly random behavior if the test is repeatedly called after resetting optimizers and executors. I simulated this effect as follows:

    def test_repeat_confidence_score(self):
        for _ in range(10):
            self.test_confidence_score()
            sys._clear_internal_caches()

One of the things I learned is that it's not just the bitmasks in the POP_JUMP_IF_* opcodes that aren't being reset; the counter in JUMP_BACKWARD survives the reset too (of course).

This has the interesting effect that the first run is different from all later runs: the first time, the counter starts at 0 and we optimize once it has hit the threshold (16), so we execute 16 iterations in Tier 1. But all later runs (well, until our poor signed 12-bit counter overflows) execute a single Tier 1 iteration, find the counter already > 16, and immediately optimize.

Since the initial iteration is always executed in Tier 1, and the jump bitmask is only updated in Tier 1, and the first iteration always has i == 0, this jump is always taken and counted on the first iteration, biasing the bitmask towards 1. On the second iteration, i == 1, so the jump isn't taken and the executor takes the side exit, but this manipulates the Tier 1 instruction pointer directly, and doesn't update the bitmask. On the third iteration, i == 2 and the executor makes it past the successful jump. Etcetera.

All in all we now find that for the second and following calls to testfunc(), the bitmask for the first POP_JUMP_IF_FALSE only gets updated once per call, and it always gets marked as "taken", gradually shifting in more 1 bits, and hence increasing the "confidence" as well. I believe this is what caused this test to break the refleaks CI build (the extra confidence caused a second POP_JUMP_IF_FALSE to be kept in the trace).
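A toy model of that effect -- assuming the 16-bit history and popcount-based fraction described above, not the actual C code -- starting from some hypothetical mixed history and shifting in one "taken" sample per call:

bits = 0b0101010101010101          # hypothetical leftover history from earlier runs
for call in range(1, 9):
    # Each call to testfunc() runs one Tier 1 iteration with i == 0, where the
    # first POP_JUMP_IF_FALSE is always taken, so a single 1-bit is shifted in.
    bits = ((bits << 1) | 1) & 0xFFFF
    fraction = bin(bits).count("1") / 16
    print(f"after call {call}: taken fraction {fraction:.2f}")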

Is there a bug here? We might argue that we don't care -- this is obviously an unusual situation, regular apps won't be resetting the optimizer or clearing the executors, so the rolling over of the counter in JUMP_BACKWARD and the resulting strange bias in the POP_JUMP_IF_* bitmask won't affect regular users. But it sure surprised me!

gvanrossum (Member) commented

Use lower confidence and threshold when generating the initial trace:

* 0.8 * profile_fraction + 0.1 for bool checks (maybe higher for `is None` checks)

The first half of this is simple (it changes the multiplier to something in the range 0.5-0.9 instead of 0.5-1.0), and I'll send a PR for this.
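Roughly, that amounts to the following mapping (illustrative Python; the actual change is in the C trace-projection code), where predicted is the fraction of recorded outcomes that went the direction the trace follows, so it is at least 0.5:

def old_multiplier(predicted):
    return predicted                 # 0.5 (coin flip) .. 1.0 (always the same way)

def new_multiplier(predicted):
    return 0.8 * predicted + 0.1     # 0.5 (unchanged) .. 0.9 (capped below 1.0)

for predicted in (0.5, 0.75, 1.0):
    print(f"{predicted}: {old_multiplier(predicted):.2f} -> {new_multiplier(predicted):.2f}")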

I'm not sure what to do for None checks -- are you suggesting that if we see e.g. 16 None checks fail in a row we should be more confident that this trend continues than if we see 16 failing False checks in a row? That feels a bit specific to the examples we typically use in tests -- while the `if i & 0x01` example is nice for generating a worst-case example quickly, it doesn't feel very realistic, so I don't know how to pick a different threshold for None checks.

* 0.95 - 0.05 * misses recorded for type checks

Assuming you're referring to CALL_ISINSTANCE, the machinery needed to actually record such failed type checks is a bit more involved -- we have spare cache entries in that opcode where we can record the outcome, similarly to the POP_JUMP_IF_* opcodes, and then we could special-case that when constructing traces to adjust the confidence value accordingly. Is that what you had in mind? Or are you talking about guards like _GUARD_BOTH_INT? (Hopefully those have a much higher likelihood of being correct.)

Assuming CALL_ISINSTANCE, would you want the confidence adjustment to only happen when followed by a POP_JUMP_IF_*? How would we arrange that?

* Stop trace at a lower confidence than before, say < 0.2

I'm not ready for that yet. With the current threshold (1/3), it would take over 10 maximally-confident jumps to end the trace early -- that's gotta be very branchy code indeed. :-) For unpredictable jumps, there is no difference.
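A quick check of that figure, assuming the new 0.9 cap per maximally-confident jump and the existing 1/3 cutoff:

# How many 0.9-confidence jumps does it take to fall below the 1/3 threshold?
confidence, jumps = 1.0, 0
while confidence >= 1 / 3:
    confidence *= 0.9
    jumps += 1
print(jumps, round(confidence, 3))   # 11 jumps: 0.9**10 ≈ 0.349, 0.9**11 ≈ 0.314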

gvanrossum added a commit that referenced this issue Feb 22, 2024
The theory is that even if we saw a jump go in the same direction the
last 16 times we got there, we shouldn't be overly confident that it's
still going to go the same way in the future. This PR makes it so that
in the extreme cases, the confidence is multiplied by 0.9 instead of
remaining unchanged. For unpredictable jumps, there is no difference
(still 0.5). For somewhat predictable jumps, we interpolate.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024