
More opcode predictions #71442

Closed
serhiy-storchaka opened this issue Jun 7, 2016 · 8 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

BPO 27255
Nosy @rhettinger, @serhiy-storchaka
Files
  • more_opcode_predicts.patch
  • more_opcode_predicts_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-06-28.07:09:50.943>
    created_at = <Date 2016-06-07.18:41:44.446>
    labels = ['interpreter-core', 'type-feature']
    title = 'More opcode predictions'
    updated_at = <Date 2016-06-28.07:09:50.941>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2016-06-28.07:09:50.941>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-06-28.07:09:50.943>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-06-07.18:41:44.446>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['43283', '43547']
    hgrepos = []
    issue_num = 27255
    keywords = ['patch']
    message_count = 8.0
    messages = ['267719', '267754', '269314', '269350', '269384', '269391', '269409', '269419']
    nosy_count = 3.0
    nosy_names = ['rhettinger', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27255'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Member Author

    Currently the PREDICT() macro is used for predicting the following pairs of opcodes in the ceval loop:

    LIST_APPEND JUMP_ABSOLUTE
    SET_ADD JUMP_ABSOLUTE
    MAP_ADD JUMP_ABSOLUTE
    COMPARE_OP POP_JUMP_IF_FALSE
    COMPARE_OP POP_JUMP_IF_TRUE
    GET_ITER FOR_ITER
    FOR_ITER STORE_FAST (if iteration continues)
    FOR_ITER UNPACK_SEQUENCE (if iteration continues)
    WITH_CLEANUP_START WITH_CLEANUP_FINISH
    WITH_CLEANUP_FINISH END_FINALLY

    I collected static and dynamic opcode statistics. The static statistics describe which opcodes appear in code objects after running the compileall script on the Lib directory. The dynamic statistics describe which opcodes are executed when running all Python tests. The resulting numbers are close, but there are some understandable differences (opcodes used in loops have larger counts in the dynamic statistics, and opcodes used for creating classes and functions have larger counts in the static statistics). In the following lines the first number is from the dynamic statistics and the second from the static statistics.
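The static side of such a count can be approximated with the dis module. A minimal sketch (the helper name is hypothetical; the actual measurement ran compileall over the whole Lib directory and also covered nested code objects, which is omitted here):

```python
import dis
from collections import Counter

def opcode_pair_stats(code):
    """Count consecutive opcode pairs in a single code object.

    A rough sketch of the 'static statistics' described above; recursing
    into nested code objects (via code.co_consts) is omitted for brevity.
    """
    counts = Counter()
    instructions = list(dis.get_instructions(code))
    for first, second in zip(instructions, instructions[1:]):
        counts[(first.opname, second.opname)] += 1
    return counts

def sample(items):
    result = []
    for item in items:          # the loop setup emits GET_ITER then FOR_ITER
        result.append(item * 2)
    return result

stats = opcode_pair_stats(sample.__code__)
# stats[('GET_ITER', 'FOR_ITER')] counts one of the classic predicted pairs
```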

    LIST_APPEND, SET_ADD and MAP_ADD are always followed by JUMP_ABSOLUTE.
    COMPARE_OP is followed by POP_JUMP_IF_FALSE in 83% (78%) and by POP_JUMP_IF_TRUE in 13% (5%).
    GET_ITER is followed by FOR_ITER in 72% (80%) and CALL_FUNCTION in 28% (28%).
    FOR_ITER is followed by STORE_FAST in 50% (80%), UNPACK_SEQUENCE in 21% (18%) and POP_BLOCK in 20% (0%).
    WITH_CLEANUP_START is always followed by WITH_CLEANUP_FINISH.
    WITH_CLEANUP_FINISH is always followed by END_FINALLY.

    According to these statistics, the existing predictions are justified.

    But the statistics also suggest other predictions.

    JUMP_ABSOLUTE is followed by FOR_ITER in 77% (0%).
    UNPACK_SEQUENCE is followed by STORE_FAST in 97% (94%).
    SETUP_LOOP is followed by LOAD_FAST in 81% (52%) and LOAD_GLOBAL in 18% (31%).
    BUILD_SLICE is followed by BINARY_SUBSCR in 99% (87%).
    ROT_TWO is followed by STORE_FAST in 85% (40%).
    LOAD_CLOSURE is followed by BUILD_TUPLE in 94% (68%).
    SETUP_WITH is followed by POP_TOP in 94% (54%) and STORE_FAST in 6% (44%).
    GET_YIELD_FROM_ITER, GET_AITER, GET_ANEXT and GET_AWAITABLE are always followed by LOAD_CONST.
    DUP_TOP_TWO is always followed by BINARY_SUBSCR.
    BEFORE_ASYNC_WITH is always followed by GET_AWAITABLE.
    All INPLACE_* instructions are almost always followed by STORE_FAST.

    The proposed patch adds predictions for the following pairs:

    DUP_TOP_TWO BINARY_SUBSCR
    INPLACE_POWER STORE_FAST
    INPLACE_MULTIPLY STORE_FAST
    INPLACE_MATRIX_MULTIPLY STORE_FAST
    INPLACE_TRUE_DIVIDE STORE_FAST
    INPLACE_FLOOR_DIVIDE STORE_FAST
    INPLACE_MODULO STORE_FAST
    INPLACE_ADD STORE_FAST
    INPLACE_SUBTRACT STORE_FAST
    INPLACE_LSHIFT STORE_FAST
    INPLACE_RSHIFT STORE_FAST
    INPLACE_AND STORE_FAST
    INPLACE_XOR STORE_FAST
    INPLACE_OR STORE_FAST
    GET_AITER LOAD_CONST
    GET_ANEXT LOAD_CONST
    GET_AWAITABLE LOAD_CONST
    UNPACK_SEQUENCE STORE_FAST
    LOAD_CLOSURE BUILD_TUPLE
    GET_ITER CALL_FUNCTION
    GET_YIELD_FROM_ITER LOAD_CONST
    FOR_ITER POP_BLOCK
    BEFORE_ASYNC_WITH GET_AWAITABLE
    SETUP_WITH POP_TOP
    SETUP_WITH STORE_FAST
    BUILD_SLICE BINARY_SUBSCR

    Note that the effect of this change is not very significant, since the PREDICT() macro is a no-op when computed GOTOs are used.

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jun 7, 2016
    @rhettinger
    Contributor

    Serhiy, please slow down and stop rewriting every single thing you see. Your rate of patches is prolific and hard to digest. Please give some consideration that the people who came before you (lots of them) put a lot of thought into what was done and were deliberately conservative.

    Dynamic statistics can vary widely depending on code being profiled (much like PGO optimizations for C compilers).

    For a prediction to have value, it needs to be a commonly used opcode; it must occur in a highly predictive pair with an intrinsic relationship rather than a coincidental relationship.

    FWIW, these tests aren't free. Global prediction tables have a limited size and are subject to aliasing. Mispredictions are expensive. Also, the ceval loop doesn't need more clutter.

    The new opcodes GET_YIELD_FROM_ITER, GET_AITER, GET_ANEXT, and GET_AWAITABLE haven't been considered before. The code in Python/compile.c shows that LOAD_CONST is the only possible next opcode, so these are reasonable candidates (i.e. the pair effectively acts as a single opcode). Of these, I think only GET_ANEXT and GET_AWAITABLE are likely to occur in a loop. So, these two may be worthwhile. All the rest should probably be skipped.
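That compile-time invariant is easy to spot-check by disassembling a coroutine. A quick sketch (the surrounding opcodes vary between interpreter versions, but the LOAD_CONST of None directly after GET_AWAITABLE is the point being made above):

```python
import dis

async def fetch(awaitable):
    # 'await x' compiles to GET_AWAITABLE followed by LOAD_CONST None,
    # which feeds the send machinery that drives the await.
    return await awaitable

opnames = [ins.opname for ins in dis.get_instructions(fetch)]
position = opnames.index('GET_AWAITABLE')
```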

    @rhettinger rhettinger self-assigned this Jun 8, 2016
    @serhiy-storchaka
    Member Author

    Since the existing predictions were added, many new opcodes and opcode combinations have been introduced. The predictions need to be updated to reflect the current state.

    The patch contained only predictions for highly predictive (>=94%) pairs of commonly used opcodes, or 100% predictive pairs of uncommonly used opcodes (which can be more common in specific applications).

    The second version of the patch contains only 100% predictions. It doesn't depend on any statistics: the compiler always generates these opcodes in pairs.

    @rhettinger
    Contributor

    The second version of the patch mostly looks fine.

    The prediction from DUP_TOP_TWO to BINARY_SUBSCR is questionable. Because compile.c uses it in only one context, the prediction rate is 100%; however, there is no intrinsic relationship between the two opcodes, so the prediction is happenstance and fragile. Reconsider whether you really want this prediction pairing.

    @python-dev
    Mannequin

    python-dev mannequin commented Jun 27, 2016

    New changeset f19c2b28710e by Serhiy Storchaka in branch 'default':
    Issue bpo-27255: Added more predictions in ceval.c.
    https://hg.python.org/cpython/rev/f19c2b28710e

    @serhiy-storchaka
    Member Author

    Thank you Raymond. Committed without the prediction from DUP_TOP_TWO to BINARY_SUBSCR.

    What do you think about the following highly predictive pairs?

    1. From UNPACK_SEQUENCE to STORE_FAST with 96.5% probability. This is the 15th most common pair. It is more common than any other predicted pair except the COMPARE_OP/POP_JUMP_IF_FALSE pair. I suppose it is mostly used in for loops over dict.items(), enumerate(), etc., and that the remaining 3.5% are unpacking to object attributes (like "self.x, self.y = ...").

    2. From BUILD_SLICE to BINARY_SUBSCR with 99.3% probability. This is the 37th most common pair. It is more common than any other predicted pair except the three most common pairs. The remaining 0.7% are slice assignment (0.42%), slice deletion (0.29%), in-place slice operations, and extended slices.
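Both pairs are easy to reproduce with dis. A sketch (the opcode names here are those of 3.6-era bytecode; newer interpreters may compile the slice lookup to a dedicated slicing opcode instead of BUILD_SLICE/BINARY_SUBSCR):

```python
import dis

def swap_all(pairs):
    out = []
    for a, b in pairs:       # each iteration unpacks a 2-tuple into locals
        out.append((b, a))
    return out

def trim(seq):
    return seq[1:-1]         # a pure slice lookup, the dominant use pattern

unpack_ops = [ins.opname for ins in dis.get_instructions(swap_all)]
```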

    FYI, here is a list of the most common pairs (predicted pairs are starred).

    1. 5.84% LOAD_FAST LOAD_FAST 22.6%
    2. 5.16% LOAD_FAST LOAD_ATTR 20.0%
    3. 4.18% COMPARE_OP POP_JUMP_IF_FALSE 82.9% *
    4. 3.97% POP_JUMP_IF_FALSE LOAD_FAST 66.3%
    5. 3.90% STORE_FAST LOAD_FAST 47.2%
    6. 3.70% LOAD_FAST CALL_FUNCTION 14.3%
    7. 3.36% LOAD_FAST LOAD_CONST 13.0%
    8. 2.64% LOAD_ATTR LOAD_FAST 35.2%
    9. 2.28% LOAD_CONST COMPARE_OP 26.7%
    10. 2.12% STORE_FAST STORE_FAST 25.6%
    11. 2.09% LOAD_GLOBAL LOAD_FAST 37.5%
    12. 1.49% CALL_FUNCTION STORE_FAST 20.5%
    13. 1.44% <0> LOAD_FAST 39.1%
    14. 1.37% JUMP_ABSOLUTE FOR_ITER 77.6%
    15. 1.29% UNPACK_SEQUENCE STORE_FAST 96.5%
    16. 1.28% CALL_FUNCTION POP_TOP 17.7%
    17. 1.28% LOAD_FAST LOAD_GLOBAL 4.9%
    18. 1.26% FOR_ITER STORE_FAST 50.3% *
    19. 1.25% LOAD_CONST RETURN_VALUE 14.6%
    20. 1.19% LOAD_ATTR LOAD_CONST 15.9%
      ...
    21. 0.65% COMPARE_OP POP_JUMP_IF_TRUE 13.0% *
    22. 0.65% BUILD_SLICE BINARY_SUBSCR 99.3%
      ...
    23. 0.55% SETUP_LOOP LOAD_FAST 80.7%
    24. 0.55% GET_ITER FOR_ITER 71.9% *
    25. 0.53% FOR_ITER UNPACK_SEQUENCE 21.2% *
      ...
    26. 0.50% FOR_ITER POP_BLOCK 20.0% *
      ...
    27. 0.33% ROT_TWO STORE_FAST 85.8%
      ...
    28. 0.31% INPLACE_ADD STORE_FAST 92.1%
      ...
    29. 0.30% LIST_APPEND JUMP_ABSOLUTE 100.0% *
      ...
    30. 0.22% BUILD_MAP STORE_FAST 85.3%
      ...
    31. 0.21% GET_ITER CALL_FUNCTION 28.1% *

    @rhettinger
    Contributor

    The BUILD_SLICE pairing with BINARY_SUBSCR has a tight conceptual linkage, with the lookup case dominating use patterns over slice deletion and slice assignment. IIRC, I passed on that pairing long ago because even though the predictive power was good, it didn't come up much (a very, very low percentage of executed op-codes in any real programs). That would mean the user wouldn't be likely to see any benefit, and I worried about the cost (global prediction tables have a finite size and are subject to aliased false positives, so when in doubt, it is better not to do it at all).

    The UNPACK_SEQUENCE pairing with STORE_FAST had the opposite issue. The two arise in real code often enough to be interesting, but I wasn't as confident that it wasn't competing with alternative successors like STORE_NAME or attribute storage. Here, the statistics are heavily influenced by whatever is being profiled. Given that I wasn't confident in the pairing, I passed on it.

    That said, there's nothing terribly wrong with either, so it is hard to veto them. Now as well as back when the opcode predictions were first studied, I remain -0 on both pairings.

    In general, we should have a bias towards there being relatively few predictions (each one adds size to the generated code, adds load to the branch prediction tables, and adds to the cognitive load of anyone modifying the compiler, the peephole optimizer, or adding new opcodes) and there should be a preference to leave out cases where there is doubt.

    I'll leave it to you to decide whether either one of these should go in (I just wanted you to know why they weren't included in the first place).

    @serhiy-storchaka
    Member Author

    OK, I'm only +0 on adding these predictions and fine with the status quo. The 28:1 ratio looks compelling to me, but none of this is important enough.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022