Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-98831: Remove all remaining DISPATCH() calls from bytecodes.c #99271

Merged
merged 8 commits into from Nov 10, 2022

Conversation

gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Nov 9, 2022

Also mark those opcodes that have no stack effect as such (inst(FOO) -> inst(FOO, (--))).

Also mark those opcodes that have no stack effect as such.
@gvanrossum
Copy link
Member Author

@brandtbucher When you have the benchmark results can you just post them here? So Mark won't have to ask for them.

@brandtbucher
Copy link
Member

Here you go:

All benchmarks:
===============

Slower (21):
- nbody: 93.3 ms +- 1.6 ms -> 96.8 ms +- 1.9 ms: 1.04x slower
- unpickle_list: 4.91 us +- 0.06 us -> 5.04 us +- 0.20 us: 1.03x slower
- scimark_fft: 309 ms +- 4 ms -> 315 ms +- 8 ms: 1.02x slower
- sqlglot_normalize: 106 ms +- 1 ms -> 108 ms +- 1 ms: 1.02x slower
- pickle: 10.1 us +- 0.1 us -> 10.2 us +- 0.1 us: 1.02x slower
- chameleon: 6.50 ms +- 0.09 ms -> 6.60 ms +- 0.08 ms: 1.02x slower
- telco: 6.31 ms +- 0.12 ms -> 6.39 ms +- 0.16 ms: 1.01x slower
- fannkuch: 366 ms +- 3 ms -> 371 ms +- 3 ms: 1.01x slower
- crypto_pyaes: 74.5 ms +- 1.6 ms -> 75.4 ms +- 0.5 ms: 1.01x slower
- sqlglot_optimize: 51.5 ms +- 0.4 ms -> 52.0 ms +- 0.2 ms: 1.01x slower
- pycparser: 1.08 sec +- 0.02 sec -> 1.09 sec +- 0.02 sec: 1.01x slower
- json_dumps: 9.33 ms +- 0.13 ms -> 9.39 ms +- 0.14 ms: 1.01x slower
- regex_v8: 21.2 ms +- 0.2 ms -> 21.3 ms +- 0.0 ms: 1.01x slower
- scimark_sor: 104 ms +- 1 ms -> 104 ms +- 1 ms: 1.01x slower
- nqueens: 83.2 ms +- 1.0 ms -> 83.7 ms +- 1.3 ms: 1.01x slower
- sqlglot_transpile: 1.65 ms +- 0.02 ms -> 1.66 ms +- 0.02 ms: 1.01x slower
- python_startup: 8.63 ms +- 0.01 ms -> 8.67 ms +- 0.01 ms: 1.00x slower
- python_startup_no_site: 6.31 ms +- 0.01 ms -> 6.33 ms +- 0.01 ms: 1.00x slower
- coroutines: 25.4 ms +- 0.1 ms -> 25.5 ms +- 0.3 ms: 1.00x slower
- mako: 9.68 ms +- 0.05 ms -> 9.71 ms +- 0.05 ms: 1.00x slower
- sympy_integrate: 20.5 ms +- 0.1 ms -> 20.6 ms +- 0.1 ms: 1.00x slower

Faster (34):
- regex_dna: 208 ms +- 1 ms -> 199 ms +- 1 ms: 1.05x faster
- logging_simple: 6.03 us +- 0.08 us -> 5.80 us +- 0.08 us: 1.04x faster
- pickle_list: 4.19 us +- 0.03 us -> 4.04 us +- 0.04 us: 1.04x faster
- pidigits: 199 ms +- 0 ms -> 193 ms +- 0 ms: 1.03x faster
- django_template: 33.5 ms +- 1.0 ms -> 32.6 ms +- 0.4 ms: 1.03x faster
- logging_format: 6.58 us +- 0.09 us -> 6.40 us +- 0.09 us: 1.03x faster
- sqlite_synth: 2.65 us +- 0.04 us -> 2.59 us +- 0.04 us: 1.02x faster
- generators: 73.6 ms +- 0.4 ms -> 72.1 ms +- 0.3 ms: 1.02x faster
- regex_compile: 130 ms +- 2 ms -> 128 ms +- 1 ms: 1.02x faster
- richards: 43.5 ms +- 1.1 ms -> 42.8 ms +- 0.6 ms: 1.02x faster
- mdp: 2.56 sec +- 0.01 sec -> 2.52 sec +- 0.04 sec: 1.02x faster
- chaos: 68.0 ms +- 0.6 ms -> 66.9 ms +- 0.7 ms: 1.02x faster
- json: 4.64 ms +- 0.15 ms -> 4.57 ms +- 0.11 ms: 1.02x faster
- pickle_pure_python: 287 us +- 4 us -> 283 us +- 5 us: 1.01x faster
- deepcopy_reduce: 3.01 us +- 0.04 us -> 2.97 us +- 0.04 us: 1.01x faster
- pyflate: 405 ms +- 8 ms -> 400 ms +- 4 ms: 1.01x faster
- deepcopy: 334 us +- 5 us -> 329 us +- 3 us: 1.01x faster
- xml_etree_generate: 77.0 ms +- 0.8 ms -> 76.1 ms +- 0.8 ms: 1.01x faster
- pprint_pformat: 1.42 sec +- 0.02 sec -> 1.41 sec +- 0.02 sec: 1.01x faster
- xml_etree_process: 53.0 ms +- 0.6 ms -> 52.4 ms +- 0.6 ms: 1.01x faster
- pprint_safe_repr: 692 ms +- 7 ms -> 684 ms +- 8 ms: 1.01x faster
- sympy_expand: 460 ms +- 7 ms -> 456 ms +- 6 ms: 1.01x faster
- coverage: 96.5 ms +- 1.2 ms -> 95.6 ms +- 1.8 ms: 1.01x faster
- genshi_xml: 47.4 ms +- 0.8 ms -> 47.1 ms +- 0.5 ms: 1.01x faster
- deepcopy_memo: 34.4 us +- 0.6 us -> 34.2 us +- 0.3 us: 1.01x faster
- raytrace: 287 ms +- 3 ms -> 284 ms +- 7 ms: 1.01x faster
- deltablue: 3.34 ms +- 0.05 ms -> 3.32 ms +- 0.05 ms: 1.01x faster
- genshi_text: 20.7 ms +- 0.3 ms -> 20.6 ms +- 0.3 ms: 1.01x faster
- sympy_str: 285 ms +- 4 ms -> 283 ms +- 3 ms: 1.01x faster
- go: 138 ms +- 1 ms -> 137 ms +- 1 ms: 1.01x faster
- unpickle_pure_python: 205 us +- 2 us -> 204 us +- 2 us: 1.01x faster
- xml_etree_parse: 148 ms +- 2 ms -> 147 ms +- 1 ms: 1.01x faster
- json_loads: 24.0 us +- 0.1 us -> 23.9 us +- 0.2 us: 1.00x faster
- pickle_dict: 30.9 us +- 0.1 us -> 30.8 us +- 0.1 us: 1.00x faster

Benchmark hidden because not significant (27): 2to3, aiohttp, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, dulwich_log, float, gunicorn, hexiom, html5lib, logging_silent, meteor_contest, mypy, pathlib, regex_effbot, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, spectral_norm, sqlglot_parse, sympy_sum, thrift, tornado_http, unpack_sequence, unpickle, xml_etree_iterparse

Geometric mean: 1.00x faster

/* Use BASIC_PUSH as NULL is not a valid object pointer */
BASIC_PUSH(NULL);
inst(PUSH_NULL, (-- res)) {
res = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this ought to fail, as we should have NULL checks in debug mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but we never had those. In debug mode we check for stack overflow/underflow but not for NULL-ness. (Of course the NULL will wreak havoc if it gets popped by the wrong opcode, bu that's also nothing new.) The code generator still checks for stack over/underflow by calling STACK_SHRINK()/GROW(), so nothing is really changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have typed stack effects, we can auto-generate those sorts of assertions. That would be really nice.

Python/bytecodes.c Outdated Show resolved Hide resolved
Python/bytecodes.c Outdated Show resolved Hide resolved
Python/bytecodes.c Outdated Show resolved Hide resolved
Python/bytecodes.c Outdated Show resolved Hide resolved
@@ -2642,14 +2631,15 @@ dummy_func(
PyObject *next = PyList_GET_ITEM(seq, it->it_index++);
PUSH(Py_NewRef(next));
JUMPBY(INLINE_CACHE_ENTRIES_FOR_ITER);
DISPATCH();
goto end_for_iter_list; // End of this instruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm really counting on the compiler to optimize this for us.

@gvanrossum gvanrossum merged commit 694cdb2 into python:main Nov 10, 2022
@gvanrossum gvanrossum deleted the rm-dispatch branch November 10, 2022 20:43
gvanrossum added a commit to gvanrossum/cpython that referenced this pull request Nov 10, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
ethanfurman pushed a commit to ethanfurman/cpython that referenced this pull request Nov 12, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants