sergey-miryanov
Contributor

@sergey-miryanov sergey-miryanov commented Sep 28, 2025

When we call PyTuple_Pack, all of the items are already fully constructed. If we know that they are immutable, we can skip tracking the new tuple in the GC, because the GC would untrack it eventually anyway.
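The behaviour described above can be observed from Python with `gc.is_tracked`. The following is only a minimal sketch, assuming CPython; the helper name `tracking_before_and_after_collect` is made up for illustration, and the "before" value for the all-int tuple depends on the interpreter version and on which construction path is used.

```python
import gc


def tracking_before_and_after_collect(obj):
    """Return (tracked_before, tracked_after) around one full collection."""
    before = gc.is_tracked(obj)
    gc.collect()  # a full collection gives the GC a chance to untrack the tuple
    after = gc.is_tracked(obj)
    return before, after


# Built at run time so the compiler cannot fold them into constants.
atomic = tuple([1, 2, 3])        # holds only ints, which are never GC-tracked
holds_list = tuple([1, [2, 3]])  # holds a list, which is GC-tracked

print("atomic:    ", tracking_before_and_after_collect(atomic))
print("holds_list:", tracking_before_and_after_collect(holds_list))
```

After the collection the all-int tuple should no longer be tracked, while the tuple that holds a list has to stay tracked because it could take part in a reference cycle.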

I have a PR ready and benchmark results:

Geometric mean: 1.01x faster (Win11 x64, 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz, 48d0d0d)

All benchmarks:

+--------------------------+----------+------------------------+
| Benchmark                | main     | tuples                 |
+==========================+==========+========================+
| async_generators         | 435 ms   | 430 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| asyncio_tcp              | 750 ms   | 756 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_tcp_ssl          | 1.91 sec | 1.92 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| comprehensions           | 22.1 us  | 21.8 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| bench_mp_pool            | 104 ms   | 103 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| bench_thread_pool        | 1.29 ms  | 1.27 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| coroutines               | 28.2 ms  | 27.7 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| coverage                 | 88.5 ms  | 86.3 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| crypto_pyaes             | 90.1 ms  | 86.7 ms: 1.04x faster  |
+--------------------------+----------+------------------------+
| deepcopy                 | 310 us   | 307 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| deepcopy_memo            | 36.4 us  | 36.1 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| deltablue                | 5.19 ms  | 4.85 ms: 1.07x faster  |
+--------------------------+----------+------------------------+
| django_template          | 45.5 ms  | 45.8 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| docutils                 | 2.47 sec | 2.45 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| dulwich_log              | 86.2 ms  | 86.9 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| fannkuch                 | 449 ms   | 441 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| float                    | 85.3 ms  | 82.5 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| create_gc_cycles         | 1.17 ms  | 1.17 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| gc_traversal             | 2.97 ms  | 2.88 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| generators               | 43.0 ms  | 41.6 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| genshi_text              | 28.9 ms  | 28.7 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| go                       | 160 ms   | 153 ms: 1.04x faster   |
+--------------------------+----------+------------------------+
| hexiom                   | 8.39 ms  | 8.13 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| json_dumps               | 8.62 ms  | 8.69 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| logging_format           | 12.5 us  | 12.2 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| logging_silent           | 139 ns   | 140 ns: 1.01x slower   |
+--------------------------+----------+------------------------+
| logging_simple           | 11.3 us  | 11.1 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| mako                     | 14.2 ms  | 14.4 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| mdp                      | 1.47 sec | 1.50 sec: 1.02x slower |
+--------------------------+----------+------------------------+
| meteor_contest           | 104 ms   | 102 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| nbody                    | 114 ms   | 113 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| pickle_pure_python       | 439 us   | 436 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| pprint_safe_repr         | 953 ms   | 916 ms: 1.04x faster   |
+--------------------------+----------+------------------------+
| pprint_pformat           | 1.95 sec | 1.88 sec: 1.04x faster |
+--------------------------+----------+------------------------+
| pyflate                  | 506 ms   | 492 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| python_startup           | 28.5 ms  | 27.4 ms: 1.04x faster  |
+--------------------------+----------+------------------------+
| python_startup_no_site   | 23.2 ms  | 22.2 ms: 1.05x faster  |
+--------------------------+----------+------------------------+
| raytrace                 | 361 ms   | 345 ms: 1.05x faster   |
+--------------------------+----------+------------------------+
| regex_compile            | 146 ms   | 146 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| regex_effbot             | 2.03 ms  | 2.02 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| regex_v8                 | 23.9 ms  | 22.7 ms: 1.06x faster  |
+--------------------------+----------+------------------------+
| richards                 | 66.1 ms  | 59.9 ms: 1.10x faster  |
+--------------------------+----------+------------------------+
| richards_super           | 71.6 ms  | 68.7 ms: 1.04x faster  |
+--------------------------+----------+------------------------+
| scimark_fft              | 300 ms   | 294 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| scimark_lu               | 135 ms   | 131 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| scimark_monte_carlo      | 83.3 ms  | 82.4 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| scimark_sor              | 157 ms   | 150 ms: 1.05x faster   |
+--------------------------+----------+------------------------+
| scimark_sparse_mat_mult  | 4.27 ms  | 4.35 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| spectral_norm            | 122 ms   | 118 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| sqlglot_optimize         | 60.7 ms  | 60.9 ms: 1.00x slower  |
+--------------------------+----------+------------------------+
| sympy_expand             | 501 ms   | 503 ms: 1.00x slower   |
+--------------------------+----------+------------------------+
| sympy_sum                | 143 ms   | 144 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sympy_str                | 287 ms   | 292 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| telco                    | 7.26 ms  | 7.33 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| tomli_loads              | 2.23 sec | 2.25 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 189 us   | 185 us: 1.02x faster   |
+--------------------------+----------+------------------------+
| unpack_sequence          | 65.4 ns  | 68.7 ns: 1.05x slower  |
+--------------------------+----------+------------------------+
| unpickle                 | 13.9 us  | 14.1 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 303 us   | 300 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| xml_etree_parse          | 130 ms   | 130 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 107 ms   | 108 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_process        | 79.2 ms  | 78.6 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.01x faster           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (20): 2to3, chaos, deepcopy_reduce, genshi_xml, html5lib, json_loads, nqueens, pathlib, pickle, pickle_dict, pickle_list, pidigits, regex_dna, sqlglot_normalize, sqlglot_parse, sqlglot_transpile, sqlite_synth, sympy_integrate, unpickle_list, xml_etree_generate

It doesn't hurt performance, and it can decrease the number of objects that the GC has to check and untrack.
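One rough way to see how many tuples the collector is carrying is to count them in `gc.get_objects()`, which returns only tracked objects. This is a sketch for experimentation, not the instrumentation used for this PR; `count_tracked_tuples` is a hypothetical helper name.

```python
import gc


def count_tracked_tuples():
    """Count exact `tuple` instances currently tracked by the collector."""
    return sum(1 for obj in gc.get_objects() if type(obj) is tuple)


gc.collect()  # let the collector untrack what it can before measuring
gc.disable()  # avoid automatic collections changing the counts mid-measurement
try:
    before = count_tracked_tuples()
    # Each tuple holds a list, so it must stay tracked in any case.
    keep_alive = [(i, []) for i in range(1000)]
    after = count_tracked_tuples()
finally:
    gc.enable()

print(f"tracked tuples: {before} -> {after}")
```

Replacing the inner `[]` with an int or a string and comparing the two counts on different builds gives a quick impression of how many tuples a given construction path leaves tracked.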

@sergey-miryanov
Contributor Author

I'm not sure that this needs a NEWS entry, because it is an implementation detail. But I will defer to the triage/core team's decision here.

@sergey-miryanov
Contributor Author

Sorry, misclick.

@ZeroIntensity
Member

PyTuple_Pack is user-facing, so let's add a blurb.

@eendebakpt
Contributor

Which cases would benefit from this change to PyTuple_Pack? We could apply a similar optimization to other functions that construct a tuple (e.g. _PyTuple_FromArray, or inside dictiter_iternextitem), but it is not immediately clear to me which ones would benefit (no GC tracking) and which ones would not (too expensive to check all the arguments).

PyTuple_Pack itself is slow (due to varargs), so there should not be many cases where it is called often.

Note: I used this code to check which cases are impacted:

import gc
import random

a = (1, 2, 3, 4)
b = tuple([1,2,3])
c = tuple(list([1,2,3])) 
d = (2,3) * (random.randint(2, 4)+2)
          
print(f'{gc.is_tracked(a)=}') # still tracked
print(f'{gc.is_tracked(b)=}')
print(f'{gc.is_tracked(c)=}')
print(f'{gc.is_tracked(d)=}')

d={10: 1, 11: 2}

one_dict_item = next(iter(d.items()))
print(f'{gc.is_tracked(one_dict_item)=}') # still tracked!

l=[]
l.extend(d.items())
print(f'{gc.is_tracked(l[0])=}') # here it works! l[0] is not tracked

for tp in d.items():
    print(f'{gc.is_tracked(tp)=}')  # tracked, even though this might be a common use case

@sergey-miryanov
Contributor Author

@eendebakpt Yeah, I did the same for:

  • _PyTuple_FromArray
  • _PyTuple_FromStackRefStealOnSuccess
  • _PyTuple_FromArraySteal
  • tuple_concat
  • tuple_repeat
  • tuple_subscript

Collecting stats and microbenchmarking now.
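For anyone who wants to try a quick measurement of their own, a microbenchmark along these lines could be a starting point. It is a sketch only, not the benchmark used for this PR: the mapping from Python expressions to the C functions listed above is an assumption (for example, that `tuple(a_list)` ends up in _PyTuple_FromArray), and the names `CASES` and `run_microbench` are made up for illustration.

```python
import timeit

base = tuple(range(8))
items = list(range(8))

# Assumed mapping from Python expressions to C construction paths.
CASES = {
    "tuple(list)": lambda: tuple(items),  # assumed: _PyTuple_FromArray
    "concat": lambda: base + base,        # tuple_concat
    "repeat": lambda: base * 3,           # tuple_repeat
    "slice": lambda: base[1:5],           # tuple_subscript
}


def run_microbench(number=100_000):
    """Return total seconds per case for `number` calls of each expression."""
    return {name: timeit.timeit(fn, number=number) for name, fn in CASES.items()}


if __name__ == "__main__":
    calls = 100_000
    for name, seconds in run_microbench(calls).items():
        print(f"{name:12s} {seconds / calls * 1e9:8.1f} ns per call")
```

The numbers include the overhead of calling a lambda, so they are only meaningful when the same script is run on two builds and the results are compared.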

@sergey-miryanov
Contributor Author

The tests fail because the instrumentation is a bit naive and only works on my machine.
