Specialize STORE_SUBSCR #105

sweeneyde · 2021-10-23T18:04:04Z

I gathered a bunch of data from the test suite and the STORE_SUBSCR-heavy pyperformance benchmarks:

https://gist.github.com/sweeneyde/f7c99f3620312f6f90cd0ff63a0d050e

From this, it looks like we should specialize list[small_int] and dict[object], with some potential opportunities for bytearray[small_int] or list[negative_int] or list[slice] or array[small_int] as well.

The text was updated successfully, but these errors were encountered:

sweeneyde · 2021-10-23T18:24:50Z

Summary of data:

list[small_int] is a clear win across the board.
list[slice] helps fannkuch, nqueens and a little bit of regex_compile.
list[negative_int] helps nqueens and regex_compile
dict[object] seems underrepresented in these pyperformance, but
- dict[str] would help bm_django_template, and I think there's a small constant dict[str] overhead in the pyperformance machinery.
bytearray[small_int] helps regex_compile, regex_dna, regex_v8, a little bit of django_template, and a small constant overhead in pyperformance.
array.array[small_int] helps just scimark.

Unless someone has a different suggestion, I'll open a PR that does list[small_int], dict[object], and bytearray[int].

sweeneyde · 2021-10-23T19:13:53Z

In django_template, 3875/3894 = 99.5% of dict[str]=... calls are an already-hashed string being inserted into a DICT_KEYS_UNICODE dictionary, so that could be good to specialize on.

gvanrossum · 2021-10-23T19:33:17Z

Good start!

My intuition was that dict[str] and list[small_int] would cover the overwhelming uses of this opcode. Your data seems to confirm that. IIRC Mark has typically produced a little table showing what fraction each potential specialization represents, and we'd focus on whatever gets us to around 90% specialized.

It's also somewhat important to verify that per location the specialization stays valid -- e.g. a location that alternates between dict[str] and list[small_int] would be disastrous. Fortunately that feels unlikely (it would have to be rather weird code). However, I could imagine dict[str] and dict[some_other_object] occurring for the same location, in code that's generic over the key type, and that would be a likely worst-case scenario. Hopefully that's rare enough, it's unlikely to be represented in the benchmarks. (And the cost wouldn't be too terrible -- just frequent deoptimizations. IIRC there's also a counter that tracks this and eventually just stops trying to specialize that particular opcode. But I'm not sure.)

sweeneyde · 2021-10-23T20:31:15Z

	Weighted Average	Average	stdlib tests	crypto_pyaes	django_template	fannkuch	meteor_contest	nbody	nqueens	regex_compile	regex_dna	regex_v8	scimark
list[small_int]	39%	43%	53%	99%	11%	34%	100%	100%	67%	7%	0%	1%	0%
bytearray[int]	8%	26%	1%	0%	1%	0%	0%	0%	0%	91%	99%	98%	0%
array[small_int]	27%	9%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	100%
list[slice]	20%	8%	1%	0%	0%	66%	0%	0%	21%	0%	0%	0%	0%
dict[str]	1%	6%	3%	1%	65%	0%	0%	0%	0%	0%	0%	1%	0%
dict[int]	3%	2%	26%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
list[negative_int]	1%	1%	0%	0%	0%	0%	0%	0%	12%	1%	0%	0%	0%
dict[other]	1%	1%	10%	0%	1%	0%	0%	0%	0%	1%	0%	0%	0%
OTHER	0%	1%	0%	0%	11%	0%	0%	0%	0%	0%	0%	0%	0%
dict.__setitem__(subclass, ...)	0%	1%	0%	0%	11%	0%	0%	0%	0%	0%	0%	0%	0%
Counter[object]	0%	0%	5%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
SubPattern[object]	0%	0%	0%	0%	0%	0%	0%	0%	0%	1%	0%	0%	0%
array[slice]	0%	0%	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
other_buffer[object]	0%	0%	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
deque[object]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
other_dict_subclass[object]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
bytearray[slice]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
list[other]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
array[other]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
bytearray[other]	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%

Edit: included data for some stdlib tests

gvanrossum · 2021-10-23T20:47:26Z

I few array.array is less common in real code. Real scientific code would use numpy.

gvanrossum · 2021-10-23T20:49:07Z

I’m also skeptical about bytearray. Tha comes from having three benchmarks for regex compilation.

sweeneyde · 2021-10-28T00:13:33Z

My first attempt here found that bytearray specialization didn't even help all that much.

I opened a new PR that has STORE_SUBSCR_LIST_INT, STORE_SUBSCR_DICT_GENERAL, and STORE_SUBSCR_DICT_UNICODE, with some pretty good microbenchmarks, and bm_nbody 1.2x faster.

sweeneyde mentioned this issue Oct 26, 2021

bpo-45609: Specialize STORE_SUBSCR python/cpython#29221

Closed

sweeneyde mentioned this issue Oct 29, 2021

bpo-45609: Specialize STORE_SUBSCR (alternate) python/cpython#29242

Merged

sweeneyde closed this as completed Nov 20, 2021

sweeneyde mentioned this issue Dec 18, 2021

Specializations #74

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specialize STORE_SUBSCR #105

Specialize STORE_SUBSCR #105

sweeneyde commented Oct 23, 2021

sweeneyde commented Oct 23, 2021

sweeneyde commented Oct 23, 2021

gvanrossum commented Oct 23, 2021

sweeneyde commented Oct 23, 2021 •

edited

gvanrossum commented Oct 23, 2021

gvanrossum commented Oct 23, 2021

sweeneyde commented Oct 28, 2021

Specialize STORE_SUBSCR #105

Specialize STORE_SUBSCR #105

Comments

sweeneyde commented Oct 23, 2021

sweeneyde commented Oct 23, 2021

sweeneyde commented Oct 23, 2021

gvanrossum commented Oct 23, 2021

sweeneyde commented Oct 23, 2021 • edited

gvanrossum commented Oct 23, 2021

gvanrossum commented Oct 23, 2021

sweeneyde commented Oct 28, 2021

sweeneyde commented Oct 23, 2021 •

edited