Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize STORE_SUBSCR #105

Closed
sweeneyde opened this issue Oct 23, 2021 · 7 comments
Closed

Specialize STORE_SUBSCR #105

sweeneyde opened this issue Oct 23, 2021 · 7 comments

Comments

@sweeneyde
Copy link

I gathered a bunch of data from the test suite and the STORE_SUBSCR-heavy pyperformance benchmarks:

https://gist.github.com/sweeneyde/f7c99f3620312f6f90cd0ff63a0d050e

From this, it looks like we should specialize list[small_int] and dict[object], with some potential opportunities for bytearray[small_int] or list[negative_int] or list[slice] or array[small_int] as well.

@sweeneyde
Copy link
Author

Summary of data:

  • list[small_int] is a clear win across the board.
  • list[slice] helps fannkuch, nqueens and a little bit of regex_compile.
  • list[negative_int] helps nqueens and regex_compile
  • dict[object] seems underrepresented in these pyperformance, but
    • dict[str] would help bm_django_template, and I think there's a small constant dict[str] overhead in the pyperformance machinery.
  • bytearray[small_int] helps regex_compile, regex_dna, regex_v8, a little bit of django_template, and a small constant overhead in pyperformance.
  • array.array[small_int] helps just scimark.

Unless someone has a different suggestion, I'll open a PR that does list[small_int], dict[object], and bytearray[int].

@sweeneyde
Copy link
Author

In django_template, 3875/3894 = 99.5% of dict[str]=... calls are an already-hashed string being inserted into a DICT_KEYS_UNICODE dictionary, so that could be good to specialize on.

@gvanrossum
Copy link
Collaborator

Good start!

My intuition was that dict[str] and list[small_int] would cover the overwhelming uses of this opcode. Your data seems to confirm that. IIRC Mark has typically produced a little table showing what fraction each potential specialization represents, and we'd focus on whatever gets us to around 90% specialized.

It's also somewhat important to verify that per location the specialization stays valid -- e.g. a location that alternates between dict[str] and list[small_int] would be disastrous. Fortunately that feels unlikely (it would have to be rather weird code). However, I could imagine dict[str] and dict[some_other_object] occurring for the same location, in code that's generic over the key type, and that would be a likely worst-case scenario. Hopefully that's rare enough, it's unlikely to be represented in the benchmarks. (And the cost wouldn't be too terrible -- just frequent deoptimizations. IIRC there's also a counter that tracks this and eventually just stops trying to specialize that particular opcode. But I'm not sure.)

@sweeneyde
Copy link
Author

sweeneyde commented Oct 23, 2021

Weighted Average Average stdlib tests crypto_pyaes django_template fannkuch meteor_contest nbody nqueens regex_compile regex_dna regex_v8 scimark
list[small_int] 39% 43% 53% 99% 11% 34% 100% 100% 67% 7% 0% 1% 0%
bytearray[int] 8% 26% 1% 0% 1% 0% 0% 0% 0% 91% 99% 98% 0%
array[small_int] 27% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 100%
list[slice] 20% 8% 1% 0% 0% 66% 0% 0% 21% 0% 0% 0% 0%
dict[str] 1% 6% 3% 1% 65% 0% 0% 0% 0% 0% 0% 1% 0%
dict[int] 3% 2% 26% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
list[negative_int] 1% 1% 0% 0% 0% 0% 0% 0% 12% 1% 0% 0% 0%
dict[other] 1% 1% 10% 0% 1% 0% 0% 0% 0% 1% 0% 0% 0%
OTHER 0% 1% 0% 0% 11% 0% 0% 0% 0% 0% 0% 0% 0%
dict.__setitem__(subclass, ...) 0% 1% 0% 0% 11% 0% 0% 0% 0% 0% 0% 0% 0%
Counter[object] 0% 0% 5% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
SubPattern[object] 0% 0% 0% 0% 0% 0% 0% 0% 0% 1% 0% 0% 0%
array[slice] 0% 0% 1% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
other_buffer[object] 0% 0% 1% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
deque[object] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
other_dict_subclass[object] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
bytearray[slice] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
list[other] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
array[other] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
bytearray[other] 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%

Edit: included data for some stdlib tests

@gvanrossum
Copy link
Collaborator

I few array.array is less common in real code. Real scientific code would use numpy.

@gvanrossum
Copy link
Collaborator

I’m also skeptical about bytearray. Tha comes from having three benchmarks for regex compilation.

@sweeneyde
Copy link
Author

My first attempt here found that bytearray specialization didn't even help all that much.

I opened a new PR that has STORE_SUBSCR_LIST_INT, STORE_SUBSCR_DICT_GENERAL, and STORE_SUBSCR_DICT_UNICODE, with some pretty good microbenchmarks, and bm_nbody 1.2x faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants