
bpo-26301: Add a fast-path for list[index] in BINARY_SUBSCR opcode #9853

Closed
wants to merge 1 commit

Conversation

@pablogsal (Member) commented Oct 13, 2018

Based on a patch by Zach Byrne and @vstinner.

./python -m perf timeit "l = [1,2,3,4,5,6]" "l[3]" -o old_index.json
./python -m perf timeit "l = [1,2,3,4,5,6]" "l[3]" -o new_index.json
./python -m perf compare_to old_index.json ../cpython/new_index.json
Mean +- std dev: [old_index] 209 ns +- 8 ns -> [new_index] 194 ns +- 4 ns: 1.08x faster (-7%)

Cache information

BASELINE

 Performance counter stats for './python -c
l = [1,2,3,4,5,6]
for _ in range(1000):
    l[3]' (200 runs):

           665,682      cache-references:u                                            ( +-  0.08% )
             4,030      cache-misses:u            #    0.605 % of all cache refs      ( +-  7.73% )
        60,529,180      cycles:u                                                      ( +-  0.12% )
        87,335,803      instructions:u            #    1.44  insn per cycle           ( +-  0.01% )
        22,946,153      branches:u                                                    ( +-  0.01% )
             1,018      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.024963 +- 0.000185 seconds time elapsed  ( +-  0.74% )

PATCH

 Performance counter stats for './python -c l = [1,2,3,4,5,6]
for _ in range(1000):
    l[3]' (200 runs):

           670,231      cache-references:u                                            ( +-  0.09% )
             3,261      cache-misses:u            #    0.487 % of all cache refs      ( +-  7.23% )
        60,717,884      cycles:u                                                      ( +-  0.13% )
        86,998,772      instructions:u            #    1.43  insn per cycle           ( +-  0.01% )
        22,873,947      branches:u                                                    ( +-  0.01% )
             1,017      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.025440 +- 0.000182 seconds time elapsed  ( +-  0.72% )

https://bugs.python.org/issue26301

@vstinner (Member)
1.08x faster (-7%)

Since this is a micro-benchmark, such a result is not impressive, and I'm not sure it's worth it. I doubt we would see anything at the macro-benchmark level: http://pyperformance.readthedocs.io/

cc @serhiy-storchaka @pitrou @methane

@pitrou (Member) commented Oct 15, 2018

Indeed, this is not attractive at all. You also forgot to check for errors after PyLong_AsSsize_t.

@pablogsal (Member, Author)
@vstinner @pitrou Thanks for the review; I will therefore close this PR. I will still check whether I see anything different in https://pyperformance.readthedocs.io/, just out of curiosity.

@pablogsal pablogsal closed this Oct 15, 2018
@pablogsal pablogsal deleted the bpo26301 branch October 15, 2018 21:19
@serhiy-storchaka (Member)
In addition to what was said by @vstinner and @pitrou, you would also need to benchmark indexing of non-lists: strings, tuples, dicts. I am afraid that optimizing one particular case will hurt the performance of all the other cases. Also, adding 20 new lines to the ceval loop can affect the performance of other, unrelated opcodes.
