GH-91432: Add more FOR_ITER specializations #94096

Closed
wants to merge 5 commits

Conversation

sweeneyde
Member

#91432

This adds the following specializations:

  • FOR_ITER(tuple)
  • FOR_ITER(dict_items) + UNPACK_SEQUENCE(2)
    • Eliminates handling the tuple and saves one dispatch
  • FOR_ITER(enumerate) + UNPACK_SEQUENCE(2) + STORE_FAST
    • Eliminates handling the tuple, saves two dispatches, and avoids allocating the index PyLongObject

I'm not sure whether all of these are worth it, but I want to see how this change moves the stats and the micro- and macro-benchmarks.
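
For reference, here is a small sketch (not part of this PR) showing the bytecode patterns these specializations target; the exact disassembly varies by CPython version:

import dis

def iter_dict_items(d):
    # FOR_ITER immediately followed by UNPACK_SEQUENCE 2,
    # the pair targeted by the dict_items specialization.
    for k, v in d.items():
        pass

def iter_enumerate(seq):
    # FOR_ITER, UNPACK_SEQUENCE 2, then two STORE_FAST instructions:
    # the pattern targeted by the enumerate specialization.
    for i, x in enumerate(seq):
        pass

dis.dis(iter_dict_items)
dis.dis(iter_enumerate)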

@sweeneyde
Member Author

In microbenchmarks, it seems adding these extra opcodes bumped some things around and made FOR_ITER_RANGE and FOR_ITER_LIST a bit slower. Maybe the compiler decided not to inline _PyLong_AssignValue after this change, since this code uses it in two places.

But dict items, enumerate, and tuple did speed up, as expected.

Microbenchmark script:
from pyperf import Runner, perf_counter
from itertools import repeat

def for_range(loops, length):
    repetitions = repeat(None, loops)
    R = range(length)

    t0 = perf_counter()
    for _ in repetitions:
        for x in R:
            pass
    t1 = perf_counter()

    return t1 - t0

def for_list(loops, length):
    repetitions = repeat(None, loops)
    L = list(map(float, range(length)))

    t0 = perf_counter()
    for _ in repetitions:
        for x in L:
            pass
    t1 = perf_counter()

    return t1 - t0

def for_tuple(loops, length):
    repetitions = repeat(None, loops)
    T = tuple(map(float, range(length)))

    t0 = perf_counter()
    for _ in repetitions:
        for x in T:
            pass
    t1 = perf_counter()

    return t1 - t0

def for_dict(loops, length):
    repetitions = repeat(None, loops)
    D = dict.fromkeys(map(float, range(length)))

    t0 = perf_counter()
    for _ in repetitions:
        for x, y in D.items():
            pass
    t1 = perf_counter()

    return t1 - t0

def for_enumerate(loops, length):
    repetitions = repeat(None, loops)
    L = [None] * length

    t0 = perf_counter()
    for _ in repetitions:
        for i, x in enumerate(L):
            pass
    t1 = perf_counter()

    return t1 - t0

def for_map(loops, length):
    repetitions = repeat(None, loops)
    L = [()] * length

    t0 = perf_counter()
    for _ in repetitions:
        for x in map(len, L):
            pass
    t1 = perf_counter()

    return t1 - t0


def for_string(loops, length):
    repetitions = repeat(None, loops)
    S = "a" * length

    t0 = perf_counter()
    for _ in repetitions:
        for x in S:
            pass
    t1 = perf_counter()

    return t1 - t0

def for_set(loops, length):
    repetitions = repeat(None, loops)
    S = {f"a{i}" for i in range(length)}

    t0 = perf_counter()
    for _ in repetitions:
        for x in S:
            pass
    t1 = perf_counter()

    return t1 - t0


bench = Runner().bench_time_func
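# Register each micro-benchmark at several sizes; inner_loops=n tells pyperf
# to divide the measured time by the number of inner iterations, so results
# are reported per iteration of the inner for-loop.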
for n in [20, 200, 2_000, 20_000]:
    bench(f"for_range {n:_}", for_range, n, inner_loops=n)
    bench(f"for_list {n:_}", for_list, n, inner_loops=n)
    bench(f"for_tuple {n:_}", for_tuple, n, inner_loops=n)
    bench(f"for_dict {n:_}", for_dict, n, inner_loops=n)
    bench(f"for_enumerate {n:_}", for_enumerate, n, inner_loops=n)
    bench(f"for_map {n:_}", for_map, n, inner_loops=n)
    bench(f"for_string {n:_}", for_string, n, inner_loops=n)
    bench(f"for_set {n:_}", for_set, n, inner_loops=n)
Benchmark results:

Slower (7):
- for_range 2_000: 4.01 ns +- 0.02 ns -> 5.58 ns +- 0.14 ns: 1.39x slower
- for_range 200: 4.47 ns +- 0.06 ns -> 6.04 ns +- 0.43 ns: 1.35x slower
- for_range 20_000: 4.19 ns +- 0.03 ns -> 5.57 ns +- 0.35 ns: 1.33x slower
- for_range 20: 6.03 ns +- 0.32 ns -> 7.58 ns +- 0.27 ns: 1.26x slower
- for_list 20: 5.95 ns +- 0.44 ns -> 6.19 ns +- 0.85 ns: 1.04x slower
- for_list 20_000: 4.56 ns +- 0.17 ns -> 4.73 ns +- 0.38 ns: 1.04x slower
- for_list 200: 4.69 ns +- 0.26 ns -> 4.84 ns +- 0.36 ns: 1.03x slower

Faster (24):
- for_enumerate 20_000: 22.2 ns +- 0.4 ns -> 9.52 ns +- 0.39 ns: 2.34x faster
- for_enumerate 2_000: 21.2 ns +- 0.4 ns -> 9.58 ns +- 0.37 ns: 2.21x faster
- for_dict 20_000: 13.5 ns +- 0.1 ns -> 7.52 ns +- 0.05 ns: 1.80x faster
- for_dict 2_000: 13.4 ns +- 0.1 ns -> 7.52 ns +- 0.13 ns: 1.79x faster
- for_dict 200: 13.8 ns +- 0.2 ns -> 7.80 ns +- 0.13 ns: 1.77x faster
- for_dict 20: 17.5 ns +- 0.2 ns -> 11.6 ns +- 0.1 ns: 1.51x faster
- for_enumerate 200: 13.5 ns +- 0.3 ns -> 9.71 ns +- 0.41 ns: 1.39x faster
- for_tuple 20_000: 6.49 ns +- 0.05 ns -> 4.97 ns +- 0.04 ns: 1.31x faster
- for_enumerate 20: 17.5 ns +- 0.5 ns -> 13.4 ns +- 0.4 ns: 1.30x faster
- for_tuple 20: 8.20 ns +- 0.06 ns -> 6.31 ns +- 0.23 ns: 1.30x faster
- for_tuple 2_000: 6.49 ns +- 0.05 ns -> 5.01 ns +- 0.22 ns: 1.29x faster
- for_tuple 200: 6.65 ns +- 0.07 ns -> 5.16 ns +- 0.14 ns: 1.29x faster
- for_string 2_000: 6.75 ns +- 0.06 ns -> 5.85 ns +- 0.21 ns: 1.16x faster
- for_string 20_000: 6.74 ns +- 0.04 ns -> 5.87 ns +- 0.30 ns: 1.15x faster
- for_string 200: 6.90 ns +- 0.09 ns -> 6.16 ns +- 0.39 ns: 1.12x faster
- for_string 20: 8.59 ns +- 0.08 ns -> 7.70 ns +- 0.26 ns: 1.12x faster
- for_set 200: 8.50 ns +- 0.10 ns -> 7.71 ns +- 0.10 ns: 1.10x faster
- for_set 20: 12.4 ns +- 0.1 ns -> 11.4 ns +- 0.2 ns: 1.09x faster
- for_map 20_000: 17.8 ns +- 0.3 ns -> 16.4 ns +- 0.3 ns: 1.08x faster
- for_map 2_000: 17.9 ns +- 0.4 ns -> 16.5 ns +- 0.5 ns: 1.08x faster
- for_map 200: 18.5 ns +- 0.3 ns -> 17.2 ns +- 0.2 ns: 1.07x faster
- for_map 20: 22.0 ns +- 0.3 ns -> 20.9 ns +- 0.4 ns: 1.05x faster
- for_set 20_000: 22.4 ns +- 0.2 ns -> 21.4 ns +- 0.5 ns: 1.05x faster
- for_set 2_000: 12.8 ns +- 0.1 ns -> 12.3 ns +- 0.3 ns: 1.04x faster

Benchmark hidden because not significant (1): for_list 2_000

Geometric mean: 1.18x faster

@markshannon
Member

markshannon commented Jun 22, 2022

Before adding any more specializations for builtin iterators, I'd like to try implementing faster-cpython/ideas#392 and add specialization for generators.
