[mypyc] Speed up native-to-native calls using await #19398


Merged · 7 commits into master on Jul 8, 2025

Conversation

JukkaL (Collaborator) commented Jul 8, 2025

When calling a native async function using await, e.g. `await foo()`, avoid raising StopIteration to pass back the return value, since raising an exception is expensive. Instead, pass an extra `PyObject **` argument to the generator helper method and use it to return the return value. This is mostly helpful when there are many calls using await that don't block (e.g. a fast path that is usually taken and doesn't block). When awaiting from non-compiled code, the slow path is still taken.

This builds on top of #19376.
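The idea can be illustrated at the Python level with a small sketch: contrast the standard coroutine protocol, where the final value travels inside a `StopIteration` exception, with an out-parameter-style helper that hands the value back directly alongside a "finished" tag. The names `RETURN`, `FastCoro`, and `helper` here are hypothetical illustrations, not mypyc's actual API:

```
# Sentinel meaning "coroutine finished"; stands in for the extra
# PyObject ** out-argument described above. (Hypothetical name.)
RETURN = object()

class FastCoro:
    """Toy stand-in for a compiled async function's generator object."""
    def __init__(self, x: int) -> None:
        self.x = x

    def helper(self):
        # Instead of raising StopIteration(value), return a (tag, value)
        # pair so a native caller can read the result without an exception.
        return RETURN, self.x + 1

def await_fast(coro: FastCoro) -> int:
    # Native-to-native fast path: no exception is raised or caught.
    tag, value = coro.helper()
    if tag is RETURN:
        return value
    # (a real implementation would suspend here until the coroutine resumes)

def await_slow(x: int) -> int:
    # Standard protocol: the return value is carried by StopIteration,
    # which is what non-compiled callers still rely on.
    async def inc(x: int) -> int:
        return x + 1
    coro = inc(x)
    try:
        coro.send(None)
    except StopIteration as e:
        return e.value

assert await_fast(FastCoro(41)) == 42
assert await_slow(41) == 42
```

Both paths compute the same result; the fast path simply skips constructing, raising, and catching an exception object on every awaited call that completes without blocking.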

This PR makes this microbenchmark about 3x faster, which is about the ideal scenario for this optimization:

```
import asyncio
from time import time

async def inc(x: int) -> int:
    return x + 1


async def bench(n: int) -> int:
    x = 0
    for i in range(n):
        x = await inc(x)
    return x

# Warm-up run before timing.
asyncio.run(bench(1000))

t0 = time()
asyncio.run(bench(1000 * 1000 * 200))
print(time() - t0)
```

@JukkaL JukkaL merged commit 4a427e9 into master Jul 8, 2025
13 checks passed
@JukkaL JukkaL deleted the mypyc-await-optimize-2 branch July 8, 2025 16:46
JukkaL added a commit that referenced this pull request Jul 9, 2025
Call the generator helper method directly instead of calling
`PyIter_Next` when calling a native generator from a native function.
This way we can avoid raising StopIteration when the generator is
exhausted. The approach is similar to what I used to speed up calls
using await in #19398. Refer to that PR for a more detailed explanation.

This helps mostly when a generator produces a small number of values,
which is quite common.
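The same sentinel idea applies to plain generators: the standard iterator protocol signals exhaustion by raising `StopIteration`, while a helper method can return a sentinel instead, so a native caller's loop never touches the exception machinery. Again, `STOP`, `NativeGen`, and `helper` are illustrative names, not mypyc's real interface:

```
# Sentinel for "generator exhausted" (hypothetical name).
STOP = object()

class NativeGen:
    """Toy stand-in for a compiled generator object."""
    def __init__(self, x: int) -> None:
        self.it = iter(range(x))

    def helper(self):
        # Return STOP instead of raising StopIteration on exhaustion.
        for a in self.it:
            return a
        return STOP

def consume(gen: NativeGen) -> list:
    # Native caller loop: no exception on the exhaustion path, unlike
    # a loop driven by next()/PyIter_Next.
    out = []
    while (v := gen.helper()) is not STOP:
        out.append(v)
    return out

assert consume(NativeGen(3)) == [0, 1, 2]
assert consume(NativeGen(0)) == []
```

The saving is proportional to how often a generator is exhausted, which is why short generators (like `foo(1)` in the benchmark below) benefit the most.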

This PR improves the performance of this microbenchmark, which is
close to the ideal use case, by about 2.6x (now 5.7x faster than
interpreted):
```
from typing import Iterator

def foo(x: int) -> Iterator[int]:
    for a in range(x):
        yield a

def bench(n: int) -> None:
    for i in range(n):
        for a in foo(1):
            pass

from time import time
bench(1000 * 1000)
t0 = time()
bench(50 * 1000 * 1000)
print(time() - t0)
```