Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Detection of Scripts that Require Top-Level Await #715

Closed
wants to merge 12 commits into from

Conversation

JeffersGlass
Copy link
Member

@JeffersGlass JeffersGlass commented Aug 23, 2022

Issue

As noted in #714, testing whether a script needs to run with top-level-await enabled by looking for 'asyncio' in the source code misses some cases where top-level-await is necessary.

Solution

This PR changes the detection method. Instead of looking for the string 'asyncio' in the source, we use ast.Parse() to parse the source file, then walk the syntax tree to determine if any await, await for, or await with statements are used outside of async def functions. (This is precisely the situation that requires code to be compiled with the PyCF_ALLOW_TOP_LEVEL_AWAIT flag, which is what runPythonAsync() achieves.)

Testing

I've created a separate repository for the tests for the included code (TopLevelAsyncFinder and the is_async function). These tests demonstrate that is_async correctly differentiates between the three top-level-await statements inside and outside of async def statements. I'd be happy to include that code in this codebase somewhere, but I wasn't sure how it would fit into the testing regimen.

Timing

Of course, parsing and walking the AST is slowing than just looking for a string in the source, so I did some benchmarking. The gist is: it's slower, yes, but compared to the full pyodide/pyscript loading time, it's negligible. Running on the examples in this repo, the longest time to parse and walk the AST is roughly 6.5 milliseconds for the WebGL example, compared to 2.5 seconds from page-load to first-PyScript-execution in the same circumstances. (Testing for 'asyncio' in the source text takes <10 microseconds.)

I've compiled the test data for all the examples, for the curious.

Efficiency

The AST walking isn't as efficient as it could be - we always walk the entire tree instead of returning as soon as a top-level-await statement is found. This is a possible improvement for down the road.

@JeffersGlass
Copy link
Member Author

Given that this intersects with both #659 and #698, I won't be at all offended if this takes a long time to merge, or its work gets subsumed into some other initiative.

@fpliger
Copy link
Contributor

fpliger commented Aug 24, 2022

Good point @JeffersGlass :) @madhur-tandon , what do you think of merging this soon and you solve conflicts in your PR?

Before we merge though, I wanted to check with @hoodmane @rth since I recall a discussion we had a few weeks ago where I think one of you guys hinted at "you could just always use runPythonAsync (that has been on my mental backlog for some time now :) ).

Independently of that, thank you @JeffersGlass this is great work

@hoodmane
Copy link
Contributor

I think one of you guys hinted at "you could just always use runPythonAsync"

Yes. If you are ever checking something and picking either runPython and runPythonAsync it's better to always use runPythonAsync. The only disadvantage of runPythonAsync is that since it's async it can't be used in a synchronous way.

@madhur-tandon
Copy link
Member

Hi @JeffersGlass Thank you for the PR. Based on the standardising effort for different runtimes (and based on @hoodmane's comment above), we have decided to use runPythonAsync for the Pyodide runtime. As such, the need for determination of asyncio is removed since runPythonAsync is used everywhere underneath now.

But the determination of asyncio in this PR is an optimisation indeed, albeit not being used right now (maybe in future?) and I think we can still merge it, after adding some tests of course (and removing conflicts due to Runtime Refactor being merged yesterday).

@JeffersGlass
Copy link
Member Author

Thanks @madhur-tandon and all - makes sense to me, seems always using runPythonAsync provides what this PR was trying to achieve in terms of always correctly executing scripts with top level await, which is swell. I've been trying to put together a blog post on concurrent programming in PyScript (as it currently exists), and, selfishly, not having to explain why # asyncio makes your code work will be wonderful. 👍

I'll resolve the conflicts with the new runtime refactoring and add tests to this PR.

@hoodmane
Copy link
Contributor

Even just checking for await<space> would be a lot better than the current situation...

@JeffersGlass
Copy link
Member Author

JeffersGlass commented Aug 25, 2022

@hoodmane Definitely, though it would miss ugly cases like await(foo()).

@hoodmane
Copy link
Contributor

await(foo())

I guess that's valid syntax though formatters will convert it to await (foo()). I once worked briefly with someone who wrote all of their return statements like return(x) which really triggered me, but I was unable to convince them to stop.

@antocuni
Copy link
Contributor

FWIW, in case it's needed in the future, another alternative which is better than the current logic but more lightweight than full AST parsing is to use the tokenize module, e.g. something like this:

import io
import tokenize

def is_async(code):
    f = io.BytesIO(code)
    for token in tokenize.tokenize(f.readline):
        if token.type == tokenize.NAME and token.string in ('async', 'await'):
            return True
    return False

assert is_async(b"""
await foo()
""")

assert not is_async(b"""
print("I am not async and I don't use asyncio")
""")

@JeffersGlass
Copy link
Member Author

@antocuni That's true, though that will look for any instances of using async/await, whereas the goal of this PR was to look for situations that require top-level await.

For example, the following code uses async/await, but doesn't require compiling with top level await enabled.

import asyncio
from js import console

async def clock():
  count = 0
  while(True):
    console.log(count := count + 1)
    await asyncio.sleep(1)

asyncio.create_task(clock())

However, the following does require top-level await, and would have to be run with pyodide.runPythonAsync or similar:

import asyncio
from js import console

count = 0
while(True):
  console.log(count := count + 1)
  await asyncio.sleep(1)

@marimeireles marimeireles added the waiting on feedback Issue or PR waiting on feedback from core team label Sep 12, 2022
@JeffersGlass
Copy link
Member Author

For what it's worth, this PR is not necessarily dead, but I've been deprioritizing it since it didn't have an immediate impact on the codebase for the 2022.09.1 release.

I do intend to continue (and hopefully merge) it though, since we're still having ongoing conversations about lifecycle, sync/async, and the webloop, and I think this functionality may have some use in there.

@tedpatrick
Copy link
Contributor

If we were explicit about denoting a <py-script> element instance as sync or async, would we need this detection?

<py-script mode="sync"></py-script>
<py-script mode="async"></py-script>

@JeffersGlass
Copy link
Member Author

@tedpatrick That would depend exactly what the 'sync' and 'async' modes do, which I don't think we're clear on yet, or at least I'm not. Let's get together a proposal around that, and we can revive/merge/kill this PR if the code becomes useful in that endeavor.

@JeffersGlass
Copy link
Member Author

The code in this PR has been folded into #928 - with the deprecation of Implicit Coroutine Scheduling/top-level-await, the code developed here is being used to show a useful error message if the user still tries to use Top-Level Await abilities.

Closing in favor of that PR.

@JeffersGlass JeffersGlass added status: superseded PR that has been superseded by another PR and removed waiting on feedback Issue or PR waiting on feedback from core team labels Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: superseded PR that has been superseded by another PR
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants