Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying eval_code and avoid injecting dummy variable. #1041

Merged
merged 6 commits into from
Jan 7, 2021

Conversation

casatir
Copy link
Contributor

@casatir casatir commented Jan 4, 2021

#876 brings interesting features to eval_code. However, I am not an AST expert but the resulting code looks overly complicated to me:

  • is the introduction of ̀<EXEC-LAST-EXPRESSION> to assign last expression considered good practice?
  • is it really safe?
  • the redundancy between eval_code and _eval_code_async doesn't follow the DRY principle
  • the previous mod.body splitting was nice and clear and can be kept
  • _adjust_ast_to_store_result and _adjust_ast_to_store_result_helper can be avoided (no need to test them!)

This PR aims at recovering the clearness of eval_code while supporting the semicolon feature and factorizing eval_code and _eval_code_async.

A review by @hoodmane will be nice as he probably wrote it like this for some reason that eludes me.

Lastly, I should mention that this splitting method is what is used in IPython apart from the fact that last expression is evaluated in singlemode (the result is then caught by sys.displayhook).

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Thanks @casatir. My code was first-thing-that-works code that I have repurposed several times and so it has vestigial crap from supporting several different sets of features that are absent from this code and also things in it that were never necessary. Most of what you've done I think is an improvement, however the definition of eval_code_async you have does not work. Correctly implementing eval_code_async took me a lot of time, I first had a whole series of subtly and less-subtly broken variants.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

this splitting method is what is used in IPython

I think several different methods are used in IPython:

  1. splitting
  2. wrapper function with co_newlocals set False (makes local variables in the function body write directly into enclosing frame)
  3. storage into temporary result

IPython's support for async is rather inelegant because their original approach and feature set doesn't really fit that well with the weird set of constraints that async execution imposes.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Your version of eval_code_async is broken because you cannot use exec with async because exec returns void. If you exec code compiled with PyCF_ALLOW_TOP_LEVEL_AWAIT and an await occurs in the code, it generates a coroutine and then drops it, which means nothing actually happens and the code generates Runtime Warning: coroutine never awaited.

You could use the splitting method like:

if len(mod.body):
    coroutine_or_garbage = eval(compile(mod, "<exec>", "exec", flags=compile_flags), ns, ns)
    if iscoroutine(coroutine_or_garbage):
       await coroutine_or_garbage
if last_expr is not None:
    result = eval(compile(last_expr, "<exec>", "eval", flags=compile_flags), ns, ns)
    if iscoroutine(result):
       result = await result
    return result

but this code clearly cannot be shared between eval_code and eval_code_async.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

So I think the only design constraint from the original code that you are missing is that in the async version you must only use eval and every time you eval you must store the result and await it if it is a coroutine. With that in mind I decided it was better to store into a temporary global. Using the temporary global approach, you can in fact unify eval_async and eval. I was thinking to eliminate eval and transition over to using eval_async once we have a webloop, but I guess there's some value to having the non async version maybe?

is <EXEC-LAST-EXPRESSION> really safe?

Only way you could run into trouble is if someone is trying to use globals()["<EXEC-LAST-EXPRESSION>"] for something else?

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Lastly, we should talk about what the eventual API this code should support will look like when Pyodide is run on a WebWorker. The API I have in mind looks like:

Executor.run_python(code : str) -> Execution

execution.set_stdout_callback(callback : (str) -> void ) -> void
execution.set_stderr_callback(callback : (str) -> void ) -> void
async executor.syntax_check() -> Option<SyntaxError>
async executor.result() -> Result<any, any>

In the lifecycle of an execution there are four relevant types of events: stdout writes, stderr writes, whether the syntax check passes, and when the code has run to completion. stdout writes and stderr writes each happen 0 or more times, the syntax check and getting the result (either a value or an error). This API makes it quite easy for the UI to do special handling for syntax errors which is usually desirable.

@casatir
Copy link
Contributor Author

casatir commented Jan 5, 2021

Thanks @hoodmane for the review.

however the definition of eval_code_async you have does not work.

That's what I thought, I wrote it blindly: how do you test it? By the way, is your version really working?

if iscoroutine(res):
    await res

shouldn't it be:

if iscoroutine(res):
    return await res

In one hand, it is important to prepare async support (thanks for your work in that way) but in the other hand, there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet. WDYT?

  1. storage into temporary result

Can't see this in IPython's code, at least not the way you do it. Maybe you talk about this? The temporary variable is in a separate namespace.

So I think the only design constraint from the original code that you are missing is that in the async version you must only use eval and every time you eval you must store the result and await it if it is a coroutine. With that in mind I decided it was better to store into a temporary global. Using the temporary global approach, you can in fact unify eval_async and eval.

Your right, I miss this point. Wouldn't it be cleaner to wrap it in an async function? ala IPython?

is really safe?

Only way you could run into trouble is if someone is trying to use globals()[""] for something else?

Imagine a custom namespace (dict) that counts how many named objects are created during execution. It gets +1 with your approach and the user can't explain it.

Your WebWorker API sounds good but does it has an impact on the way eval_code should be designed? Only eval_code_async though.

@casatir
Copy link
Contributor Author

casatir commented Jan 5, 2021

As far as I understand, the common part in eval_code and eval_code_async is mainly the splitting. What about something like this?

def eval_code(...):
    runner = CodeRunner(...)
    return runner.run()

async def eval_code_async(...):
    runner = CodeRunner(...)
    return await runner.run_async()

with the class CodeRunner managing the splitting and other common stuff.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

By the way, is your version really working?

Yup, both in a native repl and in pyodide. The pyodide version you may try out here:
https://spectralsequences.com/repl/

from js_wrappers.async_js import wrap_promise
resp = await wrap_promise((await wrap_promise(fetch("api/charts/does-this-exist"))).json())
resp.code # ==> 'get-chart::failed::not-found'

Since the version I dumped in here is not tested it is best assumed to be broken by the "if it isn't tested it is broken" principle, but it is very similar to well-tested stuff in my private code so I have confidence.

shouldn't it be:

return await res

For some annoying reason return await is a syntax error. I think this is planned to fix in v3.10? But no, the return value for mine is always None. In fact there is no way to return anything the normal way without the wrapper function.

Your right, I miss this point. Wouldn't it be cleaner to wrap it in an async function? ala IPython?

I found that their wrapper function approach causes bugs with self assignment. I think this is what happened:

>>> x = 5
>>> x = x + 1 # ==> Unbound Local Error: Local variable x referred to before assignment

Whereas

>>> x = 5
... x = x + 1

works fine. This is weird, I assume that their version of the code must work, but whenever I try it I get subtle bugs. It's worth pointing out that in 3.8 there is an API like new_code_object = CodeObject.replace(co_obj, co_new_locals = False) so if you go this way you don't have to repeat all of the fields of the code object like they do in IPython.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet.

There is absolutely value in this. In this case, in order to support async/await there is at least one significant PR and several minor fixes that need to be made to core, another PR to add a web loop, and eval_code_async needs to work. Once we have all the pieces, we will presumably need an extra PR to work out kinks.
If we need to make changes ABCDE in order to achieve some new functionality, but by itself each of those changes introduces unnecessary complexity to the code and adds no user-facing features, then your logic would prevent ever implementing the new functionality.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Your WebWorker API sounds good but does it has an impact on the way eval_code should be designed? Only eval_code_async though.

You either need to parse the code twice or you need to add some extra logic just after mod = ast.parse(code) to send a message to the Executor to tell it that the syntax check passed. Of course you can factor out everything after mod = ast.parse(code) into a separate method. This is how I got to having _adjust_ast_to_store_result factored out in the first place.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

how do you test it?

Use a dummy webloop (stolen from someone in the #245 discussion):

class WebLoop(AbstractEventLoop):
    def call_soon(self, coro, arg=None):
        try:
            x = coro.send(arg)
            x = x.then(partial(self.call_soon, coro))
            x.catch(partial(self.fail,coro))
        except StopIteration as result:
            pass

class WrappedPromise:
    def __init__(self, promise):
        self.promise = promise
    def __await__(self):
        x = yield self.promise
        return x


def wrap_promise(promise):
    return WrappedPromise(promise)

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

If you like I can add some tests for eval_code_async along these lines.

I've been more focused on fixing core since it is full of weird behaviors and disasters waiting to happen (see for example #1033).

@casatir
Copy link
Contributor Author

casatir commented Jan 5, 2021

it is best assumed to be broken by the "if it isn't tested it is broken" principle

I don't know where does this comes from but it sounds difficult with multiple contributors. Everyone should look at tests to know what is broken? I would prefer the "if it's broken it should be mentioned (and maybe raise an error)" principle.

For some annoying reason return await is a syntax error.

Are you sure about this?

import asyncio

async def foo() -> str:
    return 'bar'

async def bar() -> str:
    return await foo()

asyncio.run(bar())

gives me 'bar' without any syntax error on Python 3.8.1.

there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet.

There is absolutely value in this. In this case, in order to support async/await there is at least one significant PR and several minor fixes that need to be made to core, another PR to add a web loop, and eval_code_async needs to work. Once we have all the pieces, we will presumably need an extra PR to work out kinks.

Your answer does not explain why it needs to put tricky code in eval_code. I maintain.

If we need to make changes ABCDE in order to achieve some new functionality, but by itself each of those changes introduces unnecessary complexity to the code and adds no user-facing features, then your logic would prevent ever implementing the new functionality.

This is not what I am saying. I think that we should minimize the "pollution" of existing code with features that require several PR. This does not ease contributions. Atomicity may be the goal. If this can't be achieved, it should be developed in a separate branch before being merged into master (as async support), but that's a personal point of view.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Are you sure about this?

I was sure about it, but now I am sure that it is wrong. =) It seems that return yield x is a syntax error, and await and yield are practically identical constructs so I got confused.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

it is best assumed to be broken by the "if it isn't tested it is broken" principle

I don't know where does this comes from but it sounds difficult with multiple contributors. Everyone should look at tests to know what is broken? I would prefer the "if it's broken it should be mentioned."

Sorry, I think you misunderstood what I was saying. I am absolutely against adding broken or untested code without a warning. We want our code not to broken and "if it isn't tested we assume it is broken" so that means we must test. (Though it's important to remember that testing code does not prevent it from being broken: the converse that "if it is tested it works" is completely wrong.)

I think that we should minimize the "pollution" of existing code with features that require several PR.

The options are duplication or pollution.

If this can't be achieved, it should be developed in a separate branch before being merged into master (as async support), but that's a personal point of view.

Maintaining a separate branch is a lot of work, I've been spending a lot of time maintaining pending PRs as it is. I think as long as this project is maintained by unpaid volunteers, there is a limit to what you can expect.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

I think the main problem here is that my previous PR was highly unpolished and it was merged before I addressed these significant and reasonable complaints you have about it.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

Your answer does not explain why it needs to put tricky code in eval_code. I maintain.

I think this is sort of fair, this eval_code code is rather simple compared to the webloop and modifications to JsProxy. It does feel to me like you are rather selectively policing this one file though.

@casatir
Copy link
Contributor Author

casatir commented Jan 5, 2021

I am absolutely against adding broken or untested code without a warning.

That's what you did. Maybe I missed the warning.

The options are duplication or pollution.

You're right, branching is a kind of duplication. I prefer duplication.

I've been spending a lot of time maintaining pending PRs as it is

Understanding overly complicated code also takes time.

I think the main problem here is that my previous PR was highly unpolished and it was merged before I addressed these significant and reasonable complaints you have about it.

But you seemed in a hurry: #876 (comment)

I think this is sort of fair, this eval_code code is rather simple compared to the webloop and modifications to JsProxy.

You mean since Pyodide has tricky parts, we should not polish elsewhere?

It does feel to me like you are rather selectively policing this one file though.

Replace "policing" by "polishing" and you exactly get the purpose of this PR, yes.

I have a lot of respect for your implication in Pyodide and all I want is to make it better, at my own pace.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

That's what you did. Maybe I missed the warning.

No that's correct I added untested code without a warning, and I haven't been meaning to imply that I didn't do that. It was a mistake, I'm sorry. Humans are fallible.

Look, I'm good with your improvements, but I am pretty sure we are going to have a webloop in the next release and I see no reason this file shouldn't be set up with an eval_code_async function now. If you update this to remove eval_code_async entirely, I'm fine with merging this and then I will make a second PR adding a tested version of eval_code_async.

return ns.pop(target_name)
# we extract last expression
last_expr = None
if isinstance(mod.body[-1], (ast.Expr, ast.Await)) and not quiet_code(code):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should combine
isinstance(mod.body[-1], (ast.Expr, ast.Await)) into quiet_code, probably rename quiet_code to should_code_return_value or something like that. Then we can keep my tests but apply them to should_code_return_value.

@rth
Copy link
Member

rth commented Jan 5, 2021

I agree that it would be good to simplify that code, and add more tests. But asyncio features are indeed very likely be included once #880 , #958, and possibly additional PRs are accepted. We can probably make a META PR with everything to make sure the combination is working as expected, but in the end it will be better to merge them separately as they are now, for clearer history and attribution. So how do we move forward from here?

If you update this to remove eval_code_async entirely, I'm fine with merging this and then I will make a second PR adding a tested version of eval_code_async.

Could we avoid that code churn and just add more tests in a separate PR? And improve eval_code in this one?

@casatir don't hesitate to comment on/review PRs in general if you have any concerns.

@hoodmane
Copy link
Member

hoodmane commented Jan 5, 2021

how do we move forward from here?

I could open another PR on top of this branch with my opinion if you like.

@casatir
Copy link
Contributor Author

casatir commented Jan 5, 2021

Here is a proposition to save _eval_code_async with warning and error but that should not be broken (provided that one remove exceptions).

@casatir don't hesitate to comment on/review PRs in general if you have any concerns.

Thanks, time is lacking.

@hoodmane
Copy link
Member

hoodmane commented Jan 6, 2021

@rth I'm fine to merge this, I can delete the spurious NotImplementedErrors and add tests for eval_code_async in a subsequent PR.

@@ -43,41 +43,137 @@ def open_url(url: str) -> StringIO:
return StringIO(req.response)


def quiet_code(code: str) -> bool:
class CodeRunner(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python 3,

Suggested change
class CodeRunner(object):
class CodeRunner:

is enough.

Parameters
----------
code
the Python code to run.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the Python code to run.
the Python code to run.

code
the Python code to run.
ns
`locals()` or `globals()` context where to execute code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`locals()` or `globals()` context where to execute code.
`locals()` or `globals()` context where to execute code.

cf https://numpydoc.readthedocs.io/en/latest/format.html

src/pyodide-py/pyodide/_base.py Outdated Show resolved Hide resolved
@hoodmane
Copy link
Member

hoodmane commented Jan 6, 2021

I really don't like the raise NotImplementedErrors, they are a nuisance and also incorrect because they sit above a complete implementation (which might have a mistake). I think they need to be replaced with warnings e.g.,

import warnings
warnings.warn("asyncio is an experimental and untested feature, things may go wrong!")

@rth
Copy link
Member

rth commented Jan 6, 2021

really don't like the raise NotImplementedError

It doesn't matter much either way, I think. We will fix that once the other pieces of the asyncio support are integrated (hopefully soon).

@hoodmane
Copy link
Member

hoodmane commented Jan 6, 2021

Right, well eval_async can and should be tested independently of the EventLoop and other async support components.

Speaking of which, runPythonAsync is kind of occupying the prime space for an entrypoint to eval_async...

"""

def __init__(self, code: str, ns: Dict[str, Any]):
def __init__(self, ns: Dict[str, Any] = {}):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should better make it None by default, then set to {} if it's None, to avoid the issue of https://stackoverflow.com/a/5592432/179127

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! My mistake! Never put a mutable as default argument!

"""
Constructor.

Parameters
----------
code
the Python code to run.
ns
`locals()` or `globals()` context where to execute code.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring should be merged with class docstring. __init__ doesn't need one.

Also we should add a note that after the first call to run, the ns will be modified inplace (as far as I can tell) which will impact subsequent code evaluation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Is Parameters allowed in class docstring?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def quiet(self) -> bool:
def quiet(self, code: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably even be a @staticmethod ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, the quiet behavior can be impacted by options from the constructor (see #1056).



def eval_code(code: str, ns: Dict[str, Any]) -> None:
def __init__(self, ns: Dict[str, Any] = None):
Copy link
Member

@rth rth Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotations need an Optional here, but this can be addressed in a follow up PR. mypy in CI says that everything is OK which is very suspicious.

@rth rth merged commit 7ac95d0 into pyodide:master Jan 7, 2021
@casatir casatir deleted the simplifying-eval_code branch January 7, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants