Simplifying eval_code and avoid injecting dummy variable. #1041

casatir · 2021-01-04T23:32:25Z

#876 brings interesting features to eval_code. However, I am not an AST expert but the resulting code looks overly complicated to me:

is the introduction of ̀<EXEC-LAST-EXPRESSION> to assign last expression considered good practice?
is it really safe?
the redundancy between eval_code and _eval_code_async doesn't follow the DRY principle
the previous mod.body splitting was nice and clear and can be kept
_adjust_ast_to_store_result and _adjust_ast_to_store_result_helper can be avoided (no need to test them!)

This PR aims at recovering the clearness of eval_code while supporting the semicolon feature and factorizing eval_code and _eval_code_async.

A review by @hoodmane will be nice as he probably wrote it like this for some reason that eludes me.

Lastly, I should mention that this splitting method is what is used in IPython apart from the fact that last expression is evaluated in singlemode (the result is then caught by sys.displayhook).

hoodmane · 2021-01-05T02:38:59Z

Thanks @casatir. My code was first-thing-that-works code that I have repurposed several times and so it has vestigial crap from supporting several different sets of features that are absent from this code and also things in it that were never necessary. Most of what you've done I think is an improvement, however the definition of eval_code_async you have does not work. Correctly implementing eval_code_async took me a lot of time, I first had a whole series of subtly and less-subtly broken variants.

hoodmane · 2021-01-05T02:39:40Z

this splitting method is what is used in IPython

I think several different methods are used in IPython:

splitting
wrapper function with co_newlocals set False (makes local variables in the function body write directly into enclosing frame)
storage into temporary result

IPython's support for async is rather inelegant because their original approach and feature set doesn't really fit that well with the weird set of constraints that async execution imposes.

hoodmane · 2021-01-05T02:44:03Z

Your version of eval_code_async is broken because you cannot use exec with async because exec returns void. If you exec code compiled with PyCF_ALLOW_TOP_LEVEL_AWAIT and an await occurs in the code, it generates a coroutine and then drops it, which means nothing actually happens and the code generates Runtime Warning: coroutine never awaited.

You could use the splitting method like:

if len(mod.body):
    coroutine_or_garbage = eval(compile(mod, "<exec>", "exec", flags=compile_flags), ns, ns)
    if iscoroutine(coroutine_or_garbage):
       await coroutine_or_garbage
if last_expr is not None:
    result = eval(compile(last_expr, "<exec>", "eval", flags=compile_flags), ns, ns)
    if iscoroutine(result):
       result = await result
    return result

but this code clearly cannot be shared between eval_code and eval_code_async.

hoodmane · 2021-01-05T02:47:24Z

So I think the only design constraint from the original code that you are missing is that in the async version you must only use eval and every time you eval you must store the result and await it if it is a coroutine. With that in mind I decided it was better to store into a temporary global. Using the temporary global approach, you can in fact unify eval_async and eval. I was thinking to eliminate eval and transition over to using eval_async once we have a webloop, but I guess there's some value to having the non async version maybe?

is <EXEC-LAST-EXPRESSION> really safe?

Only way you could run into trouble is if someone is trying to use globals()["<EXEC-LAST-EXPRESSION>"] for something else?

hoodmane · 2021-01-05T03:00:10Z

Lastly, we should talk about what the eventual API this code should support will look like when Pyodide is run on a WebWorker. The API I have in mind looks like:

Executor.run_python(code : str) -> Execution

execution.set_stdout_callback(callback : (str) -> void ) -> void
execution.set_stderr_callback(callback : (str) -> void ) -> void
async executor.syntax_check() -> Option<SyntaxError>
async executor.result() -> Result<any, any>

In the lifecycle of an execution there are four relevant types of events: stdout writes, stderr writes, whether the syntax check passes, and when the code has run to completion. stdout writes and stderr writes each happen 0 or more times, the syntax check and getting the result (either a value or an error). This API makes it quite easy for the UI to do special handling for syntax errors which is usually desirable.

casatir · 2021-01-05T08:00:07Z

Thanks @hoodmane for the review.

however the definition of eval_code_async you have does not work.

That's what I thought, I wrote it blindly: how do you test it? By the way, is your version really working?

if iscoroutine(res):
    await res

shouldn't it be:

if iscoroutine(res):
    return await res

In one hand, it is important to prepare async support (thanks for your work in that way) but in the other hand, there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet. WDYT?

storage into temporary result

Can't see this in IPython's code, at least not the way you do it. Maybe you talk about this? The temporary variable is in a separate namespace.

So I think the only design constraint from the original code that you are missing is that in the async version you must only use eval and every time you eval you must store the result and await it if it is a coroutine. With that in mind I decided it was better to store into a temporary global. Using the temporary global approach, you can in fact unify eval_async and eval.

Your right, I miss this point. Wouldn't it be cleaner to wrap it in an async function? ala IPython?

is really safe?

Only way you could run into trouble is if someone is trying to use globals()[""] for something else?

Imagine a custom namespace (dict) that counts how many named objects are created during execution. It gets +1 with your approach and the user can't explain it.

Your WebWorker API sounds good but does it has an impact on the way eval_code should be designed? Only eval_code_async though.

casatir · 2021-01-05T09:06:50Z

As far as I understand, the common part in eval_code and eval_code_async is mainly the splitting. What about something like this?

def eval_code(...):
    runner = CodeRunner(...)
    return runner.run()

async def eval_code_async(...):
    runner = CodeRunner(...)
    return await runner.run_async()

with the class CodeRunner managing the splitting and other common stuff.

hoodmane · 2021-01-05T10:56:04Z

By the way, is your version really working?

Yup, both in a native repl and in pyodide. The pyodide version you may try out here:
https://spectralsequences.com/repl/

from js_wrappers.async_js import wrap_promise
resp = await wrap_promise((await wrap_promise(fetch("api/charts/does-this-exist"))).json())
resp.code # ==> 'get-chart::failed::not-found'

Since the version I dumped in here is not tested it is best assumed to be broken by the "if it isn't tested it is broken" principle, but it is very similar to well-tested stuff in my private code so I have confidence.

shouldn't it be:

return await res

For some annoying reason return await is a syntax error. I think this is planned to fix in v3.10? But no, the return value for mine is always None. In fact there is no way to return anything the normal way without the wrapper function.

Your right, I miss this point. Wouldn't it be cleaner to wrap it in an async function? ala IPython?

I found that their wrapper function approach causes bugs with self assignment. I think this is what happened:

>>> x = 5
>>> x = x + 1 # ==> Unbound Local Error: Local variable x referred to before assignment

Whereas

>>> x = 5
... x = x + 1

works fine. This is weird, I assume that their version of the code must work, but whenever I try it I get subtle bugs. It's worth pointing out that in 3.8 there is an API like new_code_object = CodeObject.replace(co_obj, co_new_locals = False) so if you go this way you don't have to repeat all of the fields of the code object like they do in IPython.

hoodmane · 2021-01-05T11:02:00Z

there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet.

There is absolutely value in this. In this case, in order to support async/await there is at least one significant PR and several minor fixes that need to be made to core, another PR to add a web loop, and eval_code_async needs to work. Once we have all the pieces, we will presumably need an extra PR to work out kinks.
If we need to make changes ABCDE in order to achieve some new functionality, but by itself each of those changes introduces unnecessary complexity to the code and adds no user-facing features, then your logic would prevent ever implementing the new functionality.

hoodmane · 2021-01-05T11:10:17Z

Your WebWorker API sounds good but does it has an impact on the way eval_code should be designed? Only eval_code_async though.

You either need to parse the code twice or you need to add some extra logic just after mod = ast.parse(code) to send a message to the Executor to tell it that the syntax check passed. Of course you can factor out everything after mod = ast.parse(code) into a separate method. This is how I got to having _adjust_ast_to_store_result factored out in the first place.

hoodmane · 2021-01-05T11:13:42Z

how do you test it?

Use a dummy webloop (stolen from someone in the #245 discussion):

class WebLoop(AbstractEventLoop):
    def call_soon(self, coro, arg=None):
        try:
            x = coro.send(arg)
            x = x.then(partial(self.call_soon, coro))
            x.catch(partial(self.fail,coro))
        except StopIteration as result:
            pass

class WrappedPromise:
    def __init__(self, promise):
        self.promise = promise
    def __await__(self):
        x = yield self.promise
        return x


def wrap_promise(promise):
    return WrappedPromise(promise)

hoodmane · 2021-01-05T11:15:45Z

If you like I can add some tests for eval_code_async along these lines.

I've been more focused on fixing core since it is full of weird behaviors and disasters waiting to happen (see for example #1033).

casatir · 2021-01-05T13:34:03Z

it is best assumed to be broken by the "if it isn't tested it is broken" principle

I don't know where does this comes from but it sounds difficult with multiple contributors. Everyone should look at tests to know what is broken? I would prefer the "if it's broken it should be mentioned (and maybe raise an error)" principle.

For some annoying reason return await is a syntax error.

Are you sure about this?

import asyncio

async def foo() -> str:
    return 'bar'

async def bar() -> str:
    return await foo()

asyncio.run(bar())

gives me 'bar' without any syntax error on Python 3.8.1.

there is no value in having tricky code in main places like eval_code only to implement eval_code_async that is not tested and not supported yet.

There is absolutely value in this. In this case, in order to support async/await there is at least one significant PR and several minor fixes that need to be made to core, another PR to add a web loop, and eval_code_async needs to work. Once we have all the pieces, we will presumably need an extra PR to work out kinks.

Your answer does not explain why it needs to put tricky code in eval_code. I maintain.

If we need to make changes ABCDE in order to achieve some new functionality, but by itself each of those changes introduces unnecessary complexity to the code and adds no user-facing features, then your logic would prevent ever implementing the new functionality.

This is not what I am saying. I think that we should minimize the "pollution" of existing code with features that require several PR. This does not ease contributions. Atomicity may be the goal. If this can't be achieved, it should be developed in a separate branch before being merged into master (as async support), but that's a personal point of view.

hoodmane · 2021-01-05T13:49:37Z

Are you sure about this?

I was sure about it, but now I am sure that it is wrong. =) It seems that return yield x is a syntax error, and await and yield are practically identical constructs so I got confused.

hoodmane · 2021-01-05T13:59:44Z

it is best assumed to be broken by the "if it isn't tested it is broken" principle

I don't know where does this comes from but it sounds difficult with multiple contributors. Everyone should look at tests to know what is broken? I would prefer the "if it's broken it should be mentioned."

Sorry, I think you misunderstood what I was saying. I am absolutely against adding broken or untested code without a warning. We want our code not to broken and "if it isn't tested we assume it is broken" so that means we must test. (Though it's important to remember that testing code does not prevent it from being broken: the converse that "if it is tested it works" is completely wrong.)

I think that we should minimize the "pollution" of existing code with features that require several PR.

The options are duplication or pollution.

If this can't be achieved, it should be developed in a separate branch before being merged into master (as async support), but that's a personal point of view.

Maintaining a separate branch is a lot of work, I've been spending a lot of time maintaining pending PRs as it is. I think as long as this project is maintained by unpaid volunteers, there is a limit to what you can expect.

hoodmane · 2021-01-05T14:00:02Z

I think the main problem here is that my previous PR was highly unpolished and it was merged before I addressed these significant and reasonable complaints you have about it.

hoodmane · 2021-01-05T14:06:04Z

Your answer does not explain why it needs to put tricky code in eval_code. I maintain.

I think this is sort of fair, this eval_code code is rather simple compared to the webloop and modifications to JsProxy. It does feel to me like you are rather selectively policing this one file though.

casatir · 2021-01-05T16:57:30Z

I am absolutely against adding broken or untested code without a warning.

That's what you did. Maybe I missed the warning.

The options are duplication or pollution.

You're right, branching is a kind of duplication. I prefer duplication.

I've been spending a lot of time maintaining pending PRs as it is

Understanding overly complicated code also takes time.

I think the main problem here is that my previous PR was highly unpolished and it was merged before I addressed these significant and reasonable complaints you have about it.

But you seemed in a hurry: #876 (comment)

I think this is sort of fair, this eval_code code is rather simple compared to the webloop and modifications to JsProxy.

You mean since Pyodide has tricky parts, we should not polish elsewhere?

It does feel to me like you are rather selectively policing this one file though.

Replace "policing" by "polishing" and you exactly get the purpose of this PR, yes.

I have a lot of respect for your implication in Pyodide and all I want is to make it better, at my own pace.

hoodmane · 2021-01-05T17:13:13Z

That's what you did. Maybe I missed the warning.

No that's correct I added untested code without a warning, and I haven't been meaning to imply that I didn't do that. It was a mistake, I'm sorry. Humans are fallible.

Look, I'm good with your improvements, but I am pretty sure we are going to have a webloop in the next release and I see no reason this file shouldn't be set up with an eval_code_async function now. If you update this to remove eval_code_async entirely, I'm fine with merging this and then I will make a second PR adding a tested version of eval_code_async.

hoodmane · 2021-01-05T17:21:32Z

src/pyodide-py/pyodide/_base.py

-    return ns.pop(target_name)
+    # we extract last expression
+    last_expr = None
+    if isinstance(mod.body[-1], (ast.Expr, ast.Await)) and not quiet_code(code):


You should combine
isinstance(mod.body[-1], (ast.Expr, ast.Await)) into quiet_code, probably rename quiet_code to should_code_return_value or something like that. Then we can keep my tests but apply them to should_code_return_value.

rth · 2021-01-05T19:05:34Z

I agree that it would be good to simplify that code, and add more tests. But asyncio features are indeed very likely be included once #880 , #958, and possibly additional PRs are accepted. We can probably make a META PR with everything to make sure the combination is working as expected, but in the end it will be better to merge them separately as they are now, for clearer history and attribution. So how do we move forward from here?

If you update this to remove eval_code_async entirely, I'm fine with merging this and then I will make a second PR adding a tested version of eval_code_async.

Could we avoid that code churn and just add more tests in a separate PR? And improve eval_code in this one?

@casatir don't hesitate to comment on/review PRs in general if you have any concerns.

hoodmane · 2021-01-05T19:26:38Z

how do we move forward from here?

I could open another PR on top of this branch with my opinion if you like.

casatir · 2021-01-05T23:46:48Z

Here is a proposition to save _eval_code_async with warning and error but that should not be broken (provided that one remove exceptions).

@casatir don't hesitate to comment on/review PRs in general if you have any concerns.

Thanks, time is lacking.

hoodmane · 2021-01-06T00:11:00Z

@rth I'm fine to merge this, I can delete the spurious NotImplementedErrors and add tests for eval_code_async in a subsequent PR.

rth · 2021-01-06T09:49:37Z

src/pyodide-py/pyodide/_base.py

@@ -43,41 +43,137 @@ def open_url(url: str) -> StringIO:
    return StringIO(req.response)


-def quiet_code(code: str) -> bool:
+class CodeRunner(object):


In Python 3,

Suggested change

class CodeRunner(object):

class CodeRunner:

is enough.

rth · 2021-01-06T09:50:41Z

src/pyodide-py/pyodide/_base.py

+        Parameters
+        ----------
+        code
+        the Python code to run.


Suggested change

the Python code to run.

the Python code to run.

rth · 2021-01-06T09:51:15Z

src/pyodide-py/pyodide/_base.py

+        code
+        the Python code to run.
+        ns
+        `locals()` or `globals()` context where to execute code.


Suggested change

`locals()` or `globals()` context where to execute code.

`locals()` or `globals()` context where to execute code.

cf https://numpydoc.readthedocs.io/en/latest/format.html

src/pyodide-py/pyodide/_base.py

hoodmane · 2021-01-06T12:58:55Z

I really don't like the raise NotImplementedErrors, they are a nuisance and also incorrect because they sit above a complete implementation (which might have a mistake). I think they need to be replaced with warnings e.g.,

import warnings
warnings.warn("asyncio is an experimental and untested feature, things may go wrong!")

rth · 2021-01-06T20:58:39Z

really don't like the raise NotImplementedError

It doesn't matter much either way, I think. We will fix that once the other pieces of the asyncio support are integrated (hopefully soon).

hoodmane · 2021-01-06T21:02:20Z

Right, well eval_async can and should be tested independently of the EventLoop and other async support components.

Speaking of which, runPythonAsync is kind of occupying the prime space for an entrypoint to eval_async...

rth · 2021-01-06T22:37:32Z

src/pyodide-py/pyodide/_base.py

    """

-    def __init__(self, code: str, ns: Dict[str, Any]):
+    def __init__(self, ns: Dict[str, Any] = {}):


We should better make it None by default, then set to {} if it's None, to avoid the issue of https://stackoverflow.com/a/5592432/179127

Of course! My mistake! Never put a mutable as default argument!

rth · 2021-01-06T22:39:46Z

src/pyodide-py/pyodide/_base.py

        """
        Constructor.

        Parameters
        ----------
-        code
-           the Python code to run.
        ns
           `locals()` or `globals()` context where to execute code.
        """


This docstring should be merged with class docstring. __init__ doesn't need one.

Also we should add a note that after the first call to run, the ns will be modified inplace (as far as I can tell) which will impact subsequent code evaluation.

Right. Is Parameters allowed in class docstring?

yes, https://numpydoc.readthedocs.io/en/latest/format.html#documenting-classes

rth · 2021-01-06T22:40:17Z

src/pyodide-py/pyodide/_base.py


-    def quiet(self) -> bool:
+    def quiet(self, code: str) -> bool:


Can probably even be a @staticmethod ?

In the future, the quiet behavior can be impacted by options from the constructor (see #1056).

rth · 2021-01-07T08:27:22Z

src/pyodide-py/pyodide/_base.py

-
-
-def eval_code(code: str, ns: Dict[str, Any]) -> None:
+    def __init__(self, ns: Dict[str, Any] = None):


Type annotations need an Optional here, but this can be addressed in a follow up PR. mypy in CI says that everything is OK which is very suspicious.

Simplifying eval_code and avoid injecting dummy variable.

f49c445

hoodmane reviewed Jan 5, 2021

View reviewed changes

A CodeRunner class to factorize eval_code and eval_code_async.

aaa9ed7

casatir force-pushed the simplifying-eval_code branch from 63da7a3 to aaa9ed7 Compare January 5, 2021 23:53

casatir mentioned this pull request Jan 6, 2021

Customizable eval_code behavior #1056

Merged

rth reviewed Jan 6, 2021

View reviewed changes

Typos from rth review.

a6cde9f

True CodeRunner that can run multiple codes following rth suggestion.

c792c30

rth reviewed Jan 6, 2021

View reviewed changes

casatir added 2 commits January 7, 2021 00:00

Corrections from rth review.

c7cf9d8

Typo.

9606ebb

rth reviewed Jan 7, 2021

View reviewed changes

rth merged commit 7ac95d0 into pyodide:master Jan 7, 2021

casatir deleted the simplifying-eval_code branch January 7, 2021 12:49

hoodmane mentioned this pull request Jan 13, 2021

Deprecate the runPythonAsync API? #1133

Closed

rth mentioned this pull request Apr 26, 2021

Convert eval_code helper methods to pure functions #1532

Merged

	`locals()` or `globals()` context where to execute code.
	`locals()` or `globals()` context where to execute code.



		def eval_code(code: str, ns: Dict[str, Any]) -> None:
		def __init__(self, ns: Dict[str, Any] = None):

Simplifying eval_code and avoid injecting dummy variable. #1041

Simplifying eval_code and avoid injecting dummy variable. #1041

Conversation

casatir commented Jan 4, 2021

hoodmane commented Jan 5, 2021

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021 • edited

casatir commented Jan 5, 2021

casatir commented Jan 5, 2021

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021

hoodmane commented Jan 5, 2021

hoodmane commented Jan 5, 2021

hoodmane commented Jan 5, 2021

casatir commented Jan 5, 2021

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021 • edited

hoodmane commented Jan 5, 2021

hoodmane commented Jan 5, 2021

casatir commented Jan 5, 2021

hoodmane commented Jan 5, 2021 • edited

Choose a reason for hiding this comment

rth commented Jan 5, 2021

hoodmane commented Jan 5, 2021

casatir commented Jan 5, 2021

hoodmane commented Jan 6, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hoodmane commented Jan 6, 2021 • edited

rth commented Jan 6, 2021

hoodmane commented Jan 6, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rth Jan 7, 2021 • edited

Choose a reason for hiding this comment

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 5, 2021 •

edited

hoodmane commented Jan 6, 2021 •

edited

rth Jan 7, 2021 •

edited