Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filename argument to runPython and use linecache for better tracebacks #3993

Merged
merged 5 commits into from Aug 28, 2023

Conversation

hoodmane
Copy link
Member

@hoodmane hoodmane commented Jul 12, 2023

As suggested by @antocuni.

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation

…backs

This adds a filename argument to runPython and runPythonAsync. Also, in
eval_code and eval_code_async if the filename is not wrapped in <lt-gt-signs>
it will put the source into the linecache so that the tracebacks can show the
lines in them.
src/py/_pyodide/_base.py Outdated Show resolved Hide resolved
Copy link
Contributor

@antocuni antocuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and it's a BIG improvement over the current situation 🎉.
I have two extra comments, though:

  1. it would be nice to have "good tracebacks by default" for all the people randomly using runPython() without knowing that they need to specify a filename. You could use the same approach used by the good old py.code.Source.compile(): invent a name and stick the lines into linecache. This is the relevant code:
    https://github.com/pytest-dev/py/blob/6b219734fcd8d2f6489c2f50585a435b34c029c2/py/_code/source.py#L160-L198

Example of usage (note the filename <0-codegen ...> -- maybe you could use something like <0-runPython ...> or similar):

import py
src = py.code.Source("""
def foo():
    0/0
""")
d = {}
exec(src.compile(), d)
d['foo']()
$ python /tmp/foo.py 
Traceback (most recent call last):
  File "/tmp/foo.py", line 8, in <module>
    d['foo']()
  File "<0-codegen /tmp/foo.py:7>", line 3, in foo
ZeroDivisionError: division by zero
  1. I know that this makes the logic harder so maybe it's worth a separate PR, but I think that for pyscript we want the ability of specifying the same filename with different line ranges: e.g.:
<script type="py"> ... </script>
...
<script type="py"> ... </script>

Ideally, we want to use runPython(..., {filename: "index.html", lineno: 10}) for the first, and runPython(..., {filename: "index.html", lineno: 20}) for the second, I think?

src/py/_pyodide/_base.py Outdated Show resolved Hide resolved
Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @hoodmane! It generally looks good to me.

I also generally agree with Antonio's review (thanks for the review by the way). I think this PR is okay as-is, then we can add more fine-grained control of tracebacks in a separate PRs.

@hoodmane
Copy link
Member Author

@antocuni Thanks for the suggestions. I agree we should address these in followups. I was trying to start with something really simple. =)

I think if we want to use something wrapped in <> for the temporary names, we'll need to patch out these two lines of linecache:
https://github.com/python/cpython/blob/main/Lib/linecache.py#L88-L89
Though I have nothing against making a patch like that.

@antocuni
Copy link
Contributor

I think if we want to use something wrapped in <> for the temporary names, we'll need to patch out these two lines of linecache:

uh? No, you can just set linecache.cache directly, and bypass updatecache() entirely:

import linecache

def set_linecache_lines(filename, src):
    lines = src.splitlines()
    linecache.cache[filename] = (1, None, lines, filename)

def magic_compile(filename, src):
    codeobj = compile(src, filename, 'exec')
    set_linecache_lines(filename, src)
    return codeobj

src = """
def foo():
    0/0
"""
codeobj = magic_compile('<0-runPython>', src)
d = {}
exec(codeobj, d)
d['foo']()

@hoodmane
Copy link
Member Author

You can but if you put a print statement you'll see that the function is never called.

@hoodmane
Copy link
Member Author

Unless I was messing something up, let me try your code...

@antocuni
Copy link
Contributor

You can but if you put a print statement you'll see that the function is never called.

which function?
Note that I'm not using the "lazy cache" functionality here, I'm just putting lines inside the cache, bypassing all the rest :)

@hoodmane
Copy link
Member Author

Ah that makes sense.

@hoodmane
Copy link
Member Author

Will update this PR to use that approach soon, thanks @antocuni.

@hoodmane
Copy link
Member Author

Two possible problems:

  1. I don't know that we want to store all strings that we "runPython" by default. Probably this should be opt in.
  2. If filenames are reused, this will cause incorrect tracebacks

A zany scheme to solve 2: by experiment, if a character is unprintable then it is left out from the filename in the traceback. Every character from 0x10 to 0x20 is unprintable. So we could use:
"<eval>0x11", "<eval>0x12", ... "<eval>0x1F", "<eval>0x110x10", etc.

Where the pattern is on the nth time we use a certain file name we encode n in hexadecimal and then append to the string one byte of value 0x1v for each hex digit of n. You'll probably hate me for suggesting this though @antocuni.

@hoodmane
Copy link
Member Author

So explicitly the code would be for the nth use of filename:

garbage =  bytes(16 + int(b, 16) for b in hex(num_uses_so_far)[2:]).decode()
filename_with_appended_garbage = filename + garbage

@ryanking13
Copy link
Member

ryanking13 commented Jul 17, 2023

I don't know that we want to store all strings that we "runPython" by default. Probably this should be opt in.

Makes sense to me.

If filenames are reused, this will cause incorrect tracebacks

IMO, this is not a big problem. I think people who use this option should know what they are doing and should not reuse the filename.

A zany scheme to solve 2: by experiment, if a character is unprintable then it is left out from the filename in the traceback. Every character from 0x10 to 0x20 is unprintable. So we could use:
"0x11", "0x12", ... "0x1F", "0x110x10", etc.

no... 😫

@hoodmane hoodmane merged commit 311fa90 into pyodide:main Aug 28, 2023
5 of 8 checks passed
@hoodmane hoodmane deleted the filename branch August 28, 2023 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants