
Don't automatically copy python objects into javascript #1167

Merged — 19 commits into pyodide:master from py2js-no-auto-copy on Feb 2, 2021

Conversation

@hoodmane (Member) commented Jan 22, 2021

This is #1152 again. This implements the main changes proposed in #900.

This PR adjusts python2js to wrap Python list and dict objects in PyProxy objects instead of copying them into JavaScript. This has the advantage that it allows JavaScript and Python to share a reference to the same Python list or dict. Automatic copying is lossy and inconvenient, it's asymmetric (js2python proxies a JavaScript Array or object into a JsProxy rather than copying it), and it causes difficulties in a wide variety of circumstances. For instance, it makes it impossible in general to get the repr of an object returned from runPython, it requires special-case code to handle pyodide.globals (which is a PyProxy of a dict), etc.

This adds new APIs, deepCopyToJavascript and shallowCopyToJavascript, as PyProxy methods; using these recovers the old behavior.
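To make the deep/shallow distinction concrete, here is a plain-JavaScript analogy (an illustration only, not the pyodide implementation): a shallow copy converts only the top level, so nested containers stay shared, while a deep copy converts recursively.

```javascript
// Plain-JS analogy for shallow vs. deep copy semantics
// (the real PyProxy methods convert Python objects):
const inner = [1, 2];
const outer = [inner];

const shallow = [...outer];                     // one level copied; inner is shared
const deep = JSON.parse(JSON.stringify(outer)); // fully copied; nothing shared

inner.push(3);
console.log(shallow[0].length); // 3 — the shallow copy sees the mutation
console.log(deep[0].length);    // 2 — the deep copy does not
```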

@phorward (Contributor) left a comment


Hi @hoodmane, thank you for setting up your PR again, and apologies for the seesaw.
The solution for the failing tests, as mentioned in #1152, is obviously simpler than I first thought.

I was thinking the right approach would be to merge this as-is and then preferably fix the ergonomics problems before the next release, but I could see adding more to this PR.

I'm okay with this approach anyway; next time, we'd better test twice against current master.

Oh also the docs on type conversions and the changelog should be updated.

It would be nice if you can do that.

Presumably implementing toJSON on PyProxy would make the tests pass more painlessly.

And this as well. When all changes are done and maybe @rth is also okay with this, we can merge again.

@hoodmane (Member, Author) commented Jan 22, 2021

Right, well if I add these other changes to this branch and it is decided that they should be merged separately I can always split them off into other PRs. I'm a little bit unsure how selenium handles serialization, the selenium docs are surprisingly vague on the topic. Presumably a little bit of experimentation will make it clear though.

@hoodmane (Member, Author)

Okay so I don't really understand how selenium serializes stuff. It definitely doesn't do JSON.stringify. The following assertion passes:

    assert selenium.run_js(
        """
        return {
            toJSON : () => {
                console.log("Running toJSON!");
                return [1,2,3];
            }
        };
        """
    ) == dict(toJSON={})

Whereas if it was doing JSON.stringify then the result would be [1,2,3].
As another test:

    selenium.run_js(
        """
        let a = {};
        a.a = a;
        return a;
        """
    )

This causes selenium.common.exceptions.WebDriverException: Message: unknown error: Maximum call stack size exceeded, whereas if it were using JSON.stringify it would say something like: Uncaught TypeError: cyclic object value.
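For reference, both JSON.stringify behaviors mentioned above are easy to confirm in plain JavaScript, which is what rules it out as selenium's serializer:

```javascript
// JSON.stringify does honor toJSON:
const obj = { toJSON: () => [1, 2, 3] };
console.log(JSON.stringify(obj)); // "[1,2,3]"

// and it throws a TypeError on a cyclic object rather than
// overflowing the call stack:
const a = {};
a.a = a;
let error;
try {
  JSON.stringify(a);
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true
```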

@dalcde You have any idea how this works? I guess maybe the right thing to do is to just avoid returning a pyproxy from selenium at all like I've been doing.

@phorward (Contributor) left a comment


Hello @hoodmane, thank you for the update, do you think we can give it another try?

@hoodmane (Member, Author)

Yeah @phorward I'd appreciate it if you could merge this. (I just merged master so this time there definitely won't be any surprise conflicts!)

@phorward phorward merged commit b3a965a into pyodide:master Feb 2, 2021
@hoodmane hoodmane deleted the py2js-no-auto-copy branch February 2, 2021 18:57
casatir added a commit to casatir/pyodide that referenced this pull request Feb 3, 2021
@casatir casatir mentioned this pull request Feb 3, 2021
@rth (Member) commented Feb 3, 2021

This adds new APIs, deepCopyToJavascript and shallowCopyToJavascript, as PyProxy methods; using these recovers the old behavior.

In terms of naming those are a bit verbose, particularly if they are to be used often. How about deepCopyToJS or deepCopyJS (or some other shorter name)?

@rth (Member) commented Feb 3, 2021

Also it would be good to document PyProxy and its public methods somewhere in https://pyodide.readthedocs.io/en/latest/usage/api-reference.html as currently these methods only appear in the changelog and nowhere else in the documentation.

@hoodmane (Member, Author) commented Feb 4, 2021

How about deepCopyToJS or deepCopyJS (or some other shorter name)?

I'm happy with any of these. If you have a preference I can open a PR. Maybe even copyToJS, copy2Js, deepCopy2Js. Is it better to keep the word deep in there? What about shallowCopyToJs and copyToJs? I guess deepCopyToJs is my favorite?

@hoodmane (Member, Author) commented Feb 4, 2021

document PyProxy and its public methods somewhere

I think JsProxy also needs this. type_conversions.md has a fair amount of relevant material about the proxies, but it might be good to repeat it in the API docs, since a narrative document isn't ideal as a reference.

I have a local branch with a bunch of docs updates for recent changes and the changes in my various open PRs, my thought was to batch all the docs updates into one PR.

@rth (Member) commented Feb 4, 2021

Let's continue the naming discussion in #1192

my thought was to batch all the docs updates into one PR.

Sounds good.

hoodmane pushed a commit to hoodmane/pyodide that referenced this pull request Feb 5, 2021
@hoodmane hoodmane mentioned this pull request Feb 5, 2021
@daoxian (Contributor) commented Feb 6, 2021

Hello @hoodmane, thank you for the magnificent APIs (deep/shallowCopyToJavascript), which eliminate the heavy cost of converting big arrays from Python to JavaScript. I think the shallowCopy version is designed to pass a memory reference rather than copy bulk content, but I find the performance is not as good as I imagined, as shown below:

[screenshot: timing measurements]

It takes about 1–7 seconds for shallowCopyToJavascript() to complete the memory reference plus, I guess, some necessary metadata copying. However, that's not adequate for realtime computation. Any suggestions for better conversion performance?

@hoodmane (Member, Author) commented Feb 6, 2021

Hi @daoxian. Thanks for reporting this bad behavior. In general, though, I think it's preferable to open this sort of question as an issue rather than on the PR.

The handling of memory buffers is currently not good; this is an issue we hope to fix at some point. I think the current behavior is to make a JavaScript Array of 1080 JavaScript Arrays of 1920 3-byte Uint8Arrays. The innermost layer is sliced out of the buffer memory, so saying arr[700][500][0] = 255 will adjust the original data, but populating those large JavaScript Arrays is very inefficient (remember that a JavaScript Array is a hashmap).
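The "sliced out of the buffer memory" part can be seen with a plain Uint8Array (a sketch of the view semantics, not the actual conversion code): subarray produces a view over the same bytes, so writes through it mutate the original buffer.

```javascript
// A subarray is a view into the same underlying memory, not a copy:
const buf = new Uint8Array([10, 20, 30, 40, 50, 60]);
const row = buf.subarray(3, 6); // view over bytes 3..5
row[0] = 255;                   // write through the view...
console.log(buf[3]); // 255 — ...and the original buffer changed
```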

My guess is it would do much better if you transposed the array to have shape (3, 1080, 1920), though of course this isn't very ergonomic if for instance you are intending to use this data to back an image.
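A rough count of the wrapper Arrays that the nested-Array conversion described above would allocate shows why the transpose helps (wrapperCount is a hypothetical helper that just counts objects, not the actual pyodide conversion code):

```javascript
// Count the wrapper Arrays a nested-Array conversion would allocate
// for a given shape: every axis except the last contributes
// prod(shape[0..i]) Arrays; the last axis becomes Uint8Array views
// sliced from the underlying buffer.
function wrapperCount(shape) {
  let count = 0;
  let prod = 1;
  for (let i = 0; i < shape.length - 1; i++) {
    prod *= shape[i];
    count += prod;
  }
  return count;
}

console.log(wrapperCount([1080, 1920, 3])); // 2074680 wrapper Arrays to populate
console.log(wrapperCount([3, 1080, 1920])); // 3243 — orders of magnitude fewer
```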

Would you look at #1168? That issue is closely related to the poor performance here. If you have an opinion about that discussion it would be helpful. One thing that limits my ability to fix the Buffer handling code is that I don't really know what the use cases look like, so I really appreciate any input you have. Of course if you want to take a stab at improving the buffer conversion code, let me know and I can tell you what I think you'll need to know.

@daoxian (Contributor) commented Feb 6, 2021

My guess is it would do much better if you transposed the array to have shape (3, 1080, 1920), though of course this isn't very ergonomic if for instance you are intending to use this data to back an image.

Indeed, it's much faster after transposing the image to (3, 1080, 1920).

@hoodmane (Member, Author) commented Feb 6, 2021

Could you copy your most recent post to #1202 and continue the discussion there?
