Invoke pyodide from python / sandboxing #869

Open
nomeata opened this issue Dec 16, 2020 · 16 comments

Comments

@nomeata

nomeata commented Dec 16, 2020

Given its great sandboxing properties, I wonder if Pyodide could be used as a safe sandboxed execution environment, e.g. for educational websites that want to execute untrusted code.

What would it take to be able to do something like
https://pyodide.readthedocs.io/en/latest/using_pyodide_from_javascript.html
but directly from Python (maybe using https://github.com/bytecodealliance/wasmtime-py to execute the wasm)?

@nomeata
Author

nomeata commented Dec 16, 2020

(I looked at https://cdn.jsdelivr.net/pyodide/v0.15.0/full/pyodide.asm.wasm to see how easy it is to wrap, but judging by its list of imports it seems to require a significant amount of support from the embedder, i.e. the JS code in https://cdn.jsdelivr.net/pyodide/v0.15.0/full/pyodide.asm.js. Are there alternatives to that?)
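One way to gauge how much an embedder such as wasmtime-py would have to provide is to list the imports the module declares. Below is a minimal stdlib-only sketch, assuming the standard WebAssembly binary format (4-byte magic, 4-byte version, then sections whose sizes are LEB128-encoded; the import section has id 2). `list_wasm_imports` and its helpers are hypothetical, not part of Pyodide or wasmtime-py; the sketch does not validate the module and only handles the simple one-byte value/reference types, not newer encodings from the GC proposal.

```python
from __future__ import annotations

# Import kinds from the WebAssembly binary format ("tag" comes from the
# exception-handling proposal).
IMPORT_KINDS = {0: "func", 1: "table", 2: "memory", 3: "global", 4: "tag"}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_name(data: bytes, pos: int) -> tuple[str, int]:
    """Read a length-prefixed UTF-8 name."""
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(data: bytes, pos: int) -> int:
    """Skip a limits record: a flags byte, a minimum and an optional maximum."""
    flags = data[pos]
    _, pos = read_uleb128(data, pos + 1)  # minimum
    if flags & 0x01:  # bit 0 set means a maximum follows
        _, pos = read_uleb128(data, pos)
    return pos


def list_wasm_imports(data: bytes) -> list[tuple[str, str, str]]:
    """Return (module, name, kind) for every import in a .wasm binary.

    Hypothetical helper (not a Pyodide or wasmtime-py API); no validation.
    """
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        if section_id != 2:  # 2 is the import section; skip everything else
            pos += size
            continue
        count, pos = read_uleb128(data, pos)
        imports = []
        for _ in range(count):
            module, pos = read_name(data, pos)
            name, pos = read_name(data, pos)
            kind = data[pos]
            pos += 1
            if kind == 0:  # function: type index
                _, pos = read_uleb128(data, pos)
            elif kind == 1:  # table: element reftype byte, then limits
                pos = skip_limits(data, pos + 1)
            elif kind == 2:  # memory: limits
                pos = skip_limits(data, pos)
            elif kind == 3:  # global: value-type byte, then mutability byte
                pos += 2
            elif kind == 4:  # tag: attribute byte, then type index
                _, pos = read_uleb128(data, pos + 1)
            else:
                raise ValueError(f"unknown import kind {kind}")
            imports.append((module, name, IMPORT_KINDS[kind]))
        return imports
    return []
```

Pointed at a locally downloaded copy, e.g. `list_wasm_imports(open("pyodide.asm.wasm", "rb").read())`, it returns one tuple per import, which makes it easy to count or group by module what the embedder has to supply.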

@hoodmane
Member

See also #558

@hoodmane
Member

@nomeata Wouldn't it be easier just to use a throwaway docker instance?

@nomeata
Author

nomeata commented Dec 31, 2020

Dunno, that’s a lot of complexity, compared to an in-process sandbox. Docker wasn’t created for security isolation; wasm was.

But maybe RustPython would be more suitable; another project already uses it for this: https://github.com/robot-rumble/logic/

@dalcde
Contributor

dalcde commented Dec 31, 2020

See also #959 and #950.

Since C-based Python modules rely on dynamic linking, it would be fairly hard to use Pyodide without a JavaScript runtime.

If you are willing to use, say, Node, #972 should get us a long way. It makes it fairly easy to avoid exposing all JavaScript functions to the wasm sandbox. You would still have to audit the rest of the code to ensure that one cannot somehow get hold of, say, eval by cleverly exploiting the existing imported functions (which, again, are necessary for Python to run properly).

@iamwilhelm

@dalcde I've recently wanted to run Pyodide outside the browser (server side), and was looking into running it in Wasmer. I want to be able to treat browser Pyodide as "local dev", and then deploy the same Python to be run server-side (ideally sandboxed).

Wasmer does have a sample repo compiling for an older version of pyodide (0.12), but it no longer runs/compiles, since pyodide references an older version of emsdk.

I assume #558 from a year ago is still true.

So following the thread here, it seems like you need javascript to run pyodide. How would it be run with node.js? I didn't understand what was written in #972.

@dalcde
Copy link
Contributor

dalcde commented Jan 11, 2021 via email

@hoodmane
Copy link
Member

I see pyodide as comprising of several pieces:

This is a nice explanation. Is it in the docs somewhere? Maybe it'd be good to expand on this a bit and put it early on in the docs, with a title like "What is Pyodide?". It'd probably be helpful to other people.

@iamwilhelm

What exactly do you want out of pyodide? I see pyodide as comprising of several pieces:

I'd like to be able to run pyodide browser-side to write a program. And then when I'm done, I can deploy the python program to a server, ideally using the same pyodide runtime server-side.

Based on what I understand (I just started looking into WASM), you should be able to compile Python to WASM and run it both in the browser and on the server. The only reasons I'm looking into WASM on the server are:

  1. a self contained environment of necessary packages
  2. purported sandboxing properties of WASM

I could set up a docker image running the same stack as pyodide, but I was hoping that I could ideally just compile pyodide for the server and have it just work, so I don't have separate moving pieces to keep in sync. Originally, I was thinking I could just use wasmer to run pyodide. But their example repo is outdated, doesn't work anymore, and the team is unresponsive.

Hence, I'm trying to find a way to run Pyodide on a server. I'm open to leaving the JS in, if it's required to run Pyodide on the server. It's just that I don't know if it's possible, and what might be involved in making it run on Node.js.

If it's a no-go, then I can either try to compile RustPython to WASM and run the same wasm file in the browser and in Wasmer (or some other runtime), and deal with an incomplete implementation, or just abandon Python as a language for this altogether.

Hopefully, that gives you an idea of what I'm trying to do. I'd appreciate any advice you have on achieving the above.

@samuelcolvin
Sponsor

What exactly do you want out of pyodide? I see pyodide as comprising of several pieces:

What I'm looking for (and I think many others are interested in) is the ability to run Python in Wasmer (or equivalent) as a sandboxed environment. It would have the advantages of fast startup, bulletproof sandboxing and no extra (virtual) infrastructure.

Common packages working too would be an extra plus.

Is there any appetite to split the cpython patches out into a separate package to make it easier to use?

@rth changed the title from "Invoke iodide from python?" to "Invoke pyodide from python?" Jan 11, 2021
@hoodmane
Member

hoodmane commented Jan 11, 2021

@rth said there were a couple of blocking issues for this in #558:

Several thing would be necessary for this, as far as I understand,

Yesterday @joemarshall opened PR #1102 to resolve #531, so there is some movement here. I looked at the emscripten thread linked and it has not been marked resolved. (I don't know much about these things so someone with more experience should maybe give a better status report.)

@rth
Member

rth commented Jan 11, 2021

It's just that I don't know if it's possible, and what might be involved in making it run on node.js.

There was a demo in #183 (comment), so it's definitely possible, and it would be great (#160). We just need to integrate #792 first, then see how we could make it work without maintaining a fork of file_packager.py as was done in #183.

I could set up a docker image running the same stack as pyodide, but I was hoping that I could ideally just compile pyodide for the server and have it just work, so I don't have separate moving pieces to keep in sync.

It could be worth pursuing, analyzing what was done in https://github.com/wapm-packages/pyodide is probably a start. I haven't really followed the Wasmer side of things, if there is anything we could do to make it easier please let us know. I'm not sure that splitting cpython patches in a separate repo is really necessary at this point (they didn't do it in wapm-packages/pyodide, and it would increase our maintenance burden) -- if there is a working prototype with motivation why it's necessary, we could certainly re-discuss that.

It could be an interesting project, but if usage in a browser is any indication, be prepared to encounter some occasional errors, and get 0 results in a search engine when you search for them :) So as for sandboxing CPython in a way that is easier or more reliable than Docker, at the moment I'm a bit skeptical; maybe in some number of years...

@hoodmane
Member

hoodmane commented Jan 11, 2021

So in the case of sandboxing CPython, easier/more reliable than Docker, at the moment I'm a bit skeptical, maybe in some number of years.

Pyodide is still a prototype; don't use it right now if you need ease of use or reliability. We are working hard to improve it though =)

@simonw

simonw commented Oct 2, 2022

I'm really interested in this.

My use-case is sandboxing: I want to be able to run user-provided Python code safely on my server, with robust memory and CPU limits and with zero chance that malicious code could "break out" and access my filesystem or network or perform other malicious actions.

Here's one example of something I'd like to build:

My software lets users upload CSV files to create database tables. I want to provide advanced tools for "transforming" those tables in some way - convert a column to lowercase, extract a zip code from an address column, that kind of thing.

One option I'd like to offer is to enter some Python code to be run against every value in a column - a web application equivalent of this CLI tool I built: https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/

Running their Python snippet in WASM via Pyodide feels like a lightweight, safe way that I could build this.
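The per-value transformation described above can be sketched with the stdlib `sqlite3` module. This is a hypothetical illustration: `make_converter` and `convert_column` are made-up helpers, not sqlite-utils or Pyodide APIs. The user's snippet is compiled as an expression over `value` and applied to every row. It evaluates the snippet in-process with full privileges, so it is NOT a sandbox; it only shows the piece of work that would have to be moved into an isolated runtime such as Pyodide.

```python
import sqlite3
from typing import Any, Callable


def make_converter(expression: str) -> Callable[[Any], Any]:
    """Compile a user expression such as "value.lower()" into a function.

    WARNING: eval() runs the snippet in this process with full privileges.
    This is NOT a sandbox; in the setup discussed here this step would run
    inside an isolated runtime (e.g. Pyodide) instead.
    """
    code = compile(expression, "<user-snippet>", "eval")

    def convert(value: Any) -> Any:
        # `value` is passed as a global so nested scopes in the snippet see it.
        return eval(code, {"value": value})

    return convert


def quote_identifier(name: str) -> str:
    # Table/column names cannot be bound as SQL parameters, so quote them.
    return '"' + name.replace('"', '""') + '"'


def convert_column(conn: sqlite3.Connection, table: str, column: str,
                   expression: str) -> None:
    """Apply `expression` to every value of table.column (hypothetical helper)."""
    convert = make_converter(expression)
    t, c = quote_identifier(table), quote_identifier(column)
    rows = conn.execute(f"SELECT rowid, {c} FROM {t}").fetchall()
    conn.executemany(
        f"UPDATE {t} SET {c} = ? WHERE rowid = ?",
        [(convert(value), rowid) for rowid, value in rows],
    )
    conn.commit()
```

For example, `convert_column(conn, "people", "name", "value.lower()")` lowercases every name in place. Keeping the user code behind a single `value -> result` function keeps the boundary narrow: only that function needs to cross into the sandbox, while the database access stays on the trusted side.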

@alexmojaki
Contributor

@simonw you may be interested in https://github.com/gristlabs/grist-core. It's an open source web application that lets you build spreadsheet/database hybrid documents, backed by SQLite, with formula columns that run Python in a secure sandbox.

@rth changed the title from "Invoke pyodide from python?" to "Invoke pyodide from python / sandboxing" Oct 5, 2022
@rth
Member

rth commented Oct 5, 2022

My use-case is sandboxing:

@simonw I agree that sandboxing is an important use case that we could maybe explore more. As far as I know, the options are currently:

  1. use selenium and rely on the browser sandbox (the corresponding selenium setup can be found in https://github.com/pyodide/pytest-pyodide for instance). This works but is probably fairly brittle, the error reporting via selenium is not great, and having to restart a browser instance for each code snippet doesn't sound ideal.
  2. use nodejs with some kind of extra sandboxing functionality (e.g. https://github.com/patriksimek/vm2). I have no idea how reliable it is, but it would likely be easier to use than selenium.
  3. use Pyodide with Deno, which apparently has built-in sandboxing. I have not tried it personally, and I'm not sure what the compatibility status is currently, but last year it was close to working (see the ES6 modules/import support discussion in #1477 (comment)), and we have probably fixed some of those issues since.

Personally, I think 3 or 2 might be the most promising.

@rth mentioned this issue Jan 4, 2023

8 participants