
[Proposal] pyodide-importer package for external Python file import support #1917

Closed
ryanking13 opened this issue Oct 31, 2021 · 18 comments

Comments

@ryanking13
Member

ryanking13 commented Oct 31, 2021

Problem statement

Currently, Pyodide doesn't support importing external Python files (.py). The only options are installing a pure Python package wheel (.whl) or Pyodide Python packages (.js, .data) through micropip.install or pyodide.loadPackage.

This is because external Python files need to be accessed through HTTP(S) (browser) or the File System API (Node), while import statements look for them inside the virtual file system.

pyodide.runPython(await (await fetch("https://some_url/...")).text());

So for now, anyone who wants to use external Python files has to fetch and execute them manually, which is not very straightforward.
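The same manual workaround can be written on the Python side. This is a minimal sketch, not part of Pyodide: `load_remote_module` and the injected `fetch_text` callable are hypothetical names. Inside Pyodide, `fetch_text` could wrap `pyodide.http.open_url(url).read()` (a synchronous read); here it is a plain function so the example runs anywhere.

```python
import sys
import types
from typing import Callable


def load_remote_module(
    name: str, url: str, fetch_text: Callable[[str], str]
) -> types.ModuleType:
    """Fetch Python source from `url` and execute it as module `name`.

    `fetch_text` is a hypothetical, injected download function.
    """
    source = fetch_text(url)
    module = types.ModuleType(name)
    module.__file__ = url  # informational: records where the code came from
    # Compile with the URL as the filename so tracebacks point at the remote file.
    code = compile(source, url, "exec")
    exec(code, module.__dict__)
    # Register the module so later `import name` statements reuse it.
    sys.modules[name] = module
    return module
```

This works, but every module (and each of its own dependencies) has to be loaded by hand and in the right order, which is the friction this proposal aims to remove.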

Proposal

This issue proposes pyodide-importer, a Python package that adds seamless import statement support for external Python files in Pyodide applications.

Idea

When an import statement is executed, Python tries to locate the module by consulting the finders registered in sys.meta_path (including the built-in PathFinder).

pyodide-importer defines a new meta path finder (e.g. PyodidePathFinder), appended to sys.meta_path, that searches for and downloads import candidates through HTTP(S) or the File System API.

  1. An import statement is executed and the built-in finders fail to locate the target package.
  2. PyodidePathFinder searches for the package through HTTP(S) or the File System API; the search order follows PEP 420.
  3. If PyodidePathFinder finds the target package (checked via the HTTP status code or fs.stat), the package is saved to a predefined path inside the virtual file system.
  4. The rest of the import process is handled by the built-in PathFinder.
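The four steps above can be sketched as a meta path finder. This is an illustration under stated assumptions, not pyodide-importer's actual code: `fetch` is a hypothetical callable that returns the source text for a URL, or `None` when the file does not exist (standing in for an HTTP status check or `fs.stat`). It probes `name/__init__.py` before `name.py` (the first two steps of the PEP 420 search order), writes the first hit under `download_path`, and then hands off to the built-in `importlib.machinery.PathFinder`.

```python
import importlib
import importlib.abc
import importlib.machinery
import os
from typing import Callable, Optional


class PyodidePathFinder(importlib.abc.MetaPathFinder):
    """Downloads missing modules, then defers to the built-in PathFinder."""

    def __init__(
        self,
        base_url: str,
        download_path: str,
        fetch: Callable[[str], Optional[str]],  # hypothetical: source text or None
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.download_path = download_path
        self.fetch = fetch

    def find_spec(self, fullname, path=None, target=None):
        relpath = fullname.replace(".", "/")
        # Step 2: probe a regular package first, then a plain module.
        for candidate in (relpath + "/__init__.py", relpath + ".py"):
            source = self.fetch(f"{self.base_url}/{candidate}")
            if source is None:
                continue  # e.g. HTTP 404 or a failed fs.stat
            # Step 3: save the file into the (virtual) file system.
            dest = os.path.join(self.download_path, candidate)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "w", encoding="utf-8") as f:
                f.write(source)
            importlib.invalidate_caches()  # drop stale directory listings
            # Step 4: let the built-in PathFinder build the spec from disk.
            parent_dir = os.path.join(self.download_path, *fullname.split(".")[:-1])
            return importlib.machinery.PathFinder.find_spec(fullname, [parent_dir])
        return None  # not available remotely either
```

Because the finder only writes files and then delegates, loading, caching in sys.modules, and `__path__` handling stay with the standard import machinery.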

API design (draft)

"""
pyodide_importer.register_hook()
- appends PyodidePathFinder to sys.meta_path
- also adds virtual file system path to sys.path
- `base_url`: URLs (broswer) or File System Paths (Node) where Python packages are located. The default value is pyodide.indexURL.
- `download_path`: path inside virtual  file system where downloaded Python packages will be located.
- `modules`: names of the modules/packages that can be imported from `base_url`.
"""
def register_hook(
    base_url=pyodide.indexURL,
    download_path="",
    modules=None,
)

"""
pyodide_importer.unregister_hook()
- removes PyodidePathFinder from sys.meta_path
"""
def unregister_hook()

Limitations

  • For now, Pyodide doesn't support downloading binary files synchronously; therefore, importing .pyc, .pyd, .so, or .zip files is not possible.
  • Namespace packages (directories without an __init__.py file) cannot be supported.

For example, import parent.module is not supported if the files are laid out as follows:

📂 parent
┗ 📜 module.py
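To see why, consider which requests a remote finder can actually make. In this sketch, the helper names are hypothetical and a set of existing paths stands in for HTTP 200/404 responses. For `parent`, only `parent/__init__.py` and `parent.py` can be probed; with the layout above both are missing, and a bare directory has no URL that reliably answers "exists", so the lookup of `parent` fails before `parent.module` is ever tried.

```python
from typing import List, Optional, Set


def candidate_paths(fullname: str) -> List[str]:
    """Relative paths a remote finder could probe for `fullname`, in order."""
    rel = fullname.replace(".", "/")
    return [f"{rel}/__init__.py", f"{rel}.py"]


def locate(fullname: str, existing: Set[str]) -> Optional[str]:
    """Return the first candidate present on the (simulated) server, else None."""
    for path in candidate_paths(fullname):
        if path in existing:  # stands in for an HTTP 200 vs 404 check
            return path
    return None
```

With `existing = {"parent/module.py"}`, `locate("parent", existing)` is `None` even though `locate("parent.module", existing)` would succeed; adding `parent/__init__.py` to the server makes the whole import work.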

Solutions from other projects

Update

Minimal implementation of pyodide-importer is here:
https://github.com/ryanking13/pyodide-importer

@hoodmane
Member

Another thing that would be really great for dev environments in Chromium is an Emscripten file system backed by the File System Access API. Then you could allow the File System Access API to access your dev directory, mount it into the Emscripten file system, add it to sys.path, and import from the file system efficiently.

@ryanking13
Member Author

ryanking13 commented Oct 31, 2021

Another thing that would be really great for dev environments in Chromium is an Emscripten file system backed by the File System Access API.

Indeed. Although the File System Access API is currently available only in Chrome and Edge, there are products like vscode.dev that use it. It would be helpful to look at how they make full use of the File System Access API.

@ryanking13
Member Author

BTW, the vscode.dev team mentioned Pyodide in their blog post:

Since VS Code for the Web is running completely within the browser, some experiences will naturally be more constrained, when compared to what you can do in the desktop app. For example, the terminal and debugger are not available, which makes sense since you can't compile, run, and debug a Rust or Go application within the browser sandbox (although emerging technologies like Pyodide and web containers may someday change this).

@rth
Member

rth commented Nov 2, 2021

Thanks for the proposal, @ryanking13! Sounds like an interesting project idea. Overall +1 to adding this as a separate repo in the Pyodide org.

  • download_path: path inside virtual file system where downloaded Python packages will be located.

It would probably need some defaults. As far as I understand you don't want to put them in /lib/python3.9/site-packages?

Namespace packages (directories without an __init__.py file) cannot be supported.

Is it the presence of the __init__.py file that's the issue, or the fact that if one has from a import b it's difficult to say whether it should be in a/__init__.py or a/b.py? Can't you try both paths if there is an import error?

httpimport supports importing through HTTP(s) by extending Python import hook.

Interesting. I wonder if it would make sense to use a context manager for the API, as they do. No strong opinions either way; functions as you proposed also work. They just give a bit less flexibility if there are several remote URLs.

@ryanking13
Member Author

ryanking13 commented Nov 3, 2021

It would probably need some defaults. As far as I understand you don't want to put them in /lib/python3.9/site-packages?

I think /lib/python3.9/site-packages is not a good place to put them, because it's for installed packages.

I once considered using the root directory (which can be considered the cwd) as the default, since users could easily call os.listdir() to see the downloaded files. But the root directory already contains some default directories (lib, bin, ...), so we would need to take those into account. For now, creating a dedicated directory for downloaded Python files (e.g. /external_imports) seems like the proper approach.

Is it the presence of the __init__.py file that's the issue, or the fact that if one has from a import b it's difficult to say whether it should be in a/__init__.py or a/b.py? Can't you try both paths if there is an import error?

The problem is that when import module is executed, module can be a directory without an __init__.py.
In a native environment, directories can be stat'ed, but for us, AFAIK, it's not possible to detect directories through HTTP properly.

Interesting. I wonder if it would make sense to use a context manager for the API, as they do.

Yeah, providing a context manager API would be nice!
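A context-manager variant could be a thin wrapper that installs a finder on entry and removes it on exit, even if the body raises. This sketch assumes a hypothetical `remote_imports` helper that accepts any `importlib.abc.MetaPathFinder` (for example, a PyodidePathFinder bound to one base URL); several remote URLs can then be handled by nesting blocks or using one block per URL.

```python
import contextlib
import importlib.abc
import sys
from typing import Iterator


@contextlib.contextmanager
def remote_imports(
    finder: importlib.abc.MetaPathFinder,
) -> Iterator[importlib.abc.MetaPathFinder]:
    """Temporarily append `finder` to sys.meta_path."""
    sys.meta_path.append(finder)
    try:
        yield finder
    finally:
        # Remove exactly this finder, even if the body raised.
        if finder in sys.meta_path:
            sys.meta_path.remove(finder)
```

Note that modules imported inside the block stay in sys.modules after it exits; leaving the block only stops new lookups from reaching the remote source.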

@bollwyvl
Contributor

bollwyvl commented Nov 6, 2021

This looks like some interesting stuff.

There are probably some discussions over on #1715 (comment) that may be of interest, where @stefnotch even has some links to demonstrations that would make working with multiple application (or even user) files more natural. Alas, I haven't had enough time to work on jupyterlite to explore the FS API further...

using the root directory (which can be considered the cwd)

However, in the tinkering I have been able to do, I've been wondering about putting the user's files of interest (and the cwd) into /home/webuser (or even /home/webuser/work), to make mount operations easier and less likely to blow up the underlying "computer".

not possible to detect directories through HTTP properly.

We've started tinkering with some additional sources of wheels for micropip over on jupyterlite/jupyterlite#310: basically, we patch micropip to first check a configurable list of warehouse API-like json files, with the goal of eventually working up to CDN-free operation (unless the owner of site). We get around some of the directory stuff by pre-calculating stuff and conventions, but yeah: without WebDAV, no listing for us!

@ryanking13
Member Author

However, in the tinkering I have been able to do, I've been wondering about putting the user's files of interest (and the cwd) into /home/webuser (or even /home/webuser/work), to make mount operations easier and less likely to blow up the underlying "computer".

Yes. Having a proper home directory would be more natural.

We've started tinkering with some additional sources of wheels for micropip over on jupyterlite/jupyterlite#310: basically, we patch micropip to first check a configurable list of warehouse API-like json files, with the goal of eventually working up to CDN-free operation (unless the owner of site).

This would be interesting. I'll look into it. Thanks!

@grimmer0125
Contributor

grimmer0125 commented Nov 12, 2021

@ryanking13 thank you for this proposal. It is great.

For now, Pyodide doesn't support downloading binary files synchronously; therefore, importing .pyc, .pyd, .so, or .zip files is not possible.

I wonder why we cannot use async to achieve this, and why using a compressed wheel is doable.

When the modules parameter is None (the default), does this API try to search for and download every package/module imported in the Python code, or is it a whitelist, so that nothing is searched for?

@ryanking13
Member Author

ryanking13 commented Nov 13, 2021

I wonder why we cannot use async to achieve this, and why using a compressed wheel is doable.

I assumed this because the Python import system itself is synchronous, but we can certainly investigate whether async can be used here.

When the modules parameter is None (the default), does this API try to search for and download every package/module imported in the Python code, or is it a whitelist, so that nothing is searched for?

My idea was to search for and download every package/module imported in the Python code when modules is not explicitly given, so that people can develop quickly.
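The intended semantics of `modules` can be written as a small predicate. In this sketch the helper name `should_fetch` is hypothetical, and treating entries as top-level names (so `pkg.sub` is allowed when `pkg` is listed) is an assumption: `None` means "try anything the built-in finders missed", while a list acts as an allowlist.

```python
from typing import Optional, Sequence


def should_fetch(fullname: str, modules: Optional[Sequence[str]]) -> bool:
    """Decide whether a remote finder should try to download `fullname`.

    Assumption: entries in `modules` are top-level package/module names.
    """
    if modules is None:
        return True  # no allowlist: try everything the built-in finders missed
    top_level = fullname.partition(".")[0]
    return top_level in modules
```

An allowlist also avoids pointless network round trips for typos or for modules that genuinely do not exist remotely.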

@ryanking13
Member Author

Update:

Minimal implementation of pyodide-importer is here:
https://github.com/ryanking13/pyodide-importer

@grimmer0125
Contributor

Update:

Minimal implementation of pyodide-importer is here:

https://github.com/ryanking13/pyodide-importer

Can it list available modules?

@ryanking13
Member Author

Can it list available modules?

For now, it's possible by:

pyfinder = register_hook(...)
print(pyfinder.modules)

@rth
Member

rth commented Nov 16, 2021

Minimal implementation of pyodide-importer is here:

Very nice!

The readme could use a one- or two-sentence description of what it does (and how it does it) at the beginning.
Also please add it to https://pyodide.org/en/stable/project/related-projects.html; we might need to add a section for packages that are either examples or plugins for Pyodide (or create a separate page for those).

Would you prefer to keep it under your account or move it to the pyodide org? (For the second option you can transfer it to me, so I can transfer it there.)

@phorward
Contributor

Hello @ryanking13,

I like the idea of your package and can only agree with @rth: please add a few more sentences about its use case.
In the past, I did several experiments on this too, like in #489.

Can't we generally make this package a part of the Pyodide standard, like micropip?

@ryanking13
Member Author

The readme could use a one- or two-sentence description of what it does (and how it does it) at the beginning.
Also please add it to https://pyodide.org/en/stable/project/related-projects.html; we might need to add a section for packages that are either examples or plugins for Pyodide (or create a separate page for those).

Yes, I will add more documentation and functionalities, and I will add this to the related projects section when it becomes more stable.

Would you prefer to keep it under your account or move it to the pyodide org? (For the second option you can transfer it to me, so I can transfer it there.)

I want to move this to the pyodide org when it becomes more stable. For now, it's more of a PoC and needs more tests and functionality.

@ryanking13
Member Author

I did several experiments on this too, like in #489.

I didn't know about this. Thanks, @phorward, it will be really helpful!

Can't we generally make this package a part of the Pyodide standard, like micropip?

Well, I'm not sure what the best way to support external file imports is. We have been discussing this in #1876 (and also in other issues). Another option, suggested by @hoodmane, is a pyodide.installFiles API.
I think if someday people use pyodide_importer a lot, we may consider adding this into the Pyodide standard, and if not, we could just leave this as a standalone package.

@rth
Member

rth commented Sep 17, 2022

OK so what should be done with this issue @ryanking13, do you want to move https://github.com/ryanking13/pyodide-importer to the Pyodide org (if you are still interested in maintaining it) ?

@ryanking13
Member Author

I don't actively develop pyodide-importer now. It still works, so people who need it can use it. But I don't have further plans to add it to the Pyodide core. Closing as resolved, thanks!


6 participants