Micropip should be able to install wheels from the file system #2731

Closed
hoodmane opened this issue Jun 16, 2022 · 18 comments · Fixed by #2767
Labels
enhancement New feature or request

Comments

@hoodmane
Member

🚀 Feature

Micropip should be able to install a wheel which is in the emscripten file system from a path.

Motivation

In Node (and some day maybe in Chrome), we can mount the local file system into the Emscripten file system. It would be useful to install wheels from such a mounted directory, for instance when testing whether a wheel was built correctly.

@hoodmane hoodmane added the enhancement New feature or request label Jun 16, 2022
@jtpio
Contributor

jtpio commented Jun 17, 2022

Posting here for visibility.

The latest JupyterLite release adds support for accessing files from the Python (Pyodide-based) kernel:

While the mounting logic is a bit specific to JupyterLite / JupyterLab and enabled via a ServiceWorker, making micropip able to install packages from a local path would still be useful.

For example one could upload a wheel (or ship it as part of the default content) and install it with:

import micropip
await micropip.install('./snowballstemmer-2.2.0-py2.py3-none-any.whl')


@hoodmane
Member Author

Maybe it can be micropip.install_file? install_from_filesystem?

@jtpio
Contributor

jtpio commented Jun 17, 2022

Probably from a user point of view keeping the same micropip.install() would be more convenient? (like regular pip)

@hoodmane
Member Author

We currently interpret micropip.install("a/b/c.whl") as a relative URL, so there is no syntax space for that. Unless you want to do micropip.install(from_path="a/b/c.whl")?
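
To illustrate why a bare path leaves no syntax space, here is a minimal sketch of how a relative reference resolves against a page URL, which is roughly what a browser fetch does. The base URL and the helper name are hypothetical, and this is not micropip's actual code:

```python
from urllib.parse import urljoin

# Hypothetical page URL standing in for wherever the Pyodide app is served from.
BASE_URL = "https://example.com/app/index.html"


def resolve_like_a_browser(reference: str, base: str = BASE_URL) -> str:
    """Resolve a wheel reference against a base URL, as a relative URL would be."""
    return urljoin(base, reference)


# A bare path is indistinguishable from a relative URL:
print(resolve_like_a_browser("a/b/c.whl"))
# -> https://example.com/app/a/b/c.whl
```

So "a/b/c.whl" already has a meaning (fetch it relative to the page), and a file system path written the same way would be ambiguous.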

@bollwyvl
Contributor

What about intercepting file:// before fetch as meaning file in the WASM machine? I'd think pyodide itself could not be usefully loaded when "hosted" from file:// because .json, much less .wasm, would not be served properly. Further no other protocol would be able to resolve it.

@hoodmane
Member Author

What about relative paths though?

@bollwyvl
Contributor

bollwyvl commented Jun 18, 2022

Well, URIs don't necessarily have to start with <protocol>:/: there's certainly precedent for non-slashed paths, e.g. Python's sqlite3 module, which inherits SQLite's implementation:

import sqlite3

# Open a database in read-only mode.
con = sqlite3.connect("file:template.db?mode=ro", uri=True)

So, with the default settings, file:some-package.whl could be interpreted the same as file:/home/pyodide/some-package.whl.
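
A rough sketch of that interpretation, assuming /home/pyodide is the working directory as in the sentence above. The helper name file_uri_to_path is made up for illustration, and any host part of the URI is simply ignored here:

```python
import posixpath
from urllib.parse import urlparse


def file_uri_to_path(uri: str, cwd: str = "/home/pyodide") -> str:
    """Map a file: URI to an absolute path, resolving non-slashed paths against cwd."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file: URI: {uri!r}")
    # posixpath.join discards cwd when parsed.path is already absolute.
    return posixpath.normpath(posixpath.join(cwd, parsed.path))


print(file_uri_to_path("file:some-package.whl"))
print(file_uri_to_path("file:/home/pyodide/some-package.whl"))
```

Both calls give the same absolute path, which is the equivalence described above.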

Indeed, allowing <protocol> to be extended is a whacking good idea: ipfs:// springs to mind as an almost perfect tool for peer-to-peer package distribution.

@rth
Member

rth commented Jun 20, 2022

What about intercepting file:// before fetch as meaning file in the WASM machine?

The problem is that when run in Node.js, file:// means the host file system. Though for Python, file:// would probably still mean whatever FS Python is installed in. We would want to keep consistent behavior in the browser and Node. Maybe memfs://, at least as far as micropip is concerned?

@fire17

fire17 commented Jun 20, 2022

hey guys! awesome to see this being on the table so recently
got here while searching for a way to use my own packages in pyscript
for example i've got a package called xo-gd, which i want to use, but i can't seem to import it

seems like this is a requested feature that's not yet available seamlessly,
does anyone know of a way for me to do this? do i need to build pyodide myself with this package? or should i just copy all the python files from my package locally? (it's already all on pip, so i thought it should be straightforward)

any info would be appreciated! thx & have a good one!

@rth
Member

rth commented Jun 20, 2022

@fire17 Would you mind opening a separate issue about it? It's a bit orthogonal to the current discussion. The general documentation for adding a package can be found here. Looking at the xo-gd description, it looks like a lot of the functionality it exposes will not be supported in the browser (let's discuss the details in a separate issue).

@hoodmane
Member Author

I am leaning towards going with micropip.install(from_path="path/to/some/wheel.whl") for now. I guess this won't work with e.g., py-env or requirement lists but if we want some string prefix that indicates to install from the file system we can figure out the details a bit later. My immediate use case is getting CI to work for out of tree package builds which know that they are trying to load a wheel from the local file system.

I guess another thought is it might be worth having a list of local directories that micropip could automatically search when looking for a wheel. This would better handle the case where you have a wheel and several dependencies of the wheel all in the same folder and you want micropip to correctly resolve the dependencies and load all of them.
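
A minimal sketch of what such a directory search could look like. The function names and the search_dirs parameter are hypothetical (nothing like this is claimed to exist in micropip), and version selection and dependency resolution are left out:

```python
import re
from pathlib import Path
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Normalize a project name for comparison (runs of -, _ and . become one -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_local_wheel(name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first wheel in search_dirs whose distribution name matches name."""
    wanted = normalize(name)
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip search paths that do not exist
        for wheel in sorted(root.glob("*.whl")):
            # Wheel file names start with "{distribution}-{version}-...".
            distribution = wheel.name.split("-", 1)[0]
            if normalize(distribution) == wanted:
                return wheel
    return None
```

With something like this, a resolver could try the local directories for each dependency before falling back to the network.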

@rth
Member

rth commented Jun 20, 2022

micropip.install(from_path= [..] we can figure out the details a bit later.

Maybe someone should try to do a draft implementation. I suspect that having two inputs would make the implementation less straightforward as currently we just pass a list of requirements everywhere. Also, it's not very common in Python to have multiple and mutually exclusive input arguments, that's probably the reason for the URL file prefix, instead of doing urlopen(from_http=None, from_https=None, from_ftp=None, etc).

For some things, we can figure out the details later, but micropip.install is likely one of the most frequently used functions by end users. So it would be good to agree on a public API and implement it, so we can avoid API breakages or deprecations for this (unless we can finish iterating on it before the next release).

@hoodmane
Member Author

Well, what about going with @bollwyvl's suggestion? We could set it up so that file:// makes a fetch request but file:some/path goes to the local file system.

@hoodmane
Member Author

Or maybe emfs:some/path. I don't care for memfs because it is possible to mount other file systems e.g., nodefs and then it's a bit confusing. But they are all the Emscripten file system.
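
As a sketch of how a single string argument could carry this, assuming an emfs: prefix: the helper below is hypothetical and only classifies the requirement, it does not reflect how micropip is actually implemented.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_requirement(requirement: str) -> Tuple[str, str]:
    """Return (kind, target) for a requirement string passed to an installer."""
    parsed = urlparse(requirement)
    if parsed.scheme == "emfs":
        # Path inside the Emscripten file system (MEMFS, NODEFS, ...).
        return ("emscripten_fs", parsed.path)
    if requirement.endswith(".whl"):
        # Anything else ending in .whl is treated as an absolute or relative URL to fetch.
        return ("url", requirement)
    # Otherwise assume a plain package name to look up in an index.
    return ("package_name", requirement)


print(classify_requirement("emfs:some/path.whl"))
print(classify_requirement("a/b/c.whl"))
print(classify_requirement("snowballstemmer"))
```

This keeps one list of strings flowing through the installer, so requirement lists would not need a second, mutually exclusive argument.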

@bollwyvl
Contributor

bollwyvl commented Jun 21, 2022 via email

@rth
Member

rth commented Jun 22, 2022

OK, let's go with emfs:.

Actually I wasn't aware that,

  • The // after the file: denotes that either a hostname or the literal term localhost will follow,[2] although this part may be omitted entirely, or may contain an empty hostname.[3]
  • file://path (i.e. two slashes, without a hostname) is never correct, but is often used

https://en.wikipedia.org/wiki/File_URI_scheme
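
A quick way to see the hostname issue with Python's urllib.parse (a small sketch; the helper name and the example paths are made up):

```python
from urllib.parse import urlparse


def host_and_path(uri: str) -> tuple:
    """Return the (netloc, path) pair that urlparse extracts from a URI."""
    parsed = urlparse(uri)
    return (parsed.netloc, parsed.path)


for uri in [
    "file:///home/pyodide/a.whl",           # empty hostname
    "file://localhost/home/pyodide/a.whl",  # explicit localhost
    "file:/home/pyodide/a.whl",             # hostname part omitted
    "file://home/pyodide/a.whl",            # two slashes: "home" is read as the host
]:
    print(uri, "->", host_and_path(uri))
```

The last form loses the first path segment to the host part, which is why two slashes without a hostname do not mean what people usually intend.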

@bollwyvl
Contributor

Yeah, URLs are definitely funny things. Here's a little of what urllib.parse.urlparse and new URL do with various inputs:

from pathlib import Path
from urllib.parse import urlparse
import js
for p in ["emfs:foo.whl", "emfs:./foo.whl", "emfs:/home/pydodide/bar.whl", "emfs://home/pyodide/baz.whl"]:
    print("\n---", p, "---")
    py_parsed = urlparse(p)
    js_parsed = js.eval(f"new URL('{p}')").to_py()
    print("py parsed:", py_parsed)
    print("... url:", py_parsed.geturl())
    print("... path:", Path(py_parsed.path))
    print("... abspath:", Path(py_parsed.path).resolve())
    print("js parsed:", js_parsed)
    print("... parts", f"protocol='{js_parsed.protocol}', origin='{js_parsed.origin}', pathname='{js_parsed.pathname}'")

--- emfs:foo.whl ---
py parsed: ParseResult(scheme='emfs', netloc='', path='foo.whl', params='', query='', fragment='')
... url: emfs:foo.whl
... path: foo.whl
... abspath: /home/pyodide/foo.whl
js parsed: emfs:foo.whl
... parts protocol='emfs:', origin='null', pathname='foo.whl'

--- emfs:./foo.whl ---
py parsed: ParseResult(scheme='emfs', netloc='', path='./foo.whl', params='', query='', fragment='')
... url: emfs:./foo.whl
... path: foo.whl
... abspath: /home/pyodide/foo.whl
js parsed: emfs:./foo.whl
... parts protocol='emfs:', origin='null', pathname='./foo.whl'

--- emfs:/home/pydodide/bar.whl ---
py parsed: ParseResult(scheme='emfs', netloc='', path='/home/pydodide/bar.whl', params='', query='', fragment='')
... url: emfs:/home/pydodide/bar.whl
... path: /home/pydodide/bar.whl
... abspath: /home/pydodide/bar.whl
js parsed: emfs:/home/pydodide/bar.whl
... parts protocol='emfs:', origin='null', pathname='/home/pydodide/bar.whl'

--- emfs://home/pyodide/baz.whl ---
py parsed: ParseResult(scheme='emfs', netloc='home', path='/pyodide/baz.whl', params='', query='', fragment='')
... url: emfs://home/pyodide/baz.whl
... path: /pyodide/baz.whl
... abspath: /pyodide/baz.whl
js parsed: emfs://home/pyodide/baz.whl
... parts protocol='emfs:', origin='null', pathname='//home/pyodide/baz.whl'

@hoodmane
Member Author

Thanks everyone for the discussion!
