Add Loading of Structured Packages to Py-Config #519
Comments
For those encountering this challenge, there's a topic on the Pyscript forum for a workaround, as well as an example repo from mudream4869 showing how to compile a local folder to a wheel and reference that in |
I needed the same for demoing my package on the web without forcing users to install it. I also need it for development, i.e. I need changes to the Python modules in the local package to be picked up on the fly (compiling a wheel every time is not a solution for me). So I created a patched version of PyScript, see attached, by modifying 2 of the .ts files (and the compiled .js). I also created a README explaining what it does, why and how. EDIT: Corrected, so that a missing packagebase is undefined rather than an empty string (which causes backward compatibility issues). |
This is an opportunity:
Of course, this would be so much easier in Python instead of TS. 😀 Wish you could have a disposable interpreter for bootstrapping the actual one. |
Would you do this by looking at the files they were imported from in the |
@JeffersGlass The pithy answer: Python and |
@madhur-tandon I think this is not relevant anymore due to your pr? |
I think it would be the same thing now, but for the paths specification in The gist is - right now anything specified in It'd be a nice-to-have feature to be able to somehow load a complete package from an arbitrary URL in a way that preserves directory structure, without having to first compile it to a wheel or zip it up, but implementation-wise it might be unreasonable. |
@JeffersGlass To begin with, I started with the example you showed i.e.
with the index.html having the following:
However, the above doesn't work, so I tried the other way instead i.e.
and it works. Could you confirm this if it occurs for you as well? i.e. if one way of importing works and the other doesn't? We can then look at the package and retaining the directory structure next. Update: This seems to be non-deterministic i.e. sometimes it works, while sometimes it fails with |
Hi @madhur-tandon - both examples above work for me, with the minor correction in my original code that print_saxophone requires quotes around the emoji:
### emoji.py ###
def print_saxophone():
    print("🎷")  # These quotes are critical! Oops!
Would you confirm whether either method works for you with that adjustment? |
Yes, I already added the quotes and both work for me too. But sometimes, the issue can occur in both of them so it is non-deterministic really and is independent of the way of importing. You can check the Update I wrote at the bottom in the original comment. Also, you can check PR 845 which partially solves the package problem -- by retaining the directory structure. I requested a review from you too. |
FWIW, regarding importing packages, @ryanking13 had a Pyodide package he worked on: https://github.com/ryanking13/pyodide-importer Here was the discussion: pyodide/pyodide#1917 He's no longer updating it, but he discovered lots of edge cases. Useful info. |
I have yet to see this fail for me with either version... interesting. It's essentially the same process that's been in the |
Yes, the Pyodide core team has decided not to support that feature internally, and our current recommendation is to make a wheel or a zip file instead of downloading each Python file in a package. FWIW, Pyodide has recently added native file system mount support (pyodide/pyodide#2987). (For now it is supported by Chromium-based browsers only.) |
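A minimal sketch of the zip route recommended above, using only the standard library as a stand-in for the browser side. The `demo_pkg` package, the `pack`/`unpack` helper names and the temp directories are hypothetical; in Pyodide the archive bytes would come from an HTTP fetch rather than the local disk, and that part is not shown.

```python
import shutil
import tempfile
from pathlib import Path


def pack(src_root, out_dir):
    """Zip everything under src_root, storing paths relative to it."""
    archive = shutil.make_archive(str(out_dir / "bundle"), "zip", root_dir=str(src_root))
    return Path(archive)


def unpack(archive, dest):
    """Extract the archive and list the files it produced, relative to dest."""
    shutil.unpack_archive(str(archive), str(dest), "zip")
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


# Build a tiny hypothetical package on disk.
work = Path(tempfile.mkdtemp())
src, out, dest = work / "src", work / "out", work / "fs"
(src / "demo_pkg").mkdir(parents=True)
out.mkdir()
dest.mkdir()
(src / "demo_pkg" / "__init__.py").write_text("")
(src / "demo_pkg" / "foo.py").write_text("VALUE = 1\n")

# One archive travels over the wire; the directory layout survives the round trip.
unpacked = unpack(pack(src, out), dest)
```

The point of the archive is that a single download carries the whole tree, so nothing has to enumerate individual files or guess a base URL.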
Sorry for chiming in so late, I didn't see this discussion before.
Speaking of tricky cases, re @pauleveritt
Relative vs absolute URLs This is something which was not obvious to me at first so I think it's better to say it explicitly: currently, If we want to preserve the directory structure, we need to know what is the "base URL". For example, imagine that we want to download these two URLs:
If the "base URL" is I don't think there is any reasonable way to automatically determine the "base URL", it must be specified by the user. So we need a nice way so that they can specify:
A possible proposal
This is just a first proposal, but I am thinking of something like this:
[[fetch]]
url = "https://github.com/pyscript/pyscript/blob/main/"
folder = "pyscriptjs/tests/integration/"
files = ["__init__.py", "test_00_support.py"]
[[fetch]]
url = "https://example.com"
folder = "foo"
files = ["a.txt", "b.txt"]
This would result in the following virtual FS structure:
Moreover, we can still support the common case of ".py files which are siblings of the main .html file". The "normal" version would be this:
[[fetch]]
url = ""
folder = ""
files = ["a.txt", "b.txt"] But I think that it's very reasonable to say that <py-config>
[[fetch]]
files = ["utils/__init__.py", "utils/foo.py"]
[[fetch]]
url = "https://github.com/pyscript/pyscript/blob/main/"
folder = "pyscriptjs/tests/integration/"
files = ["__init__.py", "test_00_support.py"]
[[fetch]]
url = "https://example.com"
folder = "foo"
files = ["a.txt", "b.txt"]
</py-config>
Naming is hard
As usual, finding good names is hard. The ones which I used above are just the first which came to my mind, but here are some alternatives:
Extensibility
The proposal above is easily extensible. For example, I can imagine having a
<py-config>
[[fetch]]
provider = "github"
repo = "pyscript/pyscriptjs"
version = "2022.09.1" # or a branch name, a commit ID, etc.
</py-config>
Other possible "fetch providers" could be (just thinking aloud):
|
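A rough sketch of how the `url` / `folder` / `files` entries proposed in this comment could be expanded into download URLs and virtual FS paths. It assumes that `url` and `folder` default to the empty string and that the FS path is simply `folder/file`; the function name `expand_fetch` is hypothetical and not part of PyScript.

```python
import posixpath


def expand_fetch(entry):
    """Return (source_url, fs_path) pairs for one [[fetch]] entry.

    Assumption: the virtual FS path is the part of the URL after the
    base `url`, i.e. `folder` joined with each name in `files`.
    """
    base = entry.get("url", "")
    folder = entry.get("folder", "")
    pairs = []
    for name in entry["files"]:
        fs_path = posixpath.join(folder, name)
        if base:
            # Tolerate a base URL written with or without a trailing slash.
            source = base.rstrip("/") + "/" + fs_path
        else:
            # No base URL: the path is relative to the page itself.
            source = fs_path
        pairs.append((source, fs_path))
    return pairs


remote = expand_fetch({"url": "https://example.com", "folder": "foo", "files": ["a.txt", "b.txt"]})
local = expand_fetch({"files": ["utils/__init__.py", "utils/foo.py"]})
```

With these defaults, an entry that only lists `files` maps each relative path to itself, which is the "siblings of the main .html file" case.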
@antocuni I think it is a very good proposal and will proceed to implement it. But I want to ask for opinions from @JeffersGlass, @fpliger and @tedpatrick -- if they have any objections. For those who need quick context: I recently merged #845 since we changed to using Emscripten's FS APIs Antonio's proposal feels like a natural extension and we should get to action! |
@antocuni On second thought, and perhaps taking some ideas from what @ntoll suggests: we could simply have a list of dictionaries / objects, with each dict having the path in the file system as its key and the URI as its value. So,
and so on... IMO, this is much simpler + the responsibility for calculating the URLs or perhaps calculating the path for the filesystem -- both are delegated to the user. WDYT? |
I quite like this! Definitely simpler, and probably less prone to user error. I wonder if the My immediate thought is, "How do we accommodate wanting to fetch many files?" In which case, I think the answer is "Here's a recipe in the docs for how you do that at the start of your script", maybe using |
I'm confused; consider my original examples: [[fetch]]
files = ["utils/__init__.py", "utils/foo.py"] If I understand correctly, it would become the following? [[fetch]]
path = "utils/__init__.py"
uri = "utils/__init__.py"
[[fetch]]
path = "utils/foo.py"
uri = "utils/foo.py" And my second example: [[fetch]]
url = "https://github.com/pyscript/pyscript/blob/main/"
folder = "pyscriptjs/tests/integration/"
files = ["__init__.py", "test_00_support.py"] would become: [[fetch]]
path = "pyscriptjs/tests/integration/__init__.py"
uri = "https://github.com/pyscript/pyscript/blob/main/pyscriptjs/tests/integration/__init__.py"
[[fetch]]
path = "pyscriptjs/tests/integration/test_00_support.py"
uri = "https://github.com/pyscript/pyscript/blob/main/pyscriptjs/tests/integration/test_00_support.py" Did I understand it correctly? Also, why
I think it's definitely complex and very likely to lead to many more user errors, because of the amount of copy&paste it requires.
if we want to go in this direction, then you can just remove Personally, -1 on this. There is a lot of value in being able to just say Also, from the implementation point of view, having a declarative syntax allows us to fetch them while pyodide is loading, thus considerably cutting down startup time |
Simpler in the very literal sense of: there are only two parameters to understand instead of three. Take the resource at Additionally, requiring each URL/URI consist of a "folder" and "file" is a restriction in the proposal - what if I want to load the file present at There may be other ways to solve the latter problem. |
ok, in this sense I agree. It is surely simpler to implement, it is probably simpler to explain, I'm very doubtful that it's simpler to use. And personally, I think that simplicity to use -- especially for common cases -- should be our first priority. (And a very common case is: I have 3 Also, note that
You have a point here. I start to wonder whether we should support both use cases, because they seem to serve very different purposes:
I'm trying to solve point (1) (which incidentally is what this issue is about ;)). We can probably come up with a solution which does both. E.g.:
[[fetch]]
# point (1). If you specify "files", it downloads multiple files
url = "http://example.com/"
files = ["a.txt", "b.txt"]
[[fetch]]
# point (2). If you specify "save_as", it downloads a single file
url = "http://example.com/api_endpoint?v=12"
save_as = "c.txt"
[[fetch]]
# if you specify both, it's an error
url = "..."
files = [...]
save_as = ... I'm not particularly convinced of the name
Also, let me underline again a very strong point of my proposal which I think was overlooked: since # this is what we have now
paths = ['foo/__init__.py', 'foo/bar.py']
# this would be *completely equivalent* to the above
[[fetch]]
files = ['foo/__init__.py', 'foo/bar.py'] |
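The `files` / `save_as` rule and the `paths` equivalence described in this comment could be checked along these lines. This is only a sketch: the helper names are hypothetical, and rejecting an entry that has neither key is an assumption the thread does not settle.

```python
def classify_entry(entry):
    """Classify one [[fetch]] entry as 'multi' (files) or 'single' (save_as)."""
    has_files = "files" in entry
    has_save_as = "save_as" in entry
    if has_files and has_save_as:
        # Stated in the proposal: specifying both is an error.
        raise ValueError("specify either 'files' or 'save_as', not both")
    if not has_files and not has_save_as:
        # Assumption: an entry with neither key is rejected as well.
        raise ValueError("specify one of 'files' or 'save_as'")
    return "multi" if has_files else "single"


def paths_to_fetch(paths):
    """Rewrite the existing `paths` list as the equivalent single fetch entry."""
    return {"files": list(paths)}


legacy = paths_to_fetch(["foo/__init__.py", "foo/bar.py"])
kind = classify_entry(legacy)
```

Because the rewritten entry carries no `url`, it keeps the current behaviour of fetching relative to the page.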
@antocuni your understanding regarding both the cases below is correct:
This is simpler in the sense that right now -- in your original proposal -- the user has to split the complete URL into parts for This also opens the area for mistakes on our side -- eg: handling trailing Overall, splitting the URL by the User --> reconstructing it by us --> larger surface area for mistakes. Further, what if I want the file in the FS to be saved in another custom path or perhaps with a different name? i.e. saving And regarding issue of copy paste for cases like:
We can simply say that if you want the same dir structure and the same filename, perhaps only the following is sufficient
and internally, we just copy the value of Also, |
this is not completely true, since
In which path of the virtual FS you want to save the file?
The point of my original proposal is to solve the issue; This assumes that the layout on the web server is the same as the one which you want on disk, of course. I claim that for the common case of "I want to download a python package and make it available on my FS", this is ~always true.
forcing the user to do an endless copy&paste also opens the possibility of a lot of mistakes on their side.
Which means that our code needs to be robust, well tested, aware of corner cases and hopefully bug free. Yes, I agree ;).
disagree.
Look at my answer above, I think I already answered to this, didn't I?
And what would this do?
well, we are talking about URLs, so I think we should use |
As shocked as I am... I'll say I mostly agree with @antocuni here. :D I don't love the verbosity of it but think [most of] it is necessary to avoid weird corner cases. My main desire with this though is:
Elaborating on 5 really quick, this feature is meant to copy/fetch [arbitrary] files to the browser FS, while
I like the direction of this, especially if we can use default values to not force users to type excessive characters "for nothing" while maintaining clarity. A good example of this is what @antocuni mentioned with
There are a couple of cases/questions that I want to clarify though:
Unfortunately, when talking about files and transformations, it's not trivial to come up with a solution that works for everyone, but I think it's key for us to provide ways for us and users to extend how fetch works, to support these scenarios and others that we may not have thought of... |
I might have an idea which could solve the majority of use cases described above in an easy and elegant way (famous last words...). So, first of all, the naming: I started by using Some examples:
# fetch a single file
[[fetch]]
from = "http://a.com/data.csv"
# http://a.com/data.csv ==> ./data.csv
# fetch a single file, specify the target filename
[[fetch]]
from = "http://a.com/data.csv?version=1"
to = "/tmp/data.csv"
# http://a.com/data.csv?version=1 ==> /tmp/data.csv
# fetch multiple files from the "default" webserver, save to the default folder
[[fetch]]
files = ["foo/__init__.py", "foo/mod.py"]
# fetch multiple files and put them in a different folder
[[fetch]]
files = ["foo/__init__.py", "foo/mod.py"]
to = "/my/lib/"
# ==> /my/lib/foo/__init__.py
# ==> /my/lib/foo/mod.py
# fetch multiple files from a different base url
[[fetch]]
from = "http://a.com/download/"
files = ["foo/__init__.py", "foo/mod.py"]
to = "/my/lib/"
# http://a.com/download/foo/__init__.py ==> /my/lib/foo/__init__.py
# http://a.com/download/foo/mod.py ==> /my/lib/foo/mod.py
Basically, the semantics would be this, in python pseudocode:
def fetch_one(entry):
    base_url = entry.from or ""
    target = entry.to or "./"
    if entry.files:
        for f in entry.files:
            src = os.path.join(base_url, f)
            dst = os.path.join(target, f)
            wget(src, dst)
    else:
        src = base_url
        dst = target
        wget(src, dst)
I think that the scheme above plays very well with plugins. Imagine having a plugin which registers a "github fetch provider":
[[fetch]]
from = "github:antocuni/dotfiles"
to = "/tmp/" |
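For reference, the `from` / `to` / `files` semantics from the pseudocode in this comment can be written as runnable Python that only plans the transfers instead of downloading anything. This is a sketch, not PyScript's implementation: `plan_fetch` is a hypothetical name, entries are plain dicts (since `from` is a Python keyword), and the single-file branch mirrors the pseudocode literally, so it does not derive a filename when `to` is omitted.

```python
import posixpath


def plan_fetch(entry):
    """Return (src, dst) pairs for one entry, without downloading anything."""
    base_url = entry.get("from", "")
    target = entry.get("to", "./")
    files = entry.get("files")
    if not files:
        # Single-file case: `from` is the full URL, `to` the destination.
        return [(base_url, target)]
    pairs = []
    for name in files:
        # Join URL parts with "/" explicitly rather than with os.path.join,
        # which would use the platform's separator.
        src = (base_url.rstrip("/") + "/" + name) if base_url else name
        dst = posixpath.join(target, name)
        pairs.append((src, dst))
    return pairs


plan = plan_fetch({
    "from": "http://a.com/download/",
    "files": ["foo/__init__.py", "foo/mod.py"],
    "to": "/my/lib/",
})
```

Separating planning from downloading keeps the path logic easy to test and leaves room for a provider to supply the actual transfer.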
Really like that approach. Wow @antocuni , I really think you are starting to see things as I do 🤣 🤣 🤣 🤣 |
I think there is one small tweak that we can discuss: according to what I proposed above, in the "single file" case, does
This would save the file as
E.g.: [[fetch]]
from = "http://example.com/data.csv"
to = "/tmp/" # ==> /tmp/data.csv
[[fetch]]
from = "http://example.com/data.csv"
to = "/tmp/myfile" # ==> /tmp/myfile
To be honest, I'm undecided whether this is a useful feature or just obscure and error-prone |
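The trailing-slash rule in the two examples above could be sketched as follows. The function name is hypothetical, and raising an error when the URL has no filename component is an assumption made for this sketch.

```python
import posixpath
from urllib.parse import urlsplit


def single_file_destination(source_url, to="./"):
    """Pick the FS path for a single-file fetch under the trailing-slash rule."""
    if to.endswith("/"):
        # `to` names a folder: reuse the last path segment of the URL,
        # ignoring any query string.
        filename = posixpath.basename(urlsplit(source_url).path)
        if not filename:
            # Assumption: a URL without a filename is rejected.
            raise ValueError("cannot derive a filename from " + source_url)
        return posixpath.join(to, filename)
    # `to` names the file itself.
    return to


as_folder = single_file_destination("http://example.com/data.csv", "/tmp/")
as_file = single_file_destination("http://example.com/data.csv", "/tmp/myfile")
```

The rule hinges on a single character, which is the kind of subtlety the comment worries about.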
Very nice @antocuni ! I think this is really excellent. Per your last example in the single file case - my two cents is that it's maybe more trouble than it's worth. I think there's more value in saying "The And re: naming, if I understand right (thank you for the Pseudocode!): |
I'm aligned with @JeffersGlass on consistency here...
+1 on explicit naming when it's helpful! I'm debating on the |
so you are proposing that [[fetch]]
from = "http://google.com/"
to = "/tmp" # ===> /tmp/??? (I'm fine to say that this is not supported).
yes, my idea is precisely I'm fine to use a better name for [[fetch]]
from = "http://a.com/data.csv"
to_folder = "/tmp" # ==> /tmp/data.csv
[[fetch]]
from = "http://a.com/data.csv"
to_file = "/tmp/foo.csv" # ==> /tmp/foo.csv |
In this case users should use
Yeah, very aligned here as well. What's happening?! :D |
What is the idea?
Currently, <py-env> makes use of loadFromFile() to copy a single file at each path in Paths to the same directory the Pyscript code runs in, allowing for simple imports of individual .py files.
However, by breaking up the file structure of the included files, this method breaks recognition of local packages. It also does not allow folders of content to be included en masse, which would be useful for processing data stored across multiple files.
Consider a very silly utility file:
This works:
But this breaks:
Why is this needed
Having the ability to structure local Python code as packages is useful for organizing larger codebases into relevant chunks, and allows for better code organization.
What should happen?
I'm not sure what the syntax for usage would be. It could involve extending the syntax to include a local-packages label or similar. (There's probably a better name for the tag.)
Or, perhaps there's a way to use paths itself more cleverly, where Pyscript can identify at load time which paths represent files and which represent packages/folders.
Additional Context
From an implementation perspective, I recognize this may be difficult to handle cleanly, since we need to be able to individually fetch all the files that make up a given package from wherever they are stored. But even if the user had to enumerate all the files in a given package in their , it would still be useful to allow them to be referenced as a package.