Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow customizing cache location for packages in Node #3967

Merged
merged 8 commits into from Jun 29, 2023

Conversation

netroy
Copy link
Contributor

@netroy netroy commented Jun 28, 2023

This change allows users to customize where the .whl files are downloaded to.
This is important in the context of docker images, but also in cases where the node_modules folder isn't writable once the application (that uses pyodide) is shipped.

This change is backward-compatible, as installURL still defaults to indexURL.

This fixes #3950

@rth
Copy link
Member

rth commented Jun 28, 2023

Thanks for proposing this @netroy !

I wasn't part of the discussion in the parent issue, but how about calling this cache_dir, node_cache_dir or install_cache_dir? In installURL for me URL indicates a remote resource, while it doesn't really make sense to for us install on a remote resource. Also IMO in the context of Pyodide, "install" inside its virtual file system, which is not the case here. The reason we are putting those files there is to avoid re-downloading them as far as I remember, which is more related to caching.

Also BTW micropip still doesn't cache package download in node, and we could re-use this variable there.

@netroy
Copy link
Contributor Author

netroy commented Jun 28, 2023

install_cache_dir sounds like a good name. but, should I rename it to install_cache_dir or installCacheDir (to use consistent camelCase) ?

@rth
Copy link
Member

rth commented Jun 28, 2023

Yes, sure installCacheDir would be better.

Also a bit lower in the code there is,

  // If we are IN_NODE, download the package from the cdn, then stash it into
  // the node_modules directory for future use.

so shouldn't we also download it into this new folder? Another thing is we should probably create this folder if it doesn't exist?

@rth
Copy link
Member

rth commented Jun 28, 2023

hmm, or maybe packageCacheDir? Naming is hard.

We can also wait for a third opinion about the name.

@netroy
Copy link
Contributor Author

netroy commented Jun 28, 2023

so shouldn't we also download it into this new folder?
👍🏽

Another thing is we should probably create this folder if it doesn't exist?
👍🏽

hmm, or maybe packageCacheDir? Naming is hard.

naming is indeed hard, but packageCacheDir sounds great.

@rth
Copy link
Member

rth commented Jun 28, 2023

Let's go with it then.

@netroy
Copy link
Contributor Author

netroy commented Jun 28, 2023

Done ✅

Also a bit lower in the code there is,

  // If we are IN_NODE, download the package from the cdn, then stash it into
  // the node_modules directory for future use.

so shouldn't we also download it into this new folder?

From what I understand, the package is first downloaded into an Uint8Array (in memory), and then written to the uri, which should be a path inside the packageCacheDir already. so the file will be downloaded into this folder.

But this is only the case when channel === DEFAULT_CHANNEL. I'm not familiar enough with the codebase to know what to do for other values of channel.

@hoodmane
Copy link
Member

The other case is that the user explicitly downloaded from a URL, which can probably be left alone.

@hoodmane
Copy link
Member

The behavior of this patch is a bit confusing if someone passes packageCacheDir when you are not in Node. In that case this allows you to load the packages from a separate URL than pyodide core is loaded from. But it doesn't really have any cache-related functionality in that case. Maybe we should start by ignoring packageCacheDir if not IN_NODE.

@rth
Copy link
Member

rth commented Jun 29, 2023

Maybe we should start by ignoring packageCacheDir if not IN_NODE.

+1 and also update the docstring accordingly to mention caching and that it's used only in node.

@netroy
Copy link
Contributor Author

netroy commented Jun 29, 2023

@rth @hoodmane Can you please review again 🙏🏽 ?

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @netroy !

src/js/load-package.ts Outdated Show resolved Hide resolved
src/js/load-package.ts Outdated Show resolved Hide resolved
src/js/pyodide.ts Outdated Show resolved Hide resolved
src/js/pyodide.ts Outdated Show resolved Hide resolved
@hoodmane
Copy link
Member

As a followup it would be nice to allow customization of the CDN url that we download the packages from. I think ideally we should have three variables:

  • indexURL
  • cdnURL
  • cachePath

cdnURL default:

  • if indexURL is a URL, then indexURL
  • if indexURL is a filesystem path, defaults to jsdelivr

cachePath default:

  • if indexURL is a URL, then undefined
  • if indexURL is a filesystem path, then indexURL

Then downloadPackage would work as follows

  • if cachePath is not undefined:
    • if package exists in the file system under cachePath, load it from the file system and return it
  • download the package from cdnURL
  • if cachePath is not undefined:
    • store the package into cachePath

netroy and others added 2 commits June 29, 2023 20:55
Co-authored-by: Hood Chatham <roberthoodchatham@gmail.com>
@netroy
Copy link
Contributor Author

netroy commented Jun 29, 2023

@hoodmane Thanks for the review. I applied your suggestions.

@hoodmane
Copy link
Member

Thanks again @netroy!

@rth
Copy link
Member

rth commented Jun 29, 2023

to allow customization of the CDN url that we download the packages from

IMO a more general solution is to use absolute URLs in the pyodide-lock.json (#3943) which also addresses the problem of having files in different locations (e.g. some on JsDelivr some on PyPI). That's a blocker if we want to be able to use custom pyodide-lock.json. And once it works it's a question of whether having an extra cdnURL is still useful, as overall it would make the logic more complex.

@hoodmane
Copy link
Member

a more general solution is to use absolute URLs

I think absolute URLs are better when you know exactly where something is going to be hosted, but if we want a pyodide distro to be relocatable then we need to use relative urls.

I think it would be good to have some postprocessor in the lockfile CLI that resolves relative urls in the lockfile relative to some base. But this would be an extra step that would mean that the distro has to be served from that particular location. We could deploy a lockfile like this on npm and then if someone wants to load packages from a custom CDN then they would just have to grab a custom lockfile. That makes a lot of sense.

But obviously pyodide-<version>.tar.bz2 has to be relocatable, so we have to have some consideration of what happens when people use that version.

I think the advantage of what I described is that it unifies the logic between in node and not in node. But maybe there is another way to do this.

@hoodmane hoodmane changed the title Allow customizing installation location for packages Allow customizing cache location for packages in Node Jun 29, 2023
@hoodmane hoodmane merged commit f7562a3 into pyodide:main Jun 29, 2023
16 of 17 checks passed
@netroy netroy deleted the config.installUrl branch June 29, 2023 19:36
@netroy
Copy link
Contributor Author

netroy commented Jun 29, 2023

@hoodmane @rth Thanks a lot for your time 🙏🏽

Out of curiosity, is there a 0.24.x release planned in the coming week?
We are about to release a major version of our application in the next week, where users can run their own python code via pyodide, and we would really appreciate if we could avoid shipping a docker image where the users can't use modules.

If there is no release planned, we could try to setup another package on npm, and our own CDN with a custom build, as we don't really want to rush you folks with a release, but we'd also like to avoid shipping with this limitation of not being able to use modules.

@rth
Copy link
Member

rth commented Jun 29, 2023

but if we want a pyodide distro to be relocatable then we need to use relative URLs.

OK but in what case would one want to be relocatable and not put both the core Pyodide files and the packages into the same folder, then set indexUrl ?

I think the advantage of what I described is that it unifies the logic between in node and not in node. But maybe there is another way to do this.

For node, where Pyodide core is npm installed, I agree it would make sense to be able to specify the cdnUrl but we have to think how that would interact with absolute URLs in the pyodide-lock.json (which I still need ) :)

@rth
Copy link
Member

rth commented Jun 29, 2023

Out of curiosity, is there a 0.24.x release planned in the coming week?

I guess in any case it would be good to do an 0.24.0a1 alpha release with all the changes currently on main. I can look into it next week, unless someone else does it.

@hoodmane
Copy link
Member

It might be better to make another backport of this and @gabrielfougeron's reenabled package. But actually I guess the alpha won't have that many new features at this point. It feels like it should be a fat release but actually most of the stuff that I've been working on either hasn't landed or is to external repos or both...

One thing that I want to get in before the next release is a rework of the streams API. I realized that Emscripten's TTY API is not very good and we can expose something better than our current interface by not using it.

@rth
Copy link
Member

rth commented Jun 29, 2023

It might be better to make another backport

Technically it's a new feature, and not ideal to put it in a patch release in case it does break something that worked before (even if it's unlikely). Though I agree it is unfortunate not to be able e.g. update the npm package more easily.

Yeah, I would say the fact of releasing an alpha doesn't mean a feature freeze;

@hoodmane
Copy link
Member

This seems like a very small, backwards compatible change. The only issue that could happen is if someone was already passing packageCacheDir to loadPyodide even though it does nothing. But that seems like such an unlikely case that I wouldn't be concerned about a breakage there. So I think it's a reasonable backport candidate. But if you are against it we don't have to do it.

I would say the fact of releasing an alpha doesn't mean a feature freeze

Yeah it's never meant that before.

@rth
Copy link
Member

rth commented Jun 29, 2023

No strong opinion about backporting this into a patch release.
More like I would be interested in doing an alpha at some point independently of this request. It would reduce the time people have to wait before using the package they added or feature they contributed to main in general. It's not fully stable, but already better than using the dev CDN endpoint.

@hoodmane
Copy link
Member

I'm in favor of making an alpha release, but we could consider a backport in addition. Or another thing to consider: I think the current set of features are relatively low impact but I'm planning to add messy stuff soon. Maybe we should make an alpha release now and also branch and plan to make a release with just the current features. Then whenever we get around to it we can make another release.

@rth
Copy link
Member

rth commented Jun 29, 2023

Maybe we should make an alpha release now and also branch

Sounds like a plan. Some of the changes around JSPI sound indeed more risky :) Maybe we can wait until next week to branch and see if there is any else small we want to merge by then? Though I guess it can also always be backported later.

@hoodmane
Copy link
Member

Well I want to start merging these JSPI things. How about we make a next branch, anything we want into next release goes into main, and we can rebase next onto main or merge main into next as appropriate.

@hoodmane
Copy link
Member

Maybe I should do the stream changes now so we can consider trying to get that in.

@rth
Copy link
Member

rth commented Jun 29, 2023

Works for me.

@hoodmane
Copy link
Member

I am not sure how to change the branch that a PR targets though...

@hoodmane
Copy link
Member

@hoodmane
Copy link
Member

@ryanking13 we just had a whole discussion about release planning here which you will probably want to read =)

@ryanking13
Copy link
Member

Cool, thanks for the ping @hoodmane. I am +1 with making the 'next' branch and putting risky things into it.

I would like to add my ongoing micropip works including simple API support in the next release so we can experiment with it (e.g. using anaconda.org as an alternative registry)... but I think I don't have much time to work on it until next week. Anyway, it is not a hard blocker so we can release 0.24 if one of you want :)

hoodmane pushed a commit to hoodmane/pyodide that referenced this pull request Jul 1, 2023
This change allows users to customize where wheels are cached in Node.
This is important in the context of docker images and other cases where
the `node_modules` folder isn't writable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants