Consider --config-settings in wheel cache keys #11164

Open
1 task done
sbidoul opened this issue Jun 5, 2022 · 16 comments
Labels
C: cache Dealing with cache and files in it state: awaiting PR Feature discussed, PR is needed type: feature request Request for a new feature

Comments

@sbidoul
Member

sbidoul commented Jun 5, 2022

What's the problem this feature will solve?

Wheel cache entries are currently keyed by source artifact URL.

However, successive builds of the same artifact may yield different results, due to environmental parameters that we can't possibly know about.

There are some parameters we do know about, though: --config-settings

Describe the solution you'd like

Expand the wheel cache's _get_cache_path_parts to take config settings into account.
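
As a rough illustration of the idea, the sketch below folds config settings into a hashed cache key. It is a standalone approximation, not pip's actual `_get_cache_path_parts`: the function name `cache_path_parts`, its parameters, and the choice of SHA-224 over a canonical JSON dump are all assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List, Optional, Union

# Config settings as a mapping of key to one or more values (assumed shape).
ConfigSettings = Dict[str, Union[str, List[str]]]


def cache_path_parts(
    url: str, config_settings: Optional[ConfigSettings] = None
) -> List[str]:
    """Return directory parts for a wheel cache entry (hypothetical helper)."""
    key_parts: Dict[str, object] = {"url": url}
    # Only add the settings when present, so builds without any settings
    # keep the same key they would have under a URL-only scheme.
    if config_settings:
        key_parts["config_settings"] = config_settings
    # A canonical JSON dump makes the key independent of dict ordering.
    serialized = json.dumps(
        key_parts, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    digest = hashlib.sha224(serialized.encode("ascii")).hexdigest()
    # Split the digest into nested directories to avoid one huge flat directory.
    return [digest[:2], digest[2:4], digest[4:6], digest[6:]]
```

With a key like this, two builds of the same source URL with different settings would land in different cache directories, while repeated builds with identical settings would still share an entry.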

Alternative Solutions

Ask users to use different cache directories or disable caching when they use different config settings that influence the built wheels.

Additional context

#11126 (comment)


@sbidoul sbidoul added type: feature request Request for a new feature S: needs triage Issues/PRs that need to be triaged C: cache Dealing with cache and files in it and removed S: needs triage Issues/PRs that need to be triaged labels Jun 5, 2022
@uranusjr
Member

uranusjr commented Jun 6, 2022

IIRC we do disable the cache if --install-option etc. is passed, but that’s because we have no way to cache setup.py install. With wheels, a config-sensitive cache makes sense.

@rgommers

rgommers commented Jan 7, 2023

The change proposed here seems like a minor band aid for a much larger problem. In general, pip has way too little information to be able to cache anything. Compilers used, dependencies found or not and their versions, environment variables yes/no, --config-settings, and probably more such variations all result in different wheels.

The caching cannot possibly be reliable, so caching wheels for builds from source is best completely disabled imho, and only used for local caching of wheels retrieved from a package index.

@sbidoul
Member Author

sbidoul commented Jan 7, 2023

@rgommers I agree that pip lacks the information to cache 100% reliably. But it's useful in a lot of situations where no compiler is involved or repeated installs are made on the same platform. --cache-dir can be used to have different caches for more advanced use cases.

@rgommers

rgommers commented Jan 7, 2023

But it's useful in a lot of situations where no compiler is involved

That may be worth checking for then, and only caching pure wheels? It's in the wheel's metadata (the Root-Is-Purelib field of the WHEEL file), so should not be difficult to check.

For compiled code, there's too much variation. I've also regularly seen users whose initial build was broken (e.g., a successful build but an error on importing some module) get confused because the broken wheel was cached, so suggested changes to the build setup had no effect.
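
For reference, a purity check along these lines might look like the sketch below. It assumes the flag is read from the `Root-Is-Purelib` field of the `.dist-info/WHEEL` file inside the archive; `is_pure_wheel` is a hypothetical helper name, not a pip function.

```python
import zipfile
from email.parser import Parser


def is_pure_wheel(wheel_file) -> bool:
    """Return True if the wheel declares Root-Is-Purelib: true.

    `wheel_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_file) as zf:
        # A wheel is expected to contain exactly one <name>.dist-info/WHEEL file.
        candidates = [n for n in zf.namelist() if n.endswith(".dist-info/WHEEL")]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/WHEEL file")
        text = zf.read(candidates[0]).decode("utf-8")
    # The WHEEL file uses RFC 822 style "Key: value" headers.
    headers = Parser().parsestr(text)
    return (headers.get("Root-Is-Purelib") or "").strip().lower() == "true"
```

A cache could then decide to store only wheels for which this returns True, at the cost of rebuilding every compiled package on each install.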

@sbidoul
Member Author

sbidoul commented Feb 26, 2023

only caching pure wheels

I'm -1 on that as there is a broad middle ground where the variations are adequately captured in wheel tags. A typical example in my daily $work is psycopg2 where caching is totally valid and helpful.

@rgommers

@sbidoul I wouldn't call it "totally valid", I think what you want to say is "it works well enough in practice for me". It's easy to break, in multiple ways. Not just things like choice of compilers, but it has flags (see its setup.py) to build with/without OpenSSL or to statically link the PostgreSQL library. So if you use any of that locally, the cached wheel for that version is going to clash with what's on PyPI. And Pip won't know.

I think it's safe to say that technically the only thing Pip can actually reliably cache is wheels downloaded from PyPI. Everything else is technically unreliable, and the proposal in this PR just tweaks how that unreliability works in practice a bit. If you want to go ahead and distinguish --config-settings, that is an improvement.

Pip already has the knobs to disable caching. The problem is that the users won't know about that. A better default that still preserves the main benefits of caching would probably be to still cache for sdists downloaded from PyPI, but not cache wheels built from local sources.

@sbidoul
Member Author

sbidoul commented Feb 26, 2023

I am aware of all the caveats, as I wrote in the OP. I guess what is considered the best default will vary across user populations.

Perhaps we should log more prominently that a cached wheel is used? And make pip cache remove or --cache-dir more visible?

@rgommers

Perhaps we should log more prominently that a cached wheel is used? And make pip cache remove or --cache-dir more visible?

That sounds like a good idea to me.

@ferdnyc

ferdnyc commented May 13, 2024

In the interest of getting some traction on this: @pfmoore, in #11126 (comment), when @sbidoul proposed capturing --config-settings in the wheel cache keys, you wrote:

That might be worthwhile but I’d be reluctant to include the extra complexity until there’s more sign that backends will actually use the config settings.

Can we assume that sign has well and truly come? --config-settings seems to be getting more use than I would've expected, if anything, so caching builds without taking it into account is becoming a pain point.

@ferdnyc

ferdnyc commented May 13, 2024

Perhaps we should log more prominently that a cached wheel is used? And make pip cache remove or --cache-dir more visible?

@sbidoul I wonder, is there any possibility of making it possible (perhaps via a pyproject.toml flag) for a package to opt itself out of build caching?

That would give package authors a safety valve, not only for --config-settings (tho I assume that'll be fixed imminently), but for any future dimensions of configuration or build uniqueness that make packages cache-incompatible in unexpected ways.

I suppose it's possible such a flag could be abused, but... ¯\_(ツ)_/¯

@pfmoore
Member

pfmoore commented May 13, 2024

Can we assume that sign has well and truly come?

Yes, I think we can see config_settings getting used now. Not that it makes much difference here - I never expressed reservations about this proposal (and on the proposal where I did, it was only one factor that needed to be considered).

@ferdnyc

ferdnyc commented Jul 1, 2024

For the record, the simple reproducer from my report in #12672, which may be useful for anyone wishing to look into the problem:

How to Reproduce

Install two copies of pygobject-stubs, ostensibly configured differently

# Install 1
python3 -m venv venv_first
. ./venv_first/bin/activate
pip install pygobject-stubs --config-settings=config=Gtk3,Gdk3

# Install 2
deactivate
python3 -m venv venv_second
. ./venv_second/bin/activate
pip install pygobject-stubs --config-settings=config=Gtk4,Gdk4

# Verify the stubs versioning...
diff venv_first/lib/python3.*/site-packages/gi-stubs/repository/Gtk.pyi \
  venv_second/lib/python3.*/site-packages/gi-stubs/repository/Gtk.pyi

There should be differences between the two files, because one should have been built with Gtk3 stubs, the other with Gtk4 stubs. However, they will be the same, because the same cached build of the package will have been installed in both venvs.

Replacing both pip install commands with pip install --no-cache-dir, however, will successfully install differently-configured builds of pygobject-stubs.


And as I noted there, while --config-settings can be used in a requirements.txt file to attach configuration options to individual packages, --no-cache-dir is global and cannot be used in requirements.txt. This means that the user has to be instructed to specify --no-cache-dir on the command line (as the pygobject-stubs README does). Including it will disable all caching for the pip transaction, making a command like pip install --no-cache-dir -r requirements.txt a much heavier operation than it otherwise would be.

This makes it extremely difficult (impossible, in practice) for projects which use pygobject-stubs to correctly include it in their dev dependencies, despite the support for supplying --config-settings= arguments that would seemingly enable the dependency to be configured properly.

@sbidoul sbidoul added the state: awaiting PR Feature discussed, PR is needed label Jul 1, 2024
@sbidoul
Member Author

sbidoul commented Jul 1, 2024

I marked this issue as awaiting PR as I think we want to do this.

Until a PR is submitted and reviewed, there is no other solution than using --no-cache-dir or playing with different caches (with --cache-dir).
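
One way to automate the separate-caches workaround is sketched below: derive a cache directory name from the config settings and pass it via `--cache-dir`. The helper `pip_install_args` and the `config-<hash>` directory naming are hypothetical, made up for this example; only the `--cache-dir` and `--config-settings` flags come from pip itself.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def pip_install_args(
    package: str, config_settings: Dict[str, str], base_cache_dir: str
) -> List[str]:
    """Build a pip command that uses one cache directory per set of settings."""
    # Sort so the same settings always map to the same directory.
    pairs = [f"{key}={value}" for key, value in sorted(config_settings.items())]
    suffix = hashlib.sha256("\n".join(pairs).encode("utf-8")).hexdigest()[:16]
    cache_dir = Path(base_cache_dir) / f"config-{suffix}"
    args = ["python", "-m", "pip", "install", "--cache-dir", str(cache_dir)]
    for pair in pairs:
        args += ["--config-settings", pair]
    args.append(package)
    return args
```

The resulting list can be handed to `subprocess.run`. Note that `--cache-dir` applies to the whole pip invocation, so downloads are not shared between the per-config directories either.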

@sbidoul
Member Author

sbidoul commented Sep 19, 2024

Another workaround is pip cache remove <package> ; pip install <package> --no-binary <package>.

@notatallshaw
Member

FWIW, uv has implemented cache invalidation on --config-settings if that's an important use case for someone: astral-sh/uv#7139

@ferdnyc

ferdnyc commented Sep 19, 2024

^ Actually, even better, according to the PR summary:

If --config-settings are provided, we cache the built wheels under one more subdirectory.

We don't invalidate the actual source (i.e., trigger a re-download) or metadata, though -- those can be reused even when --config-settings change.
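
The layout described in that summary can be pictured roughly as below. This is only a sketch of the idea, not uv's actual cache layout: the `sdists` directory name, the helper names, and the truncated SHA-256 digests are all assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def _digest(text: str) -> str:
    """Short, stable directory name derived from arbitrary text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def source_dir(cache_root: Path, url: str) -> Path:
    """Directory for the downloaded source and metadata, keyed by URL only."""
    return cache_root / "sdists" / _digest(url)


def wheel_dir(
    cache_root: Path, url: str, config_settings: Optional[Dict[str, str]] = None
) -> Path:
    """Directory for built wheels; settings add one more subdirectory level."""
    base = source_dir(cache_root, url)
    if not config_settings:
        return base
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(config_settings.items()))
    return base / _digest(canonical)
```

Because the source directory depends only on the URL, changing the settings selects a different wheel subdirectory without forcing a re-download.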


6 participants