Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip==23.3 breaking "extras" behavior #2004

Open
mattoberle opened this issue Oct 16, 2023 · 19 comments · May be fixed by #2013
Open

pip==23.3 breaking "extras" behavior #2004

mattoberle opened this issue Oct 16, 2023 · 19 comments · May be fixed by #2013

Comments

@mattoberle
Copy link

Using pip==23.3.x can result in invalid requirements.txt for projects that declare extras.
A change in pip==23.3.0 introduced the "canonicalization" of extras names:

The canonicalization leads to this outcome:

  • requirements.in: sqlalchemy[postgresql_psycopg2binary]
  • requirements.txt: sqlalchemy[postgresql-psycopg2binary]

For many cases, this probably isn't an issue.
psycopg2-binary still makes it into requirements.txt as an individual entry.

For more "niche" cases, the extras still carry some significance.
eg. When using rules_python under the Bazel build system, the extras must properly resolve to build out a dependency graph.

Environment Versions

  1. OS Type: Linux
  2. Python version: 3.10.9
  3. pip version: 23.3.0
  4. pip-tools version: 7.3.0

Steps to replicate

  1. Create a simple requirements.in file.
sqlalchemy[postgresql_psycopg2binary]==1.4.47
  1. Create a virtual environment, install pip-tools and the previous major pip version, run pip-compile.
python -m venv .venv
. ./.venv/bin/activate
pip install pip-tools==7.3.0 pip==22.3.1
pip-compile --no-strip-extras
  1. Observe that the resulting requirement.txt file lists sqlalchemy[postgresql_psycopg2binary] (with an underscore).
greenlet==3.0.0
    # via sqlalchemy
psycopg2-binary==2.9.9
    # via sqlalchemy
sqlalchemy[postgresql_psycopg2binary]==1.4.47
    # via -r requirements.in
  1. On the same virtual environment, upgrade to pip>=23.3, run pip-compile.
pip install pip==23.3.0
pip-compile --no-strip-extras
  1. Observe that the resulting requirement.txt file lists sqlalchemy[postgresql-psycopg2binary] (with a hyphen).
  WARNING: sqlalchemy 1.4.47 does not provide the extra 'postgresql-psycopg2binary'

greenlet==3.0.0
    # via sqlalchemy
psycopg2-binary==2.9.9
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==1.4.47
    # via
    #   -r requirements.in
    #   sqlalchemy

Expected result

The valid input extras should still be valid in the output.

Actual result

The output extras will not resolve.

@mattoberle
Copy link
Author

Just a quick note:

pip==23.3.0 will attempt to look up the canonical extra postgresql-psycopg2binary before falling back to postgresql_psycopg2binary.

The simple answer to this issue may just be that the sqlalchemy extras should list the canonical variants in their package.

@AndydeCleyre
Copy link
Contributor

Oof, in my local test, it comes out as

sqlalchemy[postgresql-psycopg2binary,postgresql_psycopg2binary]==2.0.22

Using the legacy resolver, it's just

sqlalchemy[postgresql_psycopg2binary]==2.0.22

I tried adding a simple

if ireq.extras:
    ireq.extras = set(map(canonicalize_name, ireq.extras))

to the start of format_requirement, but it seems that the str function on a req/ireq still has the old, repeated extra names somewhere....

@AndydeCleyre
Copy link
Contributor

OK this may be sloppy, and I can never remember which ireq/req attributes are guaranteed, or how req affects ireq exactly, but the following addition at the top of format_requirement does seem to iron this out:

    if ireq.req and ireq.req.extras:
        ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
    if ireq.extras:
        ireq.extras = set(map(canonicalize_name, ireq.extras))

I also don't know if the format method is the proper place to do this canonicalization.

@webknjaz
Copy link
Member

@AndydeCleyre I think that the normalization should happen whenever an instance of InstallRequirement or InstallationCandidate is first encountered/constructed. By doing so, all other parts of the depresolver would work with the normalized stuff...

@AndydeCleyre
Copy link
Contributor

So another place we can get at this is in pip_compat.parse_requirements:

diff --git a/piptools/_compat/pip_compat.py b/piptools/_compat/pip_compat.py
index 6409fbc..6972f03 100644
--- a/piptools/_compat/pip_compat.py
+++ b/piptools/_compat/pip_compat.py
@@ -14,6 +14,7 @@ from pip._internal.network.session import PipSession
 from pip._internal.req import InstallRequirement
 from pip._internal.req import parse_requirements as _parse_requirements
 from pip._internal.req.constructors import install_req_from_parsed_requirement
+from pip._vendor.packaging.utils import canonicalize_name
 from pip._vendor.packaging.version import parse as parse_version
 from pip._vendor.pkg_resources import Requirement
 
@@ -74,7 +75,10 @@ def parse_requirements(
     for parsed_req in _parse_requirements(
         filename, session, finder=finder, options=options, constraint=constraint
     ):
-        yield install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+        ireq = install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+        if ireq.req.extras:
+            ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
+        yield ireq
 
 
 def create_wheel_cache(cache_dir: str, format_control: str | None = None) -> WheelCache:

But things I don't know include:

  • do we need to manipulate the ireq.extras as well here?
  • will this fix the issue when sqlalchemy[postgresql_psycopg2binary] is a dep of something else?
  • do we need to do similar adjustments in all the other places we use install_req_from_* functions?
  • should we converge on the canonicalized extras names, even though this report expresses an expectation for the non-canonicalized form used by the package (sqlalchemy)?

@dgilmanAIDENTIFIED
Copy link

Some extras are getting duplicated. It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x. This is pip 23.3.1 and pip-tools 7.3.0:

$ cat test.in
sentry_sdk[pure_eval]

$ pip-compile --output-file test.txt test.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
#    pip-compile --allow-unsafe --output-file=test.txt test.in
#
asttokens==2.4.1
    # via sentry-sdk
certifi==2023.7.22
    # via sentry-sdk
executing==2.0.0
    # via sentry-sdk
pure-eval==0.2.2
    # via sentry-sdk
sentry-sdk[pure-eval,pure_eval]==1.32.0
    # via -r test.in
six==1.16.0
    # via asttokens
urllib3==2.0.7
    # via sentry-sdk

@webknjaz
Copy link
Member

webknjaz commented Nov 1, 2023

  • do we need to manipulate the ireq.extras as well here?

Maybe? Given it's not possible at some earlier stage...

  • will this fix the issue when sqlalchemy[postgresql_psycopg2binary] is a dep of something else?

Why not? Wouldn't “something else” also go through normalization?

  • do we need to do similar adjustments in all the other places we use install_req_from_* functions?

Probably. I'd expect us to be consistent across the codebase.

  • should we converge on the canonicalized extras names, even though this report expresses an expectation for the non-canonicalized form used by the package (sqlalchemy)?

I'd say yes. Canonicalization is well-defined now and as the time passes, more people will use newer pip versions that do this. At some point, we'll EOL support for pip versions that don't do such conversion and then, we'd just differ for no good reason. Maybe, we'd need to leave the original dep names in the comments, though.

@AndydeCleyre
Copy link
Contributor

Maybe we should shadow all the install_req_from_* functions, or pursue getting the upstream functions to canonicalize, in pip?

@AndydeCleyre AndydeCleyre linked a pull request Nov 2, 2023 that will close this issue
4 tasks
@webknjaz webknjaz linked a pull request Nov 2, 2023 that will close this issue
4 tasks
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Nov 2, 2023
@webknjaz
Copy link
Member

webknjaz commented Nov 3, 2023

Does pip not normalize in all places? I think it may make sense there too.

@AndydeCleyre
Copy link
Contributor

Does pip not normalize in all places? I think it may make sense there too.

Yeah we're getting non-normalized extras via pip._internal.req.constructors.install_req_from_parsed_requirement, pip._internal.req.constructors.parse_req_from_line, and the main InstallRequirement constructor, at least.

AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Nov 5, 2023
@tboddyspargo
Copy link

tboddyspargo commented Dec 1, 2023

It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x.

I just wanted to share that in my experience, the duplicate extras DO impact the resolution, although it's possible that I'm misattributing the behavior. Some clarifications would help me understand if I should report an issue with pip instead.

In my case, having pyhive[hive_pure_sasl,presto]==0.7.0 in my requirements.in file using pip==23.3.1 and pip-tools==7.3.0 with --resolver=legacy results in the hive_pure_sasl requirements thrift-sasl and pure-sasl being removed/omitted from the resulting lockfile, but including the mash-up of pyhive[hive-pure-sasl,hive_pure_sasl,presto]==0.7.0.

It's odd that the expected extra dependencies ARE included when using --resolver=backtracking.

One additional tidbit: To work around this, we are explicitly listing thrift-sasl in requirements.in, which surprisingly results in pip-compile correctly including a via pyhive comment. This is despite the fact that without explicitly listing thrift-sasl as a direct dependency, pip-compile would have removed its entry entirely.

Do the via comments use a different mechanism to determine which packages are required by which than actual dependency resolution does?

EDIT: Disregard the last. I was using the backtracking resolver at the time, which correctly resolves the dependencies, so naturally the comments were correct.

@tboddyspargo
Copy link

tboddyspargo commented Dec 3, 2023

One other thing that seems related to this:

It's odd, and I think unexpected, that a package listed in the via comments of itself when any extras are provided, a version constraint is present, and the backtracking resolver is used. For example, when a requirement is listed as sqlalchemy[postgresql-psycopg2binary]==2.0.23, via sqlalchemy will show up alongside the source requirements file in the comments. When using the legacy resolver, packages are not listed in the via comments of their own package specifier.

I'm also happy to make that its own issue, if you feel it is distinct enough. Thanks!

EDIT: Added the extra criteria that a version constraint must be used in order to provoke the unexpected behavior.

AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Dec 3, 2023
@AndydeCleyre
Copy link
Contributor

@tboddyspargo I don't think I can reproduce the self-via issue.

$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via -r -
typing-extensions==4.8.0
    # via sqlalchemy

And with my PR #2013:

$ pip-compile --version
pip-compile, version 7.3.1.dev65
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via -r -
typing-extensions==4.8.0
    # via sqlalchemy

@tboddyspargo
Copy link

Thanks for pointing that out, @AndydeCleyre! Your counter examples helped me realize that there's another contributing factor. It seems that the redundant via output may only happen if there's a constraint in the original requirement specification.

$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]==2.0.23' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via
    #   -r -
    #   sqlalchemy
typing-extensions==4.8.0
    # via sqlalchemy

Toggling the presence of ==2.0.23 (as well as other styles of constraints) seems to also toggle the presence of the extra sqlalchemy in the via area of the sqlalchemy lines. However, if you use --resolver=legacy the redundant via entry never seems to appear.

@AndydeCleyre
Copy link
Contributor

Thanks! I can now reproduce using the main branch. I can't confidently say that's fixed by the mentioned PR, because I didn't set out to fix that problem, but I am not seeing it there.

AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Jan 8, 2024
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Jan 22, 2024
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Feb 16, 2024
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Feb 17, 2024
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Feb 22, 2024
@dgilmanAIDENTIFIED
Copy link

Apologies if this is implied by the ticket elsewhere, but pip==24.0 and pip-tools==7.4.0 emits a warning I don't think I saw before:

WARNING: sentry-sdk 1.40.5 does not provide the extra 'pure-eval'

The name of the extra in the package and in my requirements.in is pure_eval with an underscore.

@AndydeCleyre
Copy link
Contributor

@dgilmanAIDENTIFIED

I can't reproduce that here on Linux with Python 3.11, with the versions you specified.

Can you test if it still happens with #2013 ?

@dgilmanAIDENTIFIED
Copy link

Testing it out with that branch I don't get the warning. My output is:

root@3d945b95eb1f:/# pip-compile --output-file foo.txt foo.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile --allow-unsafe --output-file=foo.txt foo.in
#
certifi==2024.2.2
    # via sentry-sdk
sentry-sdk[pure-eval]==1.40.5
    # via -r foo.in
urllib3==2.2.1
    # via sentry-sdk

Note that the underscore in pure_eval is replaced by a dash. I'd also expect the packages pure_eval, asttokens and executing to be pinned in the file but they aren't in there at all.

@AndydeCleyre
Copy link
Contributor

Note that the underscore in pure_eval is replaced by a dash.

This is currently intentional.

I'd also expect the packages pure_eval, asttokens and executing to be pinned in the file but they aren't in there at all.

Uh oh. Thanks!

AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Mar 3, 2024
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants