Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[optimize] early-ignore dependencies not matching chosen markers #4956

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

maciejskorski
Copy link

@maciejskorski maciejskorski commented Dec 30, 2021

Pull Request Check List

Resolves:

#4952
#4670
#6118
#5121

  • Added tests for changed code.
  • Updated documentation for changed code.

@maciejskorski
Copy link
Author

maciejskorski commented Dec 30, 2021

poetry as of now is slow due to exponentially many configurations caused by conditional logic in downstream packages.
In many cases, this can be optimised by pre-filtering them early to only those matching the current or targeted system.

This is helpful for many use-cases, where we can narrow the scope environment markers in advance - for example, the code will be developed, tested and deployed only on linux with python 3.8.

Consider this example on the environment with linux and python 3.8

[tool.poetry]
name = "test"
version = "0.1.0"
description = "demonstrates that poetry triggers overrides to easily (can be early pruned?)"
authors = ["Maciej Skorski <maciej.skorski@gmail.com>"]

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    {version="1.3.5",markers="python_version=='3.6'"},
    {version="1.3.2",markers="python_version=='3.8' and sys_platform=='win32'"},
    {version="1.3.1",markers="python_version=='3.8' and sys_platform=='linux'"},
]

As of now poetry prepares several variants which takes 6 overrides.

NOTE: the original proposal used CLI to push constraints. The approach below suggested by @radoering seems better.

This PR extends the pyproject.toml file by the section tool.poetry.target_env

[tool.poetry.target_env]
sys_platform = "linux"
python_version = "3.8"

Then poetry install --dry-run -vv gives the required install without overrides.
See another example with Azure packages

@maciejskorski maciejskorski changed the title adding markers pre-filtering early-ignore dependencies not matching chosen markers Dec 31, 2021
@maciejskorski maciejskorski changed the title early-ignore dependencies not matching chosen markers [optimize] early-ignore dependencies not matching chosen markers Dec 31, 2021
@radoering
Copy link
Member

radoering commented Dec 31, 2021

I played around a bit. On its own it works fine, but I think there are still some things to think about:

  1. Although the option affects locking it is only available for the install command. (If there is already a lock file using the option makes no difference because the lock file is used. The option has only an effect if there is no lock file.) Maybe, the option should be available for both commands (install and lock)? But that's just a detail.
  2. My main concern is that I cannot tell if a lock file has been created with or without a marker filter. Imagine the following situation: There is no lock file. Developer A (on Linux) uses install with a marker filter for sys_platform, thus, creates a lock file only suitable for Linux and commits/pushes the lock file. Developer B (on Windows) pulls and installs implicitly using the lock file not suitable for his system. Probably, there will be some missing dependencies in his virtual environment! In my opinion there has to be an entry in the lock file telling which marker filters have been used. Further, poetry had to check if a lock file is not suitable for the current platform and inform the user.
  3. I am not sure, if the marker filter should be an option of a command at all or if it should be some new setting in the pyproject.toml. Your use case says the code will be developed, tested and deployed only on linux with python 3.8.. Thus, shouldn't these restrictions to Linux and Python 3.8 be defined in the pyproject.toml, actually?
    The python constraint is already specified in the pyproject.toml so there is no need to define a marker filter for the python version (at least for this use case). (Maybe, there are some issues like Ignore dependencies not matching the project's python constraint... #4958 or poetry selects wrong python version when markers present #4959 but I think these can be resolved independently from this PR).
    Maybe, a possibility to specify other environment constraints for the project (e.g. for sys_platform) in the pyproject.toml would be useful. Sounds like a common use case. I am wondering if there is already a PEP or something covering this?

@maciejskorski
Copy link
Author

maciejskorski commented Dec 31, 2021

Thanks for getting back with valuable comments. Definitely got your point regarding 2) and 3).

Note that poetry is already not secured against such cases if downstream packages don't cover environment markers exhaustively (for example, if a package puts only linux markers poetry will match this lock against my windows and will not install anything...). This PR was inspired by use-cases of developers who work on the similar code in similar environments and so can narrow the system combinations. Also, the logic of python constraints is sometimes broken #3444 so this is one more use case.

I updated the PR accordingly. Constraints are moved to a dedicated section in the toml file.

This should mitigate your concerns 2) and 3) I suppose, as the rules would be transparent. In addition:

  • we can extend (in theory) the logic allowing for 'linux or windows' and whatever will be parsable.
  • computing the lock this way would be more flexible and the current-system-agnostic (in poetry's spirit, which builds first all variants into the lock, and compares against the current system once the lock is ready)

@maciejskorski maciejskorski marked this pull request as draft December 31, 2021 19:23
@maciejskorski maciejskorski marked this pull request as ready for review December 31, 2021 20:31
@neersighted
Copy link
Member

I think this is getting a lot closer to a useful/mergable state. Putting the environment constraints into the pyproject is absolutely the right way to go here. There is a bit more work to be done, and I'm thinking this should go into 1.3 instead of 1.2, but I think this will be a useful addition.

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this needs a rebase and adjustment for the current state of master. Additionally, I would like to see the docs rewritten to both be more generalize, drop references to this PR, add link to related topics in the docs, and include more examples. It also needs to document the target_env section in general (with a link to the example usage).

Finally, I'm not convinced the name target_env is a good one for this section. Something like extra_constraints might be more descriptive and intuitive.

@@ -67,6 +67,13 @@ class InstallCommand(InstallerCommand):
flag=False,
multiple=True,
),
option(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is dead and should be removed (given the move to using the pyproject.toml).

@@ -37,6 +37,14 @@ def create_poetry(
if io is None:
io = NullIO()

# redirect to the extended schema (from poetry, not from poetry-core)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be updating the schema in poetry-core as well, even if the implementation is in poetry only. The intention is to drop all schemas from this repo.

@@ -249,7 +256,21 @@ def _do_install(self, local_repo: Repository) -> int:
self._io,
)

ops = solver.solve(use_latest=self._whitelist).calculate_operations()
if (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% on if this is the only way to get at the config, but this smells to me. I'm pretty sure you should be able to get at the Config object from here, or at least create one from a Factory if not.

self._markers_to_filter
):
self.debug(
f"<warning>Removing {dep} due to markers {dep.marker} not matching the target platform</warning>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"<warning>Removing {dep} due to markers {dep.marker} not matching the target platform</warning>"
f"<warning>Removing {dep} due to markers ({dep.marker}) not matching the target environment.</warning>"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted as per your suggestion.

@neersighted neersighted added this to the 1.3 milestone May 24, 2022
@hungbie
Copy link

hungbie commented Jun 2, 2022

+1 for this. We have a big project in our team, each time poetry solves the dependencies it already takes more than 10 minutes (each try). However, we know beforehand the version we want as well as the platform, so retries to have a solution that valid everywhere is redundant and a big waste of resources. Currently in our project we have to retries around 20 times, making the whole process super slow, sometimes can be upto 4 hours.

@dimbleby dimbleby mentioned this pull request Aug 4, 2022
2 tasks
@maciejskorski
Copy link
Author

@neersighted apologizes for the absence—I overlooked the notifications from this topic. If there is still interest in this, I am ready to work on this in upcoming days.

@neersighted
Copy link
Member

@maciejskorski This is definitely something worth revisiting now that we have 1.2 out -- though I must say I have limited bandwidth to help get this over the finish line right now. @radoering or @finswimmer might be good maintainers to work with if they are available...

(sorry for throwing y'all under the bus!)

@neersighted neersighted added this to the 1.3 milestone Sep 6, 2022
@radoering
Copy link
Member

First things first. Let's talk about the representation in pyproject.toml.

Current proposal is something like:

[tool.poetry.target_env]
sys_platform = "linux"
platform_system = Linux"

It might be useful to support multiple platforms. There are at least two alternatives for that:

Lists:

[tool.poetry.target_env]
sys_platform = ["linux", "win32"]
platform_system = ["Linux", "Windows"]

Our (afaik at least partially internal) constraint syntax:

[tool.poetry.target_env]
sys_platform = "linux || win32"
platform_system = "Linux || Windows"

Since our marker handling has improved a lot since this PR was created, an alternative design specifying a PEP 508 compliant environment marker that describes the supported environment, might be possible, e.g.:

[tool.poetry]
target_env = 'sys_platform = "linux" and platform_system = "Linux"'

This approach is more powerful (but also more verbose). You can define conditions that are not possible with the original (and extended) approach, e.g.

target_env = 'sys_platform = "linux" or (sys_platform = "win32 and python_version > "3.7")'

However, I wonder if such use cases even exist. Any thoughts which representation should be chosen?

@dimbleby
Copy link
Contributor

I could believe that there are use cases for wanting to express complex top-level markers eg see #6447 in which a user was battling this

curl -s https://pypi.org/pypi/opencv-python/4.5.5.62/json | jq '.info.requires_dist'
[
  "numpy (>=1.13.3) ; python_version < \"3.7\"",
  "numpy (>=1.21.2) ; python_version >= \"3.10\"",
  "numpy (>=1.21.2) ; python_version >= \"3.6\" and platform_system == \"Darwin\" and platform_machine == \"arm64\"",
  "numpy (>=1.19.3) ; python_version >= \"3.6\" and platform_system == \"Linux\" and platform_machine == \"aarch64\"",
  "numpy (>=1.14.5) ; python_version >= \"3.7\"",
  "numpy (>=1.17.3) ; python_version >= \"3.8\"",
  "numpy (>=1.19.3) ; python_version >= \"3.9\""
]

As for how this gets written in pyproject.toml: markers are a bit ugly but the users who are using this feature are likely to be sophisticated enough that they can take it.

@maciejskorski
Copy link
Author

Since our marker handling has improved a lot since this PR was created, an alternative design specifying a PEP 508 compliant environment marker that describes the supported environment, might be possible, e.g.:

This is something I would be keen to have and implement. However, @radoering, are we absolutely sure that the current marker implementation is mature and correct? Could you advise me on some examples where complex logic (like nested and/or) are tested?

@radoering
Copy link
Member

I'm quite confident in the current implementation. It's possible that very complex expressions are not recognized to be empty, but I think it's good enough for real world use cases. The tests are in poetry-core in test_marker.py, e.g. https://github.com/python-poetry/poetry-core/blob/b0b18230a5eeabe16d0000c67dd40d15a3d8f0e5/tests/version/test_markers.py#L412-L432

@themanifold
Copy link

Just wondering what the status of this was?

@radoering
Copy link
Member

I'd say it is stalled. Maybe, I'll pick this up myself sometime in the future, but at the moment there are many other topics with a higher priority on my list. Thus, nothing to be expected from me here in the next few months.

@radoering radoering removed this from the 1.4 milestone Mar 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants