
feat: solve groups for conda #783

Merged
merged 9 commits into prefix-dev:main on Feb 7, 2024

Conversation

baszalmstra
Contributor

This adds the ability to use solve-groups for conda packages. Pypi packages are not yet solved in a solve-group.

I also really need to clean up the environment.rs file, but I'll do that later.
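As a quick illustration, a sketch of how this might be used in a `pixi.toml` (feature and group names here are hypothetical, not from this PR):

```toml
[feature.test.dependencies]
pytest = "*"

[feature.lint.dependencies]
ruff = "*"

[environments]
# Both environments share the "dev" solve group, so their common
# conda dependencies resolve to identical versions in the lock file.
test = { features = ["test"], solve-group = "dev" }
lint = { features = ["lint"], solve-group = "dev" }
```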

Contributor

@ruben-arts ruben-arts left a comment


Works really well already!


# The default environment does not specify a solve-group, which means the python version does
# not have an upper bound.
default = ["min_py38"]
Contributor

I have two findings:

  1. As we don't define "default" features in the environments, we should allow defining a solve group for the default environment. The current de-serialization requires a features definition.
[environments]
default = {solve-group = "test"}
test = { features = ["test"], solve-group = "test" }
  2. We should consider what to do with channels. In the multi-machine example we have different channels for different environments, but I could see a situation where you still want to use solve groups to align the versions.
[project]
name = "multi-machine"
description = "A mock project that does ML stuff"
channels = ["conda-forge", "pytorch"]
# All platforms supported by the project; features take the intersection of the platforms defined here.
platforms = ["win-64", "linux-64", "osx-64", "osx-arm64"]

[dependencies]
python = "3.11.*"
pytorch = {version = ">=2.0.1", channel = "pytorch"}
torchvision = {version = ">=0.15", channel = "pytorch"}
polars = ">=0.20,<0.21"
matplotlib-base = ">=3.8.2,<3.9"
ipykernel = ">=6.28.0,<6.29"

[feature.cuda]
platforms = ["win-64", "linux-64"]
channels = ["nvidia", {channel = "pytorch", priority = -1}]
system-requirements = {cuda = "12.1"}

[feature.cuda.dependencies]
pytorch-cuda = {version = "12.1.*", channel = "pytorch"}

[feature.mlx]
platforms = ["osx-arm64"]
system-requirements = {macos = "13.3"}

[feature.mlx.dependencies]
mlx = "*"

[environments]
run = {solve-group = "main"}
cuda = {features = ["cuda"], solve-group = "main"}
mlx = { features = ["mlx"], solve-group = "main"}

Contributor

  1. As we don't define "default" features in the environments, we should allow defining a solve group for the default environment. The current de-serialization requires a features definition.
[environments]
default = {solve-group = "test"}
test = { features = ["test"], solve-group = "test" }

Just bikeshedding here, please feel free to ignore:

One (backwards-incompatible) option would be to define an implicit "default" solve group in the same way there are implicit default features. This effectively opts all environments in to the same solve group unless they specify otherwise. This is what conda-lock seems to do for extras and dependency groups, so there's some precedent for it working in practice. In their case, they deal with the case of mutually-exclusive features with a --filter-extras CLI flag, whereas in Pixi you'd just deal with it by having envs explicitly choose a non-default solve group.

Example 1: some web app:

Here, most dependencies required by the app would go in the "default" feature, and the default solve group would ensure these lock to the exact same set of packages across CI, dev, and prod environments -- what you deploy is what you've tested/developed with. In this case, failing on incompatible dependencies is likely desirable behavior compared to finding a solution where package versions differ across environments.

[environments]
ci = ["test"]
dev = ["test", "local-dev-server"]
prod = ["cloud-database-stuff"]

# Expands to:
# default = { features = ["default"], solve-group = "default" }
# ci = { features = ["default", "test"], solve-group = "default" }
# dev = { features = ["default", "test", "local-dev-server"], solve-group = "default" }
# prod = { features = ["default", "cloud-database-stuff"], solve-group = "default" }

Example 2: test matrix:

Here, a solution that satisfies all features is obviously not possible. This would now require explicitly opting out of the default solve group for most environments:

[environments]
pl017 = { features = ["pl017", "py39", "test"], solve-group = "pl017" }
pl018 = { features = ["pl018", "py39", "test"], solve-group = "pl018" }
pl019 = { features = ["pl019", "py39", "test"], solve-group = "pl019" }
pl020 = { features = ["pl020", "py39", "test"], solve-group = "pl020" }
py39 = { features = ["py39", "test"], solve-group = "py39" }
py310 = { features = ["py310", "test"], solve-group = "py310" }
py311 = { features = ["py311", "test"], solve-group = "py311" }
py312 = { features = ["py312", "test"], solve-group = "py312" }

(Technically one could omit the solve group for one of these and have it belong to the default, but it seems preferable to leave the default solve group as unrestricted as possible in this design to maximize the chances of it solving if combined with development dependencies, etc.)


I'd say that whether the above is desirable or not depends on

  1. which use cases are likely to be more common,
  2. what users are likely to assume about the default behavior if solve-group isn't specified, and
  3. what would be the "safest" path if users' assumptions are wrong and they don't realize it.

My personal leaning is that "try to solve everything together unless asked" may lead to fewer surprises than "solve all environments separately unless asked", but that's based on the assumption that a worst-case failure mode of "lock older versions of dependencies everywhere because newer ones are incompatible in one environment" is better than "get different dependency versions which may have subtle incompatibilities which differ by environment".

Full disclosure: not a professional developer here, I mostly write research code... so take my thoughts with a grain of salt and don't feel any pressure to reply if the above seems like a bad idea =)

Contributor

I'm in favor of solving all environments separately unless asked.
This will also lead to fewer solver errors (when there are incompatible features), and it stays backwards compatible with 0.13.0. My gut feeling tells me the "matrix" use case would be a bit more common.
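To illustrate the "separately unless asked" behavior with a hypothetical config (feature and group names are made up for this sketch): environments without a `solve-group` key are solved independently, and alignment is opt-in:

```toml
[environments]
# No solve-group: each environment is solved on its own, so a shared
# dependency may lock to different versions in each.
py39 = { features = ["py39", "test"] }
py310 = { features = ["py310", "test"] }

# Opt in to shared resolution only where alignment matters:
dev = { features = ["py310", "test", "lint"], solve-group = "aligned" }
prod = { features = ["py310"], solve-group = "aligned" }
```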

Contributor

Hey @msegado, thanks for the write-up. I see your point, but I personally agree with @pavelzw: I think we should be flexible by default and let the user make the solve stricter if their use case requires it. This way, users go to the documentation because they want an extra feature, instead of getting an error because their solve doesn't work.

@pavelzw
Contributor

pavelzw commented Feb 6, 2024

Tested it out, works great for me :)

@ruben-arts ruben-arts merged commit a9f1f61 into prefix-dev:main Feb 7, 2024
9 checks passed
@pavelzw
Contributor

pavelzw commented Feb 7, 2024

The obvious question: when is the next release planned? 😄

@ruben-arts
Contributor

Seeing the work we still need to get in, that would be next week. This still needs to be fixed for the pypi dependencies.

@pavelzw
Contributor

pavelzw commented Feb 7, 2024

Alright, keep up the good work 🫡

@deltamarnix

I have a question about this. How can we decide the best way to set up the solve-groups, for example in a test matrix? Would one choose to split per python version, or per completeness of a feature set? In the following example I decided to group based on python version.

[environments]
full-py39 = { features = ["py39", "io", "extra", "dev", "test", "doc", "examples"], solve-group = "py39" }
full-py310 = { features = ["py310", "io", "extra", "dev", "test", "doc", "examples"], solve-group = "py310" }
full-py311 = { features = ["py311", "io", "extra", "dev", "test", "doc", "examples"], solve-group = "py311" }
slim-py39 = { features = ["py39", "io", "extra", "examples"], solve-group = "py39" }
slim-py310 = { features = ["py310", "io", "extra", "examples"], solve-group = "py310" }
slim-py311 = { features = ["py311", "io", "extra", "examples"], solve-group = "py311" }
min-py39 = { features = ["py39"], solve-group = "py39" }
min-py310 = { features = ["py310"], solve-group = "py310" }
min-py311 = { features = ["py311"], solve-group = "py311" }

@pavelzw
Contributor

pavelzw commented Feb 23, 2024

In your specific example I would suggest per python version.

Per feature set would not make sense, since you would get conflicting dependencies between full-py39 and full-py310.
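To sketch why (group name hypothetical): if the full environments shared one solve group, that group would have to satisfy both python constraints at once:

```toml
# Hypothetical grouping per feature set instead of per python version:
[environments]
full-py39 = { features = ["py39", "io", "test"], solve-group = "full" }
full-py310 = { features = ["py310", "io", "test"], solve-group = "full" }
# The "full" solve group would need python 3.9.* and 3.10.* simultaneously,
# which has no solution.
```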

@deltamarnix

Thanks, that makes sense. I will keep it this way then.
