
[ci] raise floors on CI dependencies #6375

Merged
merged 21 commits into master from ci/conda-floors on Apr 23, 2024

Conversation

@jameslamb (Collaborator) commented Mar 19, 2024

Contributes to #6350.

This PR proposes using stricter dependency requirements when creating conda environments in CI.

And some other changes to env creation:

  • switches from matplotlib to the lighter-weight matplotlib-base for all jobs
  • removes -q in conda create calls, so we can see in the logs which packages get pulled in (it's not that many log lines)
  • moves the lists of requirements out of testing scripts and into files that can be passed to conda create
    • simplifies the testing scripts, and allows those environment files to be re-used in local development

Notes for Reviewers

How this helps with local development

I'm envisioning that for local development, we could (in a follow-up PR) encourage a pattern like this for creating a local dev environment:

conda create \
    --name lgb-dev \
    --yes \
    --file ./.ci/conda-envs/ci-core.txt \
    --file ./.ci/conda-envs/dev.txt \
    python=3.11

Where dev.txt might contain things like mypy, ruff, and pre-commit.
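As a sketch only (the actual contents and version floors would be decided in that follow-up PR), dev.txt might look like:

```text
# ./.ci/conda-envs/dev.txt (hypothetical)
mypy>=1.0
pre-commit>=3.0
ruff>=0.1
```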

Why are these .txt files and not conda .yaml files?

conda create cannot read conda YAML environment files; it expects plain lists of requirement specs like these .txt files. You can see that in the docs for conda list (which produces requirement lists in the same format as these .txt files).

https://github.com/conda/conda/blob/29f5f3192b519cf94c314dbfb442ad974b0a4232/conda/cli/main_list.py#L90

And I wanted to be able to pass in the Python constraint from the command line, which conda env create (the command that can read YAML files) doesn't support.

So did this make things faster?

Yes 🔥

Especially on AppVeyor. On master, building one commit has been taking 45-85 minutes. On this branch, it's taking more like 35-45 minutes.

See https://ci.appveyor.com/project/guolinke/lightgbm/history

On Azure DevOps and GitHub Actions, it looks to me like the speedup is not quite so noticeable, but at a minimum this doesn't seem to make any builds slower.

References

@jameslamb jameslamb changed the title WIP: [ci] raise floors on CI dependencies [ci] raise floors on CI dependencies Mar 22, 2024
@jameslamb jameslamb marked this pull request as ready for review March 22, 2024 05:00
@jmoralez (Collaborator)

Seems like keeping support for Python 3.7 has become too much of a maintenance burden. I'd be ok to drop it if you want

@borchero (Collaborator)

Python 3.7 reached EOL on June 27, 2023 (cf. https://devguide.python.org/versions/), I think it's reasonable to drop it 😅

@jameslamb (Collaborator, Author) commented Mar 25, 2024

Seems like keeping support for Python 3.7 has become too much of a maintenance burden. I'd be ok to drop it if you want

This PR and that CI blockage weren't too bad, though.

I think we should do one more release with Python 3.7 support and include a note in the release notes that it'll be the last one to support Python 3.7. So I think this PR should be merged as-is.

And when I say "last one to support Python 3.7"... I'd support that being a "soft" drop.

And then only move to python-requires >= 3.8 if there is a compelling reason, like new language features.

Python 3.7 reached EOL on June 27, 2023 ... I think it's reasonable to drop it 😅

That's a good piece of evidence but not the only one I think we should consider when contemplating which versions of Python LightGBM supports. I think at least 2 of these should be true before we drop support:

  • the version is end-of-life
  • the maintenance burden of supporting the version is prohibitively high
  • supporting that version is preventing LightGBM from adding some sufficiently-valuable new functionality

I still support working that way moving forward, to try to support users in constrained environments where it's difficult to update Python versions. So, for example, I don't think LightGBM should drop Python 3.8 support immediately when it reaches end-of-life later this year.

Some relevant prior discussions:

@borchero (Collaborator) left a comment

Thanks a lot for trying to speed up CI! I have one comment regarding the dependency specification 👀

This directory contains files used to create `conda` environments for development
and testing of LightGBM.

The `.txt` files here are intended to be used with `conda create --file`.
Collaborator:

I'm not really convinced that a .txt file would be more appropriate than a .yml file here 🤔 I've been happily using .yml files for environment creation for years, although I've been using micromamba instead of conda.

While I think that the former would be more fitting for use in CI in general, does anything prevent us from using conda env create -f <file>.yml? I'm almost certain that this works.

@jmoralez (Collaborator) commented Apr 11, 2024

I've also used micromamba install on the base env (micromamba doesn't have a base env so we could keep the current name) which allows you to specify the python version and a yaml file for the dependencies (example)

Collaborator (Author):

does anything prevent us from using conda env create -f <file>.yml?

I described this in "Why are these .txt files and not conda .yaml files?" in the PR description. I'll add more details here.

I wanted to be able to pass in a Python constraint from the command line, to avoid the alternatives I could think of for constraining the Python version:

  • using templated statements like python=${PYTHON_VERSION} in an env.yml file and needing to involve some templater like envsubst (docs) to use them
  • generating an individual constraint file for each Python version LightGBM tests against, each containing a dependency like python=3.10.*
  • continuing to have these lists of dependency requirements hard-coded in CI scripts
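For concreteness, the first alternative (templating the YAML) would mean a render step like the sketch below, with Python's standard-library string.Template standing in for envsubst; the file contents and variable name here are illustrative, not LightGBM's actual files:

```python
from string import Template

# Hypothetical env.yml template with a placeholder for the Python version
template_text = """\
channels:
  - nodefaults
  - conda-forge
dependencies:
  - python=${PYTHON_VERSION}.*
  - scipy>=1.0
"""

# Render the template for one specific Python version, as envsubst would
rendered = Template(template_text).substitute(PYTHON_VERSION="3.10")
print(rendered)
```

It works, but it adds a render step (and a templating tool) to every environment creation, which is exactly the extra machinery the `--file` + command-line-constraint approach avoids.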

As of the latest conda (v24.3.0), conda env create does not support mixing files and command-line constraints.

Given a file env.yaml with these contents:

channels:
  - nodefaults
  - conda-forge
dependencies:
  - scipy>=1.0

This

conda create \
   --name delete-me \
   --file ./env.yaml

Yields this

CondaValueError: could not parse '- nodefaults' in: ./env.yaml

conda env create can read files like that... but you can't pass additional constraints from the command line.

conda env create \
   --name delete-me \
   --file ./env.yaml \
   python=3.10

Yields this

SpecNotFound: Invalid name 'python=3.10', try the format: user/package

I looked some more into this tonight and found this relevant issue with discussion about changes to these APIs in conda:

micromamba install ... allows you to specify the python version and a yaml file for the dependencies

Oh cool! I didn't realize micromamba allowed for that. That behavior looks like exactly what I'm trying to get here.

however ... I don't support switching to micromamba here in LightGBM's CI. It's described as a "reimplementation" of conda / mamba in its docs (docs link). Part of the motivation of moving to environment files in this PR is to help make the development of LightGBM easier. I'd really like to avoid having that experience start with "install this specific other conda alternative". I also really would not be excited about the prospect of rebuilding the images from https://github.com/guolinke/lightgbm-ci-docker to include micromamba.

Especially just for the benefit of "use .yml files because .txt files seem weird".

@jmoralez @borchero can you think of any other functional reasons that what I've proposed in this PR (using conda create + .txt files + python=${PYTHON_VERSION} from the command line) is problematic?

Collaborator:

I just don't think txt files are widely adopted in the conda commands. For example, suppose someone already has an environment with LightGBM in it and just wants to add the dependencies to develop, they could just use conda/mamba env update -f env.yaml. conda env update also allows passing multiple files, so if we end up having more than one file they could be passed in that single command.

The two step creation (python version first and dependencies second) isn't really problematic (I think that's what most people do) and I don't think it slows down the second resolution.

Collaborator (Author):

In my experience, it absolutely does speed up the total time-to-environment-ready to do everything in one step instead of creating an environment and then updating it: #5743.

Collaborator:

Would you support still preserving the "create environment in all one shot" approach by having one YAML file per Python version, like this?

I think that's reasonable, we're extensively using this mechanism at our company 👍

Collaborator (Author):

Thanks!

With the constraints "create the environment with all dependencies in a single command" and "keep using conda / mamba", there are 2 approaches we've identified:

  1. use conda create --file .ci/ci-core.txt python=${PYTHON_VERSION}
  2. create one YAML file per combination and use something like conda env create --file ./.ci/conda-envs/ci-core-py${PYTHON_VERSION}.yml

If I'm reading this correctly, @jmoralez prefers Option 1 and you prefer Option 2.

I also slightly prefer Option 1, at least right now where all not-Python-3.7 environments can share the same dependency constraints.

If we were to have more Python-version-specific lists of dependencies, I think I'd prefer Option 2 more.

@borchero is it alright with you if we move forward with the Option 1 approach right now and see how it goes? Since this is just affecting CI and commands that we document in a README, I think it should be low-risk to be wrong and end up changing this pattern in the future.

Collaborator:

Yep, fine for me, please move ahead with option 1 to improve the CI setup :)

On a slightly related note, this PR prompted me to think about the use of lock files... this would ensure entirely reproducible environments (i.e. fewer CI failures) and would essentially fully alleviate the need for solving in CI jobs. Potentially something for a future PR... 😄
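If lock files do get explored in a future PR, one possible shape is the third-party conda-lock tool; the commands below are a rough sketch, and the exact subcommands and flags should be checked against its documentation:

```text
# generate a pre-solved, platform-specific lock file from a source spec
# (conda-lock is a separate tool; flag names here may differ from its docs)
conda-lock lock --file environment.yml --platform linux-64

# later, in CI, create the environment from the lock file with no solve step
conda-lock install --name lgb-ci conda-lock.yml
```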

Collaborator (Author):

Ok, thanks very much! I'll get this PR building and then merge it on @jmoralez's existing approval of the current approach.

I think we're almost immediately gonna be talking about this again when we decide how far to go in "dropping" Python 3.8 support like discussed further up in this PR.

the use of lock files... this would ensure entirely reproducible environments

I'd support using lock files (or, more generally, ==-style constraints) for the jobs running on end-of-life Python versions or operating systems.

But for the others, I think it's valuable to let dependencies float when we can. That means that LightGBM is continually tested against new releases of all its dependencies, which more evenly spreads out the work of reacting to breaking changes over time, and reduces the total amount of effort required.
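Concretely, the distinction is between lock-style pins and the floor-style constraints this PR adds; the package versions below are illustrative, not the actual CI values:

```text
# lock-style: fully reproducible, but CI stops seeing new releases
numpy==1.26.4
scipy==1.11.4

# floor-style (this PR): new releases float in, floors keep solves fast
numpy>=1.24
scipy>=1.8
```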

For example, I'm thinking about cases like this:

Because CI broke here shortly after a new dask release came out, there was a fairly small changeset in dask to investigate to find the root cause. If instead we had used == pins here and only hit that error, say, a month or two later, the debugging effort would have been much higher. And we would have missed the opportunity to quickly get in a fix like dask/dask#11007 to limit how many versions of the dependency exhibit the behavior that's problematic for lightgbm.

Letting things float also makes CI more closely match the real-world experience of someone starting a new project and running pip install lightgbm scikit-learn numpy or something similar, where they'd get the latest versions.

Since this project publishes a Python library used in a lot of different contexts, not an application that's distributed as a binary or container image, I think it's valuable to continue that practice of letting the dependencies somewhat float + regularly bumping up the floors to continue getting fast environment solves.

All that said though... if you disagree or see some other opportunities to improve the way we get dependencies in CI, I definitely encourage you to write up a proposal in an issue! We have a loosely-enforced convention of doing those in issues tagged [RFC] ("request for comment"). You can see some at https://github.com/microsoft/LightGBM/issues?q=%22%5BRFC%5D%22+is%3Aissue+is%3Aclosed.

Thanks to both of you for talking through all this with me, it's good that we keep challenging the current state of how things are set up.

Collaborator:

Nice, I didn't know about the [RFC]-type issues yet. I'll think about it a bit more (taking into account your comment, thanks for the elaboration ;)) and might come back to this then 😄

@jameslamb jameslamb merged commit 1443548 into master Apr 23, 2024
41 checks passed
@jameslamb jameslamb deleted the ci/conda-floors branch April 23, 2024 13:09