Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update lockfile #2002

Merged
merged 3 commits into from
Mar 25, 2024
Merged

Update lockfile #2002

merged 3 commits into from
Mar 25, 2024

Conversation

sneakers-the-rat
Copy link
Collaborator

@sneakers-the-rat sneakers-the-rat commented Mar 21, 2024

Noticed during #2001 that we're still on linkml-runtime 1.7.0 but 1.7.4 has been released. still not sure what da deal is with relocking but here's one.

(imo ideally this should be a gh action cron task that relocks, runs tests, and commits if they pass. if we're using the lockfile during testing, we should not let it get too far out of date bc all users install with pip aka not with a lockfile, so we get progressively out of step with what everyone else sees and miss bugs we shouldn't <3)

@sneakers-the-rat
Copy link
Collaborator Author

oop so ya by updating the lockfile we see a bug that someone installing from pip would experience but we were missing bc old lockfile

@sneakers-the-rat
Copy link
Collaborator Author

oh boy i'm just seeing how much history there is i missed on this.

Previously i raised this:

tagging @sierra-moxon and @pkalita-lbl as the two other ppl i see engaging here consistently.

This is what i said previously:

with poetry/python most people install through pip or some variant and so don't see the lockfile. ... this specifically breaks non-development installs - it breaking pip install linkml and the development install poetry install not matching non-development conditions is the problem.

And i see @vemonet said basically the same thing:

When we publish a python package the poetry.lock is not published (it is only used in development), which means that:

  • We test in an environment with locked dependencies version, not necessarily the latest one
  • When a user install our package, the dependency manager will always try to resolve to the latest available version for each dependency

For integration tests to be efficient, they need to be the closest possible to the experience the user will have using the package. In this case this means the tests needs to be run after resolving the latest version of each dependency (instead of using locked versions).

but i disagree with the argument that we should not version the poetry.lock file, more on that in one second, just trying to bring background here.

@pkalita said:

We have to consider the trade-offs between reproducibility (by including the lock file) and faster failures due to new dependency versions (by omitting the lock file). Given the limited resources we have to devote to LinkML, we've opted for reproducibility.

So i think there is some justified lack of clarity about "whats the point of the lockfile anyway"

Yes the purpose is to allow for someone to (fingers crossed) perfectly reproduce a python environment. But since lockfiles aren't a python standard, but a poetry novelty, we don't have the same kind of use for them that yarn/npm do where each package can exist in its own perfect reality - instead python packages are all installed in the same soupy environment.

The poetry docs do not help with clarity here, and i think actually make it worse:

A simple way to avoid such a scenario is to omit the poetry.lock file. However, by doing so, you sacrifice reproducibility and performance to a certain extent. Without a lockfile, it can be difficult to find the reason for failing tests, because in addition to obvious code changes an unnoticed library update might be the culprit. Further, Poetry will have to lock before installing a dependency if poetry.lock has been omitted. Depending on the number of dependencies, locking may take a significant amount of time.

If you do not want to give up the reproducibility and performance benefits, consider a regular refresh of poetry.lock to stay up-to-date and reduce the risk of sudden breakage for users.

They seem to forget that not only is it possible to install the package with pip, it is almost always how packages using poetry are installed.

So the tradeoff is not between committing the lockfile or not, and it's not between reproducibility and not - the important thing is we need to know when the lockfile is useful.

  • testing off a lockfile is useful because you should get the same results on different machines (modulo OS and any other thing poetry doesn't lock) - that's good when we want someone to be able to submit a PR with a specific environment attached, etc.
  • testing off a lockfile is not useful because it's disconnected from the POV of literally everyone else using the package - if a test fails because a new package was released, that means that the test will fail during normal usage of the package. It means that a) our version bounds need to be revised, or b) we have a bug to fix.

so the point the poetry docs re "not knowing if a bug is caused by a package update or new code" is misleading by a mile imo - if there is a bug that's difficult to explain, one can install the package from the lockfile afterwards and run the tests again and if that fixes it, diagnose the source of the problem in one step.

so tl;dr -

  • we should test using pip install . in regular CI
  • we should also periodically (and imo automatically) relock and test.

as a side note, re: linkml/linkml-runtime#288
we don't need to delete the lockfile during testing. pip install . does the job and is actually what we want here. we don't need to use poetry's dep resolver bc it might differ from pip and our tests being aligned with the experience of running the package is what we want. also that PR removes the lockfile after the package is installed anyway so it still would have used it when installing <3

Copy link

codecov bot commented Mar 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.69%. Comparing base (534a2db) to head (92548a5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2002   +/-   ##
=======================================
  Coverage   80.69%   80.69%           
=======================================
  Files         104      104           
  Lines       11622    11622           
  Branches     2910     2910           
=======================================
  Hits         9378     9378           
  Misses       1701     1701           
  Partials      543      543           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pkalita-lbl
Copy link
Contributor

Thanks for writing up your thoughts on this. I think we're totally in alignment on the scope of the problem. And I just wanted to clarify that I fully appreciate that the poetry.lock file is only part of the equation in a Poetry-managed development environment (i.e. my local machine, a GitHub runner if poetry install is used, etc).

Let me also just elaborate on the scenarios I'm thinking about that I'd prefer to avoid:

  • If we had an automated process that runs poetry update and unconditionally commits the resulting poetry.lock changes into main, that seems like a really easy way for CI to "go red" and never return. People see a failing test in their PR and see it's also failing in main. They say "ah, good, not my problem" and move on.
  • If we removed poetry.lock from git or did pip install . in CI, tests may fail on your PR because of an updated dependency and not because of your changes. To be clear, finding these issues earlier would be a good thing. However. The downside is that it may turn a small PR into a rabbit hole of understanding unrelated test failures. This may be a turn off to outside contributors, and to be honest sometimes core LinkML developers don't have time to go down the rabbit hole.

What I think would make more sense for this project would be either:

  1. Have an automated process that runs poetry update but only opens a PR with the changes. A member (or members) of the core development team would need to be assigned to the PR for review, and they would be responsible for debugging and resolving any test failures before merging the PR. OR...
  2. I don't know exactly if/how the automation for 1 would work. If it's not feasible, have a rotation of core developers manually do the poetry update/fix (if necessary)/commit on a periodic basis.

Either way we're limiting the number of people who potentially have to go down the dependency-upgrade-test-failure rabbit hole and also allowing them to do it when it works with their schedule (hopefully).

I believe (although the meeting notes aren't exactly backing me up) that we talked about this on a developers call and roughly agreed to keeping the CI process the same but updating the poetry.lock file more regularly by some mechanism. @sierra-moxon or @cmungall might be able to tell me if I'm just hallucinating that. If we're all still on board with that I think the next step would be investigating if the automation for 1 above is feasible and deciding on an appropriate schedule.


With all that being said since this PR just updates the poetry.lock file and all the tests pass I'm happy to approve and merge if you're ready, @sneakers-the-rat.

@sneakers-the-rat
Copy link
Collaborator Author

I am ready! And I think the "PR with lockfile updates" makes sense! Maybe it automerges if tests pass, but otherwise doesnt? We also probably want to turn on branch protections to main requiring that a PR be up to date (I think you can currently merge if there are no conflicts, but forcing a merge/rebase might prevent weird problems from like accidentally backtracking)

Happy to PR that, ive seen it done a few times. Maybe it runs once a week or so? And always from the same branch so if an old PR is still open it can just get updated instead of stacking up.

@sneakers-the-rat
Copy link
Collaborator Author

It might also be good to make a habit of putting the developer call discussions into the docs - if not as like a "when we make decisions we make sure theyre reflected in the developer docs" kinda thing, then copying whatever notes yall take into a changelog style section in the docs. Good to keep everyone on the same page by making stuff public and referenceable :)

@pkalita-lbl
Copy link
Contributor

Maybe it automerges if tests pass, but otherwise doesnt?

If that's automatable that would be great.

We also probably want to turn on branch protections to main requiring that a PR be up to date (I think you can currently merge if there are no conflicts, but forcing a merge/rebase might prevent weird problems from like accidentally backtracking)

I hadn't though about that, but on first glance yeah that might be a good idea

Maybe it runs once a week or so? And always from the same branch so if an old PR is still open it can just get updated instead of stacking up.

I was thinking bi-weekly but obviously it wouldn't be set in stone or hard to change later so I don't have a strong preference. The same branch thing is a good idea, too.

It might also be good to make a habit of putting the developer call discussions into the docs

I'll bring it up on the next developer call 😂

@pkalita-lbl pkalita-lbl self-requested a review March 25, 2024 18:31
@pkalita-lbl pkalita-lbl merged commit 9df3d3b into linkml:main Mar 25, 2024
11 checks passed
@rly
Copy link
Contributor

rly commented Mar 25, 2024

How about this github action (see example weekly workflow): https://github.com/Apakottur/action-poetry-package-update

I haven't tried it, but it might be as easy as adding this workflow.

@sneakers-the-rat
Copy link
Collaborator Author

that one looks like it's gonna update the deps in pyproject.toml too, but yes something like that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants