An easy way to update package versions #1153

Closed
mathbunnyru opened this issue Aug 25, 2020 · 23 comments · Fixed by #1378
Labels
type:Maintenance A proposed enhancement to how we maintain this project

Comments

@mathbunnyru
Member

I see a lot of commits just updating several packages.

It would be great to have a tool that automatically finds outdated package versions (perhaps with some blacklist), tries to update them, and, if the tests pass, automatically commits the changes.

That way we would always have the latest (and greatest) packages, and a lot of manual work would be eliminated.

@parente
Member

parente commented Aug 26, 2020

One implementation thought: We could set up a GitHub Action workflow to run that tool on some cadence and open / close PRs based on test status.
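
For illustration (a sketch only), a scheduled workflow of that kind might boil down to shell steps along these lines; the update tool, the test target, and the branch name are hypothetical placeholders, and gh is the GitHub CLI:

$ git checkout -b auto-update-packages
$ ./update-pins.py      # hypothetical tool that bumps outdated version pins
$ make test             # placeholder test target; continue only if this passes
$ git commit -am "Automated package version updates"
$ git push origin auto-update-packages
$ gh pr create --fill   # open a PR from the pushed branch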

@romainx
Collaborator

romainx commented Aug 26, 2020

Good idea. Note that such a tool already exists and can at least be used as a base. It is described in the contributor documentation.

$ make check-outdated/base-notebook

# INFO     test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO     test_outdated:test_outdated.py:82
# Package     Current    Newest
# ----------  ---------  --------
# conda       4.7.12     4.8.2
# jupyterlab  1.2.5      2.0.0
# python      3.7.4      3.8.2

@romainx
Collaborator

romainx commented Oct 20, 2020

I think one of the best ways to do it would be to use a tool like dependabot. However, it does not support conda, and its Docker support does not help with our use cases.

@parente added the type:Maintenance label Nov 29, 2020
@trallard
Member

I think the idea of having a GitHub Action to update the dependencies would be a good approach (and it overcomes the limitations of dependabot).

I would be happy to give this a go

@mathbunnyru
Member Author

mathbunnyru commented Jan 13, 2021

I think the idea of having a GitHub Action to update the dependencies would be a good approach (and it overcomes the limitations of dependabot).

I would be happy to give this a go

I don't think anyone has been working on this issue for the past few months, so I say give it a go if you want to; it would be awesome :)

@trallard
Member

Great - I will get working on this and create a draft PR as soon as possible.

@maresb
Contributor

maresb commented Apr 23, 2021

This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.

I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

@mathbunnyru
Member Author

mathbunnyru commented Apr 23, 2021

This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.

I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

That's not a silly question at all.

I see several positive things in pinning versions:

  1. Reproducibility. If we don't pin versions and build the same code at different times, we get different results. That's not something I expect (I have some background in C++). This is really important.

  2. It gives us a simple strategy for rebuilding the images - they are rebuilt whenever someone pushes an update. If we don't pin versions, when do we rebuild the images? (Should it be every day, or should we track dependencies?)

  3. I've had trouble with dependency resolution when versions weren't pinned. That was a long time ago; I hope conda handles it better now.

  4. People see which versions we're using and they can decide if they want to use the image or not.

  5. Suppose you try to change datascience-notebook and haven't touched scipy-notebook at all, and something breaks in the dependencies of scipy-notebook. Now, instead of dealing just with datascience-notebook, you have to change code you didn't touch.

  6. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

But we can't say this to our users, right? So sometimes we will have to pin some versions.

@maresb
Contributor

maresb commented Apr 23, 2021

@mathbunnyru, I used to think similarly, but my perspective has changed.

I think reproducibility is ultimately the responsibility of the end user, and that is easily achieved by pinning a Docker build number. Moreover, the current practice of pinning major/minor version numbers doesn't provide exact reproducibility. For that you'd need not only the patch number but also the conda-forge build number.

For exact reproducibility, I add the following command to my Dockerfile: conda env export > $CONDA_DIR/environment.yaml. From there, it's easy to generate a build artifact with (docker run --rm image-name cat /opt/conda/environment.yaml) > environment.yaml. I don't have any good ideas for how to publish it though... naively committing it would trigger an infinite loop in CI.
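
Spelled out, those two steps look like this (image-name is a placeholder for the built image):

# In the Dockerfile: snapshot the fully resolved environment at build time
RUN conda env export > "${CONDA_DIR}/environment.yaml"

# On the host, after the build: extract the snapshot as a build artifact
$ docker run --rm image-name cat /opt/conda/environment.yaml > environment.yaml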

I do agree with your point 5. While I think it's a fact of life that upstream dependencies will change and break things, I can see how pinning makes things more tame.

I'm not suggesting we should never pin versions, just that pinning is overrated, and that environment.yaml may be a better way to guarantee reproducibility.

@mathbunnyru
Member Author

mathbunnyru commented Apr 24, 2021

Thanks for your ideas @maresb.

It would be great to hear from @parente and @romainx

@parente
Member

parente commented Apr 24, 2021

The major.minor version pinning approach used here originated in the early days of conda-forge, when it was extremely difficult to get a working build with the number of packages in these images. I think it's reasonable to experiment with an unpinned strategy today, as long as users are informed about the change, there is a manifest of what actually got installed during a build (there is one on the wiki), and active maintainers are OK with troubleshooting a potential decrease in build stability.

@romainx
Collaborator

romainx commented Apr 26, 2021

In fact, we could do this not only for conda dependencies but also for other parts of the stack, like the upstream Ubuntu image.

We should also change the build policy, switching to some kind of regular build (daily, weekly) instead of building after each merge to the master branch.
The drawback is that the time no longer spent updating the images will certainly have to be spent fixing the builds.
But I'm also OK with giving it a try 👍
Having everything build correctly on the first try will be a good indicator 😄

@slmg

slmg commented Aug 10, 2021

Late loyal user feedback on this.

Whilst I completely understand the reasons behind un-pinning package versions, I believe it may have been a good idea to at least keep the major component - switching from 'jupyterlab=3.0.16' and 'scipy=1.7.*' to 'jupyterlab=3.*' and 'scipy=1.*', for example.

Most packages use semantic versioning, so that would in theory always get the latest non-breaking changes (which I believe is what everyone tends to want).
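
In an image's install step, such major-only pins might look roughly like this (a sketch; the actual package list and install commands in the Dockerfiles differ):

$ mamba install --yes 'jupyterlab=3.*' 'scipy=1.*'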

Personally I have been finding it very convenient to come here from time to time and check what changed (anything major?) via diff links like https://github.com/jupyter/docker-stacks/compare/b9f6ce795cfc..master.

  People see which versions we're using and they can decide if they want to use the image or not.

Now that everything is going to be latest, that visibility is lost. For users controlling their environment, the only way I see to check what changed is to:

  1. Pick and download a newly built Docker image.

  2. Run mamba list / pip list inside a container instance.

  3. Save the previous command's output (likely via docker cp) and compare it to the equivalent output from the image currently in use.

This is a lot more cumbersome for getting the same information. Am I missing anything?
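
Roughly, that comparison could be scripted as follows (the tags are placeholders for the current and candidate builds):

$ docker run --rm jupyter/scipy-notebook:<current> mamba list > current.txt
$ docker run --rm jupyter/scipy-notebook:<new> mamba list > new.txt
$ diff current.txt new.txt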

Perhaps it would be good to automatically commit a lock file as part of new builds?

Thank you for your great work on this repo!

@mathbunnyru
Member Author

This is a lot more cumbersome for getting the same information. Am I missing anything?

There are also wiki build manifests, but you will have to diff entire pages, which is also doable, but a bit more difficult, I suppose.

https://github.com/jupyter/docker-stacks/wiki

@slmg

slmg commented Aug 10, 2021

Thank you, that is useful. Yes, I think a custom tool to diff build manifests might be my way to go moving forward.

@maresb
Contributor

maresb commented Aug 10, 2021

Having everything in a git repo would make diffing easier, but at least all the data is already there in the wiki, and it's probably not worth the time at this point.

@parente
Member

parente commented Aug 12, 2021

git clone https://github.com/jupyter/docker-stacks.wiki.git should work to get a local clone of the wiki, which is itself a git repo. I haven’t checked a diff to see how readable it is, but I’d guess it’s not too bad since the page is prepend-only.

@slmg

slmg commented Aug 12, 2021

Yes, that's what I ended up doing 👍. Below is an excerpt of some documentation I wrote (for scipy-notebook).

I am using VS Code to produce rich diffs, though git diff --no-index does a good job too for those preferring to stay in the terminal.

How to select a new image

  1. Visit https://github.com/jupyter/docker-stacks/wiki
    and pick a build candidate for jupyter/scipy-notebook.

  2. Using the build commit id tag, check if anything significant changed
    via https://github.com/jupyter/docker-stacks/compare/current..new
    (replace current and new with the respective tag commit ids).
    Look for changes specifically in

    • base-notebook/Dockerfile
    • minimal-notebook/Dockerfile
    • scipy-notebook/Dockerfile
  3. Check diffs between current and new build manifests. Anything major?

    git clone --depth=1 https://github.com/jupyter/docker-stacks.wiki.git
    
    cd docker-stacks.wiki/manifests
    
    code --diff scipy-notebook-<current>.md scipy-notebook-<new>.md

@maresb
Contributor

maresb commented Aug 12, 2021

The wiki clone stores the various versions of manifests in separate files, which is a git antipattern. One can do diffs by hand as in @slmg's solution, but you're not actually leveraging git except as a filestore. Ideally you'd have a single base-notebook.md file instead of base-notebook-[hash].md.

Given that one can do the diffs by hand, I don't personally have enough motivation to arrange the manifests into a true git repo. But I think it would be nice, because you could, for example, use GitHub as an interface for browsing the diffs without having to work in a local clone.

@slmg

slmg commented Aug 12, 2021

It would be nice. Though wouldn't it make it hard to keep track of the original commit id from this repo? How could we determine which versions to compare if only one base-notebook.md were present, given that the wiki generates its own commit hashes?

It would work fine if manifests were committed to this repo, with the commit hash used to tag images. That would make it possible to fall back to a simple visit to https://github.com/jupyter/docker-stacks/compare/current..new

But as you mentioned, that sounds like a lot of work for little benefit, and the current process gets the relevant info without too much effort. It's much better than my first thought of pulling GBs locally just to fiddle with conda list, at least! 😄

@mathbunnyru
Member Author

I think the best solution is to have both - base-notebook-[hash].md and base-notebook.md.
Users can choose whichever they want.
And also, put the original commit hash in the commit message, to make it easy to find.
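
A minimal sketch of that publishing step, assuming it runs inside a clone of the wiki repo and that $GIT_SHA holds the docker-stacks commit the image was built from (filenames and paths are illustrative):

$ cp manifest.md "manifests/base-notebook-${GIT_SHA}.md"   # per-build snapshot
$ cp manifest.md "manifests/base-notebook.md"              # single file with a diffable history
$ git add manifests/
$ git commit -m "Update base-notebook manifest (built from ${GIT_SHA})"
$ git push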

@maresb
Contributor

maresb commented Aug 12, 2021

@mathbunnyru great point! And that would probably be trivial to implement.

Unfortunately it seems that GitHub doesn't allow one to browse the files or view blame in a wiki repo. (It doesn't work to visit https://github.com/jupyter/docker-stacks.wiki/blob/master/Home.md.)

I wonder if it would be difficult to mirror the wiki as a normal, browsable GitHub repo?

@mathbunnyru
Member Author

I wonder if it would be difficult to mirror the wiki as a normal, browsable GitHub repo?

I think that might be a bit too much :)
