An easy way to update package versions #1153

Closed
mathbunnyru opened this issue Aug 25, 2020 · 23 comments · Fixed by #1378
Labels
type:Maintenance A proposed enhancement to how we maintain this project

Comments

@mathbunnyru
Member

I see a lot of commits just updating several packages.

It would be great to have a tool that automatically finds outdated package versions (perhaps with some blacklist), tries to update them, and, if the tests pass, automatically commits the changes.

That way we would always have the latest (and greatest) packages, and a lot of manual work would be eliminated.

@parente
Member

parente commented Aug 26, 2020

One implementation thought: We could set up a GitHub Action workflow to run that tool on some cadence and open / close PRs based on test status.
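
For illustration (a sketch only), a scheduled workflow of that kind might boil down to shell steps along these lines; the update tool, the test target, and the branch name are hypothetical placeholders, and gh is the GitHub CLI:

$ git checkout -b auto-update-packages
$ ./update-pins.py      # hypothetical tool that bumps outdated version pins
$ make test             # placeholder test target; continue only if this passes
$ git commit -am "Automated package version updates"
$ git push origin auto-update-packages
$ gh pr create --fill   # open a PR from the pushed branch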

@romainx
Collaborator

romainx commented Aug 26, 2020

Good idea. Note that such a tool already exists and can at least be used as a base. It is described in the contributor documentation.

$ make check-outdated/base-notebook

# INFO     test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO     test_outdated:test_outdated.py:82
# Package     Current    Newest
# ----------  ---------  --------
# conda       4.7.12     4.8.2
# jupyterlab  1.2.5      2.0.0
# python      3.7.4      3.8.2

@romainx
Collaborator

romainx commented Oct 20, 2020

I think one of the best ways to do it would be to use a tool like dependabot. However, it does not support conda, and its Docker support does not help with our use cases.

@parente added the type:Maintenance label Nov 29, 2020
@trallard
Member

I think the idea of having a GitHub Action to update the dependencies would be a good approach (and it overcomes the limitations of dependabot).

I would be happy to give this a go

@mathbunnyru
Member Author

mathbunnyru commented Jan 13, 2021

I think the idea of having a GitHub Action to update the dependencies would be a good approach (and it overcomes the limitations of dependabot).

I would be happy to give this a go

I don't think anyone has been working on this issue for the past few months, so I say give it a go if you want to; it would be awesome :)

@trallard
Member

Great - I will get working on this and create a draft PR as soon as possible.

@maresb
Contributor

maresb commented Apr 23, 2021

This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.

I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

@mathbunnyru
Member Author

mathbunnyru commented Apr 23, 2021

This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.

I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

That's not a silly question at all.

I see several positive things in pinning versions:

  1. Reproducibility. If we don't pin versions and build the same code at different times, we get different results. That's not something I expect (I have some background in C++). This is really important.

  2. It gives us a simple strategy for rebuilding the images - they are rebuilt whenever someone pushes an update. If we don't pin versions, when do we rebuild the images? (Should it be every day, or should we track dependencies?)

  3. I've had trouble with dependency resolution when versions weren't pinned. That was a long time ago; I hope conda handles it better now.

  4. People see which versions we're using and they can decide if they want to use the image or not.

  5. Suppose you try to change datascience-notebook and haven't touched scipy-notebook at all, and something breaks in the dependencies of scipy-notebook. Now, instead of dealing just with datascience-notebook, you have to change code you didn't touch.

  6. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

But we can't say this to our users, right? So sometimes we will have to pin some versions.

@maresb
Contributor

maresb commented Apr 23, 2021

@mathbunnyru, I used to think similarly, but my perspective has changed.

I think reproducibility is ultimately the responsibility of the end user, and that is easily achieved by pinning a Docker build number. Moreover, the current practice of pinning major/minor version numbers doesn't provide exact reproducibility. For that you'd need not only the patch number but also the conda-forge build number.

For exact reproducibility, I add the following command to my Dockerfile: conda env export > $CONDA_DIR/environment.yaml. From there, it's easy to generate a build artifact with (docker run --rm image-name cat /opt/conda/environment.yaml) > environment.yaml. I don't have any good ideas for how to publish it though... naively committing it would trigger an infinite loop in CI.
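
Spelled out, those two steps look like this (image-name is a placeholder for the built image):

# In the Dockerfile: snapshot the fully resolved environment at build time
RUN conda env export > "${CONDA_DIR}/environment.yaml"

# On the host, after the build: extract the snapshot as a build artifact
$ docker run --rm image-name cat /opt/conda/environment.yaml > environment.yaml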

I do agree with your point 5. While I think it's a fact of life that upstream dependencies will change and break things, I can see how pinning makes things more tame.

I'm not suggesting we should never pin versions, just that pinning is overrated, and that environment.yaml may be a better way to guarantee reproducibility.

@mathbunnyru
Member Author

mathbunnyru commented Apr 24, 2021

Thanks for your ideas @maresb.

It would be great to hear from @parente and @romainx

@parente
Member

parente commented Apr 24, 2021

The major.minor version pinning approach used here originated in the early days of conda-forge, when it was extremely difficult to get a working build with the number of packages in these images. I think it's reasonable to experiment with an unpinned strategy today, as long as users are informed about the change, there is a manifest of what actually got installed during a build (there is one on the wiki), and active maintainers are OK with troubleshooting a potential decrease in build stability.

@romainx
Collaborator

romainx commented Apr 26, 2021

In fact, we could do this not only for conda dependencies but also for other parts of the stack, like the upstream Ubuntu image.

We should also change the build policy, switching to some kind of regular build (daily, weekly) instead of building after each merge to the master branch.
The drawback is that the time no longer spent updating the images will certainly have to be spent fixing the builds.
But I'm also OK with giving it a try 👍
Having everything build correctly on the first try will be a good indicator 😄

@slmg

slmg commented Aug 10, 2021

Late loyal user feedback on this.

Whilst I completely understand the reasons behind un-pinning package versions, I believe it may have been a good idea to at least keep the major component - switching from 'jupyterlab=3.0.16' and 'scipy=1.7.*' to 'jupyterlab=3.*' and 'scipy=1.*', for example.

Most packages use semantic versioning, so that would in theory always get the latest non-breaking changes (which I believe is what everyone tends to want).
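
In an image's install step, such major-only pins might look roughly like this (a sketch; the actual package list and install commands in the Dockerfiles differ):

$ mamba install --yes 'jupyterlab=3.*' 'scipy=1.*'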

Personally I have been finding it very convenient to come here from time to time and check what changed (anything major?) via diff links like https://github.com/jupyter/docker-stacks/compare/b9f6ce795cfc..master.

  People see which versions we're using and they can decide if they want to use the image or not.

Now that everything is going to be latest, that visibility is lost. For users controlling their environment, the only way I see to check what changed is to:

  1. Pick and download a newly built Docker image.

  2. Run mamba list / pip list inside a container instance.

  3. Save the previous command's output (likely via docker cp) and compare it to the equivalent output from the image currently in use.

This is a lot more cumbersome for getting the same information. Am I missing anything?
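
Roughly, that comparison could be scripted as follows (the tags are placeholders for the current and candidate builds):

$ docker run --rm jupyter/scipy-notebook:<current> mamba list > current.txt
$ docker run --rm jupyter/scipy-notebook:<new> mamba list > new.txt
$ diff current.txt new.txt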

Perhaps it would be good to automatically commit a lock file as part of new builds?

Thank you for your great work on this repo!

@mathbunnyru
Member Author

This is a lot more cumbersome for getting the same information. Am I missing anything?

There are also wiki build manifests, but you will have to diff entire pages, which is also doable, but a bit more difficult, I suppose.

https://github.com/jupyter/docker-stacks/wiki

@slmg

slmg commented Aug 10, 2021

Thank you, that is useful. Yes, I think a custom tool to diff build manifests might be my way to go moving forward.

@maresb
Contributor

maresb commented Aug 10, 2021

Having everything in a git repo would make diffing easier, but at least all the data is already there in the wiki, and it's probably not worth the time at this point.

@parente
Member

parente commented Aug 12, 2021

git clone https://github.com/jupyter/docker-stacks.wiki.git should work to get a local clone of the wiki, which is itself a git repo. I haven’t checked a diff to see how readable it is, but I’d guess it’s not too bad since the page is prepend-only.

@slmg

slmg commented Aug 12, 2021

Yes, that's what I ended up doing 👍. Below is an excerpt of some documentation I wrote (for scipy-notebook).

I am using VS Code to produce rich diffs, though git diff --no-index does a good job too for those preferring to stay in the terminal.

How to select a new image

  1. Visit https://github.com/jupyter/docker-stacks/wiki
    and pick a build candidate for jupyter/scipy-notebook.

  2. Using the build commit id tag, check if anything significant changed
    via https://github.com/jupyter/docker-stacks/compare/current..new
    (replace current and new with the respective tag commit ids).
    Look for changes specifically in

    • base-notebook/Dockerfile
    • minimal-notebook/Dockerfile
    • scipy-notebook/Dockerfile
  3. Check diffs between current and new build manifests. Anything major?

    git clone --depth=1 https://github.com/jupyter/docker-stacks.wiki.git
    
    cd docker-stacks.wiki/manifests
    
    code --diff scipy-notebook-<current>.md scipy-notebook-<new>.md

@maresb
Contributor

maresb commented Aug 12, 2021

The wiki clone stores the various versions of manifests in separate files, which is a git antipattern. One can do diffs by hand as in @slmg's solution, but you're not actually leveraging git except as a filestore. Ideally you'd have a single base-notebook.md file instead of base-notebook-[hash].md.

Given that one can do the diffs by hand, I don't personally have enough motivation to arrange the manifests into a true git repo. But I think it would be nice, because you could, for example, use GitHub as an interface for browsing the diffs without having to work in a local clone.

@slmg

slmg commented Aug 12, 2021

It would be nice. Though wouldn't it make it hard to keep track of the original commit id from this repo? How could we determine which versions to compare if only one base-notebook.md were present, given that the wiki generates its own commit hashes?

It would work fine if manifests were committed to this repo, with the commit hash used to tag images. That would make it possible to fall back to a simple visit to https://github.com/jupyter/docker-stacks/compare/current..new

But as you mentioned, that sounds like a lot of work for little benefit, and the current process gets the relevant info without too much effort. It's much better than my first thought of pulling GBs locally just to fiddle with conda list, at least! 😄

@mathbunnyru
Member Author

I think the best solution is to have both - base-notebook-[hash].md and base-notebook.md.
Users can choose whichever they want.
And also, put the original commit hash in the commit message, to make it easy to find.
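
A minimal sketch of that publishing step, assuming it runs inside a clone of the wiki repo and that $GIT_SHA holds the docker-stacks commit the image was built from (filenames and paths are illustrative):

$ cp manifest.md "manifests/base-notebook-${GIT_SHA}.md"   # per-build snapshot
$ cp manifest.md "manifests/base-notebook.md"              # single file with a diffable history
$ git add manifests/
$ git commit -m "Update base-notebook manifest (built from ${GIT_SHA})"
$ git push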

@maresb
Contributor

maresb commented Aug 12, 2021

@mathbunnyru great point! And that would probably be trivial to implement.

Unfortunately it seems that GitHub doesn't allow one to browse the files or view blame in a wiki repo. (It doesn't work to visit https://github.com/jupyter/docker-stacks.wiki/blob/master/Home.md.)

I wonder if it would be difficult to mirror the wiki as a normal, browsable GitHub repo?

@mathbunnyru
Member Author

I wonder if it would be difficult to mirror the wiki as a normal, browsable GitHub repo?

I think that might be a bit too much :)
