Create base image to base-notebook for non-server Jupyter applications #1825

Merged
merged 20 commits into from
Nov 12, 2022

Conversation

kevin-bates
Member

@kevin-bates kevin-bates commented Nov 2, 2022

Describe your changes

This pull request introduces the image docker-stacks-foundation (initially named base-jupyter), from which base-notebook is derived. This allows other Jupyter-related images (e.g., Jupyter kernel or nbclient applications) to be built on the same infrastructure found across the images.

This exercise was a bit more involved than I originally anticipated, but I guess I'm not surprised given the nature of what is actually provided here. I'm impressed with the build system you've assembled! Along the way, I ran into some questions and needed to make decisions, which I've attempted to document below, and I anticipate other issues arising during your review and the ultimate completion of the tests.

Issue ticket if applicable

Resolves: #1809

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
    • Tests were refactored from base-notebook since this image is a direct subset.
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

Questions and decisions:

(There are a number of naming decisions here, none of which I hold a strong affinity to, so don't hesitate to suggest changes.)

  • I wanted to name the jovyan-base (seemed like a play on the 'solar system' aspect) but felt the base-prefix precedent was already set by base-notebook - so went with base-jupyter rather than base-jovyan.

  • base-notebook/Dockerfile: Should argument ROOT_CONTAINER be changed to BASE_CONTAINER like other "derived" Dockerfiles? Left as ROOT_CONTAINER for b/c purposes?

  • base-notebook/Dockerfile: Should arguments NB_USER, NB_UID, and NB_GID be preserved in base-notebook/Dockerfile for b/c purposes?

  • base-notebook/Dockerfile: mamba install does not support --root-prefix (unlike micromamba install) so it was removed.

  • test_container_options.py was split into test_user_options.py (for uid, chown tests) in base-jupyter and left as test_container_options.py for port-based tests in base-notebook.

  • The base-jupyter test test_package_managers.py removed npm and test_npm_package_manager.py added to base-notebook tests since npm is not installed until Lab is installed.

  • The base-jupyter test test_user_options.py - changed the hidden folder named '.jupyter' to 'work' since .jupyter doesn't exist in base-jupyter (as a function of creating configuration in base-notebook).

  • GH workflow, hub-overview.yml unchanged since base-jupyter doesn't apply.

Current build status (from tests within my fork)

Have hit two CI snags (that I'm aware of):

  • Building base-notebook gets the following error when attempting to pull the new base image, jupyter/base-jupyter:
ERROR: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  • Doc builds fail link check - seems like a chicken/egg issue since this location won't exist until after the merge.
( using/selecting: line   21) broken    https://github.com/jupyter/docker-stacks/commits/main/base-jupyter/Dockerfile - 404 Client Error: Not Found for url: https://github.com/jupyter/docker-stacks/commits/main/base-jupyter/Dockerfile
4. Create a new repository in the jupyter org on Docker Hub named after the stack folder in the git repo.

5. Grant the stacks team permission to write to the repo.

@mathbunnyru
Member

mathbunnyru commented Nov 2, 2022

I'm impressed with the build system you've assembled!

Thank you! This means a lot because many attempts have failed before I came up with this and I've spent much time implementing it this way ❤️
Overall, I'm proud of how it works now (we currently don't have any issues related to the build system, which is awesome).

I will try to review this PR in the next two days. I appreciate that you have a nice list of all the decisions you made here 👍

@mathbunnyru
Member

  • I wanted to name the jovyan-base (seemed like a play on the 'solar system' aspect) but felt the base-prefix precedent was already set by base-notebook - so went with base-jupyter rather than base-jovyan.

I don't have a strong opinion on this one. I may suggest jovyan-init, so in the end it will be jupyter/jovyan-init.

  • base-notebook/Dockerfile: Should argument ROOT_CONTAINER be changed to BASE_CONTAINER like other "derived" Dockerfiles? Left as ROOT_CONTAINER for b/c purposes?

In the root-most container (base-jupyter) let's have ROOT_CONTAINER, and in the other ones (including base-notebook) let's have BASE_CONTAINER.
The purpose of ROOT_CONTAINER is to make it possible for others to build our stacks on completely different OS.

  • base-notebook/Dockerfile: Should arguments NB_USER, NB_UID, and NB_GID be preserved in base-notebook/Dockerfile for b/c purposes?

No, let's remove them. Preserving them might confuse users, who think it's ok to change them in minimal (but it will be too late).

  • base-notebook/Dockerfile: mamba install does not support --root-prefix (unlike micromamba install) so it was removed.

👌

  • test_container_options.py was split into test_user_options.py (for uid, chown tests) in base-jupyter and left as test_container_options.py for port-based tests in base-notebook.

👌

  • The base-jupyter test test_package_managers.py removed npm and test_npm_package_manager.py added to
    base-notebook tests since npm is not installed until Lab is installed.

Let's maybe create a small function which runs docker for one particular package manager and import it from both tests?
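
One way to read this suggestion is a shared helper that both test modules import. The following is a minimal sketch, not the helper the repository ended up with: `check_package_manager`, `Runner`, and `fake_run` are hypothetical names, the list of package managers is an assumption, and in the real tests `run` would wrap whatever the test fixture uses to execute a command inside the container.

```python
from typing import Callable, Sequence

# Hypothetical type: anything that executes a command and returns its output.
Runner = Callable[[Sequence[str]], str]


def check_package_manager(
    run: Runner, binary: str, version_flag: str = "--version"
) -> str:
    """Run `<binary> <version_flag>` via `run` and fail if it prints nothing."""
    output = run([binary, version_flag]).strip()
    assert output, f"{binary} {version_flag} produced no output"
    return output


def fake_run(command: Sequence[str]) -> str:
    """Stand-in runner for illustration; pretends every tool prints a version."""
    return f"{command[0]} 1.0.0\n"


# The foundation-level test would loop over the managers installed there
# (assumed list, for illustration only) ...
for manager in ("apt", "conda", "mamba", "pip"):
    check_package_manager(fake_run, manager)

# ... and the base-notebook test would call the same helper for npm only.
npm_version = check_package_manager(fake_run, "npm")
```

Injecting the runner keeps the helper independent of how the container is started, so each test module only decides which binaries to check.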

  • The base-jupyter test test_user_options.py - changed the hidden folder named '.jupyter' to 'work' since .jupyter doesn't exist in base-jupyter (as a function of creating configuration in base-notebook).

Could you maybe check for another hidden folder?

  • GH workflow, hub-overview.yml unchanged since base-jupyter doesn't apply.

Please update this file too: the image will be pushed to DockerHub, so we would like it to have a nice description as well.

@mathbunnyru
Member

One more thing that bothers me is the image size increase.
main: https://github.com/jupyter/docker-stacks/actions/runs/3369124854
This PR: https://github.com/jupyter/docker-stacks/actions/runs/3378464310

For example, all-spark-notebook-aarch64 is 170MB heavier now.
It would be nice to understand the reason for this size increase.
I would highly recommend using dive to investigate the problem.

@consideRatio
Collaborator

Deliberation on the image name

I'm thinking quite a bit on the naming of the new image. What do you think about docker-stacks-foundation? The final image name would become jupyter/docker-stacks-foundation.

I value that whoever ends up seeing the name can guess that this is opinionated functionality defined in this project, and not something one can read and learn about in any Jupyter project. Quite a few times I've found myself explaining that NB_USER, fix-permissions, etc. stem from this project and the images provided here, and can't be assumed unless these images are used as base images.

@kevin-bates
Member Author

@mathbunnyru - thank you for the review.

I'm going to hold off with the image name change until a consensus has been reached.

Regarding the image size differences, that is interesting. Could you please share how you're accessing the image sizes? Looking into the "Load image to docker" step of the runs you linked, I found the following sizes:

From main:

jupyter/base-notebook          latest   867365114988   About an hour ago   829MB
jupyter/minimal-notebook       latest   ef8645ce888f   About an hour ago   1.3GB
jupyter/scipy-notebook         latest   6961658de632   56 minutes ago      2.77GB
jupyter/datascience-notebook   latest   68957e163b30   41 minutes ago      4.27GB
jupyter/pyspark-notebook       latest   14b34fd8f90c   42 minutes ago      3.58GB
jupyter/all-spark-notebook     latest   d85b72773bde   19 minutes ago      4.39GB

From this PR:

jupyter/base-jupyter           latest   30a1db4fcb94   2 hours ago         317MB
jupyter/base-notebook          latest   7b84b88c6f39   2 hours ago         839MB
jupyter/minimal-notebook       latest   3865070c8ea2   About an hour ago   1.31GB
jupyter/scipy-notebook         latest   809a6cfb0244   55 minutes ago      2.78GB
jupyter/datascience-notebook   latest   28b509278a60   40 minutes ago      4.37GB
jupyter/pyspark-notebook       latest   39fdeb7453e5   41 minutes ago      3.76GB
jupyter/all-spark-notebook     latest   fd97f2e27882   21 minutes ago      4.57GB

From what I can see, the introduction of base-jupyter adds 10MB to the size of base-notebook and that increase is consistent for all images until pyspark-notebook in which we see the 180MB increase. As a result, I don't understand how the 180MB increase on the pyspark-notebook (multiple dependencies later) would be attributed to the introduction of a new base image when previous dependencies increased by the consistent 10MB.

FWIW - the size changes are consistent across the x86_64 images as well.

Are you asking me to "dive" into why there's a 10MB increase or figure out the increase due to spark (and/or perhaps pyarrow) installs across main and this PR?
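
For anyone who wants to check the arithmetic, the per-image differences between the two listings above can be computed with a few lines of Python. This is a rough sketch: the sizes are copied from the listings, 1GB is assumed to mean 1000MB, and because the GB figures are rounded to two decimals each difference is only accurate to about 10MB.

```python
def to_mb(size: str) -> float:
    """Convert a size string such as '829MB' or '1.3GB' to megabytes.

    Assumes decimal units (1GB = 1000MB).
    """
    units = {"MB": 1.0, "GB": 1000.0}
    value, unit = size[:-2], size[-2:]
    return float(value) * units[unit]


# Sizes copied from the "From main" and "From this PR" listings.
MAIN = {
    "base-notebook": "829MB",
    "minimal-notebook": "1.3GB",
    "scipy-notebook": "2.77GB",
    "datascience-notebook": "4.27GB",
    "pyspark-notebook": "3.58GB",
    "all-spark-notebook": "4.39GB",
}
PR = {
    "base-notebook": "839MB",
    "minimal-notebook": "1.31GB",
    "scipy-notebook": "2.78GB",
    "datascience-notebook": "4.37GB",
    "pyspark-notebook": "3.76GB",
    "all-spark-notebook": "4.57GB",
}

# Difference per image, rounded to whole megabytes.
deltas = {name: round(to_mb(PR[name]) - to_mb(MAIN[name])) for name in MAIN}

for name, delta in deltas.items():
    print(f"{name:22} +{delta}MB")
```

With the figures as listed, this gives +10MB for base-notebook, minimal-notebook, and scipy-notebook, roughly +100MB for datascience-notebook, and roughly +180MB for the two Spark images.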

@kevin-bates
Member Author

FWIW - when I build the images locally (using make build/image) - from my pr branch and main - I get the following (tags designate the build branch):

jupyter/base-notebook                                        main                   c57ee58daeb1   35 seconds ago   979MB
jupyter/base-notebook                                        pr                     3e7ea8e50493   3 minutes ago    979MB
jupyter/base-jupyter                                         pr                     241f602904ba   4 minutes ago    366MB

No increase.

I'll build all the images locally and see if their sizes are consistent with those listed in the previous comment from the main branch.

Are the images built during a given PR pushed anywhere so they can be analyzed from that build?

@mathbunnyru
Member

Could you please share how you're accessing the image sizes?
Are the images built during a given PR pushed anywhere so they can be analyzed from that build?

The answer is the same here.
You need to open the summary page for the builds.
https://github.com/jupyter/docker-stacks/actions/runs/3394923824

Under the graph, you will see all the artefacts.
Furthermore, you can download them and analyze them locally.
The retention period is 3 days. I set it this way to avoid wasting too many resources while still being able to debug builds if they go wrong.

No increase.

I will rebuild images in main branch now, so you will have fresh builds.
The size increase might be the result of some external changes (which is fine).
But if this time the size differs again, then we will have to investigate this.

@mathbunnyru
Member

mathbunnyru commented Nov 4, 2022

Are you asking me to "dive" into why there's a 10MB increase or figure out the increase due to spark (and/or perhaps pyarrow) installs across main and this PR?

10MB size increase is perfectly fine.
If we see the 180MB increase for pyspark again, I would like to ask you to dive into this.

Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
@mathbunnyru
Member

main: https://github.com/jupyter/docker-stacks/actions/runs/3395647439
all-spark-notebook-aarch64 | 4.35 GB

PR: https://github.com/jupyter/docker-stacks/actions/runs/3396469599
all-spark-notebook-aarch64 | 4.35 GB

I think this issue is resolved 🎉

@@ -130,206 +78,27 @@ def test_nb_user_change(container: TrackedContainer) -> None:
), f"Hidden folder .jupyter was not copied properly to {nb_user} home folder. stat: {output}, expected {expected_output}"
Member

Sorry that I wasn't clear about this one.

Let's move everything except the hidden folder test (the last part of this function) to the base-jupyter tests and keep the hidden folder test here.

Member Author

ok - that's what my previous commit did, so I think we're good. Thanks.

Member

I would like you to split test_nb_user_change function into two parts.
The first part works perfectly fine in foundation image and should go to the appropriate test folder.
The second one only checks that the .jupyter folder is properly copied for base-notebook image and this part of this function stays in base-notebook tests.

There will be some small duplication in how we start the container, but that's fine.

Member Author

Sorry about that. Re-reading your original comment - this is what you were getting at. So there's only a constraint on the names of the test files, but not on the names of the test functions themselves?

Member

Unfortunately, I don't know the restrictions of pytest 😢
But I know that pytest can run one particular test, so I would definitely try to use different function names for different tests.
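
For reference, pytest addresses a single test by its node ID, `path/to/file.py::function_name`. Same-named functions in different files therefore stay distinguishable, while two functions with the same name in one module do not, because the second `def` replaces the first. A small sketch of both points follows; `node_id` is a hypothetical helper, and the file paths and the reuse of `test_nb_user_change` in both files are assumptions for illustration only.

```python
def node_id(test_file: str, test_function: str) -> str:
    """Build the `file::function` node ID accepted by `pytest <node id>`."""
    return f"{test_file}::{test_function}"


# Same function name in two different files: the node IDs still differ.
foundation = node_id(
    "tests/docker-stacks-foundation/test_user_options.py", "test_nb_user_change"
)
notebook = node_id(
    "tests/base-notebook/test_container_options.py", "test_nb_user_change"
)
assert foundation != notebook


# Same function name twice in ONE module: plain Python keeps only the last
# definition, so the first test would silently never run.
def test_example() -> str:
    return "first"


def test_example() -> str:  # noqa: F811 - deliberate redefinition
    return "second"


print(test_example())
```

Distinct function names avoid the shadowing problem entirely and make each test easy to select on its own.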

Member

@mathbunnyru mathbunnyru left a comment

LGTM 🎉

The only thing left to be done is to create the DockerHub image.
I will do it as soon as I get the right permissions there.
I hope it will happen next week.
jupyterhub/team-compass#559

@kevin-bates
Member Author

Hi @mathbunnyru. I've split test_nb_user_change between the foundation and base-notebook tests where the latter only tests the existence and permissions of the .jupyter folder. Rather than remove the check for .jupyter in the test_nb_user_change test in docker-stacks-foundation I replaced it with a check against the work folder since the test hadn't checked any sub-folders and felt that was a reasonable thing to do at the "foundation" level.

Also, thank you for the fix-up commits. I had noticed the alphabetical ordering the first time but completely forgot that the name change would lead to a position change - so thanks for correcting that. I was too focused on the replacement activity.

Member

@mathbunnyru mathbunnyru left a comment

Thank you @kevin-bates!
I'm absolutely happy with this PR now.
So, let's wait until I get permissions, and then I will merge this.

@consideRatio, please take a look at this PR as well.

@mathbunnyru mathbunnyru closed this Nov 9, 2022
@mathbunnyru mathbunnyru reopened this Nov 9, 2022
@mathbunnyru
Member

Closed and reopened the PR to trigger CI once again (there were some network failures).
And now I have permissions in the jupyter repo, which means I can create new images.
I will try to do it before the end of this week.

Collaborator

@consideRatio consideRatio left a comment

This LGTM, amazing work on this @kevin-bates and @mathbunnyru!! Super happy to see the build system manages to handle it!

@mathbunnyru
Member

mathbunnyru commented Nov 12, 2022

My plan is the following:

  • Create a completely empty repo
  • Update the token to make sure it will work for the new repo as well
  • Squash-merge this PR
  • Check that the CI worked fine
  • Check that the README and DockerHub description are updated properly

@mathbunnyru mathbunnyru merged commit 10e52ee into jupyter:main Nov 12, 2022
@mathbunnyru
Member

@kevin-bates @consideRatio and it worked 🎉
Just two small fixes after the merge, it is really solid work here 👍

https://github.com/jupyter/docker-stacks/actions/runs/3452686673

@kevin-bates
Member Author

Awesome - thank you for your help and guidance @mathbunnyru!

Development

Successfully merging this pull request may close these issues.

[ENH] - Introduce base image from which base-notebook derives
3 participants