Proposal to minimize what TLJH installs in the user environment #872

Closed
consideRatio opened this issue Apr 9, 2023 · 9 comments · Fixed by #890

Comments

@consideRatio
Member

We are installing the following in the user environment, but I think we should minimize this for various reasons.

  • The more we put here, the more complexity we take on as maintainers related to the software
    • We need to bump versions over time
    • We need to ensure upgrades work and clarify how they are meant to work, etc.
    • We may need to explain the "magic" of these installed packages to users who may not expect, for example, that tljh specifically installs ipywidgets==8

# When tljh.installer runs, the users' environment as typically found in
# /opt/tljh/user, is setup with these packages.
#
# WARNING: The order of these dependencies matters, this was observed when using
# the requirements-txt-fixer pre-commit hook that sorted them and made
# our integration tests fail.
#
# JupyterHub + notebook package are base requirements for user environment
jupyterhub==3.*
notebook==6.*
# Install additional notebook frontends!
jupyterlab==3.*
nteract-on-jupyter==2.*
# nbgitpuller for easily pulling in Git repositories
nbgitpuller==1.*
# jupyter-resource-usage to show people how much RAM they are using
jupyter-resource-usage==0.7.*
# Most people consider ipywidgets to be part of the core notebook experience
ipywidgets==8.*

I think we should install what jupyterhub needs to authenticate users, i.e. the jupyterhub package, but I consider everything else up for discussion.

  1. Should we install any Jupyter UI?
  2. Should we install any additional things like nbgitpuller, jupyter-resource-usage, and ipywidgets?

I currently lean towards no on both 1) and 2), while still helping users get set up with a basic environment, either via documentation only or perhaps also via a yes/no prompt for installing, for example, jupyterlab.

@consideRatio consideRatio changed the title Minimize what TLJH installs in the user environment Suggestion to minimize what TLJH installs in the user environment Apr 9, 2023
@consideRatio consideRatio changed the title Suggestion to minimize what TLJH installs in the user environment Proposal to minimize what TLJH installs in the user environment Apr 9, 2023
@manics
Member

manics commented Apr 10, 2023

I think we should always install a frontend, i.e. the latest JupyterLab release. I don't have an opinion on the other packages.

@yuvipanda what are your thoughts?

@consideRatio
Member Author

> I think we should always install a frontend, i.e. the latest JupyterLab release

Install, and upgrade, or just install once?

I think we should make it easy to get set up with one UI, like jupyterlab, but that we should be hands-off for upgrades of tljh going forward. Think of it as initial assistance in setting up a user environment. If someone wants to skip jupyterlab and go to rstudio directly, I think that should be fine without being forced to install jupyterlab, for example by allowing them to say "no" to a prompt during initial setup.

Not disagreeing with you about facilitating setup of jupyterlab @manics, but there is nuance. Do you think it should be forced and/or something we also upgrade over time in the user environment, or rather something users opt-in to by default on initial setups?

@yuvipanda
Collaborator

Thanks for opening this and thinking hard about the maintenance concerns, @consideRatio.

For useful context, https://words.yuvi.in/post/the-littlest-jupyterhub/ is the originating post for TLJH, and I believe fundamentally it sits in the place of 'I am now forced to be a sysadmin, but do not want to'. So TLJH needs to be usable at a basic level, as much as possible, immediately after it is installed. The current setup was achieved via in-person user testing, where I got someone who constantly finds themselves in this situation, and adjusted until it felt right.

Given that need, I think the current set of packages (minus nteract) is what I would consider the minimum for someone to recognize it as JupyterHub. So I'd like to leave them as is. RStudio is currently not possible to set up on TLJH and I'm ok with that.

If the problem is that of bumping versions, then we can explore doing dependabot-type stuff. But I'd really like it to not require further setup to get a basic notebook working after installation.

I can provide individual justifications for the other packages required if you would like as well.

@CagtayFabry

Personally I would appreciate any option to keep the base environment as clean as possible 👍
(I tend to completely hide the base kernel from users as well and put everything into a dedicated working kernel)

I can see the appeal of a "ready-to-go" configuration directly after installing but having a clean environment makes the sysadmin side so much easier

maybe just a few more options (if wanted) to provide to the installer could be a way?

@yuvipanda
Collaborator

I think the question is really one of defaults - folks should be able to uninstall whatever they wish. We don't put any scientific packages in there - just UI stuff that is all (IMO) minimum necessary for people who don't understand the ins and outs of how this works.

@manics
Member

manics commented Apr 12, 2023

> Install, and upgrade, or just install once?

Originally I was thinking install the user environment once, and only upgrade packages if they're no longer compatible. That's still tricky to get right though, in theory the conda metadata should handle everything (install jupyterhub-singleuser==JUPYTERHUB_VERSION, dependencies are upgraded if necessary).

However, based on this suggestion:

> maybe just a few more options (if wanted) to provide to the installer could be a way?

Here's an alternative: add a --no-user-environment flag or similar:

  • The default behaviour would install pinned versions of conda, mamba, JupyterLab and any other packages we think should be installed by default.
  • If --no-user-environment is passed then the user environment isn't touched, the admin is entirely responsible for updating packages
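
A minimal sketch of how such a flag could gate the user environment setup, assuming an argparse-based installer. The flag name comes from the proposal above and is not an existing installer option; `DEFAULT_USER_PACKAGES`, `build_parser`, and `packages_to_install` are hypothetical names used only for illustration.

```python
import argparse

# Hypothetical default set, copied from the requirements list quoted in this issue.
DEFAULT_USER_PACKAGES = [
    "jupyterhub==3.*",
    "notebook==6.*",
    "jupyterlab==3.*",
]


def build_parser():
    parser = argparse.ArgumentParser(prog="tljh-installer-sketch")
    # Proposed flag from this thread; an assumption, not a real option today.
    parser.add_argument(
        "--no-user-environment",
        action="store_true",
        help="leave the user environment untouched",
    )
    return parser


def packages_to_install(argv):
    """Return the packages the installer would manage in the user env."""
    args = build_parser().parse_args(argv)
    if args.no_user_environment:
        # The admin is entirely responsible for the user environment.
        return []
    return list(DEFAULT_USER_PACKAGES)
```

Without the flag, the function returns the pinned defaults. With it, the function returns an empty list, so nothing in the user environment would be touched.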

@consideRatio
Member Author

> [...] and only upgrade packages if they're no longer compatible. That's still tricky to get right though, [...]

The more I've thought about this, the more opinionated I've become towards not upgrading things like ipywidgets or trying to figure out if things are compatible or not - because we just can't say much. We could say that the packages we manage are compatible with each other, but ipywidgets, for example, is a dependency of all kinds of things that the user may have installed - so there is simply no way we can upgrade it safely.

If we touched the user env more than needed, and that broke something for a user, I would feel responsible for it as a maintainer when an issue report comes in. I really want to avoid that! I believe it's essential for the maintenance of this repo, and personally for my motivation to maintain it, that we draw a clear line so that we don't become responsible for issues in the user env.

With this said, let's separate two questions for now:

  1. what do we do in the user env during initial setup?
  2. what do we do in the user env during upgrades of tljh?

I think the second question must be answered to unblock a release.

My opinion about the second is that we should do as little as possible in the user env during an upgrade, as we just can't know what's breaking and what's not. In practice, I figure it would mean ensuring a jupyterhub version compatible with the hub environment (note that there are three environments: system env, hub env, and user env).

@manics and @yuvipanda what do you think about my suggestion of not touching anything more than needed in the user env during upgrades of a TLJH installation?
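
As a rough illustration of the "as little as possible" upgrade described above, the only requirement passed to pip could be a jupyterhub pin derived from the hub env. This is a sketch under the assumption that "compatible" means "same major version", which mirrors the `jupyterhub==3.*` pin in requirements-base.txt; `matching_user_env_pin` is a hypothetical helper, not part of TLJH.

```python
def matching_user_env_pin(hub_jupyterhub_version):
    """Build a pip requirement pinning jupyterhub to the hub env's major version.

    Assumes compatibility is defined by the major version only.
    """
    major = hub_jupyterhub_version.split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unexpected version string: {hub_jupyterhub_version!r}")
    return f"jupyterhub=={major}.*"
```

Nothing else from requirements-base.txt would be included during an upgrade, so packages such as ipywidgets stay at whatever version the admin or users installed.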

@consideRatio
Member Author

Current situation

We run pip install --upgrade -r requirements-base.txt during initial TLJH installs, but also during TLJH upgrades.

conda.ensure_pip_requirements(
    USER_ENV_PREFIX,
    os.path.join(HERE, "requirements-base.txt"),
    upgrade=True,
)

# When tljh.installer runs, the users' environment as typically found in
# /opt/tljh/user, is setup with these packages.
#
# WARNING: The order of these dependencies matters, this was observed when using
# the requirements-txt-fixer pre-commit hook that sorted them and made
# our integration tests fail.
#
# JupyterHub + notebook package are base requirements for user environment
jupyterhub==3.*
notebook==6.*
# Install additional notebook frontends!
jupyterlab==3.*
# nbgitpuller for easily pulling in Git repositories
nbgitpuller==1.*
# jupyter-resource-usage to show people how much RAM they are using
jupyter-resource-usage==0.7.*
# Most people consider ipywidgets to be part of the core notebook experience
ipywidgets==8.*

@minrk
Member

minrk commented Apr 18, 2023

I like @consideRatio's proposal about reducing what we do on upgrade, but not changing what we install to start or by default.

I think this relates to #858

I think tljh is mostly designed as a starting point, and the tljh installer is not generally expected to be run again after that. I know it works now, but I think we should probably think more carefully about what the installer does as an 'upgrade' step vs. initial install.

We have at least these bare minimum requirements:

hub env:

  • jupyterhub with semi-strict
  • some selection of common authenticators (at least native since it's default, probably oauth)

user env (bare minimum):

  • jupyterhub matches hub env
  • jupyter-server (with possible lower bound for compatibility)

But also a minimum set to be 'basically functional' to start, but for which there are alternatives and we have to pick one (or more):

  • jupyterlab
  • nbclassic and/or notebook
  • ipywidgets

and we have these packages which have been selected for tljh specifically because they reduce maintenance burden of a tljh instance for the target users:

  • nbgitpuller
  • jupyter-resource-usage

Note that tljh is not THE way to install jupyterhub on a single server. If you're a sysadmin, setting up jupyterhub isn't a huge task, and you can make all the choices you want. You can even use tljh purely as a convenient starting point, so any choices it makes to start are safely ignored/reverted after the fact. It's specifically for the use cases @yuvipanda described in the blog linked above.

With all that in mind, I think it makes sense to follow @consideRatio's proposal for tljh "upgrade" to only affect actual compatibility problems, and never "reset" the user env. I see two (orthogonal) possible proposals to address this:

  • tljh installer in 'upgrade mode' (detecting an existing env), only checks required dependencies, and only uses >= version checks. e.g.
    • if jupyterhub>=3 is present in the Hub env, take no action
    • if ipywidgets has been removed from the user env, take no action
    • if jupyterhub<3 is present, or no jupyterhub is present, install or upgrade it
  • if folks think it's necessary, add an apt-style --no-install-recommends that omits the extensions.
    • alternately, a --no-user-env as @manics suggested to skip the user env entirely, leaving it up to the admin (e.g. if using DockerSpawner, the user env is not used)

I think the latter proposal doesn't solve much if upgrade only checks for compatibility instead of resetting the env, so I'd suggest we wait on that until after we see a need once upgrade is less disruptive.
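
The 'upgrade mode' rules listed above can be sketched as a small decision function. This is an illustration only: `major` and `upgrade_action` are hypothetical names, and the version check is simplified to comparing the leading integer rather than using a full version parser.

```python
def major(version):
    """Return the leading integer of a version string such as '3.1.1'."""
    return int(version.split(".")[0])


def upgrade_action(installed_version, minimum_major, required):
    """Decide what an 'upgrade mode' run would do for one package.

    installed_version is None when the package is absent.
    Returns "none", "install", or "upgrade".
    """
    if not required:
        # Optional packages (e.g. ipywidgets) are never reinstalled or reset.
        return "none"
    if installed_version is None:
        return "install"
    if major(installed_version) < minimum_major:
        return "upgrade"
    # Only a lower bound is checked, so newer versions are left alone.
    return "none"
```

With a required jupyterhub and a minimum major version of 3, an installed 3.1.1 yields "none", an installed 2.3.1 yields "upgrade", and a missing package yields "install". A removed ipywidgets, treated as optional, always yields "none".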
