This repository has been archived by the owner on May 2, 2024. It is now read-only.

Large user ID causes 600+ GB openedx image in Tutor #178

Closed
3 tasks
kdmccormick opened this issue Sep 11, 2023 · 4 comments · Fixed by overhangio/tutor#918

kdmccormick commented Sep 11, 2023

Background

@rgraber's openedx (and openedx-dev) images ended up ridiculously big and took an extremely long time to finish building. In particular, the layer created by the useradd command accounted for the vast majority of the space.

We investigated and noticed that her host's user ID (id -u) is ~2.1 billion. This user ID is passed into the Tutor openedx Dockerfile as APP_USER_ID and is then used in the useradd command to create the image's app user. Manually passing a smaller APP_USER_ID returned the build time and image size to reasonable values.

Note: Tutor purposefully uses the host's user ID as the image's app user ID in order to avoid annoying file-owner thrashing when working with bind-mounted directories.

Reproduction:


# exporting layers should take several minutes
# final image size should be > 600 GB
tutor images build openedx-dev --build-arg APP_USER_ID=2000000000

# exporting layers should take < 10 seconds
# final image size should be < 3 GB
tutor images build openedx-dev --build-arg APP_USER_ID=1000

Tested on:

  • macOS with OrbStack
  • Ubuntu 22.04 on Linux 6.2.0-32-generic with Docker 24.0.5

Workaround

On systems where the host's user ID is too big, it can be overridden like so:

tutor images build openedx-dev --build-arg APP_USER_ID=1000

Unfortunately, this may cause file permission issues when bind-mounting directories. I haven't tested this yet.
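
The workaround could be wrapped in a small script that only overrides the ID when the host's is very large. This is a minimal sketch, not part of Tutor: `choose_app_user_id` is a hypothetical helper, and the 60000 cutoff is an assumption (borrowed from the UID_MAX value reported further down this thread), not a known safe limit. The script only prints the build command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: echo the host UID if it is at or below a cutoff,
# otherwise fall back to 1000. The default cutoff of 60000 is an assumption.
choose_app_user_id() {
  host_uid=$1
  max_uid=${2:-60000}
  if [ "$host_uid" -gt "$max_uid" ]; then
    echo 1000
  else
    echo "$host_uid"
  fi
}

APP_USER_ID=$(choose_app_user_id "$(id -u)")

# Print the command instead of executing it, so the sketch has no side effects.
echo "tutor images build openedx-dev --build-arg APP_USER_ID=$APP_USER_ID"
```

Falling back to 1000 keeps the build fast, but the same bind-mount permission caveat applies whenever the chosen ID differs from the host's.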

Open questions

  • Is the image actually taking up 700 GB of filesystem space, or is it just that Docker thinks the image is 700 GB?
  • Why are Becca's (and Jeremy's) user IDs so big? Is it a macOS thing? Or maybe an IT endpoint management software thing?
  • What's the max user ID that works OK? Perhaps 2^15, or 2^16?
  • Is this a Docker bug, or is it unreasonable to expect that user IDs can be this big?
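
On the first question: one way a file can report a huge size while using almost no disk is if it is sparse. Whether that is what happens here is not established at this point in the thread. The sketch below only illustrates the general effect, and assumes Linux with GNU coreutils (`truncate`, `stat -c`).

```shell
#!/bin/sh
# Create a 1 GiB sparse file: only the length is set, no data blocks are written.
tmp=$(mktemp -d)
truncate -s 1G "$tmp/sparse.bin"

# Logical (apparent) length in bytes, as reported by the file metadata.
apparent_bytes=$(stat -c %s "$tmp/sparse.bin")

# Space actually allocated on disk, in KiB.
allocated_kib=$(du -k "$tmp/sparse.bin" | cut -f1)

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kib KiB"

rm -rf "$tmp"
```

A tool that reads the apparent size would report 1 GiB here, while one that counts allocated blocks would report close to nothing.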

Tasks

  • Find the cause of the bug
  • Propose a solution
  • Implement it
kdmccormick self-assigned this Sep 11, 2023

rgraber commented Sep 11, 2023

Support for the idea that Docker just thinks it's 700G (screenshots from orbstack):

[Two OrbStack screenshots, taken 2023-09-11 at 1:33 PM and 1:35 PM]


regisb commented Sep 12, 2023

tl;dr: skip to the end.


I was able to reproduce this issue on Linux with this minimal example:

Dockerfile:

FROM docker.io/ubuntu:22.04 as minimal

ARG APP_USER_ID=1000
RUN useradd --home-dir /openedx --create-home --shell /bin/bash --uid ${APP_USER_ID} app
USER ${APP_USER_ID}

Completes in 1s:

docker buildx build --build-arg=APP_USER_ID=1000 --tag=userid1k .

340s and it's still running:

docker buildx build --build-arg=APP_USER_ID=2000000000 --tag=userid2g .

It's the "exporting layers" step which takes a very long time to complete, as you discovered yourself. This leads me to think that the bug may come from buildkit. So I attempted a build without buildkit:

$ DOCKER_BUILDKIT=0 docker build --build-arg=APP_USER_ID=2000000000 --tag=userid2g-nobuildkit .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon  3.072kB
Step 1/4 : FROM docker.io/ubuntu:22.04 as minimal
 ---> 3b418d7b466a
Step 2/4 : ARG APP_USER_ID=1000
 ---> Using cache
 ---> e9f05e022397
Step 3/4 : RUN useradd --home-dir /openedx --create-home --shell /bin/bash --uid ${APP_USER_ID} app
 ---> Running in cfa158915dd4

And the build is now stuck on the useradd step. So maybe the bug does not come from buildkit after all.

On the other hand, this one-line docker run command completes without issue:

docker run --rm -it docker.io/library/ubuntu:22.04 useradd --home-dir /openedx --create-home --shell /bin/bash --uid 2000000000 app

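The default UID_MAX in Ubuntu 22.04 is 60k (this is the upper bound useradd uses when it picks a UID automatically, not a hard limit on explicit --uid values):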

$ docker run --rm -it docker.io/library/ubuntu:22.04 grep ^UID_MAX /etc/login.defs
UID_MAX                 60000

but the build works fine with APP_USER_ID=1000000 (10⁶):

$ docker buildx build --build-arg=APP_USER_ID=1000000 .

This is fascinating.


Aaaaaaand here's a similar report: https://stackoverflow.com/questions/73351423/docker-build-hangs-when-adding-user-with-a-large-value-of-user-id-uid 🍾

The upstream issue: moby/moby#5419 (2014!)

Aaaaaand the fix is to add --no-log-init to the useradd command, so that /var/log/faillog does not take 60 GB...
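
A rough back-of-the-envelope check of why a huge UID could produce huge log files: the lastlog and faillog databases are indexed by UID with fixed-size records, so resetting the entry for UID N extends the file to about (N + 1) × record size. The record sizes below (292 bytes for lastlog, 32 bytes for faillog) are assumptions about 64-bit Linux with glibc, not figures taken from this thread.

```shell
#!/bin/sh
uid=2000000000

# Assumed per-user record sizes in bytes (not confirmed in this thread).
lastlog_record=292
faillog_record=32

# The entry for UID N sits at offset N * record size, so the file
# must be at least (N + 1) records long.
lastlog_bytes=$(( (uid + 1) * lastlog_record ))
faillog_bytes=$(( (uid + 1) * faillog_record ))

# Convert to decimal gigabytes (integer division).
lastlog_gb=$(( lastlog_bytes / 1000000000 ))
faillog_gb=$(( faillog_bytes / 1000000000 ))

echo "lastlog: ${lastlog_gb} GB"
echo "faillog: ${faillog_gb} GB"
```

If those record sizes hold, the two files together would be roughly in line with the 600+ GB image reported above. The useradd man page describes --no-log-init as skipping both the lastlog and faillog databases, which is consistent with the flag fixing the problem.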

Who wants to open a PR?

@CodeWithEmad

I'd like to give it a try.

@kdmccormick

Thanks Emad!

regisb pushed a commit to overhangio/tutor that referenced this issue Oct 11, 2023
On macOS, building the "openedx-dev" Docker image resulted in an image that required more than 600 GB of disk space. This was due to the `adduser` command which was called with a user ID of 2x10⁹ (on macOS only). This resulted in a very large /var/log/faillog file, hence the image size.

Related upstream discussion: moby/moby#5419
Close openedx-unsupported/wg-developer-experience#178
moonesque pushed a commit to edSPIRIT/tutor that referenced this issue Nov 20, 2023