Minor Docker fixes & podman "rootless" container support (see comments) #2722
Conversation
…fault Debian/Ubuntu/etc have a package called python-is-python3 tho.
…er.io podman requires specifying docker.io-- doesn't default to it.
…hem. Shouldn't hurt docker tho
…cker" build.sh: When creating the volume, grab its path and do a "podman unshare" to give the running "appuser" access to the volume's files. Without it, you'd have to run the rootless user as "root" in the container, and no one wants that. run.sh: "--platform" seems to have problems with podman. This may actually be an issue with arm64 as I'm trying both at the same time. Regardless, I unset it on podman to keep behavior the same for docker.
Instead of chmodding w/a mountpoint path, set the ownership of /data and /data/outputs from inside the container w/a one-time root login. This is the "proper" method that doesn't rely on a hard path, for future compatibility. Moved the creation of ./outputs to here as it should probably only be done 1x and because it needs to exist to set up the correct ownership for podman
This removes some docker/buildkit-specific lines that break on Podman... for now. See: containers/buildah#4325 containers/buildah#3815 This is a painful patch, but without using another Dockerfile for Podman, I couldn't find a non-convoluted alternative.
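A minimal sketch of the build.sh volume fix described in the notes above, assuming the volume name from the build log below (invokeai_data):

# Grab the volume's host path, then chown it inside the rootless user
# namespace, so the container's "appuser" (uid/gid 1000) owns the files.
VOLUME_PATH="$(podman volume inspect invokeai_data --format '{{.Mountpoint}}')"
podman unshare chown -R 1000:1000 "${VOLUME_PATH}"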
-    --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    --mount=type=cache,target=/var/lib/apt,sharing=locked \
-    apt-get update \
+RUN apt-get update \
why do you remove the build cache?
This question of course also applies to all further removals of the build cache, but I'm not adding the same comment x times 😅
See the notes above and in f70fb02; it breaks podman, unfortunately.
docker/Dockerfile
Outdated
     build-essential=12.9 \
     gcc=4:10.2.* \
-    python3-dev=3.9.*
+    python3-dev>=${PYTHON_VERSION}
Last time I checked, there was no version of the python 3.10 headers available for the slim image; has this changed?
Dunno-- it builds fine :) I'll change back to 3.9 tho
docker/Dockerfile
Outdated
 # syntax=docker/dockerfile:1

-ARG PYTHON_VERSION=3.9
+ARG PYTHON_VERSION=3.10
Since the security score of the Python 3.9 container is much better than that of 3.10, I would prefer to stay on 3.9 until they fix this.
Okay-- wasn't sure why it was at 3.9 because the other docs said it was tested on 3.10 and known to be working, but I can change it back.
It is indeed working (even when using the 3.9.* python-dev package on python 3.10), but the 3.9 image had only 25% of the security issues I got with 3.10. Since there are people who use this image in the cloud, security should be a high prio 😅
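(For anyone wanting to reproduce the comparison, the build output below even suggests a scanner; the exact image tags here are assumptions:)

# compare vulnerability counts between the two slim base images
docker scan python:3.9-slim
docker scan python:3.10-slim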
Gotcha... I'm going back to 3.9....
Quick question-- I presume you have this running w/a gpu. To run it on my machine w/nvidia support I needed to rely on a base image that supported cuda, namely nvidia/cuda... I was trying to figure out-- where/how does this docker image get its gpu support?
docker/Dockerfile
Outdated
| python3-dev>=${PYTHON_VERSION} | ||
|
|
||
| # prepare pip for buildkit cache | ||
| ARG APPNAME=InvokeAI |
already defined in previous stage
Yes, but in podman the args get cleared out at every stage. I don't know why, but it took me longer than I care to admit to figure that out.
But this is producing a "huge" maintenance overhead (one spot to change a default value vs three spots).
So if it's really necessary, then please create env variables from the arguments in the base image and reuse those in the later stages, while cleaning them out so as not to pollute the container.
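Something like this, perhaps (a rough sketch; stage names taken from the build log below):

# base stage: declare the ARGs once and persist them as ENVs
FROM docker.io/library/python:3.9-slim AS python-base
ARG APPDIR=/usr/src
ARG APPNAME=InvokeAI
ENV APPDIR=${APPDIR} APPNAME=${APPNAME}

# later stages built FROM the base inherit the ENVs with no redeclaration
FROM python-base AS pyproject-builder
WORKDIR ${APPDIR}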
ok I can try that.
     --upgrade-deps

 # copy sources
-COPY --link . .
why do you remove the --link?
See f70fb02. Breaks podman. I hate it too :(
docker/Dockerfile
Outdated
 # Create a new user
+ARG APPDIR=/usr/src
+ARG APPNAME=InvokeAI
both already defined in previous stage
     -U \
-    "${UNAME}"
+    "${UNAME}" \
+    -u 1000
why do we need to define the UID, isn't it enough to have a user group created with the same name as the user?
Not for podman-- the way that podman works with rootless containers is that it uses another uid that is distinct from the uid of the user running the container on the host. In this case, the uid/gid of 1000:1000 in the container is actually something like 100999:100999 in the mounts/volumes. Or at least that's what you WANT it to be. So I needed to explicitly set that when the account is created so that the user can be run correctly. (I could be wrong and there might be another way to do this, but if there is I don't know it. I could run invokeai in the container as the "root" user rather than as appuser, at which point there would no longer be file access issues, but this seems like a bad idea.)
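The mapping comes from the host user's subordinate id ranges; a quick way to see it (assuming a standard /etc/subuid setup):

# e.g. "youruser:100000:65536" -- container uid 1000 then maps to
# 100000 + (1000 - 1) = 100999 on the host
grep "^$USER:" /etc/subuid
# show the active mapping inside the rootless user namespace
podman unshare cat /proc/self/uid_map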
I just built the current main branch locally and executed some commands:
./docker/build.sh
Activated virtual environment: /Users/mauwii/git/mauwii/InvokeAI/.venv
You are using these values:
Dockerfile: ./Dockerfile
index-url: https://download.pytorch.org/whl/cpu
Volumename: invokeai_data
Platform: linux/arm64
Container Registry: ghcr.io
Container Repository: mauwii/invokeai
Container Tag: main-cpu
Container Flavor: cpu
Container Image: ghcr.io/mauwii/invokeai:main-cpu
Volume already exists
[+] Building 178.5s (23/23) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.77kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 35B 0.0s
=> resolve image config for docker.io/docker/dockerfile:1 2.3s
=> [auth] docker/dockerfile:pull token for registry-1.docker.io 0.0s
=> docker-image://docker.io/docker/dockerfile:1@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14 3.8s
=> => resolve docker.io/docker/dockerfile:1@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14 0.0s
=> => sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14 8.40kB / 8.40kB 0.0s
=> => sha256:7f44e51970d0422c2cbff3b20b6b5ef861f6244c396a06e1a96f7aa4fa83a4e6 482B / 482B 0.0s
=> => sha256:a28edb2041b8f23c38382d8be273f0239f51ff1f510f98bccc8d0e7f42249e97 2.90kB / 2.90kB 0.0s
=> => sha256:9d0cd65540a143ce38aa0be7c5e9efeed30d3580d03667f107cd76354f2bee65 10.82MB / 10.82MB 3.1s
=> => extracting sha256:9d0cd65540a143ce38aa0be7c5e9efeed30d3580d03667f107cd76354f2bee65 0.6s
=> [internal] load .dockerignore 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> [internal] load metadata for docker.io/library/python:3.9-slim 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 3.36MB 0.0s
=> [python-base 1/4] FROM docker.io/library/python:3.9-slim 0.0s
=> CACHED [python-base 2/4] RUN rm -f /etc/apt/apt.conf.d/docker-clean && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' >/etc/apt/apt.conf.d/keep-cache 0.0s
=> CACHED [python-base 3/4] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked --mount=type=cache,target=/var/lib/apt,sharing=locked apt-get update && apt-get in 0.0s
=> CACHED [python-base 4/4] WORKDIR /usr/src 0.0s
=> CACHED [pyproject-builder 1/6] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked --mount=type=cache,target=/var/lib/apt,sharing=locked apt-get update && apt- 0.0s
=> CACHED [pyproject-builder 2/6] RUN mkdir -p /var/cache/buildkit/pip 0.0s
=> CACHED [pyproject-builder 3/6] RUN --mount=type=cache,target=/var/cache/buildkit/pip,sharing=locked python3 -m venv "InvokeAI" --upgrade-deps 0.0s
=> [pyproject-builder 4/6] COPY --link . . 0.1s
=> [pyproject-builder 5/6] RUN --mount=type=cache,target=/var/cache/buildkit/pip,sharing=locked "InvokeAI/bin/pip" install . 148.7s
=> [pyproject-builder 6/6] RUN python3 -c "from patchmatch import patch_match" 3.9s
=> CACHED [runtime 1/3] RUN useradd --no-log-init -m -U "appuser" 0.0s
=> CACHED [runtime 2/3] RUN mkdir -p "/data" && chown -R "appuser" "/data" 0.0s
=> [runtime 3/3] COPY --chown=appuser --from=pyproject-builder /usr/src/InvokeAI InvokeAI 11.0s
=> exporting to image 4.8s
=> => exporting layers 4.7s
=> => writing image sha256:d274038b0dd470a06f4bcfb8da22fb1fbe071c73ca947d96ef82c5e346dbf62b 0.0s
=> => naming to ghcr.io/mauwii/invokeai:main-cpu 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
~/git/mauwii/InvokeAI main ± docker run --rm --interactive --tty --entrypoint=/bin/bash ghcr.io/mauwii/invokeai:main-cpu
appuser@299ed35c86f9:/usr/src$ id -u
1000
appuser@299ed35c86f9:/usr/src$ id -g
1000
appuser@299ed35c86f9:/usr/src$ whoami
appuser
appuser@299ed35c86f9:/usr/src$ apt-get update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
appuser@299ed35c86f9:/usr/src$ sudo apt-get update
bash: sudo: command not found
appuser@299ed35c86f9:/usr/src$

- no sudo for building the container
- appuser already has uid 1000
- no sudo inside the container
inside the container the uid/gid is 1000. Same as w/podman. But if this is rootless docker, try to touch /data/outputs/testfile inside the container, and then jump out and look at the uid/gid of the file in ./outputs. With podman, it's something other than 1000, like some big #. I believe with rootless docker it's the same, which is why you'd use newuidmap and newgidmap. Although I don't know much about rootless docker.
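Concretely, the test I mean (a sketch; image name from the example above, bind mount path assumed):

# create a file on the bind mount from inside the container as appuser (1000)
podman run --rm --entrypoint=/bin/touch \
  -v "$PWD/outputs:/data/outputs" \
  ghcr.io/mauwii/invokeai:main-cpu /data/outputs/testfile
# back on the host: under rootless podman the file is owned by a subuid
# (e.g. 100999), not by 1000
ls -ln ./outputs/testfile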
Also, just out of curiosity, are you running docker run from user 1000/1000? What happens if you run it rootlessly from 1001/1001? Will that affect the file ownership of files you create?
since /data/outputs is mounted as a bindmount, on my local file system the file permissions are set for my current user (501:20 / mauwii:staff), while in the container they are mounted with permissions set to 1000:1000 / appuser:appuser.
Huh... that's very different from rootless podman, where multiple users running in the container would each have their own uid/gid. If you added a second user, 1001:1001, in the container and created a file, would it still appear as 501:20 outside the container?
why should it be different from creating a new user with 1000:1000?
@fat-tire is correct: if the user in the container is 1000, then in general the files it creates on the bind-mounted volume will also be owned by uid=1000. Is it possible that docker on mac changes ownership to the current user for convenience? If so, that's not a standard or generally expected behaviour.
then
    ARCH=arm64
fi
if [ $ARCH == "x86_64" ]
I had no problems with either aarch64 or x86_64 (tested on an M1 and an i7)
Not sure why, but both could not find the repository when I tried those arch names, until I changed them to the ones displayed on the docker hub site. Also, on podman you have to specify docker.io fwiw
the missing docker.io in the base image tag is totally my fault, and the dockerfile is much "cleaner" with the registry prepended to the base-image tag 😅
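i.e. (matching the base image as it appears in the build log above):

FROM docker.io/library/python:3.9-slim AS python-base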
heh sounds good
docker/env.sh
Outdated
# docker is the default container engine, but should work with "podman" (rootless) too!
if [[ -z "${CONTAINER_ENGINE}" ]]; then
    CONTAINER_ENGINE="docker"
fi
why not use CONTAINER_ENGINE=${CONTAINER_ENGINE:-docker}?
Because I'm a dummy. I can fix that!
Nah - didn't mean it like that, but since I am always interested in learning new tricks I thought there could be a reason for the if statement 😅
No, no I really am a dummy. Standby for a new commit coming soon!
-docker run \
+if [[ "${CONTAINER_ENGINE}" == "podman" ]]; then
+    PODMAN_ARGS="--user=appuser:appuser"
+    unset PLATFORM # causes problems
if PLATFORM causes problems, then maybe podman is not buildkit compatible?
It looks like podman is still behind when it comes to buildkit. The good news is that podman 4.1.1 has cache mount support, for example. The bad news is that debian 11 includes 3.0.1, and even Ubuntu 22.10 (the latest release) uses podman 3.4.4. The author of that article even suggests using actual buildkit with podman, then says wait, never mind, it doesn't work very well.
(Since you have security as a primary concern, I recommend considering trying podman, as the container is run and managed by a local user (and the container's user is ALSO a local user). So even if someone breaks out of the container's local user to the root user, and then breaks out of the container entirely, they're STILL constrained within a user process.)
I run docker rootless and the user in the container runtime also doesn't have root permissions 🙈
Please try whether your problems are resolved when pulling the built image (see https://docs.docker.com/engine/security/rootless/ for rootless docker), which would be much better than removing all those features from the Dockerfile 😅
Okay I'll try it. I'm also adding a commit with everything discussed so far that I can change and not break the build.
Docker can also be used rootless: https://docs.docker.com/engine/security/rootless/
People were already using this image with the cuda runtime: https://invoke-ai.github.io/InvokeAI/installation/040_INSTALL_DOCKER/
But it is really not an option to remove the build cache, which is not only used by our CI/CD. Also, it would be nice to keep the linked copy job and not need to change default values for one build argument in three stages. And btw: I always make sure that the built image is compatible with https://www.runpod.io, so maybe your problems could already be solved by pulling the built image from https://hub.docker.com/r/invokeai/invokeai instead of building it locally.
Does runpod use podman or docker? I can try to pull the full image from docker hub and see what happens. The build issues at least should not be a factor. But I think I'll be stuck w/cpu until I can figure out how to use the existing image with the cuda runtime. In the meantime, I may as well push the requested changes here, and you can decide if you want to use any parts of it. If not, I can always just host a "Podman/CUDA"-specific version for my own use. No big deal.
The latest tag is built for CUDA
${PLATFORM+--platform="${PLATFORM}"} \
--name="${REPOSITORY_NAME,,}" \
--hostname="${REPOSITORY_NAME,,}" \
--mount=source="${VOLUMENAME}",target=/data \
"mount=source=" is that valid syndax? Podman was confused by it.
Was working there, never used Podman 😅
okay, I looked in the documentation and didn't see it... maybe type defaults to volume and the second = is the equivalent of a comma...
docker run -d \
  --name=nginxtest \
  --mount source=nginx-vol,destination=/usr/share/nginx/html,readonly \
  nginx:latest

So replace the equal sign in --mount= with a space and you get the same as what the docker docs are referring to. So it can be changed if necessary.
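For the run.sh line above, that would be something like (a sketch using the script's existing variables, with the type spelled out for podman's benefit):

--mount "type=volume,source=${VOLUMENAME},target=/data" \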
@fat-tire: Runpod uses Kubernetes as far as I can tell.
Hey thanks for the response! I did try building with "cuda" set as the flavor (manually, just to be sure), and did notice a ton of packages coming in, and as you suggest, at first I assumed I had everything I needed to run w/cuda (as the docs said). But-- no matter what I did and how I started it (again, this is with Podman) it kept coming up "cpu". Going to the container shell, running python, importing torch and checking if the gpu is available, I constantly got "False". I thought maybe the problem was that the container didn't have access to the nvidia hardware, so I tried adding the device mappings and played with giving it more access, to no avail.
As for your suggestion to not worry re building in Podman and just use the prebuilt images-- sure, that would be fine, and that way there's no need to pull all those things that aren't working in Docker yet. I've yet to try pulling/running the pre-built docker-built container rather than building it myself, but I'll give it a try in the next day or so and report back. Thx again!
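For reference, the gpu check mentioned above (standard torch API):

# prints False on a cpu-only build, True when CUDA is usable
python3 -c "import torch; print(torch.cuda.is_available())"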
Had some time to do a Podman test with the officially prebaked image (latest tag) on docker hub. Note I'm including explicit access to the nvidia devices just in case it's needed when running rootless (it's what I use when I'm running the nvidia/cuda image, so I didn't want to make any changes since I know they provide the access the container needs). Running the command downloads the image from docker hub and runs, resulting in the (expected) ownership error.
This was expected due to the uid/gid ownership issue discussed above. The workaround was to run the one-time ownership fixup (the "podman unshare" chown described earlier), which can only be run after the image is downloaded and the files/volumes are created. Now that the mounted directories are correct, the run command works. The problem though-- CPU, not CUDA. From the container, torch reports no gpu; most obviously, torch/torchvision show up as cpu builds. Thoughts?
For a moment I thought perhaps I was just using the wrong tag and getting the cpu version as a result, but trying the cuda-flavored tag made no difference. I also tried installing cu113 (no dice) and the cuda package from nvidia directly in the base image per nvidia's instructions. This didn't work either.
I don't know... I see you're mapping quite a lot of devices in one of the above commands. Are you certain to be running the container the same way? Curious as to the minimal set of devices needed.
Thanks for taking a look. Yeah, I'm sure I am running it the same way, as I copy/pasted the run command from the docs. Still curious why it comes up cpu-only.
I don't know if it's still necessary, but the last time I needed to use graphics acceleration inside a podman container (fedora silverblue, to use an old version of invoke) I had to install the nvidia-container-runtime package (which only has a version for RHEL 8.3, but it works on fedora) on the host system, plus the respective gpu drivers inside the container as well. Also, I had to edit /etc/nvidia-container-runtime/config.toml and set no-cgroups = true. I don't have an Nvidia gpu anymore, so I don't have anything to test with, but maybe this is the way to go.
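A sketch of that config change, assuming the package's default config location:

# rootless containers can't manage cgroups, hence no-cgroups = true
sudo sed -i 's/^#\?no-cgroups = false/no-cgroups = true/' \
  /etc/nvidia-container-runtime/config.toml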
Well I have some good news.. I got it running from the pre-built image on podman/rootless! What's nice is that the image is significantly smaller, even with the nvidia drivers installed, than the previous container I was using, and rebuilding will be VERY fast now. Steps are outlined in the Dockerfile and scripts below.
I have all of the above working with the latest "3.0.0+a0" version. Does anyone want this code... or what should I do with it? Since the real work is done in the prebuilt image, these are all very small, simple files. With a $CONTAINER_ENGINE flag, it could be integrated into the source in this repo.. but I can also make a dedicated repo for it. I didn't have to touch the existing Dockerfile.
Great to hear you got it working! If there was a way to run this without installing the nvidia driver into the image, that would be ideal, in my opinion. Generally you really want to be using the driver that is already loaded by the kernel. But perhaps that's a hard limitation due to podman's rootless nature - I'm not sure. I think your work here is valuable for supporting users who wish to run in a rootless container. Is there a way to do this without maintaining a separate Dockerfile and build/run scripts? Will leave it up to @mauwii to make the call on how to proceed next.
I already addressed a lot of changes (see the 11 unresolved conversations) and made clear that I would not want to remove the caching 😅
Thanks for addressing the changes in the other convos-- You wouldn't have to remove the caching, as podman does now run with the prebaked image (built with caching, --link, etc.). My working Dockerfile:

FROM docker.io/invokeai/invokeai:main-cuda
ARG ARCH=x86_64
ARG NVIDIA_VERSION=525.85
USER 0
RUN apt update && apt install -y kmod curl
RUN cd /tmp && curl https://us.download.nvidia.com/XFree86/Linux-${ARCH}/${NVIDIA_VERSION}/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run -o /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run \
  && bash /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run --no-kernel-module --no-kernel-module-source --run-nvidia-xconfig --no-backup --no-questions --accept-license --ui=none \
  && rm -f /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run \
  && rm -rf /tmp/*
RUN apt remove --purge kmod curl -y && apt-get clean

The build script:

#!/bin/bash
CONTAINER_BUILD="buildah bud"
TAG=latest
if [ -z "$RESOLVE_NVIDIA_VERSION" ]; then
  export NVIDIA_VERSION=`nvidia-smi --query-gpu=driver_version --format=csv,noheader`
else
  export NVIDIA_VERSION="${RESOLVE_NVIDIA_VERSION}"
fi
${CONTAINER_BUILD} -t "invokeai:${TAG}" -t "invokeai" --build-arg ARCH=`uname -m` --build-arg NVIDIA_VERSION="${NVIDIA_VERSION}"

As you can see, it passes the current NVIDIA driver version and ARCH to the build command. The container has to match the host, so this may be an issue for making any generic image for rootless. I did try avoiding installing the nvidia driver and instead tried using only the nvidia-container-runtime package:

RUN apt update && apt install gpg curl -y
RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
RUN apt update && apt install -y nvidia-container-runtime
RUN apt remove --purge kmod curl gpg -y && apt-get clean

But this test did NOT work, at least not for me. The device flags I pass when running:

--device /dev/dri --device /dev/input --device /dev/nvidia0 \
--device /dev/nvidiactl --device /dev/nvidia-modeset \
--device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \

That's basically all I got! Just happy it's working now, and, while I don't know what if anything above can be integrated, hopefully someone else running rootlessly can find value in it.
Hey, it's been a few weeks and I'm inclined to close this just to not clutter up the PR area. Is there anything anyone wants from here? I've got podman running locally w/the method outlined above-- one thing I did add recently is:

--env TRANSFORMERS_CACHE=/data/.cache \

to the run command. I've also noticed an error when trying to delete an image via the trash can icon in the web ui, because image files apparently can't be moved from the mounted volume.
I was intrigued by the Docker support and decided to try it. When it comes to containers, I always prefer Podman running "rootless" rather than Docker running as root, so I made a few changes to support this as well.
This was tested on both Podman and Docker.
Notes:
CONTAINER_ENGINE="podman"(default is "docker") onbuild.shandrun.sh. Otherwise, everything should hopefully run as before.Could someone with podman and/or docker test it? Even if y'all don't want ALL the commits here, hopefully some of it will be of value for Docker users too.
Enjoy!