This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

Fix Linux Docker container #110

Closed
wants to merge 1 commit

Conversation

darlannakamura

Fixes issues with recognizing the NVIDIA drivers inside the container, so that tools such as nvidia-smi and nvcc --version work.

I installed nvidia-docker locally with sudo apt install nvidia-docker2 and updated the base image in the Dockerfile.
It seems that apt install -y libnvidia-compute-$NV_VER was overriding the working drivers provided by the base image.

This pull request addresses an issue with recognizing NVIDIA drivers inside the container. The problem was solved by installing nvidia-docker locally with:

sudo apt install nvidia-docker2
sudo systemctl restart docker

And updating the base image in the Dockerfile. The fix involved removing apt install -y libnvidia-compute-$NV_VER, which was overriding the correct driver. As a result, nvidia-smi and nvcc --version now work properly inside the container, and, of course, so does DeepFaceLive.
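
For reference, a quick way to confirm the host-to-container GPU plumbing works after these steps is to run nvidia-smi from a stock CUDA image (the tags below are just examples, and this assumes the NVIDIA container runtime from nvidia-docker2 is active on the host; note that nvcc only ships in the -devel flavoured images):

# sanity check: host driver visible from a stock runtime image (tag is an example)
docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu20.04 nvidia-smi
# nvcc is only present in -devel images (tag is an example)
docker run --rm --gpus all nvidia/cuda:11.8.0-devel-ubuntu20.04 nvcc --version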

darlannakamura changed the title from "Fix Linux Docker container with CUDA" to "Fix Linux Docker container" on Dec 26, 2022
@iperov
Owner

iperov commented Dec 26, 2022

@CeeBeeEh check it please

@CeeBeeEh
Contributor

CeeBeeEh commented Dec 27, 2022

@darlannakamura Hi, can you please detail what problem you're trying to solve? Your PR disables GPU processing for the "Face merger" step, which significantly reduces performance.

I know you said "This pull request addresses an issue with recognizing NVIDIA drivers inside the container" but I'm not sure why you need nvidia-smi and nvcc --version, as those aren't needed within the container by DeepFaceLive.

You also switched the docker image from the NVIDIA runtime image to the devel image, which is fine but unnecessary. The runtime image has everything that's needed for inference and for running applications. You generally only want the devel image if you need all the extra packages for development or for compiling binaries (which we don't). The devel image is also substantially larger than the runtime image, so it takes longer to download and takes up more disk space.

Edit: I just did a quick check; the runtime image is about 2.5 GB, and the devel image is well over 9 GB. That's more than 3x the disk space.
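
If anyone wants to reproduce that comparison, something like the following works (tags are examples; exact sizes vary by CUDA version):

docker pull nvidia/cuda:11.8.0-runtime-ubuntu20.04
docker pull nvidia/cuda:11.8.0-devel-ubuntu20.04
docker images nvidia/cuda --format "table {{.Tag}}\t{{.Size}}"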

@darlannakamura
Author

Hi @CeeBeeEh ,
I apologize for my oversight. You're correct that I shouldn't have used devel instead of runtime. The issue I'm attempting to address with this PR is that when I installed DeepFaceLive on my machine running Ubuntu 20.04 with two NVIDIA 3080 Ti GPUs and CUDA 11.8, I discovered that the program was not detecting my GPUs. Upon further investigation, I found that the GPU drivers were not working properly inside the container. After manually installing the libraries, I found that running apt install -y libnvidia-compute-520 caused my previously working drivers to stop working. I am not sure if this is a problem specific to my particular setup or if it affects others as well.
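
One check that might narrow this down (purely illustrative; the tag is an example) is whether a stock CUDA image already sees the GPUs before the repo's Dockerfile is involved. If this fails, the problem is on the host side rather than in the image:

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi -L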

Feel free to close this PR if it does not affect other users.

@CeeBeeEh
Contributor

@darlannakamura So I think you might be experiencing a different issue altogether. Firstly, you don't need to install the drivers within the container itself (although you can); that is actually the purpose of the nvidia-docker2 package, which facilitates communication from the container to the GPU drivers. Installing the drivers within the container would almost negate the purpose of a container and kinda brings it closer to being a full VM (not really, and there are some esoteric cases where it might be useful).

The drivers should only need to exist on the host system. All that's needed in the container are the libraries the application needs (such as CUDA). In fact, this is one of the advantages of containers: you can install only the GPU drivers and the docker bits on the host, nothing else, and do everything else from containers. I basically do that with EndeavourOS as my host and most things through DistroBox (a simplified and persistent container control system).
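
As a rough Ubuntu-flavoured sketch of that host-only setup (package names are examples and assume NVIDIA's apt repository is configured):

# host side: driver + Docker + NVIDIA container runtime, nothing else GPU-related
sudo apt install nvidia-driver-520            # driver version is just an example
sudo apt install docker.io nvidia-docker2     # nvidia-docker2 comes from NVIDIA's apt repo
sudo systemctl restart docker
# everything CUDA-related then lives inside the container image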

In any case, it's odd that the container isn't seeing the GPUs at all. All NVIDIA-based container images should provide nvidia-smi, which I think comes from one of the toolkit packages and not strictly from the drivers themselves (the driver package is really just a meta package).

Where you wrote:

The problem was solved by installing nvidia-docker locally with:

sudo apt install nvidia-docker2

I'm pretty sure that was the source of the issue. You cannot have a container communicate with the hardware in any way without having that package.

The purpose of the line apt install -y libnvidia-compute-$NV_VER is to forcibly inject accelerated OpenGL GPU support, which for some reason isn't available in most of the nvidia docker images aside from the ones at this repo.

That package should also not cause any issues as it should be mirroring the host driver version anyways.
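
As an illustration of how NV_VER can be made to mirror the host driver (the nvidia-smi query is standard; the build-arg wiring below is just an assumption, not necessarily how this repo's build actually passes it):

# derive the host driver's major version, e.g. 520 (build-arg name assumed to match the Dockerfile's $NV_VER)
NV_VER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 | cut -d. -f1)
docker build --build-arg NV_VER="$NV_VER" -t deepfacelive .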

If you have any issues or questions, please feel free to ask them.

@iperov I think we can close this.

iperov closed this on Dec 27, 2022