"Unable to locate package nvidia-container-toolkit" on Debian (Ubuntu) x86_64 #52

Closed
iamdempa opened this issue Apr 19, 2023 · 4 comments
Labels
resolution/no-repro Resolution: cannot reproduce this issue


iamdempa commented Apr 19, 2023

Hi Team,

Nice work and appreciate your efforts on this project 🫡

I am trying to run the Docker container and hit the following error when running the command: sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base

Hit:1 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://download.docker.com/linux/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Fetched 110 kB in 1s (195 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-container-toolkit-base

And the solution I found was to:

wget https://nvidia.github.io/nvidia-docker/gpgkey --no-check-certificate
sudo apt-key add gpgkey
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

This fixed the install problem, but I still get the following error when running the command: docker run --runtime=nvidia --shm-size=64g -p 7860:7860 -v ${HOME}/.cache:/root/.cache --rm h2o-llm -it generate.py --base_model=EleutherAI/gpt-neox-20b --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

Could someone help me with this? I am trying to run the Docker container. I also tried docker compose up, with the same result.

@pseudotensor
Collaborator

Hi, please try the documentation here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

specifically try doing this first:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

This step may be required for apt to find the correct packages. I probably missed it in the instructions because I had already done it on my own system.
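One way to confirm that apt can see the package after adding the repository is to look at the candidate version reported by apt-cache policy. The following is a minimal sketch, not part of the official instructions: the helper name has_candidate is hypothetical, and the package name is taken from the install command earlier in this thread.

```shell
#!/bin/sh
# Sketch only: has_candidate is a hypothetical helper, not an apt feature.
# Run `sudo apt-get update` first so the new list file is actually read.

# Read `apt-cache policy <pkg>` output on stdin.
# Succeed only if a "Candidate:" line exists and is not "(none)".
has_candidate() {
    awk '/Candidate:/ { found = ($2 != "(none)") } END { exit !found }'
}

pkg=nvidia-container-toolkit
if apt-cache policy "$pkg" 2>/dev/null | has_candidate; then
    echo "$pkg is visible to apt"
else
    echo "$pkg is NOT visible to apt; check the files in /etc/apt/sources.list.d/"
fi
```

If the package is reported as not visible, the list file was probably not written or apt-get update has not been run since it was added.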

Let us know if this fixes it. In the meantime, I'll update the instructions to include this step.

Thanks!

pseudotensor added a commit that referenced this issue Apr 19, 2023
@iamdempa
Author

iamdempa commented Apr 20, 2023

Hi @pseudotensor, thank you for the commands. Yes, they fix the earlier problem, but I am still having issues with the latter one, which is:

nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

Could you also specify the minimum CPU/Memory requirements for a machine to run this Docker container?

Thank you,
Best Regards

@pseudotensor
Collaborator

The system requirements scale with the model size. For example, the 20B model requires four 48 GB GPUs for generation; with 8-bit loading, two 48 GB GPUs are enough.
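To compare a machine against those numbers, the GPUs can be counted with nvidia-smi. This is only a rough sketch: the helper name count_gpus_with_mem is hypothetical, and the 45000 MiB threshold is an assumption chosen to sit slightly below what 48 GB cards typically report, not a figure from the project.

```shell
#!/bin/sh
# Sketch only: the helper name and the 45000 MiB threshold are assumptions.

# Read one memory.total value (in MiB) per line on stdin and print
# how many of them are at least the minimum given as the first argument.
count_gpus_with_mem() {
    awk -v min="$1" '$1 + 0 >= min + 0 { n++ } END { print n + 0 }'
}

# nvidia-smi prints one line per GPU; if it is missing or fails, the count is 0.
n=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null \
    | count_gpus_with_mem 45000)

echo "GPUs with roughly 48 GB: $n"
if [ "$n" -ge 4 ]; then
    echo "Matches the four-GPU figure quoted for 20B generation"
elif [ "$n" -ge 2 ]; then
    echo "Matches the two-GPU figure quoted for 20B with 8-bit"
else
    echo "Below the figures quoted for the 20B model"
fi
```

A count of 0 can also mean that nvidia-smi itself is unavailable, which would be consistent with the libnvidia-ml.so.1 error reported above.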

@achraf-mer
Collaborator

Hi @iamdempa, just checking again if you are still experiencing issues with the latest changes.

If so, I would be happy to help. We typically use the steps here to set up the CUDA toolkit: https://github.com/h2oai/h2ogpt/blob/main/docs/INSTALL.md#installing-cuda-toolkit
Under different pre-conditions on your system, though, the CUDA libraries may not be found. In that case, check /etc/ld.so.conf.d/cuda... and make sure it points to the directory that contains libnvidia-ml, provided you can confirm that libnvidia-ml.so.1 is actually installed somewhere on your system (find / -name 'libnvidia-ml*' 2> /dev/null).
If you can share the output of the find command and how the ld cache is set up for your CUDA install, we can debug further.
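The checks described above can be gathered in one pass. This is a minimal sketch under stated assumptions: the function names find_nvml and nvml_in_ldcache are hypothetical, and the search roots /usr, /lib and /opt are guesses at where the library might live rather than paths confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: the function names and search roots are assumptions.

# Print every file under the given directories whose name starts with "libnvidia-ml".
find_nvml() {
    find "$@" -name 'libnvidia-ml*' 2>/dev/null
}

# Read a linker-cache listing (e.g. from `ldconfig -p`) on stdin;
# succeed only if it mentions libnvidia-ml.so.1.
nvml_in_ldcache() {
    grep -q 'libnvidia-ml\.so\.1'
}

echo "== libnvidia-ml files on disk =="
find_nvml /usr /lib /opt

echo "== dynamic linker cache =="
if ldconfig -p 2>/dev/null | nvml_in_ldcache; then
    echo "libnvidia-ml.so.1 is in the linker cache"
else
    echo "libnvidia-ml.so.1 is NOT in the linker cache"
fi

echo "== linker config fragments =="
ls /etc/ld.so.conf.d/ 2>/dev/null
```

If the file exists on disk but is missing from the cache, adding its directory to a file under /etc/ld.so.conf.d/ and running sudo ldconfig is the usual remedy. If nothing is found on disk at all, note that libnvidia-ml.so.1 normally ships with the NVIDIA driver rather than with the CUDA toolkit, so the driver may not be installed on the host.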

@achraf-mer achraf-mer added the resolution/no-repro Resolution: cannot reproduce this issue label Oct 16, 2023