Error deploying docker on P3 instance #424
Comments
Volta GPUs are not supported for recent models.

I think if you use Llama it might work. I was able to make it work on V100 GPUs.

Thanks for the prompt reply, it runs fine on a G5 instance.

I use a V100 and also encountered problems. Log:
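The comments above come down to CUDA compute capability: FlashAttention-style kernels generally require Turing (sm_75) or newer, while the V100 in p3 instances is Volta (sm_70) and the A10G in g5 instances is Ampere (sm_86). A minimal sketch of that check, assuming sm_75 as the minimum (an assumption about the kernels' requirement, not taken from TGI's source):

```python
# Assumed minimum compute capability for FlashAttention-style kernels:
# Turing, i.e. sm_75. V100 (p3) is sm_70; A10G (g5) is sm_86.
MIN_CC = (7, 5)

def supports_flash_attention(cc):
    """Return True if a (major, minor) compute capability meets MIN_CC."""
    return tuple(cc) >= MIN_CC
```

Under this assumption `supports_flash_attention((7, 0))` is False for a V100 while `(8, 6)` passes for an A10G, matching the thread: failure on p3, success on g5.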
System Info
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id tiiuae/falcon-7b --num-shard 2
Information
Tasks
Reproduction
I'm trying to deploy falcon-7b on an AWS p3.2xlarge. I begin by running the following docker run command:
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id tiiuae/falcon-7b --num-shard 2
But I see a failure, which appears to be a mixture of CUDA and FlashAttention errors. The output of nvidia-smi is below the error stack.
Expected behavior
I would have expected this docker image to work on a P3 instance
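One way to check an instance up front is to query each GPU's compute capability. Newer drivers support `nvidia-smi --query-gpu=name,compute_cap --format=csv`; the sketch below parses that CSV output (the sample string is illustrative, not from this issue's log):

```python
import csv
import io

# Illustrative sample of `nvidia-smi --query-gpu=name,compute_cap --format=csv`
# output on a p3 instance (not copied from the issue's log).
sample = "name, compute_cap\nTesla V100-SXM2-16GB, 7.0\n"

def parse_compute_caps(text):
    """Map GPU name -> compute capability string from nvidia-smi CSV output."""
    rows = list(csv.reader(io.StringIO(text)))
    return {name.strip(): cap.strip() for name, cap in rows[1:]}
```

Seeing `7.0` (Volta) here is consistent with the maintainer's reply that the image will not work on a p3 instance.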