Cannot connect to dalle when run in docker #18
Comments
Having the same issue on AWS Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328 (CUDA 116).
We recently fixed this in the Dockerfile in #20. You could give it a try; this should solve the problem, as we have successfully run it on
Hey @nthomsencph, does
Now it works 🔥 Rebooted the EC2 instance, ran
Hi @nthomsencph, I'm having the same issue on a g5x.large. Which EC2 instance are you using? Which instructions did you follow to install the NVIDIA toolkit for Docker? How did you install cudnn8 inside Docker? Thanks!
Hi @spuliz. We sprung for an AWS Deep Learning AMI (one that comes with CUDA 116 and more; see above) to skip the hassle of configuring this.
Thanks @nthomsencph, which EC2 instance did you use? I am having an issue with the Tesla K-series GPUs, as your AMI does not have the NVIDIA drivers already installed. The issue I am having is that I am not able to find an AMI with CUDA 11.6 installed.
We used a p1.large with a 16GB GPU. No more is necessary since we don't expect too many requests. The Deep Learning AMI we use for this has CUDA 116 preinstalled. On honeymoon, so that's all the help I can offer ☀️
Did you try building the Docker image and running it as a container? I just rebuilt and ran it without any issue.
https://github.com/jina-ai/dalle-flow#run-in-docker
git clone https://github.com/jina-ai/dalle-flow.git
cd dalle-flow
docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .
docker run -p 51005:51005 -v $HOME/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow
I believe this issue has been resolved. Feel free to reopen if the problem occurs again.
Hello, thanks for sharing this wonderful project.
I ran into a problem when I tried to run it in Docker and access it locally. The Docker build and run went smoothly, but when I started the client and tried to connect locally, this error occurred:
ConnectionError: failed to connect to all addresses |Gateway: Communication error with deployment at address 0.0.0.0:49336. Head or worker may be down.
I checked the port, and it appears to be the port of the dalle deployment, according to this log line:
gateway/rep-0@60 adding connection for deployment dalle/heads/0 to grpc://0.0.0.0:49336
Any idea on how I could fix this? Thank you so much.
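One way to narrow this down is to check from the host whether the gateway port published by `-p 51005:51005` accepts TCP connections at all. The sketch below is only a rough diagnostic: the helper name `port_is_open` is made up for illustration, and `localhost` as the host is an assumption. A successful connection only shows that the gateway is reachable; it says nothing about whether the internal dalle deployment (the 0.0.0.0:49336 address in the error) is healthy inside the container.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 51005 is the host port published in the docker run command;
    # "localhost" is an assumption about where the container is running.
    host, port = "localhost", 51005
    state = "accepting connections" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

If the gateway port is reachable but the client still reports the communication error, the container logs are the next place to look, since the failing address belongs to a deployment started inside the container.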