
Cannot connect to dalle when run in docker #18

Closed
AstrocyteTaki opened this issue May 18, 2022 · 10 comments

Comments

@AstrocyteTaki

Hello, thanks for sharing this wonderful project.

I ran into a problem: I ran it in Docker and tried to access it locally. The Docker build and run process was smooth, but when I started the client and tried to connect, this error occurred:

`ConnectionError: failed to connect to all addresses |Gateway: Communication error with deployment at address 0.0.0.0:49336. Head or worker may be down.`

I checked the port, and it appears to be the port of the dalle deployment:

`gateway/rep-0@60 adding connection for deployment dalle/heads/0 to grpc://0.0.0.0:49336`

Any idea how I could fix this? Thank you so much.

@nthomsencph

nthomsencph commented May 18, 2022

Having the same issue on the AWS Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328 (CUDA 11.6).

@hanxiao
Member

hanxiao commented May 21, 2022

We recently fixed the Dockerfile in #20; you could give it a try.

This should solve the problem, as we have successfully run it on a p2.8xlarge. @jina-ai/engineering will share more details next week.

@alaeddine-13
Contributor

alaeddine-13 commented May 21, 2022

Hey @nthomsencph, do `nvcc --version`, `nvidia-smi`, and `torch.cuda.device_count()` print results correctly both on the EC2 instance and inside the Docker image?
(To get a shell inside the Docker image and run the commands, you can do `docker run -it --entrypoint /bin/bash jinaai/dalle-flow`.)
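A minimal sketch of those checks, assuming the image tag `jinaai/dalle-flow` from the README and that `python` with PyTorch is on the image's PATH:

```bash
# on the EC2 host: driver, CUDA toolkit, and PyTorch should all see the GPU
nvidia-smi
nvcc --version
python -c "import torch; print(torch.cuda.device_count())"

# inside the image: open a shell with GPU access enabled
docker run --rm -it --gpus all --entrypoint /bin/bash jinaai/dalle-flow
#   (then repeat nvidia-smi and the torch check at the container prompt)
```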

@nthomsencph

Now it works 🔥

Rebooted the EC2 instance, ran `docker system prune -a`, pulled the repo, and ran the instructions. Thanks!

@spuliz

spuliz commented May 23, 2022

Hi @nthomsencph, I'm having the same issue on a g5.xlarge. Which EC2 instance are you using? Which instructions did you follow to install the NVIDIA container toolkit for Docker? How did you install cuDNN 8 inside Docker?

Thanks!

@nthomsencph

Hi @spuliz. We sprang for an AWS Deep Learning AMI (one that comes with CUDA 11.6 and more; see above) to skip the hassle of configuring this.
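For anyone who can't use the Deep Learning AMI, a rough sketch of the manual route NVIDIA documented for Ubuntu at the time (the `nvidia-docker2` package; treat the exact repo paths and package names as assumptions on other distributions — cuDNN itself normally ships inside the container via the CUDA base image, not on the host):

```bash
# register NVIDIA's container-runtime package repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# install the runtime and restart the Docker daemon
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# sanity check: a CUDA base image should see the GPU
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```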

@spuliz

spuliz commented May 26, 2022

Thanks @nthomsencph, which EC2 instance did you use? I am having an issue with the Tesla K-series GPUs, as your AMI does not come with the NVIDIA drivers already installed. The problem is that I cannot find an AMI with CUDA 11.6 installed.

@nthomsencph

We used a p1.large with a 16 GB GPU. No more is necessary, since we don't expect too many requests. The Deep Learning AMI we use for this has CUDA 11.6 preinstalled.

On honeymoon, so that's all the help I can offer ☀️

@hanxiao
Member

hanxiao commented Jun 11, 2022

Did you try building the Docker image and running it as a container? I just rebuilt and ran it without any issue.

https://github.com/jina-ai/dalle-flow#run-in-docker

git clone https://github.com/jina-ai/dalle-flow.git
cd dalle-flow

# build the image with your own UID/GID so the mounted cache stays writable
docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .

# expose the gateway on port 51005, mount the model cache, and give the container GPU access
docker run -p 51005:51005 -v $HOME/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow
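A quick way to confirm the gateway actually came up (generic Docker/netcat checks, not project-specific tooling):

```bash
# the container should be listed as running
docker ps --filter ancestor=jinaai/dalle-flow

# once the Flow is ready, the gateway port should accept TCP connections
nc -zv localhost 51005
```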

@delgermurun
Contributor

I believe this issue has been resolved. Feel free to reopen if the problem occurs again.
