https requests hang in 1.10.0 #20178

Closed
peterklipfel opened this Issue Feb 10, 2016 · 23 comments


peterklipfel commented Feb 10, 2016

Was there a change in 1.10.0 that could cause this? The bug has been difficult to reproduce outside of an OpenStack deployment in my office, but I was able to (apparently) resolve it by downgrading to 1.9.1.

To keep things clean, the details are posted on Stack Overflow.

I'm happy to post them here if you'd like.

Member

AkihiroSuda commented Feb 10, 2016

Perhaps it depends on the command-line arguments of the docker daemon and client?
Are you using something specific to OpenStack (Neutron, for example)?

thaJeztah added this to the 1.10.1 milestone Feb 10, 2016

Member

thaJeztah commented Feb 10, 2016

ping @mavenugo @sanimej any ideas?

Contributor

mavenugo commented Feb 10, 2016

@peterklipfel @thaJeztah nothing obvious. Seems to be environment specific & doesn't seem to be docker networking related.

Can you share more information on

  • docker info
  • docker daemon configuration
  • docker run command-line

Also, please confirm that seccomp isn't blocking any syscalls in your container by passing --security-opt seccomp:unconfined.

Collaborator

tiborvass commented Feb 10, 2016

@peterklipfel please copy the Stack Overflow post here on GitHub.

Also, does it work with plain HTTP?


peterklipfel commented Feb 11, 2016

Thanks for all the quick replies everyone!

@mavenugo This seems to be a very environment-specific bug. I cannot reproduce it on other clouds. Given that the solution I found was to downgrade Docker, I thought I might seek help from you guys. I have not tried --security-opt seccomp:unconfined yet; I will give that a try.

@tiborvass HTTP does work in the container


peterklipfel commented Feb 11, 2016

@tiborvass Here's the stackoverflow question

From inside a docker container, I'm running

# openssl s_client -connect rubygems.org:443 -state -nbio 2>&1 | grep "^SSL"     

SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
SSL_connect:error in SSLv2/v3 read server hello A

That's all I get

I can't connect to any https site from within the docker container. The container is running on an openstack vm. The vm can connect via https.

Any advice?

UPDATE

root@ce239554761d:/# curl -vv https://google.com
* Rebuilt URL to: https://google.com/
* Hostname was NOT found in DNS cache
*   Trying 216.58.217.46...
* Connected to google.com (216.58.217.46) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):

and then it hangs.

Also, I'm getting intermittent successes now.

Sanity Checks:

  • Changing the Docker IPs doesn't fix the problem
  • The Docker containers work on my local machine
  • The Docker containers work on other clouds
  • Docker 1.10.0 doesn't work in the VMs
  • Docker 1.9.1 works in the VMs

Update 2

If I downgrade to 1.9.1, and then upgrade to 1.10.0, things work. This is only the case on a fresh install.

Collaborator

tiborvass commented Feb 11, 2016

@peterklipfel can you provide us with a script that reproduces the issue? Otherwise I'm not sure how we can help :S

thaJeztah modified the milestones: 1.10.1, 1.10.2 Feb 11, 2016


peterklipfel commented Feb 15, 2016

On Ubuntu 14.04.3:

sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
# add the Docker apt repository to /etc/apt/sources.list.d/docker.list
echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
# check that docker-engine is now offered from the Docker repository
sudo apt-cache policy docker-engine
sudo apt-get -y install docker-engine
# optionally add the ubuntu user to the docker group
sudo docker pull buildpack-deps:jessie
sudo docker run -it buildpack-deps:jessie /bin/bash

# from inside the container
curl -vv https://google.com
curl -vv https://amazon.com
curl -vv https://rubygems.org
curl -vv https://anything.you.want

As stated in a previous post, the problem now appears to be intermittent rather than happening every time.
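Because the failure is intermittent, a single curl run says little about whether a given change helped. A minimal sketch for counting how often a request fails or times out follows. The helper name count_failures, the 10-second timeout, and the attempt count are illustrative assumptions and are not part of the original report.

```shell
# count_failures N CMD [ARGS...]
# Runs CMD N times and prints how many of those runs exited non-zero.
# A hung TLS handshake shows up as a failure once curl's --max-time expires.
count_failures() {
  attempts=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example, run from inside the container:
# count_failures 10 curl -sS --max-time 10 -o /dev/null https://rubygems.org
```

Running the same count on Docker 1.9.1 and 1.10.0 hosts would give a rough failure rate to compare, rather than a single pass or fail.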

tiborvass modified the milestones: 1.10.3, 1.10.2 Feb 19, 2016

thaJeztah modified the milestones: 1.11.0, 1.10.3 Mar 8, 2016

Contributor

mavenugo commented Mar 13, 2016

@peterklipfel PTAL #18842 (comment). I think this could be a possible reason. Could you update your /etc/resolv.conf to use a DNS server that properly truncates DNS responses larger than 512 bytes (such as 8.8.8.8) and try it out?

icecrime removed this from the 1.11.0 milestone Mar 30, 2016


ludwick commented Apr 7, 2016

I just wanted to add another possible replication. We use Google Compute Engine hosts ("container-vm" image). On them, we run a customized Docker image with Ubuntu 14.04 as the base (FROM ubuntu:14.04) that runs Bamboo. Inside that container is where we run unit tests and build Docker images. We were having an issue where one of our containers was not building and was hanging during a Ruby gem install. After debugging, we found that it couldn't contact rubygems.org, but only over SSL. I tried the /etc/resolv.conf workaround, but it made no difference. Since the Dockerfile for our Bamboo container used the script from https://get.docker.io to install Docker, it was getting 1.10.3. Downgrading to Docker 1.9.1 fixed this issue.

Note that I was also able to replicate this by executing docker run -it buildpack-deps:jessie /bin/bash inside the host Ubuntu container, installing curl, and then attempting to reach https://rubygems.org. It hung, and it emitted the same kind of handshake output as seen above:

...
* Connected to rubygems.org (54.186.104.15) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
... (blocks from here)

For us only rubygems.org was consistently unreachable via ssl. google.com for example was fine, though we didn't test extensively once we had a workaround. If I launch an Ubuntu 14.04 instance in GCE, install docker 1.10.3, then run docker run -it buildpack-deps:jessie /bin/bash, then it does not replicate, so I suspect this might be a container-within-container issue for us.


peterklipfel commented Apr 14, 2016

I kind of abandoned this thread - my apologies for that. This was happening on our development cloud, and then it stopped happening. My best guess is that it had something to do with this: https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.ta2k46q51

Member

thaJeztah commented Apr 14, 2016

@peterklipfel can we close this issue if you no longer are able to reproduce?


peterklipfel commented Apr 14, 2016

@thaJeztah I'm comfortable with closing it.

Member

thaJeztah commented Apr 14, 2016

Thanks!

thaJeztah closed this Apr 14, 2016


dimfeld commented Apr 17, 2016

I'm seeing something similar to @ludwick's report, on Google Compute Engine although not in a docker-in-docker situation. Host OS is Ubuntu 14.04. The OS in the container itself doesn't seem to matter though I've mostly been testing with ubuntu:latest.

I upgraded from 1.8.3 to 1.10.3, and afterwards my containers are unable to access https://slack.com (and I see the same thing for rubygems.org), receiving no response to the initial client hello, but many other HTTPS sites work fine. I see the same behavior on 1.11, but on 1.9.1 the requests go through fine. This applies both to existing containers and to newly-created images/containers.

Unfortunately I don't have a good way to reproduce this. I'm seeing it on a few machines that have been running for a while, but starting up a new instance, installing docker 1.8.3, rebooting, and then upgrading to 1.10.3 doesn't seem to exhibit the same issue.


mdomke commented Jun 9, 2016

I'm experiencing the same issue on OpenStack with Ubuntu 16.04 and the following Docker version:

ubuntu@ci:~$ sudo docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.6.1
 Git commit:   20f81dd
 Built:        Wed, 20 Apr 2016 14:19:16 -0700
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.6.1
 Git commit:   20f81dd
 Built:        Wed, 20 Apr 2016 14:19:16 -0700
 OS/Arch:      linux/amd64

If I then run an Alpine image and try to access registry-1.docker.io with curl, it hangs while waiting for the server response:

ubuntu@ci:~$ sudo docker run --rm -it alpine sh
/ # apk add --update curl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20160104-r4)
(2/4) Installing libssh2 (1.7.0-r0)
(3/4) Installing libcurl (7.49.1-r0)
(4/4) Installing curl (7.49.1-r0)
Executing busybox-1.24.2-r8.trigger
Executing ca-certificates-20160104-r4.trigger
OK: 6 MiB in 15 packages
/ # curl -vv https://registry-1.docker.io
* Rebuilt URL to: https://registry-1.docker.io/
*   Trying 52.22.123.154...
* Connected to registry-1.docker.io (52.22.123.154) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):

@peterklipfel Could you figure out in what way this issue was related to your OpenStack setup?


bennibu commented Jun 22, 2016

@mdomke @thaJeztah I can reproduce this bug in our openstack environment (ubuntu 14.04). I tested it with fresh nodes with the following OS images:

  • ubuntu 14.04
  • ubuntu 16.04
  • debian 8.5.0

The Docker version was always 1.11.2.

So I guess this is related to OpenStack's TCP packet routing?

Member

thaJeztah commented Jun 23, 2016

@bennibu could well be, have you opened an issue with OpenStack?


stieler-it commented Jun 24, 2016

@bennibu @thaJeztah We have been hunting these problems for some days now, and since it appears to be an OpenStack problem, I filed a bug there. It is hard to grasp, though, since the problem only appears with Docker >= 1.10. Please feel free to add any further information you have to this ticket: https://bugs.launchpad.net/neutron/+bug/1595762

AkihiroSuda referenced this issue Jun 24, 2016: memo #2 (Closed)

Member

thaJeztah commented Jun 24, 2016

@stieler-it thanks! Perhaps @mavenugo has suggestions on what to look for.


bennibu commented Jun 25, 2016

@thaJeztah @stieler-it We have found the problem. I discussed this issue with our OpenStack provider, and they advised me to tune the MTU settings according to this blog post. This fixed the problem for us.
I have to confess that I have no experience so far in tuning network settings, but I guess it would be helpful if you used 1450 as the default, so that your users don't run into the same issue.
Anyway, now everything is fine. Thanks for your feedback.


stieler-it commented Jun 25, 2016

@bennibu Great, works here as well. For OpenStack, an MTU of 1454 should be the desired value (check ifconfig). The reason it happens only with Docker >= 1.10 is that the MTU is no longer inferred there, see #22028.
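One way to check the MTU explanation before changing any configuration is to compare the MTU of the Docker bridge with the MTU of the host's uplink interface, and then probe the path with fragmentation disallowed. The sketch below assumes a Linux host with sysfs and iputils ping. The function names, the interface name eth0, and the TARGET placeholder are hypothetical, and 1450 is only the value mentioned in this thread. A client hello that goes out while the larger server reply never arrives would be consistent with oversized packets being dropped, but this check alone does not prove it.

```shell
# iface_mtu IFACE: print the MTU the kernel reports for IFACE (Linux sysfs).
iface_mtu() {
  cat "/sys/class/net/$1/mtu"
}

# ping_payload_for_mtu MTU: largest ICMP echo payload that fits in a single
# unfragmented IPv4 packet of size MTU (20-byte IP header + 8-byte ICMP header).
ping_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# mtu_mismatch BRIDGE_MTU UPLINK_MTU: exit 0 when the bridge MTU is larger
# than the uplink MTU, which is the condition suspected in this thread.
mtu_mismatch() {
  [ "$1" -gt "$2" ]
}

# Example on the Docker host (eth0 is an assumed interface name):
# if mtu_mismatch "$(iface_mtu docker0)" "$(iface_mtu eth0)"; then
#   echo "docker0 MTU exceeds eth0 MTU"
# fi
#
# Probe with "don't fragment" set; TARGET is any host that answers ICMP echo:
# ping -c 3 -M do -s "$(ping_payload_for_mtu 1450)" "$TARGET"
```

If the probe succeeds at the payload for 1450 but fails at the payload for 1500, lowering the Docker MTU to the smaller value is a reasonable next thing to try.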


x-yuri commented Nov 15, 2018

I added --mtu=1400 to the service file, restarted Docker, and can't reproduce it anymore.

I tried running dockerd without --mtu, which made docker0's MTU 1500, the same as my physical interface's MTU. It works this way.

I tried it with --mtu=1600. When no container is running, docker0's MTU is 1500. While a container is running, it is 1600 for both docker0 and the container's NIC. Again, it works.

Maybe I'm going to face it again after a restart...

UPD: Note that I'm running this locally.
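An alternative to editing the service file is to set the MTU in the daemon configuration file, which dockerd reads at startup. The following is a sketch only: the helper name write_mtu_config is hypothetical, /etc/docker/daemon.json is the usual default location on Linux, the restart command assumes a systemd host, and the value 1450 is taken from the earlier comments rather than being a recommendation.

```shell
# write_mtu_config FILE MTU
# Writes a minimal daemon.json containing only the "mtu" key to FILE.
# It overwrites FILE, so merge by hand if the file already has other settings.
write_mtu_config() {
  printf '{\n  "mtu": %d\n}\n' "$2" > "$1"
}

# Example (paths and restart command are assumptions for a systemd host):
# write_mtu_config /tmp/daemon.json 1450
# sudo install -m 0644 /tmp/daemon.json /etc/docker/daemon.json
# sudo systemctl restart docker
```

As with the --mtu flag, this setting applies to the default bridge network. User-defined networks take their MTU from the com.docker.network.driver.mtu driver option at creation time.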
