docker service logs hangs #38640

Closed
drnybble opened this issue Jan 25, 2019 · 10 comments

@drnybble commented Jan 25, 2019

Description

docker service logs sometimes hangs

Steps to reproduce the issue:
docker service logs service-name

Describe the results you received:
Some logs are shown on the terminal, but the command never finishes.
It does not always hang at the same spot.
The service I am trying has 2 replicas.
It happens maybe 50% of the time I try.

Let me know if there is further debug information I can collect.

Describe the results you expected:
All logs are shown and the command does not hang.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:57 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 35
 Running: 18
 Paused: 0
 Stopped: 17
Images: 26
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: ov37b8anl2rz98ywv5bfcoa01
 Is Manager: true
 ClusterID: yjzcdr6xadmpfmtwx32z5hthg
 Managers: 3
 Nodes: 3
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.31.11.22
 Manager Addresses:
  10.31.11.22:2377
  10.31.11.87:2377
  10.31.11.88:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-139-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.796GiB
Name: quoited1.fyre.ibm.com
ID: 6RP6:HW7M:IYDP:YNI7:ITUX:RHPH:DQZX:LBVV:RROL:AO7J:YVUC:OKMM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
The stack has about 20 services; I am only seeing one of them hang.
I deleted one of the stopped containers for this service; it seems to have made the hang occur less frequently, but I can't say for sure.

@drnybble (Author) commented Jan 31, 2019

Can anyone tell me what logs to collect to investigate this issue? This is not a one-off; I have seen it on numerous services.

@chilloutman commented Feb 6, 2019

A similar issue is described here: https://forums.docker.com/t/docker-service-logs-seems-to-hang-after-printing-large-output-in-a-short-time/48436

I can reproduce the hang when running "docker service logs" without any flags, every time the amount of logs is large enough. Using "--tail" with a small enough number works, but obviously it doesn't output all the logs.

Whether it's writing to the terminal, "> /dev/null", "| gzip", or "| more" makes no difference. It hangs every time.
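
To get "large enough" logs for testing, a throwaway service along these lines works (the service name, image, and command here are arbitrary placeholders, not the exact setup used above):

docker service create --name logspam --replicas 2 busybox \
  sh -c 'i=0; while true; do echo "log line $i"; i=$((i+1)); done'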


Could be the same problem as #36312 (long log lines and/or incorrect chunking).

@githubsaturn commented Feb 13, 2019

@thaJeztah

Any thoughts on this? This is pretty critical for apps that use the Docker API.

It's super easy to reproduce. I can even reproduce it on Play With Docker stack:

  • Create a template stack (3 managers + 2 workers).
  • Run: docker service create --name testservice --replicas 15 dockersaturn/testloggingbug (nothing specific about this image, it just outputs a counter in its logs).
  • Wait until all containers are up and running (takes a minute).
  • Run docker service logs testservice --tail 500. All good, no issues!
  • Destroy a manager node and a worker node by hard deleting them (don't use node rm, just delete them from the UI).
  • Wait until things settle.
  • Run docker service logs testservice --tail 500. Here it comes: one out of 10 tries hangs! Or simply run this:
for run in {1..30}
do
  docker service logs testservice --tail 500
done

I tried this twice from scratch, and the repro steps are pretty solid.

@bobf commented Jun 10, 2019

I thought I had this problem a couple of times, but it actually turned out that (in my estimation, at least) the log files had just become so large that something like this would hang indefinitely:

docker service logs -f --since 10s <service_name>

I presume this is because Docker is searching through gigabytes of spooled logs to find the correct timestamp?

When I redeployed my services with sensible log rotation (5 x 10M in my case), my problem went away.
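
For anyone wanting to do the same, rotation can be configured per service via the json-file log driver options; something along these lines (the service name, image, and exact values are placeholders):

docker service create \
  --name my-service \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=5 \
  my-image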

Obviously this is not related to the original issue (which seems to be a separate bug) but I thought sharing this here might help some people avoid misdiagnosing their issue.

@emosbaugh (Contributor) commented Jun 11, 2019

I am seeing this issue as well. The interesting thing is that if I add --tail, it pretty consistently hangs above a certain number of lines (in my case 55285, although this fluctuates slightly). It probably has to do with the number of bytes?

FWIW I only have one node and the service only has one task.

@emosbaugh (Contributor) commented Jun 12, 2019

Some goroutine dumps that seem relevant:

1 @ 0x55c952b01460 0x55c952b01520 0x55c952ad84e6 0x55c952ad81db 0x55c9534899c6 0x55c953b2466e 0x55c953b20ddc 0x55c953b2508b 0x55c95348db7f 0x55c95348c44a 0x55c95348c881 0x55c9530211bb 0x55c95349fcca 0x55c952dbc136 0x55c95302f388 0x55c95349e762 0x55c952dbef9e 0x55c952dbb143 0x55c952b2f1f1
#	0x55c9534899c5	github.com/docker/docker/api/server/httputils.WriteLogStream+0x205										/go/src/github.com/docker/docker/api/server/httputils/write_log_stream.go:35
#	0x55c953b2466d	github.com/docker/docker/api/server/router/swarm.(*swarmRouter).swarmLogs+0x5ed									/go/src/github.com/docker/docker/api/server/router/swarm/helpers.go:64
#	0x55c953b20ddb	github.com/docker/docker/api/server/router/swarm.(*swarmRouter).getServiceLogs+0x15b								/go/src/github.com/docker/docker/api/server/router/swarm/cluster_routes.go:285
#	0x55c953b2508a	github.com/docker/docker/api/server/router/swarm.(*swarmRouter).(github.com/docker/docker/api/server/router/swarm.getServiceLogs)-fm+0x6a	/go/src/github.com/docker/docker/api/server/router/swarm/cluster.go:40
#	0x55c95348db7e	github.com/docker/docker/api/server/router.cancellableHandler.func1+0xce									/go/src/github.com/docker/docker/api/server/router/local.go:92
#	0x55c95348c449	github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9							/go/src/github.com/docker/docker/api/server/middleware/experimental.go:26
#	0x55c95348c880	github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400							/go/src/github.com/docker/docker/api/server/middleware/version.go:62
#	0x55c9530211ba	github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa								/go/src/github.com/docker/docker/pkg/authorization/middleware.go:59
#	0x55c95349fcc9	github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199									/go/src/github.com/docker/docker/api/server/server.go:141
#	0x55c952dbc135	net/http.HandlerFunc.ServeHTTP+0x45														/usr/local/go/src/net/http/server.go:1947
#	0x55c95302f387	github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227								/go/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
#	0x55c95349e761	github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71										/go/src/github.com/docker/docker/api/server/router_swapper.go:29
#	0x55c952dbef9d	net/http.serverHandler.ServeHTTP+0xbd														/usr/local/go/src/net/http/server.go:2697
#	0x55c952dbb142	net/http.(*conn).serve+0x652															/usr/local/go/src/net/http/server.go:1830

1 @ 0x55c952b01460 0x55c952b11426 0x55c9531ac29b 0x55c9531ac029 0x55c9531acdd7 0x55c952b38e78 0x55c952b38ffa 0x55c9531acd21 0x55c9531ca557 0x55c9531cb3df 0x55c9531d7dd4 0x55c9531d73d5 0x55c953421d8d 0x55c95327abd4 0x55c9540186a7 0x55c952b2f1f1
#	0x55c9531ac29a	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).read+0x1ea				/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:142
#	0x55c9531ac028	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).Read+0x68				/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:131
#	0x55c9531acdd6	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*transportReader).Read+0x56				/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:369
#	0x55c952b38e77	io.ReadAtLeast+0x87													/usr/local/go/src/io/io.go:309
#	0x55c952b38ff9	io.ReadFull+0x59													/usr/local/go/src/io/io.go:327
#	0x55c9531acd20	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Read+0xc0					/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:353
#	0x55c9531ca556	github.com/docker/docker/vendor/google.golang.org/grpc.(*parser).recvMsg+0x66						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:452
#	0x55c9531cb3de	github.com/docker/docker/vendor/google.golang.org/grpc.recv+0x4e							/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:561
#	0x55c9531d7dd3	github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+0x133					/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:529
#	0x55c9531d73d4	github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+0x44					/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395
#	0x55c953421d8c	github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg+0x4c	/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/client_metrics.go:160
#	0x55c95327abd3	github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*logsSubscribeLogsClient).Recv+0x63			/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:565
#	0x55c9540186a6	github.com/docker/docker/daemon/cluster.(*Cluster).ServiceLogs.func1+0x926						/go/src/github.com/docker/docker/daemon/cluster/services.go:459

1 @ 0x55c952b01460 0x55c952b11426 0x55c9531ac29b 0x55c9531ac029 0x55c9531acdd7 0x55c952b38e78 0x55c952b38ffa 0x55c9531acd21 0x55c9531ca557 0x55c9531cb3df 0x55c9531d9096 0x55c95342362d 0x55c95327b8b4 0x55c9533b06ae 0x55c953fa5ed9 0x55c95327f3bf 0x55c95327b7a4 0x55c9534249b8 0x55c953fdc739 0x55c9531d2058 0x55c9531d4023 0x55c9531db191 0x55c952b2f1f1
#	0x55c9531ac29a	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).read+0x1ea						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:142
#	0x55c9531ac028	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).Read+0x68						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:131
#	0x55c9531acdd6	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*transportReader).Read+0x56						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:369
#	0x55c952b38e77	io.ReadAtLeast+0x87															/usr/local/go/src/io/io.go:309
#	0x55c952b38ff9	io.ReadFull+0x59															/usr/local/go/src/io/io.go:327
#	0x55c9531acd20	github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Read+0xc0							/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:353
#	0x55c9531ca556	github.com/docker/docker/vendor/google.golang.org/grpc.(*parser).recvMsg+0x66								/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:452
#	0x55c9531cb3de	github.com/docker/docker/vendor/google.golang.org/grpc.recv+0x4e									/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:561
#	0x55c9531d9095	github.com/docker/docker/vendor/google.golang.org/grpc.(*serverStream).RecvMsg+0x125							/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:746
#	0x55c95342362c	github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredServerStream).RecvMsg+0x4c			/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:164
#	0x55c95327b8b3	github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*logBrokerPublishLogsServer).Recv+0x63					/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:769
#	0x55c953fa5ed8	github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker.(*LogBroker).PublishLogs+0x378				/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker/broker.go:394
#	0x55c95327f3be	github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*raftProxyLogBrokerServer).PublishLogs+0x20e				/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:1514
#	0x55c95327b7a3	github.com/docker/docker/vendor/github.com/docker/swarmkit/api._LogBroker_PublishLogs_Handler+0xb3					/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:750
#	0x55c9534249b7	github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1+0xf7	/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:121
#	0x55c953fdc738	github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.New.func2+0x78							/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:265
#	0x55c9531d2057	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).processStreamingRPC+0x497						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:1167
#	0x55c9531d4022	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).handleStream+0x12b2							/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:1253
#	0x55c9531db190	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0xa0						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:680

1 @ 0x55c952b01460 0x55c952b11426 0x55c953483574 0x55c9534841bf 0x55c953ff591a 0x55c953ef6612 0x55c953eff23e 0x55c952b2f1f1
#	0x55c953483573	github.com/docker/docker/vendor/golang.org/x/time/rate.(*Limiter).waitN+0x2e3			/go/src/github.com/docker/docker/vendor/golang.org/x/time/rate/rate.go:263
#	0x55c9534841be	github.com/docker/docker/vendor/golang.org/x/time/rate.(*Limiter).WaitN+0x6e			/go/src/github.com/docker/docker/vendor/golang.org/x/time/rate/rate_go17.go:20
#	0x55c953ff5919	github.com/docker/docker/daemon/cluster/executor/container.(*controller).Logs+0x4d9		/go/src/github.com/docker/docker/daemon/cluster/executor/container/controller.go:535
#	0x55c953ef6611	github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*taskManager).Logs+0x111	/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/task.go:72
#	0x55c953eff23d	github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*worker).Subscribe.func2+0xad	/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/worker.go:558

1 @ 0x55c952b01460 0x55c952b11426 0x55c953fa479f 0x55c95327e698 0x55c95327adb0 0x55c9534249b8 0x55c953fdc739 0x55c9531d2058 0x55c9531d4023 0x55c9531db191 0x55c952b2f1f1
#	0x55c953fa479e	github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker.(*LogBroker).SubscribeLogs+0x54e				/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/logbroker/broker.go:258
#	0x55c95327e697	github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*raftProxyLogsServer).SubscribeLogs+0x207				/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:1340
#	0x55c95327adaf	github.com/docker/docker/vendor/github.com/docker/swarmkit/api._Logs_SubscribeLogs_Handler+0x10f					/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/logbroker.pb.go:592
#	0x55c9534249b7	github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1+0xf7	/go/src/github.com/docker/docker/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:121
#	0x55c953fdc738	github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.New.func2+0x78							/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:265
#	0x55c9531d2057	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).processStreamingRPC+0x497						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:1167
#	0x55c9531d4022	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).handleStream+0x12b2							/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:1253
#	0x55c9531db190	github.com/docker/docker/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0xa0						/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/server.go:680

@thaJeztah (Member) commented Jun 12, 2019

ping @dperny ^

@emosbaugh (Contributor) commented Jun 12, 2019

It seems to hang at 14M.

@emosbaugh (Contributor) commented Jun 12, 2019

Also, it's always blocking on the limiter:

1 @ 0x55c952b01460 0x55c952b11426 0x55c953483574 0x55c9534841bf 0x55c953ff591a 0x55c953ef6612 0x55c953eff23e 0x55c952b2f1f1
#	0x55c953483573	github.com/docker/docker/vendor/golang.org/x/time/rate.(*Limiter).waitN+0x2e3			/go/src/github.com/docker/docker/vendor/golang.org/x/time/rate/rate.go:263
#	0x55c9534841be	github.com/docker/docker/vendor/golang.org/x/time/rate.(*Limiter).WaitN+0x6e			/go/src/github.com/docker/docker/vendor/golang.org/x/time/rate/rate_go17.go:20
#	0x55c953ff5919	github.com/docker/docker/daemon/cluster/executor/container.(*controller).Logs+0x4d9		/go/src/github.com/docker/docker/daemon/cluster/executor/container/controller.go:535
#	0x55c953ef6611	github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*taskManager).Logs+0x111	/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/task.go:72
#	0x55c953eff23d	github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*worker).Subscribe.func2+0xad	/go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/worker.go:558

@emosbaugh (Contributor) commented Jun 12, 2019

I think the issue is that the wait time is calculated at 1 token per rate interval (1 second): if you have a line that is 100 bytes long, you will have to wait 100 seconds on WaitN. Perhaps this is a misuse of the limiter?

// Call site in (*controller).Logs (controller.go:535 in the traces above):
// one token is requested per byte of the log line.
limiter.WaitN(ctx, len(msg.Line))

// From golang.org/x/time/rate:
// durationFromTokens is a unit conversion function from the number of tokens to the duration
// of time it takes to accumulate them at a rate of limit tokens per second.
func (limit Limit) durationFromTokens(tokens float64) time.Duration {
	seconds := tokens / float64(limit)
	return time.Nanosecond * time.Duration(1e9*seconds)
}

https://github.com/golang/time/blob/9d24e82272b4f38b78bc8cff74fa936d31ccd8ef/rate/rate.go#L363-L368
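
To make the arithmetic concrete, here is a minimal standalone sketch. The rate and burst values are invented for illustration and are not the daemon's actual limiter configuration; it only shows how WaitN blocks for roughly one second per requested token once the bucket is empty:

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Illustrative values only: the limiter refills 1 token per second
	// and allows a burst of 10 tokens.
	limiter := rate.NewLimiter(rate.Every(time.Second), 10)

	// NewLimiter starts with a full bucket, so drain the burst first.
	_ = limiter.WaitN(context.Background(), 10)

	line := []byte("hello") // stand-in for msg.Line

	start := time.Now()
	// One token is requested per byte, mirroring limiter.WaitN(ctx, len(msg.Line)).
	// With the bucket empty and a refill rate of 1 token/second, this blocks for
	// about 5 seconds; a 100-byte line at that rate would take ~100 seconds, and
	// a line longer than the burst would make WaitN return an error instead.
	if err := limiter.WaitN(context.Background(), len(line)); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Printf("waited %v for a %d-byte line\n", time.Since(start), len(line))
}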
