duplicate keys when registrator restarts #644

Open
websitesca opened this issue Oct 14, 2018 · 4 comments

@websitesca

  • What version of docker are you running?
    18.06.1-ce-mac73 (26764)
    running a stack in swarm mode

  • What version of registrator are you running?
    v7

  • Did you build a custom version of registrator? If so, what is that image?
    no

  • What is the exact command you are running registrator with?
    running it as a swarm service with the following args in the task template:
    "Args": [
    "--retry-attempts=-1",
    "--resync=10",
    "--deregister=always",
    "--cleanup=true",
    "--internal=true",
    "etcd://etcd:2379/registrator"
    ],
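For reference, a rough docker service create equivalent of that task template. This is a sketch only: the global mode, the network name, and the docker.sock bind mount are assumptions rather than values taken from the stack file; the image tag matches the v7 version noted above.

    # mode, network name, and socket mount below are placeholders
    docker service create \
      --name registrator \
      --mode global \
      --network my-overlay \
      --mount type=bind,source=/var/run/docker.sock,target=/tmp/docker.sock \
      gliderlabs/registrator:v7 \
      --retry-attempts=-1 --resync=10 --deregister=always \
      --cleanup=true --internal=true \
      etcd://etcd:2379/registrator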

  • What is the exact command you are running your container with?
    Not sure which container you're referring to here. I'm running a series of swarm services (some created from a stack, some created via the Docker API) that generate containers. They are picked up successfully by registrator, but they don't deregister under certain conditions.

  • A log capture of all the docker events before, during, and after the issue.

Here is a log dump from the moment I kill the registrator container, the Docker daemon creates a new container task for the registrator service, and the duplicate etcd keys get added:
2018-10-14 10:40:32.347299-0700 localhost docker[26780]: (docker) Created Activity ID: 0x90bc0, Description: Retrieve User by ID
2018-10-14 10:40:32.411789-0700 localhost com.docker.driver.amd64-linux[25409]: proxy >> GET /_ping
2018-10-14 10:40:32.411914-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: vms/0/00000003.00000948 (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:32.414211-0700 localhost com.docker.driver.amd64-linux[25409]: proxy << GET /_ping
2018-10-14 10:40:32.414736-0700 localhost com.docker.driver.amd64-linux[25409]: proxy >> DELETE /v1.38/containers/5955eb9b6a6a?force=1
2018-10-14 10:40:32.414848-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: vms/0/00000003.00000948 (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:32.500303-0700 localhost com.docker.driver.amd64-linux[25409]: osxfs: die event: de-registering container 5955eb9b6a6a95ea1e631eca9fd272e2dfcd3cc91e948ccfc11f7568653ace82
2018-10-14 10:40:32.500562-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: osxfs.sock (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:32.501100-0700 localhost com.docker.osxfs[25407]: Volume.stop docker/5955eb9b6a6a95ea1e631eca9fd272e2dfcd3cc91e948ccfc11f7568653ace82 (paths = [])
2018-10-14 10:40:32.552466-0700 localhost com.docker.hyperkit[25412]: [58466.639502] br0: port 3(veth175) entered disabled state
2018-10-14 10:40:32.553020-0700 localhost com.docker.hyperkit[25412]: [58466.640134] vethb277587: renamed from eth0
2018-10-14 10:40:32.590590-0700 localhost com.docker.hyperkit[25412]: [58466.677504] docker_gwbridge: port 3(veth0dcba07) entered disabled state
2018-10-14 10:40:32.591247-0700 localhost com.docker.hyperkit[25412]: [58466.678393] vethf2a4132: renamed from eth1
2018-10-14 10:40:32.619772-0700 localhost com.docker.hyperkit[25412]: [58466.706044] docker_gwbridge: port 3(veth0dcba07) entered disabled state
2018-10-14 10:40:32.621999-0700 localhost com.docker.hyperkit[25412]: [58466.708578] device veth0dcba07 left promiscuous mode
2018-10-14 10:40:32.623538-0700 localhost com.docker.hyperkit[25412]: [58466.709741] docker_gwbridge: port 3(veth0dcba07) entered disabled state
2018-10-14 10:40:32.665911-0700 localhost com.docker.hyperkit[25412]: [58466.752787] br0: port 3(veth175) entered disabled state
2018-10-14 10:40:32.666860-0700 localhost com.docker.hyperkit[25412]: [58466.753982] device veth175 left promiscuous mode
2018-10-14 10:40:32.667463-0700 localhost com.docker.hyperkit[25412]: [58466.754462] br0: port 3(veth175) entered disabled state
2018-10-14 10:40:32.737795-0700 localhost com.docker.driver.amd64-linux[25409]: osxfs: destroy event: de-registering container 5955eb9b6a6a95ea1e631eca9fd272e2dfcd3cc91e948ccfc11f7568653ace82
2018-10-14 10:40:32.738017-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: osxfs.sock (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:32.738188-0700 localhost com.docker.driver.amd64-linux[25409]: proxy << DELETE /v1.38/containers/5955eb9b6a6a?force=1
2018-10-14 10:40:35.176261-0700 localhost com.docker.driver.amd64-linux[25409]: osxfs: destroy event: de-registering container 9e7097d36e22e0cb3c8070224aa9c6be3fc06f4e826a149d480bd95c39ec0547
2018-10-14 10:40:35.176688-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: osxfs.sock (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:37.919334-0700 localhost com.docker.hyperkit[25412]: [58472.006489] veth176: renamed from veth52b3a13
2018-10-14 10:40:37.922122-0700 localhost com.docker.hyperkit[25412]: [58472.009119] br0: port 3(veth176) entered blocking state
2018-10-14 10:40:37.923419-0700 localhost com.docker.hyperkit[25412]: [58472.010330] br0: port 3(veth176) entered disabled state
2018-10-14 10:40:37.925238-0700 localhost com.docker.hyperkit[25412]: [58472.012304] device veth176 entered promiscuous mode
2018-10-14 10:40:37.939233-0700 localhost com.docker.hyperkit[25412]: [58472.026630] docker_gwbridge: port 3(vethb7d1050) entered blocking state
2018-10-14 10:40:37.939986-0700 localhost com.docker.hyperkit[25412]: [58472.027335] docker_gwbridge: port 3(vethb7d1050) entered disabled state
2018-10-14 10:40:37.940725-0700 localhost com.docker.hyperkit[25412]: [58472.028240] device vethb7d1050 entered promiscuous mode
2018-10-14 10:40:37.942598-0700 localhost com.docker.hyperkit[25412]: [58472.030000] IPv6: ADDRCONF(NETDEV_UP): vethb7d1050: link is not ready
2018-10-14 10:40:37.943328-0700 localhost com.docker.hyperkit[25412]: [58472.030705] docker_gwbridge: port 3(vethb7d1050) entered blocking state
2018-10-14 10:40:37.944050-0700 localhost com.docker.hyperkit[25412]: [58472.031393] docker_gwbridge: port 3(vethb7d1050) entered forwarding state
2018-10-14 10:40:38.037521-0700 localhost com.docker.hyperkit[25412]: [58472.125091] IPVS: Creating netns size=2104 id=208
2018-10-14 10:40:38.038085-0700 localhost com.docker.hyperkit[25412]: [58472.125598] IPVS: ftp: loaded support on port[0] = 21
2018-10-14 10:40:38.268089-0700 localhost com.docker.hyperkit[25412]: [58472.355193] eth0: renamed from vethe95033c
2018-10-14 10:40:38.272069-0700 localhost com.docker.hyperkit[25412]: [58472.358833] docker_gwbridge: port 3(vethb7d1050) entered disabled state
2018-10-14 10:40:38.273350-0700 localhost com.docker.hyperkit[25412]: [58472.360323] br0: port 3(veth176) entered blocking state
2018-10-14 10:40:38.274354-0700 localhost com.docker.hyperkit[25412]: [58472.361469] br0: port 3(veth176) entered forwarding state
2018-10-14 10:40:38.330134-0700 localhost com.docker.hyperkit[25412]: [58472.417356] eth1: renamed from vethcb3d78c
2018-10-14 10:40:38.334173-0700 localhost com.docker.hyperkit[25412]: [58472.420925] IPv6: ADDRCONF(NETDEV_CHANGE): vethb7d1050: link becomes ready
2018-10-14 10:40:38.335711-0700 localhost com.docker.hyperkit[25412]: [58472.422423] docker_gwbridge: port 3(vethb7d1050) entered blocking state
2018-10-14 10:40:38.337214-0700 localhost com.docker.hyperkit[25412]: [58472.423862] docker_gwbridge: port 3(vethb7d1050) entered forwarding state
2018-10-14 10:40:38.437150-0700 localhost com.docker.driver.amd64-linux[25409]: osxfs: start event: re-registering container 1dc9ccb50c243e12df8577d7f1dc36f8cc076f8d9fc55fb8e58c7813fb0ac21c
2018-10-14 10:40:38.444212-0700 localhost com.docker.driver.amd64-linux[25409]: socket.DialUnix: osxfs.sock (in /Users/wca/Library/Containers/com.docker.docker/Data)
2018-10-14 10:40:38.444447-0700 localhost com.docker.osxfs[25407]: Volume.approve docker/1dc9ccb50c243e12df8577d7f1dc36f8cc076f8d9fc55fb8e58c7813fb0ac21c (paths = [/var/run/docker.sock:state=default])
2018-10-14 10:40:38.445001-0700 localhost com.docker.osxfs[25407]: Volume.approve docker/1dc9ccb50c243e12df8577d7f1dc36f8cc076f8d9fc55fb8e58c7813fb0ac21c (watches [])
2018-10-14 10:40:38.445068-0700 localhost com.docker.osxfs[25407]: Volume.start docker/1dc9ccb50c243e12df8577d7f1dc36f8cc076f8d9fc55fb8e58c7813fb0ac21c (paths = [])
2018-10-14 10:40:38.497513-0700 localhost com.docker.vpnkit[25408]: DNS lookup usage.gliderlabs.io A: NoSuchRecord

  • If relevant, Dockerfile for application that is having issues.

Description of the problem:

  • the problem is that stale services don't DEREGISTER when registrator starts up
  • say I've got the following keys in etcd (a bunch of old services that aren't running anymore, under the prefix /registrator):

/registrator/varnish/5ec0befd2be4:stack1_varnish.2.j5hfjisgx45kil54uh0xaurgv:80
/registrator/varnish/5ec0befd2be4:stack1_varnish.1.71ivz7n3ognoc43ol0tvsl8nl:80

  • now if I restart registrator by killing its container so that swarm creates a new container task for the registrator service (and note that I've got --cleanup and --resync=10), those old keys don't go away; registrator just adds the currently active services alongside the stale ones as duplicates (a manual cleanup sketch follows the key list below):

/registrator/varnish/5ec0befd2be4:stack1_varnish.1.71ivz7n3ognoc43ol0tvsl8nl:80
/registrator/varnish/5955eb9b6a6a:stack1_varnish.2.j5hfjisgx45kil54uh0xaurgv:80
/registrator/varnish/5955eb9b6a6a:stack1_varnish.1.71ivz7n3ognoc43ol0tvsl8nl:80
/registrator/varnish/5ec0befd2be4:stack1_varnish.2.j5hfjisgx45kil54uh0xaurgv:80
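As a manual workaround (not a fix), the stale entries can be deleted with etcdctl. A sketch, assuming the etcd v2 keys API that registrator's etcd backend talks to and an etcdctl pointed at the same cluster; the 5ec0befd2be4 prefix here stands in for whichever prefix no longer corresponds to a live container:

    # list everything registrator has written
    etcdctl ls --recursive /registrator
    # delete the entries under the stale prefix (keys taken from the listing above)
    etcdctl rm /registrator/varnish/5ec0befd2be4:stack1_varnish.1.71ivz7n3ognoc43ol0tvsl8nl:80
    etcdctl rm /registrator/varnish/5ec0befd2be4:stack1_varnish.2.j5hfjisgx45kil54uh0xaurgv:80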

How reproducible:

  • very; the problem happens every time

Steps to Reproduce:

  • run a stack that includes registrator as one of its services
  • run registrator and note that it successfully adds the running services to etcd
  • kill the registrator container so that swarm creates a new registrator container
  • now you'll see duplicate keys in etcd (a condensed sketch of these steps follows this list)
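The same steps as a shell sketch (the container ID is a placeholder, and the sleep is only there to let at least one --resync=10 cycle run):

    # find and kill the currently running registrator task container
    docker ps --filter name=registrator --format '{{.ID}}'
    docker kill <registrator-container-id>
    # swarm schedules a replacement task; wait past the resync interval
    sleep 15
    # keys written under the old prefix are still present alongside the new ones
    etcdctl ls --recursive /registrator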

Actual Results:

  • duplicates in etcd

Expected Results:

  • no duplicates

Additional info:

@giulioprovasi

got the exact same issue with Consul; here's the registrator config:

version: '2'
services:
    registrator:
        image: gliderlabs/registrator
        volumes:
            - /var/run/docker.sock:/tmp/docker.sock
        command:
            -retry-attempts=-1
            -cleanup
            --resync=10
            --deregister=always
            --cleanup=true
            --internal=true
            consul://consul-server-bootstrap:8500
        networks:
            - back

@stoffus

stoffus commented Nov 14, 2018

Same here with Consul 1.3.0, running one registrator container per node in a Docker Swarm.

@jcperezamin

Same issue here with Consul 1.4.3 and Kubernetes 13.3:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: registrator
  name: registrator
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: registrator
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - args:
        - -useIpFromEnv=POD_IP
        - -retry-attempts=-1
        - -resync=10
        - -cleanup=true
        - -deregister=always
        - consul://consul:8500
        image: gliderlabs/registrator
        name: registrator
        resources: {}
        volumeMounts:
        - mountPath: /tmp/docker.sock
          name: registrator-claim0
      restartPolicy: Always
      volumes:
      - name: registrator-claim0
        persistentVolumeClaim:
          claimName: registrator-claim0

@rushou

rushou commented Nov 27, 2019

(quotes @giulioprovasi's comment and registrator config from above)
