Add verbose flag to network inspect to show all services & tasks in swarm mode #31710

Merged
merged 2 commits into from Mar 14, 2017

sanimej commented Mar 9, 2017

For swarm mode networks, network inspect currently shows only the endpoints local to the host it runs on. Service discovery and overlay network reachability information is exchanged between the nodes through the gossip channel. There have been issues where failures in the gossip channel led to inconsistent state across the cluster, but there was no easy way to identify it.

This change adds a verbose flag to network inspect that displays all services on the network, along with every task IP and the IP of the host where each container is running. This is useful for quickly identifying inconsistent state across hosts, which can show up as stale or incorrect IPs in DNS queries.

Edit: libnetwork PR has been merged. Updated the vendoring.

Fixes docker #24186

Example output from a 3-node cluster, where s1 has 3 replicas and s2 has 1 replica:

vagrant@net-3:~$ docker network inspect --verbose ov1
[
    {
        "Name": "ov1",
        "Id": "ybmyjvao9vtzy3oorxbssj13b",
        "Created": "2017-03-07T22:08:56.471374351Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "5f125d4718f351333decbf34e02c3227a4455731b067cf6921cf9ffaa6779148": {
                "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "net-3-3090671942ce",
                "IP": "192.168.33.13"
            },
            {
                "Name": "net-2-fb80208efd75",
                "IP": "192.168.33.12"
            },
            {
                "Name": "net-1-6ecbc0040a73",
                "IP": "192.168.33.11"
            }
        ],
        "Services": {
            "s1": {
                "VIP": "10.0.0.2",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s1.1.530e00o1oj9l1kivoci7oupzt",
                        "EndpointID": "f0a46deaf7ce1d87bf42437188a7ba4d8d6ae3e98fe439db95171d8330e278a0",
                        "EndpointIP": "10.0.0.3",
                        "Info": "Host IP 192.168.33.12"
                    },
                    {
                        "Name": "s1.3.98jplsdb5j7ozibrbv0bfs3fg",
                        "EndpointID": "005c24c13b1ddb0395de122ae7c75c889cebc52d4cc61ee918ef3a9dbffa2b27",
                        "EndpointIP": "10.0.0.5",
                        "Info": "Host IP 192.168.33.11"
                    },
                    {
                        "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                        "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                        "EndpointIP": "10.0.0.4",
                        "Info": "Host IP 192.168.33.13"
                    }
                ]
            },
            "s2": {
                "VIP": "10.0.0.6",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s2.1.spc20i5to6crco9wdh5t3gzh2",
                        "EndpointID": "9d36c87e72236b0c5e0c5794b6963c84c3449ecbab5648ccb861993c760c30b9",
                        "EndpointIP": "10.0.0.7",
                        "Info": "Host IP 192.168.33.12"
                    }
                ]
            }
        }
    }
]
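To compare this output across nodes, it can help to reduce it to the task IPs per service. The following is a minimal Go sketch, not code from this PR: the type names, the `tasksByService` helper, and the trimmed `sample` input are assumptions made for illustration, and only the JSON field names are taken from the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// task, serviceInfo and network are hypothetical types that mirror only the
// JSON fields this sketch reads from the verbose output.
type task struct {
	Name       string
	EndpointIP string
}

type serviceInfo struct {
	VIP   string
	Tasks []task
}

type network struct {
	Name     string
	Services map[string]serviceInfo
}

// tasksByService decodes JSON shaped like the verbose inspect output and
// returns, for each service, the sorted endpoint IPs of its tasks.
func tasksByService(raw []byte) (map[string][]string, error) {
	var nets []network
	if err := json.Unmarshal(raw, &nets); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, n := range nets {
		for name, svc := range n.Services {
			for _, t := range svc.Tasks {
				out[name] = append(out[name], t.EndpointIP)
			}
		}
	}
	for name := range out {
		sort.Strings(out[name])
	}
	return out, nil
}

// sample is a trimmed copy of the output above (service s2 only).
const sample = `[{"Name":"ov1","Services":{"s2":{"VIP":"10.0.0.6","Ports":[],"Tasks":[
  {"Name":"s2.1.spc20i5to6crco9wdh5t3gzh2","EndpointIP":"10.0.0.7","Info":"Host IP 192.168.33.12"}]}}}]`

func main() {
	ips, err := tasksByService([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(ips)
}
```

In practice the input would be the saved output of the inspect command from each node rather than an embedded constant.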

Contributor

aaronlehmann commented Mar 10, 2017

There is a gofmt issue preventing CI from running.

Contributor

aaronlehmann commented Mar 10, 2017

Rather than changing the API to include this information, could we do this on the client side by adding network selectors to ListTasksRequest? I suggested doing this in #31551 (comment) for something else. It would be pretty easy to implement that, and it probably makes sense to do.

sanimej commented Mar 10, 2017

@aaronlehmann One of the reasons for implementing this is to have a quick way to check whether the network control plane state distributed by gossip is consistent across all nodes, so it has to work on all nodes, mainly the workers. I am working on a diagnostics container that will probe the kernel state and make sure it's consistent (for example, that the LB entries in IPVS match the number of tasks for a given service). Using the swarm control API will not work in this case. The service/task information presented here is fetched from libnetwork's networkDB.
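The cross-node consistency check described here can be sketched as a set comparison between what two nodes report. This is only an illustration under stated assumptions: the `view` type and the `diffViews` helper are hypothetical, and the data in `main` is made up to show a node that has missed one task.

```go
package main

import (
	"fmt"
	"sort"
)

// view is a hypothetical summary of one node's verbose output:
// service name -> set of task endpoint IPs that node knows about.
type view map[string]map[string]bool

// diffViews returns the "service/ip" entries present only in a and the
// entries present only in b, each sorted for stable output.
func diffViews(a, b view) (onlyA, onlyB []string) {
	for svc, ips := range a {
		for ip := range ips {
			// Indexing a missing service yields a nil map, which reads as false.
			if !b[svc][ip] {
				onlyA = append(onlyA, svc+"/"+ip)
			}
		}
	}
	for svc, ips := range b {
		for ip := range ips {
			if !a[svc][ip] {
				onlyB = append(onlyB, svc+"/"+ip)
			}
		}
	}
	sort.Strings(onlyA)
	sort.Strings(onlyB)
	return onlyA, onlyB
}

func main() {
	// Illustrative data: the second node has not learned about 10.0.0.5.
	node1 := view{"s1": {"10.0.0.3": true, "10.0.0.4": true, "10.0.0.5": true}}
	node2 := view{"s1": {"10.0.0.3": true, "10.0.0.4": true}}

	only1, only2 := diffViews(node1, node2)
	fmt.Println("only on node 1:", only1)
	fmt.Println("only on node 2:", only2)
}
```

Two empty results would suggest the nodes agree on the gossip state for the network; any entry points at a task one node has not learned about or has not removed.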

Contributor

mavenugo commented Mar 12, 2017

Thanks @sanimej. Yes, this is a very useful addition.

@aaronlehmann as @sanimej suggested, the main purpose of this change is to provide a way to perform a consistency check between the distributed control plane (via gossip) and the distributed data plane, which is built using various tools such as iptables, L2/L3 tables, IPVS, etc.

@mavenugo

@sanimej A couple of minor comments. The rest of it LGTM.

sanimej commented Mar 12, 2017

@mavenugo Addressed the comments. PTAL.

@mavenugo

LGTM

Santhosh Manohar
Vendor libnetwork for network inspect --verbose changes
Signed-off-by: Santhosh Manohar <santhosh@docker.com>

sanimej commented Mar 13, 2017

Vendoring also fixes docker #30727

sanimej commented Mar 13, 2017

@thaJeztah Updated the reference document and man page for the network inspect command.

Ports []string
LocalLBIndex int
Tasks []Task
}

@aaronlehmann

aaronlehmann Mar 13, 2017

Contributor

Do these need to be included in swagger.yaml?

@sanimej

sanimej Mar 13, 2017

ServiceInfo and Task are shown only in the verbose case. It's not clear whether multiple responses: sections can be specified in the swagger.yaml file for different options of an API. I have to look into how it's done in Swagger. I will open an issue for this and get it done in a subsequent PR, if that's OK.

sanimej commented Mar 13, 2017

@aaronlehmann Addressed the comments. PTAL. Opened an issue to update the response types correctly in swagger.yaml.

sanimej commented Mar 13, 2017

Corrected the error handling for verbose query string.

Contributor

aaronlehmann commented Mar 14, 2017

LGTM

@thaJeztah thaJeztah added this to the 17.04.0 milestone Mar 14, 2017

Member

thaJeztah commented Mar 14, 2017

Left some small nits, but no show-stoppers

Santhosh Manohar
Enhance network inspect to show all tasks, local & non-local, in swarm mode

Signed-off-by: Santhosh Manohar <santhosh@docker.com>

sanimej commented Mar 14, 2017

@thaJeztah Updated the PR.

@thaJeztah

LGTM, thanks!

@mavenugo mavenugo merged commit cdf66ba into moby:master Mar 14, 2017

6 of 7 checks passed

z: Jenkins build is being scheduled
dco-signed: All commits are signed
experimental: Jenkins build Docker-PRs-experimental 31681 has succeeded
janky: Jenkins build Docker-PRs 40304 has succeeded
powerpc: Jenkins build Docker-PRs-powerpc 391 has succeeded
vendor: Jenkins build Docker-PRs-vendor 2999 has succeeded
windowsRS1: Jenkins build Docker-PRs-WoW-RS1 11383 has succeeded

@sanimej sanimej referenced this pull request Mar 14, 2017

Merged

bump 17.04.0-rc1 #31811

dnephin pushed a commit to dnephin/docker that referenced this pull request Apr 17, 2017

Merge pull request moby#31710 from sanimej/drillerrr
Add verbose flag to network inspect to show all services & tasks in swarm mode
