Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling ILB/ELB on windows using per-node, per-network LB endpoint. #34674

Merged
merged 2 commits into from Sep 18, 2017

Conversation

@pradipd
Copy link
Contributor

commented Aug 29, 2017

Note: This is 1 of 3 PRs (the other 2 are in libnetwork and swarmkit repos).

Signed-off-by: Pradip Dhara pradipd@microsoft.com

- What I did
Enabled ILB/ELB (routing mesh) functionality for windows.

- How I did it
This change enables ILB/ELB functionality on windows using a per-node, per-network LB endpoint. The per-node, per-network LB endpoint was discussed with docker
as a solution to #30820. Note that only windows is using the LB endpoint. We have not changed the linux networking code to take advantage of the LB endpoint.

- How to verify it
make test on linux (note: I'm still seeing 2 tests occasionally fail. But, I'm fairly certain they are unrelated to my change. I'm sending this out now and will continue to debug).
We have added a bunch of powershell scripts to test out functionality. However, they are not public. We will work with docker folks in figuring out how to make them public. Ideally, we should get the docker_cli_swarm_tests running on windows, but, that is much bigger task and requires changes in windows.

- Description for the changelog
Enabling ILB/ELB on windows using per-node, per-network LB endpoint.

- A picture of a cute animal (not mandatory but encouraged)
tubby shaved

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2017

libnetwork PR: docker/libnetwork#1925
swarmkit PR: docker/swarmkit#2363

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2017

@vieux

This comment has been minimized.

Copy link
Collaborator

commented Aug 30, 2017

@pradipd can you please fix the tests ?

@@ -360,6 +387,34 @@ func (daemon *Daemon) createNetwork(create types.NetworkCreateRequest, id string
}
daemon.LogNetworkEvent(n, "create")

if agent && !n.Info().Ingress() && n.Type() == "overlay" {

This comment has been minimized.

Copy link
@dnephin

dnephin Aug 30, 2017

Member

Please move this code (and the code in deleteNetwork) to a new function. A separate function will make it easier to unit test, and help the reader understand the different steps performed as part of createNetwork and deleteNetwork.

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 1, 2017

Author Contributor

Done.

sb, err := c.NewSandbox(sandboxName)
if err != nil {
logrus.Errorf("Failed creating %s sandbox: %v", sandboxName, err)
return nil, err

This comment has been minimized.

Copy link
@dnephin

dnephin Aug 30, 2017

Member

I don't think the error needs to be logged. The conventional way of handling errors is to return them with errors.Wrapf(err, "failed creating %s sandbox", sandboxName).

The errors should end up in the logs.

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Aug 31, 2017

Thanks for the feedback. I'm addressing it now. I had to fix some changes in libnetwork PR (docker/libnetwork#1925) and swarmkit PR (docker/swarmkit#2363)

endpointName := prefix + "-endpoint"
ep, err := n.CreateEndpoint(endpointName, libnetwork.CreateOptionIpam(ip, nil, nil, nil), libnetwork.CreateOptionLoadBalancer())
if err != nil {
logrus.Errorf("Failed creating %s in sandbox %s: %v", endpointName, sandboxName, err)

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 1, 2017

Author Contributor

Note: I left the logrus error messages here, because they were here before for when we were creating the ingress network.

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 1, 2017

@vieux In order to get the tests passing I need changes from docker/libnetwork#1925 and docker/swarmkit#2363
Do I have to wait for those 2 pull requests to merge?
Or do i vendor in those changes and point vendor.conf to my github repo?

@GordonTheTurtle

This comment has been minimized.

Copy link

commented Sep 7, 2017

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "windows_routingmesh" git@github.com:pradipd/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354127368
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2017

@vieux : Can you help me understand the integration tests?
Originally I added a test to docker_cli_swarm_test.go, but that gave a janky error.
integration-cli is deprecated. Please add an API integration test to
17:19:20 ./integration/COMPONENT/. See ./TESTING.md for more details

So, I moved the test to docker_api_swarm_test.go, but the test is exactly the same. That fixed the Janky error, but, I don't think that was the proper way to fix the test.

I also tried to add the test to integration/service, but, I can't figure out how to run the tests.
How to I run the tests under integration/service?

@dnephin
Copy link
Member

left a comment

How to I run the tests under integration/service?

The same way you would the cli suite, make test-integration

d := s.AddDaemon(c, true, true)

// Create a new overlay network
out, err := d.Cmd("network", "create", "-d", "overlay", "new-overlay")

This comment has been minimized.

Copy link
@dnephin

dnephin Sep 12, 2017

Member

The API suite should not use d.Cmd() please use testEnv.APIClient() to get a client, and make calls to the API.

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 12, 2017

Author Contributor

Thanks for the clarification. That is what I thought, but, I saw other tests in the .Cmd() and hence the confusion. I will clean this up.

@pradipd pradipd force-pushed the pradipd:windows_routingmesh branch from 9f7bd27 to a70fad7 Sep 14, 2017

@GordonTheTurtle GordonTheTurtle removed the dco/no label Sep 14, 2017

@pradipd pradipd force-pushed the pradipd:windows_routingmesh branch 2 times, most recently from 50f273d to 6fda10b Sep 14, 2017

@vieux

This comment has been minimized.

Copy link
Collaborator

commented Sep 15, 2017

ping @tiborvass @fcrisciani @vdemeester can you take a look ?

@pradipd pradipd force-pushed the pradipd:windows_routingmesh branch from 6fda10b to 704610a Sep 15, 2017

@vieux vieux added the rebuild/z label Sep 15, 2017

e.backend.ReleaseIngress()
e.backend.ClearLBAttachments()

This comment has been minimized.

Copy link
@fcrisciani

fcrisciani Sep 17, 2017

Contributor

is this function clearing the attachments of all the networks or only the Ingress?

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 18, 2017

Author Contributor

It is clearing all the attachments for all networks --- which is wrong.
I fixed this in the latest PR.
The lb attachments are always "reset". I.e. they are cleared and then updated with the latest node.lbattachments passed in from the swarm manager.

return err
}

err = e.backend.AddLBAttachments(lbAttachments)

This comment has been minimized.

Copy link
@fcrisciani

fcrisciani Sep 17, 2017

Contributor

The feeling here is that if there is not ingress network there is no lb configured, is it correct or there is a different code path followed?

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 18, 2017

Author Contributor

If it is not ingress network, then we save the lb attachment (actually, only the per network IP address).
We only use the lb IP address when we create a service.

@@ -496,6 +543,28 @@ func (daemon *Daemon) DeleteNetwork(networkID string) error {
return daemon.deleteNetwork(networkID, false)
}

func (daemon *Daemon) deleteLoadBalancerSandbox(n libnetwork.Network) {

This comment has been minimized.

Copy link
@fcrisciani

fcrisciani Sep 17, 2017

Contributor

I don't see a disableService here, wondering why there is an asymmetry between the EnableService in the Create and no need for the disableService here

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 18, 2017

Author Contributor

Oops. I think I missed this in the last PR. I'll update it. sorry.

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 18, 2017

Author Contributor

Address in the latest PR.

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Sep 18, 2017

@pradipd vendor check is failing:

18:12:05 The result of vndr differs
18:12:05 
18:12:05  M vendor/github.com/docker/libnetwork/drivers/overlay/peerdb.go
18:12:05 
18:12:05 Please vendor your package with github.com/LK4D4/vndr.
@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2017

Oops. I shouldn't have done s/sanbox/sandbox/ in the vendored packages.
will fix.

@thaJeztah
Copy link
Member

left a comment

thanks for addressing my nits; don't forget to squash commits to two commits; 1 for "vendoring" and 1 for the local changes. (We don't have to preserve the "fix vendor" and "address review comments" when merging)

Enabling ILB/ELB on windows using per-node, per-network LB endpoint.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>

@pradipd pradipd force-pushed the pradipd:windows_routingmesh branch from e5e3089 to f314530 Sep 18, 2017

vendoring libnetwork and swarmkit
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>

@pradipd pradipd force-pushed the pradipd:windows_routingmesh branch from f314530 to 4c1b079 Sep 18, 2017

@vieux vieux merged commit a2ee40b into moby:master Sep 18, 2017

7 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 36914 has succeeded
Details
janky Jenkins build Docker-PRs 45569 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 5960 has succeeded
Details
vendor Jenkins build Docker-PRs-vendor 3833 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 17160 has succeeded
Details
z Jenkins build Docker-PRs-s390x 5756 has succeeded
Details

@pradipd pradipd deleted the pradipd:windows_routingmesh branch Sep 19, 2017

@dnephin
Copy link
Member

left a comment

Thank you for making those changes to the test case.

I think the test still needs a bit of cleanup. Would you mind opening a PR to address these points?

poll.WaitOn(t, serviceRunningTasksCount(client, serviceID, instances))

_, _, err = client.ServiceInspectWithRaw(context.Background(), serviceID, types.ServiceInspectOptions{})
require.NoError(t, err)

This comment has been minimized.

Copy link
@dnephin

dnephin Sep 19, 2017

Member

These 2 lines seem to be unnecessary. We already know the service can be inspected from the poll.WaitOn(). I think these 2 lines need to be removed.

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 19, 2017

Author Contributor

Will remove in new PR.

require.NoError(t, err)
assert.Contains(t, network.Containers, overlayName+"-sbox")

err = client.ServiceRemove(context.Background(), serviceID)

This comment has been minimized.

Copy link
@dnephin

dnephin Sep 19, 2017

Member

Everything from this line to the end of the test case seems to be test teardown, which should already be handled by defer setupTest(t)(). Why is all this necessary?

This comment has been minimized.

Copy link
@pradipd

pradipd Sep 19, 2017

Author Contributor

I want to validate that the network is removed cleanly.
I see setupTest() is tearing everything down, but, it is forcefully shutting things down. I want to make sure the typical user pattern works correctly.

}
}

func serviceRunningTasksCount(client client.ServiceAPIClient, serviceID string, instances uint64) func(log poll.LogT) poll.Result {

This comment has been minimized.

Copy link
@dnephin

dnephin Sep 19, 2017

Member

This is a duplicate of serviceContainerCount() in inspect_test.go. I'm fine if you want to rename it, but I don't think we need both. We can just add the state check, which seems to be the only difference.

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2017

Will open up a new PR to address these.

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2017

#34898 is the PR.

@@ -61,4 +62,5 @@ type Backend interface {
LookupImage(name string) (*types.ImageInspect, error)
PluginManager() *plugin.Manager
PluginGetter() *plugin.Store
GetLBAttachmentStore() *networkSettings.LBAttachmentStore

This comment has been minimized.

Copy link
@stevvooe

stevvooe Sep 22, 2017

Contributor

GetAttachmentStore

@@ -121,6 +122,8 @@ type Daemon struct {
pruneRunning int32
hosts map[string]bool // hosts stores the addresses the daemon is listening on
startupDone chan struct{}

lbAttachmentStore network.LBAttachmentStore

This comment has been minimized.

Copy link
@stevvooe

stevvooe Sep 22, 2017

Contributor

attachmentStore

@@ -31,3 +34,36 @@ type EndpointSettings struct {
*networktypes.EndpointSettings
IPAMOperational bool
}

// LBAttachmentStore stores the load balancer IP address for a network id.
type LBAttachmentStore struct {

This comment has been minimized.

Copy link
@stevvooe

stevvooe Sep 22, 2017

Contributor

AttachmentStore.

@stevvooe

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2017

@pradipd I fixed the naming in swarmkit to remove LB. Could you please do so in moby, as well?

@pradipd

This comment has been minimized.

Copy link
Contributor Author

commented Sep 22, 2017

I have made the changes. How do I sync with your changes in swarmkit?
Do I vendor in your branch/commit in vendor.conf?
Or do I wait till you merge your changes into swarmkit, then vendor in the new master commit hash?

@stevvooe

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2017

@pradipd Wait till we merge the swarmkit changes and submit a PR with the updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.