Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-work the DNS part of the test for #26762 #27587

Merged
merged 1 commit into from
Jun 17, 2016
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
166 changes: 108 additions & 58 deletions test/e2e/federated-service.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,6 @@ const (
FederatedServicePod = "federated-service-test-pod"

DefaultFederationName = "federation"

// TODO: Only suppoprts IPv4 addresses. Also add a regexp for IPv6 addresses.
FederatedIPAddrRegexp = `(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])`
FederatedDNS1123Regexp = `([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)*([a-z0-9]([-a-z0-9]*[a-z0-9])?)`
)

var _ = framework.KubeDescribe("Service [Feature:Federation]", func() {
Expand Down Expand Up @@ -148,49 +144,67 @@ var _ = framework.KubeDescribe("Service [Feature:Federation]", func() {
}
})

It("should be able to discover a federated service", func() {
framework.SkipUnlessFederated(f.Client)

createService(f.FederationClientset_1_3, clusterClientSets, f.Namespace.Name)

svcDNSNames := []string{
FederatedServiceName,
fmt.Sprintf("%s.%s", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.%s", FederatedServiceName, f.Namespace.Name, federationName),
fmt.Sprintf("%s.%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name, federationName),
}
for _, name := range svcDNSNames {
discoverService(f, name, true)
}
})

It("should be able to discover a non-local federated service", func() {
framework.SkipUnlessFederated(f.Client)

createService(f.FederationClientset_1_3, clusterClientSets, f.Namespace.Name)
Describe("DNS", func() {
BeforeEach(func() {
framework.SkipUnlessFederated(f.Client)
createService(f.FederationClientset_1_3, clusterClientSets, f.Namespace.Name)
})

// Delete a federated service shard in the default e2e Kubernetes cluster.
err := f.Clientset_1_3.Core().Services(f.Namespace.Name).Delete(FederatedServiceName, &api.DeleteOptions{})
Expect(err).NotTo(HaveOccurred())
waitForFederatedServiceShard(f.Clientset_1_3, f.Namespace.Name, nil, 0)
It("should be able to discover a federated service", func() {
framework.SkipUnlessFederated(f.Client)

localSvcDNSNames := []string{
FederatedServiceName,
fmt.Sprintf("%s.%s", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name),
}
for _, name := range localSvcDNSNames {
discoverService(f, name, false)
}
svcDNSNames := []string{
FederatedServiceName,
fmt.Sprintf("%s.%s", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.%s", FederatedServiceName, f.Namespace.Name, federationName),
fmt.Sprintf("%s.%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name, federationName),
}
// TODO(mml): This could be much faster. We can launch all the test
// pods, perhaps in the BeforeEach, and then just poll until we get
// successes/failures from them all.
for _, name := range svcDNSNames {
discoverService(f, name)
}
})

svcDNSNames := []string{
fmt.Sprintf("%s.%s.%s", FederatedServiceName, f.Namespace.Name, federationName),
fmt.Sprintf("%s.%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name, federationName),
}
for _, name := range svcDNSNames {
discoverService(f, name, true)
}
Context("non-local federated service", func() {
BeforeEach(func() {
framework.SkipUnlessFederated(f.Client)

// Delete a federated service shard in the default e2e Kubernetes cluster.
err := f.Clientset_1_3.Core().Services(f.Namespace.Name).Delete(FederatedServiceName, &api.DeleteOptions{})
Expect(err).NotTo(HaveOccurred())
waitForFederatedServiceShard(f.Clientset_1_3, f.Namespace.Name, nil, 0)
})

It("should be able to discover a non-local federated service", func() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you accidentally left in the 'F'?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh! You beat me to it :-)

framework.SkipUnlessFederated(f.Client)

svcDNSNames := []string{
fmt.Sprintf("%s.%s.%s", FederatedServiceName, f.Namespace.Name, federationName),
fmt.Sprintf("%s.%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name, federationName),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few problems here, as discussed in person:

  1. Deleting the service from the local cluster won't actually work, as the Federation Service Controller will re-create it (or at least it should, and if it doesn't, that's a bug). Instead you'll have to remove all of the endpoints from the service (by deleting all pods that match the service's selector). That should remove the DNS entry for the local shard of the federated service. Again, if it doesn't, then that's a bug.
  2. Even once the DNS records have been deleted, DNS propagation will take at least TTL seconds (currently set to 180 in the Federated Service Controller) to propagate. To get around that you'd probably need to resolve before deleting the service's backends, and then after deleting, until the IP address changes (which might take up to 180 seconds, due to DNS caching). We might consider reducing that TTL in e2e test clusters to speed up the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug filed: #27623

}
for _, name := range svcDNSNames {
discoverService(f, name)
}

// TODO(mml): Unclear how to make this meaningful and not terribly
// slow. How long (how many minutes?) do we verify that a given DNS
// lookup *doesn't* work before we call it a success? For now,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, try for DNS TTL seconds (currently configured to180).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but it adds 9 minutes to every run when things are passing. I guess I can leave a TODO saying consider dialing down the TTL and consider parallelizing the pod execution. I already mentioned the latter elsewhere.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm happy to leave this test disabled until we've dialed down the TTL. Just wanted to explain what the solution would be :-)

// commenting out.
/*
localSvcDNSNames := []string{
FederatedServiceName,
fmt.Sprintf("%s.%s", FederatedServiceName, f.Namespace.Name),
fmt.Sprintf("%s.%s.svc.cluster.local.", FederatedServiceName, f.Namespace.Name),
}
for _, name := range localSvcDNSNames {
discoverService(f, name, false)
}
*/
})
})
})
})

Expand Down Expand Up @@ -231,7 +245,8 @@ func waitForFederatedServiceShard(cs *release_1_3.Clientset, namespace string, s
}

func createService(fcs *federation_release_1_3.Clientset, clusterClientSets []*release_1_3.Clientset, namespace string) {
By("Creating a federated service")
By(fmt.Sprintf("Creating federated service %q in namespace %q", FederatedServiceName, namespace))

labels := map[string]string{
"foo": "bar",
}
Expand Down Expand Up @@ -260,17 +275,16 @@ func createService(fcs *federation_release_1_3.Clientset, clusterClientSets []*r
},
}
nservice, err := fcs.Core().Services(namespace).Create(service)
framework.Logf("Trying to create service %q in namespace %q", service.ObjectMeta.Name, service.ObjectMeta.Namespace)
Expect(err).NotTo(HaveOccurred())
for _, cs := range clusterClientSets {
waitForFederatedServiceShard(cs, namespace, nservice, 1)
}
}

func discoverService(f *framework.Framework, name string, exists bool) {
command := []string{"nslookup", name}
func discoverService(f *framework.Framework, name string) {
command := []string{"sh", "-c", fmt.Sprintf("until nslookup '%s'; do sleep 1; done", name)}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you have a potentially endless loop here, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Didn't you decide that "sh" -"c" was the wrong way to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the write way as long as the next argument is a single string representing a bash command. Before we had "sh" "-c" "nslookup" "foo", and when you do that, the "foo" goes nowhere.

There is an endless loop here if nslookup never succeeds. But the Eventually has a timeout, and when we reach that timeout, the test fails and we'll delete the pod on the way out of the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write way

😆

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Just wanted to check. Best check what the timeout on Eventually is. I'm not sure offhand.


framework.Logf("Looking for the service with pod command %q", command)
defer f.Client.Pods(f.Namespace.Name).Delete(FederatedServicePod, api.NewDeleteOptions(0))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to do this only if the pod creation succeeded below.


pod := &api.Pod{
ObjectMeta: api.ObjectMeta{
Expand All @@ -285,16 +299,52 @@ func discoverService(f *framework.Framework, name string, exists bool) {
Command: command,
},
},
RestartPolicy: api.RestartPolicyNever,
RestartPolicy: api.RestartPolicyOnFailure,
},
}
if exists {
f.TestContainerOutputRegexp("federated service discovery", pod, 0, []string{
`Name:\s+` + FederatedDNS1123Regexp + `\nAddress \d+:\s+` + FederatedIPAddrRegexp + `\s+` + FederatedDNS1123Regexp,
})
} else {
f.TestContainerOutputRegexp("federated service discovery", pod, 0, []string{
`nslookup: can't resolve '` + name + `'`,
})

_, err := f.Client.Pods(f.Namespace.Name).Create(pod)
Expect(err).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer this on one line?

NotTo(HaveOccurred(), "Trying to create pod to run %q", command)

// If we ever get any container logs, stash them here.
logs := ""

logerr := func(err error) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps neater and more re-usable to factor this out into a separate function that others can use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I'm not sure the final form the test is going to take. E.g., adding back in the "DNS record missing" assertions, possibly checking the output for the correct IP address. When it feels solidified, I will go back and extract stuff.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

if err == nil {
return nil
}
if logs == "" {
return err
}
return fmt.Errorf("%s (%v)", logs, err)
}

// TODO(mml): Eventually check the IP address is correct, too.
Eventually(func() error {
pod, err := f.Client.Pods(f.Namespace.Name).Get(FederatedServicePod)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Similar comment to above. Perhaps better as a standalone function?

if err != nil {
return logerr(err)
}
if len(pod.Status.ContainerStatuses) < 1 {
return logerr(fmt.Errorf("no container statuses"))
}

// Best effort attempt to grab pod logs for debugging
logs, err = framework.GetPodLogs(f.Client, f.Namespace.Name, FederatedServicePod, "federated-service-discovery-container")
if err != nil {
framework.Logf("Cannot fetch pod logs: %v", err)
}

status := pod.Status.ContainerStatuses[0]
if status.State.Terminated == nil {
return logerr(fmt.Errorf("container is not in terminated state"))
}
if status.State.Terminated.ExitCode == 0 {
return nil
}

return logerr(fmt.Errorf("exited %d", status.State.Terminated.ExitCode))
}, time.Minute*2, time.Second*2).
Should(BeNil(), "%q should exit 0, but it never did", command)
}