Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s enhancements #207

Merged
merged 22 commits into from
Dec 12, 2018
Merged

k8s enhancements #207

merged 22 commits into from
Dec 12, 2018

Conversation

yorinasub17
Copy link
Contributor

@yorinasub17 yorinasub17 commented Dec 11, 2018

More functions for #194

  • Expose some functions that can be used without testing.T to not take in testing.T so that they can be used outside testing context.

    • GetPublicIpsOfEc2InstancesFromClientE
    • GetKubernetesClientFromFileE
    • GetNodesByClientE
  • Migrate ContinuouslyCheckUrl from module-ecs

  • Expose StoreConfigToFile from the k8s module tests, as that can be used for storing a kubernetes resource config to disk to be used with kubectl apply

  • Introduce wrapper functions for calling kubectl:

    • RunKubectl, RunKubectlE, RunKubectlAndGetOutputE
    • KubectlApplyFromString, KubectlApplyFromStringE
    • KubectlDeleteFromString, KubectlDeleteFromStringE
  • Introduce functions for interacting with Kubernetes Service resources.

@yorinasub17 yorinasub17 changed the title Yori kubectl k8s enhancements Dec 11, 2018
@brikis98
Copy link
Member

Expose some functions that can be used without testing.T to not take in testing.T so that they can be used outside testing context.

What's the use case? Do we really want production code depending on a test library?

Migrate ContinuouslyCheckUrl from module-ecs

Nice 👍

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget, do we have a top-level Kubernetes example in /examples and corresponding test for it in /test?

statusCode, body, err := HttpGetE(t, url)
logger.Logf(t, "Got response %d and err %v from URL at %s", statusCode, err, url)
if err != nil {
t.Fatalf("Failed to make HTTP request to the URL at %s: %s\n", url, err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does calling t.Fatal in another goroutine work? I have some vague memory that it does not...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this works. It's been causing my tests to fail (and me pulling hair) the past 2 business days 😄 Also, I copy pasted this with no modifications out of module-ecs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok I see. So the bug is that there are race conditions in there. Ok I'll see what I can do to fix this.

Copy link
Contributor Author

@yorinasub17 yorinasub17 Dec 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah we can call Errorf in the goroutine (https://golang.org/src/testing/testing.go, line 513). Switched to using that, and introduced a waitgroup that callers can use to ensure the goroutine is shutdown before test ends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice find! You should add a comment that indicates why we specifically use Errorf to each spot in the code so we don't accidentally break that.

One other way to do this would be for the method to return a channel and to write to that channel an error if something went wrong or nil if everything finished successfully. The calling code could then read from that channel to see how things went. But Errorf is probably good enough for now that it's not worth adding this complexity.

@@ -23,6 +24,19 @@ func LoadConfigFromPath(path string) clientcmd.ClientConfig {
return config
}

// LoadApiClientConfig will load a ClientConfig object from a file path that points to a location on disk containing a
// kubectl config, with the requested context loaded.
func LoadApiClientConfig(path string, context string) (*restclient.Config, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadApiClientConfigE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// filename.
func StoreConfigToTempFile(t *testing.T, configData string) string {
out, err := StoreConfigToTempFileE(t, configData)
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is cleaner. Will make mental note to use this elsewhere in Terratest in the future.

clientset, err := GetKubernetesClientE(t)
require.NoError(t, err)
statusMsg := fmt.Sprintf("Wait for service %s to be provisioned.", serviceName)
message, err := retry.DoWithRetryE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use the non "E" version here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// The load balancer is ready if it has at least one ingress point
return len(ingress) > 0
default:
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default be false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment at beginning of function:

Only the LoadBalancer type has a delay. All other service types are available if the resource exists.

return corev1.Node{}, err
}
if len(nodes) == 0 {
return corev1.Node{}, NewKubernetesError("There are no nodes in the cluster")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of a single NewKubernetesError used throughout this code? It doesn't make it easy to type check for specific errors, unlike, for example, a NoNodesInCluster type that is specific to this exact error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Will fix.

// Path is /AVAILABILITY_ZONE/INSTANCE_ID
parts := strings.Split(awsIDUri.Path, "/")
instanceID := parts[2]
availabilityZone := parts[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check length of parts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check

@yorinasub17
Copy link
Contributor Author

What's the use case? Do we really want production code depending on a test library?

While I agree this is a bit awkward, the wrappers we built around the clouds end up being useful abstractions for writing any code that works against the underlying systems.

For example, in the rollout command I ended up doing a lot of the same things I would do in a test context: waiting for nodes to be ready, initializing clients by config files, manipulating config files, calling kubectl. And I really wanted to avoid copy pasting the code and duplicating the tests.

I guess the most ideal situation would be to break out the cloud verification code into its own repo that then prod code and terratest can import, but I wasn't quite ready to shave the yak of figuring out the right abstraction, and also cause a lot of disruption.

@brikis98
Copy link
Member

For example, in the rollout command I ended up doing a lot of the same things I would do in a test context: waiting for nodes to be ready, initializing clients by config files, manipulating config files, calling kubectl. And I really wanted to avoid copy pasting the code and duplicating the tests.

Sanity check: is this code really the same or just somewhat the same. I worry that Terratest may update a lot of these functions over time to be more and more suited for testing, with little regard for potential prod uses of this code... And if it's not too much code, sometimes, trying to keep things perfectly DRY can be counter-productive.

@yorinasub17
Copy link
Contributor Author

is this code really the same or just somewhat the same.

The kubectl wrappers (RunKubectl) are exactly the same right now. Getting and waiting for nodes are also exactly the same. The AWS stuff is more questionable.

I worry that Terratest may update a lot of these functions over time to be more and more suited for testing, with little regard for potential prod uses of this code...

Those are valid points, although as mentioned, it would be nice if they were isolated in a separate library to avoid that... but then there is overhead ⬇️

And if it's not too much code, sometimes, trying to keep things perfectly DRY can be counter-productive.

Ok I will revert those and copy paste and see how much duplication it ends up being, and if overtime it really does get out of hand, we can discuss merging then.

@yorinasub17
Copy link
Contributor Author

  • Reverted the "FromClient" versions
  • Addressed all suggested changes:
    • Added more unique error types
    • Use the E suffix convention

@yorinasub17
Copy link
Contributor Author

UPDATES:

  • c7853c2 - Made ContinuouslyCheckUrl use Errorf. It will also now return a sync.WaitGroup that can be used to wait for the continuous check to shutdown.
  • ad2c714 - Found and fixed a bug in DummyServer, where we try to register multiple handlers to route / on the DefaultServeMux. Caught this when I wrote a test for ContinuouslyCheckUrl.
  • 4f285bf - Added more k8s functionalities, related to dealing with namespaces. Also changed the function APIs to take in a KubectlOptions object so that you can control which context to use as well as which namespace.
  • 8d4160d - Added a basic k8s example and test

Currently, I am waiting to get a green build.

@yorinasub17
Copy link
Contributor Author

UPDATE:

  • 0091cc6 - Introduce GetRandomStableRegion for aws, which only selects from regions that have been around for more than a year. The list is hardcoded for now, but added the reference so that it can be evaluated on a yearly basis. This way, we won't suddenly have a bunch of test failures everytime a new region pops up.

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great additions. Thx Yori!

@@ -0,0 +1,40 @@
# Kubernetes Basic Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you may want to add to our newsletter that Terratest now supports automated testing for Kubernetes as a top-level item. SEO gold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup was thinking of doing that as soon as this merges

@@ -20,6 +20,35 @@ const regionOverrideEnvVarName = "TERRATEST_REGION"
// this region as a default.
const defaultRegion = "us-east-1"

// Reference for launch dates: https://aws.amazon.com/about-aws/global-infrastructure/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is there any easy way to add a yearly reminder to update this? Something in GitHub? Calendar invite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set a reminder in slack for now. Used 1/23 every year.

http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
// Create new serve mux so that multiple handlers can be created
server := http.NewServeMux()
server.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix. I could've sworn we did this already at some point... But it must've been in some other code base...

stopChecking <- true
wg.Wait()
shutDownServer(t, listener)
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, this test could pass if ContinuouslyCheckUrl was a noop... Worth adding a negative test that takes the server down part of the way through and checks that a failure happened? This might be tricky to do given that ContinuouslyCheckUrl calls t.Errorf explicitly... Unless you pass it a mock t...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok what do you think of? 4b54663 Not perfect, but I think is slightly better.

I know we can do the error channel instead like you suggested above, but I like the fact that the test will fail without any additional logic if there is one request that failed in the current model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

)

// ServiceNotAvailable is returned when a Kubernetes service is not yet available to accept traffic.
type ServiceNotAvailable struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are much clearer errors, thanks 👍

// This will get the service resource and verify that it exists and was retrieved successfully. This function will
// fail the test if the there is an error retrieving the service resource from Kubernetes.
service := k8s.GetService(t, options, "nginx-service")
require.Equal(t, service.Name, "nginx-service")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test actually try to connect to nginx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one, no. I wanted to keep this example as basic as possible. There is another test that will actually connect and check nginx: kubernetes_basic_example_service_check_test.go. You can see there that there is some added complexity from getting the endpoint and verifying it.

@yorinasub17
Copy link
Contributor Author

Ok build passed, and review checked off. Going to merge, release, and start drafting an entry in the newsletter!

@yorinasub17 yorinasub17 merged commit dfcc65c into master Dec 12, 2018
@yorinasub17 yorinasub17 deleted the yori-kubectl branch December 12, 2018 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants