Run the integration tests using Ginkgo and GHA #536
Conversation
Force-pushed from 74d12aa to eb3c168
- Use a Matrix to test different features in parallel
- Full Medusa test with backup and restore using MinIO and S3
- Full Reaper test scenario
Force-pushed from eb3c168 to f20da62
cd tests/integration

# Medusa on S3 requires a secret. Skip step if the secret is absent.
if [[ ( -n "${{ secrets.MEDUSA_SECRET }}" && "${{ matrix.scenario }}" == "Medusa on S3" ) \
This should be in the if: clause above so the status shows correctly that it was passed.
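A rough sketch of that suggestion follows; the step name and surrounding workflow layout are assumptions, only the secret/matrix expression comes from the snippet above:

```yaml
# Hypothetical step layout: gating the whole step with if: makes a missing
# secret show up as "skipped" in the check status instead of a silent pass.
- name: Run scenario
  if: ${{ secrets.MEDUSA_SECRET != '' || matrix.scenario != 'Medusa on S3' }}
  run: |
    cd tests/integration
```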
@@ -0,0 +1,12 @@
module github.com/k8ssandra/k8ssandra/tests/integration
The current approach has been to use a single go.mod for everything, or at least for all the tests. We don't want the unit tests and integration tests to diverge in their modules.
That sounds a bit contrary to what we do for the separation of the tests themselves. @burmanm can you explain why, if we have some dependencies only for unit tests and others only for integration tests, we would want to lump them all together, making it more difficult to track specific dependencies?
The norm is a single module per project. See https://github.com/golang/go/wiki/Modules#modules.
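Concretely, that norm would mean a single go.mod at the repository root, shared by the operator code and all tests (the module path matches the project; the Go version line is illustrative):

```
// <project-root>/go.mod -- the only module in the repo
module github.com/k8ssandra/k8ssandra

go 1.15
```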
With respect to test separation in k8s projects, the following structure is pretty common:
- Unit tests live in the same directory as the code under test
- Integration tests that use envtest also live in the same directory as the code under test; however, I think some separation might be worth considering to make it clear that these are not unit tests.
- End-to-end tests that run against a live cluster live under <project-root>/test/e2e.
We can rename our mentions of integration to e2e if that aligns better with Go/k8s terminology, no worries.
Anyone against doing so?
I like the idea of keeping unit, integration, and e2e separate. Something like:
- <project-root>/test/unit (single scoped verifications)
- <project-root>/test/integration (not using a live cluster)
- <project-root>/test/e2e (using a live cluster - more blackbox style)
I'll remove the submodule; I thought it was something we wanted here because of dependency conflicts (although VSCode gave me a hard time with it).
I struggle a bit with the distinction between integration tests and e2e tests 🤔
How would you classify the tests implemented in this PR between the two?
My experience has been that integration testing typically verifies an integration between two components. That could be done without a live cluster and may include more mocks/spies as needed.
- Testing A+B and B+C gives some confidence for A->C, but the tests are written a bit differently in that they are testing at a lower level of integration.
- E2E is more of a user experience: something like input @ A -> C (verify output at the end), which can be more black box in nature.
This doc is a good reference for differences at these levels.
@@ -1,4 +1,4 @@
-package integration_test
+package integration
Why?
I was matching the directory name. It makes it easier, IMO, to understand where packages live by just checking the directory layout.
The _test suffix has a meaning in Go's packages though (please read up on it). A package still has to follow the directory / directory_test naming structure.
Just read about this, thanks for the pointer.
But it doesn't seem to apply here, as we're not really testing an integration package (which would then require using integration_test).
I've checked the package for the e2e tests of Kubernetes and it's called e2e, not e2e_test.
Am I misunderstanding your point here? What would we get by naming the package integration_test?
I wanted you to understand the change before making it, instead of making changes without understanding their effect. We have no need for _test in this case.
	medusaMinioTestKeyspace = "medusa"
)

var _ = Describe("Medusa on MinIO", func() {
Wasn't the plan to ditch ginkgo?
Based on prior discussions, I was also under the impression that we were going to just use the testing package, probably along with https://github.com/stretchr/testify.
@jsanda and @adejanovski I propose an informal write-up of the prior discussions about our testing approach, so we are all on the same page. I can see how Ginkgo may still be viable from an integration standpoint, with a more direct approach for the unit tests.
Looking at this medusa_minio_integration_test.go file, it is extremely readable, easy to understand, and self-documenting.
> I propose an informal write-up for the prior discussions about our testing approach

An excellent suggestion, and I think that would be great in keeping with the spirit of open source. Maybe we can try to do it with GH Discussions?
Decision pending. Discussion taking place here.
We already decided against BDD; let's not decide the same thing multiple times. Seriously.
Actually, we did not. We decided against GoDog, but not everyone is on the same page about BDD vs testing.
I agree though that it's time to build a consensus, which has yet to happen.
The whole meeting was about BDD, including our guest appearance. We need a reason to use BDD or any framework at all beyond testing.T, and there simply isn't one (if anything, this PR has convinced me of the opposite).
I think one thing that would help me (and possibly others) is if we could review a set of k8s non-BDD Go tests used for integration and e2e testing, to understand what that fully looks like with all the complexities.
My concern is that these frameworks & libs (Ginkgo, Terratest, Gomega, etc.) were built to ease the pain of writing and understanding tests at the integration and e2e levels.
var _ = Describe("Medusa on S3", func() {
	Context("Deploy K8ssandra with Medusa", func() {
		It("Create a kind cluster and a namespace", func() {
Same code as the test above; let's not repeat these steps. This should be a createCluster() helper.
Sounds good 👍
	ICanSeeTheNamespaceInTheListOfNamespacesStep()
})
It("Create the Medusa secret", func() {
	IDeployMinIOUsingHelmAndCreateTheBucketStep("k8ssandra-medusa")
These function names are too Spring-ish. Go prefers short function names.
If we go with a less descriptive approach to our testing, for example removal of Ginkgo, the function naming will become critical to understanding what is going on. I suspect we will have to have long function names and/or numerous comments.
While this is subjective, I tend to agree that the function names do not seem like very idiomatic Go.
We should aim to have descriptive function names regardless of whether we use Ginkgo, testing, or something else.
Readable, descriptive test code is definitely important, but I feel we are getting hung up on things that feel a bit academic right now. I would like to see more discussion and focus around things like:
- How do we set up/tear down things?
- How should we verify various state?
- What clients should we use to talk to the api server?
- How do we run things in parallel for CI?
These functions are inherited from the GoDog experiments, which auto-generated function names; I didn't name them myself ;)
We can definitely make them shorter, but it would indeed be good to keep them descriptive enough (Go or not Go, I think that should apply to all languages).
Long doesn't always mean descriptive, and it makes reading a lot slower. A name also shouldn't carry information that's obvious from context.
)

var (
	medusaS3TestTable = "medusa_test"
Why are these separate from minioTestTable? Why not simply const( testTable = "medusa_test") in a single file?
I'm a Go noob, so making things work has been my main goal. Happy to refactor as suggested; it's just that I'm not sure yet what the variables' scope is, especially in the context of Ginkgo.
If ginkgo modifies scopes, then it shouldn't be used.
It("Check the presence of expected resources", func() {
	ICanCheckThatResourceOfTypeWithLabelIsPresentInNamespaceStep("service", "app.kubernetes.io/managed-by=reaper-operator")
	ICanCheckThatResourceOfTypeWithNameIsPresentInNamespaceStep("service", "k8ssandra-dc1-all-pods-service")
	ICanCheckThatResourceOfTypeWithNameIsPresentInNamespaceStep("service", "k8ssandra-dc1-service")
Why couldn't I check? I know the function takes type and name as parameters. Simply: checkResourceExists("service", "k8ssandra-dc1-service")
Again, this is inherited from GoDog which generated function names based on the Gherkin steps.
	return string(outb.String())
}

func GetServicesWithLabel(label string) (*v1.ServiceList, error) {
Go does not use "Get" as prefix.
I propose we create a basic Go standards guide so we're on the same page. We have a few folks who are learning Go and have backgrounds in numerous other languages where getters/setters are acceptable.
> I propose we create a basic Go standards guide to be on the same page.

Effective Go should definitely be the starting point.
It's not really a getter, since username/password are not class members (or anything equivalent in Go). I'll rename it to "Extract...", since that's more accurate about what is being done.
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var namespace = "k8ssandra"
const
It gets changed afterwards. I'd rather not set a value, since it won't be used, which is the misleading part here.
It's a global parameter? Ugh.
// Wait for cass-operator pod to be ready
attempts := 0
maxAttempts := 10
Why not for i := 0; i < maxAttempts; i++ { and exit when done?
Also, if using Gomega, you should use the Eventually func. And --wait should not exit at all, so why multiple attempts?
Actually, why on earth this part at all?
> why on earth this part at all?

Please be welcoming and positive in your comments. No project wants contributors to feel offended.
If you want to teach me something, which is fine as I'm eager to learn, do it nicely. Thanks.
Eventually is a great alternative that I didn't know of, indeed. It'll make my clunky code much cleaner and more efficient.
No, what I mean is: why do we have a part that repeats the wait at all? Why not just set a longer timeout?
This includes any unnecessary use of Eventually also.
There are bits where multiple attempts are needed and we don't have a blocking call such as kubectl wait.
In order to remove the magic sleep during restore, for example, I'm waiting for the cass-operator progress to change to "Updating" and then "Ready", which requires repeating the check until we reach the expected value.
I'll push another commit soon with these changes and you can see if that matches your expectations.
Why would cass-operator require that? Isn't
k8s.RunKubectl(GinkgoT(), kubectlOptions, "wait", "--for=condition=Ready", "cassandradatacenter/dc1", "--timeout=600s")
doing exactly that, waiting for the condition to change to Ready?
Actually not, because when restoring a backup the cassandradatacenter already has the Ready condition, but then gets updated by cass-operator.
Just checking for the Ready condition made the wait command return instantly, without giving time for the Cassandra pods to be recreated, and the test failed afterwards when trying to access Cassandra using cqlsh.
That's why I initially added the sleep...
So checking first for a state change (in this case, detecting that cass-operator is updating the cassandradatacenter) and then waiting for the dc to be ready allows removing the sleep and following the chain of events properly.
func WaitForPodWithLabelToBeReady(label string, waitTime time.Duration, maxAttempts int) {
	kubectlOptions := k8s.NewKubectlOptions("", "", namespace)
	attempts := 0
This whole function should be the last command, and only the last command.
Yup, moving all these retry-based checks to use Eventually now.
}

var (
	Info = Yellow
Are these supposed to be mutable strings or enums or what?
Copy/pasted this from a webpage after searching for ways to print colored log lines.
I'd go with enums here, as they don't need to mutate.
I guess I should use const instead then, right?
With enums you would use iota.
}
helm.Install(GinkgoT(), helmOptions, restoreChartPath, "restore-test")
// Give a little time for the cassandraDatacenter resource to be recreated
time.Sleep(60 * time.Second)
I don't like hardcoded sleeps; they smell like duct tape.
Agreed, using Eventually should remove the need for this.
attempts := 0
for {
	attempts++
	response, err := restClient.R().Get("http://repair.localhost:8080/cluster")
We have reaperClient in https://github.com/k8ssandra/reaper-client-go which you can use.
And if it lacks some feature you need for testing, add it there.
Neat, I didn't know about this client. Will definitely use it instead.
	log.Println(fmt.Sprintf("The HTTP request failed with error %s", err))
} else {
	data := response.Body()
	log.Println(fmt.Sprintf("Reaper response: %s", data))
All of these should probably be log.Printf(..)
Very good point 👍
All in all, I see no testing.T modifications. Ginkgo shouldn't be here.
I made a first pass through. There is a lot to cover so I will have more feedback in follow up reviews.
Good stuff!
var _ = Describe("Medusa on MinIO", func() {
	Context("Deploy K8ssandra with Medusa", func() {
		It("Create a kind cluster and a namespace", func() {
I don't want the tests themselves to create the cluster. Doing so makes it more difficult to target different clusters. We definitely want the kind cluster setup, but it should be done separately. Tests should assume that there is a running cluster to which they have access.
Maybe we could make the code more versatile to handle different situations. Having tests handle cleanup/creation is very handy, to avoid having to wrap executions in further scripts.
I agree that we need to be able to run integration tests on kind, k3d, minikube, etc., but also GKE, EKS, etc.
We also need to account for the limited resources available when running these integration tests, which don't allow us to create several namespaces hosting different k8ssandra clusters. We need to do some cleanup. I've noticed that deleting namespaces can take a while, probably even longer than deleting a kind cluster and recreating it, so I'm not sure what the best way to handle this init phase is.
Deleting the namespace (or actually, uninstalling our installation first) is an important test though.
> We also need to account for limited resources available when running these integration tests

We indeed do, but I also do not want to require someone to run kind to run the tests. I think it is perfectly reasonable to run against a GKE cluster, for example. I would prefer a separate Makefile target for creating the kind cluster.

> We need to do some cleanup. I've noticed that deleting namespaces could take a while, probably even longer than deleting a cluster with kind and recreating it, so I'm not sure what's the best way to handle this init phase.

Resource cleanup definitely needs to happen in the tests. Note that some resources are cluster scoped, so just deleting the namespace, for example, is not sufficient.
	ICreateTheMedusaSecretInTheNamespaceApplyingTheFileStep("secret/medusa_minio_secret.yaml")
	ICanSeeTheSecretInTheListOfSecretsInTheNamespaceStep("medusa-bucket-key")
})
It("Install K8ssandra with Medusa on MinIO", func() {
There is a problem with the test structure having all of these separate specs. If one spec fails, the subsequent specs are most likely going to fail as well. Then they just become noise and make the test run longer than it needs to be.
The easiest way to handle it with Ginkgo is to combine all of these specs and use the By function to provide additional context. The text you pass to By is logged as well.
Actually, I consider that such e2e/integration/functional tests should fail as soon as one step fails. That's, IMO, how BDD testing should work: each step expects the previous steps to have succeeded, otherwise we're not in the state we expect to be in.
If I can't get the cassandradatacenter resource to be ready, then there's no point in trying to access Cassandra or run a repair.
What you describe is more in line with how unit testing should be done, where you can shorten the feedback loop by seeing all failing tests within a single run.
Yes, but that's not what happens. The It parts are run by Ginkgo even if a previous one fails.
I actually use Ginkgo's failFast flag in CI (and when running the tests locally) to prevent subsequent tests from running after a failure.
Here's the exact command used to run the MinIO tests for example:
go test . -timeout 30m -ginkgo.failFast --ginkgo.trace --ginkgo.progress --ginkgo.slowSpecThreshold=900 --ginkgo.v --ginkgo.focus="Medusa on MinIO" -v -p 1
I have not used -ginkgo.failFast before. That definitely helps. I wonder whether it applies across the whole test suite or just within the current container. I hope the latter. I'll check that out.
> If I can't get the cassandradatacenter resource to be ready, then there's no point in trying to access Cassandra or run a repair.

Absolutely. Once the CassandraDatacenter is ready though, we can perform multiple tests/verifications independently and possibly in parallel. For example, the C* cluster should be automatically registered in Reaper. If that check fails, it should not prevent us from checking that Stargate is deployed.
I just ran the following using -ginkgo.failFast:
var _ = FDescribe("A fake test", func() {
	Context("running multiple tests in context 1", func() {
		It("test 1", func() {
			Expect(1).To(Equal(11))
		})
		It("test 2", func() {
			Expect(1).To(Equal(2))
		})
	})
	Context("running multiple tests in context 2", func() {
		It("test 3", func() {
			Expect(1).To(Equal(3))
		})
		It("test 4", func() {
			Expect(1).To(Equal(3))
		})
	})
})
Here is the output:
• Failure [0.000 seconds]
A fake test
/Users/jsanda/Development/k8ssandra/k8ssandra/tests/unit/fake_test.go:8
running multiple tests in context 1
/Users/jsanda/Development/k8ssandra/k8ssandra/tests/unit/fake_test.go:9
test 1 [It]
/Users/jsanda/Development/k8ssandra/k8ssandra/tests/unit/fake_test.go:10
Expected
<int>: 1
to equal
<int>: 11
/Users/jsanda/Development/k8ssandra/k8ssandra/tests/unit/fake_test.go:11
------------------------------
It appears to apply across the entire test suite vs only the parent container.
func GetServicesWithLabel(label string) (*v1.ServiceList, error) {
	kubectlOptions := k8s.NewKubectlOptions("", "", namespace)
	clientset, err := k8s.GetKubernetesClientFromOptionsE(GinkgoT(), kubectlOptions)
We have multiple options when it comes to which clients to use to access the api server. I have a strong preference for using the controller-runtime client. It is easy to use with a pretty small API. More importantly, it is the same client that we use extensively in operator code. You will see controller-runtime used all throughout cass-operator, reaper-operator, medusa-operator, and most operator projects. For an example of its usage in tests take a look at https://github.com/k8ssandra/reaper-operator/tree/master/test.
}

// Wait for CassandraDatacenter to be ready..
k8s.RunKubectl(GinkgoT(), kubectlOptions, "wait", "--for=condition=Ready", "cassandradatacenter/dc1", "--timeout=1800s")
The wait package from https://github.com/kubernetes/apimachinery is typically used in e2e tests for polling. See https://github.com/k8ssandra/reaper-operator/blob/master/test/framework/framework.go#L129 for an example.
Your suggested alternative looks more verbose, unless I'm missing something.
Could you elaborate on how using apimachinery is better than relying on kubectl wait?
Is it because you'd like to remove Terratest?
Using wait or Eventually is definitely more verbose, but it allows us to log updates, which I find helpful when polling for a long time. The particular example I referenced unfortunately does not do any logging, but this one does.
cqlCredentials := GetUsernamePassword("k8ssandra-superuser", namespace)
kubectlOptions := k8s.NewKubectlOptions("", "", namespace)
// Get reaper service
output, _ := k8s.RunKubectlAndGetOutputE(GinkgoT(), kubectlOptions, "exec", "-it", "k8ssandra-dc1-default-sts-0", "--", "/opt/cassandra/bin/cqlsh", "--username", cqlCredentials.username, "--password", cqlCredentials.password, "-e", query)
The exec command is interesting. The api server provides a REST API, and exec is a subresource of Pod. While I do want to use controller-runtime, it doesn't provide support for making exec requests; we would need to use client-go directly. That would be my preference over Terratest, but it's something we can tackle later. :)
Sounds good 👍
Given our test direction discussions today, I'm guessing this will be migrated into a different form w/out Ginkgo.
closing in favor of #544
- Use a Matrix to test different features in parallel
- Full Medusa test with backup and restore using MinIO and S3
- Full Reaper test scenario
What this PR does:
Adds a first set of integration tests for Medusa and Reaper deployments and introduces a GHA workflow for running them on push.
Which issue(s) this PR fixes:
Fixes #104
Checklist