
System test runner MVP #64

Merged 57 commits on Sep 9, 2020

Conversation

Contributor

@ycombinator ycombinator commented Aug 18, 2020

This PR fleshes out the implementation of the system tests runner.

Initially it focuses on packages whose services can be spun up using Docker Compose, but there is a provision to extend it later to other types of services, e.g. cloud integration packages.

Configuring the package/dataset for a system test

  1. Every package must define how its service is spun up and torn down, by placing the appropriate files in the _dev folder at the package's root.

    • Currently, only Docker Compose-based services are supported.
    • For this, a _dev/docker-compose.yml file must be created. This file may define as many services as needed, but the main one is identified by a naming convention: the name of the main service in docker-compose.yml must match the package name (see the sketch below).
  2. Additionally, every package dataset that wants to be system-tested must define a system test configuration file, config.yml, located in the _dev/test/system folder at the dataset's root folder.

For example, see this PR setting up the above files for the apache package and its datasets.
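For illustration, a minimal _dev/docker-compose.yml for the apache package might look like the sketch below; the httpd image and port are assumptions for illustration, not contents of the actual PR:

version: '2.3'
services:
  apache:            # the main service name must match the package name
    image: httpd:2.4 # hypothetical image, for illustration only
    ports:
      - 80           # internal port on the Docker network shared with the stack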

Running the system test

First, we must spin up the Elastic Stack and export environment variables.

$ elastic-package stack up --version 8.0.0-SNAPSHOT -d
$ eval $(elastic-package stack shellinit)

Then we can run the system tests on a package (or on specific dataset(s) within a package, using the optional --dataset flag; see the example below).

$ cd /path/to/some/package/root/folder
$ elastic-package test system
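For example, to restrict the run to selected datasets (this assumes the flag accepts a comma-separated list; check elastic-package test system --help for the exact syntax):

$ elastic-package test system --dataset access,error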

The above command will perform the following steps under the hood:

  1. It will find the _dev folder at the package root and look inside it to decide how the service for this package must be spun up (and later torn down).

    • Docker Compose-based services are started up as Docker containers on the same network as the Elastic Stack spun up with elastic-package stack up.
  2. The service is spun up and certain context variables are registered. These variables become available to every dataset's test configuration file. Currently, the following context variables are available:

    • {{ Service.STDOUT }}: Available for Docker Compose-based services. Its value will be a file path available in the Agent container. This file will contain the contents of the service's STDOUT stream. It is useful for gathering logs from Docker Compose-based services, as they write their logs to STDOUT by default.
    • {{ Service.Ports.n.InternalPort }}: Available for Docker Compose-based services. Its value will be the internal port available on the Docker network shared between the service container(s) and the Elastic Stack containers. Note that n here is a 0-based index, allowing a service to listen on multiple internal ports.
    • {{ Service.Port.InternalPort }}: Available for Docker Compose-based services. It is simply a convenient alias for {{ Service.Ports.0.InternalPort }}, as most Docker Compose-based services listen on only a single internal port.
    • {{ Service.Hostname }}: Host name of service, as addressable from Agent container.
  3. The dataset's system test configuration file is loaded into memory. Any context variables are substituted with their corresponding values. All other package and dataset variables omitted from the system test configuration file are populated from the package and dataset manifest files, respectively (see the example configuration sketch after this list).

  4. A policy for the dataset under test is created in Ingest Manager. The policy is attached to the test Agent that is already spun up as part of the Elastic Stack and enrolled with Fleet. Within a minute or so, Fleet sends the policy to the Agent.

  5. Agent applies the policy and should start indexing data into the appropriate data stream(s) in Elasticsearch.

  6. The system test runner checks if the expected data streams exist and have data (documents) in them.
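To make step 3 concrete, a dataset's _dev/test/system/config.yml might reference the context variables as in this sketch; the variable names (paths, hosts) are hypothetical and would normally come from the dataset's manifest:

vars: ~
dataset:
  vars:
    # A logs dataset can read the service's STDOUT stream:
    paths:
      - '{{ Service.STDOUT }}'
    # A metrics dataset would instead point at the service itself, e.g.:
    # hosts:
    #   - http://{{ Service.Hostname }}:{{ Service.Port.InternalPort }}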

}

// Step 2. Setup test cluster (ES + Kibana + Agent).
// Step 2a. Register Agent with Fleet.
Contributor Author

@mtojek I would like your advice here: what would be the best way to accomplish these two steps (2. and 2a.) here?

Contributor

Step 2a:

It depends on where the agent will be spawned. If it runs in a container, then the enrollment should be performed automatically (it's in the agent's Dockerfile).
I believe you have to interact with the cluster to attach a new container with the observed product (like mysql or nginx) to the existing network, and detach it after the tests.

Step 2:

I think you should assume that the cluster is already up and running, and the Agent is enrolled. You can pick up configuration options through environment variables (shellinit). I assume that the Agent is long-running and we just switch the configuration inside.
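A minimal Go sketch of picking up those options from the environment; the ELASTIC_PACKAGE_* variable names and the defaults here are assumptions for illustration (the getStackSettingsFromEnv function that appears later in this diff plays this role):

package stack

import "os"

// stackSettings holds the connection details exported by `elastic-package stack shellinit`.
type stackSettings struct {
	elasticsearchHost     string
	elasticsearchUsername string
	elasticsearchPassword string
}

func getStackSettingsFromEnv() stackSettings {
	return stackSettings{
		elasticsearchHost:     getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_HOST", "http://localhost:9200"),
		elasticsearchUsername: getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_USERNAME", "elastic"),
		elasticsearchPassword: getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_PASSWORD", "changeme"),
	}
}

// getEnv returns the value of key, or def when the variable is unset or empty.
func getEnv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}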

Contributor Author

@ycombinator ycombinator Aug 19, 2020


I think you should assume that the cluster is already up and running, and the Agent is enrolled. You can pick up configuration options through environment variables (shellinit). I assume that the Agent is long-running and we just switch the configuration inside.

This is the part I am most interested in. Is there some way I can ensure that the cluster is up, rather than assuming it is? Are there any functions/methods I could call from here to make that happen idempotently?

I believe you have to interact with the cluster to attach a new container with the observed product (like mysql or nginx) to the existing network, and detach it after the tests.

Cool, this is along the lines of what I was hoping for, along with some bind mounting hackery for logs (since we don't have a good story for that yet with Agent in a container environment). For this I would need to know the name of the network. Is there some function/method I could call from here to get that (related to the previous paragraph, I suppose)?

Contributor

This is the part I am most interested in. Is there some way I can ensure that the cluster is up, rather than assuming it is? Are there any functions/methods I could call from here to make that happen idempotently?

I think it's cluster.BootUp(); the only issue we need to figure out is waiting for Kibana. I believe you should start with the assumption that the cluster is up and fail in case it's not.

For this I would need to know the name of the network. Is there some function/method I could call from here to get that (related to the previous paragraph, I suppose)?

There is no such function at the moment, as it wasn't required, but I suppose we can enforce a name and store it in the .env file in .elastic-package.
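One possible shape for that, as a hedged sketch (the file location, key format, and helper name are hypothetical, not existing elastic-package code):

package stack

import (
	"os"
	"path/filepath"
)

// writeEnvFile persists key=value pairs (e.g. the Docker network name) to
// ~/.elastic-package/.env so other commands can discover them later.
func writeEnvFile(home string, vars map[string]string) error {
	var buf []byte
	for k, v := range vars {
		buf = append(buf, []byte(k+"="+v+"\n")...)
	}
	return os.WriteFile(filepath.Join(home, ".elastic-package", ".env"), buf, 0644)
}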

@elasticmachine
Collaborator

elasticmachine commented Aug 18, 2020

💚 Build Succeeded


Build stats

  • Build Cause: [Pull request #64 updated]

  • Start Time: 2020-09-09T01:45:20.611+0000

  • Duration: 8 min 30 sec

@ycombinator changed the title from "Fleshing out system test runner a bit" to "[WIP] Fleshing out system test runner a bit" on Aug 18, 2020
@ycombinator
Contributor Author

@mtojek I just built elastic-package from this PR and tried running elastic-package test system from the apache package folder. I got this error:

$ elastic-package test system
system run /Users/shaunak/development/github/integrations/packages/apache/dataset/access/_dev/test/system
Custom build packages directory found: /Users/shaunak/development/github/integrations/build/integrations
Building package-registry
Step 1/3 : FROM docker.elastic.co/package-registry/distribution:snapshot
ERROR: Service 'package-registry' failed to build: Get https://docker.elastic.co/v2/package-registry/distribution/manifests/snapshot: unauthorized: authentication required
Error: error running package system tests: could not setup test cluster: building docker images failed: running command failed: exit status 1

Any idea what causes this / how a user of elastic-package is expected to authenticate?

@ycombinator
Contributor Author

ycombinator commented Aug 19, 2020

I was able to resolve the issue in my previous comment. It was a one-off issue caused by some updates to the Elastic Docker Registry which required a re-auth, so nothing we need to worry about in general for elastic-package users.

@ycombinator force-pushed the system-test-runner branch 2 times, most recently from 9b39e20 to 53ce3f7 on August 22, 2020 00:06
@@ -55,12 +76,12 @@ func RegisterRunner(testType TestType, runFunc RunFunc) {
}

// Run method delegates execution to the registered test runner, based on the test type.
-func Run(testType TestType, testFolderPath string) error {
+func Run(testType TestType, testFolder TestFolder, packageRootPath string) error {
Contributor

I think we can sync up both branches.

I introduced a similar concept of TestOptions: https://github.com/elastic/elastic-package/blob/c3fd692e9923f4160b11b3eac569b76014552934/internal/testrunner/testrunner.go to easily add more arguments here.
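A sketch of that TestOptions idea, paraphrased from the linked testrunner.go rather than copied; the exact fields are illustrative:

package testrunner

import "fmt"

// Simplified stand-ins for types defined elsewhere in the package.
type TestType string
type TestFolder struct{ Path, Package, Dataset string }

// TestOptions bundles everything a runner needs, so adding a new argument
// no longer changes every RunFunc signature.
type TestOptions struct {
	TestFolder      TestFolder
	PackageRootPath string
}

// RunFunc runs one type of test with the given options.
type RunFunc func(options TestOptions) error

var runners = map[TestType]RunFunc{}

// Run delegates execution to the registered test runner, based on the test type.
func Run(testType TestType, options TestOptions) error {
	runFunc, defined := runners[testType]
	if !defined {
		return fmt.Errorf("unregistered runner test: %s", testType)
	}
	return runFunc(options)
}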

return errors.Wrap(err, "could not create service runner")
}

ctxt := common.MapStr{}
Contributor

I think it's still early and there is no implementation using MapStr. Do you plan to use MapStr to flatten the JSON object representation? If so, I wonder if the standard encoding/json is sufficient.
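For what it's worth, flattening a nested JSON object with only the standard encoding/json package takes just a few lines; this is an illustrative sketch, not code from either branch:

package main

import (
	"encoding/json"
	"fmt"
)

// flatten rewrites nested objects into dotted keys,
// e.g. {"a":{"b":1}} becomes {"a.b":1}.
func flatten(prefix string, in, out map[string]interface{}) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]interface{}); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

func main() {
	var doc map[string]interface{}
	_ = json.Unmarshal([]byte(`{"service":{"hostname":"apache","port":80}}`), &doc)
	flat := map[string]interface{}{}
	flatten("", doc, flat)
	fmt.Println(flat) // map[service.hostname:apache service.port:80]
}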

Contributor

Also, this is the reason why I don't flatten the expected results in the pipeline tests. Maybe I should introduce it there.

Contributor Author

I'm not 100% sure yet how this usage is going to evolve. My current thinking is to use ctxt to pass arbitrary key-value data between the various steps of testing, but this is something I'm experimenting with.

For test results, since those are consistently going to be present for each test run, I probably wouldn't use ctxt but something else. I haven't gotten that far yet, so I'm not sure.

@@ -0,0 +1,16 @@
package ingestmanager

type Client struct {
Contributor

I used a dedicated client for Elasticsearch. Maybe there is something similar for Kibana?

Contributor Author

I have not found a reliable one so far. I suspect, over time, as Kibana APIs become more "public" we will see one emerge. At that point we can use it here.

Contributor Author

BTW, looks like there is some community interest in this too: elastic/go-elasticsearch#182

Contributor

Oh, nice :)

return nil
}

func init() {
testrunner.RegisterRunner(TestType, Run)
func getStackSettingsFromEnv() stackSettings {
Contributor

Contributor Author

I'm open to it but I'm not sure how to do that. Can you elaborate what should change here?

Contributor

I suppose it would be merging the other PR and adjusting the implementation here to use environment variables defined in a single place (e.g. close to the stack).

Contributor Author

Okay, I plan to bring the changes from the other PR into this one today. Let's see how things look at that point.

@ycombinator force-pushed the system-test-runner branch 2 times, most recently from 87a5fd8 to d338494 on August 31, 2020 22:08
@ycombinator ycombinator marked this pull request as ready for review September 1, 2020 06:52
Contributor

@mtojek mtojek left a comment


Yeah, I like the way it looks now; my only concern is related to the place where the context is stored.

Also, it seems that CI failed with:

[2020-09-01T23:15:55.736Z] + elastic-package stack up -d
[2020-09-01T23:15:55.736Z] elastic-package has been installed.
[2020-09-01T23:15:55.736Z] Boot up the Elastic stack
[2020-09-01T23:19:17.373Z] Error: booting up the stack failed: running docker-compose failed: running command failed: running Docker Compose up command failed: exit status 1

For debugging purposes, you can enable verbose mode in this test (currently it doesn't say too much about the root cause).

Once this PR is merged, we can focus on creating followup issues and prioritize them.

// ServiceContext encapsulates context that is both available to a ServiceDeployer and
// populated by a DeployedService. The fields in ServiceContext may be used in handlebars
// templates in system test configuration files, for example: {{ Hostname }}.
type ServiceContext struct {
Contributor

This definitely looks much better and clearer.

Not sure why it is in the common package, but I will probably understand it later while reviewing.

Contributor

I wonder if the ServiceContext implementation should be stored close to the ServiceDeployer implementation, in the servicedeployer package. If the package name doesn't fit well, what do you think about deployment or something similar?

Contributor Author
@ycombinator ycombinator Sep 8, 2020


I agree! I think I originally had it in the servicedeployer package but then ran into some import cycle issues, which is when I moved it out into common. But I just moved it back into servicedeployer now (via 65bf419) and I was able to build elastic-package. 🤷

TearDown() error

// GetContext returns the current context from the service.
GetContext() common.ServiceContext
Contributor

nit: GetContext could be renamed to Context

Contributor Author

I like the parallel with SetContext so I would prefer to leave it as GetContext.

Contributor

Sure, let's leave it :)

For reference - I was referring to https://golang.org/doc/effective_go.html#Getters
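For readers following along, the guideline is that Go getters drop the Get prefix while setters keep Set. Applied here as a sketch (the receiver type and fields are simplified stand-ins, not the PR's actual code):

package servicedeployer

// Simplified stand-ins for illustration.
type ServiceContext struct{ Hostname string }

type dockerComposeDeployedService struct{ ctxt ServiceContext }

// Context returns the current service context; no Get prefix, per Effective Go.
func (s *dockerComposeDeployedService) Context() ServiceContext { return s.ctxt }

// SetContext keeps the conventional Set prefix for the setter.
func (s *dockerComposeDeployedService) SetContext(ctxt ServiceContext) { s.ctxt = ctxt }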

Contributor Author

Aha! Thanks for the reference. I will rename it :)

Contributor Author

Fixed in 8c1c071.

@ycombinator
Contributor Author

CI failure should be fixed by 326120e.

@mtojek
Contributor

mtojek commented Sep 8, 2020

CI failure should be fixed by 326120e.

Out of curiosity, why did it fail?

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

Out of curiosity, why did it fail?

Before 326120e, we were using 7.9.0 as the default version of the stack. In this PR we introduce the Elastic Agent in the Docker Compose file. Elastic Agent 7.9.0's Docker image did not have the ability to set the --insecure flag when enrolling with Fleet (Kibana). This ability was added in 7.9.1 (elastic/beats#20713), so I needed to bump up the default version to 7.9.1.

@mtojek
Contributor

mtojek commented Sep 8, 2020

Clear now! Thanks for the explanation.

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

There is one more issue we need to deal with w.r.t. the default stack version and this PR. As of now, this PR sets the default stack version to 7.9.1. However, it calls Kibana APIs that are only available starting with 7.10.0 (strictly speaking, these APIs are available in 7.9.1 as well, but they use different identifiers in that version, e.g. "config" instead of "policy").

So what should we do about this? We could bump the default stack version to 7.10.0-SNAPSHOT or we could rename the API identifiers to the old ones. Personally I lean towards the former option as we know users are unlikely to use elastic-package with 7.9.1 of the stack. What is your preference, @mtojek?

Contributor

@mtojek mtojek left a comment


So what should we do about this? We could bump the default stack version to 7.10.0-SNAPSHOT or we could rename the API identifiers to the old ones. Personally I lean towards the former option as we know users are unlikely to use elastic-package with 7.9.1 of the stack. What is your preference, @mtojek?

I'm for the option that will be less painful for developers (elastic-package users), so let's bump it up to the 7.10.0-SNAPSHOT.

Anyway, I think this is in good shape to be merged. I left a comment about passing options, but you can postpone or ignore it :) LGTM!

@@ -67,7 +136,7 @@ func (p *Project) Down(opts CommandOptions) error {
args = append(args, "down")
args = append(args, opts.ExtraArgs...)

-if err := runDockerComposeCmd(args, opts.Env); err != nil {
+if err := p.runDockerComposeCmd(args, opts.Env, nil); err != nil {
Contributor

I admit I'm allergic to passing nil values, as it tends to evolve into the WinAPI "NULL" syndrome: call(a, b, 1, 1, NULL, NULL, NULL, NULL, NULL, true, NULL, 1, NULL) :)

I think you could repack all these args into a dockerComposeOptions struct? That means less refactoring in the future.
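A sketch of that repacking; the field set is illustrative and not necessarily what e0f21d7 ended up with:

package compose

import (
	"io"
	"os"
	"os/exec"
)

// dockerComposeOptions replaces a growing positional argument list, so call
// sites stop accumulating trailing nils.
type dockerComposeOptions struct {
	args   []string
	env    []string
	stdout io.Writer // optional; nil leaves the command's stdout unset
}

// Project is a simplified stand-in for the package's Docker Compose project type.
type Project struct{}

func (p *Project) runDockerComposeCmd(opts dockerComposeOptions) error {
	cmd := exec.Command("docker-compose", opts.args...)
	cmd.Env = append(os.Environ(), opts.env...)
	if opts.stdout != nil {
		cmd.Stdout = opts.stdout
	}
	return cmd.Run()
}

A call site then reads p.runDockerComposeCmd(dockerComposeOptions{args: args, env: env}), with optional fields simply omitted.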

Contributor Author

Done in e0f21d7.

@mtojek
Contributor

mtojek commented Sep 8, 2020

@ycombinator let's get it merged and see in the wild how it operates on real packages.

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

Anyway, I think this is in good shape to be merged.

Thanks. There are quite a few suggestions/ideas in this PR that need follow-up PRs.

I will file issues for each of these after I merge this PR and update this comment with references.

@mtojek
Contributor

mtojek commented Sep 8, 2020

Works for me! We can easily share them.

@ycombinator
Contributor Author

Investigating the latest CI failure...

@ycombinator
Contributor Author

Updated the health check for the Elastic Agent service in d1886e7 to check for the expected log entry across multiple log files, in case the logs had rotated before the health check started running. This should fix CI.
