
System test runner MVP #64

Merged 57 commits on Sep 9, 2020

Conversation

Contributor

@ycombinator ycombinator commented Aug 18, 2020

This PR fleshes out the implementation of the system tests runner.

Initially it focuses on packages whose services can be spun up using Docker Compose, but there is a provision to extend it later to other types of services, e.g. cloud integration packages.

Configuring the package/dataset for a system test

  1. Every package must define how its service is spun up and torn down, by placing the appropriate files in the _dev folder at the package's root.

    • Currently, only Docker Compose-based services are supported.
    • For this, a _dev/docker-compose.yml file must be created. This file may define as many services as needed, but the main one is identified by a naming convention: the name of the main service in docker-compose.yml must match the package name (see the sketch below).
  2. Additionally, every package dataset that wants to be system-tested must define a system test configuration file, config.yml, located in the _dev/test/system folder at the dataset's root folder.

For example, see this PR setting up the above files for the apache package and its datasets.
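For illustration, a minimal _dev/docker-compose.yml for the apache package might look like the sketch below; the httpd image and port are assumptions for illustration, not contents of the actual PR:

version: '2.3'
services:
  apache:            # the main service name must match the package name
    image: httpd:2.4 # hypothetical image, for illustration only
    ports:
      - 80           # internal port on the Docker network shared with the stack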

Running the system test

First, we must spin up the Elastic Stack and export environment variables.

$ elastic-package stack up --version 8.0.0-SNAPSHOT -d
$ eval $(elastic-package stack shellinit)

Then we can run the system tests on a package (or on specific dataset(s) within a package, using the optional --dataset flag; see the example below).

$ cd /path/to/some/package/root/folder
$ elastic-package test system
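For example, to restrict the run to selected datasets (this assumes the flag accepts a comma-separated list; check elastic-package test system --help for the exact syntax):

$ elastic-package test system --dataset access,error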

The above command will perform the following steps under the hood:

  1. It will find the _dev folder at the package root and look inside it to decide how the service for this package must be spun up (and later torn down).

    • Docker Compose-based services are started up as Docker containers on the same network as the Elastic Stack spun up with elastic-package stack up.
  2. The service is spun up and certain context variables are registered. These variables become available to every dataset's test configuration file. Currently, the following context variables are available:

    • {{ Service.STDOUT }}: Available for Docker Compose-based services. Its value will be a file path available in the Agent container. This file will contain the contents of the service's STDOUT stream. It is useful for gathering logs from Docker Compose-based services, as they write their logs to STDOUT by default.
    • {{ Service.Ports.n.InternalPort }}: Available for Docker Compose-based services. Its value will be the internal port available on the Docker network shared between the service container(s) and the Elastic Stack containers. Note that n here is a 0-based index, allowing a service to listen on multiple internal ports.
    • {{ Service.Port.InternalPort }}: Available for Docker Compose-based services. It is simply a convenient alias for {{ Service.Ports.0.InternalPort }}, as most Docker Compose-based services listen on only a single internal port.
    • {{ Service.Hostname }}: Host name of service, as addressable from Agent container.
  3. The dataset's system test configuration file is loaded into memory. Any context variables are substituted with their corresponding values. All other package and dataset variables omitted from the system test configuration file are populated from the package and dataset manifest files, respectively (see the example configuration sketch after this list).

  4. A policy for the dataset under test is created in Ingest Manager. The policy is attached to the test Agent that is already spun up as part of the Elastic Stack and enrolled with Fleet. Within a minute or so, Fleet sends the policy to the Agent.

  5. Agent applies the policy and should start indexing data into the appropriate data stream(s) in Elasticsearch.

  6. The system test runner checks if the expected data streams exist and have data (documents) in them.
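To make step 3 concrete, a dataset's _dev/test/system/config.yml might reference the context variables as in this sketch; the variable names (paths, hosts) are hypothetical and would normally come from the dataset's manifest:

vars: ~
dataset:
  vars:
    # A logs dataset can read the service's STDOUT stream:
    paths:
      - '{{ Service.STDOUT }}'
    # A metrics dataset would instead point at the service itself, e.g.:
    # hosts:
    #   - http://{{ Service.Hostname }}:{{ Service.Port.InternalPort }}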

}

// Step 2. Setup test cluster (ES + Kibana + Agent).
// Step 2a. Register Agent with Fleet.
Contributor Author

@mtojek I would like your advice here: what would be the best way to accomplish these two steps (2. and 2a.) here?

Contributor

Step 2a:

It depends on where the agent will be spawned. If it runs in a container, then the enrollment should be performed automatically (it's in the agent's Dockerfile).
I believe you have to interact with the cluster to attach a new container with the observed product (like mysql or nginx) to the existing network, and detach it after the tests.

Step 2:

I think you should assume that the cluster is already up and running, and the Agent is enrolled. You can pick up configuration options through environment variables (shellinit). I assume that the Agent is long-running and we just switch the configuration inside.
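A minimal Go sketch of picking up those options from the environment; the ELASTIC_PACKAGE_* variable names and the defaults here are assumptions for illustration (the getStackSettingsFromEnv function that appears later in this diff plays this role):

package stack

import "os"

// stackSettings holds the connection details exported by `elastic-package stack shellinit`.
type stackSettings struct {
	elasticsearchHost     string
	elasticsearchUsername string
	elasticsearchPassword string
}

func getStackSettingsFromEnv() stackSettings {
	return stackSettings{
		elasticsearchHost:     getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_HOST", "http://localhost:9200"),
		elasticsearchUsername: getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_USERNAME", "elastic"),
		elasticsearchPassword: getEnv("ELASTIC_PACKAGE_ELASTICSEARCH_PASSWORD", "changeme"),
	}
}

// getEnv returns the value of key, or def when the variable is unset or empty.
func getEnv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}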

Contributor Author

@ycombinator ycombinator Aug 19, 2020


I think you should assume that the cluster is already up and running, and the Agent is enrolled. You can pick up configuration options through environment variables (shellinit). I assume that the Agent is long-running and we just switch the configuration inside.

This is the part I am most interested in. Is there some way I can ensure that the cluster is up, rather than assuming it is? Are there any functions/methods I could call from here to make that happen idempotently?

I believe you have to interact with the cluster to attach a new container with the observed product (like mysql or nginx) to the existing network, and detach it after the tests.

Cool, this is along the lines of what I was hoping for, along with some bind mounting hackery for logs (since we don't have a good story for that yet with Agent in a container environment). For this I would need to know the name of the network. Is there some function/method I could call from here to get that (related to the previous paragraph, I suppose)?

Contributor

This is the part I am most interested in. Is there some way I can ensure that the cluster is up, rather than assuming it is? Are there any functions/methods I could call from here to make that happen idempotently?

I think it's cluster.BootUp(); the only issue we need to figure out is waiting for Kibana. I believe you should start with the assumption that the cluster is up and fail in case it's not.

For this I would need to know the name of the network. Is there some function/method I could call from here to get that (related to the previous paragraph, I suppose)?

There is no such function at the moment, as it wasn't required, but I suppose we can enforce a name and store it in the .env file in .elastic-package.
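One possible shape for that, as a hedged sketch (the file location, key format, and helper name are hypothetical, not existing elastic-package code):

package stack

import (
	"os"
	"path/filepath"
)

// writeEnvFile persists key=value pairs (e.g. the Docker network name) to
// ~/.elastic-package/.env so other commands can discover them later.
func writeEnvFile(home string, vars map[string]string) error {
	var buf []byte
	for k, v := range vars {
		buf = append(buf, []byte(k+"="+v+"\n")...)
	}
	return os.WriteFile(filepath.Join(home, ".elastic-package", ".env"), buf, 0644)
}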

@elasticmachine
Collaborator

elasticmachine commented Aug 18, 2020

💚 Build Succeeded


Build stats

  • Build Cause: [Pull request #64 updated]

  • Start Time: 2020-09-09T01:45:20.611+0000

  • Duration: 8 min 30 sec

@ycombinator changed the title from "Fleshing out system test runner a bit" to "[WIP] Fleshing out system test runner a bit" on Aug 18, 2020
@ycombinator
Contributor Author

@mtojek I just built elastic-package from this PR and tried running elastic-package test system from the apache package folder. I got this error:

$ elastic-package test system
system run /Users/shaunak/development/github/integrations/packages/apache/dataset/access/_dev/test/system
Custom build packages directory found: /Users/shaunak/development/github/integrations/build/integrations
Building package-registry
Step 1/3 : FROM docker.elastic.co/package-registry/distribution:snapshot
ERROR: Service 'package-registry' failed to build: Get https://docker.elastic.co/v2/package-registry/distribution/manifests/snapshot: unauthorized: authentication required
Error: error running package system tests: could not setup test cluster: building docker images failed: running command failed: exit status 1

Any idea what causes this / how a user of elastic-package is expected to authenticate?

@ycombinator
Contributor Author

ycombinator commented Aug 19, 2020

I was able to resolve the issue in my previous comment. It was a one-off issue caused by some updates to the Elastic Docker Registry which required a re-auth, so nothing we need to worry about in general for elastic-package users.

@ycombinator force-pushed the system-test-runner branch 2 times, most recently from 9b39e20 to 53ce3f7 on August 22, 2020 00:06
@@ -55,12 +76,12 @@ func RegisterRunner(testType TestType, runFunc RunFunc) {
}

// Run method delegates execution to the registered test runner, based on the test type.
-func Run(testType TestType, testFolderPath string) error {
+func Run(testType TestType, testFolder TestFolder, packageRootPath string) error {
Contributor

I think we can sync up both branches.

I introduced a similar concept of TestOptions: https://github.com/elastic/elastic-package/blob/c3fd692e9923f4160b11b3eac569b76014552934/internal/testrunner/testrunner.go to easily add more arguments here.
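A sketch of that TestOptions idea, paraphrased from the linked testrunner.go rather than copied; the exact fields are illustrative:

package testrunner

import "fmt"

// Simplified stand-ins for types defined elsewhere in the package.
type TestType string
type TestFolder struct{ Path, Package, Dataset string }

// TestOptions bundles everything a runner needs, so adding a new argument
// no longer changes every RunFunc signature.
type TestOptions struct {
	TestFolder      TestFolder
	PackageRootPath string
}

// RunFunc runs one type of test with the given options.
type RunFunc func(options TestOptions) error

var runners = map[TestType]RunFunc{}

// Run delegates execution to the registered test runner, based on the test type.
func Run(testType TestType, options TestOptions) error {
	runFunc, defined := runners[testType]
	if !defined {
		return fmt.Errorf("unregistered runner test: %s", testType)
	}
	return runFunc(options)
}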

return errors.Wrap(err, "could not create service runner")
}

ctxt := common.MapStr{}
Contributor

I think it's still early and there is no implementation using MapStr. Do you plan to use MapStr to flatten the JSON object representation? If so, I wonder if the standard encoding/json is sufficient.
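For what it's worth, flattening a nested JSON object with only the standard encoding/json package takes just a few lines; this is an illustrative sketch, not code from either branch:

package main

import (
	"encoding/json"
	"fmt"
)

// flatten rewrites nested objects into dotted keys,
// e.g. {"a":{"b":1}} becomes {"a.b":1}.
func flatten(prefix string, in, out map[string]interface{}) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]interface{}); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

func main() {
	var doc map[string]interface{}
	_ = json.Unmarshal([]byte(`{"service":{"hostname":"apache","port":80}}`), &doc)
	flat := map[string]interface{}{}
	flatten("", doc, flat)
	fmt.Println(flat) // map[service.hostname:apache service.port:80]
}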

Contributor

Also, this is the reason why I don't flatten the expected results in the pipeline tests. Maybe I should introduce it there.

Contributor Author

I'm not 100% sure yet how this usage is going to evolve. My current thinking is to use ctxt to pass arbitrary key-value data between the various steps of testing, but this is something I'm experimenting with.

For test results, since those are consistently going to be present for each test run, I probably wouldn't use ctxt but something else. I haven't gotten that far yet, so I'm not sure.

@@ -0,0 +1,16 @@
package ingestmanager

type Client struct {
Contributor

I used a dedicated client for Elasticsearch. Maybe there is something similar for Kibana?

Contributor Author

I have not found a reliable one so far. I suspect, over time, as Kibana APIs become more "public" we will see one emerge. At that point we can use it here.

Contributor Author

BTW, looks like there is some community interest in this too: elastic/go-elasticsearch#182

Contributor

Oh, nice :)

return nil
}

func init() {
testrunner.RegisterRunner(TestType, Run)
func getStackSettingsFromEnv() stackSettings {
Contributor

Contributor Author

I'm open to it but I'm not sure how to do that. Can you elaborate what should change here?

Contributor

I suppose it would be merging the other PR and adjusting the implementation here to use environment variables defined in a single place (e.g. close to the stack).

Contributor Author

Okay, I plan to bring the changes from the other PR into this one today. Let's see how things look at that point.

@ycombinator force-pushed the system-test-runner branch 2 times, most recently from 87a5fd8 to d338494 on August 31, 2020 22:08
@ycombinator ycombinator marked this pull request as ready for review September 1, 2020 06:52
Contributor

@mtojek mtojek left a comment


Yeah, I like the way it looks now; my only concern is related to the place where the context is stored.

Also, it seems that CI failed with:

[2020-09-01T23:15:55.736Z] + elastic-package stack up -d
[2020-09-01T23:15:55.736Z] elastic-package has been installed.
[2020-09-01T23:15:55.736Z] Boot up the Elastic stack
[2020-09-01T23:19:17.373Z] Error: booting up the stack failed: running docker-compose failed: running command failed: running Docker Compose up command failed: exit status 1

For debugging purposes, you can enable verbose mode in this test (currently it doesn't say too much about the root cause).

Once this PR is merged, we can focus on creating followup issues and prioritize them.

// ServiceContext encapsulates context that is both available to a ServiceDeployer and
// populated by a DeployedService. The fields in ServiceContext may be used in handlebars
// templates in system test configuration files, for example: {{ Hostname }}.
type ServiceContext struct {
Contributor

This definitely looks much better and clearer.

Not sure why it is in the common package, but I will probably understand it later while reviewing.

Contributor

I wonder if the ServiceContext implementation should be stored close to the ServiceDeployer implementation, in the servicedeployer package. If the package name doesn't fit well, what do you think about deployment or something similar?

Contributor Author
@ycombinator ycombinator Sep 8, 2020


I agree! I think I originally had it in the servicedeployer package but then ran into some import cycle issues, which is when I moved it out into common. But I just moved it back into servicedeployer now (via 65bf419) and I was able to build elastic-package. 🤷

TearDown() error

// GetContext returns the current context from the service.
GetContext() common.ServiceContext
Contributor

nit: GetContext could be renamed to Context

Contributor Author

I like the parallel with SetContext so I would prefer to leave it as GetContext.

Contributor

Sure, let's leave it :)

For reference - I was referring to https://golang.org/doc/effective_go.html#Getters
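For readers following along, the guideline is that Go getters drop the Get prefix while setters keep Set. Applied here as a sketch (the receiver type and fields are simplified stand-ins, not the PR's actual code):

package servicedeployer

// Simplified stand-ins for illustration.
type ServiceContext struct{ Hostname string }

type dockerComposeDeployedService struct{ ctxt ServiceContext }

// Context returns the current service context; no Get prefix, per Effective Go.
func (s *dockerComposeDeployedService) Context() ServiceContext { return s.ctxt }

// SetContext keeps the conventional Set prefix for the setter.
func (s *dockerComposeDeployedService) SetContext(ctxt ServiceContext) { s.ctxt = ctxt }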

Contributor Author

Aha! Thanks for the reference. I will rename it :)

Contributor Author

Fixed in 8c1c071.

@ycombinator
Contributor Author

CI failure should be fixed by 326120e.

@mtojek
Contributor

mtojek commented Sep 8, 2020

CI failure should be fixed by 326120e.

Out of curiosity, why did it fail?

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

Out of curiosity, why did it fail?

Before 326120e, we were using 7.9.0 as the default version of the stack. In this PR we introduce the Elastic Agent in the Docker Compose file. Elastic Agent 7.9.0's Docker image did not have the ability to set the --insecure flag when enrolling with Fleet (Kibana). This ability was added in 7.9.1 (elastic/beats#20713), so I needed to bump up the default version to 7.9.1.

@mtojek
Contributor

mtojek commented Sep 8, 2020

Clear now! Thanks for the explanation.

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

There is one more issue we need to deal with w.r.t. the default stack version and this PR. As of now, this PR sets the default stack version to 7.9.1. However, it calls Kibana APIs that are only available starting with 7.10.0 (strictly speaking, these APIs are available in 7.9.1 as well, but they use different identifiers in that version, e.g. "config" instead of "policy").

So what should we do about this? We could bump the default stack version to 7.10.0-SNAPSHOT or we could rename the API identifiers to the old ones. Personally I lean towards the former option as we know users are unlikely to use elastic-package with 7.9.1 of the stack. What is your preference, @mtojek?

Contributor

@mtojek mtojek left a comment


So what should we do about this? We could bump the default stack version to 7.10.0-SNAPSHOT or we could rename the API identifiers to the old ones. Personally I lean towards the former option as we know users are unlikely to use elastic-package with 7.9.1 of the stack. What is your preference, @mtojek?

I'm for the option that will be less painful for developers (elastic-package users), so let's bump it up to the 7.10.0-SNAPSHOT.

Anyway, I think this is in good shape to be merged. I left a comment about passing options, but you can postpone or ignore it :) LGTM!

@@ -67,7 +136,7 @@ func (p *Project) Down(opts CommandOptions) error {
args = append(args, "down")
args = append(args, opts.ExtraArgs...)

-if err := runDockerComposeCmd(args, opts.Env); err != nil {
+if err := p.runDockerComposeCmd(args, opts.Env, nil); err != nil {
Contributor

I admit I'm allergic to passing nil values, as it tends to evolve into the WinAPI "NULL" syndrome: call(a, b, 1, 1, NULL, NULL, NULL, NULL, NULL, true, NULL, 1, NULL) :)

I think you could repack all these args into a dockerComposeOptions struct? That means less refactoring in the future.
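A sketch of that repacking; the field set is illustrative and not necessarily what e0f21d7 ended up with:

package compose

import (
	"io"
	"os"
	"os/exec"
)

// dockerComposeOptions replaces a growing positional argument list, so call
// sites stop accumulating trailing nils.
type dockerComposeOptions struct {
	args   []string
	env    []string
	stdout io.Writer // optional; nil leaves the command's stdout unset
}

// Project is a simplified stand-in for the package's Docker Compose project type.
type Project struct{}

func (p *Project) runDockerComposeCmd(opts dockerComposeOptions) error {
	cmd := exec.Command("docker-compose", opts.args...)
	cmd.Env = append(os.Environ(), opts.env...)
	if opts.stdout != nil {
		cmd.Stdout = opts.stdout
	}
	return cmd.Run()
}

A call site then reads p.runDockerComposeCmd(dockerComposeOptions{args: args, env: env}), with optional fields simply omitted.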

Contributor Author

Done in e0f21d7.

@mtojek
Contributor

mtojek commented Sep 8, 2020

@ycombinator let's get it merged and see in the wild how it operates on real packages.

@ycombinator
Contributor Author

ycombinator commented Sep 8, 2020

Anyway, I think this is in good shape to be merged.

Thanks. There are quite a few suggestions/ideas in this PR that need follow-up PRs.

I will file issues for each of these after I merge this PR and update this comment with references.

@mtojek
Contributor

mtojek commented Sep 8, 2020

Works for me! We can easily share them.

@ycombinator
Contributor Author

Investigating the latest CI failure...

@ycombinator
Contributor Author

Updated the health check for the Elastic Agent service in d1886e7 to check for the expected log entry across multiple log files, in case the logs had rotated before the health check started running. This should fix CI.
