
New process and guidelines for integration tests (replaces demo-project and homepage_sample_code tests in GitLab CI) #6218

Closed
aaronsteers opened this issue Jun 15, 2022 · 17 comments


@aaronsteers
Contributor

From a call with @tayloramurphy and @DouweM, we started thinking of replacing demo-project with a set of sample Meltano projects in meltano/meltano that can serve as integration tests, perhaps along with specific scripts that would run - such as the invoking of specific plugins.

I'm sure there are better and more sophisticated approaches, but one approach would be a convention of having a meltano.yml, a script.sh, and a meltano.after.yml that the results should match. Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors), possibly including the specific telemetry events that are expected to be sent over the course of those script commands.
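
For illustration, here's a minimal sketch of what the first convention could look like as a test runner; the file names (meltano.yml, script.sh, meltano.after.yml) are the hypothetical ones above and the diff-based check is just one way to compare results:

```shell
#!/usr/bin/env bash
# Hedged sketch of the meltano.yml / script.sh / meltano.after.yml convention.
set -euo pipefail

workdir=$(mktemp -d)
cp meltano.yml script.sh "$workdir/"
cd "$workdir"

bash script.sh                          # run the scripted meltano commands

# the project file should now match the expected "after" state
diff -u "$OLDPWD/meltano.after.yml" meltano.yml
```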

@aaronsteers
Contributor Author

@pandemicsyn, @WillDaSilva, @edgarrmondragon - This ties back to other conversations we've had recently. Curious for your thoughts/suggestions on this front.

@edgarrmondragon
Collaborator

> Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors)

I like that. With a good caching setup we should be able to run a bunch of them without waiting too long for results. I was thinking of a similar approach for testing #3322, since settings service unit tests don't seem to capture all the interactions.

@DouweM
Contributor

DouweM commented Jun 16, 2022

We can use this both to test the impact of certain commands on meltano.yml (e.g. meltano config <plugin> set adds/updates a key), and the correct behavior of certain commands given a specific meltano.yml or entire project directory (e.g. meltano elt <tap> <target> --transform=run in a project created before 2.0 keeps working on the latest version).
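
As a hedged illustration of both kinds of check (the plugin, setting name, and fixture path below are examples, not the real harness):

```shell
# 1. a command's effect on meltano.yml
meltano config tap-gitlab set projects meltano/meltano
grep -q 'projects: meltano/meltano' meltano.yml

# 2. behavior given a project created before 2.0, checked in as a test fixture
cd fixtures/pre-2.0-project            # hypothetical fixture directory
meltano elt tap-gitlab target-postgres --transform=run
```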

@tayloramurphy
Collaborator

@aaronsteers I'm guessing you didn't intend to open this in the handbook repo, right?

I'm quite supportive of this!

@tayloramurphy transferred this issue from meltano/handbook Jun 16, 2022
@pandemicsyn
Contributor

> I'm sure there are better and more sophisticated approaches, but one approach would be a convention of having a meltano.yml, a script.sh, and a meltano.after.yml that the results should match. Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors), possibly including the specific telemetry events that are expected to be sent over the course of those script commands.

@aaronsteers Yep, even in the simple approach, we can spin up a snowplow-micro instance, configure meltano to point telemetry at it, run our script.sh integration script, and then at the end of the test check that snowplow-micro reports only successful events. It probably wouldn't take much beyond a little grep/jq magic to also verify that we have the expected start/completed events for the commands we ran. That would make for a great general-purpose safety net for telemetry.
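
A hedged sketch of that check, assuming snowplow-micro's REST counters (/micro/all and /micro/good) on the default port; the expected-count variable is hypothetical:

```shell
# no events should have failed validation
test "$(curl -s http://localhost:9090/micro/all | jq '.bad')" -eq 0

# sanity-check that the expected number of good events arrived
# (EXPECTED_EVENT_COUNT is a hypothetical value set by the test)
test "$(curl -s http://localhost:9090/micro/good | jq 'length')" -eq "$EXPECTED_EVENT_COUNT"
```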

This setup could also fairly easily grow to accommodate automated UI browser tests, I'd think, including checking that JS-fired telemetry gets emitted as expected.

@pandemicsyn
Contributor

pandemicsyn commented Jun 24, 2022

We've chatted a bit in the past about having more examples in the docs as well, and we had some issues with doc drift leading up to 2.0. I've been thinking about that a bit and wanted to float an idea. Using a setup like @aaronsteers described, I think we can actually kill two birds with one stone.

We get more examples for users (ones that are actually maintained and always work) and new and improved integration tests. The only real change would be that instead of having a script.sh file, we treat our test orchestrations as living, usable Markdown docs. Users can use them as examples, and we can use them as scripts to execute via mdsh.

A contrived example, but if you wanted to test and document how to transition from using `elt` to `run`, you might have something like:

# transition-from-elt-to-run.md

This example shows how to transition an `elt` task with a custom state-id to a `job` executed via `run`.
To follow along with this example, download the linked `meltano.yml` into a fresh project and run:

```shell
meltano install
```

Then assuming you had an `elt` job invoked like so:

```shell
meltano elt --state-id=my-custom-id tap-gitlab target-postgres
```

You would first need to rename the id to match meltano's internal pattern:

```shell
meltano state copy my-custom-id tap-gitlab-to-target-postgres
```

Then you can create a job and execute it using `meltano run`:


```shell
meltano job add my-new-job --task="tap-gitlab target-postgres"
meltano run my-new-job
```

Compiling this via mdsh will yield a bash script that we can invoke from our integration tests:

```shell
meltano install
meltano elt --state-id=my-custom-id tap-gitlab target-postgres
meltano state copy my-custom-id tap-gitlab-to-target-postgres
meltano job add my-new-job --task="tap-gitlab target-postgres"
meltano run my-new-job
```
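
For reference, the compile step with mdsh might look like this (a sketch; the exact flags and file names depend on how we wire it up):

```shell
# compile the markdown walkthrough into a plain bash script...
mdsh --compile transition-from-elt-to-run.md > transition-from-elt-to-run.sh

# ...or execute its fenced shell blocks directly
mdsh transition-from-elt-to-run.md
```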

So we could structure our tests something like:

```
/meltano/docs/example-library
|- kitchen-sink - a canonical kitchen sink project that shows all options in one file
   |- meltano.yaml
   |- kitchen-sink.md
|- env-config-aliases
   |- meltano.yaml
   |- all-the-ways-to-configure-things-demo.md
|- performing-work
   |- meltano.yaml
   |- meltano-job-run-demo.md
... more examples and awesome-sauce demos ...

/meltano/integration/example-library
|- kitchen-sink
   |- expected-meltano.yaml
   |- validation.sh
|- env-config-aliases
   |- expected-meltano.yaml
   |- validation.sh
|- performing-work
   |- expected-meltano.yaml
   |- expected-output.jsonl
   |- expected-log-line-matches.txt
   |- validation.sh
... more examples and awesome-sauce tests ...
```

Where validation.sh compiles the md, executes it (just like the script.sh in @aaronsteers' original proposal), and performs any additional needed validation steps (diffing files, grepping log lines for matches, etc.).
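
To make the contract concrete, here's a hedged sketch of what one validation.sh could look like; the paths follow the layout above and the mdsh invocation is an assumption:

```shell
#!/usr/bin/env bash
# Hedged sketch of a validation.sh; paths follow the proposed layout above
# and assume the script is run from the repo root.
set -euo pipefail

EXAMPLE=kitchen-sink
DOCS_DIR=docs/example-library/$EXAMPLE
TEST_DIR=integration/example-library/$EXAMPLE

workdir=$(mktemp -d)
cp "$DOCS_DIR/meltano.yaml" "$workdir/meltano.yml"

# compile the markdown walkthrough to a script and run it inside the project copy
mdsh --compile "$DOCS_DIR/$EXAMPLE.md" > "$workdir/run.sh"
(cd "$workdir" && bash run.sh | tee output.log)

# additional validation: diff files, grep log lines for expected matches, etc.
diff -u "$TEST_DIR/expected-meltano.yaml" "$workdir/meltano.yml"
if [ -f "$TEST_DIR/expected-log-line-matches.txt" ]; then
  grep -f "$TEST_DIR/expected-log-line-matches.txt" "$workdir/output.log"
fi
```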

@aaronsteers @afolson curious how y'all feel about this. It would basically guarantee that our example library always works, and since all these come with a meltano.yml, users can download them and actually follow along.

@tayloramurphy
Collaborator

@pandemicsyn I love this idea. It would lend itself well to incremental adoption and support from the community as well. I'm supportive 👍

@aaronsteers
Contributor Author

aaronsteers commented Jun 24, 2022

@pandemicsyn - Ditto what @tayloramurphy said. Your proposal checks all the boxes I was looking for (plus a few more):

  1. Arbitrary starting point in the meltano.yml of each case.
  2. A generic script which performs a series of actions (essentially a sequence of meltano cli commands).
  3. A generic contract (validation.sh) that each sample can use for custom validation.
  4. An 'expected output' version of the meltano.yml that ensures the project is exactly as we expect it to be after all operations are completed. (For instance, would catch config landing in the wrong place even if all other behaviors were functioning as expected.)
  5. We are able to use both the 'before' and 'after' versions of meltano.yml to validate against JSON Schema rules.
  6. The examples and tests are both trivial in terms of effort to maintain and create, and can be reasonably included in the definition of done for all new features. (Basically these could also be the demo script.)
  7. Tests are fast and easy to parallelize.
  8. With all of the above in place, it would be trivial to adopt a test-driven (TDD) approach, where we (or Product) write the examples and test cases before development even begins. 🎉

In short, I love it. I especially love your proposal to use something like mdsh, since that basically makes these dual-purpose tests and tutorials. 👌

@pandemicsyn
Contributor

@aaronsteers Perfect, so next week then, I'll put the scaffolding in place and build out a first working guide/test. I think a kitchen-sink, end-to-end walkthrough gives us the most bang for our buck testing-wise, so I'd vote we start with that one, unless there's a specific scenario you'd prefer me to start with.

Also, there are quite a few existing jsonschema validator actions/workflows, and also https://github.com/python-jsonschema/check-jsonschema, which looks very robust. So, at first glance, adding a jsonschema check of our kitchen-sink yaml seems like it wouldn't be much additional work.
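
For example, a hedged one-liner with check-jsonschema (the schema path below is an assumption about where Meltano's JSON schema lives in the repo):

```shell
# validate the kitchen-sink example against Meltano's JSON schema
check-jsonschema --schemafile src/meltano/schemas/meltano.schema.json \
  docs/example-library/kitchen-sink/meltano.yaml
```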

@afolson
Contributor

afolson commented Jun 27, 2022

@pandemicsyn Overall I love this idea! Would it be easier on maintainers to put all of this in the integration/ directory and I can just pull the markdown files/example yml into the docs when they build?

@pandemicsyn
Contributor

> Would it be easier on maintainers to put all of this in the integration/ directory and I can just pull the markdown files/example yml into the docs when they build?

@afolson nah, it's not a big deal. At some point, we might also have some tests that are internal to us and that we don't want to pollute the docs space with, and those can live solely in integration/. So it's probably worth keeping them separated for now.

@afolson
Contributor

afolson commented Jun 27, 2022

@pandemicsyn Alright, sounds great! Let me know how I can help/review.

@pandemicsyn
Contributor

@aaronsteers @afolson (and anyone else interested 😁 ). I've got a working PoC here: #6303

I think it's almost a bit easier to grok this by looking at the branch, but tl;dr: this ships two PoC integration tests, which result in a workflow run like:

https://github.com/meltano/meltano/actions/runs/2578001545

We can shift this discussion over to the PR, but I've got some open questions. Namely:

When should these run? We've got options!

I don't think it makes sense to run these on every commit for every PR by default. I'd vote some combo of:

  1. They run automatically whenever new tests are included in a PR or existing tests are modified
  2. They run automatically for every commit on a PR if the PR is flagged as requiring extended tests by a reviewer or author (via something like attaching a specific label), with clear guidelines and expectations in the handbook/dev guide.
  3. They can be run on-demand via the workflow UI or CLI (see the sketch after this list).
  4. They run automatically as part of the release to prevent shipping a broken build.

Optionally:

  • Since we're using semantic commits, we could trigger based on things like whether the PR is a feature.
  • I don't think there's an "at merge but prior to merge" trigger yet, but we could trigger these to run on every push to main.
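
On option 3, the on-demand path could be as simple as the GitHub CLI (the workflow file name below is hypothetical):

```shell
# trigger the integration-test workflow on demand against a branch
gh workflow run integration-tests.yml --ref my-feature-branch

# then watch the run it kicked off
gh run watch
```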

@pandemicsyn
Contributor

@tayloramurphy @aaronsteers ok, so we've got this framework in place and we're running the basic tests for every PR. I've got a list of other test cases we might want to start adding. Should I convert this issue to a discussion to use as a central place to collect test cases (that y'all can then seed our backlog with when time permits), or would you prefer something else?

@tayloramurphy
Collaborator

> Should I convert this issue to a discussion to use as a central place to collect test cases (that y'all can then seed our backlog with when time permits), or would you prefer something else?

@pandemicsyn either converting this or closing it and making a new discussion. I don't have a preference 👍

@pandemicsyn
Contributor

👍 closing in favor of:
