
New process and guidelines for integration tests (replaces demo-project and homepage_sample_code tests in GitLab CI) #6218

Closed
aaronsteers opened this issue Jun 15, 2022 · 17 comments


@aaronsteers
Contributor

From a call with @tayloramurphy and @DouweM, we started thinking of replacing demo-project with a set of sample Meltano projects in meltano/meltano that can serve as integration tests, perhaps along with specific scripts that would run - such as the invoking of specific plugins.

I'm sure there are better and more sophisticated approaches, but one approach would be a convention of having a meltano.yml, a script.sh, and a meltano.after.yml that the results should match. Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors), possibly including the specific telemetry events that are expected to be sent over the course of those script commands.
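
For illustration, here's a minimal sketch of what the first convention could look like as a test runner; the file names (meltano.yml, script.sh, meltano.after.yml) are the hypothetical ones above and the diff-based check is just one way to compare results:

```shell
#!/usr/bin/env bash
# Hedged sketch of the meltano.yml / script.sh / meltano.after.yml convention.
set -euo pipefail

workdir=$(mktemp -d)
cp meltano.yml script.sh "$workdir/"
cd "$workdir"

bash script.sh                          # run the scripted meltano commands

# the project file should now match the expected "after" state
diff -u "$OLDPWD/meltano.after.yml" meltano.yml
```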

@aaronsteers
Contributor Author

@pandemicsyn, @WillDaSilva, @edgarrmondragon - This ties back to other conversations we've had recently. Curious for your thoughts/suggestions on this front.

@edgarrmondragon
Collaborator

> Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors)

I like that. With a good caching setup we should be able to run a bunch of them without waiting too long for results. I was thinking of a similar approach for testing #3322, since settings service unit tests don't seem to capture all the interactions.

@DouweM
Contributor

DouweM commented Jun 16, 2022

We can use this both to test the impact of certain commands on meltano.yml (e.g. meltano config <plugin> set adds/updates a key), and the correct behavior of certain commands given a specific meltano.yml or entire project directory (e.g. meltano elt <tap> <target> --transform=run in a project created before 2.0 keeps working on the latest version).
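
As a hedged illustration of both kinds of check (the plugin, setting name, and fixture path below are examples, not the real harness):

```shell
# 1. a command's effect on meltano.yml
meltano config tap-gitlab set projects meltano/meltano
grep -q 'projects: meltano/meltano' meltano.yml

# 2. behavior given a project created before 2.0, checked in as a test fixture
cd fixtures/pre-2.0-project            # hypothetical fixture directory
meltano elt tap-gitlab target-postgres --transform=run
```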

@tayloramurphy
Collaborator

@aaronsteers I'm guessing you didn't intend to open this in the handbook repo, right?

I'm quite supportive of this!

@tayloramurphy transferred this issue from meltano/handbook Jun 16, 2022
@pandemicsyn
Contributor

> I'm sure there are better and more sophisticated approaches, but one approach would be a convention of having a meltano.yml, a script.sh, and a meltano.after.yml that the results should match. Another approach would be a set of pytest tests that perform a series of operations against (a copy of?) the source meltano.yml file and confirm the results (or errors), possibly including the specific telemetry events that are expected to be sent over the course of those script commands.

@aaronsteers Yep, even in the simple approach, we can spin up a snowplow-micro instance, configure meltano to point telemetry at it, run our script.sh integration script, and then at the end of the test check that snowplow-micro reports only successful events. It probably wouldn't take much beyond a little grep/jq magic to also verify that we have the expected start/completed events for the commands we ran. That would make for a great general-purpose safety net for telemetry.
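
A hedged sketch of that check, assuming snowplow-micro's REST counters (/micro/all and /micro/good) on the default port; the expected-count variable is hypothetical:

```shell
# no events should have failed validation
test "$(curl -s http://localhost:9090/micro/all | jq '.bad')" -eq 0

# sanity-check that the expected number of good events arrived
# (EXPECTED_EVENT_COUNT is a hypothetical value set by the test)
test "$(curl -s http://localhost:9090/micro/good | jq 'length')" -eq "$EXPECTED_EVENT_COUNT"
```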

This setup could also fairly easily grow to accommodate automated UI browser tests, I'd think, including checking that JS-fired telemetry gets emitted as expected.

@pandemicsyn
Contributor

pandemicsyn commented Jun 24, 2022

We've chatted a bit in the past about having more examples in the docs as well, and we had some issues with doc drift leading up to 2.0. I've been thinking about that a bit and wanted to float an idea. Using a setup like @aaronsteers described, I think we can actually kill two birds with one stone.

We get more examples for users (ones that are actually maintained and always work) and new and improved integration tests. The only real change would be that instead of having a script.sh file, we treat our test orchestrations as living, usable Markdown docs. Users can use them as examples, and we can use them as scripts to execute via mdsh.

A contrived example, but if you wanted to test and document how to transition from using `elt` to `run`, you might have something like:

# transition-from-elt-to-run.md

This example shows how to transition an `elt` task with a custom state-id to a `job` executed via `run`.
To follow along with this example, download the linked `meltano.yml` into a fresh project and run:

```shell
meltano install
```

Then assuming you had an `elt` job invoked like so:

```shell
meltano elt --state-id=my-custom-id tap-gitlab target-postgres
```

You would first need to rename the id to match meltano's internal pattern:

```shell
meltano state copy my-custom-id tap-gitlab-to-target-postgres
```

Then you can create a job and execute it using `meltano run`:


```shell
meltano job add my-new-job --task="tap-gitlab target-postgres"
meltano run my-new-job
```

Compiling this via mdsh will yield a bash script that we can invoke from our integration tests:

```shell
meltano install
meltano elt --state-id=my-custom-id tap-gitlab target-postgres
meltano state copy my-custom-id tap-gitlab-to-target-postgres
meltano job add my-new-job --task="tap-gitlab target-postgres"
meltano run my-new-job
```
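
For reference, the compile step with mdsh might look like this (a sketch; the exact flags and file names depend on how we wire it up):

```shell
# compile the markdown walkthrough into a plain bash script...
mdsh --compile transition-from-elt-to-run.md > transition-from-elt-to-run.sh

# ...or execute its fenced shell blocks directly
mdsh transition-from-elt-to-run.md
```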

So we could structure our tests something like:

```
/meltano/docs/example-library
|- kitchen-sink - a canonical kitchen sink project that shows all options in one file
   |- meltano.yaml
   |- kitchen-sink.md
|- env-config-aliases
   |- meltano.yaml
   |- all-the-ways-to-configure-things-demo.md
|- performing-work
   |- meltano.yaml
   |- meltano-job-run-demo.md
... more examples and awesome-sauce demos ...

/meltano/integration/example-library
|- kitchen-sink
   |- expected-meltano.yaml
   |- validation.sh
|- env-config-aliases
   |- expected-meltano.yaml
   |- validation.sh
|- performing-work
   |- expected-meltano.yaml
   |- expected-output.jsonl
   |- expected-log-line-matches.txt
   |- validation.sh
... more examples and awesome-sauce tests ...
```

Where validation.sh compiles the md, executes it (just like the script.sh in @aaronsteers' original proposal), and performs any additional needed validation steps (diffing files, grepping log lines for matches, etc.).
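
To make the contract concrete, here's a hedged sketch of what one validation.sh could look like; the paths follow the layout above and the mdsh invocation is an assumption:

```shell
#!/usr/bin/env bash
# Hedged sketch of a validation.sh; paths follow the proposed layout above
# and assume the script is run from the repo root.
set -euo pipefail

EXAMPLE=kitchen-sink
DOCS_DIR=docs/example-library/$EXAMPLE
TEST_DIR=integration/example-library/$EXAMPLE

workdir=$(mktemp -d)
cp "$DOCS_DIR/meltano.yaml" "$workdir/meltano.yml"

# compile the markdown walkthrough to a script and run it inside the project copy
mdsh --compile "$DOCS_DIR/$EXAMPLE.md" > "$workdir/run.sh"
(cd "$workdir" && bash run.sh | tee output.log)

# additional validation: diff files, grep log lines for expected matches, etc.
diff -u "$TEST_DIR/expected-meltano.yaml" "$workdir/meltano.yml"
if [ -f "$TEST_DIR/expected-log-line-matches.txt" ]; then
  grep -f "$TEST_DIR/expected-log-line-matches.txt" "$workdir/output.log"
fi
```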

@aaronsteers @afolson curious how y'all feel about this. It would basically guarantee that our example library always works, and since all these come with a meltano.yml, users can download them and actually follow along.

@tayloramurphy
Collaborator

@pandemicsyn I love this idea. It would lend itself well to incremental adoption and support from the community as well. I'm supportive 👍

@aaronsteers
Contributor Author

aaronsteers commented Jun 24, 2022

@pandemicsyn - Ditto what @tayloramurphy said. Your proposal checks all the boxes I was looking for (plus a few more):

  1. Arbitrary starting point in the meltano.yml of each case.
  2. A generic script which performs a series of actions (essentially a sequence of meltano cli commands).
  3. A generic contract (validation.sh) that each sample can use for custom validation.
  4. An 'expected output' version of the meltano.yml that ensures the project is exactly as we expect it to be after all operations are completed. (For instance, would catch config landing in the wrong place even if all other behaviors were functioning as expected.)
  5. We are able to use both the 'before' and 'after' versions of meltano.yml to validate against JSON Schema rules.
  6. The examples and tests are both trivial in terms of effort to maintain and create, and can be reasonably included in the definition of done for all new features. (Basically these could also be the demo script.)
  7. Tests are fast and easy to parallelize.
  8. With all of the above in place, it would be trivial to adopt a test-driven (TDD) approach, where we (or Product) write the examples and test cases before development even begins. 🎉

In short, I love it. I especially love your proposal to use something like mdsh, since that basically makes these dual-purpose tests and tutorials. 👌

@pandemicsyn
Contributor

@aaronsteers Perfect, so next week then, I'll put the scaffolding in place and build out a first working guide/test. I think a kitchen-sink, end-to-end walkthrough gives us the most bang for our buck testing-wise, so I'd vote we start with that one, unless there's a specific scenario you'd prefer me to start with.

Also, there are quite a few existing jsonschema validator actions/workflows, and also https://github.com/python-jsonschema/check-jsonschema, which looks very robust. So, at first glance, adding a jsonschema check of our kitchen-sink yaml seems like it wouldn't be much additional work.
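
For example, a hedged one-liner with check-jsonschema (the schema path below is an assumption about where Meltano's JSON schema lives in the repo):

```shell
# validate the kitchen-sink example against Meltano's JSON schema
check-jsonschema --schemafile src/meltano/schemas/meltano.schema.json \
  docs/example-library/kitchen-sink/meltano.yaml
```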

@afolson
Contributor

afolson commented Jun 27, 2022

@pandemicsyn Overall I love this idea! Would it be easier on maintainers to put all of this in the integration/ directory and I can just pull the markdown files/example yml into the docs when they build?

@pandemicsyn
Contributor

> Would it be easier on maintainers to put all of this in the integration/ directory and I can just pull the markdown files/example yml into the docs when they build?

@afolson nah, it's not a big deal. At some point, we might also have some tests that are internal to us and that we don't want to pollute the docs space with, and those can live solely in integration/. So it's probably worth keeping them separated for now.

@afolson
Contributor

afolson commented Jun 27, 2022

@pandemicsyn Alright, sounds great! Let me know how I can help/review.

@pandemicsyn
Contributor

@aaronsteers @afolson (and anyone else interested 😁 ). I've got a working PoC here: #6303

I think it's almost a bit easier to grok this by looking at the branch, but tl;dr: this ships two PoC integration tests, which result in a workflow run like:

https://github.com/meltano/meltano/actions/runs/2578001545

We can shift this discussion over to the PR, but I've got some open questions. Namely:

When should these run? We've got options!

I don't think it makes sense to run these on every commit for every PR by default. I'd vote some combo of:

  1. They run automatically whenever new tests are included in a PR or existing tests are modified
  2. They run automatically for every commit on a PR if the PR is flagged as requiring extended tests by a reviewer or author (via something like attaching a specific label), with clear guidelines and expectations in the handbook/dev guide.
  3. They can be run on-demand via the workflow UI or CLI (see the sketch after this list).
  4. They run automatically as part of the release to prevent shipping a broken build.

Optionally:

  • Since we're using semantic commits, we could trigger based on things like whether the PR is a feature.
  • I don't think there's an "at merge but prior to merge" trigger yet, but we could trigger these to run on every push to main.
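
On option 3, the on-demand path could be as simple as the GitHub CLI (the workflow file name below is hypothetical):

```shell
# trigger the integration-test workflow on demand against a branch
gh workflow run integration-tests.yml --ref my-feature-branch

# then watch the run it kicked off
gh run watch
```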

@pandemicsyn
Contributor

@tayloramurphy @aaronsteers ok, so we've got this framework in place and we're running the basic tests for every PR. I've got a list of other test cases we might want to start adding. Should I convert this issue to a discussion to use as a central place to collect test cases (that y'all can then seed our backlog with when time permits), or would you prefer something else?

@tayloramurphy
Collaborator

> Should I convert this issue to a discussion to use as a central place to collect test cases (that y'all can then seed our backlog with when time permits), or would you prefer something else?

@pandemicsyn either converting this or closing it and making a new discussion. I don't have a preference 👍

@pandemicsyn
Contributor

👍 closing in favor of:
