Add conformance testing #2434

tetsuo-cpp · 2022-11-10T10:35:08Z

Description

Hey Cosign folks! Recently, @woodruffw and I have been working on a conformance testing suite for Sigstore clients. At the moment, it just does a basic sign/verify test along the happy path, but we'll be focusing on building out coverage in the coming weeks. The test suite is exposed via a GitHub Action which can be found here.

At this point, we want to get the test suite running against a few clients to begin dogfooding and check that our assumptions around the CLI design of clients other than sigstore-python are reasonable.

The way this works is that we've designed a CLI protocol, which is essentially a set of flags that the client needs to support in order to be conformance tested. Every client's flags differ slightly so, generally speaking, it's required to write some kind of wrapper script to convert the flags described in that document to those that the client understands. For Cosign, that means that there's going to need to be a wrapper program somewhere called cosign-conformance that takes those flags and then invokes cosign under the hood. You can take a look at this PR for sigstore-python to get an idea of what I mean: sigstore/sigstore-python#298.

If you're happy for Cosign to be included in this, I'd be happy to make a PR to get things going.

CC: @di

The text was updated successfully, but these errors were encountered:

znewman01 · 2022-11-10T19:49:36Z

Hey @tetsuo-cpp !

This is great—but I'd actually prefer to do the testing at a slightly lower level.

For instance, in sigstore/protobuf-specs#5 there's an Input protobuf message, which would capture the input to a Verify() function. (We could do the same thing for signing.) Clients don't necessarily need to make this the primary interface to their Verify() functions, but it should be pretty trivial to parse stuff out of the protobufs and pass it in. We've been imagining a suite of test vectors specified in the form of these protos.

This has a few advantages IMO: it avoids shelling out (and should be faster), and is much easier to write a test harness for. It doesn't test the CLI interface directly, but I think that's a good thing—we're going to be able to exercise much more interesting paths via the library interface.

WDYT? I am open to discussion here if you feel strongly that CLI-level testing is better, or that there's a place for both.

Regardless, I think the Cosign repo isn't quite the right place for this issue—would you want to open one up in https://github.com/sigstore/protobuf-specs which seems to be the home of most cross-client concerns?

tetsuo-cpp · 2022-11-11T02:21:39Z

Hey @znewman01, thanks for the ideas and feedback.

This has a few advantages IMO: it avoids shelling out (and should be faster), and is much easier to write a test harness for. It doesn't test the CLI interface directly, but I think that's a good thing—we're going to be able to exercise much more interesting paths via the library interface.

I'm trying to imagine how this would look. Essentially, we want a suite of conformance tests that we can universally apply across Sigstore clients without much effort.

Given that each client is written in its own programming language, how do you propose that we cross the language boundary if we're not invoking the CLI? The other options I can think of are invoking the library directly via SWIG or raw C FFI, both of which I'd say don't satisfy the "without much effort" criteria.

Or were you thinking more that the conformance test suite should just provide a set of test inputs and expected outputs and leaving the actual test runner logic to each client's unit tests? I think that could be complicated because we'll want to test interactions with Fulcio, Rekor and eventually mock out those services to test scenarios like getting a broken inclusion proof, SCT, etc.

Sorry if I've misunderstood what you're describing.

WDYT? I am open to discussion here if you feel strongly that CLI-level testing is better, or that there's a place for both.

I see value in using Protobuf messages but, the way I see it, they can complement the current approach. So at some point when these Protobuf definitions become standardised, we could simplify the CLI protocol and instead ask that clients expose some kind of --protobuf flag that takes an Input rather than the flags that we're asking for now.

Regardless, I think the Cosign repo isn't quite the right place for this issue—would you want to open one up in https://github.com/sigstore/protobuf-specs which seems to be the home of most cross-client concerns?

This issue was specifically to add the existing GitHub Action to cosign's CI so that's why I've opened it here. At the moment, we're wanting to pilot it with clients other than sigstore-python (the client I usually work on).

znewman01 · 2022-11-11T18:39:31Z

Hey @znewman01, thanks for the ideas and feedback.

Appreciate the additional context; I understand much better what you're trying to achieve here. I think overall our goals are aligned and I'm optimistic that we can come up with a plan we're both happy with (and again, I'm still open to merging the CLI tests as-written and even keeping them long-term, I just want to make sure that's part of a coherent conformance testing strategy).

Or were you thinking more that the conformance test suite should just provide a set of test inputs and expected outputs and leaving the actual test runner logic to each client's unit tests?

Precisely. IMO it's way more work to add new shell script-y test suites than to just have each client import standard test data. I also think the performance benefits of avoiding creating are important here.

It's also not clear to me that every language client should have a CLI.

I think that could be complicated because we'll want to test interactions with Fulcio, Rekor and eventually mock out those services to test scenarios like getting a broken inclusion proof, SCT, etc.

Hm...my imagination is that every CLI should be a pretty thin wrapper around a library interface, so anything you could do in a shell script (spin up a fake Fulcio server and point your tool at it) could happen inside a language-specific test harness just as easily (and integrate better with the repo's existing tests).

I guess this is mostly my personal biases—I worked full-time on a CLI for a while, and always found CLI-level testing to be a really blunt instrument, brittle, and slow. Further, architecting a CLI for testability without going through the CLI interface itself means that your library is much more usable as a library and that the tests are clearer (assertions don't require regexes and parsing STDOUT). Not sure—am I being at-all convincing here that CLI tests are something to avoid when possible?

I see value in using Protobuf messages but, the way I see it, they can complement the current approach. So at some point when these Protobuf definitions become standardised, we could simplify the CLI protocol and instead ask that clients expose some kind of --protobuf flag that takes an Input rather than the flags that we're asking for now.

Certainly the protobufs work really well for testing verification. I think it's doable to start expressing full sign+verification flows declaratively too, with faked responses—and in fact, I think a proto-centric approach makes the most sense for faking.

Maybe we could split the difference:

Move as much configuration and as many test cases as possible into a standard (declarative) specification.
- Verification-only cases are probably the easiest.
- But you could imagine entire sign+verify flows with requests/responses.
- This doesn't need to be in protobufs exclusively, but I'd strongly prefer a language-agnostic format to actual code.
Have drivers/harnesses to run these shared test cases.
- They'd be responsible for requests/responses, feeding test cases into the library, and checking that we get the right results.
- One such harness (the first one) could be a CLI harness, which could spin up fake servers and execute commands.

This may be overengineering for now, and it seems reasonable to start with everything in the "CLI harness" but perhaps with a longer-term vision of moving first the verification cases, then everything else, into the "declarative test specification" layer.

Regardless, I think the Cosign repo isn't quite the right place for this issue—would you want to open one up in https://github.com/sigstore/protobuf-specs which seems to be the home of most cross-client concerns?

This issue was specifically to add the existing GitHub Action to cosign's CI so that's why I've opened it here. At the moment, we're wanting to pilot it with clients other than sigstore-python (the client I usually work on).

In the medium-term, I'd expect "common test suite" to be tracked mostly over in the protobuf-specs repo, but it makes sense to have per-client repo issues as well. Maybe we can keep this issue open to track the integration for Golang, and I'll file another issue for the strategy/philosophy discussion.

(Part of the reason I want to move it out of this repo too is that at some point these tests would move over to https://github.com/sigstore/sigstore-go because that's where the core Sigstore logic for Go will live, and Cosign will mostly focus on OCI/container signing.)

di · 2022-11-15T15:27:55Z

I think there's a benefit to both testing the specs and testing the application as a black box (via CLI). Can we agree that it's worthwhile to do both?

bobcallaway · 2022-11-15T17:10:06Z

FWIW, I agree with @di - I think it is definitely worthwhile to do both.

znewman01 · 2022-11-16T23:37:22Z

I think there's a benefit to both testing the specs and testing the application as a black box (via CLI). Can we agree that it's worthwhile to do both?

I think if we can articulate what we're trying to get out of the CLI testing and why it differs from the "test vector" style testing, then definitely. If not, I'd rather avoid maintaining two parallel suites of test cases. Perhaps the following delineation of responsibilities?

CLI testing: common e2e tests for both clients and infrastructure

Against prod/staging Sigstore infrastructure. (Maybe can even run on incoming PRs to Rekor/Fulcio with all existing clients.)
"Workflow" focused, rather than "test case" focused.
Fewer hard-coded cases.
Run against older versions of tools (see "API Support for Old Cosign Versions" in Proposal: Cosign Versioning)

I could also have the goals of this test suite all wrong; if so, please correct me.

"Test vector" testing: test spec compliance

Cover every edge case in the spec.
As many cases as possible.
Fuzz-friendly: faster to run (no network ever)
- If a test vector ever trips up clients we can add it to the repository.

I actually don't buy the "test the applications" argument—I expect that for most client libraries, there is no application, so the "black box" testing we'd be doing is just of the CLI test runner. And I still don't love CLI as the main interface, but I'm willing to concede that one.

A few open questions, assuming the above is roughly how we want to split up responsibilities:

What should we do about clients like Gitsign? I think we'd want a similar style of tests, but the workflow doesn't quite make sense. (The reason I don't think we should just leave Gitsign to its own devices is that we want to avoid making changes that break Gitsign but nobody else to Sigstore infra.)
Can we run these both centrally (e.g., against prod/staging Sigstore in a cron job, or even against Fulcio/Rekor PRs) and per-client?

tetsuo-cpp · 2022-11-21T13:24:09Z

I think if we can articulate what we're trying to get out of the CLI testing and why it differs from the "test vector" style testing, then definitely. If not, I'd rather avoid maintaining two parallel suites of test cases.

I won't speak for @di and @woodruffw but the way I see it, the CLI test suite is a broader, coarse-grained view of conformance that covers client behaviour as a whole including side effects, rather than testing specifically whether it passes/fails for given input bundles. So it can answer questions like:

What files get created by the client in what situations?
Does the client fail to sign when the certificate it gets from Fulcio isn't signed by the root CA?
Does the client fail to verify for an invalid inclusion proof that it gets from the Rekor response?
Does the client fail to sign for an invalid signed certificate timestamp from the Fulcio response?
etc

I'd like the conformance suite to become an informal specification on how Sigstore clients are expected to behave.

I think this is a different level of granularity compared with what having a series of test vectors gives us. You mentioned earlier that spinning up mock servers could be achieved in each client's language specific test harness, however this pushes all of the management of processes and mocks onto each and every client which will be a significant effort.

I think realistically, the test vector approach is going to be better at testing offline workflows at a low-level and testing every permutation of what's allowed in the Protobuf spec and seeing whether the client chokes on anything (more or less what you described), whereas the CLI testing is going to be more about testing high-level client behaviour including all network interactions with Fulcio, Rekor and using mocks to test for atypical responses.

Against prod/staging Sigstore infrastructure. (Maybe can even run on incoming PRs to Rekor/Fulcio with all existing clients.)

"Workflow" focused, rather than "test case" focused.

Fewer hard-coded cases.

Run against older versions of tools (see "API Support for Old Cosign Versions" in Proposal: Cosign Versioning)

I think the first 3 points are in line with what we were thinking, although I'd add that we'll be using mocks in addition to testing against prod/staging infra. I'm unable to view the doc linked in the 4th point so I can't comment on that.

What should we do about clients like Gitsign?

I'm not sure about this. I'm not super familiar with Gitsign but I'll take a look this week and get back to you with ideas. Do you have an opinion on which type of testing would be a better fit for Gitsign?

Can we run these both centrally (e.g., against prod/staging Sigstore in a cron job, or even against Fulcio/Rekor PRs) and per-client?

What's the appeal of running them centrally? At the moment, the CLI suite is a GitHub Action that each client can adopt into their CI. If the goal was to have a single place to look and see whether any client is failing conformance tests, we could just have a page with all the clients' CI badges in a list.

I think testing against custom Fulcio and Rekor instances would be possible in the long term, but we wouldn't be aiming for that to start off with.

di · 2022-11-21T16:12:06Z

Agree wholeheartedly with @tetsuo-cpp, one point though:

What's the appeal of running them centrally?

I think one advantage would be getting good insight into whether the 'online' services have created breaking changes for one or more of the clients, which right now we theoretically catch by running sigstore-python tests on a cron against staging, but might make more sense to centralize.

znewman01 · 2022-11-21T16:27:33Z

Great, that all makes perfect sense to me. Given that, I'm pro-CLI conformance testing and happy to help wherever I can. (I'd love some of the above points to make their way into the README.)

Sorry for being a bit of a drag here, I just want to understand why we're doing these tests before I commit to maintaining them indefinitely.

I'm unable to view the doc linked in the 4th point so I can't comment on that.

You'll need to join sigstore-dev@googlegroups.com for access to most Sigstore Google Docs.

I'm not sure about this. I'm not super familiar with Gitsign but I'll take a look this week and get back to you with ideas. Do you have an opinion on which type of testing would be a better fit for Gitsign?

I bring up Gitsign because it's not written around a workflow of "sign an artifact and output a signature;" rather, the main main logic of Gitsign involves pulling payloads out of and cramming signature data into Git's data model. So a test script would look quite different. (You might be able to contort yourself into writing something that conforms to the CLI interface, but it'd be a stretch).

I think if we're using the CLI tests as the central high-level e2e testing flows, we'll be interested in slightly awkward clients like that as well. (We don't have to solve this now.)

I think testing against custom Fulcio and Rekor instances would be possible in the long term, but we wouldn't be aiming for that to start off with.

My hope is that this will be a simple matter of export SIGSTORE_FULCIO_URL=... or something. Not needed for a first draft though 🙂

I think one advantage would be getting good insight into whether the 'online' services have created breaking changes for one or more of the clients, which right now we theoretically catch by running sigstore-python tests on a cron against staging, but might make more sense to centralize.

Thanks, @di: that's exactly what I was getting at.

tetsuo-cpp · 2022-11-23T12:31:30Z

Great, that all makes perfect sense to me. Given that, I'm pro-CLI conformance testing and happy to help wherever I can. (I'd love some of the above points to make their way into the README.)

Absolutely. I'll follow up with a PR to sigstore-conformance's README to capture what we discussed here.

Coming back to adding the action to Cosign CLI, I need to write a wrapper for Cosign. I'm thinking this should be a simple one-file Go program similar to this wrapper.

Sorry for being a bit of a drag here, I just want to understand why we're doing these tests before I commit to maintaining them indefinitely.

Not at all! That totally makes sense and anyhow, I think it's really valuable that we came up with a clear delineation of what we want out of each testing method.

tetsuo-cpp added the enhancement New feature or request label Nov 10, 2022

znewman01 mentioned this issue Nov 11, 2022

Common test suite strategy sigstore/protobuf-specs#15

Open

tetsuo-cpp mentioned this issue Dec 9, 2022

README: Elaborate on what we aim to achieve with CLI testing sigstore/sigstore-conformance#57

Merged

znewman01 mentioned this issue Mar 8, 2023

Proposal: add "test vector" tests to this repo. sigstore/sigstore-conformance#64

Open

haydentherapper mentioned this issue Feb 26, 2024

Create a test to run cosign from main sigstore/sigstore-probers#105

Open

haydentherapper added v3ish pre-theseus and removed v3ish labels May 9, 2024

haydentherapper mentioned this issue Aug 5, 2024

Conformance testing for cosign #3806

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add conformance testing #2434

Add conformance testing #2434

tetsuo-cpp commented Nov 10, 2022 •

edited

Loading

znewman01 commented Nov 10, 2022

tetsuo-cpp commented Nov 11, 2022 •

edited

Loading

znewman01 commented Nov 11, 2022 •

edited

Loading

di commented Nov 15, 2022

bobcallaway commented Nov 15, 2022

znewman01 commented Nov 16, 2022

tetsuo-cpp commented Nov 21, 2022 •

edited

Loading

di commented Nov 21, 2022

znewman01 commented Nov 21, 2022

tetsuo-cpp commented Nov 23, 2022 •

edited

Loading

Add conformance testing #2434

Add conformance testing #2434

Comments

tetsuo-cpp commented Nov 10, 2022 • edited Loading

znewman01 commented Nov 10, 2022

tetsuo-cpp commented Nov 11, 2022 • edited Loading

znewman01 commented Nov 11, 2022 • edited Loading

di commented Nov 15, 2022

bobcallaway commented Nov 15, 2022

znewman01 commented Nov 16, 2022

tetsuo-cpp commented Nov 21, 2022 • edited Loading

di commented Nov 21, 2022

znewman01 commented Nov 21, 2022

tetsuo-cpp commented Nov 23, 2022 • edited Loading

tetsuo-cpp commented Nov 10, 2022 •

edited

Loading

tetsuo-cpp commented Nov 11, 2022 •

edited

Loading

znewman01 commented Nov 11, 2022 •

edited

Loading

tetsuo-cpp commented Nov 21, 2022 •

edited

Loading

tetsuo-cpp commented Nov 23, 2022 •

edited

Loading