WIP: Providers as a fixed interface per type #725

Closed
wants to merge 2 commits into from


evankanderson
Member

Per discussion with @JAORMX earlier today, this is a sketch of what policy (examples/github), providers (pkg/providers/providers.go), and our interface with providers (providers.go again, plus proto/mediator/providers/providers.proto) would look like if the provider contract were a fixed set of RPCs rather than a chain of JSON API fetch rules.

It might be possible to implement some of these interfaces via chained JSON API fetches behind the interface, but the key point is that we could extract RepoProvider, BuildProvider, etc. into gRPC services that Mediator calls. This would let users implement their own providers and contribute to Mediator without needing Stacklok to run every provider in-core, which reduces the amount of blocking review Stacklok needs to provide for e.g. "Apache Foundation's self-hosted git provider".

JAORMX marked this pull request as ready for review August 22, 2023 19:24
@@ -7,41 +7,52 @@ context:
group: Root Group
provider: github
repository:
- context: github
- context: repo
Contributor

what would the context mean in this case?

Member Author

This is basically a mapping to a provider information call.

Contributor

If we're going to have different provider implementations, we might want to have a format similar to this:

context:
  provider: GitHub
  object: repo

Member Author

The provider is on line 8; I'd like the repository rules to be able to apply to multiple providers. This means that repo maps to the output of GetRepository from any provider.

Member Author

(actually, I think I'd prefer that the provider not be part of the policy at all, or that it was a property-match situation when we have that in the future)
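To make "repo maps to the output of GetRepository from any provider" concrete, here is a minimal Go sketch in which one property rule is checked against repositories that came from different providers. The `PropertyRule` type, the property names `private` and `default_branch`, and the `Repository` fields are hypothetical and chosen only for illustration; the actual policy schema in examples/github may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// Repository is a hypothetical provider-neutral result of GetRepository.
type Repository struct {
	Provider      string
	Name          string
	DefaultBranch string
	Private       bool
}

// PropertyRule is a hypothetical rule comparing one property to a value.
type PropertyRule struct {
	Property string
	Expected string
}

// lookup returns the string form of a named property, if it is known.
func lookup(r Repository, property string) (string, bool) {
	switch property {
	case "private":
		return strconv.FormatBool(r.Private), true
	case "default_branch":
		return r.DefaultBranch, true
	default:
		return "", false
	}
}

// Check evaluates the rule without looking at which provider produced r.
func (rule PropertyRule) Check(r Repository) error {
	got, ok := lookup(r, rule.Property)
	if !ok {
		return fmt.Errorf("unknown property %q", rule.Property)
	}
	if got != rule.Expected {
		return fmt.Errorf("%s: %s is %q, want %q", r.Name, rule.Property, got, rule.Expected)
	}
	return nil
}

func main() {
	rule := PropertyRule{Property: "default_branch", Expected: "main"}
	repos := []Repository{
		{Provider: "github", Name: "org/a", DefaultBranch: "main"},
		{Provider: "gitlab", Name: "org/b", DefaultBranch: "master"},
	}
	for _, r := range repos {
		if err := rule.Check(r); err != nil {
			fmt.Println("FAIL", err)
			continue
		}
		fmt.Println("PASS", r.Name)
	}
}
```

The rule never inspects `Provider`, which is the point being argued in the thread: the provider choice can live outside the repository rules.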

@JAORMX
Contributor

JAORMX commented Aug 23, 2023

Overall I can see the benefits of this proposal. It moves a lot of complexity out of Mediator and into each provider implementation. I understand where this is going in terms of transforming rule types into RPC implementations, and making policies apply to those instead. I also like that this would let us mock providers more easily.

I also like that this would move signaling away from mediator and into the providers themselves. We could have an RPC in mediator that a provider could call in case a signal is received and reconciliation of data is needed.

Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

For systems that are not fully relying on GitHub (e.g. using an alternative CI system) we could start doing checks that specific steps in their pipelines exist. This requires some dynamism if we want to cover more CI systems that may not have a proper API contract in place (e.g. Buildkite).

@evankanderson
Member Author

> Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

> For systems that are not fully relying on GitHub (e.g. using an alternative CI system) we could start doing checks that specific steps in their pipelines exist. This requires some dynamism if we want to cover more CI systems that may not have a proper API contract in place (e.g. Buildkite).

Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

@JAORMX
Contributor

JAORMX commented Aug 23, 2023

> That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

From previous experience, most of the time you'd want to check the contents of files. Comparisons would need to be done. Here's a use case:

As an organization I want to check that my teams have adopted my approved SAST tool and have it enabled in their pipelines.

@JAORMX
Contributor

JAORMX commented Aug 23, 2023

> Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

It isn't substantially different. I just wanted to give an example of a tool that we couldn't simply check with an API call as we could with github.

@jhrozek
Contributor

jhrozek commented Aug 23, 2023

> > Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

> That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

One example might be "make sure that the FROM images in your Dockerfile are identified using SHAs and not tags"

But I was actually wondering about this in the context of the vulnerability scanning work. There, we'd like to check if a file that contains the list of dependencies (e.g. package.json for JS) doesn't contain dependencies with known vulnerabilities. We can (and probably will..) start with everything just coded up in mediator, but it would be nice if the parsers for different languages (and more generally different files like the Dockerfile example above) could be developed outside the core mediator codebase.

Would it be too crazy to have the policy say just "check this file pattern with this plugin" where a plugin might be e.g. a WASM plugin that the policy would point to?

@evankanderson
Member Author

> > > Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

> > That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

> One example might be "make sure that the FROM images in your Dockerfile are identified using SHAs and not tags"
>
> But I was actually wondering about this in the context of the vulnerability scanning work. There, we'd like to check if a file that contains the list of dependencies (e.g. package.json for JS) doesn't contain dependencies with known vulnerabilities. We can (and probably will..) start with everything just coded up in mediator, but it would be nice if the parsers for different languages (and more generally different files like the Dockerfile example above) could be developed outside the core mediator codebase.
>
> Would it be too crazy to have the policy say just "check this file pattern with this plugin" where a plugin might be e.g. a WASM plugin that the policy would point to?

I'm wondering whether we want mediator to do this directly, or to ensure that a tool is present that does this, e.g. "ensure that dependabot OR renovate is set up", rather than "flag dependencies that need updates". Then our job is not to duplicate those tools, but to help guide people to getting them set up properly (the remediation would be to set up one of the tools, and we could eventually do that automatically / PR it into a repo).
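A check of the "ensure that dependabot OR renovate is set up" kind can be reduced to looking for a known configuration file. The Go sketch below assumes the caller already has the list of file paths in the repository (how that list is fetched is out of scope here). The path list is deliberately non-exhaustive, and the names `updateTool`, `knownUpdateTools`, and `detectUpdateTool` are hypothetical.

```go
package main

import "fmt"

// updateTool pairs a dependency-update tool with config paths that indicate
// it is set up. The lists are a non-exhaustive illustration.
type updateTool struct {
	name        string
	configPaths []string
}

var knownUpdateTools = []updateTool{
	{name: "dependabot", configPaths: []string{".github/dependabot.yml"}},
	{name: "renovate", configPaths: []string{
		"renovate.json",
		".github/renovate.json",
		".renovaterc",
		".renovaterc.json",
	}},
}

// detectUpdateTool reports the first known tool whose config file is present
// among repoFiles (paths relative to the repository root).
func detectUpdateTool(repoFiles []string) (string, bool) {
	present := make(map[string]struct{}, len(repoFiles))
	for _, f := range repoFiles {
		present[f] = struct{}{}
	}
	for _, tool := range knownUpdateTools {
		for _, path := range tool.configPaths {
			if _, ok := present[path]; ok {
				return tool.name, true
			}
		}
	}
	return "", false
}

func main() {
	files := []string{"README.md", "go.mod", "renovate.json"}
	if tool, ok := detectUpdateTool(files); ok {
		fmt.Println("dependency updates handled by", tool)
	} else {
		fmt.Println("no dependency update tool configured")
	}
}
```

A negative result is where the remediation described above would start, for example by proposing a configuration file for one of the tools.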

@evankanderson
Member Author

My bias is that "PR a file into a repo to remediate" is probably going to be tricky to align with a policy language unless we get esoteric. Being able to extract that into some external imperative code seems like it could be helpful, e.g. "to remediate, call X to do the remediation"... and then we need to work out the trust model for the credentials needed for the pull request. To start with, we could certainly just use a stacklok bot.

@evankanderson
Member Author

> > Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

> It isn't substantially different. I just wanted to give an example of a tool that we couldn't simply check with an API call as we could with github.

Actually, thinking about this, it might be even trickier for build tools which work across both GitHub and GitLab. Presumably, we'd want the architecture to look like:

Mediator --> Buildkite --> RepoProvider

Which suggests that the Buildkite build environment would need to link in some way to the repository configuration.

Even more fun is that you can have setups (like ourselves or sigstore) where the system is composed of multiple repositories, and one of them contains the Buildkite config, but that build process actually assembles the multiple repositories together to produce a single artifact.

@evankanderson
Member Author

I think what I'm saying is that we'll discover a really beautiful generalized architecture about 3.5 years after we get a bunch of users, at which point we'll all wail and gnash our teeth, and proclaim that "we could do it so much better the next time".


3 participants