WIP: Providers as a fixed interface per type #725

evankanderson · 2023-08-22T19:21:08Z

Per discussion with @JAORMX earlier today, this is a sketch of what policy (examples/github), providers (pkg/providers/providers.go), and our interface with providers (providers.go again, plus proto/mediator/providers/providers.proto) would look like if the provider contract were a fixed set of RPCs rather than a chain of JSON API fetch rules.

It might be possible to implement some of these interfaces via chained-JSON-API-fetch behind the interface, but the key point is that we could extract RepoProvider, BuildProvider, etc. into gRPC services which we could call from Mediator. This would allow users to implement their own providers and contribute to Mediator without needing Stacklok to run every provider in-core (which reduces the amount of blocking review Stacklok needs to provide for e.g. "Apache Foundation's self-hosted git provider").

JAORMX · 2023-08-23T08:57:56Z

examples/github/policies/policy.yaml

@@ -7,41 +7,52 @@ context:
  group: Root Group
  provider: github
 repository:
-  - context: github
+  - context: repo


what would the context mean in this case?

This is basically a mapping to a provider information call.

If we're going to have different provider implementations, we might want to have a format similar to this:

context:
  provider: GitHub
  object: repo

The provider is on line 8; I'd like the repository rules to be able to apply to multiple providers. This means that repo maps to the output of GetRepository from any provider.

(actually, I think I'd prefer that the provider not be part of the policy at all, or that it was a property-match situation when we have that in the future)
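A rough sketch of the idea that a policy context such as `repo` maps to the same call on any provider. All names here (`Object`, `FetchFunc`, `Provider`, `Resolve`) are hypothetical and only illustrate the mapping; they are not taken from the PR.

```go
package main

import "fmt"

// Object is a hypothetical provider-neutral property bag.
type Object map[string]any

// FetchFunc retrieves one object (for example a repository) by identifier.
type FetchFunc func(id string) (Object, error)

// Provider maps context names used in policy ("repo", "build", ...) to calls.
type Provider struct {
	Name  string
	Calls map[string]FetchFunc
}

// Resolve looks up the call that backs a policy context on a given provider.
func Resolve(p Provider, contextName, id string) (Object, error) {
	call, ok := p.Calls[contextName]
	if !ok {
		return nil, fmt.Errorf("provider %q has no call for context %q", p.Name, contextName)
	}
	return call(id)
}

func main() {
	github := Provider{Name: "github", Calls: map[string]FetchFunc{
		"repo": func(id string) (Object, error) {
			return Object{"name": id, "default_branch": "main"}, nil
		},
	}}
	selfHosted := Provider{Name: "self-hosted-git", Calls: map[string]FetchFunc{
		"repo": func(id string) (Object, error) {
			return Object{"name": id, "default_branch": "trunk"}, nil
		},
	}}
	// The same "repo" context is evaluated against both providers.
	for _, p := range []Provider{github, selfHosted} {
		obj, err := Resolve(p, "repo", "example/project")
		fmt.Println(p.Name, obj["default_branch"], err)
	}
}
```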

JAORMX · 2023-08-23T11:04:17Z

Overall I can see the benefits of this proposal. It decouples a lot of the complexity outside of Mediator and stashes that into each provider implementation. I understand where this is going in terms of transforming rule types into RPC implementations, and making policies apply to those instead. I also like that this would allow us to mock providers in an easier way.
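To make the mocking point concrete, here is a hedged sketch: a rule evaluated against a hand-written mock instead of a live API. `GetBranchProtection`, `BranchProtection`, and `ReviewsRequired` are invented for this example and do not come from the PR.

```go
package main

import "fmt"

// BranchProtection is a hypothetical subset of branch protection settings.
type BranchProtection struct {
	RequireReviews bool
}

// RepoProvider is an assumed slice of the provider contract.
type RepoProvider interface {
	GetBranchProtection(repo, branch string) (BranchProtection, error)
}

// mockProvider returns canned data and records the calls it receives.
type mockProvider struct {
	protection BranchProtection
	err        error
	calls      []string
}

func (m *mockProvider) GetBranchProtection(repo, branch string) (BranchProtection, error) {
	m.calls = append(m.calls, repo+"@"+branch)
	return m.protection, m.err
}

// ReviewsRequired is an illustrative rule that depends only on the interface.
func ReviewsRequired(p RepoProvider, repo, branch string) (bool, error) {
	bp, err := p.GetBranchProtection(repo, branch)
	if err != nil {
		return false, err
	}
	return bp.RequireReviews, nil
}

func main() {
	mock := &mockProvider{protection: BranchProtection{RequireReviews: true}}
	ok, err := ReviewsRequired(mock, "example/project", "main")
	fmt.Println(ok, err, mock.calls)
}
```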

I also like that this would move signaling away from mediator and into the providers themselves. We could have an RPC in mediator that a provider could call in case a signal is received and reconciliation of data is needed.
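One possible shape for that callback, sketched under assumptions: `Reconciler`, `ReconcileRequest`, and `webhookProvider` are hypothetical names, and an in-memory queue stands in for what would be an RPC endpoint in mediator.

```go
package main

import "fmt"

// ReconcileRequest is a hypothetical message a provider sends to mediator.
type ReconcileRequest struct {
	Provider string
	Entity   string
	Reason   string
}

// Reconciler is the mediator-side endpoint (assumed; a gRPC service in practice).
type Reconciler interface {
	RequestReconcile(req ReconcileRequest) error
}

// queueReconciler records requests in memory for later processing.
type queueReconciler struct {
	pending []ReconcileRequest
}

func (q *queueReconciler) RequestReconcile(req ReconcileRequest) error {
	if req.Entity == "" {
		return fmt.Errorf("reconcile request from %q has no entity", req.Provider)
	}
	q.pending = append(q.pending, req)
	return nil
}

// webhookProvider is the provider side: it receives a signal (for example a
// webhook) and asks mediator to reconcile the affected entity.
type webhookProvider struct {
	name     string
	mediator Reconciler
}

func (w *webhookProvider) OnSignal(entity, event string) error {
	return w.mediator.RequestReconcile(ReconcileRequest{
		Provider: w.name,
		Entity:   entity,
		Reason:   event,
	})
}

func main() {
	q := &queueReconciler{}
	p := &webhookProvider{name: "github", mediator: q}
	if err := p.OnSignal("example/project", "branch_protection_changed"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(len(q.pending), "pending reconcile request(s)")
}
```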

Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

For systems that are not fully relying on GitHub (e.g. using an alternative CI system) we could start doing checks that specific steps in their pipelines exist. This requires some dynamism if we want to cover more CI systems that may not have a proper API contract in place (e.g. Buildkite).

evankanderson · 2023-08-23T14:05:07Z

> Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

> For systems that are not fully relying on GitHub (e.g. using an alternative CI system) we could start doing checks that specific steps in their pipelines exist. This requires some dynamism if we want to cover more CI systems that may not have a proper API contract in place (e.g. Buildkite).

Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

JAORMX · 2023-08-23T14:07:05Z

> That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

From previous experience, most of the time you'd want to check the contents of files. Comparisons would need to be done. Here's a use case:

As an organization I want to check that my teams have adopted my approved SAST tool and have it enabled in their pipelines.
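A minimal sketch of how a `file` rule covering both "file exists" and "file contains" might be evaluated for this use case. `FileRule`, `Evaluate`, the in-memory file map, and the action name `example/approved-sast-action` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// FileRule is a hypothetical rule on repository contents.
type FileRule struct {
	Path        string
	MustContain string // empty means that existence of the file is enough
}

// Evaluate checks the rule against a snapshot of files (path -> content).
func Evaluate(files map[string]string, rule FileRule) error {
	content, ok := files[rule.Path]
	if !ok {
		return fmt.Errorf("%s: file not found", rule.Path)
	}
	if rule.MustContain != "" && !strings.Contains(content, rule.MustContain) {
		return fmt.Errorf("%s: missing %q", rule.Path, rule.MustContain)
	}
	return nil
}

func main() {
	files := map[string]string{
		".github/workflows/ci.yml": "steps:\n  - uses: example/approved-sast-action@v1\n",
	}
	rule := FileRule{
		Path:        ".github/workflows/ci.yml",
		MustContain: "uses: example/approved-sast-action", // made-up action name
	}
	fmt.Println(Evaluate(files, rule))
}
```

A plain substring match is the simplest possible comparison; a real check would likely parse the pipeline file rather than search its text.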

JAORMX · 2023-08-23T14:08:18Z

> Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

It isn't substantially different. I just wanted to give an example of a tool that we couldn't simply check with an API call as we could with github.

jhrozek · 2023-08-23T15:55:43Z

> > Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

> That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

One example might be "make sure that the FROM images in your Dockerfile are identified using SHAs and not tags"
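As a sketch of that Dockerfile check: the function below reports `FROM` images that are not pinned with an `@sha256:` digest. `UnpinnedImages` is a hypothetical helper, and the parsing is deliberately simplified (no line continuations, no `ARG` substitution).

```go
package main

import (
	"fmt"
	"strings"
)

// UnpinnedImages returns the images in FROM instructions that lack a digest.
// References to earlier build stages and the special "scratch" image are skipped.
func UnpinnedImages(dockerfile string) []string {
	var unpinned []string
	stages := map[string]bool{}
	for _, line := range strings.Split(dockerfile, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || !strings.EqualFold(fields[0], "FROM") {
			continue
		}
		args := fields[1:]
		// Skip flags such as --platform=linux/amd64.
		for len(args) > 0 && strings.HasPrefix(args[0], "--") {
			args = args[1:]
		}
		if len(args) == 0 {
			continue
		}
		image := args[0]
		isStage := stages[strings.ToLower(image)]
		// Remember "AS <name>" so that later FROM lines can refer to this stage.
		if len(args) >= 3 && strings.EqualFold(args[1], "AS") {
			stages[strings.ToLower(args[2])] = true
		}
		if image == "scratch" || isStage {
			continue
		}
		if !strings.Contains(image, "@sha256:") {
			unpinned = append(unpinned, image)
		}
	}
	return unpinned
}

func main() {
	dockerfile := "FROM golang:1.21 AS builder\n" +
		"FROM --platform=linux/amd64 alpine@sha256:abc\n" +
		"FROM builder\n"
	fmt.Println(UnpinnedImages(dockerfile))
}
```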

But I was actually wondering about this in the context of the vulnerability scanning work. There, we'd like to check if a file that contains the list of dependencies (e.g. package.json for JS) doesn't contain dependencies with known vulnerabilities. We can (and probably will..) start with everything just coded up in mediator, but it would be nice if the parsers for different languages (and more generally different files like the Dockerfile example above) could be developed outside the core mediator codebase.

Would it be too crazy to have the policy say just "check this file pattern with this plugin" where a plugin might be e.g. a WASM plugin that the policy would point to?
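A hedged sketch of "check this file pattern with this plugin". An in-process Go interface stands in for the WASM plugin, which is not implemented here; `FilePlugin`, `FileRule`, `Run`, and `bannedText` are hypothetical names. Matching uses `path.Match` from the standard library against each file's base name.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"sort"
)

// FilePlugin is an assumed plugin contract: inspect one file, return findings.
type FilePlugin interface {
	Name() string
	Check(filePath string, content []byte) []string
}

// FileRule pairs a glob pattern with the plugin that should inspect matches.
type FileRule struct {
	Pattern string
	Plugin  FilePlugin
}

// Run applies the plugin to every file whose base name matches the pattern.
func Run(rule FileRule, files map[string][]byte) ([]string, error) {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic order of findings
	var findings []string
	for _, n := range names {
		ok, err := path.Match(rule.Pattern, path.Base(n))
		if err != nil {
			return nil, err
		}
		if ok {
			findings = append(findings, rule.Plugin.Check(n, files[n])...)
		}
	}
	return findings, nil
}

// bannedText is a toy plugin that flags files containing a given string.
type bannedText struct{ text string }

func (b bannedText) Name() string { return "banned-text" }

func (b bannedText) Check(filePath string, content []byte) []string {
	if bytes.Contains(content, []byte(b.text)) {
		return []string{filePath + ": contains " + b.text}
	}
	return nil
}

func main() {
	files := map[string][]byte{
		"Dockerfile":     []byte("FROM alpine:latest"),
		"docs/README.md": []byte("we used to pull :latest"),
	}
	rule := FileRule{Pattern: "Dockerfile*", Plugin: bannedText{text: ":latest"}}
	fmt.Println(Run(rule, files))
}
```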

evankanderson · 2023-08-23T17:13:50Z

> > > Sooner rather than later, we'll need to evaluate policies on git repository contents (e.g. checking specific files). I can see a way to do it in the current model by creating a new upstream retrieval type. How do you envision this working on this contractual model?

> > That's a good question -- I could imagine a different type of rule than property that was something like file, but I'm also wondering what the checks on the file contents look like -- is it "file exists", "file is of type X with at least contents Y", ??

> One example might be "make sure that the FROM images in your Dockerfile are identified using SHAs and not tags"

> But I was actually wondering about this in the context of the vulnerability scanning work. There, we'd like to check if a file that contains the list of dependencies (e.g. package.json for JS) doesn't contain dependencies with known vulnerabilities. We can (and probably will..) start with everything just coded up in mediator, but it would be nice if the parsers for different languages (and more generally different files like the Dockerfile example above) could be developed outside the core mediator codebase.

> Would it be too crazy to have the policy say just "check this file pattern with this plugin" where a plugin might be e.g. a WASM plugin that the policy would point to?

I'm wondering whether we want mediator to do this directly, or to ensure that a tool is present that does this, e.g. "ensure that dependabot OR renovate is set up", rather than "flag dependencies that need updates". Then our job is not to duplicate those tools, but to help guide people to getting them set up properly (the remediation would be to set up one of the tools, and we could eventually do that automatically / PR it into a repo).
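The "ensure that dependabot OR renovate is set up" variant could be as small as a presence check on known config paths. This is a sketch under assumptions: `FindUpdateTool` and the `exists` callback are hypothetical, and the path list is a non-exhaustive subset of the locations those tools read.

```go
package main

import "fmt"

// tool lists some config file locations for a dependency update tool.
type tool struct {
	Name  string
	Paths []string
}

// updateTools is intentionally non-exhaustive.
var updateTools = []tool{
	{Name: "dependabot", Paths: []string{".github/dependabot.yml"}},
	{Name: "renovate", Paths: []string{"renovate.json", ".github/renovate.json", ".renovaterc"}},
}

// FindUpdateTool reports the first tool whose config file exists in the repo.
// exists is supplied by the caller, for example backed by a provider call.
func FindUpdateTool(exists func(path string) bool) (string, bool) {
	for _, t := range updateTools {
		for _, p := range t.Paths {
			if exists(p) {
				return t.Name, true
			}
		}
	}
	return "", false
}

func main() {
	repoFiles := map[string]bool{"renovate.json": true, "go.mod": true}
	name, ok := FindUpdateTool(func(p string) bool { return repoFiles[p] })
	fmt.Println(name, ok)
}
```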

evankanderson · 2023-08-23T17:16:00Z

My bias is that "PR a file into a repo to remediate" is probably going to be tricky to align with a policy language unless we get esoteric. Being able to extract that into some external imperative code seems like it could be helpful, e.g. "to remediate, call X to do the remediation"... and then we need to work out the trust model for the credentials needed for the pull request. To start with, we could certainly just use a stacklok bot.
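One way to picture "to remediate, call X": the policy names a remediation, and imperative code registered under that name does the work with credentials passed in explicitly. `Credential`, `Remediation`, `Registry`, and the remediation name `add-dependabot-config` are all hypothetical; no pull request is actually opened.

```go
package main

import "fmt"

// Credential identifies who the remediation acts as (for example a bot).
type Credential struct {
	Actor string
	Token string
}

// Remediation is external imperative code; it returns a summary of what it did.
type Remediation func(repo string, cred Credential) (string, error)

// Registry maps the remediation name referenced by a policy to its code.
type Registry map[string]Remediation

// Run looks up a remediation by name and refuses to run without a credential.
func (r Registry) Run(name, repo string, cred Credential) (string, error) {
	fn, ok := r[name]
	if !ok {
		return "", fmt.Errorf("unknown remediation %q", name)
	}
	if cred.Token == "" {
		return "", fmt.Errorf("remediation %q needs a credential", name)
	}
	return fn(repo, cred)
}

func main() {
	reg := Registry{
		"add-dependabot-config": func(repo string, cred Credential) (string, error) {
			// A real implementation would open a pull request here.
			return fmt.Sprintf("%s would propose .github/dependabot.yml to %s", cred.Actor, repo), nil
		},
	}
	bot := Credential{Actor: "example-bot", Token: "placeholder"}
	fmt.Println(reg.Run("add-dependabot-config", "example/project", bot))
}
```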

evankanderson · 2023-08-23T17:24:49Z

> > Is Buildkite substantially different in that respect than GitHub Actions or CircleCI?

> It isn't substantially different. I just wanted to give an example of a tool that we couldn't simply check with an API call as we could with github.

Actually, thinking about this, it might be even trickier for build tools which work across both GitHub and GitLab. Presumably, we'd want the architecture to look like:

Mediator --> Buildkite --> RepoProvider

Which suggests that the Buildkite build environment would need to link in some way to the repository configuration.

Even more fun is that you can have setups (like ourselves or sigstore) where the system is composed of multiple repositories, and one of them contains the Buildkite config, but that build process actually assembles the multiple repositories together to produce a single artifact.
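A sketch of the Mediator --> Buildkite --> RepoProvider layering, assuming the build provider is handed one repo provider per host and a pipeline may pull sources from several repositories. `RepoRef`, `Pipeline`, `BuildProvider`, `PipelineConfig`, and `memRepo` are hypothetical names; `.buildkite/pipeline.yml` is used as the example config path.

```go
package main

import "fmt"

// RepoProvider is an assumed slice of the repository contract.
type RepoProvider interface {
	GetFile(repo, path string) (string, error)
}

// RepoRef names a repository on a specific host (GitHub, GitLab, ...).
type RepoRef struct {
	Host string
	Name string
}

// Pipeline links a build to the repo holding its config and to its sources.
type Pipeline struct {
	ConfigRepo RepoRef
	ConfigPath string
	Sources    []RepoRef // repositories assembled into a single artifact
}

// BuildProvider reaches repository data through per-host repo providers.
type BuildProvider struct {
	repos map[string]RepoProvider
}

// PipelineConfig fetches the pipeline definition via the matching repo provider.
func (b *BuildProvider) PipelineConfig(p Pipeline) (string, error) {
	rp, ok := b.repos[p.ConfigRepo.Host]
	if !ok {
		return "", fmt.Errorf("no repo provider registered for host %q", p.ConfigRepo.Host)
	}
	return rp.GetFile(p.ConfigRepo.Name, p.ConfigPath)
}

// memRepo is an in-memory RepoProvider keyed by "repo:path".
type memRepo map[string]string

func (m memRepo) GetFile(repo, path string) (string, error) {
	content, ok := m[repo+":"+path]
	if !ok {
		return "", fmt.Errorf("%s: %s not found", repo, path)
	}
	return content, nil
}

func main() {
	build := &BuildProvider{repos: map[string]RepoProvider{
		"github.com": memRepo{"example/infra:.buildkite/pipeline.yml": "steps: []"},
	}}
	p := Pipeline{
		ConfigRepo: RepoRef{Host: "github.com", Name: "example/infra"},
		ConfigPath: ".buildkite/pipeline.yml",
		Sources:    []RepoRef{{Host: "github.com", Name: "example/app"}, {Host: "gitlab.com", Name: "example/lib"}},
	}
	fmt.Println(build.PipelineConfig(p))
}
```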

evankanderson · 2023-08-23T17:25:40Z

I think what I'm saying is that we'll discover a really beautiful generalized architecture about 3.5 years after we get a bunch of users, at which point we'll all wail and gnash our teeth, and proclaim that "we could do it so much better the next time".

evankanderson added 2 commits August 22, 2023 12:15

Checkpoint for Ozz to noodle on

c1b41ed

Forgot to save providers.go buffer

dc82825

JAORMX marked this pull request as ready for review August 22, 2023 19:24

JAORMX reviewed Aug 23, 2023

teodor-yanev mentioned this pull request Sep 25, 2023

Support Go dependency scanning for pull requests #1012

Merged

evankanderson closed this Oct 26, 2023
