Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial introduction of providers to the database #955

Merged
merged 12 commits into from
Sep 25, 2023
Merged

Conversation

JAORMX
Copy link
Contributor

@JAORMX JAORMX commented Sep 13, 2023

This simply replaces the usages of the github provider checks and instead
provides a simple implementation of providers that will be expanded upon.

The intent is to leave the same functionality as-is, but instead rely on a
provider that's installed on every organization by default.

I've left traces of how the implementation will evolve by adding a definition
and a providers_type column to the providers database. These will serve as the means
of instantiating providers dynamically, instead of always building the github one
as we do today. But, to keep things short, I've left that out of this PR.

Related-to: stacklok/epics#34

@JAORMX JAORMX changed the title Add initial introduction of providers to the database WIP: Add initial introduction of providers to the database Sep 13, 2023
@JAORMX JAORMX requested a review from jhrozek September 13, 2023 12:05
@JAORMX JAORMX force-pushed the providers-init branch 2 times, most recently from b61edbf to 86fad62 Compare September 13, 2023 12:11
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick comments, I need to wrap my head around where are you going with this. I /really/ like that we would get rid of referencing GitHub everywhere with a constant though!


-- Create default GitHub provider
INSERT INTO providers (name, group_id, definition)
VALUES ('github', 1, '{}');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly related to what this work is trying to accomplish, but shouldn't we create higher numbered migrations from this point on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of patches like this (work that still needs to be done) is why I was not too keen on freezing the initial migration and starting to work on higher numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from group IDs to projects will be equally disruptive.

database/migrations/000001_init.up.sql Outdated Show resolved Hide resolved
@JAORMX JAORMX force-pushed the providers-init branch 7 times, most recently from 18e5ea5 to 41e65b0 Compare September 14, 2023 08:03
@JAORMX JAORMX changed the title WIP: Add initial introduction of providers to the database Add initial introduction of providers to the database Sep 14, 2023
jhrozek
jhrozek previously approved these changes Sep 14, 2023
database/migrations/000001_init.up.sql Show resolved Hide resolved
database/migrations/000001_init.up.sql Outdated Show resolved Hide resolved
Comment on lines 107 to 108
provider_id UUID NOT NULL REFERENCES providers(id) ON DELETE CASCADE,
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If providers are nested under group_ids, do we need an explicit group_id on this table and the following ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this comment, there is a group I'd in this table and most of the ones I've changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my point -- if provider_id is unique by group, then if you know the provider_id, you know the corresponding group. My suggestion was to remove the group_id field here, since it was redundant with the provider. (Alternatively, since providers are unique by group_id and name, you could have a provider_name here along with the group_id and use that to index into the Providers table -- possibly even as a primary index rather than the UUID.)

I think part of this is that my brain is used to thinking about partitioning SQL data stores across shards, and I'm trying to latch onto those sharding keys as an organizing mechanism even though we don't need it for scale at this point.

Copy link
Contributor Author

@JAORMX JAORMX Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! I see the misunderstanding. No, there could be several providers in one group. The name is just unique.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be several providers in one group, but we want to maintain the constraint that <this row>.group_id == <this row>.provider_id.<providers row>.group_id, right? I'm not sure there's a good way to express that in SQL; I was suggesting that we could remove the <this row>.group_id, or use group_id and and provider_name to avoid possible future database inconsistencies, e.g.

CREATE TABLE provider_access_tokens (
    id SERIAL PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
    provider TEXT NOT NULL,
    owner_filter TEXT,
    encrypted_token TEXT NOT NULL,
    expiriation_time TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
    FOREIGN KEY (group_id, provider) REFERENCES providers(group_id, name) ON DELETE CASCADE
);

@@ -154,7 +168,7 @@ CREATE TABLE artifact_versions (

CREATE TABLE session_store (
id SERIAL PRIMARY KEY,
provider TEXT NOT NULL,
provider UUID NOT NULL REFERENCES providers(id) ON DELETE CASCADE,
grp_id INTEGER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this group_id as we'd generally conceive of it? (Maybe I shouldn't poke that sleeping bear...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which in the near future will have to change to projects

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I specifically meant the spelling was different than elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yeah, I don´t know why that is. We should probably change this in a separate PR.

database/query/providers.sql Outdated Show resolved Hide resolved
database/query/providers.sql Show resolved Hide resolved
internal/controlplane/handlers_authz.go Show resolved Hide resolved
@@ -32,13 +34,13 @@ import (
// instead of a concrete github client.
func BuildClient(
ctx context.Context,
prov string,
prov uuid.UUID,
groupID int32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need groupID here anymore.

@evankanderson
Copy link
Member

Do we have a guess as to how many more database drops we have coming up? Should we plan for weekly drops until the second week of October, or is this the last one we forsee?

@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 14, 2023

Two drops based on providers, one drop based.on group to projects, then one on IAM. That's what I foresee.

@evankanderson
Copy link
Member

Thanks for the forecast on database drops. A few clarifications, but I'm willing to drop my required change.

@evankanderson evankanderson dismissed their stale review September 15, 2023 04:30

Agreed on number of times we drop the database.

@JAORMX JAORMX force-pushed the providers-init branch 2 times, most recently from a5c215e to 53d5a0c Compare September 15, 2023 09:11
@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 15, 2023

I talked with @yrobla and ended up changing the logic for the RevokeOAuthTokens call to what it intended to be with this providers concept.

@JAORMX JAORMX force-pushed the providers-init branch 2 times, most recently from a1de46c to 7392ff8 Compare September 15, 2023 11:51
@evankanderson
Copy link
Member

(Let me know when you want another review)

@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 22, 2023

Now 😃

@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 22, 2023

@jhrozek could you also take a look at this one when you get a chance? Would really use as many eyes as possible here.

@jhrozek
Copy link
Contributor

jhrozek commented Sep 22, 2023

@jhrozek could you also take a look at this one when you get a chance? Would really use as many eyes as possible here.

Sure, thanks for the reminder!

jhrozek
jhrozek previously approved these changes Sep 24, 2023
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really find any issues during code review. I only left one comment.


-- Create default GitHub provider
INSERT INTO providers (name, group_id, implements, definition)
VALUES ('github', 1, ARRAY ['github', 'git', 'rest']::provider_type[], '{}');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question and something that can be fixed in a migration later on - why did you choose this particular set for GitHub? I understand github and git, but what would we use rest for here? Also, did you consider adding oci since ghcr.io is tied to GitHub and we already support it (although in kind of a tacky way) for artifacts?

Not saying this is wrong, I'm just trying to picture how would other providers look like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really see full-straight forward support for an oci provider from github. I can add it once I work on separating implementations from interfaces.

evankanderson
evankanderson previously approved these changes Sep 24, 2023
Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but LGTM

@@ -115,7 +132,7 @@ CREATE TABLE signing_keys (
CREATE TABLE repositories (
id SERIAL PRIMARY KEY,
provider TEXT NOT NULL,
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep this REFERENCES foreign key constraint? I suppose it's not strictly needed...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's part of the group_id + provider foreign key. Are you suggesting we re-add it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also a foreign key into the groups table, but we can rely on the transitive ON DELETE CASCADE from providers if we prefer.

database/migrations/000001_init.up.sql Outdated Show resolved Hide resolved
internal/controlplane/common.go Outdated Show resolved Hide resolved
@@ -316,20 +306,27 @@ func (s *Server) parseGithubEventForProcessing(
return nil
}

return s.parseRepoEvent(context.Background(), ghclient.Github, payload, msg)
return s.parseRepoEvent(context.Background(), payload, msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm -- the way we've done this right now, we don't pass pull_request or package type events through to s.parseRepoEvent. Not to fix here, but it feels like we should be using watermill's built-in fanout, as tested here:

https://github.com/stacklok/mediator/blob/main/internal/events/eventer_test.go#L88

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, we probably need to revisit this and reverify the logic. Right now what we do in each function is build a message with appropriate metadata for the event handler. Each branch will have a little more metadata needed, but I guess most of this could be refactored into pieces to be more readable.


// need to read all tokens from the provider and revoke them
tokens, err := s.store.GetAccessTokenByProvider(ctx, ghclient.Github)
// This is in case of a security breach, where we need to revoke all tokens
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this make more sense to execute as raw SQL if that happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you raise this, I think I would prefer this to be part of an administrator's playbook instead of a function an admin can do via the control plane. This merely documented what the feature was intended for and made it work with the newer logic. Let me raise a bug to remove this altogether in further iterations.

This does indeed remove the tokens from the DB, but it also calls the GH API to invalidate the token, so that part might need a nice utility. We could have it as a separate sub-command or a separate container an admin could leverage.

if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "cannot infer group id")
}
in.GroupId = group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You could put in.GroupId up in the assignment on line 263 if you want, since changes to the request message will be ignored when you return an error.

Comment on lines +88 to +91
if existingRepo.GroupID != groupId {
fmt.Println("got request to sync repository of different group. Skipping")
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, we should record a metric here, but feel free to hold off for a future PR.

internal/reconcilers/run_policy.go Outdated Show resolved Hide resolved
proto/mediator/v1/mediator.proto Outdated Show resolved Hide resolved
JAORMX and others added 11 commits September 25, 2023 08:32
This simply replaces the usages of the `github` provider checks and instead
provides a simple implementation of providers that will be expanded upon.

The intent is to leave the same functionality as-is, but instead rely on a
provider that's installed on every organization by default.

I've left traces of how the implementation will evolve by adding a definition
and a `providers_type` column to the providers database. These will serve as the means
of instantiating providers dynamically, instead of always building the github one
as we do today. But, to keep things short, I've left that out of this PR.
This removes the group information from entities in favor of
relying on providers for data integrity.
Co-Authored-By: Evan Anderson <evan@stacklok.com>
These prevent us from creating multiple objects which is not the desired outcome.
@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 25, 2023

Rebased and addressed some of the comments.

Copy link
Member

@rdimitrov rdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, but noticed that on some places you missed using the providerError (and used its code representation) when searching for a provider.

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most if not all comments have been addressed and the PR is now large enough that it makes sense to just keep on going with incremental PRs.

@@ -115,7 +132,7 @@ CREATE TABLE signing_keys (
CREATE TABLE repositories (
id SERIAL PRIMARY KEY,
provider TEXT NOT NULL,
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also a foreign key into the groups table, but we can rely on the transitive ON DELETE CASCADE from providers if we prefer.

@JAORMX JAORMX merged commit 625e1f8 into main Sep 25, 2023
13 checks passed
@JAORMX JAORMX deleted the providers-init branch September 25, 2023 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants