Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

authz/github: repo-centric perms sync from team/org perms caches #24328

Merged
merged 38 commits into from Aug 31, 2021

Conversation

bobheadxi
Copy link
Member

@bobheadxi bobheadxi commented Aug 25, 2021

Implements caching of groups permissions for repo-centric permissions sync. Follows up on the GitHub user-centric perms sync caching introduced in #23978 - see that PR for more information.

Repo sync now lists direct collaborators to a repo before querying for the organization and teams for users with access to this repo, and caching them as groups.

Group caches now includes both repositories and users. In general, the expectation is that:

  1. User-centric sync will use and populate only Repositories in the cache
  2. Repo-centric sync will use and populate only Users in the cache
  3. If either comes across a prepopulated value for a field it is not meant to populate, it will attempt to add itself to the list and update the cache

Token-wise, verified that all this needs is repo and read:org

To review

  • Bulk of real code is in authz/github/github.go
  • Some API requests might be worth reading in internal/extsvc/github/v3.go
  • Expanded integration tests in enterprise/cmd/repo-updater/internal/authz/integration_test.go
    • The recorded interactions here make it pretty visible how inefficient this is compared to just listing for users * repos < 5000 * 100 - in most smaller cases it is better to use the cacheless implementation. An argument for making this opt-in only.

Subsequent patches

(if we cut a patch release with this, we must include the following as well)

Comment on lines 22 to 31
// Repositories entities associated with this group has access to.
//
// This should ONLY be populated on a USER-centric sync, but may be appended to if
// already populated.
Repositories []extsvc.RepoID
// Users associated with this group
//
// This should ONLY be populated on a REPO-centric sync, but maybe to appended to if
// already populated.
Users []extsvc.AccountID
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. cache values might get very large
  2. there might be a better way to do this, but I'm opting to focus on getting the functionality right and improving the implementation in a separate pass

@bobheadxi bobheadxi requested a review from unknwon August 25, 2021 21:43
Comment on lines +88 to +89
group.Repositories = nil
group.Users = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing in-place modification within a method call for a passed-in object to abandon it feels too magic in the long term, especially when the modification is not doing cache invalidation (ie. we are not saving en empty object but delete the key entirely).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would improved docstring help, or rename this to invalidateGroup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just stop using the group object all the call sites? I mean cachedGroups is and should ideally only purely for cache management.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using group makes dealing with this a lot more convenient I think, and helps reduce a lot of repetition and things that can go wrong (e.g. forgetting to set Repos and Users or creating a new Group when invalidating a cache)

cachedGroups is and should ideally only purely for cache management.

What I'm thinking here is ensuring the cache values being used in the code is valid also falls under pure cache management 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's improve the docstring then, not sure how we could improve elsewhere top off my head.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does 471682e look?

internal/authz/github/client.go Show resolved Hide resolved
return userIDs, nil
}

// Check if repo belongs in an org
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also factor rest of the body into a helper method/function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

74b953d starts this, and also makes the code mirror the user perms syncing a bit more

@bobheadxi bobheadxi changed the title authz/github: repo perms syncing authz/github: improved repo-centric perms sync from teams/orgs perms caches Aug 26, 2021
@bobheadxi bobheadxi changed the title authz/github: improved repo-centric perms sync from teams/orgs perms caches authz/github: faster repo-centric perms sync from teams/orgs perms caches Aug 26, 2021
@bobheadxi bobheadxi changed the title authz/github: faster repo-centric perms sync from teams/orgs perms caches authz/github: faster repo-centric perms sync from team/org perms caches Aug 26, 2021
@bobheadxi bobheadxi force-pushed the authz/github-repo-perms-caching branch from 0dc737d to e4136b4 Compare August 26, 2021 18:58
@bobheadxi bobheadxi force-pushed the authz/github-repo-perms-caching branch from d9f2c29 to 69b10f2 Compare August 26, 2021 19:37
@bobheadxi bobheadxi marked this pull request as ready for review August 26, 2021 22:30
@bobheadxi bobheadxi requested a review from unknwon August 26, 2021 22:30
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Aug 26, 2021

Notifying subscribers in CODENOTIFY files for diff 82d260b...5af46f9.

Notify File(s)
@christinaforney doc/admin/external_service/github.md
doc/admin/repo/permissions.md
@eseliger internal/extsvc/github/common.go
internal/extsvc/github/common_test.go
internal/extsvc/github/testdata/golden/ListRepositoryTeams
internal/extsvc/github/testdata/vcr/GetOrganization.yaml
internal/extsvc/github/testdata/vcr/ListRepositoryCollaborators_direct-collaborator-outside-collaborator.yaml
internal/extsvc/github/testdata/vcr/ListRepositoryCollaborators_direct-collaborator-repo-owner.yaml
internal/extsvc/github/testdata/vcr/ListRepositoryCollaborators_private-repo.yaml
internal/extsvc/github/testdata/vcr/ListRepositoryCollaborators_public-repo.yaml
internal/extsvc/github/testdata/vcr/ListRepositoryTeams.yaml
internal/extsvc/github/testdata/vcr/TestListMembers/org_admins.yaml
internal/extsvc/github/testdata/vcr/TestListMembers/org_members.yaml
internal/extsvc/github/testdata/vcr/TestListMembers/team_members.yaml
internal/extsvc/github/v3.go
internal/extsvc/github/v3_test.go
@indradhanush enterprise/cmd/repo-updater/internal/authz/integration_test.go
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/Integration_GitHubPermissions.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/groups-enabled.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/no-groups.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/user-centric/groups-enabled.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/user-centric/no-groups.yaml
@sourcegraph/distribution doc/admin/external_service/github.md
doc/admin/repo/permissions.md
@unknwon enterprise/cmd/frontend/internal/auth/githuboauth/provider.go
enterprise/cmd/repo-updater/internal/authz/integration_test.go
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/Integration_GitHubPermissions.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/groups-enabled.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/no-groups.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/user-centric/groups-enabled.yaml
enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/user-centric/no-groups.yaml
internal/authz/github/cache.go
internal/authz/github/client.go
internal/authz/github/github.go
internal/authz/github/github_test.go

@bobheadxi bobheadxi changed the title authz/github: faster repo-centric perms sync from team/org perms caches authz/github: repo-centric perms sync from team/org perms caches Aug 26, 2021
@bobheadxi bobheadxi force-pushed the authz/github-repo-perms-caching branch from d127b34 to 68f4b3e Compare August 30, 2021 14:44
internal/authz/github/client.go Outdated Show resolved Hide resolved
// Just use cache if available and not invalidated
addUserToRepoPerms(group.Users...)
// Perform partial cache update to repositories iff non-empty
if len(group.Repositories) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this if is redundant, it is OK to loop over a slice with zero element.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change this around a bit, after testing (7a90087) I realized this wasn't behaving correctly. The code now looks like this:

// If this is a partial cache, add self to group
if len(group.Users) > 0 {
hasUser := false
for _, user := range group.Users {
if user == accountID {
hasUser = true
break
}
}
if !hasUser {
group.Users = append(group.Users, accountID)
p.groupsCache.setGroup(group)
}
}

Since the whole block is unnecessary if the field being checked for is empty, the check is still useful in denoting that

@bobheadxi bobheadxi requested a review from a team August 30, 2021 22:00
Co-authored-by: Geoffrey Gilmore <geoffrey@sourcegraph.com>
@bobheadxi bobheadxi merged commit fa91928 into main Aug 31, 2021
@bobheadxi bobheadxi deleted the authz/github-repo-perms-caching branch August 31, 2021 15:47
ggilmore added a commit that referenced this pull request Sep 1, 2021
)

Implements caching of groups permissions for repo-centric permissions sync. Follows up on the GitHub user-centric perms sync caching introduced in #23978. Repo sync now lists direct collaborators to a repo before querying for the organization and teams for users with access to this repo, and caching them as groups.

Also adds `allowGroupsPermissionsSync` to the GitHub OAuth provider config, which requests the additional `read:repo` scope required to enable `authorization.groupsCacheTTL` in GitHub code hosts.

Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>
Co-authored-by: Geoffrey Gilmore <geoffrey@sourcegraph.com>
@bobheadxi bobheadxi self-assigned this Sep 1, 2021
bobheadxi added a commit that referenced this pull request Sep 3, 2021
… enabled (#24561)

allowGroupsPermissionsSync, introduced in #24328, is actually a prerequisite to enabling groupsCacheTTL. This change checks if allowGroupsPermissionsSync is enabled, and if not, forcibly sets groupsCacheTTL to nil and reports a warning.

This is a breaking change, but it is currently flagged as an experimental feature and opt-in only, so will stick to a changelog item.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants