feat(bulk-import): add support for on-hehalf-of user access#2647
feat(bulk-import): add support for on-hehalf-of user access#2647PatAKnight merged 11 commits intoredhat-developer:mainfrom
Conversation
Review Summary by QodoImplement "On Behalf of User Access" for Bulk Import plugin with OAuth token forwarding
WalkthroughsDescription• Implements "On Behalf of User Access" feature enabling the Bulk Import plugin to fetch repository and organization listings using signed-in user's OAuth tokens instead of server-wide credentials • **Backend changes:** - New /api/bulk-import/scm-hosts endpoint to discover configured GitHub and GitLab integration hosts - Added x-scm-tokens header support for forwarding user OAuth tokens to backend API calls - Implemented parseScmTokensHeader and extractUserTokens functions with 4KB size limit validation - Created unified GitApiService interface for consistent GitHub and GitLab provider abstraction - Updated GitHub and GitLab services to accept optional userTokens parameter and create user-scoped API instances without cache to prevent cross-user data leakage - Refactored repository, organization, and import handlers to use GitApiService interface - Extracted shared SCM types to unified module for code reuse • **Frontend changes:** - Implemented useAsync hook in useRepositories to fetch SCM hosts and collect user OAuth tokens via ScmAuthApi - Updated API client to include X-SCM-Tokens header when user tokens are available - Added getSCMHosts() method to BulkImportAPI type - Refactored path providers to extend BulkImportRESTPathProviderBase with proper URL encoding using URLSearchParams • **Graceful fallback:** When OAuth tokens are unavailable or providers not configured, backend silently falls back to server-side credentials with no errors • **Comprehensive test coverage:** Added tests for SCM host discovery, token collection, header validation, and user token paths in both GitHub and GitLab services • **Full documentation:** Updated README files, OpenAPI schemas, and API documentation for both backend and frontend Diagramflowchart LR
User["User Signs In"]
Frontend["Frontend Plugin"]
ScmAuth["ScmAuthApi"]
Backend["Backend Router"]
GithubSvc["GitHub Service"]
GitlabSvc["GitLab Service"]
GithubAPI["GitHub API"]
GitlabAPI["GitLab API"]
User -->|Authenticates| Frontend
Frontend -->|Discovers Hosts| Backend
Backend -->|Returns SCM Hosts| Frontend
Frontend -->|Collects Tokens| ScmAuth
ScmAuth -->|User OAuth Tokens| Frontend
Frontend -->|Forwards Tokens<br/>x-scm-tokens Header| Backend
Backend -->|Routes to Services| GithubSvc
Backend -->|Routes to Services| GitlabSvc
GithubSvc -->|User-Scoped Request| GithubAPI
GitlabSvc -->|User-Scoped Request| GitlabAPI
GithubAPI -->|User Repositories| GithubSvc
GitlabAPI -->|User Repositories| GitlabSvc
GithubSvc -->|Results| Backend
GitlabSvc -->|Results| Backend
Backend -->|Filtered Results| Frontend
File Changes1. workspaces/bulk-import/plugins/bulk-import-backend/src/generated/openapi.d.ts
|
Code Review by Qodo
1. Repo list falls back server
|
| // When a user token is present, build a user-scoped Octokit per integration | ||
| // config and call addGithubTokenRepositories with it. This makes | ||
| // listForAuthenticatedUser return only repos the user personally has access to. | ||
| // When absent, fall through to the existing server-credential path. | ||
| if (userTokens && Object.keys(userTokens).length > 0) { | ||
| const { errors: allErrors, totalCount } = | ||
| await this.fetchUserTokenGithubIntegrationRepositories( | ||
| userTokens, | ||
| (userOctokit, userCredential, dataFetchErrors) => | ||
| addGithubTokenRepositories( | ||
| { logger: this.logger }, | ||
| userOctokit, | ||
| userCredential, | ||
| repositories, | ||
| dataFetchErrors, | ||
| { search, pageNumber, pageSize }, | ||
| ), | ||
| ); | ||
| const repoList = Array.from(repositories.values()); | ||
| return { repositories: repoList, errors: allErrors, totalCount }; | ||
| } | ||
|
|
||
| const result = await fetchFromAllIntegrations( | ||
| { | ||
| logger: this.logger, |
There was a problem hiding this comment.
1. Repo list falls back server 📎 Requirement gap ⛨ Security
When no X-SCM-Tokens header is provided (e.g., ScmAuthApi not registered), the frontend omits user tokens and the backend falls through to server-wide integration credentials for repository listing. This can return repositories beyond what the currently logged-in user can access, violating the user-scoped listing requirement.
Agent Prompt
## Issue description
Repository listing can fall back to server-wide integration credentials when `X-SCM-Tokens` is absent/empty, which can expose repositories outside the current user's access.
## Issue Context
Compliance requires the Bulk Import GitHub repo list to be scoped to the currently logged-in user. Today, user tokens are optional and the backend falls back to server credentials when tokens are unavailable.
## Fix Focus Areas
- workspaces/bulk-import/plugins/bulk-import/src/hooks/useRepositories.ts[73-106]
- workspaces/bulk-import/plugins/bulk-import/src/api/BulkImportBackendClient.ts[118-137]
- workspaces/bulk-import/plugins/bulk-import-backend/src/github/githubApiService.ts[373-409]
- workspaces/bulk-import/plugins/bulk-import-backend/src/service/router.ts[138-176]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| - **`scm-hosts-read`**: tracks `GET` requests to the `/scm-hosts` endpoint, which returns the list of configured GitHub and GitLab integration host URLs. | ||
|
|
||
| - **`org-read`**: tracks `GET` requests to the `/organizations` endpoint, which returns the list of organizations accessible from all configured SCM Integrations (GitHub and GitLab). |
There was a problem hiding this comment.
2. /scm-hosts audit event mismatch 📎 Requirement gap ⚙ Maintainability
Backend documentation says /scm-hosts emits scm-hosts-read, but the router actually emits the event id scm-read. This makes the documentation inaccurate for the new behavior.
Agent Prompt
## Issue description
The README documents the `/scm-hosts` audit event id as `scm-hosts-read`, but the implementation emits `scm-read`.
## Issue Context
This PR adds the `/scm-hosts` endpoint and updates documentation; the documentation must accurately reflect emitted audit events.
## Fix Focus Areas
- workspaces/bulk-import/plugins/bulk-import-backend/README.md[303-306]
- workspaces/bulk-import/plugins/bulk-import-backend/src/service/router.ts[805-807]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| function extractUserTokens( | ||
| headers: | ||
| | Paths.FindAllRepositories.HeaderParameters | ||
| | Paths.FindRepositoriesByOrganization.HeaderParameters, | ||
| logger: LoggerService, | ||
| ): Record<string, string> | undefined { | ||
| const raw = headers['x-scm-tokens'] as string | undefined; | ||
| delete headers['x-scm-tokens']; // consumed; strip before any logging path | ||
| return parseScmTokensHeader(raw, logger); | ||
| } |
There was a problem hiding this comment.
3. Tokens logged in audits 🐞 Bug ⛨ Security
Audit events are created with the full Express request object, so the new x-scm-tokens header (containing user OAuth tokens) can be persisted in audit logs/events. The header is only deleted from a copied header object, leaving req.headers unchanged when auditing occurs.
Agent Prompt
### Issue description
`x-scm-tokens` contains user OAuth access tokens but is currently included in the Express `req` that is passed into `auditCreateEvent`, which can persist credentials in audit logs.
### Issue Context
The current code deletes `x-scm-tokens` only from a copied headers object (not `req.headers`) and does so inside specific handlers, after the router-level audit middleware already created the audit event.
### Fix Focus Areas
- workspaces/bulk-import/plugins/bulk-import-backend/src/service/router.ts[131-176]
- workspaces/bulk-import/plugins/bulk-import-backend/src/service/router.ts[752-787]
### Implementation guidance
- Parse and validate `x-scm-tokens` in an early middleware that runs **before** `permissionCheck` and before the audit middleware.
- After extracting, immediately `delete req.headers['x-scm-tokens']` (and also handle the case where it may be `string[]`).
- Store the parsed token map in a safe per-request location (e.g., `res.locals.scmTokens` or a typed request extension) and use that in the repository/org-repo handlers instead of reading from headers later.
- Ensure audit events never contain the raw token string.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| xSCMTokensHeaderParam: | ||
| in: header | ||
| name: x-scm-tokens | ||
| description: > | ||
| Optional JSON-encoded map of SCM host URL to user authentication token. | ||
| Used to fetch repositories on behalf of the user for each configured SCM host. | ||
| required: false | ||
| schema: | ||
| $ref: '#/components/schemas/SCMTokenMap' |
There was a problem hiding this comment.
4. Openapi header type mismatch 🐞 Bug ✓ Correctness
The OpenAPI spec defines x-scm-tokens as an object schema (SCMTokenMap), but the implementation expects a JSON-encoded string header and JSON.parse’s it. With OpenAPIBackend validation enabled, sending a string header where the spec expects an object can cause validationFail (400) and break on-behalf-of requests.
Agent Prompt
### Issue description
The OpenAPI schema models `x-scm-tokens` as an `object` (`SCMTokenMap`), but HTTP headers are strings and the server expects a JSON-encoded string and parses it with `JSON.parse`. With OpenAPIBackend request validation enabled, this schema mismatch can cause 400 validation failures when the header is present.
### Issue Context
The backend explicitly parses the header as JSON text; the schema should reflect that wire format.
### Fix Focus Areas
- workspaces/bulk-import/plugins/bulk-import-backend/src/schema/openapi.yaml[613-621]
- workspaces/bulk-import/plugins/bulk-import-backend/src/schema/openapi.yaml[716-731]
- workspaces/bulk-import/plugins/bulk-import-backend/src/generated/openapidocument.ts[1031-1039]
- workspaces/bulk-import/plugins/bulk-import-backend/src/generated/openapi.d.ts[30-41]
### Implementation guidance
- Change `xSCMTokensHeaderParam.schema` to `type: string` (and update description/example to indicate it is JSON).
- Optionally keep `SCMTokenMap` schema for documentation purposes, but do not reference it as a header parameter schema unless you also implement OpenAPI object header serialization.
- Regenerate `openapidocument.ts` and `openapi.d.ts` so generated types match the actual string header.
- Keep server-side JSON parsing/validation as-is (or move it earlier per the audit fix).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const tokenGeneration = useMemo( | ||
| () => | ||
| scmAuthTokens | ||
| ? Object.entries(scmAuthTokens) | ||
| .sort(([a], [b]) => a.localeCompare(b)) | ||
| .map(([k, v]) => `${k}=${v}`) | ||
| .join(',') | ||
| : '', | ||
| [scmAuthTokens], | ||
| ); | ||
|
|
||
| const { | ||
| data: value, | ||
| error, | ||
| isLoading: isQueryLoading, | ||
| } = useQuery( | ||
| [options?.showOrganizations ? 'organizations' : 'repositories', options], | ||
| [ | ||
| options?.showOrganizations ? 'organizations' : 'repositories', | ||
| options, | ||
| tokenGeneration, | ||
| ], |
There was a problem hiding this comment.
5. Token in query key 🐞 Bug ⛨ Security
The frontend includes raw OAuth token values in the React Query key, which can surface in React Query devtools and other debugging/telemetry that logs query keys. This increases the risk of accidental credential exposure in the browser.
Agent Prompt
### Issue description
OAuth token values are embedded in the React Query key (`tokenGeneration`), which can be exposed via devtools/debugging/logging of query keys.
### Issue Context
The query key only needs to change when the *effective auth context* changes; it should not contain secrets.
### Fix Focus Areas
- workspaces/bulk-import/plugins/bulk-import/src/hooks/useRepositories.ts[116-143]
### Implementation guidance
- Replace `tokenGeneration` with a non-secret discriminator, e.g.:
- list of host URLs only: `Object.keys(scmAuthTokens ?? {}).sort().join(',')`
- or a stable boolean/enum: `hasScmAuthTokens` plus `approvalTool`
- or a one-way hash that does not include raw tokens (if you truly need to detect token changes)
- Ensure the query still refetches appropriately when tokens become available (you already use `enabled: !tokenLoading`).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
fafcf3c to
32a88b2
Compare
|
Important This PR includes changes that affect public-facing API. Please ensure you are adding/updating documentation for new features or behavior. Changed Packages
|
1ca662a to
534b652
Compare
AndrienkoAleksandr
left a comment
There was a problem hiding this comment.
Tested with "scaffolder", "orchestrator" and "open-pull-requests". It works.
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
Signed-off-by: Patrick Knight <pknight@redhat.com>
…tActionPath Signed-off-by: Patrick Knight <pknight@redhat.com>
|



Hey, I just made a Pull Request!
Bulk Import now loads repository and organization listings on behalf of the signed-in user. Listing calls use the user’s OAuth token for each configured source code management (SCM) host (GitHub and GitLab), so results match that user’s access instead of the server-wide integration identity (GitHub App, PAT, or GitLab token).
Listing is a hard requirement on user OAuth:
GET /repositoriesandGET /organizations/{org}/repositoriesrequire a validX-SCM-Tokensheader (see below). There is no fallback to integration credentials for those endpoints—requests without a usable token map are rejected with HTTP 401. Deployments that previously relied on integration-only listing must configure SCM auth providers and registerScmAuthApi(see plugin READMEs).How it works
Discover SCM hosts — The frontend calls
GET /api/bulk-import/scm-hosts, which returns the configured GitHub and GitLab integration base URLs.Collect user tokens — For each host, the frontend uses Backstage’s
ScmAuthApi(@backstage/integration-react) to request the signed-in user’s OAuth token for that host (read-oriented scope).Forward tokens to the backend — Tokens are sent as a JSON map keyed by integration base URL (e.g.
{ "https://github.com": "…" }) in theX-SCM-Tokensrequest header on listing calls. The header is required for the repository listing endpoints; invalid JSON, an empty map, or an oversized header results in 401.Backend calls GitHub/GitLab as the user — The backend resolves the token per host and uses it when calling the GitHub or GitLab API so listings are scoped to the user.
Security / audit — The
X-SCM-Tokensheader is removed from the request before permission checks and audit event creation so token values are not written to audit logs.Which issue(s) does this PR fix
✔️ Checklist
Documentation
plugins/bulk-import/README.md(including auth provider setup).plugins/bulk-import-backend/README.mdand generated API docs underplugins/bulk-import-backend/api-docs/.plugins/bulk-import-backend/src/schema/openapi.yaml.How to test
/scm-hosts— CallGET /api/bulk-import/scm-hosts(curl or Swagger). Confirm configured GitHub and, if applicable, GitLab hosts appear in the response.Happy path (GitHub or GitLab) — Configure the matching SCM OAuth provider and sign in. Open Bulk Import → repository selection. Confirm the list reflects repos the user can access (including cases where that differs from what the server integration alone could see).
GitLab parity — With GitLab integration + GitLab auth configured, repeat listing flows and confirm GitLab responses use the user token the same way as GitHub.
Hard requirement — Omit
X-SCM-Tokensor send an empty object on a listing request (e.g. direct API call); expect 401. With the UI, misconfigured or missingScmAuthApi/ OAuth should surface a clear error instead of silently using integration credentials for listing.Cross-user isolation (optional) — Sign in as two users with different SCM access; each listing should only include repositories that user is authorized to see.