OCPBUGS-82140: Remove PII from events#16365
OCPBUGS-82140: Remove PII from events#16365openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
@logonoff: This pull request references Jira Issue OCPBUGS-82140, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
abafca2 to
66ac767
Compare
📝 WalkthroughWalkthroughThis pull request refactors the telemetry system to improve data privacy and payload shape consistency. The changes replace the 🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
66ac767 to
eaa633f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
frontend/packages/console-shared/src/hooks/useTelemetry.ts (1)
39-42: Harden email-domain extraction normalization.At Line 40-41, consider using the last
@, trim, and lowercase to avoid malformed/variant domains creating inconsistent telemetry values.Optional refinement
const getEmailDomainFromEmail = (email: string): string => { - const emailParts = email.split('@'); - return emailParts.length > 1 ? emailParts[1] : ''; + const atIndex = email.lastIndexOf('@'); + return atIndex >= 0 ? email.slice(atIndex + 1).trim().toLowerCase() : ''; };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/packages/console-shared/src/hooks/useTelemetry.ts` around lines 39 - 42, The getEmailDomainFromEmail function can return inconsistent domains for malformed emails; update it to locate the last '@' (use lastIndexOf) and, if found, take the substring after that character, then trim() and toLowerCase() the result, returning an empty string for missing or blank inputs; reference getEmailDomainFromEmail to implement this normalization so telemetry always receives a consistent, lowercased domain value.frontend/packages/console-shared/src/hooks/__tests__/useTelemetry.spec.ts (1)
205-226: Add a DEVSANDBOX edge-case test for missing annotation.Please add a case where
toolchain.dev.openshift.com/sso-user-idis absent, and assert behavior stays valid (no invalid identifier propagation). This protects the newsandboxUserIdcontract end-to-end.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/packages/console-shared/src/hooks/__tests__/useTelemetry.spec.ts` around lines 205 - 226, Add a new test in useTelemetry.spec.ts that mirrors the existing DEVSANDBOX test but sets the cluster annotations so the key "toolchain.dev.openshift.com/sso-user-id" is absent (e.g., telemetry DEVSANDBOX = 'true' and annotations = {} or omit the sso-user-id), call updateClusterPropertiesFromTests(), renderHook(() => useTelemetry()), fire the telemetry event (fireTelemetryEvent('...')), and assert listener was called once and the payload does NOT include sandboxUserId (i.e., expect(listener).toHaveBeenCalledWith('test', { ...exampleReturnValue, clusterType: 'DEVSANDBOX', consoleVersion: 'x.y.z' }) without sandboxUserId) to ensure no invalid identifier is propagated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@frontend/packages/console-shared/src/hooks/useTelemetry.ts`:
- Around line 55-57: clusterProperties.accountMailDomain is set from
window.SERVER_FLAGS.telemetry.ACCOUNT_EMAIL but during operator rollout the key
may be ACCOUNT_MAIL; update the assignment in useTelemetry.ts to fall back to
telemetry.ACCOUNT_MAIL when telemetry.ACCOUNT_EMAIL is missing or empty so
getEmailDomainFromEmail always receives a string (e.g., use
telemetry.ACCOUNT_EMAIL || telemetry.ACCOUNT_MAIL || ''), ensuring
getEmailDomainFromEmail and clusterProperties.accountMailDomain remain
backward-compatible during the operator transition.
In `@frontend/packages/console-telemetry-plugin/src/listeners/segment.ts`:
- Around line 67-71: The DEVSANDBOX branch sets processedUserId directly to the
optional sandboxUserId which can be empty; update the branch so processedUserId
uses sandboxUserId when present otherwise falls back to the anonymized user id
by calling anonymizeId(userId). Locate the logic around getClusterProperties(),
processedUserId, and sandboxUserId and ensure analytics.identify receives a
non-empty id by defaulting to await anonymizeId(userId) when sandboxUserId is
falsy.
---
Nitpick comments:
In `@frontend/packages/console-shared/src/hooks/__tests__/useTelemetry.spec.ts`:
- Around line 205-226: Add a new test in useTelemetry.spec.ts that mirrors the
existing DEVSANDBOX test but sets the cluster annotations so the key
"toolchain.dev.openshift.com/sso-user-id" is absent (e.g., telemetry DEVSANDBOX
= 'true' and annotations = {} or omit the sso-user-id), call
updateClusterPropertiesFromTests(), renderHook(() => useTelemetry()), fire the
telemetry event (fireTelemetryEvent('...')), and assert listener was called once
and the payload does NOT include sandboxUserId (i.e.,
expect(listener).toHaveBeenCalledWith('test', { ...exampleReturnValue,
clusterType: 'DEVSANDBOX', consoleVersion: 'x.y.z' }) without sandboxUserId) to
ensure no invalid identifier is propagated.
In `@frontend/packages/console-shared/src/hooks/useTelemetry.ts`:
- Around line 39-42: The getEmailDomainFromEmail function can return
inconsistent domains for malformed emails; update it to locate the last '@' (use
lastIndexOf) and, if found, take the substring after that character, then trim()
and toLowerCase() the result, returning an empty string for missing or blank
inputs; reference getEmailDomainFromEmail to implement this normalization so
telemetry always receives a consistent, lowercased domain value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d070b91d-69fa-433a-9e43-062457610533
📒 Files selected for processing (3)
frontend/packages/console-shared/src/hooks/__tests__/useTelemetry.spec.tsfrontend/packages/console-shared/src/hooks/useTelemetry.tsfrontend/packages/console-telemetry-plugin/src/listeners/segment.ts
📜 Review details
🧰 Additional context used
🔀 Multi-repo context openshift/console-operator
[::openshift/console-operator::pkg/console/operator/operator.go:104] — operator struct defines accountMail field accountMail string.
[::openshift/console-operator::pkg/console/operator/sync_v400.go:474-481] — telemetry flow: calls telemetry.GetOrganizationMeta(...) returning organizationID, accountMail, refreshCache and sets co.trackables.accountMail = accountMail and telemetryConfig["ACCOUNT_MAIL"] = accountMail.
Implication: this operator produces telemetry config using key "ACCOUNT_MAIL" and the full email value. The PR changes in frontend replace accountMail with accountMailDomain and expect anonymized domain-derived value (and server flag might be named ACCOUNT_EMAIL), and remove sending full user resource. The operator code above is a direct producer of the "ACCOUNT_MAIL" value and therefore is a relevant cross-repo consumer/producer to verify for compatibility with the PR.
🔇 Additional comments (3)
frontend/packages/console-telemetry-plugin/src/listeners/segment.ts (1)
48-48: Good payload minimization in identify flow.Line 48 cleanly drops
userResourcefrom listener properties and keeps onlysandboxUserId, which aligns with the PII-reduction objective.frontend/packages/console-shared/src/hooks/__tests__/useTelemetry.spec.ts (1)
352-378: Strong privacy regression test coverage.This test correctly verifies
accountMailDomainand explicitly asserts thatuser,userResource, andaccountMailare not emitted.frontend/packages/console-shared/src/hooks/useTelemetry.ts (1)
77-89: Good replacement of full user resource with scoped sandbox identifier.
injectSandboxPropertiesis a solid privacy improvement and keeps only the required sandbox identifier in DEVSANDBOX paths.
| if (getClusterProperties().clusterType === 'DEVSANDBOX') { | ||
| processedUserId = | ||
| userResource?.metadata?.annotations?.['toolchain.dev.openshift.com/sso-user-id']; | ||
| processedUserId = sandboxUserId; | ||
| } else { | ||
| processedUserId = await anonymizeId(userId); | ||
| } |
There was a problem hiding this comment.
Add fallback when sandbox ID is missing in DEVSANDBOX identify path.
At Line 68, processedUserId is set directly from optional sandboxUserId. If the annotation is absent, analytics.identify (Line 81) receives an invalid identifier. Fall back to anonymized userId when sandboxUserId is empty.
Suggested fix
- if (getClusterProperties().clusterType === 'DEVSANDBOX') {
- processedUserId = sandboxUserId;
- } else {
- processedUserId = await anonymizeId(userId);
- }
+ if (getClusterProperties().clusterType === 'DEVSANDBOX' && sandboxUserId) {
+ processedUserId = sandboxUserId;
+ } else {
+ processedUserId = await anonymizeId(userId);
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (getClusterProperties().clusterType === 'DEVSANDBOX') { | |
| processedUserId = | |
| userResource?.metadata?.annotations?.['toolchain.dev.openshift.com/sso-user-id']; | |
| processedUserId = sandboxUserId; | |
| } else { | |
| processedUserId = await anonymizeId(userId); | |
| } | |
| if (getClusterProperties().clusterType === 'DEVSANDBOX' && sandboxUserId) { | |
| processedUserId = sandboxUserId; | |
| } else { | |
| processedUserId = await anonymizeId(userId); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontend/packages/console-telemetry-plugin/src/listeners/segment.ts` around
lines 67 - 71, The DEVSANDBOX branch sets processedUserId directly to the
optional sandboxUserId which can be empty; update the branch so processedUserId
uses sandboxUserId when present otherwise falls back to the anonymized user id
by calling anonymizeId(userId). Locate the logic around getClusterProperties(),
processedUserId, and sandboxUserId and ensure analytics.identify receives a
non-empty id by defaulting to await anonymizeId(userId) when sandboxUserId is
falsy.
eaa633f to
6c8670b
Compare
|
/jira refresh |
|
@logonoff: This pull request references Jira Issue OCPBUGS-82140, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| window.SERVER_FLAGS.releaseVersion || window.SERVER_FLAGS.consoleVersion; | ||
| clusterProperties.organizationId = window.SERVER_FLAGS.telemetry?.ORGANIZATION_ID; | ||
| clusterProperties.accountMail = window.SERVER_FLAGS.telemetry?.ACCOUNT_MAIL; | ||
| clusterProperties.accountMailDomain = getEmailDomain(window.SERVER_FLAGS.telemetry?.ACCOUNT_MAIL); |
There was a problem hiding this comment.
Shouldn't we use some email parser to avoid processing strings which are not proper email addresses?
There was a problem hiding this comment.
I'd like to avoid introducing new dependencies for this given the need for a backport. This is also already good enough for non-emails, e.g., redhat.com or matt@hicks@redhat.com this would return an empty string
c75bb3a to
c76346e
Compare
c76346e to
6187709
Compare
|
/cherry-pick release-4.22 |
|
@logonoff: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: logonoff, spadgett The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by @Leo6Leo After the fix,
|
|
@Leo6Leo: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test all |
|
@logonoff: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@logonoff: Jira Issue Verification Checks: Jira Issue OCPBUGS-82140 Jira Issue OCPBUGS-82140 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@logonoff: new pull request created: #16393 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@logonoff: #16365 failed to apply on top of branch "release-4.21": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@logonoff: #16365 failed to apply on top of branch "release-4.20": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Fix included in release 5.0.0-0.nightly-2026-05-02-104839 |
Analysis / Root cause:
accountMailis too much PII and the entire user resource is being sent to segment.tsSolution description:
Strip the username from the email and only send the sandbox user ID instead of the whole resource
Screenshots / screen recording:
n/a
Test setup:
Test cases:
useranduserResourceobjects are not sent everaccountMailDomainis sent instead ofaccountMail, which only sends the content after the@symbol instead of the whole emailSummary by CodeRabbit
New Features
Bug Fixes