MINOR: Address review comments from #27236 (#27255)
- Extract URL credential sanitization to generic `sanitize_url_credentials` in logger utils
- Fix misleading log prefix `GitHubCloneReader::_clone` → `_clone_repo`
- Add BigQuery INFORMATION_SCHEMA context to `split_table_name` comment
- Add unit tests for `sanitize_url_credentials`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review 👍 Approved with suggestions (0 resolved / 1 finding)

Addresses review comments with minor cleanup and improvements. Consider updating the URL credential sanitization regex to handle `http://` schemes in addition to `https://`.

💡 Edge Case: `sanitize_url_credentials` does not handle `http://` URLs
📄 ingestion/src/metadata/utils/logger.py:360
The regex `https://[^@]+@` only matches the `https://` scheme, so credentials embedded in an `http://` URL are left unmasked.
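A minimal sketch of the suggested fix, assuming the sanitizer should cover both schemes. The name `sanitize_url_credentials_any_scheme` is hypothetical and this is not the merged code. `https?` matches either scheme, and narrowing the userinfo match to `[^@/\s]+` keeps it from running across a path or whitespace.

```python
import re

# Hypothetical variant of sanitize_url_credentials that also covers http://.
# Group 1 captures the scheme so it can be restored in the replacement;
# [^@/\s]+ only matches the userinfo part (no "/", "@", or whitespace).
_URL_CREDENTIALS = re.compile(r"(https?://)[^@/\s]+@")


def sanitize_url_credentials_any_scheme(message: str) -> str:
    """Mask token or user:password userinfo in http(s) URLs."""
    return _URL_CREDENTIALS.sub(r"\1****@", message)


print(sanitize_url_credentials_any_scheme("clone failed: http://user:pass@host/repo.git"))
# prints: clone failed: http://****@host/repo.git
```

A side effect of the narrower character class: a message such as `see https://example.com/a and mail a@b.c` is left alone, whereas `https://[^@]+@` would match from `https://` through the `@` in the e-mail address and mask that whole span.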
Pull request overview
This PR refactors URL credential redaction into a shared ingestion logging utility, updates clone-related error log labeling to match the shared _clone_repo helper, clarifies a BigQuery INFORMATION_SCHEMA edge case comment in FQN parsing, and adds unit tests for the new sanitizer.
Changes:
- Added `sanitize_url_credentials()` in `metadata.utils.logger` and unit tests for common credential-in-URL formats.
- Updated Looker repo-clone error handling to use the shared sanitizer and a more accurate log prefix.
- Expanded `split_table_name` commentary to document BigQuery `INFORMATION_SCHEMA` multi-part table name behavior.
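BigQuery `INFORMATION_SCHEMA` views are referenced with an extra dotted segment (for example `project.dataset.INFORMATION_SCHEMA.TABLES`), so a plain split on `.` yields four parts instead of three. The hypothetical `split_bigquery_table_name` below sketches one way to keep `INFORMATION_SCHEMA.<view>` together as the table name. It is not the `split_table_name` implementation in `fqn.py`, whose handling of this case the PR only documents in a comment.

```python
from typing import List, Optional, Tuple


def split_bigquery_table_name(name: str) -> Tuple[Optional[str], Optional[str], str]:
    """Hypothetical helper: split a BigQuery reference into (project, dataset, table).

    For INFORMATION_SCHEMA views such as "p.d.INFORMATION_SCHEMA.TABLES", the
    "INFORMATION_SCHEMA.<view>" tail is kept together as the table name.
    """
    parts = name.replace("`", "").split(".")
    upper = [part.upper() for part in parts]
    prefix: List[Optional[str]]
    if "INFORMATION_SCHEMA" in upper:
        idx = upper.index("INFORMATION_SCHEMA")
        prefix = list(parts[:idx])
        table = ".".join(parts[idx:])
    else:
        prefix = list(parts[:-1])
        table = parts[-1]
    # Left-pad so that a missing project (or dataset) comes back as None.
    prefix = [None] * (2 - len(prefix)) + prefix
    return prefix[-2], prefix[-1], table
```

For example, `split_bigquery_table_name("my-project.sales.INFORMATION_SCHEMA.TABLES")` returns `('my-project', 'sales', 'INFORMATION_SCHEMA.TABLES')`. Region-qualified references such as `region-us.INFORMATION_SCHEMA.JOBS` would need extra handling that this sketch does not attempt.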
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ingestion/tests/unit/utils/test_logger.py | Adds unit tests validating URL credential sanitization behavior. |
| ingestion/src/metadata/utils/logger.py | Introduces sanitize_url_credentials() utility alongside existing redaction helpers. |
| ingestion/src/metadata/utils/fqn.py | Improves inline documentation for multi-part BigQuery table names. |
| ingestion/src/metadata/ingestion/source/dashboard/looker/utils.py | Switches clone error sanitization to the shared utility and fixes the log prefix. |
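Reviewed snippet from `ingestion/src/metadata/utils/logger.py`:

```python
import re


def sanitize_url_credentials(message):
    """Mask credentials embedded in URLs (e.g., https://token@host)"""
    return re.sub(r"https://[^@]+@", "https://****@", message)
```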
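The `logger.py` snippet can be exercised against the credential formats named in the test plan. A minimal pytest-style sketch: the sample URLs are illustrative assumptions, not the actual fixtures in `ingestion/tests/unit/utils/test_logger.py`, and the function is copied locally so the block runs on its own.

```python
import re


def sanitize_url_credentials(message: str) -> str:
    # Copied from the logger.py snippet in this PR so the sketch runs standalone.
    return re.sub(r"https://[^@]+@", "https://****@", message)


# Illustrative inputs (assumed, not the literal fixtures in test_logger.py).
CASES = [
    # Personal access token as the whole userinfo (GitHub style)
    ("clone failed: https://ghp_abc123@github.com/org/repo.git",
     "clone failed: https://****@github.com/org/repo.git"),
    # oauth2:<token> basic-auth form (GitLab style)
    ("https://oauth2:glpat-xyz@gitlab.com/org/repo.git",
     "https://****@gitlab.com/org/repo.git"),
    # x-token-auth:<token> form (Bitbucket style)
    ("https://x-token-auth:secret@bitbucket.org/org/repo.git",
     "https://****@bitbucket.org/org/repo.git"),
    # No URL at all: the message must pass through unchanged
    ("Repository not found", "Repository not found"),
]


def test_sanitize_url_credentials() -> None:
    for raw, expected in CASES:
        assert sanitize_url_credentials(raw) == expected
```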
🟡 Playwright Results: all passed (26 flaky)
✅ 3597 passed · ❌ 0 failed · 🟡 26 flaky · ⏭️ 207 skipped
🟡 26 flaky test(s) (passed on retry)

How to debug locally:

```bash
# Download the playwright-test-results-<shard> artifact and unzip it
npx playwright show-trace path/to/trace.zip  # view trace
```



Summary
- Extract `sanitize_url_credentials()` utility in `logger.py` (alongside existing `redacted_config`)
- Fix log prefix `GitHubCloneReader::_clone` → `_clone_repo`, since the function is shared across GitHub/GitLab/Bitbucket/Azure DevOps
- Add BigQuery INFORMATION_SCHEMA context to the `split_table_name` comment in `fqn.py`
- Add unit tests for `sanitize_url_credentials`

Test plan
- `test_sanitize_url_credentials` passes (PAT, oauth-basic, token-auth, and no-URL cases)
- `test_clone_repo_error_does_not_leak_credentials` and `test_clone_repo_error_sanitizes_all_credential_formats` still pass

🤖 Generated with Claude Code
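The `_clone_repo` prefix fix and the credential-leak tests come down to one pattern: mask the exception text before it reaches the log, because clone errors often echo the repository URL with its token. A minimal sketch, assuming a hypothetical `clone_repo_safely` wrapper and an injected `clone` callable in place of whatever Git backend `looker/utils.py` actually uses; the log message wording is also an assumption.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger(__name__)


def sanitize_url_credentials(message: str) -> str:
    """Mask credentials embedded in URLs (same pattern as the logger.py snippet)."""
    return re.sub(r"https://[^@]+@", "https://****@", message)


def clone_repo_safely(repo_url: str, clone: Callable[[str], None]) -> bool:
    """Hypothetical wrapper: run `clone` and log a sanitized error on failure.

    The "_clone_repo:" prefix mirrors the renamed log prefix; the real helper
    may be structured differently.
    """
    try:
        clone(repo_url)
        return True
    except Exception as exc:  # clone backends raise many exception types
        # str(exc) may contain the full URL including the token, so mask it first.
        logger.error(
            "_clone_repo: failed to clone repository: %s",
            sanitize_url_credentials(str(exc)),
        )
        return False
```

Injecting `clone` keeps the sketch testable without network access: a test can pass a callable that raises an error containing a tokenized URL and then assert that the token never appears in the captured log record.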