fix(backend): decode generic Git URL repo names #1389

Open
DivyamTalwar wants to merge 3 commits into sourcebot-dev:main from DivyamTalwar:divyam/fix-generic-git-url-decoding

DivyamTalwar commented Jun 29, 2026

Fixes #1384

Problem

Generic Git configs that point directly at an HTTP(S) URL kept percent-encoded path segments in the derived repo name, while file-based generic Git origins already decode them.

Root cause

compileGenericGitHostConfig_url built repoName from remoteUrl.pathname directly.
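
To illustrate the root cause: the standard `URL` parser leaves percent escapes in `pathname` untouched, so a name built from it directly carries them along. The sketch below is an illustration only; the URL and the `naiveRepoName` helper are hypothetical and are not the project's code.

```typescript
// Hypothetical illustration: derive a name straight from URL.pathname,
// without any decoding step.
function naiveRepoName(remote: string): string {
    const url = new URL(remote);
    // pathname is returned still percent-encoded ("%20" stays "%20").
    return url.host + url.pathname;
}

const encodedName = naiveRepoName("https://git.example.com/team/my%20repo");
console.log(encodedName); // git.example.com/team/my%20repo
```

The encoded space survives into the derived name, which is the mismatch this PR reports against file-based origins.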

Solution

Decode valid URL-encoded pathname segments before deriving name, displayName, and zoekt metadata, with a safe fallback for malformed percent escapes.
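
A minimal sketch of the decode-with-fallback idea, assuming a helper shaped like the `decodePathname(pathname)` named in the review summary further down this page. The body is an assumption for illustration and may differ from the PR's actual code; the example URLs are hypothetical.

```typescript
// Sketch only: the helper name comes from the review summary; the body is assumed.
function decodePathname(pathname: string): string {
    try {
        // Decodes valid escapes such as "%20".
        return decodeURIComponent(pathname);
    } catch {
        // Malformed escapes make decodeURIComponent throw, so keep the raw value.
        return pathname;
    }
}

const decoded = decodePathname(new URL("https://git.example.com/team/my%20repo").pathname);
const preserved = decodePathname(new URL("https://git.example.com/team/bad%E0%A4%A").pathname);
console.log(decoded);   // /team/my repo
console.log(preserved); // /team/bad%E0%A4%A
```

One design point to keep in mind with this shape: decoding the whole pathname at once also turns an encoded slash (`%2F`) into a literal `/`, so whether to decode per segment or per pathname is a choice the real implementation has to make.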

Tests

  • node .yarn/releases/yarn-4.7.0.cjs workspace @sourcebot/backend test src/repoCompileUtils.test.ts
  • node .yarn/releases/yarn-4.7.0.cjs workspace @sourcebot/backend build

Risk

Low. The change is limited to generic Git URL repo-name derivation and preserves malformed pathnames instead of throwing.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed repository naming for generic Git URL connections by decoding valid percent-encoded characters in remote URL pathnames so names and display names match expected values.
    • Prevented invalid or malformed URL-encoded segments from causing errors by preserving them as-is instead of altering them.
    • Ensured repository metadata remains consistent with the final displayed name across both valid and invalid encoding cases.
  • Tests
    • Added coverage for URL decoding success and malformed-encoding preservation scenarios.
  • Documentation
    • Updated the Unreleased changelog entry for this fix.

Decode valid percent-encoded path segments when deriving generic Git repo names from direct remote URLs, matching the existing file-origin path while preserving malformed escapes instead of throwing.

Constraint: Direct generic Git URLs and file-origin remote URLs should derive consistent zoekt repo names.
Rejected: Decode with raw decodeURIComponent only | malformed percent escapes can throw during config compilation.
Confidence: high
Scope-risk: narrow
Directive: Keep repo-name derivation aligned with zoekt and local-origin generic Git behavior.
Tested: node .yarn/releases/yarn-4.7.0.cjs workspace @sourcebot/backend test src/repoCompileUtils.test.ts; node .yarn/releases/yarn-4.7.0.cjs workspace @sourcebot/backend build
Not-tested: Full monorepo test suite
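
The "Rejected" note can be checked in isolation: a bare `decodeURIComponent` call raises `URIError` on a malformed escape. The `rawDecodeOutcome` helper and the sample paths below are hypothetical and exist only to show that failure mode.

```typescript
// Hypothetical probe: report what a raw decodeURIComponent call does
// with a given pathname, without letting the exception escape.
function rawDecodeOutcome(pathname: string): string {
    try {
        decodeURIComponent(pathname);
        return "ok";
    } catch (err) {
        return err instanceof URIError ? "URIError" : "other error";
    }
}

console.log(rawDecodeOutcome("/team/my%20repo")); // ok
console.log(rawDecodeOutcome("/team/100%"));      // URIError
```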

coderabbitai Bot commented Jun 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0593470-a2e7-4035-8175-cc39f734c231

📥 Commits

Reviewing files that changed from the base of the PR and between 04dedab and 6b7e5ed.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Walkthrough

A new helper safely decodes URL pathnames with fallback behavior. Both generic git host compilation paths use it, tests cover decoded and malformed inputs, and the changelog notes the fix.

Changes

Safe pathname decoding

| Layer / File(s) | Summary |
| --- | --- |
| decodePathname helper and call sites: packages/backend/src/repoCompileUtils.ts, CHANGELOG.md | Adds decodePathname(pathname) with a try/catch fallback, then uses it in both generic git host compilation paths instead of direct decodeURIComponent; the changelog notes the repo-name matching fix. |
| Tests for encoded and malformed pathnames: packages/backend/src/repoCompileUtils.test.ts | Two new compileGenericGitHostConfig_url tests assert that valid encoded characters are decoded into name/displayName/gitConfig, and that malformed sequences are preserved as-is. |

Estimated code review effort: 1 (Trivial) | ~5 minutes

Possibly related PRs

  • sourcebot-dev/sourcebot#899: Modifies the same repoCompileUtils.ts flow to decode URL-encoded pathname components when deriving repo names from generic git host remotes.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main backend change to decode generic Git URL repo names. |
| Linked Issues check | ✅ Passed | The code matches the issue by decoding direct HTTP(S) generic Git paths into name, displayName, and zoekt metadata, with safe fallback for malformed escapes. |
| Out of Scope Changes check | ✅ Passed | The changes stay on scope with a targeted backend fix, tests, and changelog update for the reported repo-name decoding bug. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Document the generic Git URL repo-name fix in the Unreleased changelog after the PR number was assigned.

Constraint: Sourcebot requires every PR to include a changelog entry with the PR link.
Confidence: high
Scope-risk: narrow
Directive: Keep changelog entries as follow-up commits once PR numbers exist.
Tested: not run; changelog-only change
Not-tested: Runtime behavior

Development

Successfully merging this pull request may close these issues.

Generic Git URL configs keep percent-encoded repo names
