fix: allow tilde in repository path components for Bitbucket DC personal repos (#1375)#1377
Conversation
…nal repos Bitbucket Data Center / Server uses '~username' as the URL prefix for personal user repositories (e.g. https://example.com/scm/~myuser/repo.git). The non-ADO path-component whitelist rejected '~', causing 'apm install' to fail with 'Invalid repository path component: ~myuser'. Added '~' to the allowed character set at all three validation sites (_resolve_shorthand_dependency, _validate_url_repo_path, _validate_final_repo_fields). Tilde is an RFC 3986 unreserved character, so this adds no traversal risk. The '$bad' rejection test still passes, confirming the whitelist remains tight against injection characters. Fixes #1375 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates dependency reference parsing so Bitbucket Data Center / Server personal repository paths containing ~username are accepted on non-Azure-DevOps git hosts.
Changes:
- Allows
~in non-ADO repository path segment validation. - Adds regression coverage for HTTPS and FQDN shorthand Bitbucket personal repo references.
- Adds an Unreleased changelog entry for #1375.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/models/dependency/reference.py |
Expands non-ADO path component validation to include ~. |
tests/unit/test_generic_git_urls.py |
Adds positive parser tests for Bitbucket personal repo URLs and shorthand. |
CHANGELOG.md |
Documents the Bitbucket personal repository parsing fix. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
| # ``~`` is allowed on non-ADO hosts to support Bitbucket Data Center / | ||
| # Server personal repository URLs (``/scm/~username/repo.git``). | ||
| allowed_pattern = r"^[a-zA-Z0-9._\- ]+$" if is_ado_host else r"^[a-zA-Z0-9._~-]+$" |
| def test_bitbucket_personal_repo_tilde_url(self): | ||
| """Bitbucket Data Center personal repos use ``~username`` path segments.""" | ||
| dep = DependencyReference.parse("https://example.com/scm/~myuser/my-apm-repo.git") | ||
| assert dep.host == "example.com" | ||
| assert dep.repo_url == "scm/~myuser/my-apm-repo" | ||
| assert dep.is_virtual is False | ||
|
|
||
| def test_bitbucket_personal_repo_tilde_shorthand(self): | ||
| """Tilde-prefixed user segment is also valid in FQDN shorthand form.""" | ||
| dep = DependencyReference.parse("example.com/scm/~myuser/my-apm-repo") | ||
| assert dep.host == "example.com" | ||
| assert dep.repo_url == "scm/~myuser/my-apm-repo" | ||
|
|
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 1 | 1 | Correct fix; the tripled regex literal is a DRY smell that should be extracted to a module-level constant or helper. |
| CLI Logging Expert | 0 | 0 | 1 | Error messages are pre-existing and unchanged; no CLI logging regression. Nit: messages could hint at allowed chars. |
| DevX UX Expert | 0 | 1 | 1 | Bug-fix restores expected behavior for Bitbucket DC users; docs should update the path-segment grammar to avoid future surprise. |
| Supply Chain Security Expert | 0 | 0 | 1 | Tilde addition is safe: no shell-injection vector, no traversal primitive, RFC 3986 unreserved char; ship. |
| OSS Growth Hacker | 0 | 1 | 2 | Good CHANGELOG entry closes the reporter loop; a one-line Bitbucket DC example in the private-packages doc would compound enterprise discoverability. |
| Auth Expert | 0 | 0 | 1 | Tilde in path components does not affect host classification or token routing; generic-host credential-fill is the expected auth path for Bitbucket DC. |
| Doc Writer | 0 | 2 | 1 | CHANGELOG entry is clear and in-voice; reference/manifest-schema.md owner/repo regex now drifts from implementation, and no doc shows the /scm/~user/ form users would search for. |
| Test Coverage Expert | 0 | 1 | 1 | HTTPS and shorthand tilde paths well-covered; missing regression traps for SSH tilde form and ADO tilde rejection asymmetry. |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 4 follow-ups
- [DevX UX Expert] Update reference/manifest-schema.md path-component regex to include tilde for non-ADO hosts -- Docs-as-contract violation: three panelists flagged the same drift. Users treating the spec table as authoritative will believe tilde is invalid. Fix in-PR or immediate follow-up.
- [Test Coverage Expert] Add regression test asserting ADO URLs reject tilde in path segments -- Guards the security-critical asymmetry between ADO and generic-host whitelists. Missing test on a secure_by_default surface.
- [Python Architect] Extract duplicated path-segment regex to a module-level constant (NON_ADO_PATH_SEGMENT_RE / ADO_PATH_SEGMENT_RE) -- Same literal repeated 3x triggers APM's DRY extraction threshold. Prevents silent charset divergence on future edits.
- [OSS Growth Hacker] Add Bitbucket DC personal-repo (~user) example to private-and-org-packages doc page -- Enterprise evaluators searching 'APM Bitbucket personal repo' land on this page. One example line compounds discoverability.
Architecture
classDiagram
direction LR
class DependencyReference {
<<Parser>>
+parse(dep_str) DependencyReference
-_parse_full_url_reference()
-_validate_url_repo_path()
-_resolve_ado_parts()
}
class path_security {
<<Module>>
+validate_path_segments(path_str, context)
+ensure_path_within(base, target)
}
class re {
<<stdlib>>
+match(pattern, string)
}
DependencyReference ..> path_security : calls validate_path_segments
DependencyReference ..> re : inline allowed_pattern match x3
class DependencyReference:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A["User input: dep string"] --> B["DependencyReference.parse()"]
B --> C{"URL or shorthand?"}
C -->|full URL| D["_parse_full_url_reference()"]
C -->|shorthand| E["_resolve_ado_parts()"]
D --> F["_validate_url_repo_path()"]
D --> G["validate_path_segments()"]
D --> H["re.match(allowed_pattern)<br/>lines 1094, 1209, 1318"]
F --> G
F --> H
E --> G
E --> H
H -->|no match| I["raise ValueError"]
H -->|match| J["Return parsed DependencyReference"]
G -->|traversal| I
Recommendation
Ship the fix as-is -- it is correct, secure, and well-tested for the primary paths. Track the docs-charset update as the highest-priority follow-up (ideally same release); the ADO regression-trap test and regex extraction are good-hygiene items for the next PR.
Full per-persona findings
Python Architect
- [recommended] Same regex pattern duplicated 3 times in one file -- extract to a module-level constant or helper at
src/apm_cli/models/dependency/reference.py:1094
APM design philosophy: 'Abstract when 3+ call sites share the same logic pattern.' Patternr"^[a-zA-Z0-9._~-]+$"now at lines 1094, 1209, 1318 with identical semantics. Future character-set change requires editing 3 sites that can silently diverge. Extract to a constant pair (NON_ADO_PATH_SEGMENT_RE / ADO_PATH_SEGMENT_RE) or_validate_path_componenthelper. Also removes 3x comment duplication. - [nit] Consider co-locating the character-set check with validate_path_segments in path_security.py
path_security.py already owns traversal validation. A_validate_path_charshelper there would keep the entire path-validation contract in one module.
CLI Logging Expert
- [nit] Rejection messages do not tell the user which characters ARE allowed at
src/apm_cli/models/dependency/reference.py:1098
Pre-existing -- the PR does not regress it. A follow-up could interpolate the allowed set ('only alphanumeric, dot, underscore, hyphen, and tilde are allowed').
DevX UX Expert
- [recommended] manifest-schema.md documents the path-component charset as [a-zA-Z0-9.-]+ but code now allows tilde at
docs/src/content/docs/reference/manifest-schema.md:285
Line 285 of docs/src/content/docs/reference/manifest-schema.md defines owner/repo as '2+ path segments of [a-zA-Z0-9.-]+'. After this PR the actual accepted charset for non-ADO hosts is [a-zA-Z0-9._~-]+. Docs-as-contract violation. - [nit] Consider mentioning Sourcehut tilde-prefix convention alongside Bitbucket DC in the code comment at
src/apm_cli/models/dependency/reference.py:1092
Sourcehut (sr.ht) also uses ~username paths. Broadening the comment to mention RFC 3986 unreserved + Sourcehut would prevent a future contributor from wondering whether this is over- or under-scoped.
Supply Chain Security Expert
- [nit] Add a brief inline comment at the regex explaining WHY tilde is safe from a security perspective.
Future reviewers will benefit from a one-liner noting that ~ is not a POSIX path traversal token and all subprocess calls use list-form.
OSS Growth Hacker
- [recommended] Add a Bitbucket DC personal-repo example to the private-and-org-packages doc page at
docs/src/content/docs/consumer/private-and-org-packages.md
The private-and-org-packages doc already has Bitbucket team-project examples but no personal-repo (~user) example. Enterprise evaluators searching 'APM Bitbucket personal repo' will land here. - [nit] CHANGELOG could mention Sourcehut as incidentally unblocked
- [nit] Reply to issue [BUG] Tilde causes invalid repository path component error #1375 with release-note phrasing when shipping
Auth Expert
- [nit] org extraction from repo_url may yield tilde-prefixed org like ~myuser for per-org PAT lookup
Per-org PAT lookup is GitHub-only and skipped for generic hosts. No credential mis-routing.
Doc Writer
- [recommended] reference/manifest-schema.md owner/repo pattern '[a-zA-Z0-9._-]+' contradicts the new whitelist; '~' is now valid on non-Azure-DevOps hosts but the spec table still excludes it. at
docs/src/content/docs/reference/manifest-schema.md:285 - [recommended] No dependency-reference example shows the Bitbucket DC personal-repo form '/scm/~user/repo.git' that this PR enables; users will not discover the now-supported syntax. at
packages/apm-guide/.apm/skills/apm-usage/dependencies.md - [nit] CHANGELOG bullet is clear, accurate, technical, in voice.
Test Coverage Expert
- [recommended] No test for ADO URLs rejecting tilde (defense-in-depth asymmetry guarantee) at
tests/unit/test_generic_git_urls.py
If someone accidentally unifies the regexes later, no test would fail.
Proof (missing at):tests/unit/test_generic_git_urls.py - [nit] No test for SSH-form Bitbucket DC personal repos with tilde
Same regex code path as HTTPS, just different entry. SSH forms parse correctly manually-verified.
This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.
Convergent recommendations from the advisory panel on PR #1377: - python-architect: extract the 3x duplicated path-segment regex literal to module-level constants (_ADO_PATH_SEGMENT_RE / _NON_ADO_PATH_SEGMENT_RE) plus a _path_segment_pattern helper. Prevents silent charset divergence on future edits and removes 3x comment duplication. - devx-ux + doc-writer: widen reference/manifest-schema.md owner/repo grammar to document the actual non-ADO charset [a-zA-Z0-9._~-]+ and call out the ADO asymmetry (spaces allowed, tilde not). - oss-growth + doc-writer: add a Bitbucket Data Center personal-repo (~user) example to private-and-org-packages.md and dependencies.md. - test-coverage: add test_ado_rejects_tilde_in_repo_path regression trap for the secure_by_default ADO/non-ADO whitelist asymmetry. - oss-growth nit: CHANGELOG now mentions Sourcehut as incidentally unblocked. All three call sites in reference.py now share one constant; behavior unchanged. 8650 tests pass, lint silent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses Copilot PR review comment on PR #1377: extend tilde coverage to SCP shorthand (git@host:path) and ssh:// URL forms with a custom port. Same code path as the HTTPS tests, but guards the documented SSH transport that Bitbucket DC commonly emits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed both Copilot review comments in commit 5949045:
Lint silent, 8650+ unit tests pass. |
…de-in-repo-path # Conflicts: # CHANGELOG.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: cut 0.14.0 Renames the [Unreleased] block in CHANGELOG.md to [0.14.0] - 2026-05-18 and bumps the package version from 0.13.0 to 0.14.0 in pyproject.toml (and uv.lock by regeneration). 0.14.0 ships the producer-experience epic (#1348) on the CLI side -- notably: - apm pack --check-versions / --check-clean (#1365), the release gates consumed by apm-action mode: release. - apm plugin init (#1370), the noun-verb successor to apm init --plugin. - apm pack multi-format outputs (--marketplace, --marketplace-path, --json, marketplace.outputs map form) (#1317). - New producer docs corpus (repo-shapes / releasing-from-any-ci / versioning-strategies) (#1370). - Breaking: MCP registry client adopts the official v0.1 spec; self- hosted registries must serve /v0.1/ paths (#1337). Plus the deprecations and fixes already listed in the moved block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(changelog): tighten v0.14.0 entries; add post-cut PRs - One concise line per PR answering 'so what?' for end users - Add 5 missing entries: #1376 (perf resolver), #1373 (shared/apm.md matrix secret-stripping), #1246 (install.ps1 GHES env vars), #1255 (warn missing apm.yml), #1248 (extends:org unmanaged_files) - Drop internal/CI/test-infra entries (#1270, #1271, #1272, #1274, #1276, #1291, #1360 refactor) - Consolidate three #605 lines and four #1317 lines into one entry per PR where appropriate - Promote MCP Registry v0.1 to a dedicated Breaking section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(changelog): add #1377 Bitbucket DC tilde fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Meppiel <copilot-rework@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: allow tilde in repository path components for Bitbucket DC personal repos
TL;DR
apm installrejects valid Bitbucket Data Center / Server personal-repository URLs (e.g.https://example.com/scm/~myuser/repo.git) withInvalid repository path component: ~myuser. Bitbucket DC uses~usernameas the canonical path prefix for personal user projects on both HTTP and SSH clones. This PR adds~to the non-Azure-DevOps path-component whitelist at all three validation sites.~is an RFC 3986 unreserved character, so the injection-character guarantee that backs the existing$badrejection test is preserved.Problem (WHY)
https://example.com/scm/path/~myuser/my-apm-repo.gitinapm.ymlandapm installfails with[x] Failed to parse apm.yml: Invalid APM dependency '...': Invalid repository path component: ~myuser./scm/~<username>/<repo>.git(HTTP) andssh://git@host/~<username>/<repo>.git(SSH). Verified against a real Bitbucket Server webhook payload published intektoncd/triggers/examples/v1beta1/bitbucket-server/README.md:"href": "http://localhost:7990/scm/~test/helloworld.git". GitHub code search returns ~240 public matches for the literal/scm/~clone pattern.^[a-zA-Z0-9._-]+$for non-Azure-DevOps hosts.~is not in the set, so any Bitbucket DC personal repo is unreachable.Approach (WHAT)
Single-character whitelist extension at the three sites that validate per-segment characters on non-ADO hosts. No new code paths, no host-specific carve-out.
~is RFC 3986 unreserved, so it adds no traversal or injection surface beyond whatvalidate_path_segmentsalready covers.host/user/repo)src/apm_cli/models/dependency/reference.py:1092src/apm_cli/models/dependency/reference.py:1205src/apm_cli/models/dependency/reference.py:1312repo_urlreturned to callerImplementation (HOW)
src/apm_cli/models/dependency/reference.py-- three sibling regex updates from^[a-zA-Z0-9._-]+$to^[a-zA-Z0-9._~-]+$on the non-ADO branch only. ADO retains its existing whitelist (ADO repo/project naming rules do not include~). One inline comment per site explains the Bitbucket DC motivation so the next reader does not strip~as "weird".tests/unit/test_generic_git_urls.py-- two positive tests:test_bitbucket_personal_repo_tilde_urlcovers the reporter's HTTPS URL verbatim.test_bitbucket_personal_repo_tilde_shorthandcovers the FQDN shorthand form.test_invalid_characters_rejected(which rejects$) is preserved unchanged, confirming the whitelist remains tight.CHANGELOG.md-- one Unreleased / Fixed bullet referencing [BUG] Tilde causes invalid repository path component error #1375.Diagram
Decision flow for path-component validation, with the NEW behavior boxed.
flowchart LR A["URL path part<br/>e.g. ~myuser"] --> B{"is_ado_host?"} B -- yes --> C["match<br/>^[a-zA-Z0-9._\- ]+$"] B -- no --> D["match<br/>^[a-zA-Z0-9._~-]+$"] C --> E{match?} D --> E E -- yes --> F["accept"] E -- no --> G["raise ValueError:<br/>Invalid repository<br/>path component"] style D fill:#d4f4dd,stroke:#1f883d,stroke-width:2pxTrade-offs
~behind ais_bitbucket_dc_hostnamecheck, but Bitbucket DC is self-hosted on arbitrary FQDNs (no fixed domain to detect). A blanket~allowance on non-ADO hosts is simpler and matches RFC 3986. Cost: a non-Bitbucket host that happens to also use~segments gets implicit support -- not a security concern because~cannot construct path traversal.~, and the ADO whitelist (which already allows space) is tightly tied to ADO's own validation surface. No reason to drift it.Benefits
$-rejection test is untouched, so the security posture against injection characters is provably unchanged.Validation
Full CI mirror (lint + unit tests) is green locally.
Lint (CI Lint job mirror)
Unit tests -- full suite
Reproduces the reporter's failure mode -- before and after
Before (main):
After (this branch):
Scenario evidence
apm installaccept ittests/unit/test_generic_git_urls.py::TestSecurityValidation::test_bitbucket_personal_repo_tilde_urltests/unit/test_generic_git_urls.py::TestSecurityValidation::test_bitbucket_personal_repo_tilde_shorthandtests/unit/test_generic_git_urls.py::TestSecurityValidation::test_invalid_characters_rejected(preserved)How to test
uv run --extra dev pytest tests/unit/test_generic_git_urls.py::TestSecurityValidation -v. The two new tilde tests pass; the$badrejection test still passes.apm.yml:apm install. Parsing now succeeds (the fetch will fail becauseexample.comis not a real Bitbucket host -- that is expected and confirms we got past the parser).~:DependencyReference.parse("https://dev.azure.com/org/proj/_git/~bad")should still raise.Fixes #1375
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com