fix(marketplace): enhance repository input parsing for GitLab subgroups and HTTPS URLs#1034
Antonin-Rouxel-LaPoste-BGPN wants to merge 5 commits into microsoft:main
Pull request overview
Enhances apm marketplace add repository input parsing to support GitLab subgroup paths (N-segment owners) and full HTTP(S) repository URLs, addressing the limitation that previously only accepted 2- or 3-segment shorthands.
Changes:
- Replace rigid 2/3-segment parsing with a unified parser that supports N-segment subgroup paths and full HTTP(S) URLs.
- Update user-facing hint text to mention the HTTPS URL form.
- Add unit tests covering GitLab subgroup shorthand, HTTPS URL parsing, .git stripping, and rejection cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/apm_cli/commands/marketplace.py | Implements the new unified parsing logic for marketplace repo inputs and updates CLI messaging. |
| src/apm_cli/marketplace/errors.py | Expands the “not registered” hint to mention full HTTPS URLs. |
| tests/unit/marketplace/test_marketplace_commands.py | Adds new test cases for subgroup shorthands, HTTPS URLs, .git stripping, and invalid inputs. |
| owner = "/".join(parts[1:-1]) | ||
| repo_name = parts[-1] | ||
| else: | ||
| # OWNER/.../REPO (no host prefix, any number of segments) | ||
| owner = "/".join(parts[:-1]) | ||
| repo_name = parts[-1] |
This parser now allows multi-segment owner values even when the resolved host is the GitHub API backend (default github.com). Downstream, _github_contents_url() interpolates source.owner directly into /repos/{owner}/{repo}/..., so an owner containing / will generate a malformed/ambiguous API path and can accidentally target the wrong repo/path. Consider rejecting owner values containing / when resolved_host is a GitHub host (or otherwise ensuring the client builds URLs safely for multi-segment owners).
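One way to implement the first option is a small guard that runs once the host has been resolved. The sketch below is only an illustration: `GITHUB_API_HOSTS` and `check_owner_for_host` are hypothetical names that do not appear in the PR, and the set of GitHub-style hosts is an assumption (GitHub Enterprise hosts would have to be added).

```python
# Hypothetical set of hosts served by the /repos/{owner}/{repo} style API.
# Assumption: only github.com; extend for GitHub Enterprise hosts if needed.
GITHUB_API_HOSTS = {"github.com"}


def check_owner_for_host(owner: str, resolved_host: str) -> None:
    """Raise ValueError when a multi-segment owner is paired with a GitHub host."""
    host = resolved_host.strip().lower()
    if host in GITHUB_API_HOSTS and "/" in owner:
        raise ValueError(
            f"Owner '{owner}' has more than one segment, "
            f"which {host} does not support. Use 'OWNER/REPO'."
        )


# Subgroup owners remain valid on other hosts.
check_owner_for_host("myorg/subgroup", "gitlab.com")
```

In the Click handler the `ValueError` would presumably be turned into a `logger.error(...)` followed by `sys.exit(1)`, matching the other rejection paths.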
| if host and host.lower() != url_host: | ||
| logger.error( | ||
| f"Invalid host: '{parts[0]}'. " | ||
| f"Use 'OWNER/REPO' or 'HOST/OWNER/REPO' format." | ||
| f"Conflicting host: --host '{host}' vs '{url_host}' in URL." | ||
| ) | ||
| sys.exit(1) |
Host comparisons use host.lower() without stripping whitespace (e.g. --host "gitlab.com "), which can raise a false "Conflicting host" error even though the normalized host is the same. Consider normalizing the --host value once up front (strip + lower) and using that for both conflict checks and later validation.
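A minimal sketch of the suggested normalization, assuming it is acceptable to treat a blank `--host` value as "not set". `normalize_host` and `hosts_conflict` are hypothetical helper names, not functions from the PR.

```python
from typing import Optional


def normalize_host(raw: Optional[str]) -> Optional[str]:
    """Strip whitespace and lowercase a host; blank input becomes None."""
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    return cleaned or None


def hosts_conflict(flag_host: Optional[str], parsed_host: Optional[str]) -> bool:
    """True only when both hosts are set and differ after normalization."""
    a = normalize_host(flag_host)
    b = normalize_host(parsed_host)
    return a is not None and b is not None and a != b


# Trailing whitespace and case differences no longer count as a conflict.
assert hosts_conflict("gitlab.com ", "GitLab.com") is False
assert hosts_conflict("github.com", "gitlab.com") is True
```

Normalizing once, before any comparison, keeps the conflict check and the later FQDN validation working on the same value.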
| """Register a marketplace from OWNER/REPO, HOST/OWNER/.../REPO, or a full HTTPS URL.""" | ||
| logger = CommandLogger("marketplace-add", verbose=verbose) |
The CLI now accepts full HTTP(S) URLs and N-segment subgroup paths, but documentation still shows only OWNER/REPO and HOST/OWNER/REPO forms. Please update the relevant Starlight docs pages (e.g. docs/src/content/docs/guides/marketplaces.md, docs/src/content/docs/reference/cli-commands.md, docs/src/content/docs/guides/marketplace-authoring.md) and the APM guide resource (packages/apm-guide/.apm/skills/apm-usage/commands.md) to include the HTTPS URL form and subgroup examples.
| # ------------------------------------------------------------------ | ||
| # GitLab subgroup / deep-path support | ||
| # ------------------------------------------------------------------ | ||
|
|
||
| @patch("apm_cli.marketplace.client.fetch_marketplace") | ||
| @patch("apm_cli.marketplace.client._auto_detect_path") | ||
| def test_add_gitlab_subgroup_shorthand(self, mock_detect, mock_fetch, runner): | ||
| """HOST/group/subgroup/.../repo shorthand stores all intermediate segments in owner.""" | ||
| from apm_cli.commands.marketplace import marketplace |
The new parsing behavior around host detection vs OWNER/REPO (especially for owners containing dots like foo.bar/repo) is not covered by tests here. Adding a regression test that asserts foo.bar/repo is treated as owner="foo.bar", repo="repo" (not as a host-prefixed shorthand) would prevent reintroducing the ambiguity fix.
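The regression test asked for here could look roughly like the following. To stay self-contained, the sketch tests a stand-in helper rather than the real Click command: `looks_like_fqdn` is a simplified substitute for `is_valid_fqdn`, and `split_shorthand` is a hypothetical function that assumes a first segment counts as a host only when at least three segments are present.

```python
import re
from typing import Optional, Tuple

_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def looks_like_fqdn(value: str) -> bool:
    """Simplified stand-in for is_valid_fqdn: two or more valid DNS labels."""
    labels = value.lower().split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def split_shorthand(repo_input: str) -> Tuple[Optional[str], str, str]:
    """Return (host, owner, repo); host is None when no host prefix is detected."""
    parts = [p for p in repo_input.split("/") if p]
    if len(parts) < 2:
        raise ValueError("Expected at least 'OWNER/REPO'")
    # Assumption: a dotted first segment is a host only with 3+ segments.
    if len(parts) >= 3 and looks_like_fqdn(parts[0]):
        return parts[0], "/".join(parts[1:-1]), parts[-1]
    return None, "/".join(parts[:-1]), parts[-1]


def test_dotted_owner_is_not_a_host():
    assert split_shorthand("foo.bar/repo") == (None, "foo.bar", "repo")


def test_host_prefixed_subgroup():
    assert split_shorthand("gitlab.com/group/sub/repo") == ("gitlab.com", "group/sub", "repo")
```

Under this rule a three-segment input such as `foo.bar/sub/repo` is still read as host-prefixed, so a dotted owner with subgroups would need `--host` or a full URL to be unambiguous.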
@microsoft-github-policy-service agree company="La Poste Groupe"
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
APM Review Panel Verdict: REJECT
Required before merge (12 items)
Nits (11 items, skip if you want)
CEO arbitration
PR #1034 ships a genuinely valuable feature -- paste-from-browser HTTPS URLs and GitLab subgroup paths -- but it arrives with three categories of blocking defect that must be resolved before merge.
First, and most critically, the implementation accepts (redacted) scheme URLs with no warning, no opt-in flag, and no documentation of the risk. Two panelists (DevX/UX and Supply Chain Security) flag this independently: from a UX standpoint it breaks the established --allow-insecure contract every other APM fetch path honors; from a security standpoint it is a direct MITM injection point for marketplace manifests. These two angles reinforce each other and are not in conflict -- the fix is the same: reject (redacted) at the parser boundary and document that only https:// is accepted.
Second, the PR introduces a percent-encoding bypass in validate_path_segments ('%2E%2E' evades the '..' check) and multi-segment owner strings interpolated into API URLs without per-segment encoding, leaving '?', '#', and '@' characters as live URL-injection vectors. These are supply-chain correctness issues, not edge cases.
Third, the 50-line parsing block embedded in the Click handler conflates command orchestration with URL/path resolution logic, making the new behavior untestable in isolation; extraction to _parse_repo_argument() with a defined return type is a structural prerequisite for the security fixes above, since it creates one auditable entry point.
Two process gaps must close in the same PR: cli-commands.md must reflect the new HOST/OWNER/.../REPO and full-URL input forms (the docs currently show only the fixed 3-segment shape, making the new feature undiscoverable from help alone), and CHANGELOG.md must carry an [Unreleased] entry. Neither is cosmetic -- the docs gap means GitLab users who need subgroup support cannot find the syntax without reading source, and the CHANGELOG gap makes the feature invisible to evaluators deciding whether to upgrade.
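The percent-encoding and per-segment encoding concerns above can be illustrated with a short sketch. It is not the project's `validate_path_segments`; `decode_and_check` and `encode_segments` are hypothetical helpers that only show the order of operations: decode first, check for traversal on the decoded segments, then encode each segment separately when building a URL.

```python
from typing import List
from urllib.parse import quote, unquote


def decode_and_check(path: str) -> List[str]:
    """Percent-decode, split on '/', and reject '.' and '..' segments."""
    segments = [seg for seg in unquote(path).split("/") if seg]
    for seg in segments:
        if seg in (".", ".."):
            raise ValueError(f"Path traversal segment {seg!r} in {path!r}")
    return segments


def encode_segments(segments: List[str]) -> str:
    """Percent-encode each segment on its own so '?', '#', '@' cannot alter the URL."""
    return "/".join(quote(seg, safe="") for seg in segments)


# '%2E%2E' decodes to '..' before the check runs, so it is rejected.
try:
    decode_and_check("group/%2E%2E/evil")
except ValueError as exc:
    print(exc)
```

Decoding only once is a deliberate simplification here; whether doubly encoded input should also be rejected is a policy decision the sketch does not settle.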
The recommended path forward is: (1) land the _parse_repo_argument() extraction first as it is the foundation for auditable security fixes; (2) add urllib.parse.unquote() normalization before validate_path_segments and per-segment urllib.parse.quote() encoding before URL construction; (3) restrict accepted schemes to https:// only, raising a clear actionable error for (redacted); (4) update cli-commands.md and CHANGELOG.md; (5) replace the placeholder '...' stub test bodies with real assertions. The feature strategy is correct and the growth signal is real -- ship it clean.
Dissent resolved: No genuine inter-panelist disagreement exists. The (redacted) finding surfaces from both DevX/UX (UX contract) and Supply Chain Security (MITM risk) and the two rationales are complementary -- a single fix satisfies both. CLI Logging Expert's PathTraversalError message finding and Python Architect's structural extraction finding are also complementary: extraction to _parse_repo_argument() creates the natural site for a single user-readable error raise, resolving both concerns simultaneously.
Growth/positioning note: The paste-from-browser UX (full HTTPS URL input) is a concrete, demonstrable hook for a 'GitLab teams: APM works for you' campaign. OSS Growth Hacker recommends a standalone social beat -- a tweet thread or dev.to post anchored to the MANIFESTO 'Portability over Vendor Lock-in' principle -- rather than bundling the feature into a release roundup. Schedule it as soon as the fixed PR merges.
Per-persona findings (full)
Python Architect
classDiagram
direction LR
class add {
<<IOBoundary>>
+repo: str
+name: str
+branch: str
+host: str
+verbose: bool
}
class MarketplaceSource {
<<ValueObject>>
+host: str
+owner: str
+repo: str
+branch: str
+name: str
}
class CommandLogger {
<<Base>>
+error()
+progress()
+complete()
}
class validate_path_segments {
<<Pure>>
}
class PathTraversalError {
<<Exception>>
}
class is_valid_fqdn {
<<Pure>>
}
class default_host {
<<Pure>>
}
class fetch_marketplace {
<<IOBoundary>>
}
class _auto_detect_path {
<<IOBoundary>>
}
class MarketplaceNotFoundError {
<<Exception>>
}
add --> CommandLogger : uses
add --> MarketplaceSource : constructs
add --> fetch_marketplace : calls
add --> _auto_detect_path : calls
add --> validate_path_segments : calls
add --> is_valid_fqdn : calls
add --> default_host : calls
validate_path_segments ..> PathTraversalError : raises
MarketplaceNotFoundError --> MarketplaceSource : references
class add:::touched
class PathTraversalError:::touched
class validate_path_segments:::touched
class MarketplaceNotFoundError:::touched
classDef touched fill:#ffe0b2,stroke:#e65100
flowchart TD
A([CLI: apm marketplace add REPO]) --> B[add -- marketplace.py:368]
B --> C{starts with https:// ?}
C -- yes --> D[urlparse repo_input pure parse]
D --> E{is_valid_fqdn url_host ?}
E -- no --> ERR1[logger.error + sys.exit 1]
E -- yes --> F{--host conflicts?}
F -- yes --> ERR2[logger.error + sys.exit 1]
F -- no --> G[split path to owner and repo_name]
C -- no --> H[split by slash to parts]
H --> I{len parts ge 3 and is_valid_fqdn parts0 ?}
I -- yes --> J[resolved_host=parts0 owner=join parts1 to -1 repo_name=parts-1]
I -- no --> K[owner=join parts to -1 repo_name=parts-1]
G --> L[validate_path_segments owner and repo_name]
J --> L
K --> L
L -- PathTraversalError --> ERR3[logger.error str exc + sys.exit 1]
L -- ok --> M{resolved_host is None?}
M -- yes --> N{--host flag set?}
N -- yes --> O[is_valid_fqdn host check]
O -- invalid --> ERR4[logger.error + sys.exit 1]
O -- valid --> P[resolved_host = normalized host]
N -- no --> Q[resolved_host = default_host FS read env/config]
M -- no --> R[validate --name flag via _is_valid_alias]
P --> R
Q --> R
R -- invalid --> ERR5[logger.error + sys.exit 1]
R -- ok --> S[_auto_detect_path probe_source NET HTTP probe]
S -- None --> ERR6[logger.error + sys.exit 1]
S -- path --> T[fetch_marketplace fetch_source NET download JSON]
T --> U[three-tier alias resolution: --name > manifest.name > repo_name]
U --> V[add_marketplace source FS write registry]
V --> W([logger.complete exit 0])
Design patterns
Required findings: (see aggregated list above) Nits:
CLI Logging Expert
Required findings: (see aggregated list above) Nits:
DevX UX Expert
Required findings: (see aggregated list above) Nits:
Supply Chain Security Expert
Required findings: (see aggregated list above) Nits:
Auth Expert
Inactive -- PR changes marketplace source URL parsing only and does not modify AuthResolver, token management, or credential resolution logic (touched files: src/apm_cli/commands/marketplace.py, src/apm_cli/marketplace/errors.py, tests only).
OSS Growth Hacker
Required findings: (see aggregated list above) Nits:
Side-channel: This is the right feature to anchor a 'GitLab teams: APM works for you' tweet thread or dev.to post. The paste-from-browser UX (full HTTPS URL) is the concrete, demonstrable hook. Recommend scheduling this as a standalone social beat rather than bundling it into a release roundup.
Verdict computed deterministically: 12 required findings across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.
Note 🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
To allow these resources, lower tools:
github:
min-integrity: approved # merged | approved | unapproved | none
…nce for improved validation Co-authored-by: Copilot <copilot@github.com>
Note
Closes #1027
TL;DR
`apm marketplace add` was limited to exactly 2 or 3 path segments, making it impossible to register a GitLab marketplace hosted under subgroups (e.g. `mycompany/myorg/specs-and-standards/repo`). This PR replaces the rigid segment counter with a unified parser that accepts any N-segment shorthand path and full HTTPS URLs.
Problem
Why the old parser failed
The previous implementation split the input on `/` and checked `len(parts) == 2` or `len(parts) == 3`. Any deeper path, which GitLab subgroups require, was rejected with "Expected 'OWNER/REPO'".
Two separate root causes were reported in #1027:
- Only `OWNER/REPO` (2 parts) or `HOST/OWNER/REPO` (3 parts) were accepted.
Important
The `MarketplaceSource` model already stores `owner` as a plain string with no segment-count constraint, so the fix is entirely in the parser; no schema migration is required.
Approach
Accepted input forms:
- `OWNER/REPO`
- `HOST/OWNER/REPO`
- `HOST/group/sub/.../REPO`
- `OWNER/group/sub/.../REPO`
- `https://host/group/sub/.../repo[.git]`
The rule is: everything except the last segment is `owner`; the last segment is `repo`. This mirrors how `dependency/reference.py` already handles generic hosts for `apm add` (see `_resolve_shorthand_to_parsed_url`), giving both surfaces consistent behaviour.
Implementation
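As a rough illustration of the rule stated under Approach (everything but the last segment is the owner), the full-URL form can be split as sketched below. `split_repo_url` is a hypothetical helper, not the code in this PR, and the default of accepting only `https` is an assumption; pass a wider `allowed_schemes` tuple to change it.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_repo_url(url: str, allowed_schemes: Tuple[str, ...] = ("https",)) -> Tuple[str, str, str]:
    """Split a repository URL into (host, owner, repo), stripping a trailing '.git'."""
    parsed = urlparse(url)
    if parsed.scheme not in allowed_schemes or not parsed.hostname:
        raise ValueError(f"Unsupported repository URL: {url!r}")
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    segments = [seg for seg in path.split("/") if seg]
    if len(segments) < 2:
        raise ValueError("URL path must contain at least OWNER/REPO")
    # Everything except the last segment is the owner; the last is the repo.
    return parsed.hostname, "/".join(segments[:-1]), segments[-1]


host, owner, repo = split_repo_url("https://gitlab.com/myorg/subgroup/my-marketplace.git")
assert (host, owner, repo) == ("gitlab.com", "myorg/subgroup", "my-marketplace")
```

`urlparse(...).hostname` is already lowercased and excludes any port or userinfo, which keeps the later `--host` comparison simple.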
Files changed
`src/apm_cli/commands/marketplace.py`: `add` command parser
Replaced the 2/3-segment `if/elif/else` block with a three-branch strategy (full HTTPS URL, host-prefixed shorthand, plain shorthand), shown in the flow diagram below.
Path-traversal sequences (`..`, `.`) in the parsed `owner` and `repo_name` are validated through the existing `validate_path_segments` guard (required by the path-security rules in `copilot-instructions.md`). Conflicting `--host` flags are still caught in all branches.
`src/apm_cli/marketplace/errors.py`: `MarketplaceNotFoundError`
Updated the user-facing hint to mention the HTTPS URL form.
`tests/unit/marketplace/test_marketplace_commands.py`
Added 6 new test cases to `TestMarketplaceAdd`, including:
- subgroup shorthand (`HOST/group/sub/.../repo`)
- `.git` suffix stripping
- `--host` flag with HTTPS URL
Flow diagram
The diagram below shows the updated parse strategy inside the `add` command.
flowchart TD
A(["apm marketplace add INPUT"]) --> B{"starts with https://?"}
B -- yes --> C["urlparse / strip .git"]
C --> D{"path segments >= 2?"}
D -- no --> ERR1(["error: need OWNER/REPO"])
D -- yes --> E["owner = all-but-last\nrepo = last segment"]
B -- no --> F["split on / filter empty"]
F --> G{"segments >= 2?"}
G -- no --> ERR2(["error: need OWNER/REPO"])
G -- yes --> H{"first segment is valid FQDN?"}
H -- yes --> I{"segments >= 3?"}
I -- no --> ERR3(["error: HOST/OWNER/REPO required"])
I -- yes --> J["host = first\nowner = middle segments\nrepo = last"]
H -- no --> K["owner = all-but-last\nrepo = last segment"]
E & J & K --> L["validate_path_segments\nowner + repo_name"]
L --> M["resolve --host flag\nor default_host"]
M --> N(["_auto_detect_path + register"])
Trade-offs
- `owner` is a multi-segment string: `MarketplaceSource.owner` may now be `"mycompany/myorg/specs-and-standards"`. The model already stored it as a plain string and the `_github_contents_url` builder inlines it directly, so the API URL is assembled correctly. No other callers were found to assume a single-segment owner.
- `MarketplaceSource` storable via `apm marketplace add`.
- `http://` accepted alongside `https://`: kept for parity with `dependency/reference.py`; production usage is expected to be HTTPS-only.
Validation
21/21 unit tests pass
Full unit suite: 6656 tests, 0 failures
Live invocation with a 4-segment subgroup path
Parser correctly sets `owner = "solutions-distributeurs/yz_-alf_framework/sandbox"`, `repo = "github-copilot-agents"`, `host = "gitlab.udd.net.intra.laposte.fr"`. (Fetch aborted manually; no live GitLab API support yet.)
How to test
- `uv run pytest tests/unit/marketplace/test_marketplace_commands.py -v`: all 21 tests green
- `uv run apm marketplace add gitlab.com/myorg/subgroup/my-marketplace --name my-mkt`: parses cleanly, fails at fetch (expected without a live GitLab API)
- `uv run apm marketplace add https://gitlab.com/myorg/subgroup/my-marketplace.git --name my-mkt`: same result, `.git` stripped
- `uv run apm marketplace add gitlab.com/myorg/repo --host github.com`: exits with "Conflicting host" error
- `uv run apm marketplace add gitlab.com/myorg/../evil/repo`: exits with traversal-rejection error