
Fix: apm install https://... no longer falls back to SSH on generic Git hosts #995

Merged
danielmeppiel merged 4 commits into main from
copilot/fix-apm-install-https-ssh-issue
Apr 27, 2026


Copilot AI commented Apr 27, 2026

Description

apm install https://corp-bitbucket.example/... was silently retrying SSH on port 22 when the HTTPS validation probe failed, masking the real HTTPS error (auth/redirect) behind a 30s SSH timeout. Verbose logging only surfaced the last attempt, so the misleading SSH error was all the user saw.

The _validate_package_exists pre-check was the only remaining place still doing cross-protocol fallback for explicit URLs — _clone_with_fallback / TransportSelector are already strict-by-default. This change brings validation in line.

  • src/apm_cli/install/validation.py — strict probe order on generic hosts:
    • explicit http:// / https:// → web URL only
    • explicit ssh:// (incl. SCP shorthand git@host:org/repo) → SSH only
    • shorthand (no scheme) → unchanged legacy [ssh, https] chain
    • APM_ALLOW_PROTOCOL_FALLBACK=1 restores the legacy chain (mirrors clone).
  • Verbose logging — each ls-remote attempt is logged with its scheme and stderr scrubbed via _sanitize_git_error, so the actual failing leg is visible:
    Trying git ls-remote for bitbucket.example.internal (1 attempt)
    git ls-remote (https) rc=128: fatal: Authentication failed for ...
    
  • Tests
    • new tests/unit/install/test_validation_strict_transport.py: explicit https/http/ssh strict mode, shorthand legacy chain, env opt-in, per-attempt verbose log assertions.
    • updated tests/unit/test_install_command.py::TestGenericHostSshFirstValidation to the strict-by-default contract; added coverage for the APM_ALLOW_PROTOCOL_FALLBACK=1 opt-in.
  • CHANGELOG.md — entry under ## [Unreleased] / Fixed.
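For readers who want the probe-order rules above in code form, here is a minimal sketch. It is not the code from validation.py: `build_probe_order` and the `"web"` / `"ssh"` labels are hypothetical, and the order used when fallback is enabled follows the review panel's flow diagram rather than anything checked against the source.

```python
from typing import List, Optional


def build_probe_order(explicit_scheme: Optional[str], allow_fallback: bool) -> List[str]:
    """Return the ordered transports to probe on a generic Git host.

    Hypothetical helper for illustration only. "web" stands for the
    http(s) URL the user typed, "ssh" for the SSH form of the same repo.
    """
    if explicit_scheme in ("http", "https"):
        # Explicit web URL: web only, unless the legacy chain is re-enabled.
        return ["web", "ssh"] if allow_fallback else ["web"]
    if explicit_scheme == "ssh":
        # Explicit ssh:// or SCP shorthand (git@host:org/repo): SSH only.
        return ["ssh", "web"] if allow_fallback else ["ssh"]
    # Shorthand with no scheme: legacy chain, unchanged by this PR.
    return ["ssh", "web"]


print(build_probe_order("https", allow_fallback=False))
print(build_probe_order(None, allow_fallback=False))
```

Returning a list, even for the single-attempt case, lets the probing loop stay the same whether or not fallback is enabled.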

Designed and ratified via the apm-review-panel skill (Auth Expert + Python Architect + CLI Logging + DevX UX + Supply Chain Security + OSS Growth + CEO).

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass (5783 unit tests)
  • Added tests for new functionality (if applicable)

…neric hosts

Agent-Logs-Url: https://github.com/microsoft/apm/sessions/9a44cd2a-e60a-403e-bba2-214190c29d8b

Co-authored-by: danielmeppiel <51440732+danielmeppiel@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix APM install command for HTTPS URLs failing with SSH access" to "Fix: apm install https://... no longer falls back to SSH on generic Git hosts" Apr 27, 2026
Copilot AI requested a review from danielmeppiel April 27, 2026 14:16
@danielmeppiel danielmeppiel marked this pull request as ready for review April 27, 2026 14:38
Copilot AI review requested due to automatic review settings April 27, 2026 14:38
@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label Apr 27, 2026

Copilot AI left a comment


Pull request overview

Aligns _validate_package_exists with the install clone path by making transport probing strict-by-default for explicit URL schemes on generic Git hosts, preventing misleading SSH fallbacks and improving verbose diagnostics.

Changes:

  • Enforces strict transport probing for explicit http://, https://, and ssh:// URLs on generic hosts, with APM_ALLOW_PROTOCOL_FALLBACK=1 to restore the legacy chain.
  • Improves verbose output by logging each git ls-remote attempt with scheme and sanitized stderr.
  • Adds/updates unit tests and records the behavior change in the changelog.
Summary per file

| File | Description |
| --- | --- |
| src/apm_cli/install/validation.py | Implements strict probe ordering for explicit schemes on generic hosts and logs per-attempt ls-remote results. |
| tests/unit/install/test_validation_strict_transport.py | New focused tests for strict transport selection + per-attempt verbose logging. |
| tests/unit/test_install_command.py | Updates existing generic-host validation tests to match the strict-by-default contract and env opt-in fallback. |
| CHANGELOG.md | Adds an Unreleased/Fixed entry documenting the behavior change and verbose logging improvements. |

Copilot's findings

Comments suppressed due to low confidence (1)

tests/unit/test_install_command.py:952

  • This assertion any("git@git.example.org:" in arg for arg in first_cmd) is a URL substring check and is likely to trigger CodeQL's py/incomplete-url-substring-sanitization (the repo has prior fixes to avoid this). Prefer asserting the exact URL argument value for the SSH probe instead of substring matching.
        first_cmd = mock_run.call_args_list[0][0][0]
        assert any("git@git.example.org:" in arg for arg in first_cmd)
  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment on lines +927 to +930
first_cmd = mock_run.call_args_list[0][0][0]
assert any("git@git.example.org:" in arg for arg in first_cmd), (
f"Expected SSH URL in only call, got: {first_cmd}"
)

Copilot AI Apr 27, 2026


Avoid substring-matching on an SSH URL argument ("git@git.example.org:" in arg). This pattern is flagged by CodeQL as py/incomplete-url-substring-sanitization in this repo. Prefer asserting exact equality to the expected SCP-style URL argument (or otherwise assert on a parsed/structured form) so CI doesn't fail on security checks.

This issue also appears on line 951 of the same file.
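One way to follow this suggestion is sketched below. The mock setup, the `org/repo.git` path, and the assumption that the URL is the last argument of the command are illustrative only and are not taken from the actual test.

```python
from unittest import mock

# Hypothetical stand-in for the patched subprocess.run in the real test.
mock_run = mock.Mock()
mock_run(["git", "ls-remote", "--heads", "git@git.example.org:org/repo.git"])

first_cmd = mock_run.call_args_list[0][0][0]
expected_url = "git@git.example.org:org/repo.git"  # assumed repo path

# Compare the whole argument instead of searching for a URL substring.
assert first_cmd[-1] == expected_url, (
    f"Expected SSH URL as the probe argument, got: {first_cmd}"
)
# List membership is also an exact-match check, not a substring check.
assert expected_url in first_cmd
```

Because the comparison is against a complete argument, it avoids the substring-on-URL pattern the comment describes and also catches accidental changes to the repo path.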

Comment on lines +16 to +17
import pytest


Copilot AI Apr 27, 2026


pytest is imported but not used in this test module. If linting (ruff/flake8) is enabled in CI, this will fail; please remove the unused import (or use it).

Suggested change
import pytest

@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with one minor pre-merge cleanup)


Per-persona findings

Python Architect: This is a routine bug fix -- one functional change confined to _validate_package_exists in validation.py, plus tests. No new abstract classes, registries, or hierarchy changes. Two mermaid blocks (no Before/After pairs needed for a single-function scope change).

OO / class diagram

classDiagram
    direction LR
    class validation {
      <<IOBoundary>>
      +_validate_package_exists(package, verbose, auth_resolver, logger) bool
      +_local_path_failure_reason(dep_ref) str
    }
    class DependencyReference {
      <<ValueObject>>
      +explicit_scheme: Optional[str]
      +is_insecure: bool
      +host: str
      +repo_url: str
    }
    class TransportSelection {
      <<Pure>>
      +is_fallback_allowed(cli_flag, env) bool
    }
    class GitHubPackageDownloader {
      +_build_repo_url(url, use_ssh, dep_ref, token) str
      +_build_noninteractive_git_env(...) dict
      +_sanitize_git_error(raw_stderr) str
    }
    class AuthResolver {
      <<Strategy>>
      +classify_host(host) HostInfo
      +resolve_for_dep(dep_ref) AuthContext
    }
    class HostInfo {
      <<ValueObject>>
      +kind: str
    }

    validation ..> DependencyReference : reads explicit_scheme
    validation ..> TransportSelection : reads is_fallback_allowed
    validation ..> GitHubPackageDownloader : delegates URL build and env
    validation ..> AuthResolver : resolves token per dep

    class validation:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

Execution flow diagram

flowchart TD
    A([apm install pkg_url]) --> B["_validate_package_exists(package, verbose, auth_resolver)"]
    B --> C{dep_ref.host\nclassification}
    C -->|GitHub.com| D["[AUTH] unauth-first API probe"]
    C -->|ADO| E["[AUTH] bearer/PAT probe"]
    C -->|generic| F["[NET] read explicit_scheme\nfrom DependencyReference"]
    F --> G["[NET] is_fallback_allowed()\nreads APM_ALLOW_PROTOCOL_FALLBACK"]
    G --> H{explicit_scheme?}
    H -->|http or https| I["urls_to_try = [https_url]\nor [https, ssh] if fallback_env"]
    H -->|ssh or SCP git@| J["urls_to_try = [ssh_url]\nor [ssh, https] if fallback_env"]
    H -->|None - shorthand| K["urls_to_try = [ssh_url, https_url]\nlegacy chain unchanged"]
    I --> L["[EXEC] for probe_url in urls_to_try:\ngit ls-remote --heads probe_url"]
    J --> L
    K --> L
    L --> M["[LOG] _log_attempt_result per attempt\n_sanitize_git_error + env var redaction"]
    M --> N{returncode == 0?}
    N -->|Yes| O([return True])
    N -->|No - more attempts| L
    N -->|No - all exhausted| P([return False])

Design patterns: Chain of Responsibility (probe loop) -- urls_to_try is the responsibility chain. The PR narrows the chain from 2 members to 1 for explicit-scheme URLs (unless APM_ALLOW_PROTOCOL_FALLBACK re-enables the full chain), mirroring TransportSelector exactly. Fail-closed policy: no cross-protocol fallback by default -- the user's stated transport is honored. Minor style note: from apm_cli.deps.transport_selection import is_fallback_allowed at line 236 is a function-body lazy import; Auth Expert confirmed no circular dependency risk, so this can safely be promoted to a module-level import. Separately, prefer_web_probe_first is a minor misnomer post-PR -- it now drives preserve_config_isolation only, not probe order.
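The probe loop described here can be sketched roughly as follows. This is an illustration, not the validation.py implementation: `probe_remote` and the injected `run` callable are hypothetical, and only the `git ls-remote --heads` command shape comes from the flow diagram.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns a CompletedProcess; injecting it
# keeps the sketch testable without touching the network.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def probe_remote(urls_to_try: List[str], run: Runner) -> bool:
    """Try each URL in order; succeed on the first zero exit code."""
    for probe_url in urls_to_try:
        result = run(["git", "ls-remote", "--heads", probe_url])
        if result.returncode == 0:
            return True
    # Every link in the chain failed.
    return False


attempted: List[str] = []


def fake_run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Fake runner that records the URL and always fails with rc=128."""
    attempted.append(cmd[-1])
    return subprocess.CompletedProcess(cmd, 128, "", "fatal: Authentication failed")


# Strict mode: a single-element chain means exactly one attempt, no SSH retry.
ok = probe_remote(["https://git.example.org/org/repo.git"], fake_run)
```

With a one-element chain the loop fails closed after a single attempt, which is the behavior the panel describes for explicit-scheme URLs.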


CLI Logging Expert: Clear improvement. Per-attempt _log_attempt_result moves logging from post-loop to in-loop, so git ls-remote (https) rc=128: fatal: Authentication failed surfaces before the SSH timeout logs -- giving users the actual diagnostic signal rather than a misleading final-attempt-only view. Two-layer sanitization (_sanitize_git_error + explicit env var redaction) correctly preserved at each attempt. Scheme labels (https) and (ssh) in verbose output are actionable. No _rich_* calls introduced directly in command paths. Minor: the _scheme_of() helper is defined as a nested function inside _validate_package_exists and also copied into the test file -- if it grows in use, extract to transport_selection.py. Not blocking.
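A rough sketch of what per-attempt formatting with two-layer redaction could look like is shown below. The real `_log_attempt_result` and `_sanitize_git_error` are not reproduced here: `format_attempt`, `toy_sanitize`, and the `***` placeholder are assumptions, while the output line format and the two env var names are the ones quoted in this PR.

```python
import re
from typing import Callable, Mapping

REDACTED_ENV_VARS = ("GIT_ASKPASS", "GIT_CONFIG_GLOBAL")


def toy_sanitize(stderr: str) -> str:
    """Stand-in for _sanitize_git_error: masks credentials embedded in URLs."""
    return re.sub(r"(https?://)[^/@\s]+@", r"\1***@", stderr)


def format_attempt(
    scheme: str,
    returncode: int,
    stderr: str,
    env: Mapping[str, str],
    sanitize: Callable[[str], str] = toy_sanitize,
) -> str:
    """Build one verbose log line for a single ls-remote attempt."""
    # Layer 1: scrub token-bearing URLs from git's stderr.
    message = sanitize(stderr).strip()
    # Layer 2: redact the values of sensitive env vars if they leak through.
    for name in REDACTED_ENV_VARS:
        value = env.get(name)
        if value:
            message = message.replace(value, "***")
    return f"git ls-remote ({scheme}) rc={returncode}: {message}"


line = format_attempt(
    "https",
    128,
    "fatal: Authentication failed for 'https://user:tok@git.example.org/org/repo'\n",
    {"GIT_ASKPASS": "/tmp/askpass.sh"},
)
```

Calling a formatter like this inside the loop, once per attempt, is what makes the failing HTTPS leg visible before any later attempt runs.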


DevX UX Expert: Core UX fix aligns APM with the npm/pip/cargo mental model: give the tool an https:// URL, it tries HTTPS. The previous 30-second SSH timeout masking the real auth failure was a genuine funnel-killer for enterprise Bitbucket/on-prem GitLab users. The fix is correct and the APM_ALLOW_PROTOCOL_FALLBACK=1 escape hatch is modeled correctly -- power-user opt-in, not first-class UX. No CLI surface changes; no docs update required for a bug fix. Minor enhancement opportunity: the error message when explicit-https validation fails could hint --verbose for diagnosis, but this is not blocking. CHANGELOG entry is concise and user-facing.


Supply Chain Security Expert: Net security improvement. The strict-by-default transport enforcement prevents a scenario where attacker-influenced SSH credentials could be used when the user explicitly stated HTTPS. Two-layer stderr sanitization is faithfully preserved in the per-attempt logger -- _sanitize_git_error scrubs PAT-bearing URLs, and GIT_ASKPASS/GIT_CONFIG_GLOBAL env values are redacted. No new token surfaces, no path traversal, no lockfile changes. Pre-existing note (not introduced here): GIT_HTTP_EXTRAHEADER values are not explicitly scrubbed post-sanitize -- a hardening opportunity for a future PR. No action required for this PR.


Auth Expert: Activated (fast-path: src/apm_cli/install/validation.py). No AuthResolver precedence regression. Token sourced exclusively via resolve_for_dep(dep_ref) -- no direct os.getenv for tokens introduced. HTTPS URL embeds _url_token via _build_repo_url(token=_url_token) as before; SSH URL correctly built without a token (SSH uses key-based auth). is_fallback_allowed() is called once before the probe list is constructed and cached as allow_fallback_env -- correct, avoids repeated env reads in the loop. ADO bearer fallback path is untouched. Lazy import from apm_cli.deps.transport_selection import is_fallback_allowed at line 236: transport_selection.py does not import from validation.py, so no circular import risk -- this can be moved to module level safely.
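The "read once, then cache" point can be illustrated with the sketch below. `fallback_allowed` is a stand-in loosely modeled on the `is_fallback_allowed(cli_flag, env)` signature from the class diagram. Its body is an assumption, since this PR only documents that the value `1` enables fallback.

```python
import os
from typing import Mapping, Optional


def fallback_allowed(cli_flag: bool = False,
                     env: Optional[Mapping[str, str]] = None) -> bool:
    """Stand-in for is_fallback_allowed; only the "=1" opt-in is assumed."""
    env = os.environ if env is None else env
    return cli_flag or env.get("APM_ALLOW_PROTOCOL_FALLBACK") == "1"


# Read the environment once, before the probe list is built.
allow_fallback_env = fallback_allowed(env={"APM_ALLOW_PROTOCOL_FALLBACK": "1"})

# The cached boolean is then reused; nothing inside the loop touches the env.
urls_to_try = ["https://git.example.org/org/repo.git"]
if allow_fallback_env:
    urls_to_try.append("git@git.example.org:org/repo.git")
```

Passing `env` explicitly keeps the function pure in tests, which matches the `<<Pure>>` annotation on TransportSelection in the diagram.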


OSS Growth Hacker: "APM now respects the URL you type" -- a strong enterprise credibility signal. This fix directly unblocks the Bitbucket Data Center / on-prem GitLab user segment, which is a key enterprise adoption vector. The CHANGELOG entry is well-written and usable as release note material as-is. Side-channel to CEO: APM_ALLOW_PROTOCOL_FALLBACK=1 is effectively undiscoverable by users who do not read CHANGELOG or run --verbose. A brief mention in the troubleshooting docs or enterprise adoption playbook would close this gap without blocking the PR.


CEO arbitration

All six specialists converge without meaningful disagreement: the fix is well-scoped, well-tested, architecturally consistent, and a net security improvement. The behavior change (explicit https:// no longer retries SSH) is correct, documented in CHANGELOG with the escape hatch named explicitly, and matches what every comparable package manager does. The only pre-merge recommendation is the lazy import cleanup (1 line, confirmed safe by Auth Expert) -- a style issue worth fixing before merge to keep the codebase clean, but not a blocker in the traditional sense. The Growth Hacker's annotation about APM_ALLOW_PROTOCOL_FALLBACK=1 discoverability is a valid follow-up that does not belong in this PR. Disposition ratified as APPROVE.


Required actions before merge

  1. Move lazy import to module level (src/apm_cli/install/validation.py, line 236). Change from apm_cli.deps.transport_selection import is_fallback_allowed (currently inside _validate_package_exists) to a module-level import alongside the existing from ..utils.github_host import default_host block. Auth Expert confirmed no circular import risk. This is a 1-line move and keeps the codebase consistent with the "lazy imports only for circular dependency breaks" convention.

Optional follow-ups

  • Rename prefer_web_probe_first to something reflecting its post-PR role (use_web_config_isolation or similar) -- it now drives preserve_config_isolation in _build_noninteractive_git_env, not probe order. Reduces future reader confusion.
  • Extract the _scheme_of() nested helper from _validate_package_exists to transport_selection.py -- it is duplicated in the new test file and is a natural fit alongside is_fallback_allowed.
  • Add a line to the troubleshooting docs or enterprise adoption playbook noting APM_ALLOW_PROTOCOL_FALLBACK=1 as the escape hatch for users migrating from SSH-first corporate setups.
  • Consider appending (run with --verbose for per-attempt diagnostics) to the generic validation failure message when an explicit-scheme URL fails -- reduces the feedback loop for enterprise users who hit auth issues.
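As a starting point for the `_scheme_of()` extraction follow-up, a standalone helper might look like the sketch below. The real nested helper's behavior is not shown in this PR, so the SCP-shorthand regex and the `"unknown"` return value for scheme-less input are assumptions.

```python
import re
from urllib.parse import urlsplit

# SCP shorthand such as git@host:org/repo (user@host: not followed by //).
_SCP_RE = re.compile(r"^[\w.-]+@[\w.-]+:(?!//)")


def scheme_of(url: str) -> str:
    """Return a short scheme label for a probe URL, for use in log lines."""
    if _SCP_RE.match(url):
        return "ssh"
    scheme = urlsplit(url).scheme.lower()
    # Assumed fallback label when the input carries no scheme at all.
    return scheme or "unknown"


labels = [
    scheme_of("https://git.example.org/org/repo.git"),
    scheme_of("ssh://git@git.example.org/org/repo.git"),
    scheme_of("git@git.example.org:org/repo.git"),
]
```

Keeping one shared helper would let the production code and the new test module import the same function instead of maintaining two copies.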

Generated by PR Review Panel for issue #995 · ● 1.5M ·

@danielmeppiel danielmeppiel merged commit 2023e8c into main Apr 27, 2026
6 checks passed
@danielmeppiel danielmeppiel deleted the copilot/fix-apm-install-https-ssh-issue branch April 27, 2026 15:34

Labels

panel-review Trigger the apm-review-panel gh-aw workflow


Development

Successfully merging this pull request may close these issues.

[BUG] apm install https tries to access via ssh

3 participants