Skip to content

fix: add threading.Lock to _fallback_port_warned dedup set#1238

Merged
danielmeppiel merged 7 commits intomainfrom
fix/801-fallback-port-warned-lock
May 11, 2026
Merged

fix: add threading.Lock to _fallback_port_warned dedup set#1238
danielmeppiel merged 7 commits intomainfrom
fix/801-fallback-port-warned-lock

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Description

Add a threading.Lock around the check-then-act on _fallback_port_warned in GitHubPackageDownloader, following the AuthResolver pattern already used at core/auth.py:105. This prevents duplicate [!] warnings when concurrent _clone_with_fallback callers race on the dedup set.

Currently latent (single-threaded install loop), but required pre-hardening before parallel downloads land.

Fixes #801

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass
  • Added tests for new functionality (if applicable)

Work in progress — signals intent to fix #801.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/801-fallback-port-warned-lock branch from 9f5c99e to 7b21d3a Compare May 10, 2026 10:48
@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as ready for review May 10, 2026 10:57
Copilot AI review requested due to automatic review settings May 10, 2026 10:57
@sergio-sisternes-epam sergio-sisternes-epam added the panel-review Trigger the apm-review-panel gh-aw workflow label May 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds thread-safety around the cross-protocol fallback custom-port warning deduplication so concurrent clone attempts cannot emit duplicate [!] warnings. This aligns the downloader with upcoming parallel download hardening.

Changes:

  • Add a threading.Lock to GitHubPackageDownloader to guard the _fallback_port_warned dedup set.
  • Wrap the check-then-add sequence in CloneEngine under the lock while keeping warning emission outside the lock.
  • Add a concurrency-focused unit test to ensure only one warning is emitted under concurrent callers.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/apm_cli/deps/github_downloader.py Introduces a lock alongside _fallback_port_warned on the downloader instance.
src/apm_cli/deps/clone_engine.py Uses the lock to make deduplication atomic under concurrency before emitting the warning.
tests/unit/test_protocol_fallback_warning.py Adds a multi-threaded test intended to validate the dedup warning is emitted once.

Comment thread tests/unit/test_protocol_fallback_warning.py
Comment thread tests/unit/test_protocol_fallback_warning.py
@github-actions
Copy link
Copy Markdown

APM Review Panel: ship_with_followups

Ship now: threading.Lock dedup fix is correct and latent; two non-blocking follow-ups required before parallel downloads land

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

All 8 panelists converge on the same verdict: the fix is mechanically correct, minimally scoped, and introduces no regressions. The lock is co-located with the mutable state it guards, the check-then-act extraction into a boolean before lock release is idiomatic, and the property-delegation pattern on CloneEngine is consistent with existing conventions. This is a sound pre-hardening commit.

The single substantive architectural note -- from python-architect, echoed structurally by devx-ux-expert -- is that the lock and its guarded set are sibling attributes with no enforced coupling. Any future caller who reads _fallback_port_warned without acquiring _fallback_port_warned_lock will introduce a silent regression. A private _WarnedSet dataclass encapsulating both is the right long-term shape, but it belongs in the parallel-download PR, not here. Scoping it to this PR would expand blast radius unnecessarily. The current fix is the correct minimal slice.

Supply-chain-security and auth-expert surface the same forward obligation independently: _github_token_from_credential_fill, github_token, has_github_token, and _registry_config_cache remain unprotected mutable state. These are safe today under the single-threaded install loop. They are a genuine audit obligation before parallel downloads land -- supply-chain flags a provenance audit gap, auth flags a credential-state race. These must be tracked as a hard prerequisite on the parallel-download issue, not deferred indefinitely.

Doc-writer and oss-growth-hacker both independently flag the missing CHANGELOG entry. This is a user-visible fix: duplicate [!] port-fallback warnings silently disappear. It follows the same pattern as the #1212 ADO stale-PAT entry already in [Unreleased] -> Fixed. The CHANGELOG entry should ship with this PR, not after.

Dissent. Test-coverage-expert correctly notes that the concurrent dedup test is probabilistically unreliable on a single-vCPU CI or under GIL serialization: the critical section in clone_engine.py lines 211-215 is short enough that threads may naturally serialize without the lock, causing the assertion to pass regardless of lock presence. The non-concurrent dedup tests carry the real regression-trapping weight. The concurrent test is a best-effort probe -- it should carry a comment documenting its probabilistic nature. No other panelist addressed test soundness at this level; panel defers to test-coverage on this finding.

Aligned with: Secure by default -- concurrency correctness is addressed before the race window is open, and unprotected auth-state vars are explicitly flagged as a parallel-download prerequisite. Pragmatic as npm -- users running parallel installs will see exactly one [!] port-fallback warning per unique (host, repo, port) identity rather than N duplicate warnings from N concurrent workers. OSS community driven -- pre-hardening commits tracked transparently in CHANGELOG, linked to the tracking issue (#801), and forward-referenced to the parallel-download milestone give external contributors a clear breadcrumb trail.

Growth signal. Pre-hardening PRs are growth-positive when they visibly signal production readiness. Recommended CHANGELOG line: "apm install no longer emits duplicate protocol-fallback port warnings when multiple packages are fetched in parallel; the _fallback_port_warned dedup set is now lock-guarded so each (host, repo, port) triple warns exactly once across threads. (#801)" -- the milestone tag amplifies the signal so that when parallel downloads ship, the release post can credibly cite a hardening trail.

Panel summary

Persona B R N Takeaway
Python Architect 0 1 2 Correct, minimal threading fix; lock co-located with guarded state; one encapsulation watch-for before parallel downloads land.
CLI Logging Expert 0 0 2 Lock-then-flag pattern is correct and idiomatic; [!] warning fires exactly once per (host, repo, port) identity; no output UX regressions.
DevX UX Expert 0 1 2 Net positive for install UX; eliminates duplicate warning noise; implicit host coupling on property access is the only concern.
Supply Chain Security Expert 0 1 1 Mechanically correct; no new attack surface; unprotected auth-state vars flagged as parallel-download prerequisite audit.
OSS Growth Hacker 0 1 1 Growth-positive pre-hardening signal; missing CHANGELOG entry and parallel-download milestone link are the only gaps.
Auth Expert 0 1 1 Auth-safe; lock pattern mirrors AuthResolver._lock; mutable auth-state vars flagged as parallel-download obligation.
Doc Writer 0 1 0 No doc files changed; CHANGELOG entry warranted for user-visible warning dedup fix, consistent with #1212 pattern.
Test Coverage Expert 0 1 2 Concurrent dedup test exists and is structurally sound; probabilistic race detection noted; missing CloneEngine isolation test.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Doc Writer + OSS Growth Hacker] (blocking-severity) Add CHANGELOG entry under [Unreleased] -> Fixed for PR fix: add threading.Lock to _fallback_port_warned dedup set #1238 before merge -- this is a user-visible fix (duplicate warnings disappear) and a public commitment pattern already established by [BUG] apm install -g --update fails with a DevOps auth error where apm install -g succeeds #1212. Should be in this PR, not a follow-up.
  2. [Supply Chain Security Expert + Auth Expert] Add explicit parallel-download prerequisite audit for unprotected mutable auth-state vars (_github_token_from_credential_fill, github_token, has_github_token, _registry_config_cache) to the parallel-download tracking issue -- both panelists independently surfaced the same exposure; must be a hard prerequisite, not a nice-to-have.
  3. [Test Coverage Expert] Add CloneEngine._fallback_port_warned_lock property-delegation isolation test (tests/unit/deps/test_clone_engine.py); document probabilistic nature of concurrent race probe inline -- missing delegation test means a host test-double lacking the attribute will surface as a runtime AttributeError rather than a test failure.
  4. [Python Architect] Before parallel-download PR: introduce _WarnedSet private dataclass encapsulating set + Lock with add_if_absent() to make lock-bypass structurally impossible -- sibling attributes with no enforced coupling are a regression trap at scale; scoping to the parallel-download PR is correct.
  5. [DevX UX Expert] Add guard or documented host-protocol precondition on CloneEngine._fallback_port_warned_lock property to prevent silent AttributeError on under-initialized host -- a one-line guard or docstring prevents a confusing runtime failure with no actionable message.

Architecture

classDiagram
    direction LR

    class GitHubPackageDownloader {
        <<Host>>
        +_fallback_port_warned : set
        +_fallback_port_warned_lock : Lock
        +_strategies : DownloadDelegate
        +_clone_with_fallback(repo_url, target, dep_ref)
    }

    class CloneEngine {
        <<Delegate>>
        -_host : GitHubPackageDownloader
        +_fallback_port_warned : set
        +_fallback_port_warned_lock : Lock
        +execute(repo_url_base, ...) Path
        -_env_for(attempt) dict
    }

    class DownloadDelegate {
        <<Strategy>>
        +_clone_engine : CloneEngine
    }

    class threading_Lock {
        <<Stdlib>>
        +acquire()
        +release()
        +__enter__()
        +__exit__()
    }

    class GitHubPackageDownloader:::touched
    class CloneEngine:::touched

    GitHubPackageDownloader *-- DownloadDelegate : delegates to
    DownloadDelegate *-- CloneEngine : owns
    CloneEngine ..> GitHubPackageDownloader : reads via _host
    GitHubPackageDownloader *-- threading_Lock : guards _fallback_port_warned

    note for GitHubPackageDownloader "Owns mutable dedup set + Lock.\nLock and set are co-located but\nnot encapsulated together (see finding)."
    note for CloneEngine "Property-delegates both set and lock\nfrom host. Adding lock property is\na leaky extension of an existing pattern."

    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A(["apm install -- single or parallel caller"]) --> B["CloneEngine.execute()\nclone_engine.py:205"]

    B --> C{"warn_key needed?\n(scheme fallback occurred)"}
    C -- no --> Z(["return clone path"])
    C -- yes --> D["[LOCK] acquire _fallback_port_warned_lock\nclone_engine.py:211"]

    D --> E{"warn_key in\n_fallback_port_warned?"}
    E -- yes --> F["[LOCK] release lock\n_should_warn = False"]
    E -- no --> G["[LOCK] _fallback_port_warned.add(warn_key)\n_should_warn = True"]
    G --> H["[LOCK] release lock"]

    F --> Z
    H --> I["_rich_warning(\"Custom port ...\")\ngithub_downloader.py via _capture"]
    I --> Z

    style D fill:#fde8c8,stroke:#d47600
    style G fill:#fde8c8,stroke:#d47600
    style H fill:#fde8c8,stroke:#d47600
    style F fill:#fde8c8,stroke:#d47600
Loading

Recommendation

Ship after adding the CHANGELOG entry (follow-up #1, highest signal). The fix is correct, minimally scoped, and latent -- it costs nothing today and pays down the correctness debt before the parallel-download window opens. All 8 panelists agree on mechanical soundness. The two forward-looking concerns (auth-state audit, _WarnedSet encapsulation) are rightly deferred to the parallel-download PR where they become load-bearing. The concurrent test is a best-effort probe; the non-concurrent dedup tests provide the real regression trap. This PR sets the right pre-hardening precedent: small scope, documented rationale, explicit roadmap connection.


Full per-persona findings

Python Architect

  • [recommended] Lock and set are separate sibling attributes; no enforcement that they are always used together at src/apm_cli/deps/github_downloader.py:206
    In GitHubPackageDownloader.__init__ the set (_fallback_port_warned) and its lock (_fallback_port_warned_lock) are declared as two independent attributes. Any future caller who writes if warn_key not in self._fallback_port_warned without the lock will introduce a silent regression. A frozen-after-init _WarnedSet helper with add_if_absent() would make the invariant impossible to violate.
    Suggested: Consider a private _WarnedSet dataclass with add_if_absent(key) -> bool that encapsulates the lock internally. CloneEngine would hold one reference and call self._warned_set.add_if_absent(warn_key) -- no paired property delegation needed.
  • [nit] _fallback_port_warned_lock property on CloneEngine leaks the lock abstraction across the boundary at src/apm_cli/deps/clone_engine.py:135
    Exposing the lock separately doubles the surface. If the _WarnedSet wrapper is adopted this property disappears naturally.
    Suggested: If the _WarnedSet wrapper is not adopted, at minimum inline the lock access into a single _warn_once(key) -> bool method on the host object.
  • [nit] Local variable named _should_warn uses the private-convention leading underscore inside a function body at src/apm_cli/deps/clone_engine.py:208
    Leading underscores on local variables is unusual in Python; the convention is reserved for module-level or class-level names. should_warn (no underscore) is cleaner.
    Suggested: Rename _should_warn to should_warn.

CLI Logging Expert

  • [nit] Warning message embeds a literal newline + 4 spaces inside an f-string for indented continuation lines at src/apm_cli/deps/clone_engine.py:222
    This works today but the 4-space indent is load-bearing presentation logic hidden in a data string. If the output path ever switches to a structured Rich Panel or Text object the indentation will double-render.
  • [nit] Per-thread patch of _rich_warning in the concurrent test is non-obvious; hoisting the mock above the thread-spawn site makes thread-safety intent trivially obvious at tests/unit/test_protocol_fallback_warning.py:275
    All 4 patches install the same _capture closure so the test is functionally safe, but the intent is non-obvious and a reader might distrust it. Hoist the single patch(...) call above the thread-spawn site.

DevX UX Expert

  • [recommended] Exposing _fallback_port_warned_lock as a property delegating to _host creates implicit coupling with no precondition enforcement at src/apm_cli/deps/clone_engine.py:132-137
    If a future caller accesses this property before _host is fully initialized it will raise AttributeError with no actionable message.
    Suggested: Add a guard or document the host protocol requirement.
  • [nit] _should_warn local variable name uses leading underscore (private convention) in a function body at src/apm_cli/deps/clone_engine.py:211
    Rename to should_warn.
  • [nit] import shutil inside test function body; hoist to module top level in tests/unit/test_protocol_fallback_warning.py

Supply Chain Security Expert

  • [recommended] Parallel-download landing will require auditing unprotected shared mutable state at src/apm_cli/deps/github_downloader.py
    _github_token_from_credential_fill and _registry_config_cache are not protected by any lock. A race on _github_token_from_credential_fill could produce incorrect diagnostic output that misleads operators about which auth source is active -- a subtle provenance audit gap. Recommend tracking as a prerequisite in the parallel-download issue.
  • [nit] self.git_env is implicitly relied upon as read-only-after-init for thread safety at src/apm_cli/deps/github_downloader.py:196
    A one-line comment -- # Constructed once; must remain read-only after __init__ for thread safety -- would prevent regression when parallel downloads land.

OSS Growth Hacker

  • [recommended] Missing CHANGELOG entry; entry should forward-reference parallel-download roadmap to signal momentum
    Suggested: Fixed: GitHubPackageDownloader now uses a threading.Lock around the fallback-port warning dedup set, preventing duplicate [!] warnings under concurrent download workers -- groundwork for the upcoming parallel install mode. (#801)
  • [nit] PR body could link to parallel-downloads tracking issue as contributor breadcrumb -- low effort, compounding discovery value.

Auth Expert

  • [recommended] Mutable auth-state instance vars (github_token, has_github_token, _github_token_from_credential_fill) left unprotected for the parallel-download future at src/apm_cli/deps/github_downloader.py:261-263
    Currently safe (single-threaded loop) but will race when parallel-download work lands. AuthResolver already enforces this via its frozen AuthContext dataclass; the downloader should follow suit or document the freeze obligation.
  • [nit] _fallback_port_warned_lock property has no return-type annotation (-> threading.Lock) while sibling property _fallback_port_warned carries -> set[tuple] at src/apm_cli/deps/clone_engine.py:136

Doc Writer

Test Coverage Expert

  • [recommended] Concurrent test may not reliably trigger the race it claims to defend on a serializing scheduler at tests/unit/test_protocol_fallback_warning.py
    The critical section (lines 211-215 of clone_engine.py) is extremely short. On a single-vCPU CI or under GIL pressure during fixture setup, threads may naturally serialize through the critical section even without the lock, making len(port_warnings) == 1 pass whether the lock is present or not. The non-concurrent dedup tests are the real regression traps.
    Proof (passed): tests/unit/test_protocol_fallback_warning.py::TestConcurrentFallbackWarning::test_concurrent_fallback_warning_emitted_once -- proves: Under concurrent clone attempts for the same dep, the port warning fires at most once [devx]
    assert len(port_warnings) == 1, f"Expected exactly 1 port warning, got {len(port_warnings)}: {port_warnings}"
  • [nit] Per-thread patching of shared module attribute is fragile under concurrent patch entry/exit at tests/unit/test_protocol_fallback_warning.py
    unittest.mock.patch saves and restores the target attribute at __enter__/__exit__ time; concurrent calls can stack non-deterministically. Functionally safe today (all mocks share the same side_effect) but fragile if future refactors split capture lists. Hoist the single patch to the outer scope and pass the mock into _worker via closure.
  • [nit] No unit test for CloneEngine._fallback_port_warned_lock property delegation to host stub at tests/unit/deps/
    If CloneEngine is constructed with a host lacking _fallback_port_warned_lock, AttributeError surfaces at runtime rather than test time.
    Proof (missing): tests/unit/deps/test_clone_engine.py::test_clone_engine_lock_property_delegates_to_host -- proves: CloneEngine._fallback_port_warned_lock delegates correctly to the host context object [devx]

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1238 · ● 2.8M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 10, 2026
Keep the runtime_factory suite green on CI runners without local LLM,
Copilot, or Codex installs by asserting against availability instead of
assuming llm is present.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/801-fallback-port-warned-lock branch from c8abcfa to cda7b71 Compare May 10, 2026 17:53
@danielmeppiel danielmeppiel disabled auto-merge May 10, 2026 18:43
@danielmeppiel danielmeppiel enabled auto-merge May 10, 2026 18:45
@danielmeppiel danielmeppiel added this pull request to the merge queue May 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 10, 2026
@sergio-sisternes-epam sergio-sisternes-epam added this pull request to the merge queue May 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 10, 2026
@danielmeppiel danielmeppiel added this pull request to the merge queue May 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 10, 2026
danielmeppiel added a commit that referenced this pull request May 10, 2026
* fix(install,tests): repair 63 integration tests in merge queue (#1247)

The merge-queue Integration Tests job for #1238 surfaced 63 failures
that had accumulated in the suite since several integration tests were
finally wired into CI. Fixes group into seven clusters:

1. Copilot harness marker (post-#1154): bare `.github/` no longer
   counts; fixtures must drop `.github/copilot-instructions.md`.
2. `apm marketplace build` was removed in favor of `apm pack`;
   tests now assert the deprecation message + exit 2.
3. `tests/fixtures/azure-skills/.claude-plugin/marketplace.json`
   regenerated and SHA constant refreshed.
4. `apm.yml` invalid-deps assertion: tolerate Rich line-wrapping in
   `dependencies` literal echo.
5. Schema fix: stop using removed top-level `repo:` key; use `git:`.
6. ADO virtual collections (`collections/<name>`) now use the
   SUBDIRECTORY layout per #1094; orphan-detection and dependency-
   declaration-order tests updated to slash form.
7. `test_auto_bootstrap_creates_user_manifest` exposed a real
   regression introduced by #1149: `user_scope_rejection_reason()`
   rejected ALL local paths at user scope, contradicting the post-#937
   contract that only relative local paths are ambiguous. Restored
   absolute-vs-relative gating.

The two unit failures in `test_mcp_registry_module` and
`test_registry_client` reproduce on `origin/main` without these
changes and are pre-existing, out of scope for this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add regression traps for the two highest-leverage gaps

Adds two fast unit-level test files that pin the contracts whose
absence let the corresponding regressions ship to release:

1. tests/unit/install/test_user_scope_rejection_reason.py
   Pins the post-#937 / post-#1149 contract for local-path policy at
   user scope: relative -> reject; absolute -> accept; remote -> accept;
   git: parent -> reject; project scope -> never reject. Verified to
   fail when the predicate is reverted to the broken #1149 form
   (5 of 15 tests fail).

2. tests/unit/test_install_path_declaration_invariant.py
   Asserts that DependencyReference.get_install_path() and
   primitives.discovery.get_dependency_declaration_order() agree on
   the on-disk path for every dependency flavor: regular GitHub,
   regular ADO, virtual subdirectory GitHub, and -- critically --
   virtual subdirectory ADO (collections/<name>) which was the
   cluster-6 drift in the merge-queue run. Verified to fail when the
   declaration-order branch is mutated to emit the deleted-pre-#1094
   flattened form (2 of 6 tests fail).

Both files run in <1s combined, so the next refactor that touches
either contract fails one assertion in CI instead of waiting for a
slow E2E that may not even be wired in yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR #1257 review comments

1. user_scope_rejection_reason now expanduser()s the local path before
   the absolute check. The rest of the install pipeline (sources.py,
   phases/resolve.py) already expanduser()s local paths, so without
   this alignment 'apm install --global ~/pkg' was incorrectly rejected
   as a relative path even though it expands to an absolute one.
   Trap added: tests/unit/install/test_user_scope_rejection_reason.py
   gains test_user_scope_accepts_tilde_local_path (parametrized x2).

2. tests/integration/test_install_dry_run_e2e.py: the temp_project
   fixture pre-created .github/copilot-instructions.md but
   _assert_no_install_artifacts asserted that same file is absent
   after the dry-run -- a structural contradiction that masked
   real dry-run regressions. Fix: drop the fixture file and set
   targets: [copilot] explicitly in the generated apm.yml so target
   detection no longer depends on the marker existing pre-install.

3. tests/integration/test_local_content_audit.py: refresh stale
   comment to reflect the post-#1154 detection contract (the marker
   is .github/copilot-instructions.md, not a bare .github/ directory).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit ad5e9cd into main May 11, 2026
9 checks passed
@danielmeppiel danielmeppiel deleted the fix/801-fallback-port-warned-lock branch May 11, 2026 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] _fallback_port_warned set has no lock; concurrent _clone_with_fallback calls can race on dedup

3 participants