test: add transient cap regressions and gateway identity check #2122

sanity · 2025-11-22T04:49:37Z

Problem

Transient caps and inbound identity handling had regressions: promotion paths could exceed caps and gateways could register inbound peers under placeholder identities. We also lacked targeted diagnostics around neighbor selection and a small-cap repro harness.

This change

Add instrumentation of neighbor candidates/live tx tracking to help debug connect maintenance.
Enforce transient promotion caps and add a regression harness for small cap scenarios.
Add gateway inbound identity regression test to ensure inbound peers are registered with real IDs.

Testing

cargo check -p freenet (local).

sanity · 2025-11-23T01:56:16Z

@claude Please carefully review this PR, both high-level and the details.

claude · 2025-11-23T01:56:38Z

Claude finished @sanity's task —— View job

PR Review Complete ✅

I've conducted a thorough review of this PR, examining both the high-level architecture and implementation details. Here's my assessment:

High-Level Assessment

Problem Statement: Clear and well-articulated. The PR addresses two critical regressions:

Transient connections could be promoted beyond max connection caps
Gateways registered inbound peers under placeholder identities instead of real IDs

Solution Approach: Sound. The changes enforce cap checks during transient promotion and add targeted regression tests to prevent recurrence.

Detailed Code Review

✅ connection_manager.rs - Major Refactoring (163 additions, 273 deletions)

Strengths:

Simplified state tracking: Replaced separate AtomicUsize counters (open_connections, reserved_connections) with a single source of truth via BTreeMap lengths. This eliminates potential race conditions and counter drift.
Unified location tracking: Merged pending_locations into location_for_peer, making pending peers immediately discoverable for routing. Good architectural simplification.
Cap enforcement at promotion: Lines 436-449 now enforce the global cap when adding connections, preventing the regression where transient promotions bypassed limits.
Better idempotency: Lines 172-176 now treat duplicate should_accept() calls as no-ops rather than rejecting them, which is more robust for network retries.

Concerns:

Potential race in cap enforcement (lines 436-449):
```
if previous_location.is_none() && self.connection_count() >= self.max_connections {
    // ... reject and rollback
}
```
Between the connection_count() check and the rollback, another thread could add a connection. Consider using a write lock held across both the check and the insert/rollback to make this atomic.

Double reservation cleanup (lines 422-429):

if was_reserved {
    let old = self.pending_reservations.write().remove(&peer);
    if old.is_none() {
        tracing::warn!(%peer, "add_connection: expected pending reservation missing");
    }
}
if was_reserved {  // ← Duplicate check here
    self.pending_reservations.write().remove(&peer);
}

This appears to be redundant code. The second if was_reserved block (lines 428-430) repeats the same removal. Should remove one.

Removed unit tests: The refactoring deleted two unit tests:
- pending_connections_hidden_from_known_locations
- should_accept_does_not_leak_reservations_for_duplicate_peer
These tested important invariants. While the behavior changed (pending peers are now in location_for_peer), consider adding equivalent tests for the new semantics.

✅ p2p_protoc.rs - Transient Promotion Guards

Strengths:

Defense in depth: Lines 1293-1340 add both admission logic (should_accept()) AND a direct cap check before promotion. This ensures no bypass path exists.
Clear error handling: Proper callback notifications when promotion is rejected.
Instrumentation: Enhanced logging helps diagnose cap violations in production.

Minor issue:

Line 1348: was_reserved parameter is now true, which is correct since should_accept() now reserves the slot. However, this assumes should_accept() was called—if there's any code path that promotes a transient without calling should_accept() first, this would be incorrect. Verify all promotion paths call should_accept().

✅ New Regression Tests

connection_cap.rs:

Simple, focused test that spins up a small network and verifies no peer exceeds max_connections
Good use of freenet-test-network builder API
Suggestion: Consider lowering max_connections in the test config rather than using the default. The comment mentions "low max-connections setting (min=5, max=6)" but the code uses DEFAULT_MAX_CONNECTIONS which is likely much higher. This would make the test more sensitive to regressions.

gateway_inbound_identity.rs:

Well-documented test purpose and historical context
Correctly verifies that gateway connections reference real peer IDs, not the gateway itself
Minor: Line 46 assertion message could be clearer about what it means for a connection to "reference remote peers, not itself"

✅ Instrumentation Changes

live_tx.rs:

Added len() method for debugging/telemetry. Simple and safe.

ring/mod.rs:

Enhanced maintenance logging to include candidates and live_tx_peers counts
Helpful for diagnosing connection state issues

Potential Issues & Questions

Race condition in add_connection(): The cap enforcement check at line 436 is not atomic with the insertion. Could lead to temporary cap violations under concurrent load.
Duplicate cleanup code: Lines 422-429 contain redundant pending_reservations.write().remove(&peer) calls.
Missing test coverage: The deleted unit tests covered important edge cases. While integration tests now exist, consider restoring unit tests for:
- Idempotent should_accept() behavior
- Reservation tracking invariants
- Cap enforcement under concurrent adds
Transient promotion assumption: The was_reserved=true parameter in p2p_protoc.rs:1348 assumes should_accept() was called. Is this guaranteed for all promotion code paths?
Test configuration mismatch: connection_cap.rs comments mention "min=5, max=6" but uses DEFAULT_MAX_CONNECTIONS. Should the test override this to create a tighter constraint?

Testing Assessment

Testing claimed: "cargo check -p freenet (local)"

Concern: cargo check only verifies compilation, not correctness. For a PR addressing regressions with new tests, please run:

cargo test -p freenet-core --test connection_cap
cargo test -p freenet-core --test gateway_inbound_identity
cargo test -p freenet-core  # Full suite

Recommendations

Before Merge:

✅ Remove duplicate pending_reservations.write().remove(&peer) at lines 428-430 in connection_manager.rs
⚠️ Address the race condition in add_connection() cap enforcement (consider atomic check-and-insert pattern)
⚠️ Run full test suite, not just cargo check
💡 Consider adding unit tests for the refactored connection manager invariants

Post-Merge:

Monitor production logs for "add_connection: rejecting new connection to enforce cap" warnings to verify the fix works as expected
Consider adding metrics for transient promotion success/failure rates

Verdict

Architecturally sound with good test coverage for the regressions. The refactoring simplifies state management and closes the bypass paths. However, there are implementation details (race condition, duplicate code) that should be addressed before merge.

The PR description could be slightly improved per AGENTS.md guidelines—it lists what changed but could better explain why the previous approach (separate counters, separate pending map) was inadequate and what insight led to unifying them.

Recommendation: Request changes to fix the duplicate cleanup code and race condition, then approve after verification.

sanity · 2025-11-23T02:54:59Z

Thanks for the review. The concerns you raised (race in add_connection, duplicate reservation cleanup, restoring unit tests) are addressed in the later slices of this stack—trying to keep this PR narrowly scoped so we don’t churn the stack order. If we need the duplicate cleanup fix here too, I can backport, but it’s already covered in the follow-ups.

…ions

sanity marked this pull request as draft November 22, 2025 05:34

freenet deleted a comment from claude bot Nov 23, 2025

sanity mentioned this pull request Nov 23, 2025

fix: track transient connections separately #2107

Closed

sanity marked this pull request as ready for review November 23, 2025 02:50

iduartgomez approved these changes Nov 23, 2025

View reviewed changes

sanity added 6 commits November 23, 2025 15:02

test: instrument neighbor candidates and live tx tracking

8490a44

fix: enforce caps on transient promotion and add cap repro test

b52fda2

test: add small cap repro harness

c78147d

test: add gateway inbound identity regression; fix accessors

80751ee

chore: fmt and use published freenet-test-network

69a7cc5

test: avoid unsupported connection cap flags

7764539

sanity force-pushed the fix/transient-split-3 branch from 70213aa to 7764539 Compare November 23, 2025 21:22

fix: align handshake args to transient naming

243a8d7

sanity enabled auto-merge November 23, 2025 22:03

test: retry three-hop put in integration test

3fad1c9

sanity added this pull request to the merge queue Nov 23, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 23, 2025

test: cap put retry timeout for three-hop deflake

04effac

sanity enabled auto-merge November 23, 2025 22:44

sanity added 2 commits November 23, 2025 17:13

fix(connect): keep first acceptance persistent

51f501b

fix(connect): honor handshake transient flag when registering connect…

00cea66

…ions

sanity added this pull request to the merge queue Nov 23, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 23, 2025

sanity added this pull request to the merge queue Nov 23, 2025

Merged via the queue into main with commit 134b3b3 Nov 24, 2025
11 checks passed

sanity deleted the fix/transient-split-3 branch November 24, 2025 00:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

test: add transient cap regressions and gateway identity check #2122

test: add transient cap regressions and gateway identity check #2122

Uh oh!

sanity commented Nov 22, 2025

Uh oh!

sanity commented Nov 23, 2025

Uh oh!

claude bot commented Nov 23, 2025 •

edited

Loading

Uh oh!

sanity commented Nov 23, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

test: add transient cap regressions and gateway identity check #2122

test: add transient cap regressions and gateway identity check #2122

Uh oh!

Conversation

sanity commented Nov 22, 2025

Problem

This change

Testing

Uh oh!

sanity commented Nov 23, 2025

Uh oh!

claude bot commented Nov 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

High-Level Assessment

Detailed Code Review

✅ connection_manager.rs - Major Refactoring (163 additions, 273 deletions)

✅ p2p_protoc.rs - Transient Promotion Guards

✅ New Regression Tests

✅ Instrumentation Changes

Potential Issues & Questions

Testing Assessment

Recommendations

Verdict

Uh oh!

sanity commented Nov 23, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

claude bot commented Nov 23, 2025 •

edited

Loading