Status: Open
Labels: A-networking (Area: Networking, ring protocol, peer discovery), E-medium (Experience needed to fix/implement: Medium / intermediate), P-high (High priority), S-needs-reproduction (Status: Bug needs reproduction steps or confirmation), T-bug (Type: Something is broken)
Description
Parent Issue
Part of #2021 - fixing ignored integration tests
Problem
The small network GET test fails in CI with PUT operation timeouts followed by gateway crashes.
Affected test:
apps/freenet-ping/app/tests/test_small_network_get_issue.rs::test_small_network_get_failure
Current status:
#[ignore = "Test has reliability issues in CI - PUT operations timeout and gateway crashes"]
Ignored by: @sanity in commit feccbde (June 3, 2025)
Test Description
This test simulates production network conditions to verify GET operations work correctly:
- 1 gateway node
- 3 regular nodes (4 total peers, matching production)
- Poor connectivity between peers (realistic scenario)
- Node1 publishes a contract
- Node2 attempts to GET the contract
The test was created to reproduce issue #2018 where GET operations failed in small networks.
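To make the topology concrete, here is a minimal Rust model of the four-peer network. It assumes that "poor connectivity" means each regular node links only to the gateway. That is an illustrative assumption, and `star_topology` and `shortest_path` are hypothetical helpers, not part of the freenet-ping test. Under that assumption, every route from Node2 to Node1 passes through the gateway, so a gateway crash would also break the GET.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Adjacency list: peer name -> peers it can reach directly.
type Links = HashMap<&'static str, Vec<&'static str>>;

/// Hypothetical model of the test network: 1 gateway + 3 regular nodes.
/// Assumption: "poor connectivity" = regular nodes link only to the gateway.
fn star_topology() -> Links {
    let mut links = Links::new();
    for node in ["node1", "node2", "node3"] {
        links.entry("gateway").or_default().push(node);
        links.entry(node).or_default().push("gateway");
    }
    links
}

/// Breadth-first search returning the fewest-hops path from `from` to `to`,
/// or `None` if `to` is unreachable.
fn shortest_path(
    links: &Links,
    from: &'static str,
    to: &'static str,
) -> Option<Vec<&'static str>> {
    let mut prev: HashMap<&'static str, &'static str> = HashMap::new();
    let mut seen: HashSet<&'static str> = HashSet::from([from]);
    let mut queue: VecDeque<&'static str> = VecDeque::from([from]);

    while let Some(current) = queue.pop_front() {
        if current == to {
            // Walk the predecessor map back to `from`, then reverse it.
            let mut path = vec![to];
            let mut node = to;
            while let Some(&parent) = prev.get(node) {
                path.push(parent);
                node = parent;
            }
            path.reverse();
            return Some(path);
        }
        for &next in links.get(current).into_iter().flatten() {
            if seen.insert(next) {
                prev.insert(next, current);
                queue.push_back(next);
            }
        }
    }
    None
}
```

Removing the gateway from the map leaves Node2 with no route to Node1, which mirrors how a gateway crash would block the GET in this assumed topology.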
Expected Behavior
- Node1 successfully PUTs the ping contract (should complete in < 30s)
- Gateway remains stable throughout test
- Node2 successfully GETs the contract
- Test completes without crashes
Actual Behavior
- PUT operations timeout after 30+ seconds
- Gateway process crashes during or after PUT operation
- Test fails with timeout/crash errors
- Appears specific to the CI environment (the test may pass locally)
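When triaging these failures, it helps to separate "the PUT was slow" from "the peer handling it went away". The sketch below uses only the standard library. `run_with_deadline` and `PutOutcome` are hypothetical names, and the closure stands in for a PUT request rather than any Freenet API. `mpsc::Receiver::recv_timeout` reports `Timeout` when the deadline passes and `Disconnected` when the sending side has been dropped, for example because the worker thread panicked.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// What the waiting side observed for a PUT-like operation.
#[derive(Debug, PartialEq)]
enum PutOutcome {
    /// The operation replied within the deadline.
    Completed(String),
    /// No reply before the deadline; the worker may still be running.
    TimedOut,
    /// The worker dropped its sender without replying (e.g. it panicked).
    WorkerGone,
}

/// Runs `op` on its own thread and waits at most `limit` for a reply.
/// `op` is a stand-in for a hypothetical PUT; it is not a Freenet API.
fn run_with_deadline<F>(op: F, limit: Duration) -> PutOutcome
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = op();
        // Ignore send errors: the waiter may already have given up.
        let _ = tx.send(result);
    });
    match rx.recv_timeout(limit) {
        Ok(value) => PutOutcome::Completed(value),
        Err(RecvTimeoutError::Timeout) => PutOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => PutOutcome::WorkerGone,
    }
}
```

Keeping the two failure modes distinct in the test's assertions would show whether a given CI run failed on propagation delay or on a crash.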
Investigation Tasks
- Run the test locally with RUST_LOG=debug to establish a baseline
- Run the test in a CI-like environment (limited resources) to reproduce the failure
- Capture gateway logs at crash time
- Check if PUT timeout is due to network propagation issues
- Identify root cause of gateway crash (panic, assertion, channel closure)
- Review related issue #2018 (GET operation fails when caching contracts with version-based update validation) for context on GET operation failures
- Check if crash is specific to small network topology or general PUT issue
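For capturing gateway logs at crash time and pinning down the root cause, one standard-library option is a panic hook that records where a panic happened before the default hook runs. This is a sketch: `install_crash_recorder` is a hypothetical helper, and a real gateway would send the record to its logging backend rather than to a shared `Vec`.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Installs a panic hook that records "panic at <file>:<line>: <message>"
/// into `sink`, then delegates to the previously installed hook so the
/// usual stderr output (and backtrace, if enabled) is preserved.
fn install_crash_recorder(sink: Arc<Mutex<Vec<String>>>) {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        let location = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "unknown location".to_string());
        // Panic payloads are usually &str (literal) or String (formatted).
        let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = info.payload().downcast_ref::<String>() {
            s.clone()
        } else {
            "non-string panic payload".to_string()
        };
        if let Ok(mut records) = sink.lock() {
            records.push(format!("panic at {location}: {message}"));
        }
        previous(info);
    }));
}
```

Setting `RUST_BACKTRACE=1` alongside `RUST_LOG=debug` also makes the default hook print a backtrace for the panicking thread.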
Related Code
- Test file: apps/freenet-ping/app/tests/test_small_network_get_issue.rs
- Relevant issue: #2018 (GET operation fails when caching contracts with version-based update validation)
- Network config: Gateway + 3 nodes with poor connectivity
Crash Analysis Needed
The gateway crash is particularly concerning because it points to one of the following possible causes:
- Unhandled panic in gateway code path
- Resource exhaustion under load
- Race condition in connection management
- Improper error handling for timeout scenarios
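If the crash turns out to be an unhandled panic in a request path, one way to keep the gateway alive long enough to gather more evidence is to isolate each request with `std::panic::catch_unwind`. The wrapper below is hypothetical and is a triage aid rather than a fix: it only catches unwinding panics (not builds with `panic = "abort"`, process aborts, or out-of-memory kills), and the underlying bug still has to be found.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical per-request wrapper: runs `handler` and turns a panic into
/// an `Err` describing it, instead of letting the panic unwind further.
fn handle_isolated<F>(request_id: u64, handler: F) -> Result<String, String>
where
    F: FnOnce() -> String,
{
    // AssertUnwindSafe: acceptable here only because nothing the handler
    // touched is reused after a panic in this sketch.
    panic::catch_unwind(AssertUnwindSafe(handler)).map_err(|payload| {
        let reason = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string());
        format!("request {request_id} panicked: {reason}")
    })
}
```

Logging the returned error, rather than losing the whole process, would let the test report which request triggered the panic.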
Success Criteria
- PUT operations complete reliably within timeout
- Gateway remains stable throughout test
- Test passes consistently in CI
- Remove the #[ignore] annotation
- No regressions in GET operation functionality (#2018: GET operation fails when caching contracts with version-based update validation)
Impact
High priority - combines reliability issues (timeouts) with stability issues (crashes), and was created specifically to catch GET operation bugs that affected production.
[AI-assisted debugging and comment]
Metadata
Project status: Triage