test: fix PUT timeout and gateway crashes in small network test #2023

@sanity

Parent Issue

Part of #2021 - fixing ignored integration tests

Problem

The small network GET test fails in CI with PUT operation timeouts followed by gateway crashes.

Affected test:

  • apps/freenet-ping/app/tests/test_small_network_get_issue.rs::test_small_network_get_failure

Current status:

#[ignore = "Test has reliability issues in CI - PUT operations timeout and gateway crashes"]

Ignored by: @sanity in commit feccbde (June 3, 2025)

Test Description

This test simulates production network conditions to verify GET operations work correctly:

  • 1 gateway node
  • 3 regular nodes (4 total peers, matching production)
  • Poor connectivity between peers (realistic scenario)
  • Node1 publishes a contract
  • Node2 attempts to GET the contract

The test was created to reproduce issue #2018 where GET operations failed in small networks.

Expected Behavior

  1. Node1 successfully PUTs the ping contract (should complete in < 30s)
  2. Gateway remains stable throughout test
  3. Node2 successfully GETs the contract
  4. Test completes without crashes
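
A minimal sketch of how the 30-second PUT expectation could be enforced so that a stall surfaces as a distinct error rather than a hung test. It uses only the standard library; `StepError` and `run_with_deadline` are hypothetical names for illustration and are not taken from the Freenet codebase. If the real test is async, a runtime-level timeout would be the natural equivalent.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch; not part of the Freenet codebase.
#[derive(Debug, PartialEq)]
pub enum StepError {
    /// The step did not finish within the allowed time.
    TimedOut(Duration),
    /// The worker thread ended (for example, it panicked) without sending a result.
    WorkerDied,
}

/// Runs `step` on a worker thread and waits at most `limit` for its result.
pub fn run_with_deadline<T, F>(limit: Duration, step: F) -> Result<T, StepError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore a send failure: the receiver may already have given up.
        let _ = tx.send(step());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        // The deadline passed while the worker was still running.
        Err(mpsc::RecvTimeoutError::Timeout) => Err(StepError::TimedOut(limit)),
        // The sender was dropped without a value, e.g. the worker panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(StepError::WorkerDied),
    }
}
```

Keeping `TimedOut` and `WorkerDied` separate matters here because the issue reports both symptoms: a slow PUT and a crashed gateway would otherwise look identical in the test output.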

Actual Behavior

  1. PUT operations time out after 30+ seconds
  2. Gateway process crashes during or after the PUT operation
  3. Test fails with timeout/crash errors
  4. Issue appears specific to the CI environment (the test may pass locally)

Investigation Tasks

  • Run test locally with RUST_LOG=debug to establish baseline
  • Run test in CI-like environment (limited resources) to reproduce
  • Capture gateway logs at crash time
  • Check if PUT timeout is due to network propagation issues
  • Identify root cause of gateway crash (panic, assertion, channel closure)
  • Review related issue #2018 ("GET operation fails when caching contracts with version-based update validation") for context on GET operation failures
  • Check if crash is specific to small network topology or general PUT issue
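
To help tell a panic apart from the other crash causes listed above, one option is to run the suspect task under `std::panic::catch_unwind` and record the panic message. The sketch below is an assumption about how that could be done in a test harness; `panic_message` and `run_catching_panic` are hypothetical helpers, not existing Freenet functions. Note that `catch_unwind` only works when the binary is built with unwinding panics (not `panic = "abort"`), and the default panic hook still prints the panic location to stderr.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Extracts a readable message from a panic payload.
pub fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        // `panic!("literal")` carries a `&'static str`.
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        // `panic!("formatted {}", x)` carries a `String`.
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Runs `task` and converts a panic into `Err(message)` instead of unwinding further.
pub fn run_catching_panic<T, F: FnOnce() -> T>(task: F) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(task)).map_err(|payload| panic_message(&*payload))
}
```

If the gateway task returns `Err` here, the crash is a panic and the message points at the failing code path; if it never returns, the timeout or channel-closure hypotheses become more likely.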

Related Code

Crash Analysis Needed

The gateway crash is particularly concerning because it may indicate one or more of the following:

  • Unhandled panic in gateway code path
  • Resource exhaustion under load
  • Race condition in connection management
  • Improper error handling for timeout scenarios

Success Criteria

Impact

High priority - this failure combines reliability issues (timeouts) with stability issues (crashes), and the affected test was created specifically to catch GET operation bugs that affected production.

[AI-assisted debugging and comment]

Labels

  • A-networking (Area: Networking, ring protocol, peer discovery)
  • E-medium (Experience needed to fix/implement: Medium / intermediate)
  • P-high (High priority)
  • S-needs-reproduction (Status: Bug needs reproduction steps or confirmation)
  • T-bug (Type: Something is broken)
