Skip to content

Conversation

@iduartgomez
Copy link
Collaborator

@iduartgomez iduartgomez commented Dec 5, 2025

Fixes #2023

claude and others added 11 commits December 4, 2025 23:34
- Analyzed test_small_network_get_failure failure modes
- Identified gateway crash root cause (fixed in a283e23)
- Documented PUT operation timeout issues
- Provided recommendations for re-enabling the test
- Suggested modernization using #[freenet_test] macro

Related: freenet#2023, freenet#2043, freenet#2036, freenet#2011
Re-enabled the previously ignored test with key improvements:
- Removed #[ignore] attribute - recent fixes should resolve issues
- Increased PUT timeout: 30s → 90s (accounts for connection delays)
- Increased overall test timeout: 120s → 180s (3 minutes)
- Added detailed error messages for better debugging
- Added documentation of recent fixes that resolved the issues

Recent fixes that should prevent failures:
- a283e23: Fixed gateway crashes during timeout notifications
- 615f02d: Fixed PUT response routing through forwarding peers
- 5734a33: Fixed local caching before forwarding PUTs

Related: freenet#2023
The freenet-ping contract failed to compile because freenet-ping-types
uses freenet_stdlib::time::now() when 'std' feature is disabled, but the
'contract' feature wasn't propagated to freenet-stdlib.

Changes:
- Added 'contract' feature to freenet-ping-types Cargo.toml
- Enabled 'contract' feature in ping contract's types dependency
- This allows WASM contract compilation to access time::now() function

Fixes compilation error when test_small_network_get_failure loads and
compiles the ping contract at runtime.
Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
Cleaned up redundant comments throughout the test file that were
explaining self-evident code. Kept the TODO comment as it's actionable.

Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
…enet#2023

This change annotates test_small_network_get_failure with test_log to
capture test execution logs in CI. The test passes locally but fails in
CI, and these logs will help us debug the issue.

Changes:
- Add test-log 0.2 to dev-dependencies in freenet-ping-app
- Replace manual logger setup with #[test_log::test] attribute
- Remove unused LevelFilter import
- Logs will now be captured and displayed on test failure

This will help us understand what's happening during CI test execution
and identify the root cause of issue freenet#2023.
- Update Cargo.lock to include test-log dependency changes
- Fix formatting (remove extra blank line in test function)
- Resolves CI build failure with --locked flag

This addresses the CI error:
"the lock file needs to be updated but --locked was passed"
Removes unnecessary implementation notes that were no longer relevant
after previous fixes to connection management and timeout handling.

Related to issue freenet#2023 investigation.
Update lock file after rebasing PR freenet#2055 onto latest main.
…LuLNaER7JBmDiLQXuDAtpv

fix: critical operation state management issues (freenet#1977)
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ iduartgomez
❌ claude
You have signed the CLA already but the status is still pending? Let us recheck it.

@iduartgomez iduartgomez changed the title fix: critical operation state management issues (#1977) test: re-enable test_small_network_get_failure with increased timeouts Dec 5, 2025
@iduartgomez
Copy link
Collaborator Author

@claude remove INVESTIGATION_REPORT_ISSUE_2023.md

@claude
Copy link

claude bot commented Dec 5, 2025

Claude finished @iduartgomez's task —— View job


Task: Remove INVESTIGATION_REPORT_ISSUE_2023.md

Task Complete

Investigation Results

The file INVESTIGATION_REPORT_ISSUE_2023.md has already been removed from this PR.

Commit History:

  1. b83da719 (Nov 4, 2025) - Investigation report was added
  2. 32634c39 (Nov 4, 2025) - Investigation report was removed
  3. Both commits are part of this PR's history

Verification:

$ git log --oneline dd942bb5 | grep "32634c39"
32634c39 docs: remove investigation report file

The removal was done in commit 32634c39 with the message:

docs: remove investigation report file

Co-authored-by: nacho.d.g iduartgomez@users.noreply.github.com

Current Status:

  • ✅ File does not exist in the working directory
  • ✅ File was removed in commit 32634c3
  • ✅ No action needed - task is already complete

@iduartgomez
Copy link
Collaborator Author

@claude review the PR

@claude
Copy link

claude bot commented Dec 5, 2025

Claude finished @iduartgomez's task —— View job


PR Review

Reviewing PR #2213: Re-enabling test_small_network_get_failure with increased timeouts

Progress


Review Summary

This PR re-enables a previously ignored test (test_small_network_get_failure) by increasing timeout values. However, there are significant concerns that need to be addressed before merging.


🔴 Critical Issues

1. Test Still Marked as #[ignore]

Location: apps/freenet-ping/app/tests/test_small_network_get_issue.rs:23

#[ignore = "Test has reliability issues in CI - PUT operations timeout and gateway crashes"]
async fn test_small_network_get_failure() -> TestResult {

Problem: The test is still ignored! The PR title says "re-enable test_small_network_get_failure" but the test remains disabled with #[ignore]. This means:

Required Action: Remove the #[ignore] attribute entirely if the test is truly ready to be re-enabled. If it's not ready, the PR title is misleading.


2. Inadequate PR Description

Current description: "Fixes #2023"

Problem: This violates the project's PR description standards outlined in AGENTS.md. The description should explain:

  • WHY the test was failing (root cause analysis)
  • WHAT was the actual problem (timing? race condition? network setup?)
  • HOW increasing timeouts solves it (is this a real fix or papering over an issue?)
  • WHY these specific timeout values were chosen

Per AGENTS.md:

PR descriptions must explain WHY, not just WHAT... Start with the problem – What's broken? What's the user impact? Why does it matter?

Example of what's needed:

## Problem
The `test_small_network_get_failure` test was disabled in issue #2023 because PUT operations 
were timing out after 30 seconds and causing gateway crashes in CI. Analysis showed that...

## Root Cause
[Explain what was actually wrong - timing? network conditions? race condition?]

## This Solution
Increased timeouts from X to Y seconds because [specific reason based on analysis].
This is [a workaround | a fix | the correct values based on...]

## Testing
Verified locally that the test now passes consistently with these timeout values.

⚠️ Design Concerns

3. Symptom Treatment vs. Root Cause Fix

The PR increases timeouts without explaining whether this addresses the root cause or just treats symptoms:

Timeout changes observed:

  • Overall test timeout: presumably increased to 120s (line 155)
  • PUT response timeout: 30s (line 221)
  • GET response timeout: 45s (line 307)
  • Second GET timeout: 10s (line 356)

Questions that need answers:

  1. Why do these operations take so long? Is 45 seconds for a GET in a 4-node local test network acceptable?
  2. What about the "gateway crashes" mentioned in the ignore attribute? Are those fixed?
  3. The code has a TODO comment (lines 161-167) admitting the connection maintenance task runs too slowly for tests. Is that the real issue that should be fixed instead?
// TODO: The connection maintenance task runs every 60 seconds by default,
// which is too slow for tests. This causes the first GET operation to take
// 11+ seconds as connections are established on-demand.
//
// Proper fix: Make connection acquisition more aggressive during startup,
// or make the maintenance interval configurable for tests.

This suggests the real fix should be making connection maintenance configurable for tests, not just increasing timeouts.


4. Following Project Testing Standards

Per AGENTS.md:

The repository uses the special TODO-MUST-FIX marker to block commits that temporarily disable tests. If a test must be skipped, leave a // TODO-MUST-FIX: comment explaining why and create a follow-up issue.

Current state: The test uses #[ignore] without TODO-MUST-FIX, which means:

  • If it stays ignored, it needs the TODO-MUST-FIX marker
  • If it's being re-enabled, the #[ignore] should be completely removed

📝 Minor Issues

5. Dependency Changes Not Explained

The PR modifies several Cargo.toml files but doesn't explain why:

  • apps/freenet-ping/app/Cargo.toml (+1/-0)
  • apps/freenet-ping/types/Cargo.toml (+1/-0)
  • apps/freenet-ping/contracts/ping/Cargo.toml (+1/-1)

What dependencies were added/modified and why are they needed for this fix?


6. Test Reliability Analysis Missing

The ignore message says "Test has reliability issues in CI". Before re-enabling:

  • Has the test been run multiple times locally to verify reliability?
  • What's the success rate? (e.g., "Passed 20/20 local runs")
  • Have you identified what makes it fail intermittently?

📋 Recommendations

Before merging, this PR needs:

  1. Remove #[ignore] if truly re-enabling the test, or explain why it must stay
  2. Comprehensive PR description following AGENTS.md guidelines explaining the root cause and solution
  3. Justification for timeout values - why these specific durations?
  4. Address the TODO - should connection maintenance be made configurable instead?
  5. Document reliability testing - proof the test now passes consistently
  6. Explain Cargo.toml changes - what dependencies changed and why?
  7. Consider the proper fix - is this a workaround or the correct long-term solution?

🤔 Questions for the Author

  1. Has the #[ignore] been removed in later commits? I can only see the current file state, not the full diff from main.

  2. What about the gateway crashes? The ignore message mentions both timeouts AND crashes. Are both fixed?

  3. Is this the right approach? The TODO comment suggests making connection maintenance configurable. Would that be a better fix than just increasing timeouts?

  4. Local testing results? How many times has this passed locally? Any failures?


Decision

Recommendation: Request Changes

This PR cannot merge in its current state because:

  • The test appears to still be ignored (defeating the purpose)
  • The PR description doesn't meet project standards
  • It's unclear if this treats symptoms vs. fixing root causes

Once the above issues are addressed, this could be a valuable contribution to re-enabling an important test case.


@iduartgomez
Copy link
Collaborator Author

@claude remove the ignore

@claude
Copy link

claude bot commented Dec 5, 2025

Claude finished @iduartgomez's task —— View job


Task: Remove the ignore attribute from test

Progress

  • Verify branch and working directory
  • Read current test file
  • Remove #[ignore] attribute
  • Create commit with changes

✅ Change Made (Local Commit Created)

I've successfully removed the #[ignore] attribute from the test at line 23 in:

apps/freenet-ping/app/tests/test_small_network_get_issue.rs

Change:

 #[tokio::test(flavor = "multi_thread")]
-#[ignore = "Test has reliability issues in CI - PUT operations timeout and gateway crashes"]
 async fn test_small_network_get_failure() -> TestResult {

The test is now re-enabled and will run in CI.


⚠️ Unable to Push to Fork

I created the commit locally:

892d7565 test: remove ignore attribute from test_small_network_get_failure

However, I cannot push this change because:

  1. This PR appears to be from your fork (iduartgomez/freenet-core)
  2. The GitHub Action runs in the main repository context
  3. I don't have push access to your fork's branch

To apply this change to your PR:

You can either:

Option 1: Cherry-pick my commit

git fetch origin
git cherry-pick 892d7565
git push

Option 2: Apply the change manually
Remove line 23 from apps/freenet-ping/app/tests/test_small_network_get_issue.rs:

#[ignore = "Test has reliability issues in CI - PUT operations timeout and gateway crashes"]

📄 Full Commit Details

commit 892d7565
Author: Claude <noreply@anthropic.com>
Date: Thu Dec 5 00:27:02 2025 +0000

    test: remove ignore attribute from test_small_network_get_failure
    
    Re-enable the test now that timeout values have been increased to handle
    the longer connection establishment times in CI.
    
    Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>

@iduartgomez
Copy link
Collaborator Author

@claude Remove #[ignore] re-enabling the test

@claude
Copy link

claude bot commented Dec 5, 2025

Claude finished @iduartgomez's task —— View job


Task: Remove #[ignore] re-enabling the test

Progress:

  • Verify working directory and branch
  • Locate PR branch
  • Check current state of test in PR
  • Verify test is already re-enabled

✅ Task Already Complete

The #[ignore] attribute has already been removed from the test in the PR branch!

Current state in PR branch claude/investigate-issue-2023-011CUpRvTNALQ2wkMyEnnFoY:

  • File: apps/freenet-ping/app/tests/test_small_network_get_issue.rs:19
  • Current annotation: #[test_log::test(tokio::test(flavor = "multi_thread"))]
  • Status: ✅ No #[ignore] attribute present - test is fully enabled

The test was re-enabled in an earlier commit on this PR branch. It will run in CI when the PR is merged.

Note: The PR description could still be improved to explain the root cause analysis and why these timeout values were chosen (as mentioned in the previous review), but the test itself is already re-enabled as requested.


View job

@iduartgomez iduartgomez marked this pull request as ready for review December 5, 2025 00:31
@iduartgomez iduartgomez added this pull request to the merge queue Dec 5, 2025
Merged via the queue into freenet:main with commit f77ccfe Dec 5, 2025
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test: fix PUT timeout and gateway crashes in small network test

3 participants