DO NOT MERGE fix: recovery #2730

Closed
julienrbrt wants to merge 1 commit into main from julien/recover
Conversation

@julienrbrt
Member

Overview

@github-actions
Contributor

github-actions bot commented Oct 3, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Oct 3, 2025, 2:48 PM |

@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, explicitly marked as 'DO NOT MERGE', appears to be a temporary branch for debugging or testing a recovery mechanism. It significantly alters the system's operational flow by disabling key components, hardcoding data retrieval, and simplifying the data submission process to isolate and test specific behaviors related to recovery. The changes are not intended for production and serve as a development snapshot.

Highlights

  • Temporary Debugging Setup: Many core components and loops have been commented out or replaced with simplified, hardcoded logic, indicating a temporary debugging or testing configuration.
  • Disabled Executor and Reaper: The initialization and return of the executor and reaper components have been commented out in block/components.go, effectively disabling them.
  • Hardcoded Data Fetch: The getPending function in block/internal/cache/pending_base.go now fetches only a single data item from a specific hardcoded height (2402427) instead of iterating through a range of pending items.
  • Modified DA Submission Loop: The continuous DA submission loop in block/internal/submitting/submitter.go has been largely commented out and replaced with a single data submission attempt followed by an immediate program exit using os.Exit(0).
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Contributor

claude bot commented Oct 3, 2025

Code Review for PR #2730: DO NOT MERGE fix: recovery

⚠️ CRITICAL ISSUES IDENTIFIED - This PR should NOT be merged in its current state

🔴 Critical Security & Stability Issues

1. Hard-coded Height Value (CRITICAL BUG)

Location: block/internal/cache/pending_base.go:44

item, err := pb.fetch(ctx, pb.store, 2402427)

Issues:

  • Magic number 2402427 is hard-coded instead of iterating through heights
  • This completely breaks the pending submission logic by only fetching one specific height
  • Original logic correctly iterated from lastSubmitted + 1 to current height
  • This will cause data loss and incorrect DA submissions

Impact: HIGH - This breaks the core data availability submission mechanism
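
The range-based behaviour described above can be sketched roughly as follows. This is an illustration only, not the repository's code: `memStore` and `pendingRange` are hypothetical names, and a plain map stands in for the real store and generic cache.

```go
package main

import "fmt"

// memStore is a hypothetical in-memory stand-in for the real store:
// it maps a block height to an opaque item.
type memStore map[uint64]string

// pendingRange returns every item in (lastSubmitted, height], in order.
// It mirrors the shape of the original loop, not its exact code.
func pendingRange(s memStore, lastSubmitted, height uint64) ([]string, error) {
	if lastSubmitted == height {
		return nil, nil // nothing new to submit
	}
	if lastSubmitted > height {
		return nil, fmt.Errorf("last submitted height (%d) is greater than store height (%d)", lastSubmitted, height)
	}
	pending := make([]string, 0, height-lastSubmitted)
	for h := lastSubmitted + 1; h <= height; h++ {
		item, ok := s[h]
		if !ok {
			// Return what was collected so far, like the original loop does.
			return pending, fmt.Errorf("no item at height %d", h)
		}
		pending = append(pending, item)
	}
	return pending, nil
}

func main() {
	s := memStore{1: "a", 2: "b", 3: "c", 4: "d"}
	items, err := pendingRange(s, 1, 4)
	fmt.Println(items, err) // [b c d] <nil>
}
```

A single hard-coded height skips this bookkeeping entirely, which is why it only makes sense as a one-off recovery step.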


2. Process Termination with os.Exit(0) (CRITICAL)

Location: block/internal/submitting/submitter.go:182

os.Exit(0)

Issues:

  • Immediately terminates the entire process after one DA submission attempt
  • Bypasses all cleanup, deferred functions, and graceful shutdown logic
  • No proper error handling or goroutine coordination
  • Violates Go best practices for lifecycle management

Impact: HIGH - Causes abrupt process termination, potential data corruption, and prevents proper cleanup

Recommendation: Remove os.Exit(0) and restore proper loop-based submission logic with context cancellation


3. Commented Out Critical Components

Location: block/components.go:216-245

The Executor and Reaper components are completely commented out:

  • executing.NewExecutor() - Handles state transitions
  • reaping.NewReaper() - Transaction reaping functionality
  • Neither component is initialized any more, yet both are still referenced in the Start() and Stop() methods

Issues:

  • Breaks aggregator nodes completely - they cannot produce or execute blocks
  • Components struct at lines 28-29 still declares Executor and Reaper fields, but they're never initialized
  • Start() method (lines 65-74) and Stop() method (lines 101-109) still reference these now-nil components
  • Any call on them that is not guarded by a nil check will panic with a nil pointer dereference at runtime

Impact: CRITICAL - Aggregator nodes will fail to start or panic at runtime


4. Disabled DA Inclusion Processing Loop

Location: block/internal/submitting/submitter.go:116-119

// go func() {
//     defer s.wg.Done()
//     s.processDAInclusionLoop()
// }()

Issues:

  • DA inclusion processing is essential for finality
  • Without this loop, blocks are never marked as DA-included
  • Affects both sync and aggregator nodes
  • The wg.Add(1) at line 115 is still called but the corresponding goroutine is never started, so the WaitGroup counter never returns to zero and any wg.Wait() during shutdown would block indefinitely

Impact: HIGH - Breaks finality mechanism across all node types


5. Disabled Last Submitted Height Tracking

Location: block/internal/submitting/da_submitter.go:252-255

// if l := len(submitted); l > 0 {
//     lastHeight := submitted[l-1].Height()
//     cache.SetLastSubmittedDataHeight(ctx, lastHeight)
// }

Issues:

  • Cache state becomes inconsistent with actual submissions
  • Future calls to getPending() will return incorrect results
  • Could lead to duplicate submissions or missed submissions

Impact: MEDIUM - Cache inconsistency leading to incorrect DA submission behavior


6. Completely Gutted DA Submission Loop

Location: block/internal/submitting/submitter.go:143-182

The entire timer-based submission logic is commented out:

  • No periodic ticking
  • No header submission logic
  • No data submission logic
  • No metrics updates
  • Replaced with a single synchronous call followed by os.Exit(0)

Impact: CRITICAL - Completely breaks the DA submission mechanism


🟡 Code Quality Issues

  1. Dead Code Pollution

    • Massive amounts of commented-out code instead of proper version control
    • Reduces code readability significantly
    • Violates the repository's code style guidelines from CLAUDE.md
  2. Test Coverage

    • No indication that tests were updated to reflect these changes
    • Existing tests will likely fail:
      • block/components_test.go
      • block/internal/cache/pending_base_test.go
      • block/internal/submitting/submitter_test.go
  3. Breaking API Changes

    • Components struct still declares Executor and Reaper fields but never initializes them
    • Any code depending on these components will panic with nil pointer dereference

📋 Recommendations

This PR appears to be debug/development code that should not be merged:

  1. DO NOT MERGE - The PR title clearly states "DO NOT MERGE"
  2. Hard-coded values suggest this is temporary debugging code
  3. os.Exit(0) is typically used for quick debugging/testing, not production
  4. Commented code should be removed, not committed

If this is intended as a fix:

  1. Remove the hard-coded height 2402427 and restore proper iteration logic
  2. Remove os.Exit(0) and restore proper goroutine lifecycle management
  3. Either restore or properly remove the Executor and Reaper components (update struct definition accordingly)
  4. Restore the DA inclusion processing loop
  5. Restore last submitted height tracking
  6. Restore proper DA submission loop with periodic execution
  7. Remove all commented-out code
  8. Update tests to reflect any legitimate changes
  9. Run make lint and make test-all to ensure code quality

Security Note: While this appears to be debugging code rather than malicious, the hard-coded height and abrupt process termination create significant reliability and data integrity risks.


✅ What to Do Next

Based on the PR title "DO NOT MERGE fix: recovery", this appears to be a work-in-progress debugging branch. Consider:

  1. Creating a proper fix with incremental, reviewed changes
  2. Adding context in the PR description about what issue is being investigated
  3. Using feature flags or separate debug tooling instead of breaking production code
  4. Ensuring all tests pass before requesting review

🤖 Generated with Claude Code

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request, titled 'DO NOT MERGE fix: recovery', appears to be a temporary solution for a recovery task. It introduces several changes that are unsuitable for a production environment, such as hardcoding a block height, disabling core processing loops, and including an os.Exit(0) call that would terminate the node. While these may be useful for a one-off debugging or recovery script, they introduce critical issues and must be reverted before this PR can be considered for merging. Additionally, there are large blocks of commented-out code that should be cleaned up to maintain code quality.

Comment on lines +43 to 49
+	pending := make([]T, 0)
+	item, err := pb.fetch(ctx, pb.store, 2402427)
 	if err != nil {
-		return nil, err
-	}
-	if lastSubmitted == height {
-		return nil, nil
-	}
-	if lastSubmitted > height {
-		return nil, fmt.Errorf("height of last submitted item (%d) is greater than height of last item (%d)", lastSubmitted, height)
-	}
-	pending := make([]T, 0, height-lastSubmitted)
-	for i := lastSubmitted + 1; i <= height; i++ {
-		item, err := pb.fetch(ctx, pb.store, i)
-		if err != nil {
-			return pending, err
-		}
-		pending = append(pending, item)
+		return pending, err
 	}
+	pending = append(pending, item)
 	return pending, nil

critical

The getPending function has been modified to fetch a hardcoded height 2402427. This breaks the core functionality of fetching all pending items based on the last submitted height. This change must be reverted to restore the correct behavior. Hardcoding values like this is dangerous and makes the system brittle.

Suggested change
-	pending := make([]T, 0)
-	item, err := pb.fetch(ctx, pb.store, 2402427)
-	if err != nil {
-		return pending, err
-	}
-	pending = append(pending, item)
-	return pending, nil
+	lastSubmitted := pb.lastHeight.Load()
+	height, err := pb.store.Height(ctx)
+	if err != nil {
+		return nil, err
+	}
+	if lastSubmitted == height {
+		return nil, nil
+	}
+	if lastSubmitted > height {
+		return nil, fmt.Errorf("height of last submitted item (%d) is greater than height of last item (%d)", lastSubmitted, height)
+	}
+	pending := make([]T, 0, height-lastSubmitted)
+	for i := lastSubmitted + 1; i <= height; i++ {
+		item, err := pb.fetch(ctx, pb.store, i)
+		if err != nil {
+			return pending, err
+		}
+		pending = append(pending, item)
+	}
+	return pending, nil

Comment on lines +144 to +182

critical

The daSubmissionLoop has been drastically changed. The periodic submission logic is commented out and replaced with a single submission attempt followed by os.Exit(0). Calling os.Exit in a service like this is extremely dangerous and will cause the node to terminate unexpectedly. This entire block should be reverted to its original implementation with the ticker-based loop for continuous operation.

	ticker := time.NewTicker(s.config.DA.BlockTime.Duration)
	defer ticker.Stop()

	metricsTicker := time.NewTicker(30 * time.Second)
	defer metricsTicker.Stop()

	for {
		select {
		case <-s.ctx.Done():
			return
		case <-ticker.C:
			// Submit headers
			if s.cache.NumPendingHeaders() != 0 {
				if s.headerSubmissionMtx.TryLock() {
					go func() {
						defer s.headerSubmissionMtx.Unlock()
						if err := s.daSubmitter.SubmitHeaders(s.ctx, s.cache); err != nil {
							s.logger.Error().Err(err).Msg("failed to submit headers")
						}
					}()
				}
			}

			// Submit data
			if s.cache.NumPendingData() != 0 {
				if s.dataSubmissionMtx.TryLock() {
					go func() {
						defer s.dataSubmissionMtx.Unlock()
						if err := s.daSubmitter.SubmitData(s.ctx, s.cache, s.signer, s.genesis); err != nil {
							s.logger.Error().Err(err).Msg("failed to submit data")
						}
					}()
				}
			}
		case <-metricsTicker.C:
			s.updateMetrics()
		}
	}

Comment on lines +252 to +255
// if l := len(submitted); l > 0 {
// lastHeight := submitted[l-1].Height()
// cache.SetLastSubmittedDataHeight(ctx, lastHeight)
// }

high

The logic to update the last submitted data height has been commented out. This will prevent the system from tracking which data has been submitted to the DA layer, likely causing data to be re-submitted repeatedly. This should be uncommented to ensure correct operation.

Suggested change
-	// if l := len(submitted); l > 0 {
-	// 	lastHeight := submitted[l-1].Height()
-	// 	cache.SetLastSubmittedDataHeight(ctx, lastHeight)
-	// }
+	if l := len(submitted); l > 0 {
+		lastHeight := submitted[l-1].Height()
+		cache.SetLastSubmittedDataHeight(ctx, lastHeight)
+	}

Comment on lines +116 to +119
defer s.logger.Info().Msg("DA submission loop stopped")

ticker := time.NewTicker(s.config.DA.BlockTime.Duration)
defer ticker.Stop()

high

The goroutine that starts processDAInclusionLoop has been commented out. This disables DA inclusion processing, which is a core feature for ensuring data availability. This should be uncommented to restore the node's functionality.

Suggested change
-	// go func() {
-	// 	defer s.wg.Done()
-	// 	s.processDAInclusionLoop()
-	// }()
+	go func() {
+		defer s.wg.Done()
+		s.processDAInclusionLoop()
+	}()

Comment on lines +216 to +245
+	// executor, err := executing.NewExecutor(
+	// 	store,
+	// 	exec,
+	// 	sequencer,
+	// 	signer,
+	// 	cacheManager,
+	// 	metrics,
+	// 	config,
+	// 	genesis,
+	// 	headerBroadcaster,
+	// 	dataBroadcaster,
+	// 	logger,
+	// 	blockOpts,
+	// 	errorCh,
+	// )
+	// if err != nil {
+	// 	return nil, fmt.Errorf("failed to create executor: %w", err)
+	// }
 
-	reaper, err := reaping.NewReaper(
-		exec,
-		sequencer,
-		genesis,
-		logger,
-		executor,
-		reaping.DefaultInterval,
-	)
-	if err != nil {
-		return nil, fmt.Errorf("failed to create reaper: %w", err)
-	}
+	// reaper, err := reaping.NewReaper(
+	// 	exec,
+	// 	sequencer,
+	// 	genesis,
+	// 	logger,
+	// 	executor,
+	// 	reaping.DefaultInterval,
+	// )
+	// if err != nil {
+	// 	return nil, fmt.Errorf("failed to create reaper: %w", err)
+	// }

medium

This large block of commented-out code for creating the executor and reaper should be removed. If this logic is no longer needed, the code should be deleted. If it's temporarily disabled for debugging, it should be restored before this pull request is merged. Leaving commented-out code reduces maintainability.

"context"
"encoding/binary"
"fmt"
"os"

medium

This os import is only used for the os.Exit(0) call in daSubmissionLoop. As that call is highly problematic and should be removed, this import will become unused and should be deleted as well.

@julienrbrt
Member Author

done.

@julienrbrt julienrbrt closed this Oct 3, 2025
@github-project-automation github-project-automation bot moved this to Done in Evolve Oct 3, 2025
@julienrbrt julienrbrt deleted the julien/recover branch October 3, 2025 15:22