
feat: adding ready handler #2676

Merged
chatton merged 8 commits into main from cian/add-readiness-endpoint
Sep 22, 2025

Conversation

@chatton
Contributor

@chatton chatton commented Sep 18, 2025

Overview

Adds the readiness check suggested in the linked issue.

closes #2673

@github-actions
Contributor

github-actions bot commented Sep 18, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Sep 22, 2025, 8:23 AM

@codecov

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 70.58824% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.47%. Comparing base (844053c) to head (75d1ed6).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
pkg/rpc/server/http.go | 73.91% | 9 Missing and 3 partials ⚠️
node/full.go | 16.66% | 4 Missing and 1 partial ⚠️
node/light.go | 50.00% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2676      +/-   ##
==========================================
- Coverage   66.48%   66.47%   -0.02%     
==========================================
  Files          76       76              
  Lines        7657     7714      +57     
==========================================
+ Hits         5091     5128      +37     
- Misses       2097     2112      +15     
- Partials      469      474       +5     
Flag | Coverage Δ
combined | 66.47% <70.58%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@claude
Contributor

claude bot commented Sep 18, 2025

🔍 Code Review for PR #2676: Adding Ready Handler

Summary

This PR successfully implements a readiness endpoint to determine if a node is safe to receive client traffic. The implementation is well-structured with good test coverage and follows project patterns effectively.

✅ Strengths

  • Well-designed abstraction: The BestKnownHeightProvider interface cleanly abstracts height tracking across different node types
  • Comprehensive readiness logic: Properly checks peer connectivity, synchronization status, and node type-specific requirements
  • Excellent test coverage: Tests cover edge cases including being ahead of network, no peers, and various error conditions
  • Follows project patterns: Adheres to ev-node's modular architecture and coding conventions from CLAUDE.md
  • Production-ready: Appropriate HTTP status codes and error handling for load balancer integration

⚠️ Issues to Address

High Priority

  1. Precision issue in full.go:68-74: Using math.Min with float64 conversion can cause precision loss. Use direct uint64 comparison instead:

    hHeight := n.hSyncService.Store().Height()
    dHeight := n.dSyncService.Store().Height()
    if hHeight < dHeight {
        return hHeight
    }
    return dHeight
  2. Hard-coded threshold: The 3-block tolerance at service.go:195 should be configurable for different deployment requirements
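
A minimal sketch of what a configurable tolerance could look like inside the handler, assuming the config field is ReadinessMaxBlocksBehind (the name used later in this PR); the surrounding variable names are illustrative, not this PR's exact code:

    // Hypothetical handler fragment: take the tolerance from config instead of a literal 3.
    allowedBlocksBehind := cfg.Node.ReadinessMaxBlocksBehind
    // Written as an addition to avoid uint64 underflow when the local node is ahead.
    if bestKnownHeight > localHeight+allowedBlocksBehind {
        http.Error(w, "UNREADY: behind best-known head", http.StatusServiceUnavailable)
        return
    }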

Medium Priority

  1. Function complexity: NewServiceHandler now takes 6 parameters. Consider a config struct for better maintainability (see the sketch after this list)
  2. Missing test coverage: Add tests for aggregator nodes and error scenarios (e.g., store.Height() failures)
  3. Error context: When height retrieval fails, include the actual error in the response for better debugging
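
A rough sketch of the config-struct idea from point 1 above; every type and field name here is a placeholder chosen for illustration, not ev-node's actual API:

    package server

    import (
        "context"
        "net/http"
    )

    // Placeholder interfaces standing in for the real store and peer manager types.
    type heightStore interface {
        Height(ctx context.Context) (uint64, error)
    }
    type peerLister interface {
        GetPeers() ([]string, error)
    }

    // ServiceHandlerConfig groups what are currently six positional parameters.
    type ServiceHandlerConfig struct {
        Store                    heightStore
        Peers                    peerLister
        ProposerAddress          []byte
        ReadinessMaxBlocksBehind uint64
        BestKnownHeight          func() uint64
    }

    // NewServiceHandlerFromConfig shows how the constructor could accept the struct.
    func NewServiceHandlerFromConfig(c ServiceHandlerConfig) http.Handler {
        mux := http.NewServeMux()
        // the readiness and other endpoints would be registered on mux using c's fields
        return mux
    }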

🛡️ Security Assessment

No security concerns identified. The endpoint appropriately avoids exposing sensitive information.

⚡ Performance Assessment

Minimal overhead with quick, non-blocking operations suitable for frequent health checks.

📊 Overall Rating: 8/10

✅ Recommendation: APPROVE with minor changes

This is a valuable feature that addresses real operational needs. The core implementation is solid, and the identified issues are minor and can be addressed before merging.

Priority fixes before merge:

  1. Fix the float64 precision issue
  2. Make the block tolerance configurable
  3. Add aggregator node test coverage

Great work on this implementation! The readiness probe will significantly improve production deployments.

@claude
Contributor

claude bot commented Sep 18, 2025

Code Review for PR #2676: Adding Ready Handler

Thank you for implementing the readiness endpoint! This is a valuable addition for production deployments.

Strengths

  • Well-structured implementation following codebase patterns
  • Good test coverage with comprehensive scenarios
  • Configurable ReadinessMaxBlocksBehind parameter
  • Proper HTTP semantics (200/503 status codes)

Code Quality Concerns

1. Potential Race Condition (node/full.go:284-290)
The bestKnownHeightProvider reads from two sync services without synchronization. Heights could change between reads. Consider adding synchronization or documenting that eventual consistency is acceptable.

2. Error Response Structure
Consider returning JSON responses instead of plain text for better observability and monitoring integration.
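
For illustration, one JSON shape the endpoint could return instead of plain text; the field names are hypothetical, not part of this PR (imports: "encoding/json", "net/http"):

    // Hypothetical JSON payload for the readiness endpoint.
    type readinessResponse struct {
        Ready           bool   `json:"ready"`
        LocalHeight     uint64 `json:"local_height"`
        BestKnownHeight uint64 `json:"best_known_height"`
        Reason          string `json:"reason,omitempty"`
    }

    func writeReadiness(w http.ResponseWriter, resp readinessResponse) {
        w.Header().Set("Content-Type", "application/json")
        status := http.StatusOK
        if !resp.Ready {
            status = http.StatusServiceUnavailable
        }
        w.WriteHeader(status)
        _ = json.NewEncoder(w).Encode(resp)
    }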

Potential Issues

1. Nil Provider Documentation
BestKnownHeightProvider can be nil but this isn't documented in the function signature comment.

2. Test Setup Inconsistency
da_visualization_test.go passes nil parameters to RegisterCustomHTTPEndpoints which could cause issues if tests hit the readiness endpoint.

Performance Considerations

  • The endpoint will be called frequently. Consider caching height values with a short TTL (see the sketch after this list).
  • GetPeers() might be expensive with many peers. Consider if only peer count is needed.
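
A small sketch of the TTL-caching idea from the first bullet, wrapping the existing func() uint64 provider; the wrapper name and TTL value are illustrative (imports: "sync", "time"):

    // cachedHeightProvider memoizes the wrapped provider for a short, fixed TTL.
    func cachedHeightProvider(inner func() uint64, ttl time.Duration) func() uint64 {
        var (
            mu      sync.Mutex
            last    uint64
            fetched time.Time
        )
        return func() uint64 {
            mu.Lock()
            defer mu.Unlock()
            if time.Since(fetched) > ttl {
                last = inner()
                fetched = time.Now()
            }
            return last
        }
    }

    // usage (illustrative): provider := cachedHeightProvider(bestKnownHeightProvider, 2*time.Second)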

Test Coverage

Excellent table-driven tests! Consider adding a test case for when Height() returns an error.

Suggestions

  • Add deployment guidance comments for ReadinessMaxBlocksBehind values
  • Define the default value 3 as a named constant
  • Consider logging readiness state transitions

Summary

Solid implementation addressing issue #2673 requirements. Ready to merge after addressing the race condition concern.

Great work!

@claude
Contributor

claude bot commented Sep 18, 2025

Code Review for PR #2676: Adding Ready Handler

Thank you for implementing the readiness endpoint! This is a valuable addition for production deployments. Here's my comprehensive review:

✅ Strengths

  1. Well-structured implementation: The readiness endpoint properly checks multiple conditions (peer connectivity, local height, sync status)
  2. Good test coverage: Comprehensive test cases covering various scenarios
  3. Follows existing patterns: Consistent with the codebase's architecture and style
  4. Configurable tolerance: The ReadinessMaxBlocksBehind parameter allows operators to tune readiness based on their needs

🔍 Areas for Improvement

1. Race Condition in bestKnownHeightProvider

In node/full.go:284-291, the height provider function accesses sync service stores without synchronization:

bestKnownHeightProvider := func() uint64 {
    hHeight := n.hSyncService.Store().Height()  // Potential race
    dHeight := n.dSyncService.Store().Height()  // Potential race
    if hHeight < dHeight {
        return hHeight
    }
    return dHeight
}

Recommendation: Consider adding mutex protection or ensuring the Store().Height() methods are thread-safe.

2. Error Handling Enhancement

The readiness endpoint returns generic "UNREADY" messages. Consider adding more specific error details:

// Current
http.Error(w, "UNREADY: behind best-known head", http.StatusServiceUnavailable)

// Suggested
msg := fmt.Sprintf("UNREADY: local height %d is %d blocks behind best-known %d (max allowed: %d)", 
    localHeight, bestKnownHeight-localHeight, bestKnownHeight, allowedBlocksBehind)
http.Error(w, msg, http.StatusServiceUnavailable)

3. Missing Context Timeout

In pkg/rpc/server/http.go:46, the store height query uses the request context without timeout:

localHeight, err := s.Height(r.Context())

Recommendation: Add a timeout to prevent hanging requests:

ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
defer cancel()
localHeight, err := s.Height(ctx)

4. Documentation Improvements

  • The BestKnownHeightProvider type comment could clarify what "best-known" means in different node types
  • Consider adding a comment explaining why we return min(headerHeight, dataHeight) for full nodes

🐛 Potential Issues

  1. Nil Check Order: In the readiness handler, you check pm != nil but don't handle the case where s (store) might be nil before calling s.Height().

  2. Edge Case: When bestKnownHeight == localHeight == 0, the node is considered unready. However, this might be valid for a fresh network. Consider special-casing genesis.

💡 Suggestions

  1. Add Metrics: Consider exposing the readiness status and block lag as Prometheus metrics for monitoring (see the sketch after this list)
  2. Grace Period: For nodes that just started, consider a grace period before marking as unready
  3. Health Status Endpoint: Consider a /health/status endpoint that returns JSON with detailed health information
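
A minimal sketch of the metrics idea from point 1 above, using the standard Prometheus client; the metric name and where it would be updated are assumptions (import: "github.com/prometheus/client_golang/prometheus"):

    // Hypothetical gauge for the lag between best-known and local height.
    var readinessBlockLag = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "evnode_readiness_block_lag",
        Help: "Blocks the local node is behind the best-known network height.",
    })

    func init() {
        prometheus.MustRegister(readinessBlockLag)
    }

    // inside the readiness handler, after computing the heights:
    // if bestKnownHeight > localHeight {
    //     readinessBlockLag.Set(float64(bestKnownHeight - localHeight))
    // } else {
    //     readinessBlockLag.Set(0)
    // }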

✨ Security & Performance

  • ✅ No security concerns identified
  • ✅ Minimal performance impact (simple height comparisons)
  • ✅ No credentials or sensitive data exposed

📋 Test Coverage

The tests are comprehensive but could benefit from:

  • Testing concurrent access to the height provider
  • Testing behavior during active sync (heights changing during request)
  • Testing with nil store or peer manager

Overall Assessment

This is a solid implementation that addresses issue #2673 effectively. The code is clean, well-tested, and follows the project's patterns. With the suggested improvements around race conditions and error handling, this will be production-ready.

Verdict: Approved with suggestions

The implementation is functional and safe to merge, but please consider addressing the race condition concern before deploying to production environments.

@chatton chatton marked this pull request as ready for review September 18, 2025 09:30

bestKnownHeight := bestKnownHeightProvider()
if bestKnownHeight == 0 {
    http.Error(w, "UNREADY: best-known height unknown", http.StatusServiceUnavailable)

Contributor Author

We can turn this into structured output if that makes it easier to work with. cc @auricom WDYT? A liveness probe could then just check whether the response contains UNREADY / READY, like this one does.

@claude
Contributor

claude bot commented Sep 18, 2025

Comprehensive Pull Request Review

Overview

This PR implements a readiness endpoint (/health/ready) as requested in issue #2673. The implementation adds functionality to distinguish between nodes that are up (liveness) and nodes that are safe to query (readiness), which is crucial for load balancers and Kubernetes deployments.

✅ Positive Aspects

1. Well-Structured Implementation

  • Follows the modular design pattern mentioned in CLAUDE.md
  • Properly extends the existing HTTP endpoint registration system
  • Uses dependency injection pattern with the BestKnownHeightProvider function type
  • Maintains separation of concerns between node types (full vs light)

2. Good Test Coverage

  • Comprehensive table-driven tests covering various scenarios
  • Tests edge cases like no blocks, no peers, and varying block heights
  • Follows the testing patterns described in CLAUDE.md
  • Uses mocks appropriately for external dependencies

3. Configuration Management

  • Adds configurable ReadinessMaxBlocksBehind parameter with sensible default (3 blocks)
  • Properly updates configuration tests and flag management
  • Follows existing configuration patterns

🔧 Areas for Improvement

1. Error Handling and Context Usage

The readiness endpoint doesn't follow the error handling patterns mentioned in CLAUDE.md. Current implementation lacks context wrapping:

localHeight, err := s.Height(r.Context())
if err != nil {
    http.Error(w, "UNREADY: state unavailable", http.StatusServiceUnavailable)
    return
}

Suggestion: Consider providing more specific error context:

localHeight, err := s.Height(r.Context())
if err != nil {
    http.Error(w, fmt.Sprintf("UNREADY: state unavailable: %v", err), http.StatusServiceUnavailable)
    return
}

2. Function Signature Design

The BestKnownHeightProvider function type could be more robust:

type BestKnownHeightProvider func() uint64

Issues:

  • No error handling capability
  • No context support for cancellation
  • No way to distinguish between "unknown" (0) and "genesis" (0)

Suggestion:

type BestKnownHeightProvider func(ctx context.Context) (uint64, error)

3. Logic Edge Cases

The best-known height calculation in full nodes could be more sophisticated:

bestKnownHeightProvider := func() uint64 {
    hHeight := n.hSyncService.Store().Height()
    dHeight := n.dSyncService.Store().Height()
    if hHeight < dHeight {
        return hHeight
    }
    return dHeight
}

Potential Issues:

  • No error handling if either sync service fails
  • Uses minimum height, but this might not always be the best indicator
  • Could lead to false negatives if one sync service is lagging

4. HTTP Response Consistency

The endpoint mixes error messages in HTTP errors and success responses. Consider a more structured response format or at least consistent messaging.

5. Aggregator Node Logic

The peer check is skipped for aggregator nodes, but this might not be sufficient. Should aggregator nodes have different readiness criteria? They might need connectivity to DA layers or other specific requirements.

🔒 Security Considerations

Positive:

  • ✅ Error messages don't expose sensitive internal state
  • ✅ No private keys or sensitive data leaked

Areas to address:

  • ⚠️ The HTTP handler doesn't validate the request method (it should be GET only; see the sketch after this list)
  • ⚠️ No rate limiting considerations for health endpoints
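
A tiny sketch of the GET-only check; the wrapper is hypothetical, not code from this PR:

    // getOnly rejects anything other than GET before running the readiness handler.
    func getOnly(next http.HandlerFunc) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodGet {
                http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
                return
            }
            next(w, r)
        }
    }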

⚡ Performance Considerations

1. Sync Service Access

  • The bestKnownHeightProvider function calls Height() on sync services without error handling
  • These calls happen on every readiness check and could block

2. P2P Peer Query

  • GetPeers() call could be expensive for nodes with many peers
  • No caching or throttling of peer queries

📋 Recommendations

High Priority:

  • Add method validation (GET only)
  • Improve BestKnownHeightProvider signature with context and error handling

Medium Priority:

  • Add timeout/context to sync service height calls
  • Consider caching peer count for performance
  • Add structured logging for readiness check results

Low Priority:

  • Consider more sophisticated "best known height" calculation
  • Add metrics for readiness check success/failure rates

🎯 Overall Assessment

This is a well-implemented feature that addresses the requirements effectively. The code follows most of the project's patterns and includes good test coverage. The main areas for improvement are around error handling robustness and performance considerations for the sync service calls.

Recommendation: ✅ Approve with minor suggested improvements

The implementation successfully addresses the issue requirements and provides a solid foundation for production readiness checks. The suggested improvements would enhance robustness but don't block the core functionality.


@tac0turtle tac0turtle requested a review from auricom September 18, 2025 18:53
@claude
Contributor

claude bot commented Sep 18, 2025

Pull Request Review: Readiness Handler Implementation

Thank you for implementing the readiness check handler! This is a valuable addition for deployment health monitoring. Here's my comprehensive review:

Strengths

  • Clean implementation following the existing patterns in the codebase
  • Good separation of concerns with the BestKnownHeightProvider abstraction
  • Comprehensive test coverage with multiple edge cases
  • Follows the ev-node architecture principles from CLAUDE.md
  • Proper configuration with sensible defaults (3 blocks behind)

🎯 Code Quality & Best Practices

  1. Function Signature Consistency: The addition of bestKnownHeightProvider parameter to NewServiceHandler is well-placed at the end, maintaining backward compatibility where possible.

  2. Error Handling: Good error handling with descriptive messages that will help with debugging:

    • "UNREADY: failed to query peers"
    • "UNREADY: no peers connected"
    • "UNREADY: state unavailable"
  3. Configuration: The default value of 3 blocks for ReadinessMaxBlocksBehind seems reasonable for most deployments.

🐛 Potential Issues

  1. Race Condition in Height Calculation: In full.go, the bestKnownHeightProvider function reads from two different sync services without synchronization:

    hHeight := n.hSyncService.Store().Height()
    dHeight := n.dSyncService.Store().Height()

    Consider if these calls need atomic access or if the slight inconsistency is acceptable.

  2. Nil Check Order: The readiness handler checks pm != nil but doesn't guard against s (store) being nil. While unlikely in practice, consider adding a nil check for consistency.

Performance Considerations

  1. Function Call Overhead: The bestKnownHeightProvider is called on every readiness check. Since these checks can be frequent (every few seconds from k8s), consider if caching with a short TTL would be beneficial.

  2. GetPeers() Call: The pm.GetPeers() call might be expensive depending on the implementation. Consider if peer count could be cached or if there's a lighter-weight way to check connectivity.

🔒 Security Considerations

  1. Information Disclosure: The readiness endpoint reveals some internal state (behind by how many blocks). This is generally acceptable but ensure this aligns with your security posture.

  2. No Authentication: The health endpoints are unauthenticated, which is standard practice but worth noting.

🧪 Test Coverage

Excellent test coverage in server_test.go! The table-driven tests cover:

  • Node at head
  • Node within acceptable lag
  • Node too far behind
  • Local node ahead of network
  • No blocks yet
  • Unknown best-known height
  • No peers connected

Suggestion: Consider adding a test case for aggregator nodes (where peer checking is skipped).

💡 Minor Suggestions

  1. Logging: Consider adding debug logging when readiness status changes, which could help with troubleshooting:

    if bestKnownHeight-localHeight > allowedBlocksBehind {
        // Consider logging: logger.Debug().Uint64("local", localHeight).Uint64("best", bestKnownHeight).Msg("Node unready: too far behind")
        http.Error(w, "UNREADY: behind best-known head", http.StatusServiceUnavailable)
        return
    }
  2. Metrics: Consider exposing the block lag as a metric for monitoring systems that prefer numeric values over binary ready/not-ready.

  3. Documentation: The comment for BestKnownHeightProvider is excellent. Consider adding a similar comment in CLAUDE.md about the readiness endpoint for future developers.

Overall Assessment

This is a solid implementation that addresses issue #2673 effectively. The code is clean, well-tested, and follows the project's patterns. The minor suggestions above are optional improvements rather than blocking issues.

Recommendation: APPROVE

The implementation is production-ready and will provide valuable health monitoring capabilities for deployments.

@julienrbrt julienrbrt changed the title from "chore: adding ready handler" to "feat: adding ready handler" on Sep 19, 2025
node/full.go Outdated
bestKnownHeightProvider := func() uint64 {
    hHeight := n.hSyncService.Store().Height()
    dHeight := n.dSyncService.Store().Height()
    if hHeight < dHeight {
Member

@julienrbrt julienrbrt Sep 19, 2025


nit, return min(hHeight, dHeight)
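
Applied to the snippet above, the nit would look like this (assuming Go 1.21+ for the built-in min):

    bestKnownHeightProvider := func() uint64 {
        return min(n.hSyncService.Store().Height(), n.dSyncService.Store().Height())
    }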

@claude
Contributor

claude bot commented Sep 19, 2025

Code Review for PR #2676: Adding Ready Handler

Thank you for implementing the readiness endpoint! This is a valuable addition for production deployments. Here's my review:

Strengths

  • Clean implementation following existing patterns in the codebase
  • Good test coverage with comprehensive edge cases
  • Proper configuration with sensible defaults (3 blocks tolerance)
  • Clear separation between full and light node behavior
  • Follows Go conventions and error handling patterns

🔍 Code Quality Observations

  1. Architecture & Design: The implementation is well-structured with the BestKnownHeightProvider abstraction allowing different node types to provide their own height calculation logic.

  2. Error Messages: The UNREADY messages are descriptive and helpful for debugging (e.g., "UNREADY: no peers connected", "UNREADY: behind best-known head").

  3. Configuration: Good addition to the config system with proper flag registration and test updates.

⚠️ Potential Issues & Suggestions

  1. Race Condition Risk: In node/full.go:284-288, the bestKnownHeightProvider function accesses sync services without synchronization. Consider if these Store() calls are thread-safe or if synchronization is needed.

  2. Missing Metrics: Consider adding metrics/logging when readiness state changes. This would be valuable for monitoring and alerting.

  3. Edge Case: When bestKnownHeight == 0 and localHeight == 0, the endpoint returns UNREADY with "no blocks yet". This is correct, but during initial chain startup, both aggregator and non-aggregator nodes might remain unready for extended periods. Consider if this is the desired behavior.

  4. Documentation: The BestKnownHeightProvider type comment is clear, but consider adding a comment explaining why we use min(headerHeight, dataHeight) for full nodes - this ensures we only consider ourselves ready when both header and data are sufficiently synced.

🚀 Performance Considerations

  • The readiness check is lightweight and suitable for frequent health checks
  • No blocking operations or expensive computations in the critical path

🔒 Security Considerations

  • No security concerns - the endpoint exposes minimal information appropriate for health monitoring
  • Properly returns 503 status codes for unhealthy states

Test Coverage

  • Excellent test coverage with table-driven tests covering all edge cases
  • Tests properly mock dependencies and verify both success and failure scenarios
  • Config test count properly updated

💡 Minor Suggestions

  1. Consider making the readiness check more granular by returning JSON with details for debugging (could be a future enhancement if needed).

  2. The ReadinessMaxBlocksBehind default of 3 seems reasonable, but you might want to document the rationale in CLAUDE.md or a comment.

Overall Assessment

This is a well-implemented feature that addresses the requirements from issue #2673. The code is production-ready with only minor suggestions for improvement. The implementation correctly handles the different node types (full vs light) and provides appropriate readiness criteria.

Great work! 🎉

julienrbrt
julienrbrt previously approved these changes Sep 19, 2025
@claude
Contributor

claude bot commented Sep 19, 2025

Code Review: Readiness Handler Implementation

Thank you for this contribution! The readiness endpoint is a valuable addition for production deployments.

Strengths

  • Clean Architecture: Well-integrated with existing codebase patterns
  • Excellent Test Coverage: Comprehensive test cases covering edge scenarios
  • Good Separation of Concerns: Proper use of BestKnownHeightProvider function type
  • Follows Go Conventions: Clean error handling and HTTP patterns

Critical Issues (Must Fix)

1. Nil Pointer Safety

The code does not validate parameters before use, which could cause panics in pkg/rpc/server/http.go:319

  • localHeight, err := s.Height(r.Context()) could panic if s is nil
  • cfg.Node.ReadinessMaxBlocksBehind could panic if cfg is nil
    Fix: Add nil checks at the beginning of the handler.

2. Context Timeout

The s.Height() call could hang indefinitely. Consider adding a timeout with context.WithTimeout.
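
A sketch of both fixes at the top of the readiness handler, reusing the variable names quoted in this review; the 5-second timeout is just an example value (imports: "context", "net/http", "time"):

    // Guard against missing dependencies before touching them.
    if s == nil || bestKnownHeightProvider == nil {
        http.Error(w, "UNREADY: readiness dependencies not configured", http.StatusServiceUnavailable)
        return
    }
    // Bound the store query so a stuck store cannot hang the health check.
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()
    localHeight, err := s.Height(ctx)
    if err != nil {
        http.Error(w, "UNREADY: state unavailable", http.StatusServiceUnavailable)
        return
    }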

Suggested Improvements

  1. More Descriptive Error Messages - provide actual block heights and differences
  2. Additional Test Coverage - error scenarios, context cancellation, nil provider testing
  3. Documentation Enhancement - more comprehensive docs for BestKnownHeightProvider

Security & Performance

  • Security: Low risk, minimal information disclosure
  • Performance: Minimal overhead, efficient early returns
  • Consider rate limiting at infrastructure level

Overall Assessment

Recommendation: APPROVE after addressing critical issues

This is a well-implemented feature with high code quality. Once nil pointer safety issues are addressed, this will be production-ready.

Great work on this implementation!

// Start RPC server
handler, err := rpcserver.NewServiceHandler(n.Store, n.p2pClient, n.genesis.ProposerAddress, n.Logger, n.nodeConfig)
bestKnownHeightProvider := func() uint64 {
    hHeight := n.hSyncService.Store().Height()
Contributor


The idea here is to push the check of whether the node is at the highest height onto the operator, correct?

Contributor Author


yeah exactly

tac0turtle
tac0turtle previously approved these changes Sep 22, 2025
@chatton chatton dismissed stale reviews from tac0turtle and julienrbrt via 893c571 September 22, 2025 08:13
@claude
Contributor

claude bot commented Sep 22, 2025

Code Review for PR #2676: Adding Ready Handler

Overall, this is a well-implemented feature that adds a valuable readiness check endpoint. The implementation is clean and follows good practices. Here's my detailed feedback:

✅ Strengths

  1. Clean Architecture: The BestKnownHeightProvider function type is a good abstraction that allows different node types to provide their best-known height appropriately.

  2. Comprehensive Test Coverage: Excellent test coverage with various edge cases tested, including scenarios where the node is ahead, behind, has no peers, or has no blocks.

  3. Configuration Flexibility: The ReadinessMaxBlocksBehind configuration parameter allows operators to tune readiness checks based on their requirements.

  4. Proper Error Messages: Clear and descriptive error messages that explain why the node is not ready.

🔧 Suggestions for Improvement

  1. Default Value for ReadinessMaxBlocksBehind: Consider setting a sensible default value (e.g., 3-5 blocks) in config.DefaultConfig(). Currently it appears to default to 0, which might be too strict for production environments where brief lags are acceptable.

  2. Race Condition Mitigation: The current implementation uses min(hHeight, dHeight) for full nodes. Consider adding a comment explaining why the minimum is used, as this might not be immediately obvious to future maintainers.

  3. Configuration Comment Enhancement: The comment for ReadinessMaxBlocksBehind could be more descriptive. Consider:

    comment: "Maximum number of blocks the node can lag behind the best-known network height while still being considered ready for serving traffic. Set to 0 for strict synchronization requirements."

🐛 Minor Issues

  1. Missing Default Config Test: While the flag is tested, consider adding an explicit test for the default configuration value to ensure it's properly initialized.

  2. Aggregator Mode Edge Case: For aggregator nodes, the peer check is skipped, but there's no readiness check specific to aggregators (e.g., checking if they can produce blocks). Consider if additional checks are needed.

💡 Performance Considerations

The readiness check is lightweight and should have minimal performance impact. The use of function providers for height calculations is efficient.

🔒 Security

No security concerns identified. The endpoint properly exposes only necessary information without leaking sensitive details.

📊 Test Coverage

Test coverage is comprehensive with good edge case handling. The test cases cover all critical paths and failure scenarios.

✨ Additional Recommendations

Consider adding metrics/observability around readiness state changes to help operators monitor when nodes transition between ready/unready states.

Overall, this is a solid implementation that addresses the requirements effectively. Great work! 👍

@claude
Contributor

claude bot commented Sep 22, 2025

Code Review for PR #2676: feat: adding ready handler

Thank you for implementing the readiness probe! This is a valuable addition for production deployments. I've reviewed the implementation and have the following feedback:

Strengths

  1. Clean implementation - The readiness endpoint is well-structured and follows the established patterns in the codebase
  2. Good test coverage - Comprehensive test cases covering various scenarios (at head, blocks behind, no peers, etc.)
  3. Flexible configuration - The ReadinessMaxBlocksBehind config option provides good operational flexibility
  4. Proper separation of concerns - Using BestKnownHeightProvider function type allows full and light nodes to provide their own logic

Suggestions for Improvement

1. Missing Default Value for ReadinessMaxBlocksBehind

The new config field doesn't have a default value set in DefaultConfig() (pkg/config/defaults.go:40-79). Consider adding a sensible default (e.g., 3 blocks) to match the test expectations.

2. Edge Case: bestKnownHeight == 0

When bestKnownHeight == 0, the endpoint returns UNREADY: best-known height unknown. However, during initial startup or network bootstrap, this might be a valid state. Consider whether the node should be ready if both local and best-known heights are 0 (initial network state).

3. Improve Error Messages

The error messages could be more descriptive to help operators diagnose issues. For example, include the actual heights and allowed lag in the behind best-known head message to aid debugging.

4. Performance Consideration

The readiness check queries peers on every request for non-aggregator nodes. In high-traffic scenarios, consider caching the peer count for a short duration (e.g., 1-5 seconds) to reduce overhead.

5. Documentation Enhancement

The comment for ReadinessMaxBlocksBehind could be clearer. Consider: Maximum number of blocks the node can lag behind the best-known network height while still being considered ready for serving queries. Set to 0 to require exact synchronization with the network head.

Potential Issues

1. Race Condition Risk

The bestKnownHeightProvider in full.go uses min(hHeight, dHeight) without synchronization. While likely safe due to atomic operations on the underlying stores, consider documenting this assumption or adding explicit synchronization if needed.

2. Aggregator Node Behavior

Aggregator nodes skip the peer check entirely. This might be intentional, but consider whether aggregators should have different readiness criteria (e.g., checking DA layer connectivity).

Security Considerations

  • No security issues identified. The endpoint doesn't expose sensitive information and properly validates all inputs.

Test Coverage

  • Excellent test coverage with multiple scenarios
  • Consider adding a test for the aggregator node case (where peer check is skipped)
  • Consider adding integration tests to verify the endpoint behavior during actual sync operations

Overall Assessment

This is a solid implementation that addresses the requirements from issue #2673. With the suggested improvements (especially adding the default config value), this PR is ready for merge. The readiness probe will significantly improve operational reliability by preventing traffic routing to nodes that aren't fully synced.

Great work on this feature!

@claude
Contributor

claude bot commented Sep 22, 2025

Code Review

This PR adds a readiness handler that monitors node synchronization state - a valuable addition for health monitoring and orchestration. The implementation is clean and follows the codebase patterns well.

✅ Strengths

  1. Clean Architecture: The BestKnownHeightProvider abstraction nicely separates concerns between full and light nodes
  2. Good Test Coverage: Comprehensive test cases covering edge conditions and different node states
  3. Proper Configuration: Well-integrated config parameter with sensible default (3 blocks)
  4. Follows Patterns: Consistent with the existing codebase structure and conventions

🔍 Minor Suggestions

  1. Error Message Consistency: Consider standardizing the error format. Some return "UNREADY: " while the success just returns "READY". Consider "READY: synced" for consistency.

  2. Documentation Enhancement: The comment for ReadinessMaxBlocksBehind in config.go could clarify the behavior when set to 0 (e.g., "0 means must be exactly at head - no tolerance")

  3. Height Provider Naming: Consider renaming bestKnown variable to bestKnownProvider in light.go:81 for clarity

✅ Security & Performance

  • No security concerns identified
  • Minimal performance impact - simple height comparisons
  • No resource leaks or concurrency issues

✅ Test Coverage

Excellent coverage including:

  • Various sync states (at head, behind, ahead)
  • Peer connectivity checks
  • Error conditions
  • Both aggregator and non-aggregator nodes

Overall, this is a solid implementation that addresses #2673 effectively. The code is production-ready with only minor cosmetic suggestions. Great work!

@claude
Contributor

claude bot commented Sep 22, 2025

Code Review for PR #2676: Adding Ready Handler

Summary

This PR adds a readiness health check endpoint (/health/ready) to the ev-node system, which is essential for Kubernetes deployments and general operational monitoring. The implementation looks solid overall with good test coverage and sensible defaults.

Strengths

  • Clean implementation: The readiness check logic is well-structured and follows existing patterns
  • Good test coverage: Comprehensive unit tests covering various scenarios
  • Proper configuration: Added as configurable parameter with sensible default (3 blocks behind)
  • Appropriate checks: Verifies peer connectivity, local height, and compares with best-known height

Code Quality

Minor improvements suggested:

  1. In pkg/rpc/server/http.go:47, error message could be more informative
  2. The bestKnownHeightProvider nil check could be moved earlier for fail-fast behavior

Potential Issues

  1. The bestKnownHeightProvider in node/full.go reads from two sync services without explicit synchronization. While likely safe with atomic uint64 operations, consider documenting this assumption.
  2. Edge case handling when both heights are 0 could distinguish between "not started" vs "genesis state"

Performance

  • Lightweight and efficient implementation
  • Consider caching peer count check if endpoint gets hit frequently

Security

  • No security concerns identified
  • Appropriately exposes only operational status

Test Coverage

  • Excellent coverage with multiple scenarios
  • Consider adding integration tests for sync operation behavior

Suggestions

  1. Add Prometheus metrics for height delta monitoring
  2. Document typical ReadinessMaxBlocksBehind values
  3. Consider a degraded state for slightly behind but functional nodes

Overall Assessment

Well-implemented feature following project patterns. Clean code with good tests. Ready for merge with minor improvements.

Approved with suggestions

Member

@julienrbrt julienrbrt left a comment


utACK

@chatton chatton added this pull request to the merge queue Sep 22, 2025
Merged via the queue into main with commit 727e591 Sep 22, 2025
29 of 31 checks passed
@chatton chatton deleted the cian/add-readiness-endpoint branch September 22, 2025 09:31
@github-project-automation github-project-automation bot moved this to Done in Evolve Sep 22, 2025


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add readiness probe to exclude syncing nodes from client traffic
