Collaborator

@rodrigo-o rodrigo-o commented Dec 19, 2025

Motivation

Have a single server continuously syncing hoodi, sepolia and mainnet.

Description

This PR takes inspiration from the existing server_runner.py and adds a new docker_monitor.py, along with make targets and a new Docker Compose file that by default spawns 3 nodes in parallel (hoodi, sepolia and mainnet) and monitors them in the following way:

  • Monitor snapsync time (with a timeout of 4 hours globally)
  • Once a network is synced, it makes sure that it processes blocks for at least 20 minutes
  • Once all of the above pass, the network is marked as successful
  • Once all networks succeed, we send a notification, update a history log and store consensus/ethrex logs for all networks. Then it restarts the containers and starts again
  • On failure, the containers aren't stopped, so DB issues can be debugged if needed. Failures are also notified and stored in the history of runs
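
The flow above can be read as a small per-network state machine. The following is only a minimal sketch of that reading, not the code in docker_monitor.py: the `Instance` fields, the status names other than "success", and the rule that the block number must advance during the 20-minute window are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SYNC_TIMEOUT_SECS = 4 * 60 * 60    # global snapsync timeout (4 hours, from the description)
BLOCK_PROCESSING_SECS = 20 * 60    # block-processing window after sync (20 minutes)


@dataclass
class Instance:
    """Hypothetical per-network record; field names are illustrative only."""
    name: str
    status: str = "syncing"        # syncing -> block_processing -> success | failed
    synced_at: Optional[float] = None
    synced_block: int = 0
    last_block: int = 0


def step(inst: Instance, now: float, run_start: float, is_syncing: bool, block: int) -> str:
    """Advance one instance using a fresh observation and return its new status."""
    if inst.status == "syncing":
        if not is_syncing:
            # Sync finished: start the block-processing window.
            inst.status = "block_processing"
            inst.synced_at = now
            inst.synced_block = inst.last_block = block
        elif now - run_start > SYNC_TIMEOUT_SECS:
            inst.status = "failed"
    elif inst.status == "block_processing":
        inst.last_block = max(inst.last_block, block)
        if now - inst.synced_at >= BLOCK_PROCESSING_SECS:
            # Assumption: "processes blocks" means the head moved during the window.
            advanced = inst.last_block > inst.synced_block
            inst.status = "success" if advanced else "failed"
    return inst.status


def run_succeeded(instances: list) -> bool:
    """A run succeeds only when every network reached 'success'."""
    return all(i.status == "success" for i in instances)
```

A monitor loop would call `step` for each network on every poll and send the notification once all instances are in a terminal state. Failed instances are left running, matching the last bullet.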

Status while running: [image]

Notification: [image]

History Log: [image]

Next Steps:

This is far from perfect, but it's working and adds a lot of value in its current form. Next we may want to:

  • Be able to validate the state after the syncs
  • Tweak whether and how we update the branch on each run
  • Refactor the script
  • Unify the way to run nodes in loop (server_runner.py)
  • Add additional information to the history log and notifications (some ideas were discussed, like peer_count, etc)
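
As an illustration of the last item, a peer count could be read over JSON-RPC and added to each history entry. This is a hedged sketch, not part of the PR: it assumes the node exposes the standard `net_peerCount` method on its RPC port, and the function names and the `rpc_url` parameter are hypothetical.

```python
import json
import urllib.request


def peer_count_request(request_id: int = 1) -> bytes:
    """Build the JSON-RPC request body for net_peerCount."""
    payload = {"jsonrpc": "2.0", "method": "net_peerCount", "params": [], "id": request_id}
    return json.dumps(payload).encode()


def parse_peer_count(response_body: bytes) -> int:
    """net_peerCount returns a hex-encoded quantity such as "0x19"."""
    result = json.loads(response_body)["result"]
    return int(result, 16)


def fetch_peer_count(rpc_url: str, timeout: float = 5.0) -> int:
    """POST the request to rpc_url (hypothetical endpoint) and return the peer count."""
    req = urllib.request.Request(
        rpc_url,
        data=peer_count_request(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_peer_count(resp.read())
```

Splitting request building and response parsing from the network call keeps the parsing testable without a running node.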

Closes #5718

@github-actions github-actions bot added the L1 Ethereum client label Dec 19, 2025
@rodrigo-o rodrigo-o force-pushed the parallel-snapsync-test branch from 95ea395 to 45fbd26 Compare December 22, 2025 22:13
@rodrigo-o rodrigo-o marked this pull request as ready for review December 23, 2025 17:56
@rodrigo-o rodrigo-o requested a review from a team as a code owner December 23, 2025 17:56
@ethrex-project-sync ethrex-project-sync bot moved this to In Review in ethrex_l1 Dec 23, 2025
Copilot AI review requested due to automatic review settings January 7, 2026 14:16
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive monitoring and orchestration system for running parallel Ethereum snapsync operations across multiple networks (Hoodi, Sepolia, and Mainnet). The system continuously monitors sync progress, tracks block processing, logs results, sends Slack notifications, and automatically restarts successful runs in an infinite loop.

Key Changes:

  • New Python monitoring script (docker_monitor.py) that tracks sync status with configurable timeouts and automatic container restart on success
  • Docker Compose configuration supporting parallel multi-network deployments with isolated volumes
  • Makefile targets for simplified operation (multisync-up, multisync-loop, multisync-monitor, etc.)

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 20 comments.

File | Description
tooling/sync/docker_monitor.py | Core monitoring script implementing status tracking, RPC polling, Slack notifications, log persistence, and automatic restart orchestration
tooling/sync/docker-compose.multisync.yaml | Multi-network Docker Compose configuration with 4 network setups (hoodi, sepolia, mainnet, hoodi-2) each with isolated volumes and consensus clients
tooling/sync/Makefile | New Make targets for starting, stopping, monitoring, and managing multi-network sync operations
.gitignore | Exclusion of multisync_logs directory from version control

Comment on lines +107 to +110
def slack_notify(run_id: str, run_count: int, instances: list, hostname: str, branch: str, commit: str):
    """Send a single summary Slack message for the run."""
    all_success = all(i.status == "success" for i in instances)
    url = os.environ.get("SLACK_WEBHOOK_URL_SUCCESS" if all_success else "SLACK_WEBHOOK_URL_FAILED")

Copilot AI Jan 7, 2026

The Slack webhook URLs are retrieved from environment variables without validation. If these URLs are compromised or point to an attacker-controlled endpoint, sensitive information about the sync process (hostname, branch, commit, network status) could be leaked. Consider validating that the webhook URLs match expected Slack webhook URL patterns or documenting this security consideration.

Suggested change
def slack_notify(run_id: str, run_count: int, instances: list, hostname: str, branch: str, commit: str):
    """Send a single summary Slack message for the run."""
    all_success = all(i.status == "success" for i in instances)
    url = os.environ.get("SLACK_WEBHOOK_URL_SUCCESS" if all_success else "SLACK_WEBHOOK_URL_FAILED")

def _get_slack_webhook_url(all_success: bool) -> Optional[str]:
    """
    Retrieve and validate the Slack webhook URL from the environment.
    This ensures we only send run metadata to real Slack webhook endpoints.
    """
    env_var = "SLACK_WEBHOOK_URL_SUCCESS" if all_success else "SLACK_WEBHOOK_URL_FAILED"
    url = os.environ.get(env_var)
    if not url:
        return None
    # Basic validation: only allow standard Slack incoming webhook URLs.
    if not url.startswith("https://hooks.slack.com/services/"):
        print(f"⚠️ Ignoring invalid Slack webhook URL from {env_var}")
        return None
    return url

def slack_notify(run_id: str, run_count: int, instances: list, hostname: str, branch: str, commit: str):
    """Send a single summary Slack message for the run."""
    all_success = all(i.status == "success" for i in instances)
    url = _get_slack_webhook_url(all_success)

Collaborator Author

This is an internal tool; we could enhance this, but it isn't an issue for now.

@jrchatruc jrchatruc added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit 70c4c53 Jan 9, 2026
57 checks passed
@jrchatruc jrchatruc deleted the parallel-snapsync-test branch January 9, 2026 15:39
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l1 Jan 9, 2026
Labels

L1 Ethereum client

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Run hoodi, sepolia and mainnet in parallel to monitor snapsync

4 participants