fix: subscribe non-aggregator validators to their attestation subnets #293
Non-aggregator validator nodes were not joining the attestation mesh, so their attestations could only propagate via gossipsub fanout. In small devnets this collapses the mesh whenever peer churn thins out the single subscriber — observed as a permanent finalization stall with three validators voting three different targets. Align with leanSpec (src/lean_spec/__main__.py): every validator subscribes to its own subnet; aggregators additionally subscribe to explicit aggregate-subnet-ids with a fallback to subnet 0 when they have no validators of their own. Also restores the pre-regression semantics of the lean_attestation_committee_subnet metric, which should reflect this node's committee membership and therefore derive from validator subnets only, not from aggregator-only subscriptions. Regression introduced by #265; prior correct behavior was in #249.
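A minimal sketch of the validator-subnet derivation and the metric value described above. The function names are hypothetical, not the ones in `crates/net/p2p/src/lib.rs`. The `min ... unwrap_or 0` rule is taken from the review flowchart in this thread, and the guard against a zero committee count is an addition made for this sketch only.

```rust
use std::collections::BTreeSet;

/// Subnets this node's validators belong to: `validator_id % committee_count`.
/// Hypothetical helper; a zero `committee_count` yields an empty set instead of panicking.
fn validator_subnets(validator_ids: &[u64], committee_count: u64) -> BTreeSet<u64> {
    validator_ids
        .iter()
        // `checked_rem` returns None when the divisor is 0, so those entries are skipped.
        .filter_map(|vid| vid.checked_rem(committee_count))
        .collect()
}

/// Value for the committee-subnet metric: the smallest validator subnet, or 0 if
/// the node has no validators. Aggregator-only subscriptions never feed into it.
fn metric_subnet(validator_subnets: &BTreeSet<u64>) -> u64 {
    validator_subnets.iter().min().copied().unwrap_or(0)
}
```

With validator IDs 1, 5 and 6 and a committee count of 4, the node would subscribe to subnets 1 and 2 and report subnet 1 in the metric.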
🤖 Kimi Code Review
Overall this is a clean refactor that aligns the implementation with the lean specification, but there are two issues to address:
1. Logic order bug in aggregator fallback (Line ~218-226)
The fallback to subnet 0 now occurs before adding explicit aggregate subnet IDs, which changes behavior from the original code:
// Current (new) logic:
if config.is_aggregator {
    if subscription_subnets.is_empty() { // Check happens first
        subscription_subnets.insert(0);
    }
    if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
        subscription_subnets.extend(explicit_ids); // Added after
    }
}
Previous behavior: An aggregator with no validators but with explicit
New behavior: The same aggregator now subscribes to subnet 0 plus the explicit subnets.
If the intent is to only fallback when there are neither validators nor explicit subnets, move the emptiness check after the
if config.is_aggregator {
    if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
        subscription_subnets.extend(explicit_ids);
    }
    if subscription_subnets.is_empty() {
        subscription_subnets.insert(0);
    }
}
2. Division by zero risk (Line ~210)
Minor notes:
The refactoring improves maintainability by consolidating the subscription logic into a single iteration loop.
Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
🤖 Claude Code Review
Now I have enough context for a thorough review. PR #293 Review —
Greptile Summary
This PR restores spec-compliant attestation-subnet subscription: every node with registered validators subscribes to its derived subnets unconditionally, enabling gossipsub mesh participation for non-aggregator peers. It also fixes the
Confidence Score: 4/5
Safe to merge; the core fix is correct and the one remaining finding is a P2 edge-case in the fallback ordering.
The primary change (subscribing non-aggregators to their validator subnets) is correct and well-motivated. The only finding is a minor ordering inconsistency in the aggregator fallback: the subnet-0 safety net fires before explicit aggregate_subnet_ids are merged, so an aggregator with no validators but with explicit subnet IDs picks up an extra spurious subscription to subnet 0. Harmless in practice but a minor semantic regression.
crates/net/p2p/src/lib.rs lines 225–231 (fallback ordering)

| Filename | Overview |
|---|---|
| crates/net/p2p/src/lib.rs | Unconditionally subscribes validators to their derived attestation subnets; aggregators additionally extend with explicit IDs. Minor ordering issue: subnet-0 fallback fires before explicit aggregate_subnet_ids are merged, causing an extra subscription in the edge case of an aggregator with no validators but explicit subnet IDs configured. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[build_swarm called] --> B[Compute validator_subnets\nfrom validator_ids % committee_count]
B --> C[Set metric_subnet = min validator_subnets\nunwrap_or 0]
C --> D{is_aggregator?}
D -- No --> G[subscription_subnets = validator_subnets]
D -- Yes --> E{subscription_subnets empty?}
E -- Yes --> F[Insert subnet 0]
E -- No --> H[Extend with aggregate_subnet_ids]
F --> H
G --> I[For each subnet_id in subscription_subnets]
H --> I
I --> J[subscribe to gossipsub topic\nbuild attestation_topics cache\nlog Subscribed to attestation subnet]
J --> K[Return BuiltSwarm]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/net/p2p/src/lib.rs
Line: 225-231
Comment:
**Fallback to subnet 0 fires before explicit IDs are merged**
The empty-check on `subscription_subnets` runs before `aggregate_subnet_ids` are folded in, so an aggregator configured with no `validator_ids` but with explicit `aggregate_subnet_ids` (e.g. `[5]`) will subscribe to **both** subnet 0 and subnet 5 instead of just subnet 5. The old code deferred the fallback until after extending with explicit IDs, so it only triggered when the set was still empty after both sources were considered. Moving the explicit-ID extension before the fallback check restores that behaviour:
```suggestion
if config.is_aggregator {
if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
subscription_subnets.extend(explicit_ids);
}
// Aggregator with no validators and no explicit subnets: fallback to subnet 0
if subscription_subnets.is_empty() {
subscription_subnets.insert(0);
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "fix: subscribe non-aggregator validators..."
 if config.is_aggregator {
-    let mut aggregate_subnets: HashSet<u64> = config
-        .validator_ids
-        .iter()
-        .map(|vid| vid % config.attestation_committee_count)
-        .collect();
-    if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
-        aggregate_subnets.extend(explicit_ids);
-    }
-    // Aggregator with no validators and no explicit subnets: fallback to subnet 0
-    if aggregate_subnets.is_empty() {
-        aggregate_subnets.insert(0);
+    if subscription_subnets.is_empty() {
+        subscription_subnets.insert(0);
     }
-    for &subnet_id in &aggregate_subnets {
-        let topic = attestation_subnet_topic(subnet_id);
-        swarm.behaviour_mut().gossipsub.subscribe(&topic)?;
-        info!(subnet_id, "Subscribed to attestation subnet");
+    if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
+        subscription_subnets.extend(explicit_ids);
     }
Fallback to subnet 0 fires before explicit IDs are merged
The empty-check on subscription_subnets runs before aggregate_subnet_ids are folded in, so an aggregator configured with no validator_ids but with explicit aggregate_subnet_ids (e.g. [5]) will subscribe to both subnet 0 and subnet 5 instead of just subnet 5. The old code deferred the fallback until after extending with explicit IDs, so it only triggered when the set was still empty after both sources were considered. Moving the explicit-ID extension before the fallback check restores that behaviour:
if config.is_aggregator {
    if let Some(ref explicit_ids) = config.aggregate_subnet_ids {
        subscription_subnets.extend(explicit_ids);
    }
    // Aggregator with no validators and no explicit subnets: fallback to subnet 0
    if subscription_subnets.is_empty() {
        subscription_subnets.insert(0);
    }
}
🤖 Codex Code Review
No consensus-core logic changed here, so I did not find fork-choice / finalization / state-transition issues in this PR. Main residual risk is missing coverage for the new non-aggregator subscription mode.
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
…gregators Non-aggregator nodes subscribe to their attestation subnet so they participate in the gossipsub mesh. They will therefore receive unaggregated attestations from peers, which is expected — only aggregators need to store and aggregate them. The warning was high-volume noise; the early return stays, but the log is removed.
leanSpec's on_gossip_attestation validates the attestation data and verifies the XMSS signature on every node that receives the message. Only the signature storage step (used later for aggregation at interval 2) is gated by is_aggregator. Non-aggregators validate, verify, and drop. Previous commit short-circuited before validation to save XMSS cost; aligning with the spec so bad signatures are still caught at the edge. The is_aggregator flag now flows into store::on_gossip_attestation.
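The validate, verify, then conditionally store flow described in this commit can be sketched as follows. Every type and field here is a hypothetical stand-in: the two boolean fields replace real attestation-data validation and XMSS signature verification, and this is not the actual signature of `store::on_gossip_attestation`.

```rust
/// Hypothetical stand-in for a gossiped attestation.
#[derive(Clone, Debug, PartialEq)]
struct Attestation {
    validator_id: u64,
    data_is_valid: bool,      // stand-in for attestation data validation
    signature_is_valid: bool, // stand-in for XMSS signature verification
}

#[derive(Debug, PartialEq)]
enum GossipError {
    InvalidData,
    InvalidSignature,
}

/// Hypothetical store holding attestations kept for later aggregation.
#[derive(Default)]
struct Store {
    pending_signatures: Vec<Attestation>,
}

fn on_gossip_attestation(
    store: &mut Store,
    attestation: Attestation,
    is_aggregator: bool,
) -> Result<(), GossipError> {
    // Validation and signature verification run on every node.
    if !attestation.data_is_valid {
        return Err(GossipError::InvalidData);
    }
    if !attestation.signature_is_valid {
        return Err(GossipError::InvalidSignature);
    }
    // Only the storage step is gated: non-aggregators validate, verify, and drop.
    if is_aggregator {
        store.pending_signatures.push(attestation);
    }
    Ok(())
}
```

Keeping the `is_aggregator` check last means a bad signature is rejected identically on both node types, which is the behavior this commit aligns with.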
An aggregator with no validators but with --aggregate-subnet-ids was picking up subnet 0 in addition to its explicit subnets, because the emptiness check fired before aggregate_subnet_ids were folded in. Move the fallback to run after the extend so it only applies when the set is truly empty, matching the pre-#265 semantics.
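As a self-contained illustration of the corrected ordering, the sketch below computes the final subscription set from plain inputs. The function name and parameter list are hypothetical; the real code works on the config struct inside `build_swarm`.

```rust
use std::collections::HashSet;

/// Final set of attestation subnets a node subscribes to (hypothetical helper).
fn subscription_subnets(
    validator_subnets: &HashSet<u64>,
    is_aggregator: bool,
    aggregate_subnet_ids: Option<&[u64]>,
) -> HashSet<u64> {
    // Every node starts from its validators' subnets, aggregator or not.
    let mut subnets = validator_subnets.clone();
    if is_aggregator {
        // Merge the explicit IDs first...
        if let Some(explicit_ids) = aggregate_subnet_ids {
            subnets.extend(explicit_ids.iter().copied());
        }
        // ...so the fallback only applies when both sources were empty.
        if subnets.is_empty() {
            subnets.insert(0);
        }
    }
    subnets
}
```

An aggregator with no validators and explicit IDs `[5]` ends up on subnet 5 only, while an aggregator with neither source falls back to subnet 0.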
Since every validator now subscribes to its own attestation subnet (#293), the transient "no remote peer knows of this subnet yet" condition is no longer part of the expected publish path. If it surfaces, it's a real signal (partition, misconfigured peer) and deserves the warning.
Removes:
- SwarmHandle::publish_ignore_no_peers and the ignore_no_peers flag on SwarmCommand::Publish
- The is_aggregator field on BuiltSwarm / P2PServer (only used to pick between the two publish paths)
- The aggregator-only branch in publish_attestation
Publish errors are now logged uniformly via inspect_err on the single publish path in the swarm adapter.
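A rough sketch of what a single publish path with uniform error logging might look like. `StubPublishError`, `publish_logged`, and the `Vec<String>` log sink are hypothetical stand-ins for the swarm adapter, the gossipsub error type, and the tracing macros. `Result::inspect_err` is the standard-library method and needs Rust 1.76 or later.

```rust
/// Hypothetical stand-in for the gossipsub publish error type.
#[derive(Debug, PartialEq)]
enum StubPublishError {
    NoPeers,
}

/// Runs `publish` for `topic` and records any error in `log`, returning the
/// result unchanged. There is no separate "ignore no peers" path.
fn publish_logged<F>(
    topic: &str,
    log: &mut Vec<String>,
    publish: F,
) -> Result<(), StubPublishError>
where
    F: FnOnce(&str) -> Result<(), StubPublishError>,
{
    // `inspect_err` observes the error by reference without consuming the Result.
    publish(topic).inspect_err(|err| log.push(format!("publish to {topic} failed: {err:?}")))
}
```

Because the result is passed through untouched, callers still decide how to react to the error; the adapter only guarantees that every failure is logged once.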
Summary
- … --is-aggregator flag, so non-aggregator validator nodes join the gossipsub mesh and forward attestations instead of relying on fanout alone.
- … lean_attestation_committee_subnet metric to derive from validator subnets only, matching its semantics before #265 (fix: do not subscribe to subnets when not aggregating); the metric reports this node's committee membership, not aggregator-only subscriptions.
Motivation
On a multi-client devnet (ethlambda_0 aggregator + ethlambda_1 + grandine_0 + gean_0, single attestation subnet), finalization stalled at slot ~50 with validators voting three different attestation targets. gean's attestations never reached ethlambda_0 once peer churn thinned the attestation subnet mesh: gean was publishing via fanout, but with only one subscriber (ethlambda_0) and transient peer drops, the mesh collapsed and could not recover.
libp2p-gossipsub (including the v1.2/v1.3 features in our fork) couples subscription with mesh participation. A node that does not subscribe() to a topic cannot be GRAFTed by peers and therefore cannot relay messages for it. flood_publish is publisher-side only and does not change this.
leanSpec (src/lean_spec/__main__.py) prescribes: any node with registered validators subscribes to its validator-derived subnets, so all validator peers contribute to the mesh. This PR aligns ethlambda with that rule.