The super large grace period of 1 day has proved to be harmful on Cicada. (#4986)

This PR lowers it to 2h.
As a reminder, once a node is detected as dead,
it enters a zombie state for 1h,
during which we still share its KVs.

From timeofdeath+1h to timeofdeath+2h, we will no longer share the node.

After 2h, we will delete the node from the state.
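
The timeline above can be sketched as a small classifier. This is only an illustration, not code from chitchat or Quickwit: `DeadNodePhase` and `dead_node_phase` are hypothetical names, and placing the first boundary at half of the grace period (with both boundaries exclusive on the earlier phase) is an assumption that happens to match the 1h/2h figures given here.

```rust
use std::time::Duration;

/// Phases a dead node goes through, following the commit message.
#[derive(Debug, PartialEq, Eq)]
enum DeadNodePhase {
    /// Zombie state: the node is dead but its KVs are still shared.
    ZombieShared,
    /// The node is still kept in the state but is no longer shared.
    NotShared,
    /// The grace period is over: the node is deleted from the state.
    Deleted,
}

/// Hypothetical helper (not a chitchat API): classifies a dead node from the
/// time elapsed since it was detected as dead.
/// Assumption: the zombie phase lasts half of the grace period.
fn dead_node_phase(since_death: Duration, grace_period: Duration) -> DeadNodePhase {
    let half = grace_period / 2;
    if since_death < half {
        DeadNodePhase::ZombieShared
    } else if since_death < grace_period {
        DeadNodePhase::NotShared
    } else {
        DeadNodePhase::Deleted
    }
}

fn main() {
    // 2 hours, the value this commit sets for `dead_node_grace_period`.
    let grace_period = Duration::from_secs(2 * 60 * 60);
    for minutes in [30u64, 90, 150] {
        let phase = dead_node_phase(Duration::from_secs(minutes * 60), grace_period);
        println!("{minutes} min after death: {phase:?}");
    }
}
```

With the previous 1 day grace period, the same sketch would keep a dead node around for 24h before deleting it, which is the window this PR shrinks.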
fulmicoton committed May 16, 2024
1 parent 1a58b71 commit c5c2b70
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion quickwit/quickwit-cluster/src/lib.rs
@@ -28,6 +28,7 @@ mod metrics;
 mod node;
 
 use std::net::SocketAddr;
+use std::time::Duration;
 
 use async_trait::async_trait;
 pub use chitchat::transport::ChannelTransport;
@@ -146,13 +147,17 @@ pub async fn start_cluster_service(node_config: &NodeConfig) -> anyhow::Result<C
         indexing_tasks,
         indexing_cpu_capacity,
     };
+    let failure_detector_config = FailureDetectorConfig {
+        dead_node_grace_period: Duration::from_secs(2 * 60 * 60), // 2 hours
+        ..Default::default()
+    };
     let cluster = Cluster::join(
         cluster_id,
         self_node,
         gossip_listen_addr,
         peer_seed_addrs,
         node_config.gossip_interval,
-        FailureDetectorConfig::default(),
+        failure_detector_config,
         &CountingUdpTransport,
     )
     .await?;