Unexpected behaviour duing continious master election #14

igor-petruk · 2020-12-28T17:04:57Z

Hi,

I am trying to write a code for a cluster, where only one node would periodially download a file and store it for everyone else. I don't want to put the file via promote because I want to know who is the leader first, so that only that node does work. I plan to write the file to another key and let everyone observe. I could probably still use promote, but the file should still be readable if it is stale and I'll probably have to put an empty value during initial campaign call.

Anyway, it seems like I cannot implement a continious master election. I am having spurious wake-up from client.campaign(...) for both leaders and not leaders. This is due to lease expiry during the campaign call.

Here is the code that reproduces the issue

use etcd_client::*;
use tracing::info;

async fn elect_and_work(
    idx: i32,
    mut etcd_client: Client,
    etcd_path: String,
) -> anyhow::Result<()> {
    loop {
        let resp = etcd_client.lease_grant(10, None).await?;
        let lease_id = resp.id();
        // info!("grant ttl:{:?}, id:{:?}", resp.ttl(), resp.id());
        let election_path = format!("{}/test_election", etcd_path);

        // campaign
        let resp = etcd_client
            .campaign(election_path.as_str(), "123", lease_id)
            .await?;
        let leader = resp.leader().unwrap();
        info!(
            "Won election name: idx={}, {:?}, leaseId:{:?}, my_lease: {:?}",
            idx,
            leader.name_str(),
            leader.lease(),
            lease_id,
        );
        let (mut keeper, mut keeper_responses) = etcd_client.lease_keep_alive(lease_id).await?;
        for _ in 0..5 {
            info!("Keeping alive (idx={})...", idx);
            keeper.keep_alive().await?;
            if let Some(resp) = keeper_responses.message().await? {
                info!(
                    "Lease on idx={}, {:?} kept alive, new ttl {:?}",
                    idx,
                    resp.id(),
                    resp.ttl()
                );
            }
            tokio::time::delay_for(tokio::time::Duration::from_secs(3)).await;
        }
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    ::std::env::set_var("RUST_LOG", "election_repro=info");
    let endpoints = vec!["http://127.0.0.1:2379"];
    tracing_subscriber::fmt::init();
    let etcd_client = etcd_client::Client::connect(endpoints, None).await?;

    let mut join_handles = vec![];

    for idx in 0..5 {
        let client = etcd_client.clone();
        let join = tokio::spawn(async move {
            elect_and_work(idx, client, "/election_repro".into())
                .await
                .unwrap();
        });
        join_handles.push(join);
    }
    futures::future::join_all(join_handles).await;

    Ok(())
}

The log:

Dec 28 16:55:42.986  INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607485, my_lease: 7587851380158607485
Dec 28 16:55:42.987  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:42.988  INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:45.989  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:45.991  INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:48.992  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:48.993  INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:51.994  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:51.995  INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:54.996  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:54.998  INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:56:05.031  INFO election_repro: Won election name: idx=4, Ok("/election_repro/test_election"), leaseId:7587851380158607493, my_lease: 7587851380158607493
Dec 28 16:56:05.031  INFO election_repro: Won election name: idx=1, Ok("/election_repro/test_election"), leaseId:7587851380158607487, my_lease: 7587851380158607487
Dec 28 16:56:05.032  INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607512, my_lease: 7587851380158607512
Dec 28 16:56:05.032  INFO election_repro: Won election name: idx=2, Ok("/election_repro/test_election"), leaseId:7587851380158607489, my_lease: 7587851380158607489
Dec 28 16:56:05.032  INFO election_repro: Won election name: idx=3, Ok("/election_repro/test_election"), leaseId:7587851380158607491, my_lease: 7587851380158607491
Dec 28 16:56:05.033  INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:05.034  INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:05.034  INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:05.034  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:05.034  INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:05.035  INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:05.035  INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:05.035  INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:05.035  INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:05.035  INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:08.037  INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:08.037  INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:08.037  INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:08.037  INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:08.037  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:08.038  INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:08.038  INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:08.038  INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:08.038  INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:08.038  INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:11.039  INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:11.039  INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:11.039  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:11.039  INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:11.039  INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:11.040  INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:11.040  INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:11.040  INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:11.040  INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:11.040  INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:14.042  INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:14.042  INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:14.042  INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:14.042  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:14.042  INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:14.043  INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:14.043  INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:14.043  INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:14.043  INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:14.043  INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:17.044  INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:17.044  INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:17.044  INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:17.044  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:17.045  INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:17.046  INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:17.046  INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:17.046  INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:17.046  INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:17.046  INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:27.537  INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607519, my_lease: 7587851380158607519
Dec 28 16:56:27.539  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:27.539  INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:30.541  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:30.542  INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:33.543  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:33.544  INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:36.546  INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:36.547  INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10

Notice how first election runs correctly. Only one client gets to win the election. Everyone else waits.

The leader sends keep-alives as expected. To simulate failure it stops sending keep-alive after 5 times. Now the interesting part begins.

New election runs. Someone wins, but everyone's campaign call completes. I'd expect an error to be returned at this point, but instead everyone gets a LeaderKey with their own lease_id. Did everyone just won and lost instantly? I am not sure, but it does not look like it. Only one can successfully update keep-alive on the lease and this is idx=0 because this client got a chance to refresh the lease due to completing the loop. Still, I don't really known the leader.

Eventually they all complete this strange loop, refresh their grants and participate in the correct election again.

What is most suprising none of this ever resurns Result=Err. Is there anything wrong in my code and I should check something else? Should Err be returned on wrong lease? Both?

Thanks,
Igor.

The text was updated successfully, but these errors were encountered:

davidli2010 · 2021-01-03T05:39:32Z

You can't depend on etcd's election to do your business.

In my opinion, the etcd election service APIs exposed by etcd-client are only used for etcd's maintenance scenarios.

I am trying to write a code for a cluster, where only one node would periodially download a file and store it for everyone else.

According to the descrption, I think you should get a distributed lock firstly, then do your business. See Client::lock() and Client::unlock()

igor-petruk · 2021-01-03T12:35:50Z

Thank you for your response!

You can't depend on etcd's election to do your business.

I could not find any mention of this in the documentation. Use of etcd, Zookeeper and similar services for master election is a very common scenario. The API is not marked as internal.

Anyway, I've replaced master election with a lock, but it works in the same way. After lock-holder releases the lock by failing to keep-alive the lease - everyone awakens. They all get Ok(LockResponse(...)) and without additional calls I cannot determine who won. Probably master election is a very lightweight wrapper over a lock, so it has the same behavior.

During the lock-holder expiry, others were blocking on their original lease which is long expired. This is when the first surprise happens - Ok(LockResponse), I'd expect an error.

When I try to immediately run a keep-alive, I get ttl()=0 and break.

let (mut keeper, mut keeper_responses) = etcd_client.lease_keep_alive(lease_id).await?;
keeper.keep_alive().await?;
if let Some(resp) = keeper_responses.message().await? {
  if resp.ttl() == 0 {
    break;  // Let's create a new lease and start over.
  }
  // I win!

This is the only way that I see so far to figure out who won. Ok(LockResponse) is completely not informative as it can spuriously be returned without actually acquiring the lock.

davidli2010 · 2021-01-03T16:15:22Z

You can't depend on etcd's election to do your business.

Sorry, maybe I misled you.

In fact, the election and lock interfaces are not included in etcd API Reference.

In golang, election and lock are not in clientv3, but etcdserver. So I think election and lock APIs are used by etcd server, not client. There is a concurrency package implements concurrency operations on top of etcd such as distributed locks, barriers, and elections.

Perhaps you need a concurrency-like crate in rust.

davidli2010 closed this as completed Jan 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected behaviour duing continious master election #14

Unexpected behaviour duing continious master election #14

igor-petruk commented Dec 28, 2020

davidli2010 commented Jan 3, 2021

igor-petruk commented Jan 3, 2021

davidli2010 commented Jan 3, 2021

Unexpected behaviour duing continious master election #14

Unexpected behaviour duing continious master election #14

Comments

igor-petruk commented Dec 28, 2020

davidli2010 commented Jan 3, 2021

igor-petruk commented Jan 3, 2021

davidli2010 commented Jan 3, 2021