Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stm cannot set linearizable read #11963

Closed
PerseveranceJing opened this issue Jun 2, 2020 · 6 comments
Closed

stm cannot set linearizable read #11963

PerseveranceJing opened this issue Jun 2, 2020 · 6 comments
Labels

Comments

@PerseveranceJing
Copy link

PerseveranceJing commented Jun 2, 2020

Problem

I find that when I call stm put/get concurrently, etcd client may panic with the following error message:

etcdserver: mvcc: required revision is a future revision

Reproduce Procedure

  1. run 3 etcd server (version 3.4.9) instances at localhost:8081, localhost:8082, localhost:8083
  2. run a client program client.go, then this program will panic.

client.go:

package main

import (
	"log"
	"runtime/debug"
	"sync"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/clientv3/concurrency"
)

func stmPut(stm concurrency.STM) error {
	defer func() {
		if err := recover(); err != nil {
			log.Printf("panic occurs: %v \n %s", err, debug.Stack())
			panic("stm panic")
		}
	}()
	stm.Get("key1")
	stm.Get("key2")
	stm.Put("key1", "value1")
	return nil
}

func put(cli *clientv3.Client) error {
	apply := func(stm concurrency.STM) error {
		stmPut(stm)
		return nil
	}
	concurrency.NewSTM(cli, apply)
	return nil
}

func request(wg *sync.WaitGroup, cli *clientv3.Client) error {
	for i := 0; i < 200; i++ {
		put(cli)
	}
	log.Printf("complted 200 requests.")
	wg.Done()
	return nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:8081", "localhost:8082", "localhost:8083"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("create etcd client failed.")
	}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go request(&wg, cli)
	}
	wg.Wait()
}

Run client.go, it will panic:

panic occurs: {{11 etcdserver: mvcc: required revision is a future revision}}
 goroutine 116 [running]:
runtime/debug.Stack(0xc42027bb30, 0x9b2420, 0xc4205df5b0)
	/home/user/opt/go/src/runtime/debug/stack.go:24 +0xa7
main.stmPut.func1()
	/home/user/client.go:16 +0x5a
panic(0x9b2420, 0xc4205df5b0)
	/home/user/opt/go/src/runtime/panic.go:502 +0x229
go.etcd.io/etcd/clientv3/concurrency.(*stm).fetch(0xc4202c8d20, 0xc42051b8d0, 0x1, 0x1, 0xdfc200)
	/home/user/gosrc/go.etcd.io/etcd/clientv3/concurrency/stm.go:289 +0x408
go.etcd.io/etcd/clientv3/concurrency.(*stmSerializable).Get(0xc4202c8d20, 0xc42051b8d0, 0x1, 0x1, 0xc4204fcfd8, 0x6)
	/home/user/gosrc/go.etcd.io/etcd/clientv3/concurrency/stm.go:316 +0x1da
main.stmPut(0xac55e0, 0xc4202c8d20, 0x0, 0x0)
	/home/user/client.go:21 +0x10f
main.put.func1(0xac55e0, 0xc4202c8d20, 0xc4202ab2c0, 0x0)
	/home/user/client.go:28 +0x35
go.etcd.io/etcd/clientv3/concurrency.runSTM.func1(0xc4202ab2c0, 0xac55e0, 0xc4202c8d20, 0xa7c3f8)
	/home/user/gosrc/go.etcd.io/etcd/clientv3/concurrency/stm.go:156 +0x83
created by go.etcd.io/etcd/clientv3/concurrency.runSTM

Analysis

panic happens at etcd/clientv3/concurrency/stm.go:

func (s *stmSerializable) Get(keys ...string) string {
	...
	resp := s.stm.fetch(keys...)  // NOTE: program panics here
    if firstRead {
        s.getOpts = []v3.OpOption{
            v3.WithRev(resp.Header.Revision),
            v3.WithSerializable(), // NOTE: if we comment this line, panic will disapper.
        }
    }
    return respToValue(resp)
}

WithSerializable() is defined in clientv3/op.go:

// WithSerializable makes 'Get' request serializable. By default,
// it's linearizable. Serializable requests are better for lower latency
// requirement.
func WithSerializable() OpOption {
    return func(op *Op) { op.serializable = true }
}

From its comments we see that when WithSerializable is set, stm get operation will not be executed linearly, may be this is the reason for this panic?
Since WithSerializable option is not configurable in stmSerializable.Get(), how to run stmSerializable.Get() linearly?

@nrailg
Copy link

nrailg commented Jul 3, 2020

Same problem. I tend to believe this is a bug?

@PerseveranceJing PerseveranceJing changed the title etcdserver: mvcc: required revision is a future revision - stm Get stm cannot set linearizable read Jul 14, 2020
@GeorgeMac
Copy link

Also experiencing this issue. We recently changed the way we connected to our etcd cluster.
We went from a horizontal pool of apps connecting to one etcd node each (round-robin assignment) to using the client side load balancing. So we ended up with round-robin requests.

However, this lead to the same situation above. I assume because the subsequent reads are happening on other nodes which are behind.

@GeorgeMac
Copy link

Trying to get my head around this. I am probably very mistaken, but here goes. The docs state:

Revision - the revision of the key-value store when generating the response.

This is the same revision chosen to predicate subsequent reads after the first in the STM.

What seems concerning is that a range request can return a revision here which other nodes might not have reached yet.
That seems wrong? Given that the first read is configured as linearalizable.

Or if this is not something that can be relied upon, how would one get that for the first read / range?

@stale
Copy link

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 24, 2020
@stale stale bot closed this as completed Jan 14, 2021
@serathius serathius reopened this Feb 18, 2022
@stale stale bot removed the stale label Feb 18, 2022
@serathius
Copy link
Member

serathius commented Feb 18, 2022

Do't know much about STM implementation. I think it would be best to reach out to @heyitsanthony who wrote, and xiang90 who reviewed STM code in #4760.

@stale
Copy link

stale bot commented Jun 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 12, 2022
@stale stale bot closed this as completed Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants