Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a low-frequency get may got wrong item #401

Closed
GoogleCodeExporter opened this issue Nov 5, 2015 · 5 comments
Closed

a low-frequency get may got wrong item #401

GoogleCodeExporter opened this issue Nov 5, 2015 · 5 comments

Comments

@GoogleCodeExporter
Copy link

What steps will reproduce the problem?
1. run memcached, set tail_repair_time to help quick reproduce:
[21:13:01]~/prog/memcached-1.4.22 $ ./memcached -m 5000 -o tail_repair_time=10

2. run the program below(mc.go), line 39 show never print but it does.
  1 package main
  2 
  3 import (
  4         "bytes"  
  5         "flag"
  6         "fmt"
  7         "runtime"
  8         "time"
  9 
 10         "github.com/bradfitz/gomemcache/memcache"
 11 )
 12 
 13 var addr = flag.String("addr", "localhost:11211", "addr of memcached")
 14 var sleep = flag.Int("sleep", 4, "time to slee")
 15 
 16 func main() {
 17 
 18         flag.Parse()
 19 
 20         runtime.GOMAXPROCS(runtime.NumCPU())
 21 
 22         abc := memcache.New(*addr)
 23         def := memcache.New(*addr)
 24 
 25         abc.Set(&memcache.Item{
 26                 Key:   "ctw9tjk",
 27                 Value: []byte("abc"),
 28         })
 29 
 30         for _ = range time.Tick(time.Duration(*sleep) * time.Second) {
 31                 go func() {
 32                         fmt.Println("start get")
 33                         item, err := abc.Get("ctw9tjk")
 34                         if err != nil {
 35                                 fmt.Println("abc.Get", err)
 36                                 return
 37                         }
 38                         if !bytes.Equal(item.Value, []byte("abc")) {
 39                                 fmt.Printf("get ctw9tjk => {%v, %v}!!!!!!!!!!!!!!\n", item.Key, string(item.Value))
 40                                 return
 41                         }
 42                         fmt.Printf("get ctw9tjk => {%v, %v}\n", item.Key, string(item.Value))
 43                 }()
 44 
 45                 go func() {
 46                         fmt.Println("start set")
 47                         err := def.Set(&memcache.Item{
 48                                 Key:   "cqv87p7",
 49                                 Value: []byte("def"),
 50                         })
 51                         fmt.Println("def.Set", err)
 52                 }()
 53         }
 54 }


What is the expected output? What do you see instead?
line 39 show never print, but it does:
[21:38:07]~/code/go $ ./mc -sleep=13
start get
start set
get ctw9tjk => {ctw9tjk, abc}
def.Set <nil>
start get
start set
def.Set <nil>
get ctw9tjk => {ctw9tjk, abc}
start set
start get
def.Set <nil>
get ctw9tjk => {ctw9tjk, abc}
start get
start set
def.Set <nil>
get ctw9tjk => {cqv87p7, def}!!!!!!!!!!!!!!

What version of the product are you using? On what operating system?
1.4.22, Darwin Kernel Version 14.1.0 on mac.

Please provide any additional information below.
the logic here exactly hitted: 
https://github.com/memcached/memcached/blob/1.4.22/items.c#L150


Original issue reported on code.google.com by zarcardfly@gmail.com on 28 Mar 2015 at 1:59

@GoogleCodeExporter
Copy link
Author

I try to understand the process, tell me if i misunderstood:
1. a "get" operation hit: 
https://github.com/memcached/memcached/blob/1.4.22/memcached.c#L2961 , the 
state is it->refcount > 1 and it->time is not updated.
2. at the same time, a "set" operation is going and hit 
https://github.com/memcached/memcached/blob/1.4.22/items.c#L150, the "search" 
item happens to be the "get" item.
3. the "search" item's refcount force to be 1 and free by slabs_free!
4. the "set" operation then call slabs_alloc and get the same item.
5. the "set" operation update(overlayed) the item.
6. the "get" operation response the overlayed item to requester.

Original comment by zarcardfly@gmail.com on 28 Mar 2015 at 2:28

@GoogleCodeExporter
Copy link
Author

That is the bug with tail_repair_time, and is why we turned it off by default 
in newer versions. It's only supposed to exist to work around a legitimate 
refcount leak, which we haven't had in a while.

Do you have this bug outside of tail_repair_time?

Original comment by dorma...@rydia.net on 28 Mar 2015 at 5:36

@GoogleCodeExporter
Copy link
Author

No, thanks for your reply.

Original comment by zarcardfly@gmail.com on 29 Mar 2015 at 3:11

@GoogleCodeExporter
Copy link
Author

ok, closing this.

Original comment by dorma...@rydia.net on 29 Mar 2015 at 4:44

  • Changed state: Invalid

@GoogleCodeExporter
Copy link
Author

The help tips say that tail_repair_time is default to be 3 hours, it may need 
to be fixed, and maybe add a warning.

[10:43:34]~/prog/memcached-1.4.22 $ ./memcached -h | grep -A 2 
'tail_repair_time'
              - tail_repair_time: Time in seconds that indicates how long to wait before
                forcefully taking over the LRU tail item whose refcount has leaked.
                The default is 3 hours.

Original comment by zarcardfly@gmail.com on 30 Mar 2015 at 2:49

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant