Bug Report: v3.0.0 gc may fail #76

Closed
stormluke opened this issue Nov 18, 2022 · 5 comments
@stormluke

Thanks for this great software! In my usage scenario, I observed that the cache size keeps going up while dropped stays at 0 for a long time, which eventually results in an OOM. I think there may be a bug here.

Here is some test code:

package ccache_test

import (
	"fmt"
	"math/rand"
	"strconv"
	"testing"
	"time"

	"github.com/karlseguin/ccache/v3" // assuming the v3 module path
)

func TestPrune(t *testing.T) {
	maxSize := int64(5000)
	cache := ccache.New(ccache.Configure[string]().MaxSize(maxSize))
	epoch := 0
	for {
		epoch += 1
		expired := make([]string, 0)
		// read a batch of random keys; collect the ones that are missing or stale
		for i := 0; i < 50; i += 1 {
			key := strconv.FormatInt(rand.Int63n(maxSize*20), 10)
			item := cache.Get(key)
			if item == nil || item.TTL() > 1*time.Minute {
				expired = append(expired, key)
			}
		}
		// re-set those keys, which constantly updates existing entries
		for _, key := range expired {
			cache.Set(key, key, 5*time.Minute)
		}
		if epoch%5000 == 0 {
			size := cache.GetSize()
			dropped := cache.GetDropped()
			fmt.Printf("size=%d dropped=%d\n", size, dropped)
			time.Sleep(100 * time.Millisecond)
		}
	}
}

When running this code, the size grows well past 5000 and dropped stays at 0.

=== RUN   TestPrune
size=30270 dropped=171439
size=48587 dropped=149851
size=7654 dropped=225531
size=42967 dropped=146521
size=28343 dropped=162295
size=93191 dropped=195
size=98497 dropped=0
size=93155 dropped=15731
size=98476 dropped=0
size=98913 dropped=0
size=98936 dropped=0
size=98965 dropped=0

This may be the reason:

  • a key is updated: the existing oldItem is sent to deletables but not yet consumed
  • gc() is triggered: oldItem is removed by List.Remove(), which sets oldItem.node.Prev to nil
  • deletables is consumed: oldItem is removed again by List.Remove(), which sets l.Tail to nil, i.e. oldItem.node.Prev (see the standalone sketch after the List.Remove() code below)
  • every upcoming gc() is skipped because node = c.list.Tail = nil, so size keeps going up and dropped stays at 0
  • if other items in deletables have not yet been GC'd, l.Tail may be set back to non-nil and gc() may recover, but it will fail again in the future for the same reason
func (c *Cache[T]) gc() int {
   dropped := 0
   node := c.list.Tail

   itemsToPrune := int64(c.itemsToPrune)
   if min := c.size - c.maxSize; min > itemsToPrune {
      itemsToPrune = min
   }

   for i := int64(0); i < itemsToPrune; i++ {
      if node == nil { // gc is skipped if c.list.Tail = nil
         return dropped
      }
      prev := node.Prev
      item := node.Value
      if c.tracking == false || atomic.LoadInt32(&item.refCount) == 0 {
         c.bucket(item.key).delete(item.key)
         c.size -= item.size
         c.list.Remove(node)
         if c.onDelete != nil {
            c.onDelete(item)
         }
         dropped += 1
         item.promotions = -2
      }
      node = prev
   }
   return dropped
}

func (l *List[T]) Remove(node *Node[T]) {
   next := node.Next
   prev := node.Prev

   if next == nil {
      l.Tail = node.Prev // second Remove will make Tail = nil
   } else {
      next.Prev = prev
   }

   if prev == nil {
      l.Head = node.Next
   } else {
      prev.Next = next
   }
   node.Next = nil
   node.Prev = nil
}
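
For what it's worth, the effect of the double Remove() can be shown with a minimal standalone sketch (plain Go mirroring the Remove() logic above, not ccache's actual types): after the second call on the same node, both Tail and Head are nil even though other entries still exist, so a gc() that starts from Tail has nothing to walk.

func main_sketch() {} // see full sketch below

package main

import "fmt"

// Minimal doubly linked list mirroring the Remove() logic quoted above
// (standalone sketch, not ccache's actual types).
type node struct {
	prev, next *node
	value      string
}

type list struct {
	head, tail *node
}

func (l *list) pushFront(value string) *node {
	n := &node{value: value, next: l.head}
	if l.head != nil {
		l.head.prev = n
	} else {
		l.tail = n
	}
	l.head = n
	return n
}

func (l *list) remove(n *node) {
	next, prev := n.next, n.prev
	if next == nil {
		l.tail = prev
	} else {
		next.prev = prev
	}
	if prev == nil {
		l.head = next
	} else {
		prev.next = next
	}
	n.next, n.prev = nil, nil
}

func main() {
	l := &list{}
	l.pushFront("a")
	old := l.pushFront("oldItem")
	l.pushFront("b")

	l.remove(old)             // first removal, e.g. triggered by gc()
	fmt.Println(l.tail.value) // "a": the list is still intact

	// second removal, e.g. when deletables is finally consumed;
	// old.prev and old.next are already nil, so both Tail and Head become nil
	l.remove(old)
	fmt.Println(l.head == nil, l.tail == nil) // true true: "b" and "a" are no longer reachable from the list
}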
karlseguin added a commit that referenced this issue Nov 19, 2022
As documented in #76, an entry which
is both GC'd and deleted (either via a delete or an update) will result in the
internal linked list having a nil tail (because removing the same node multiple times
from the linked list does that).

doDelete was already aware of "invalid" nodes (where item.node == nil), so the
solution seems to be as simple as setting item.node = nil during GC.
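
Going by the commit message, the change presumably amounts to one extra line inside the gc() loop quoted above, so that a later doDelete() takes its existing item.node == nil branch instead of removing the node a second time. A sketch of the loop body, not the actual diff:

// inside the gc() loop quoted above (sketch based on the commit message)
if c.tracking == false || atomic.LoadInt32(&item.refCount) == 0 {
   c.bucket(item.key).delete(item.key)
   c.size -= item.size
   c.list.Remove(node)
   item.node = nil // new: mark the item as detached so doDelete() skips it
   if c.onDelete != nil {
      c.onDelete(item)
   }
   dropped += 1
   item.promotions = -2
}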
@karlseguin
Owner

Thanks for the detailed report, it really helped narrow it down. I added a slightly modified version of your test :)

@stormluke
Author

I think there is still a problem here. If a node is in both deletables and promotables, and deletables is consumed before promotables, the node will be passed to Remove() twice, causing gc() to fail.

  • deletables -> doDelete() -> Remove()
  • promotables -> doPromote() -> MoveToFront() -> Remove()
  • gc() sees node == nil and bails out

If you keep this test case running, it will fail:

func Test_CachePrune(t *testing.T) {
	maxSize := int64(500)
	cache := New(Configure[string]().MaxSize(maxSize).ItemsToPrune(50))
	epoch := 0
	for {
		epoch += 1
		expired := make([]string, 0)
		for i := 0; i < 50; i += 1 {
			key := strconv.FormatInt(rand.Int63n(maxSize*20), 10)
			item := cache.Get(key)
			if item == nil || item.TTL() > 1*time.Minute {
				expired = append(expired, key)
			}
		}
		for _, key := range expired {
			cache.Set(key, key, 5*time.Minute)
		}
		if epoch%500 == 0 {
			fmt.Printf("size=%d, dropped=%d\n", cache.GetSize(), cache.GetDropped())
		}
		if cache.GetSize() > 5000 {
			fmt.Printf("size=%d, dropped=%d\n", cache.GetSize(), cache.GetDropped())
			t.Fail()
			return
		}
	}
}

I think the fix is to set item.node = nil (and maybe item.promotions = -2) after Remove() in doDelete(), the same as in gc().

func (c *Cache[T]) doDelete(item *Item[T]) {
	if item.node == nil {
		item.promotions = -2
	} else {
		c.size -= item.size
		if c.onDelete != nil {
			c.onDelete(item)
		}
		c.list.Remove(item.node)
		item.node = nil      // fix
		item.promotions = -2 // fix
	}
}

@karlseguin
Owner

FYI, not sure I'll be able to look into this until early next week, sorry.

karlseguin added a commit that referenced this issue Dec 13, 2022
Also, item.promotions doesn't need to be loaded/stored using atomic. Once upon a
time it did. Cache was updated long ago to not use atomic operations on it, but
LayeredCache wasn't. They are both consistent now (they don't use atomic
operations).

Fixes: #76
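
For illustration, the consistency described in the commit message amounts to something like the following for the promotions field (a sketch using the field names from the gc() code quoted earlier; the exact call sites are assumptions):

// before (LayeredCache only): atomic access to the promotions counter
atomic.StoreInt32(&item.promotions, -2)

// after (Cache and LayeredCache): plain assignment, presumably safe because
// the field is only touched from the cache's worker goroutine
item.promotions = -2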
@stormluke
Author

I think this commit solves the issue; could you release a new tag? Thanks.

@karlseguin
Owner

sure, done
