Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client-go - ThreadSafeMap indicies - empty sets not deleted - memory increase with high cardinality short lived resources #84959

Closed
pdzwart-atlassian opened this issue Nov 8, 2019 · 7 comments · Fixed by #84970

Comments

@pdzwart-atlassian
Copy link
Contributor

@pdzwart-atlassian pdzwart-atlassian commented Nov 8, 2019

What happened:

@kubernetes/sig-api-machinery-bugs

Consider this use case, a k8s cluster with many short lived name spaces with a high cardinality. (Think UUIDs).

Over time, some of the threadSafeMap indices contain string sets for each name space with no objects in the backing string sets; which are never reused.

Leads to process memory increasing over time until the process exceeds available memory. Potentially causing issues as an api-server or controller is terminated by the operating system.

What you expected to happen:

Process memory does not increase unbounded as the number of unique items added then deleted from a thread safe map, causing indices to hold on to zero length string sets for the key.

This could be mitigated when an index's string set for a key is empty, that string set is removed from the index.

How to reproduce it (as minimally and precisely as possible):

Thanks to @awprice for this reproduction code:

package main

import (
    "github.com/google/uuid"
    "k8s.io/client-go/tools/cache"
)

func main() {
    indexers := cache.Indexers{
        "bla": func(obj interface{}) (strings []string, e error) {
            indexes := []string{obj.(string)}
            return indexes, nil
        },
    }
    indices := cache.Indices{}
    store := cache.NewThreadSafeStore(indexers, indices)
    for {
        key := uuid.New().String()
        store.Add(key, key)
        store.Delete(key)
    }
}

Anything else we need to know?:

Can be mitigated with the following patch against master at 75aca1f:

diff --git a/staging/src/k8s.io/client-go/tools/cache/thread_safe_store.go b/staging/src/k8s.io/client-go/tools/cache/thread_safe_store.go
index 33e6239a69..1151a33cd0 100644
--- a/staging/src/k8s.io/client-go/tools/cache/thread_safe_store.go
+++ b/staging/src/k8s.io/client-go/tools/cache/thread_safe_store.go
@@ -292,6 +292,10 @@ func (c *threadSafeMap) deleteFromIndices(obj interface{}, key string) {
                        set := index[indexValue]
                        if set != nil {
                                set.Delete(key)
+
+                               if len(set) == 0 {
+                                       delete(index, indexValue)
+                               }
                        }
                }
        }

threadSafeMap.updateIndices() has a code path for creating sets that don't exist, so if a set is temporarily reduced to zero; it will be re-created as if it was the initial case where the set was never created.

Here's an apiserver pprof heap base png from a clean start for a test case of ≈800k name space create/delete in kube v1.14.7:
https://raw.githubusercontent.com/pdzwart-atlassian/github-issue-attachments/master/kubernetes/kubernetes/84959/k8s_apiserver_v1.14.7_apiserver_heap_800k_ns_add-del.png

Note the increase in heap utilisation from base of threadSafeMap and sets nodes in the hot path. Similar increases were observed in v1.13.11 and v1.15.5.

Same as above, but with the suggested fix and ≈100k name space create/delete:
https://raw.githubusercontent.com/pdzwart-atlassian/github-issue-attachments/master/kubernetes/kubernetes/84959/k8s_apiserver_v1.14.7_apiserver_heap_fixed_100k_ns_add-del.png

Note lack of threadSafeMap and sets nodes in the hot path.

(Was not able to directly attach the images to this issue due to Something went really wrong, and we can't process that file.)

Environment:

  • Kubernetes version (use kubectl version): Issue reproduced in v1.13.11, v1.14.7, 1.15.5 directly by adding/deleting namespaces. Above mitigation directly tested against a v1.14.7 cluster.
@pdzwart-atlassian

This comment has been minimized.

Copy link
Contributor Author

@pdzwart-atlassian pdzwart-atlassian commented Nov 8, 2019

/sig api-machinery

@tedyu

This comment has been minimized.

Copy link
Contributor

@tedyu tedyu commented Nov 8, 2019

Edit:
I found threadSafeMap.updateIndices() - was searching with misspelled func name.

@awprice

This comment has been minimized.

Copy link

@awprice awprice commented Nov 8, 2019

@tedyu I'm not sure I follow, but updateIndices is still present - https://github.com/kubernetes/client-go/blob/master/tools/cache/thread_safe_store.go

@pdzwart-atlassian

This comment has been minimized.

Copy link
Contributor Author

@pdzwart-atlassian pdzwart-atlassian commented Nov 8, 2019

Nah, my bad; didn't spell the method right originally in the text.

@tedyu

This comment has been minimized.

Copy link
Contributor

@tedyu tedyu commented Nov 8, 2019

One option is to mark the time when set becomes empty:

type setWithTimestamp struct {
	set       sets.String
	// timestamp marks the time when the set becomes empty
	timestamp time.Time
}

type Index map[string]setWithTimestamp

We can clean Index entry when certain period has passed since the set becomes empty.

@awprice

This comment has been minimized.

Copy link

@awprice awprice commented Nov 8, 2019

@tedyu I really don't think that solution is a viable and adds unneeded complexity. You would then need to have something running in the background to prune empty sets.

The solution proposed in the original post is the most straight forward, when the set is empty - delete it. If something is then created that needs the set, it will be recreated when needed.

@tedyu

This comment has been minimized.

Copy link
Contributor

@tedyu tedyu commented Nov 8, 2019

I think that is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.