
Add generic cache for Azure VM/LB/NSG/RouteTable #59520

Merged
merged 7 commits into kubernetes:master from feiskyer:new-cache on Feb 11, 2018

Conversation

@feiskyer (Member) commented Feb 8, 2018

What this PR does / why we need it:

Part of #58770. This PR adds a generic cache for Azure VM/LB/NSG/RouteTable to reduce ARM calls.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #58770

Special notes for your reviewer:

Release note:

Add generic cache for Azure VM/LB/NSG/RouteTable
@feiskyer (Member, Author) commented Feb 8, 2018

This is still WIP; the code is ready, but I'm still validating it within my clusters.

@feiskyer (Member, Author) commented Feb 8, 2018

@khenidak There is something wrong with Kusto. Could you help validate this within your cluster?

@feiskyer (Member, Author) commented Feb 8, 2018

/retest

@feiskyer changed the title from "WIP: Add generic cache for Azure VM/LB/NSG/RouteTable" to "Add generic cache for Azure VM/LB/NSG/RouteTable" on Feb 8, 2018

@feiskyer added this to the v1.10 milestone on Feb 8, 2018

@khenidak (Contributor) commented Feb 8, 2018

Will do. It would be great if you have an image already built.

@feiskyer (Member, Author) commented Feb 8, 2018

The image is pushed as feisky/hyperkube-amd64:v1.10.0-alpha.3-352-ge5468e4.

@feiskyer (Member, Author) commented Feb 8, 2018

/retest

@khenidak (Contributor) left a comment

almost there 👍

	entry, exists, err := t.store.GetByKey(key)
	if err != nil {
		return nil, err
	}
	if exists {
-		return (entry.(*timedcacheEntry)).data, nil
+		return entry.(*cacheEntry), nil
	}

	t.lock.Lock()

@khenidak (Contributor) commented Feb 8, 2018

Can we move lock() to the top, with a defer unlock()?

@feiskyer (Member, Author) commented Feb 9, 2018

Moving it to the top means we would lock the entire cache while getting any key. That's not required when the data is already in the cache and hasn't expired yet, so the code tries the get first and only locks the cache when the key does not exist.
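
For readers following the thread, here is a minimal, self-contained sketch of the locking scheme feiskyer describes: a lock-free fast path for cache hits, and the cache-wide lock only on the miss path. The type shapes and field names are assumptions inferred from the quoted hunks, not the PR's verbatim code.

```go
package azurecache

import (
	"sync"

	"k8s.io/client-go/tools/cache"
)

// cacheEntry and timedCache mirror the shapes visible in the quoted hunks.
type cacheEntry struct {
	key  string
	data interface{}
	lock sync.Mutex
}

type timedCache struct {
	store  cache.Store // TTL-backed store keyed by cacheEntry.key
	lock   sync.Mutex
	getter func(key string) (interface{}, error)
}

// getOrCreateEntry returns the entry for key, inserting an empty placeholder
// on a miss. Cache hits never take the cache-wide lock.
func (t *timedCache) getOrCreateEntry(key string) (*cacheEntry, error) {
	// Fast path: GetByKey is safe for concurrent readers.
	obj, exists, err := t.store.GetByKey(key)
	if err != nil {
		return nil, err
	}
	if exists {
		return obj.(*cacheEntry), nil
	}

	// Slow path: lock, re-check (another goroutine may have added the key
	// while we waited), then insert a placeholder entry with nil data.
	t.lock.Lock()
	defer t.lock.Unlock()
	obj, exists, err = t.store.GetByKey(key)
	if err != nil {
		return nil, err
	}
	if exists {
		return obj.(*cacheEntry), nil
	}

	entry := &cacheEntry{key: key, data: nil}
	if err := t.store.Add(entry); err != nil {
		return nil, err
	}
	return entry, nil
}
```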

-	t.store.Add(&timedcacheEntry{

+	// Data is still not cached yet, cache it by getter.
+	if entry.data == nil {

@khenidak (Contributor) commented Feb 8, 2018

What would be the situation where we need a key with a nil value?

@feiskyer (Member, Author) commented Feb 9, 2018

This is something like lazy initialization: it lets us update different keys in the cache at the same time while preventing simultaneous updates of the same key.

entry.data is set nil here: https://github.com/kubernetes/kubernetes/pull/59520/files#diff-413059dc3744227dc0d21506f9883ad8R92
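
Continuing the sketch above, the lazy fill looks roughly like this (t.getter stands for the resource-specific ARM fetch quoted later in the review); the per-entry lock serializes fills of a single key without blocking other keys, and the nil check is repeated once the lock is held.

```go
// Get returns the cached data for key, fetching it via the getter on first
// use. Callers for different keys proceed in parallel; only callers for the
// same key contend on entry.lock.
func (t *timedCache) Get(key string) (interface{}, error) {
	entry, err := t.getOrCreateEntry(key)
	if err != nil {
		return nil, err
	}

	// Data is still not cached yet, cache it by getter.
	if entry.data == nil {
		entry.lock.Lock()
		defer entry.lock.Unlock()

		// Re-check after taking the per-entry lock: another goroutine may
		// have filled the data while we were waiting.
		if entry.data == nil {
			data, err := t.getter(key)
			if err != nil {
				return nil, err
			}
			entry.data = data
		}
	}

	return entry.data, nil
}
```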

-	_ = t.store.Delete(&timedcacheEntry{

+// Delete removes an item from the cache.
+func (t *timedCache) Delete(key string) error {
+	return t.store.Delete(&cacheEntry{

@khenidak (Contributor) commented Feb 8, 2018

lock() .. unlock() here as well?

@feiskyer (Member, Author) commented Feb 9, 2018

ack, good catch

feiskyer added some commits Feb 7, 2018

@feiskyer force-pushed the feiskyer:new-cache branch from e5468e4 to 920c136 on Feb 9, 2018

@k8s-ci-robot added the size/XL label and removed the size/L label on Feb 9, 2018

@@ -244,6 +249,30 @@ func NewCloud(configReader io.Reader) (cloudprovider.Interface, error) {
 		az.vmSet = newAvailabilitySet(&az)
 	}

+	vmCache, err := az.newVMCache()

@karataliu (Contributor) commented Feb 9, 2018

'az.vmCache, err =' directly? Also for the following ones.

@feiskyer (Member, Author) commented Feb 9, 2018

ack

		done, err := processRetryResponse(resp.Response, err)
		if done && err == nil {
			// Invalidate the cache right after updating
			az.lbCache.Delete(*sg.Name)

@karataliu (Contributor) commented Feb 9, 2018

nsgCache?

@feiskyer (Member, Author) commented Feb 9, 2018

yep, good catch

-		return processRetryResponse(resp, err)
+		done, err := processRetryResponse(resp, err)
+		if done && err == nil {
+			// Invalidate the cache right after deleting

@karataliu (Contributor) commented Feb 9, 2018

What about CreateOrUpdateVMWithRetry and CreateOrUpdateRouteTableWithRetry in this file? Do they also need to invalidate the cache?

@feiskyer (Member, Author) commented Feb 9, 2018

They don't need it because the cache is already invalidated after calling them.

@karataliu (Contributor) commented Feb 11, 2018

It seems the following pairs should be combined into a single function, and we'd put the cache invalidation there:

VirtualMachinesClient.CreateOrUpdate & CreateOrUpdateVMWithRetry
RouteTablesClient.CreateOrUpdate & CreateOrUpdateRouteTableWithRetry

But that can be done in a separate PR.
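
For context, this is the invalidate-after-write pattern discussed in this thread, written generically as a sketch that reuses the timedCache type from the earlier sketches; the write closure is a placeholder for the real ARM client call, not a helper from the PR.

```go
// invalidateAfterWrite runs an ARM write and deletes the cached entry for key
// only when the write is confirmed done and error-free, so the next Get
// refetches a fresh object via the getter, while a failed write leaves the
// cache untouched.
func invalidateAfterWrite(c *timedCache, key string, write func() (done bool, err error)) error {
	done, err := write()
	if done && err == nil {
		// Invalidate the cache right after updating.
		_ = c.Delete(key)
	}
	return err
}
```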

		store: cache.NewTTLStore(cacheKeyFunc, ttl),

// cacheKeyFunc defines the key function required in TTLStore.
func cacheKeyFunc(obj interface{}) (string, error) {
	if entry, ok := obj.(*cacheEntry); ok {

@karataliu (Contributor) commented Feb 9, 2018

Items in the cache will always be *cacheEntry, so do we need a check here?

See also object_cache.go

@feiskyer (Member, Author) commented Feb 9, 2018

ack
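
With the type check dropped, the key function and constructor reduce to roughly the following sketch (newTimedcache, the error message, and the time/fmt imports are assumptions; cache.NewTTLStore is the client-go TTL store seen in the quoted hunk).

```go
// cacheKeyFunc returns the key stored in a cacheEntry; items in the TTL store
// are always *cacheEntry, so no type-assertion check is needed.
func cacheKeyFunc(obj interface{}) (string, error) {
	return obj.(*cacheEntry).key, nil
}

// newTimedcache wires a TTL-backed store to a getter.
func newTimedcache(ttl time.Duration, getter func(key string) (interface{}, error)) (*timedCache, error) {
	if getter == nil {
		return nil, fmt.Errorf("getter is not provided")
	}
	return &timedCache{
		getter: getter,
		store:  cache.NewTTLStore(cacheKeyFunc, ttl),
	}, nil
}
```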

	if entry.data == nil {
		entry.lock.Lock()
		defer entry.lock.Unlock()

@karataliu (Contributor) commented Feb 9, 2018

Double-check entry.data == nil here (it may have been set by another goroutine while waiting for the lock).

@feiskyer (Member, Author) commented Feb 9, 2018

good catch

	time.Sleep(fakeCacheTTL)
	v, err = cache.Get(key)
	assert.NoError(t, err)
	assert.Equal(t, val, v, "cache should get correct data even after expired")

@karataliu (Contributor) commented Feb 9, 2018

If the cache ignored the TTL value entirely, the behavior would be the same and this test would still pass.

Better to keep a numeric counter in dataSource.get, and validate that it is called exactly 1 time before the cache expires and 2 times after it expires.

@feiskyer (Member, Author) commented Feb 9, 2018

good catch


type fakeDataSource struct {
	data map[string]*fakeDataObj
	lock sync.Mutex

@karataliu (Contributor) commented Feb 9, 2018

dataSource is not going to be modified concurrently in the test; is a lock needed?

@feiskyer (Member, Author) commented Feb 9, 2018

Let's keep the lock in case some concurrent test cases are added in the future.

-	get1, _ := c.GetOrCreate("b1", f1)
-	if get1 != 1 {
-		t.Error("Value not equal")
+	getter := dataSource.get

@karataliu (Contributor) commented Feb 9, 2018

Since the getter func might return an error, we'd better have a case that tests the behaviour when the getter returns an error. This should be common since it involves network calls.

@feiskyer (Member, Author) commented Feb 9, 2018

Added
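
Taken together, the suggestions in these test threads translate into roughly the following sketch (assumed names, reusing the newTimedcache/Get sketches above and the testify assertions already present in the quoted test code): count getter calls to prove the TTL is honoured, and exercise a getter that fails.

```go
package azurecache

import (
	"fmt"
	"sync"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

const fakeCacheTTL = 2 * time.Second

// fakeDataSource counts how many times its getter runs, so the test can verify
// the cache only falls through to it when the entry is missing or expired.
type fakeDataSource struct {
	called int
	data   map[string]interface{}
	lock   sync.Mutex
}

func (f *fakeDataSource) get(key string) (interface{}, error) {
	f.lock.Lock()
	defer f.lock.Unlock()
	f.called++
	return f.data[key], nil
}

func TestCacheTTLAndGetterErrors(t *testing.T) {
	source := &fakeDataSource{data: map[string]interface{}{"vm1": "value1"}}
	c, err := newTimedcache(fakeCacheTTL, source.get)
	assert.NoError(t, err)

	// Two gets before the TTL expires must hit the data source exactly once.
	_, _ = c.Get("vm1")
	v, err := c.Get("vm1")
	assert.NoError(t, err)
	assert.Equal(t, "value1", v)
	assert.Equal(t, 1, source.called, "getter should be called once before expiry")

	// After the TTL the entry is evicted, so the getter runs a second time.
	time.Sleep(fakeCacheTTL)
	_, _ = c.Get("vm1")
	assert.Equal(t, 2, source.called, "getter should be called again after expiry")

	// A getter error must be surfaced to the caller.
	failing, err := newTimedcache(fakeCacheTTL, func(key string) (interface{}, error) {
		return nil, fmt.Errorf("backend unavailable")
	})
	assert.NoError(t, err)
	_, err = failing.Get("vm1")
	assert.Error(t, err)
}
```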

}

// Update sets an item in the cache to its updated state.
func (t *timedCache) Update(key string, data interface{}) error {

@karataliu (Contributor) commented Feb 9, 2018

Do we need this function? I suppose Delete is enough, and the value should only be updated through the getter.

@feiskyer (Member, Author) commented Feb 9, 2018

Let's keep the cache API complete in case it is used in the future.
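
For completeness, the Update under discussion can be as small as the following sketch; cache.Store's Update replaces the stored entry (and, for a TTL-backed store, refreshes its timestamp).

```go
// Update sets an item in the cache to its updated state.
func (t *timedCache) Update(key string, data interface{}) error {
	return t.store.Update(&cacheEntry{
		key:  key,
		data: data,
	})
}
```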

	cache.Delete(key)
	v, err = cache.Get(key)
	assert.NoError(t, err)
	assert.Equal(t, nil, v, "cache should get nil after data is removed")

@karataliu (Contributor) commented Feb 9, 2018

Better to validate that the getter is called.

@feiskyer (Member, Author) commented Feb 9, 2018

ack

@feiskyer force-pushed the feiskyer:new-cache branch from 920c136 to 5f47dde on Feb 9, 2018

@feiskyer (Member, Author) commented Feb 9, 2018

@karataliu Addressed comments. PTAL

@feiskyer force-pushed the feiskyer:new-cache branch from 5f47dde to 7634eac on Feb 9, 2018

@feiskyer (Member, Author) commented Feb 9, 2018

/retest

1 similar comment

@feiskyer (Member, Author) commented Feb 9, 2018

/retest

@karataliu (Contributor) left a comment

/lgtm

Consider opening an issue to track more tests for the cache, since it is becoming an infra component of the cloud provider.


	if entry.data == nil {
		data, err := t.getter(key)
		if err != nil {
			return nil, err

@karataliu (Contributor) commented Feb 11, 2018

We can consider an improvement here (could be in a separate PR): should we cache error responses, so that a failing request for some resource is also cached?

For example, if a resource does not exist yet, a number of calls (all destined to fail) would otherwise be sent out in a short period.

@feiskyer (Member, Author) commented Feb 11, 2018

That's what I originally had in mind with caching a nil object, which would ensure one API call per TTL period.

@khenidak suggested reporting an error for such cases.

We are not sure how often this would happen; let's revisit it later.
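
Neither option was adopted in this PR, but as an illustration of the idea being weighed, here is a standalone sketch of wrapping a getter so that failed lookups are also remembered for one TTL (the function and its names are hypothetical; it reuses the sync and time imports from the earlier sketches).

```go
// negativeCachingGetter decorates a getter so that an error result is also
// remembered for ttl, ensuring at most one ARM call per key per TTL period
// even for resources that do not exist yet.
func negativeCachingGetter(ttl time.Duration, getter func(string) (interface{}, error)) func(string) (interface{}, error) {
	type result struct {
		data    interface{}
		err     error
		fetched time.Time
	}
	var mu sync.Mutex
	results := map[string]*result{}

	return func(key string) (interface{}, error) {
		mu.Lock()
		defer mu.Unlock()
		if r, ok := results[key]; ok && time.Since(r.fetched) < ttl {
			// Replay the cached result, including a cached error.
			return r.data, r.err
		}
		data, err := getter(key)
		results[key] = &result{data: data, err: err, fetched: time.Now()}
		return data, err
	}
}
```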

@k8s-ci-robot (Contributor) commented Feb 11, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feiskyer, karataliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot (Contributor) commented Feb 11, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot (Contributor) commented Feb 11, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot merged commit f0e573d into kubernetes:master on Feb 11, 2018

12 of 13 checks passed

Submit Queue: Required Github CI test is not green: pull-kubernetes-e2e-gce
cla/linuxfoundation: feiskyer authorized
pull-kubernetes-bazel-build: Job succeeded
pull-kubernetes-bazel-test: Job succeeded
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded
pull-kubernetes-kubemark-e2e-gce: Job succeeded
pull-kubernetes-node-e2e: Job succeeded
pull-kubernetes-unit: Job succeeded
pull-kubernetes-verify: Job succeeded

@feiskyer deleted the feiskyer:new-cache branch on Feb 11, 2018

k8s-github-robot pushed a commit that referenced this pull request Feb 12, 2018

Kubernetes Submit Queue
Merge pull request #59652 from feiskyer/vmss-cache
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add generic cache for Azure VMSS

**What this PR does / why we need it**:

This PR adds a generic cache for VMSS and removes old list-based cache.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

Continuation of #58770.

**Special notes for your reviewer**:

Depends on #59520.

**Release note**:

```release-note
Add generic cache for Azure VMSS
```

@khenidak referenced this pull request on Feb 13, 2018: Relevant PRs/Issues #1 (Open)
