apiserver: start only one compactor per unique storagebackend transport config #68557

Merged: 2 commits merged into kubernetes:master on Jan 3, 2019

Conversation

sttts (Contributor) commented Sep 12, 2018

Before this PR we were running one compactor per storage instance (this includes one per registry instance and one per CRD, plus another one on each CRD change!).

Run one etcd storage compaction per default interval of 5 min. Do not run one for each resource and each CRD. This fixes the compaction log spam and reduces load on etcd.

k8s-ci-robot (Contributor) commented Sep 12, 2018

@sttts: GitHub didn't allow me to request PR reviews from the following users: joelegasse.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @joelegasse

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sttts sttts force-pushed the sttts:sttts-storage-compaction-once branch from 4a03b39 to 32490cc Sep 12, 2018

k8s-ci-robot (Contributor) commented Sep 12, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sttts (Contributor) commented Sep 12, 2018

@joelegasse does this sound reasonable, to start only one compactor? What is the load today with the 100-ish compactors?

@@ -293,8 +293,8 @@ func CreateKubeAPIServerConfig(
 	return
 }
-if _, port, err := net.SplitHostPort(s.Etcd.StorageConfig.ServerList[0]); err == nil && port != "0" && len(port) != 0 {
+if _, port, err := net.SplitHostPort(s.Etcd.StorageConfig.Transport.ServerList[0]); err == nil && port != "0" && len(port) != 0 {
 	if err := utilwait.PollImmediate(etcdRetryInterval, etcdRetryLimit*etcdRetryInterval, preflight.EtcdConnection{ServerList: s.Etcd.StorageConfig.ServerList}.CheckEtcdServers); err != nil {

yue9944882 (Member) commented Sep 12, 2018

NPE here?

sttts (Contributor) commented Jan 2, 2019

Why an NPE? The server list length is validated in the options to be non-zero.

yue9944882 (Member) commented Jan 3, 2019

mmm.. yeah, no chance of an NPE here

lock.Lock()
defer lock.Unlock()

key := fmt.Sprintf("%v", c)

yue9944882 (Member) commented Sep 12, 2018

But it looks slightly odd to me. Would key := fmt.Sprintf("%v", c.ServerList) be cleaner?

sttts (Contributor) commented Jan 2, 2019

No, I want the keys to be part of this, as the etcd client depends on those too.

Added a comment.

if compactors[key].refs == 0 {
	compactor.cancel()
	compactor.client.Close()
	delete(compactors, key)

yue9944882 (Member) commented Sep 12, 2018

Deleting by key here looks dangerous. We may delete the wrong storage in a race by mistake, right? Deleting by pointer equality would be safer.

sttts (Contributor) commented Jan 2, 2019

Not sure I can follow. The compactor is not per storage, but per transport config. Moreover, we lock before reading or writing to compactors.

compactors = map[string]*runningCompactor{}
)

// startCompactorOnce starts one compactor per transport. If the interval gets smaller on repeated calls, the

yue9944882 (Member) commented Sep 12, 2018

If the interval gets smaller on repeated calls

Do we have such a use case? 🤔

sttts (Contributor) commented Jan 2, 2019

Our storage API allows different intervals, so we should support that here in the backend as well.

liggitt (Member) commented Sep 12, 2018

/assign @jpbetz
/unassign

lavalamp (Member) commented Sep 14, 2018

We probably also shouldn't run one per apiserver in an HA setup...

Better yet, can we just delete this now? #46705 (comment)

@sttts sttts force-pushed the sttts:sttts-storage-compaction-once branch from 32490cc to 0d5968f Nov 23, 2018

@sttts sttts force-pushed the sttts:sttts-storage-compaction-once branch from 0d5968f to e133576 Nov 23, 2018

sttts (Contributor) commented Jan 2, 2019

@hexfusion could you take a look?

@sttts sttts force-pushed the sttts:sttts-storage-compaction-once branch from e133576 to 25bc163 Jan 2, 2019

sttts (Contributor) commented Jan 2, 2019

@yue9944882 updated. PTAL.

hexfusion commented Jan 2, 2019

Run one etcd storage compaction per default interval of 5min. Do not run one for each resource and each CRD. This fixes the compaction log spam and reduces load on etcd.

@sttts I agree with your premise, lgtm.

yue9944882 (Member) left a comment

Nits and a suspicious nil panic; otherwise lgtm.

// Hence, we only destroy once.
// TODO: fix duplicated storage destroy calls higher level
once.Do(func() {
stopCompactor()

yue9944882 (Member) commented Jan 3, 2019

compactorClient, err := newETCD3Client(c)
if err != nil {
	return nil, err
}

If startCompactorOnce quits here with an error, calling stopCompactor() will cause a nil panic, because in

compactor := compactors[key]
compactor.refs--

the compactor will be nil; we need to check key existence here. And sync.Once could be removed, since calling destroyFunc multiple times is now safe with a key-existence check. We can keep the comments documenting that the destroyFunc might also be called by subresource storage. wdyt? @sttts

sttts (Contributor) commented Jan 3, 2019

If startCompactorOnce quits here with an error, calling stopCompactor() will cause a nil panic

In that case newETCD3Storage also returns an error, so there is no destroy func returned to be called. What do I miss?

We can't remove the sync.Once because the ref-counting logic expects exactly one stopCompactor call per startCompactorOnce. For historic wiring reasons we don't know how often the destroyFunc is called (storage objects are reused, leading to multiple calls; bad design from long ago that needs a major cleanup).


	etcd3.StartCompactor(ctx, compactorClient, interval)
} else {
	compactor.refs++

yue9944882 (Member) commented Jan 3, 2019

Nit: move compactor.refs++ out of the if-else block.

sttts (Contributor) commented Jan 3, 2019

👍

sttts (Contributor) commented Jan 3, 2019

fixed

@sttts sttts force-pushed the sttts:sttts-storage-compaction-once branch from 25bc163 to 00a717b Jan 3, 2019

yue9944882 (Member) commented Jan 3, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 3, 2019

sttts (Contributor) commented Jan 3, 2019

/retest

fejta-bot commented Jan 3, 2019

/retest

This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 73bca32 into kubernetes:master Jan 3, 2019

19 checks passed

cla/linuxfoundation: sttts authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-e2e-kubeadm-gce: Skipped
pull-kubernetes-godeps: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
tide: In merge pool.