
Give the “Compactor” work to etcd #46705

Closed
oikomi opened this issue May 31, 2017 · 35 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@oikomi

oikomi commented May 31, 2017

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): no

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): etcd compact


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):
[ ~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
    [~]$ cat /etc/os-release
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

What happened:
[~]$ tail -f /apps/logs/kubernetes/kube-apiserver.log
E0531 11:49:20.139129 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 12:09:22.391360 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 12:29:24.651345 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 12:39:25.717216 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 13:39:32.418625 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 13:59:04.624372 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 16:49:23.451694 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 17:39:29.085406 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 18:09:32.358578 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted
E0531 18:59:07.953836 2949 watcher.go:188] watch chan error: etcdserver: mvcc: required revision has been compacted

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:
We should not do the compaction in k8s; just let etcd do it itself via '--auto-compaction-retention'.
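
For reference, the etcd-side alternative being suggested looks something like this (in etcd 3.1/3.2 the retention window can only be given in whole hours; finer granularity comes later, see the discussion below):

    # keep roughly one hour of MVCC history; etcd compacts older revisions on its own
    etcd --auto-compaction-retention=1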

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@0xmichalis
Contributor

@kubernetes/sig-api-machinery-misc

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 2, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 2, 2017
@smarterclayton
Contributor

The etcd folks were the ones who set up this compaction in Kubernetes when we moved to v3. @xiang90 @hongchaodeng

@hongchaodeng
Contributor

Hi @oikomi
What's the etcd version? How did you find this? Is it reproducible?

@hongchaodeng
Contributor

hongchaodeng commented Jun 3, 2017

Some context for the problem, from the 1.6.0 codebase:

It first does a get and sets initialRev:

wc.initialRev = getResp.Header.Revision

Then it watches from initialRev + 1:

	opts := []clientv3.OpOption{clientv3.WithRev(wc.initialRev + 1), clientv3.WithPrevKV()}
	...
	wch := wc.watcher.client.Watch(wc.ctx, wc.key, opts...)
	for wres := range wch {
		if wres.Err() != nil {
			err := wres.Err()
			// If there is an error on server (e.g. compaction), the channel will return it before closed.
			glog.Errorf("watch chan error: %v", err)

From the watch API's point of view, it can return a "revision compacted" error.

Some questions:

  • If compaction happened, will a running watcher be cut off? (No, it won't.)
  • Is the watcher supposed to never stop? It seems to create a new watcher quite frequently.

@hongchaodeng
Contributor

Note: the etcd client dependency of v1.6.0 is "v3.0.17"

@hongchaodeng
Contributor

hongchaodeng commented Jun 3, 2017

So my guess is that, between sync() and Watch(), it took too long to process the events from sync().

We really need reproduction steps.

@oikomi
Author

oikomi commented Jun 5, 2017

@hongchaodeng Hi, my etcd version is:

[~]$ etcd --version
etcd Version: 3.1.4
Git SHA: 41e52eb
Go Version: go1.7.5
Go OS/Arch: linux/amd64

We found it in our production environment; we are still looking into the cause as well.

We suggest providing an option in k8s to enable compaction and to set the compaction interval.

@xiang90
Contributor

xiang90 commented Jun 5, 2017

@oikomi

I do not think this issue has anything to do with who controls the compaction interval. Even if you let etcd itself do the compaction, you will hit the same issue.

The warning means that your watcher tried to watch from a compacted revision for some reason; it probably should not be marked as an error. The controller should do a relist and resync from the current revision.

Also, looking through the log, everything seems fine. Probably the watcher just does not get any update for 10 minutes, and when it gets cut off it tries to watch from an old revision. We should probably enable progress notifications (a new feature in etcd3) to update the watcher's revision even when there are no new events for that watcher.

/cc @wojtek-t might know better than me about this.
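
For illustration, a minimal sketch of a client-side watch with progress notifications enabled (illustrative only, against the clientv3 API of that era; this is not the actual apiserver watcher code):

package etcdwatch

import (
	"context"
	"fmt"

	"github.com/coreos/etcd/clientv3"
)

// watchWithProgress watches key from rev+1 with progress notifications
// enabled, so the client learns the server's current revision even when
// no events arrive for its key, and a later restart does not begin from
// an already-compacted revision.
func watchWithProgress(ctx context.Context, c *clientv3.Client, key string, rev int64) {
	wch := c.Watch(ctx, key,
		clientv3.WithRev(rev+1),
		clientv3.WithProgressNotify(), // periodic empty responses carrying the latest revision
	)
	for wres := range wch {
		if wres.IsProgressNotify() {
			rev = wres.Header.Revision // advance even though nothing changed for this key
			continue
		}
		for _, ev := range wres.Events {
			fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
			rev = ev.Kv.ModRevision
		}
	}
}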

@wojtek-t
Member

wojtek-t commented Jun 5, 2017

Watches are generally served from the apiserver, and we keep per-resource-type history cached in the apiserver (in cacher.go). That said, assuming you are not starting the watch from a very old resource version, this should only very rarely affect the end client (though those logs are visible in the apiserver).

That said, I think we can try improving it. Basically, when we do compaction today, we compact everything up until now:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/compact.go#L132
@xiang90 - would it be reasonable to e.g. compact to revision "now - 1000" or something like that?

@xiang90
Contributor

xiang90 commented Jun 5, 2017

@wojtek-t

would it be reasonable to e.g. compact to revision "now - 1000" or something like that?

It is reasonable (this is what we do internally and what we suggest for other projects). We can do this in a time-based or a number-based manner.

/cc @hongchaodeng can you take a look? or i can take a look later this week.

@hongchaodeng
Contributor

Some tradeoffs:

  • Time-based (the current option) might have the issue that there are no changes at all (really?) within the 5 minutes.
  • Number-based might have the issue that there are too many changes (unlikely with 1000?) in a short period.

It depends on the use case. I'm not sure which is better, so it is up to you @wojtek-t.

I still wonder why the watcher connection gets cut off. Because of a resync? If it didn't cut off (kept watching), then it should be fine. Another solution is to use progress notifications.

/cc @hongchaodeng can you take a look? or i can take a look later this week.

@xiang90 I will let you take a look first. If you don't have time, I can take a look later.

@wojtek-t
Member

wojtek-t commented Jun 5, 2017

@hongchaodeng - I think either I misunderstood you or you misunderstood me.
The option that I'm suggesting is to:

  • still do compaction every X minutes (as we do now)
  • but instead of compacting to the "current resource version", compact to something like "current resource version - 1000".

@hongchaodeng
Contributor

but instead of compacting to "current resource version"

The current code compacts to the revision from X minutes ago, not to the current revision. Right?

@wojtek-t
Member

wojtek-t commented Jun 5, 2017

The current code compacts to the revision from X minutes ago, not to the current revision. Right?

Ohh, if so, then I guess I misunderstood the code. If that's the case, I'm fine with that.

@xiang90
Contributor

xiang90 commented Jun 5, 2017

@wojtek-t

Yeah. If we already use a time-based approach, then it should be fine. I feel time-based makes more sense, since the main use case is to handle temporary reconnections, which should happen within minutes; beyond that, a full recovery should be triggered anyway.
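
For illustration, the time-lagged scheme described above boils down to something like this (a simplified sketch; the real compact.go additionally coordinates multiple apiservers through a shared etcd key, and the probe key name here is made up):

package compactor

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// compactLoop compacts etcd every interval, but only up to the revision
// observed at the previous tick, so watchers always have at least one
// interval's worth of history available when they reconnect.
func compactLoop(ctx context.Context, c *clientv3.Client, interval time.Duration) {
	var prevRev int64
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		// Any cheap read reports the cluster's current revision in its header.
		resp, err := c.Get(ctx, "compact-probe") // hypothetical key
		if err != nil {
			continue
		}
		if prevRev > 0 {
			// Compact to the previous tick's revision, not the current one.
			if _, err := c.Compact(ctx, prevRev); err != nil {
				log.Printf("compact to %d failed (possibly already compacted elsewhere): %v", prevRev, err)
			}
		}
		prevRev = resp.Header.Revision
	}
}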

@wojtek-t
Member

wojtek-t commented Jun 5, 2017

Yeah. If we already use a time-based approach, then it should be fine. I feel time-based makes more sense, since the main use case is to handle temporary reconnections, which should happen within minutes; beyond that, a full recovery should be triggered anyway.

+1

@javipolo

javipolo commented Jun 7, 2017

This happened to me when I was updating some flags on etcd. I restarted all the etcd servers with the new flags, the kube-apiservers all got stuck with this message in the error log, and requests to the API were timing out.

The etcd servers were fine (I checked with etcdctl cluster-health and member list), and just restarting the apiservers made the problem go away...

This has happened twice in a row, so I guess it should not be hard to reproduce.

versions:
etcd 3.1.8
kube-apiserver 1.6.2

@xiang90
Contributor

xiang90 commented Jun 7, 2017

@javipolo Can you create a new issue for what you saw? I feel it may be a different thing. Also, it would be helpful if you could write down how to reproduce it in detail. Thanks.

@xiang90
Contributor

xiang90 commented Jun 7, 2017

@wojtek-t @hongchaodeng I feel we should still keep compaction inside k8s for the time being. I cannot see a strong reason to move it out.

The actionable item in this issue is probably to change the Error-level logging to Warning (or even hide the message when it is expected).

@tmjd
Contributor

tmjd commented Jul 6, 2017

One reason to remove compaction from k8s is that it requires giving k8s the root role in etcd when using RBAC. I have been setting up etcd RBAC (with certificates) and was seeing a compaction error; when I asked on the etcd IRC, I was told that the root role is necessary for compaction to work.

I'm not sure I would advocate removing it altogether, but it would be nice to have a flag to disable compaction, with the caveat that compaction then needs to be done by some other means.
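
For context, the kind of scoped setup in question looks roughly like this (role and user names are hypothetical):

    # grant the apiserver's etcd user access only to the kubernetes prefix
    ETCDCTL_API=3 etcdctl role add k8s-apiserver
    ETCDCTL_API=3 etcdctl role grant-permission --prefix=true k8s-apiserver readwrite /registry/
    ETCDCTL_API=3 etcdctl user grant-role kube-apiserver k8s-apiserver

Compact, however, is not scoped to a key range, which appears to be why (per the above) the apiserver's user ends up needing the root role as long as it performs compaction itself.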

@smarterclayton
Contributor

If we enable paging in the apiserver, we may want to increase the compaction interval on large clusters. For instance, on a cluster with 1 million keys, a JSON -> protobuf storage migration took longer than 5 minutes when tested with the paging prototype. I also suspect people will want some ranges to be retained longer, so that other tooling can watch that data and record it to historical archives. We're seeing a fair number of people who want to control compaction according to when their archiving tools complete a record, not when k8s decides.

@mml
Contributor

mml commented Aug 7, 2017

cc @jpbetz

@waeljammal

waeljammal commented Aug 25, 2017

Seeing this issue as well with 1.7.3 & etcd 3.1.8.

It also looks like the revision is not updated after a failure: I see requests go out for a revision that fails because it has been compacted, but a short while later the same revision is requested again, with the same error, instead of using the latest revision.
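
For reference, the usual client-side recovery is to pick the compaction boundary out of the failed response and resume from there (or re-list), rather than retrying the stale revision. A minimal sketch against the clientv3 API (illustrative only; the apiserver's actual recovery path re-lists instead):

package etcdwatch

import (
	"context"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"
)

// watchFrom watches key starting at rev; on a compaction error it resumes
// from the compacted revision instead of retrying the stale one.
func watchFrom(ctx context.Context, c *clientv3.Client, key string, rev int64, out chan<- *clientv3.Event) {
	defer close(out)
	for ctx.Err() == nil {
		wch := c.Watch(ctx, key, clientv3.WithRev(rev))
		for wres := range wch {
			if err := wres.Err(); err == rpctypes.ErrCompacted {
				// The requested revision is gone; jump to the compaction
				// boundary (a real client would also re-list to fill the gap).
				rev = wres.CompactRevision
				break
			} else if err != nil {
				return
			}
			for _, ev := range wres.Events {
				out <- ev
				rev = ev.Kv.ModRevision + 1 // resume after the last event seen
			}
		}
	}
}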

@jpbetz
Contributor

jpbetz commented Sep 27, 2017

Etcd is adding support for fine-grained compaction periods, using Go Durations, in etcd-io/etcd#8563. We need to wait until that is available, since without it only whole hours can be specified. It will likely be available in etcd 3.3.
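
Once that is in, finer-grained retention should be expressible along these lines (flag shapes as proposed for etcd 3.3; treat this as a sketch until the release):

    # keep roughly 10 minutes of MVCC history (proposed etcd 3.3 syntax)
    etcd --auto-compaction-mode=periodic --auto-compaction-retention=10m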

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 17, 2018
@redbaron
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 18, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2018
@redbaron
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2018
@redbaron
Contributor

etcd 3.3 has been available for a while now

/remove-lifecycle stale

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 18, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
