Cilium-dedicated etcd is reaching space quota limit without AUTO_COMPACTION #10663

Closed
luanguimaraesla opened this issue Jan 26, 2021 · 2 comments · Fixed by #11961
@luanguimaraesla

1. What kops version are you running? The command kops version will display this information.

1.18.2 (git-84495481e4)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"archive", BuildDate:"2020-11-25T13:19:56Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:02:35Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Create a new cluster with a Cilium-dedicated etcd cluster (managed by etcd-manager) and wait for a few days. A sketch of the relevant spec is shown below.
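For reference, the dedicated cluster is declared as an extra etcdClusters entry and switched on under the Cilium networking settings. A minimal sketch of the relevant cluster.yaml fragments (the instance group name is a placeholder, and the field names are taken from the kops Cilium docs):

# hypothetical cluster.yaml fragments enabling a cilium-dedicated etcd
spec:
  etcdClusters:
  - name: cilium
    version: 3.3.10
    etcdMembers:
    - instanceGroup: master-us-east-1a   # placeholder instance group
      name: a
  networking:
    cilium:
      etcdManaged: true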

5. What happened after the commands executed?

Without the AUTO_COMPACTION options configured, after a few days the etcd cluster reaches its space quota limit of 2 GB, and Cilium stops working with the following message:

# snippet of cilium pod log
level=fatal msg="Unable to connect to kvstore" error="etcdserver: mvcc: database space exceeded" module=etcd subsys=kvstore

The etcd cluster reports the following alert:

memberID:2157140721128943973 alarm:NOSPACE 
memberID:17459609605570688463 alarm:NOSPACE
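Until compaction is in place, recovery follows the standard etcd NOSPACE procedure: compact up to the current revision, defragment each member, then clear the alarm. A sketch with etcdctl (endpoint and TLS flags are omitted and would need to match the cilium etcd members):

# hypothetical manual recovery against the cilium etcd endpoints
export ETCDCTL_API=3
rev=$(etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
etcdctl compact "$rev"
etcdctl defrag        # run once per member/endpoint
etcdctl alarm disarm
etcdctl alarm list    # should print nothing once the alarm is cleared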

6. What did you expect to happen?

If I edit my cluster and add the following environment variables to the manager configuration, it starts working as expected:

# snippet of the cluster.yaml spec for the etcdClusters item
  - name: cilium
    version: 3.3.10
    manager:
      env:
      - name: ETCD_AUTO_COMPACTION_MODE
        value: revision
      - name: ETCD_AUTO_COMPACTION_RETENTION
        value: "1000"

Then I could see this cluster reporting:

2021-01-26 19:40:54.223524 I | pkg/flags: recognized and used environment variable ETCD_AUTO_COMPACTION_MODE=revision
2021-01-26 19:40:54.223530 I | pkg/flags: recognized and used environment variable ETCD_AUTO_COMPACTION_RETENTION=1000
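Applying this workaround is an ordinary spec change; assuming the usual kops workflow, something like the following (the control-plane nodes have to roll so etcd-manager picks up the new env vars):

# hypothetical workflow to apply the manager env change
kops edit cluster $CLUSTER_NAME          # add the manager.env entries above
kops update cluster $CLUSTER_NAME --yes
kops rolling-update cluster $CLUSTER_NAME --yes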

This should probably be the default for all etcd clusters; at the very least, kops should have a section about this in the documentation, especially for the Cilium configuration.

@olemarkus
Member

/kind office-hours

Should this be handled on the etcd-manager side?
Also worth considering whether etcd-manager should run defrag from time to time.
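Worth noting that compaction only reclaims logical space inside the keyspace; getting the disk space back requires a defrag. If etcd-manager did this periodically, it would amount to running the equivalent of the following on each member (a sketch, not an existing etcd-manager feature; the endpoint is an assumption based on the port kops uses for the cilium etcd):

# hypothetical periodic defrag, e.g. from a timer on each control-plane node
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:4003 defrag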

@justinsb justinsb self-assigned this Apr 9, 2021
@justinsb justinsb added this to the v1.21 milestone Apr 9, 2021
@johngmyers johngmyers removed this from the v1.21 milestone Jun 10, 2021
@olemarkus
Member

The k8s API server does compaction for the other etcd clusters. Will enable auto-compaction on the cilium etcd cluster for new kops clusters.

/assign
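For context, the API server's compaction of the main and events clusters is driven by its --etcd-compaction-interval flag (default 5m0s); nothing issues compactions against the cilium cluster, so etcd's own auto-compaction has to carry that load. The env vars above map directly onto etcd flags, so an equivalent fix at the etcd command line would be (a sketch):

# equivalent etcd flags to the ETCD_AUTO_COMPACTION_* env vars above
etcd --auto-compaction-mode=revision --auto-compaction-retention=1000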
