etcd-snapshot save without local state file waits for state file to get copied #2492

Closed
superseb opened this issue Mar 25, 2021 · 3 comments

Comments

@superseb
Contributor

superseb commented Mar 25, 2021

RKE version:
v1.2.6

Steps to Reproduce:

  1. Create cluster.yml
  2. Run rke up
  3. Remove cluster.rkestate
  4. Run rke etcd-snapshot save

Results:
The process tries to copy the state file and waits for it to be copied (around 20-30 seconds). It still creates the snapshot successfully, but waiting for the state file is unnecessary.

Without a local state file, we should probably error out to indicate that the state file is missing, add a flag to override this for the case where only the etcd snapshot is needed, and skip the copy-state step to save time (see the sketch below).

gz#14991
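
A minimal sketch of the proposed flow, assuming hypothetical function and flag names (this is not the actual RKE code):

```go
// Hypothetical sketch of the proposed snapshot-save behavior; all names
// and signatures are illustrative, not the actual RKE implementation.
package main

import (
	"fmt"
	"os"
)

const stateFilePath = "./cluster.rkestate"

// saveSnapshot fails fast when the local state file is missing, unless
// ignoreMissingState (the proposed override flag) is set, in which case
// the state-file deploy step is skipped entirely.
func saveSnapshot(ignoreMissingState bool) error {
	if _, err := os.Stat(stateFilePath); os.IsNotExist(err) {
		if !ignoreMissingState {
			// Error out instead of waiting ~30s for a copy that can never succeed.
			return fmt.Errorf("cluster state file [%s] not found; use the override flag to snapshot without it", stateFilePath)
		}
		fmt.Printf("WARN state file [%s] missing, skipping state file deploy\n", stateFilePath)
		return runEtcdSnapshot() // snapshot only, no cluster-state-deployer step
	}
	if err := deployStateFile(); err != nil {
		return err
	}
	return runEtcdSnapshot()
}

// Stand-ins for the real deploy and snapshot steps.
func deployStateFile() error { fmt.Println("deploying state file"); return nil }
func runEtcdSnapshot() error { fmt.Println("saving etcd snapshot"); return nil }

func main() {
	if err := saveSnapshot(true); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```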

@superseb
Contributor Author

This needs to be tested to confirm that, without a state file, the logging about waiting for the state file to be copied is no longer seen.

A regular snapshot-save with the state file present should still work as normal.
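
A rough sketch of the kind of assertion such a test could make, with a stubbed runner standing in for the real command (hypothetical names; not RKE's actual test suite):

```go
// Hypothetical test sketch: run snapshot-save without a state file and
// assert that the "Waiting for file" log line never appears. The stub
// runSnapshotSave stands in for invoking `rke etcd snapshot-save` and
// capturing its log output; it is not part of RKE.
package main

import (
	"strings"
	"testing"
)

func runSnapshotSave(stateFilePresent bool) string {
	// In a real test this would execute the CLI and return its logs.
	if stateFilePresent {
		return "Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied"
	}
	return "Snapshot will be created without cluster state file"
}

func TestSnapshotSaveWithoutStateFile(t *testing.T) {
	if out := runSnapshotSave(false); strings.Contains(out, "Waiting for file") {
		t.Fatalf("snapshot-save should not wait for the state file copy, got: %q", out)
	}
}
```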

@anupama2501

Reproduced on RKE version v1.2.6:

  1. On an AWS EC2 instance, install Docker.
  2. Create a cluster.yml; in the YAML file, enter the IP address and internal address of the instance.
  3. Run rke up and, once the process finishes successfully, run rke etcd snapshot-save:
WARN[0000] Name of the snapshot is not specified using [rke_etcd_snapshot_2021-05-28T08:17:05-07:00]
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host []
INFO[0000] Removing container [cluster-state-deployer] on host [], try #1
INFO[0001] [remove/cluster-state-deployer] Successfully removed container on host []
INFO[0001] [state] Deploying state file to [/etc/kubernetes/rke_etcd_snapshot_2021-05-28T08:17:05-07:00.rkestate] on host []
INFO[0001] Image [rancher/rke-tools:v0.1.72] exists on host []
INFO[0002] Starting container [cluster-state-deployer] on host [], try #1
INFO[0002] [state] Successfully started [cluster-state-deployer] container on host []
INFO[0003] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0003] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 1
]
INFO[0004] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0004] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 1
]
INFO[0005] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0005] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 1
]
INFO[0006] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0006] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 1
]
INFO[0007] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0008] Removing container [cluster-state-deployer] on host [], try #1
INFO[0008] [remove/cluster-state-deployer] Successfully removed container on host []
INFO[0008] [etcd] Running snapshot save once on host []
INFO[0008] Image [rancher/rke-tools:v0.1.72] exists on host []
INFO[0009] Starting container [etcd-snapshot-once] on host [], try #1
INFO[0009] [etcd] Successfully started [etcd-snapshot-once] container on host []
INFO[0009] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0009] Container [etcd-snapshot-once] is still running on host []: stderr: [time="2021-05-28T15:17:15Z" level=info msg="Initializing Onetime Backup" name="rke_etcd_snapshot_2021-05-28T08:17:05-07:00"
], stdout: []
INFO[0010] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0011] Removing container [etcd-snapshot-once] on host [], try #1
INFO[0011] Finished saving/uploading snapshot [rke_etcd_snapshot_2021-05-28T08:17:05-07:00] on all etcd hosts

Then removed cluster.rkestate and ran rke etcd snapshot-save again. The following logs are seen:

INFO[0048] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0048] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 10]
INFO[0049] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0049] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 10]
INFO[0050] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0050] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 10
]
INFO[0051] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0051] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 10
]
INFO[0052] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0052] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 11
]
INFO[0053] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0053] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 11
]
INFO[0054] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0054] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 11
]
INFO[0055] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0055] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 11
]
INFO[0056] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0057] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 11
]
INFO[0058] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0058] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 12
]
INFO[0059] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0059] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 12
]
INFO[0060] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0060] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 12
]
INFO[0061] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0061] Container [cluster-state-deployer] is still running on host []: stderr: [], stdout: [Waiting for file [/etc/kubernetes/cluster.rkestate] to be successfully copied to this container, retry count 12
]
INFO[0062] Waiting for [cluster-state-deployer] container to exit on host []
INFO[0062] Removing container [cluster-state-deployer] on host [], try #1
INFO[0062] [remove/cluster-state-deployer] Successfully removed container on host []
INFO[0062] [etcd] Running snapshot save once on host []
INFO[0062] Image [rancher/rke-tools:v0.1.72] exists on host []
INFO[0062] Starting container [etcd-snapshot-once] on host [], try #1
INFO[0063] [etcd] Successfully started [etcd-snapshot-once] container on host []
INFO[0063] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0063] Container [etcd-snapshot-once] is still running on host []: stderr: [time="2021-05-28T15:21:18Z" level=info msg="Initializing Onetime Backup" name="rke_etcd_snapshot_2021-05-28T08:20:15-07:00"
], stdout: []
INFO[0064] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0064] Removing container [etcd-snapshot-once] on host [], try #1
INFO[0064] Finished saving/uploading snapshot [rke_etcd_snapshot_2021-05-28T08:20:15-07:00] on all etcd hosts
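
For context, the repeated "retry count" lines above come from the cluster-state-deployer container polling for the state file until a retry limit is exhausted. A minimal illustration of that kind of bounded wait (an assumption based on the log output, not the actual rke-tools source):

```go
// Illustrative bounded poll for a file, mirroring the "retry count"
// behavior visible in the logs above; this is an assumption based on
// the log output, not the actual rke-tools implementation.
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForFile polls for path until it exists or maxRetries is exhausted.
func waitForFile(path string, maxRetries int, interval time.Duration) bool {
	for retry := 1; retry <= maxRetries; retry++ {
		if _, err := os.Stat(path); err == nil {
			return true
		}
		fmt.Printf("Waiting for file [%s] to be successfully copied to this container, retry count %d\n", path, retry)
		time.Sleep(interval)
	}
	return false
}

func main() {
	// With the state file absent, the full maxRetries * interval budget
	// is spent before giving up, which is the delay reported in this issue.
	if !waitForFile("/etc/kubernetes/cluster.rkestate", 12, 5*time.Second) {
		os.Exit(1)
	}
}
```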

@anupama2501

Verified on RKE version v1.3.0-rc2:

  • On an AWS EC2 instance, install Docker.
  • Create a cluster.yml; in the YAML file, enter the IP address and internal address of the instance.
  • Run rke up and, once the process finishes successfully, run rke etcd snapshot-save.
  • Then remove cluster.rkestate and run rke etcd snapshot-save again.
    Logs seen in the console:
INFO[0000] Running RKE version: v1.3.0-rc2
**WARN[0000] Name of the snapshot is not specified, using [rke_etcd_snapshot_2021-05-28T08:41:11-07:00]**
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host []
**WARN[0001] Could not read cluster state file from [./cluster.rkestate], file does not exist. Snapshot will be created without cluster state file. You can retrieve the cluster state file using 'rke util get-state-file'**
INFO[0001] [etcd] Running snapshot save once on host []
INFO[0001] Image [rancher/rke-tools:v0.1.74] exists on host []
INFO[0001] Starting container [etcd-snapshot-once] on host [], try #1
INFO[0002] [etcd] Successfully started [etcd-snapshot-once] container on host []
INFO[0002] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0002] Container [etcd-snapshot-once] is still running on host []: stderr: [time="2021-05-28T15:41:13Z" level=info msg="Initializing Onetime Backup" name="rke_etcd_snapshot_2021-05-28T08:41:11-07:00"
], stdout: []
INFO[0003] Waiting for [etcd-snapshot-once] container to exit on host []
INFO[0003] Removing container [etcd-snapshot-once] on host [], try #1
INFO[0003] Finished saving/uploading snapshot [rke_etcd_snapshot_2021-05-28T08:41:11-07:00] on all etcd hosts
