[BUG] Upgrade stuck on post-drain job with the unresponsive VM #4095
Comments
Analysis of the issue:
We saw the plan secret change; the old and new plans are attached:
--- old-plan 2023-06-20 14:02:28.012183902 +0800
+++ new-plan 2023-06-20 14:02:24.608116748 +0800
@@ -1,7 +1,10 @@
{
"files": [
{
- "content": "ewogICJhZ2VudC10b2tlbiI6ICJhYSBiYiBjYyIsCiAgImNuaSI6ICJjYWxpY28iLAogICJrdWJlLWNvbnRyb2xsZXItbWFuYWdlci1hcmciOiBbCiAgICAiY2VydC1kaXI9L3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1jb250cm9sbGVyLW1hbmFnZXIiLAogICAgInNlY3VyZS1wb3J0PTEwMjU3IgogIF0sCiAgImt1YmUtY29udHJvbGxlci1tYW5hZ2VyLWV4dHJhLW1vdW50IjogWwogICAgIi92YXIvbGliL3JhbmNoZXIvcmtlMi9zZXJ2ZXIvdGxzL2t1YmUtY29udHJvbGxlci1tYW5hZ2VyOi92YXIvbGliL3JhbmNoZXIvcmtlMi9zZXJ2ZXIvdGxzL2t1YmUtY29udHJvbGxlci1tYW5hZ2VyIgogIF0sCiAgImt1YmUtc2NoZWR1bGVyLWFyZyI6IFsKICAgICJjZXJ0LWRpcj0vdmFyL2xpYi9yYW5jaGVyL3JrZTIvc2VydmVyL3Rscy9rdWJlLXNjaGVkdWxlciIsCiAgICAic2VjdXJlLXBvcnQ9MTAyNTkiCiAgXSwKICAia3ViZS1zY2hlZHVsZXItZXh0cmEtbW91bnQiOiBbCiAgICAiL3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1zY2hlZHVsZXI6L3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1zY2hlZHVsZXIiCiAgXSwKICAibm9kZS1sYWJlbCI6IFsKICAgICJya2UuY2F0dGxlLmlvL21hY2hpbmU9Zjg0MjJhYjktN2Q2NS00MDk5LWE1ZjYtMGFjOTQ0OWQ2OTc3IgogIF0sCiAgInRva2VuIjogImFhIGJiIGNjIgp9",
+ "path": "/etc/rancher/rke2/registries.yaml"
+ },
+ {
+ "content": "ewogICJhZ2VudC10b2tlbiI6ICJhYSBiYiBjYyIsCiAgImNuaSI6ICJjYWxpY28iLAogICJrdWJlLWNvbnRyb2xsZXItbWFuYWdlci1hcmciOiBbCiAgICAiY2VydC1kaXI9L3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1jb250cm9sbGVyLW1hbmFnZXIiLAogICAgInNlY3VyZS1wb3J0PTEwMjU3IgogIF0sCiAgImt1YmUtY29udHJvbGxlci1tYW5hZ2VyLWV4dHJhLW1vdW50IjogWwogICAgIi92YXIvbGliL3JhbmNoZXIvcmtlMi9zZXJ2ZXIvdGxzL2t1YmUtY29udHJvbGxlci1tYW5hZ2VyOi92YXIvbGliL3JhbmNoZXIvcmtlMi9zZXJ2ZXIvdGxzL2t1YmUtY29udHJvbGxlci1tYW5hZ2VyIgogIF0sCiAgImt1YmUtc2NoZWR1bGVyLWFyZyI6IFsKICAgICJjZXJ0LWRpcj0vdmFyL2xpYi9yYW5jaGVyL3JrZTIvc2VydmVyL3Rscy9rdWJlLXNjaGVkdWxlciIsCiAgICAic2VjdXJlLXBvcnQ9MTAyNTkiCiAgXSwKICAia3ViZS1zY2hlZHVsZXItZXh0cmEtbW91bnQiOiBbCiAgICAiL3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1zY2hlZHVsZXI6L3Zhci9saWIvcmFuY2hlci9ya2UyL3NlcnZlci90bHMva3ViZS1zY2hlZHVsZXIiCiAgXSwKICAibm9kZS1sYWJlbCI6IFsKICAgICJya2UuY2F0dGxlLmlvL21hY2hpbmU9Zjg0MjJhYjktN2Q2NS00MDk5LWE1ZjYtMGFjOTQ0OWQ2OTc3IgogIF0sCiAgInByaXZhdGUtcmVnaXN0cnkiOiAiL2V0Yy9yYW5jaGVyL3JrZTIvcmVnaXN0cmllcy55YW1sIiwKICAidG9rZW4iOiAiYWEgYmIgY2MiCn0=",
"path": "/etc/rancher/rke2/config.yaml.d/50-rancher.yaml"
},
{
@@ -10,7 +13,7 @@
"minor": true
},
{
- "content": "CmFwaVZlcnNpb246IHYxCmtpbmQ6IENvbmZpZ01hcAptZXRhZGF0YToKICBuYW1lOiBya2UyLWV0Y2Qtc25hcHNob3QtZXh0cmEtbWV0YWRhdGEKICBuYW1lc3BhY2U6IGt1YmUtc3lzdGVtCmRhdGE6CiAgcHJvdmlzaW9uaW5nLWNsdXN0ZXItc3BlYzogSDRzSUFBQUFBQUFBLytTU3dXcmpRQXlHMzBYWE5XRWQ5dVRiRW51M3R3WlMwck15Vmh6aHNXUTBtcFFRL080bFR1SkNYNkczbWUvL2tmUUxYYUhQQnpJaHA3UW5TNndDRlp6TDFmclBxaXgvV1U5cks2RUE2Mm1qY3VRT3FpdmtzVE5zYWVlR1R0M2xob0tLbThadFJLSGFrT1YxZEZaSk40MEVENUZhcUk0WUV4VndWQXUwL0xnVE5hcVJCcFVkZVlKS2NveFAzcGlwcGNYY1VpU25aaGo5VXJQVjZQZ2xjYnExYWM0Y2ZFN3g0SjFob0MwWmF3dlY3d0tjQjlMczh6djFQTDRqK3orMWVxNzhkaGQzRkZUYU5IdEd1d2Q2VWUyWDRVWk4vcDFPQlh5bzlXUS9OZjlVUURpaCtSNWpwc1U2WURpeDBQK29CNHpQRzNyc0syckF1SWs1T2RuZjdLZEcybEZaSEtyck5IMENBQUQvL3dFQUFQLy9UdkNKTnBzQ0FBQT0K",
+ "content": "CmFwaVZlcnNpb246IHYxCmtpbmQ6IENvbmZpZ01hcAptZXRhZGF0YToKICBuYW1lOiBya2UyLWV0Y2Qtc25hcHNob3QtZXh0cmEtbWV0YWRhdGEKICBuYW1lc3BhY2U6IGt1YmUtc3lzdGVtCmRhdGE6CiAgcHJvdmlzaW9uaW5nLWNsdXN0ZXItc3BlYzogSDRzSUFBQUFBQUFBLytTU1FXdnpNQXlHLzR1dlh5aGZ5azY1alNiYmJpdDBkR2ZWVVZJVFJUS3kzRkZLLy90SXVtWnNmMkUzKzMxZVpEK2dpeHZ5QVpYUk1PMVJVeEIybFR1VnEvWERxaXovNllCckxWM2hkTUNOY0JkNlYxMWNqcjFDaXp0VE1PelBVK1NGVFlXMkJJeTFRdURYYUVFNFRRd1pEb1N0cXpxZ2hJWHJSRDB1dDlDektOYUFvL0FPTGJtS005RTliMVJGMDFKdWtkQ3dHYU9kNjZBMUdIeWprS1pubWxQd05sdDg1YjJDeHkxcWtOWlYvd3RuWVVUSk5wL1RFT0k3QkhzU3JlZkpiemU0UXkvY3Bya1Q5U2IwSWpJc240dVM3SGQ2TGR5SDZJRDZWLzJ2aGZOSFVOc0RaVnlxSS9oallId21PUURkZCtnSDJvcFFqUjFrbXV3djB4d1NEN1NobkF6MU1kdXg0VFpLWUp2eEp3QUFBUC8vQVFBQS8vK05YOS94dEFJQUFBPT0K",
"path": "/var/lib/rancher/rke2/server/manifests/rancher/rke2-etcd-snapshot-extra-metadata.yaml",
"dynamic": true,
"minor": true
@@ -22,6 +25,12 @@
{
"path": "/var/lib/rancher/rke2/server/manifests/rancher/managed-chart-config.yaml",
"dynamic": true
+ },
+ {
+ "content": "CiMhL2Jpbi9zaAoKY3VycmVudEhhc2g9IiIKa2V5PSQxCnRhcmdldEhhc2g9JDIKaGFzaGVkQ21kPSQzCmNtZD0kNApzaGlmdCA0CgpkYXRhUm9vdD0iL3Zhci9saWIvcmFuY2hlci9jYXByL2lkZW1wb3RlbmNlLyRrZXkvJGhhc2hlZENtZCIKaGFzaEZpbGU9IiRkYXRhUm9vdC9oYXNoIgphdHRlbXB0RmlsZT0iJGRhdGFSb290L2F0dGVtcHQiCgpjdXJyZW50SGFzaD0kKGNhdCAiJGhhc2hGaWxlIiB8fCBlY2hvICIiKQpjdXJyZW50QXR0ZW1wdD0kKGNhdCAiJGF0dGVtcHRGaWxlIiB8fCBlY2hvICItMSIpCgppZiBbICIkY3VycmVudEhhc2giICE9ICIkdGFyZ2V0SGFzaCIgXSAmJiBbICIkY3VycmVudEF0dGVtcHQiICE9ICIkQ0FUVExFX0FHRU5UX0FUVEVNUFRfTlVNQkVSIiBdOyB0aGVuCglta2RpciAtcCAiJGRhdGFSb290IgoJZWNobyAiJHRhcmdldEhhc2giID4gIiRoYXNoRmlsZSIKCWVjaG8gIiRDQVRUTEVfQUdFTlRfQVRURU1QVF9OVU1CRVIiID4gIiRhdHRlbXB0RmlsZSIKCWV4ZWMgIiRjbWQiICIkQCIKZWxzZQoJZWNobyAiYWN0aW9uIGhhcyBhbHJlYWR5IGJlZW4gcmVjb25jaWxlZCB0byB0aGUgY3VycmVudCBoYXNoICRjdXJyZW50SGFzaCBhdCBhdHRlbXB0ICRjdXJyZW50QXR0ZW1wdCIKZmkK",
+ "path": "/var/lib/rancher/capr/idempotence/idempotent.sh",
+ "dynamic": true,
+ "minor": true
}
],
"instructions": [
@@ -29,7 +38,7 @@
"name": "install",
"image": "rancher/system-agent-installer-rke2:v1.24.11-rke2r1",
"env": [
- "RESTART_STAMP=febf3b60fea17222e7649db8b1527380b3355d7a9d11e33d98bf575da0cf0600"
+ "RESTART_STAMP=35bbaf5a8ce0b602d5297fc226e966c847bfa7de8a278e1a52ec19612c81294a"
],
"args": [
"-c",
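The `content` fields in the plan diff above are base64-encoded, so the raw diff hides what actually changed inside each file. The following is a minimal sketch, not part of Harvester or Rancher tooling, that decodes every file's `content` and diffs the decoded text per path. The function names `decode_files` and `diff_plans`, and the sample plans with their `/etc/demo/...` paths, are assumptions made up for illustration; only the `files`, `path`, and `content` keys are taken from the plans shown in this issue.

```python
import base64
import difflib


def decode_files(plan: dict) -> dict:
    """Map each file path in a plan to its base64-decoded content."""
    decoded = {}
    for entry in plan.get("files", []):
        # Some entries carry no content (only a path); treat them as empty.
        raw = entry.get("content", "")
        decoded[entry["path"]] = base64.b64decode(raw).decode("utf-8", errors="replace")
    return decoded


def diff_plans(old_plan: dict, new_plan: dict) -> list:
    """Return unified-diff lines of the decoded file contents, path by path."""
    old_files, new_files = decode_files(old_plan), decode_files(new_plan)
    lines = []
    for path in sorted(set(old_files) | set(new_files)):
        lines.extend(difflib.unified_diff(
            old_files.get(path, "").splitlines(),
            new_files.get(path, "").splitlines(),
            fromfile="old" + path,
            tofile="new" + path,
            lineterm="",
        ))
    return lines


def encode(text: str) -> str:
    """Helper for building the hypothetical sample plans below."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical sample plans; real plans would be loaded with json.load().
old_plan = {"files": [{"path": "/etc/demo/config.yaml",
                       "content": encode("token: abc\n")}]}
new_plan = {"files": [{"path": "/etc/demo/config.yaml",
                       "content": encode("private-registry: /etc/demo/registries.yaml\ntoken: abc\n")}]}

result = diff_plans(old_plan, new_plan)
print("\n".join(result))
```

To use it on the attached plans, load each one with `json.load` and pass the resulting dictionaries to `diff_plans`. Decoding per path makes it easier to tell a real configuration change from a change that only touches encoded metadata.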
I also encountered a similar issue (more like #4096) when testing PR #4117. The upgrade takes longer, but at least #4117 runs OK.
In my single-node cluster upgrade environment, it eventually runs into #4119 with this error in
The upgrade will proceed by temporarily scaling the
Created a Rancher issue to ask for help: rancher/rancher#41965
Pre Ready-For-Testing Checklist
Describe the bug
The upgrade gets stuck on the post-drain job, and you can observe these two situations.
To Reproduce
Expected behavior
The post-drain job should complete smoothly.
Support bundle
None
Environment
Additional context
We can force-delete this VM. After the VM relaunches, the upgrade continues.