cluster/gce/upgrade.sh fails to upgrade nodes when run from a mac #37474

roberthbailey · 2016-11-25T08:00:26Z

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): 1.5 beta 2

Environment:

Cloud provider or hardware configuration: GCE
OS (e.g. from /etc/os-release): CVM
Kernel (e.g. uname -a):
Install tools: kube-up.sh

What happened: Step 6 of https://docs.google.com/document/d/19Q4AzWLD5jd2FNaPyKy2xdTN4JIGUUpvwDwg0tcBkyc/edit# fails

What you expected to happen: Node upgrade to succeed.

How to reproduce it (as minimally and precisely as possible): Follow the manual upgrade steps outlined in the linked document.

Anything else we need to know: The error occurs because a metadata value is too large for the new node instance template.

ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.
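The error states a per-value limit of 32768 characters. The sketch below is a pre-flight check that could fail fast instead of letting the template creation retry. `check_metadata_sizes` and `METADATA_VALUE_LIMIT` are hypothetical names and are not part of cluster/gce. The check assumes the metadata values are generated files, such as configure-vm.sh and node-kube-env.yaml in $KUBE_TEMP. It counts bytes with `wc -c`. That count equals the character count for ASCII scripts and errs on the strict side otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not taken from cluster/gce: the limit below is
# the per-value maximum reported in the gcloud error above.
readonly METADATA_VALUE_LIMIT=32768

# Returns non-zero (and reports on stderr) if any file exceeds the limit.
check_metadata_sizes() {
  local status=0 file size
  for file in "$@"; do
    # `wc -c < file` prints only the byte count; tr removes the padding BSD wc adds.
    size=$(wc -c < "$file" | tr -d '[:space:]')
    if (( size > METADATA_VALUE_LIMIT )); then
      echo "${file}: ${size} bytes exceeds ${METADATA_VALUE_LIMIT}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_metadata_sizes "${KUBE_TEMP}/configure-vm.sh" "${KUBE_TEMP}/node-kube-env.yaml" || exit 1`.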


roberthbailey · 2016-11-25T08:03:16Z

More complete error output:

Attempt 1 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Attempt 1 failed to create instance template kubernetes-minion-template-v1-5-0-beta-2. Retrying.

Attempt 2 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Attempt 2 failed to create instance template kubernetes-minion-template-v1-5-0-beta-2. Retrying.

Attempt 3 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Attempt 3 failed to create instance template kubernetes-minion-template-v1-5-0-beta-2. Retrying.

Attempt 4 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Attempt 4 failed to create instance template kubernetes-minion-template-v1-5-0-beta-2. Retrying.


Attempt 5 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Attempt 5 failed to create instance template kubernetes-minion-template-v1-5-0-beta-2. Retrying.


Attempt 6 to create kubernetes-minion-template-v1-5-0-beta-2
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.
ERROR: (gcloud.compute.instance-templates.create) Some requests did not succeed:
 - Value for field 'resource.properties.metadata.items[2].value' is too large: maximum size 32768 character(s); actual size 33595.

Failed to create instance template kubernetes-minion-template-v1-5-0-beta-2

soltysh · 2016-11-25T09:30:52Z

This is blocking testing, and per email I'm bumping this to P0.

wojtek-t · 2016-11-25T14:49:29Z

Does it mean that "node-kube-env.yaml" is too large?
https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/debian/node-helper.sh#L25

It seems like that to me. If so, we should understand why those files differ from the ones used at startup...

paralin · 2016-11-26T00:50:45Z

This is broken for me too:

@kubernetes-master ~ $ sudo systemctl status kube-master-installation -l
● kube-master-installation.service - Download and install k8s binaries and configurations
   Loaded: loaded (/etc/systemd/system/kube-master-installation.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2016-11-26 00:47:08 UTC; 2min 26s ago
  Process: 1070 ExecStart=/home/kubernetes/bin/configure.sh (code=exited, status=1/FAILURE)
  Process: 1066 ExecStartPre=/bin/chmod 544 /home/kubernetes/bin/configure.sh (code=exited, status=0/SUCCESS)
  Process: 1062 ExecStartPre=/usr/bin/curl --fail --retry 5 --retry-delay 3 --silent --show-error -H X-Google-Metadata-Request: True -o /home/kubernetes/bin/configure.sh http://metadata.google.internal/computeMetadata/v1/instance/attributes/configure-sh (code=exited, status=0/SUCCESS)
  Process: 1058 ExecStartPre=/bin/mount -o remount,exec /home/kubernetes/bin (code=exited, status=0/SUCCESS)
  Process: 1054 ExecStartPre=/bin/mount --bind /home/kubernetes/bin /home/kubernetes/bin (code=exited, status=0/SUCCESS)
  Process: 1050 ExecStartPre=/bin/mkdir -p /home/kubernetes/bin (code=exited, status=0/SUCCESS)
 Main PID: 1070 (code=exited, status=1/FAILURE)

Nov 26 00:47:06 kubernetes-master configure.sh[1070]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Nov 26 00:47:06 kubernetes-master configure.sh[1070]: Dload  Upload   Total   Spent    Left  Speed
Nov 26 00:47:07 kubernetes-master configure.sh[1070]: [155B blob data]
Nov 26 00:47:07 kubernetes-master configure.sh[1070]: == Downloaded https://storage.googleapis.com/kubernetes-release/network-plugins/cni-07a8a28637e97b22eb8dfe710eeae1344f69d16e.tar.gz (SHA1 = 19d49f7b2b99cd2493d5ae0ace896c64e289ccbb) ==
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: Downloading k8s manifests tar
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: Dload  Upload   Total   Spent    Left  Speed
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: [155B blob data]
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: == Downloaded https://storage.googleapis.com/kubernetes-release/release/v1.4.6/kubernetes-manifests.tar.gz (SHA1 = e9c52530a14612c91f45e017743925a0dba6dcc8) ==
Nov 26 00:47:08 kubernetes-master configure.sh[1070]: cp: cannot stat '/home/kubernetes/kube-manifests/kubernetes/gci-trusty/gci-mounter': No such file or directory

This happens after an upgrade.

dims · 2016-11-27T20:01:23Z

@soltysh @wojtek-t @paralin: so who is going to be the assignee and come up with a plan of attack? :)

roberthbailey · 2016-11-28T16:52:44Z

The problem is that configure-vm.sh is too large. In my $KUBE_TEMP directory:

$ wc *
       1       1      11 cluster-name.txt
    1036    3546   33595 configure-vm.sh
      55     110   11863 node-kube-env.yaml
    1092    3657   45469 total

You can see that the number of bytes in configure-vm.sh (33595) matches the output in the error message (actual size 33595).

What is interesting is that configure-vm.sh in the repo is actually larger. In the 1.5 branch:

$ wc configure-vm.sh 
    1120    4289   37962 configure-vm.sh

and in the 1.4 branch:

$ wc configure-vm.sh 
    1090    4178   36917 configure-vm.sh

But we aren't seeing an error during new cluster creation in either release branch.

roberthbailey · 2016-11-28T17:14:29Z

In the $KUBE_TEMP directory when creating a new cluster from the 1.4 branch:

$ wc configure-vm.sh 
    1006    3435   32550 configure-vm.sh

which is below the size limit.

roberthbailey · 2016-11-28T17:39:09Z

Here's the diff between the two configure-vm.sh scripts:

$ diff configure-vm-1.4-new.sh configure-vm-1.5-upgrade.sh 
400d399
< dns_replicas: '$(echo "$DNS_REPLICAS" | sed -e "s/'/''/g")'
402a402
> enable_dns_horizontal_autoscaler: '$(echo "$ENABLE_DNS_HORIZONTAL_AUTOSCALER" | sed -e "s/'/''/g")'
404d403
< storage_backend: '$(echo "$STORAGE_BACKEND" | sed -e "s/'/''/g")'
418a418
> 
420a421,425
>     if [ -n "${STORAGE_BACKEND:-}" ]; then
>       cat <<EOF >>/srv/salt-overlay/pillar/cluster-params.sls
> storage_backend: '$(echo "$STORAGE_BACKEND" | sed -e "s/'/''/g")'
> EOF
>     fi
440a446,454
>     if [[ -n "${ETCD_CA_KEY:-}" && -n "${ETCD_CA_CERT:-}" && -n "${ETCD_PEER_KEY:-}" && -n "${ETCD_PEER_CERT:-}" ]]; then
>       cat <<EOF >>/srv/salt-overlay/pillar/cluster-params.sls
> etcd_over_ssl: 'true'
> EOF
>     else
>       cat <<EOF >>/srv/salt-overlay/pillar/cluster-params.sls
> etcd_over_ssl: 'false'
> EOF
>     fi
877d890
<   cbr-cidr: 10.123.45.0/29
896d908
<   cbr-cidr: 10.123.45.0/29
953c965,973
<   salt-call --local state.highstate || true
---
>   local rc=0
>   for i in {0..6}; do
>     salt-call --local state.highstate && rc=0 || rc=$?
>     if [[ "${rc}" == 0 ]]; then
>       return 0
>     fi
>   done
>   echo "Salt failed to run repeatedly" >&2
>   return "${rc}"
966a987,995
> function create-salt-master-etcd-auth {
>   if [[ -n "${ETCD_CA_CERT:-}" && -n "${ETCD_PEER_KEY:-}" && -n "${ETCD_PEER_CERT:-}" ]]; then
>     local -r auth_dir="/srv/kubernetes"
>     echo "${ETCD_CA_CERT}" | base64 --decode | gunzip > "${auth_dir}/etcd-ca.crt"
>     echo "${ETCD_PEER_KEY}" | base64 --decode > "${auth_dir}/etcd-peer.key"
>     echo "${ETCD_PEER_CERT}" | base64 --decode | gunzip > "${auth_dir}/etcd-peer.crt"
>   fi
> }
> 
982a1012
>     create-salt-master-etcd-auth

which makes it appear that #35516 is the culprit.

@jszczepkowski

roberthbailey · 2016-11-28T17:44:18Z

This may be a mac issue -- when I manually run the sed command from the prepare-startup-script function it makes no changes. But if I run gsed instead of sed then it strips a bunch of comments, reducing the file size below the allowed limit:

$ wc *
     978    2951   30107 configure-vm-1.5-gsed.sh
    1036    3546   33595 configure-vm-1.5-upgrade.sh
    2014    6497   63702 total
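The thread does not quote the exact sed expression used by prepare-startup-script. One plausible cause is a GNU-only escape such as `\s`, which BSD sed on macOS does not treat as whitespace, so nothing matches and no comments are removed. This is an assumption, not something the thread confirms. The sketch below strips full-line comments with only POSIX bracket expressions, which both BSD and GNU sed understand. `strip_comments` is a hypothetical helper, not the code in cluster/gce.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete full-line comments from a shell script while
# keeping the "#!" shebang. Only POSIX bracket expressions ([[:space:]]) are
# used, so BSD sed (macOS) and GNU sed should behave the same.
# Caveat: it is line-based and does not understand heredocs, so a line
# starting with "#" inside a heredoc body is deleted too.
strip_comments() {
  sed -e '/^[[:space:]]*#[^!]/d' \
      -e '/^[[:space:]]*#$/d' \
      "$1"
}
```

Usage: `strip_comments cluster/gce/configure-vm.sh > "${KUBE_TEMP}/configure-vm.sh"`.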

roberthbailey · 2016-11-28T17:45:24Z

We should be able to do the same thing we do in this file https://github.com/kubernetes/kubernetes/blob/master/hack/make-rules/test-cmd.sh#L177 to fix it.
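The other option is to require GNU sed outright and prefer `gsed` when it is installed. The sketch below is a hypothetical version of that idea. It is not the diff from #37562, and the test-cmd.sh approach it is loosely modeled on may differ in detail. It relies on GNU sed printing "sed (GNU sed) X.Y" for `--version`, while BSD sed rejects that flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the code from #37562): print the name of a GNU sed
# binary, preferring gsed (installed by Homebrew's gnu-sed formula on macOS).
pick_gnu_sed() {
  local candidate
  for candidate in gsed sed; do
    # GNU sed identifies itself as "sed (GNU sed) X.Y"; BSD sed errors out on
    # --version, so its empty stdout does not match.
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" --version 2>/dev/null | grep -qF '(GNU sed)'; then
      echo "$candidate"
      return 0
    fi
  done
  echo "GNU sed is required; on macOS try 'brew install gnu-sed'." >&2
  return 1
}
```

A caller would run `SED="$(pick_gnu_sed)" || exit 1` once and then use `"${SED}"` everywhere the script currently calls `sed`. That way the macOS and Linux paths run the same regex engine.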

roberthbailey · 2016-11-28T17:48:40Z

I created #37562 but haven't had a chance to test it yet.

davidopp · 2016-11-28T22:08:29Z

(Master and) node upgrade worked for me on Ubuntu, so this does indeed seem to be Mac-specific.

roberthbailey · 2016-11-29T09:01:07Z

I tested #37562 and it looks like it fixed my issue. I suppose I'll need to cherry-pick it into the 1.5 branch once it merges to master.

Automatic merge from submit-queue

Use gsed on the mac.

**What this PR does / why we need it**: Fixes node upgrades when run from a mac

**Which issue this PR fixes**: fixes #37474

soltysh added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. area/upgrade priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-blocker labels Nov 25, 2016

saad-ali added this to the v1.5 milestone Nov 26, 2016

roberthbailey self-assigned this Nov 28, 2016

roberthbailey mentioned this issue Nov 28, 2016

Use gsed on the mac. #37562

Merged

roberthbailey changed the title ~~cluster/gce/upgrade.sh fails to upgrade nodes~~ cluster/gce/upgrade.sh fails to upgrade nodes when run from a mac Nov 29, 2016

k8s-github-robot closed this as completed in #37562 Nov 30, 2016
