
Protokube fails to mount etcd master volume in Kubernetes v1.8 #3552

Closed
KashifSaadat opened this issue Oct 6, 2017 · 7 comments

@KashifSaadat (Contributor)

Steps to recreate:

  1. Build kops, nodeup & protokube off master (I've tested at commit 51d3cde)
  2. Set your environment variables (NODEUP_URL, PROTOKUBE_IMAGE)
  3. Create a Kubernetes cluster (standard options should be fine)
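For step 2, the environment setup looks roughly like this (the bucket name and version path here are illustrative, not the ones actually used):

```shell
# Hypothetical bucket and version path; substitute your own build artifacts.
export MY_BUCKET=my-kops-dev-bucket
export KOPS_BASE_URL=https://s3.amazonaws.com/${MY_BUCKET}/kops/1.8.0-alpha.1/
export NODEUP_URL=${KOPS_BASE_URL}linux/amd64/nodeup
export PROTOKUBE_IMAGE=${KOPS_BASE_URL}images/protokube.tar.gz
echo "$NODEUP_URL"
```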

The cluster will fail to start. Logging into a master node, you can see some run-*.scope service failures, and one of the etcd volumes fails to mount (events is fine; main does not mount successfully). Near the top of the protokube logs I can see:

volume_mounter.go:135] Creating mount directory "/rootfs/mnt/master-vol-0fd788d4873320c84"
volume_mounter.go:140] Mounting device "/rootfs/dev/xvdu" on "/rootfs/mnt/master-vol-0fd788d4873320c84"
mount_linux.go:366] Checking for issues with fsck on disk: /rootfs/dev/xvdu
mount_linux.go:379] `fsck` error fsck from util-linux 2.25.2
fsck.ext2: Bad magic number in super-block while trying to open /rootfs/dev/xvdu
/rootfs/dev/xvdu:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

mount_linux.go:384] Attempting to mount disk:  /rootfs/dev/xvdu /rootfs/mnt/master-vol-0fd788d4873320c84
mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0fd788d4873320c84 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0fd788d4873320c84])
mount_linux.go:140] Mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /rootfs/mnt/master-vol-0fd788d4873320c84 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0fd788d4873320c84
Output: Running as unit run-14.scope.
mount: wrong fs type, bad option, bad superblock on /rootfs/dev/xvdu,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

mount_linux.go:430] Attempting to determine if disk "/rootfs/dev/xvdu" is formatted using lsblk with args: ([-n -o FSTYPE /rootfs/dev/xvdu])
mount_linux.go:433] Output: "\n"
mount_linux.go:404] Disk "/rootfs/dev/xvdu" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /rootfs/dev/xvdu]
mount_linux.go:408] Disk successfully formatted (mkfs): ext4 - /rootfs/dev/xvdu /rootfs/mnt/master-vol-0fd788d4873320c84
mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0fd788d4873320c84 --scope -- mount -t ext4 -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0fd788d4873320c84])
volume_mounter.go:76] unable to mount master volume: "error formatting and mounting disk \"/dev/xvdu\" on \"/mnt/master-vol-0fd788d4873320c84\" in host: exit status 1"

Testing with protokube-1.8.0-alpha.1 the volumes mount and cluster builds successfully, so something merged into master since then appears to have caused the regression. I believe it may be related to the following change, but haven't managed to verify: https://github.com/kubernetes/kops/blame/51d3cde6970bfc29018eab9c8f05753079115f7c/protokube/pkg/protokube/volume_mounter.go#L110
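The detection path in the logs above (lsblk reports an empty FSTYPE, then the disk is formatted with `mkfs.ext4 -F`) can be mimicked against a scratch image file instead of a real EBS device; `blkid` is used below in place of `lsblk`, since `lsblk` only reports on block devices. This is just a sketch of the probe-then-format logic, not the protokube code itself:

```shell
# Mimic the format-detection logic on a scratch image file (no root needed).
IMG=$(mktemp)
truncate -s 64M "$IMG"

# Unformatted: no filesystem signature, so blkid prints nothing and exits non-zero.
blkid -o value -s TYPE "$IMG" || echo "no filesystem detected"

# Format as ext4, as protokube's safe-format-and-mount does with -F.
mkfs.ext4 -q -F "$IMG"

# Now the probe reports ext4.
blkid -o value -s TYPE "$IMG"

rm -f "$IMG"
```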

@chrislovecnm (Contributor)

Thanks, I will test a full build. The interesting thing is that e2e should be running this same test continuously ... interesting meaning that if you are correct, e2e is busted ;(

I appreciate your help!

@chrislovecnm (Contributor)

Note

I found some issues in the Makefile; I have put in one PR and am about to open another.

Create cluster

Here is what I did. I normally use scripts, so this bash is untested. I have standard manifests that I use, with the k8s version set to 1.8.0.

export MY_BUCKET=whatyourbucketnameis
export KOPS_FILE=mycluster.yaml
S3_BUCKET=s3://$MY_BUCKET make kops-install upload
export KOPS_BASE_URL=https://s3.amazonaws.com/$MY_BUCKET/kops/1.8.0-alpha.1/
# probably do not have to do this
export NODEUP_URL=${KOPS_BASE_URL}linux/amd64/nodeup
export KOPS=kops
CLUSTER_NAME=$(head -n 10 $KOPS_FILE | grep name | awk '{ print $2 }')
$KOPS create -f $KOPS_FILE
$KOPS create secret --name $CLUSTER_NAME sshpublickey admin -i ~/.ssh/id_rsa.pub
$KOPS update cluster $CLUSTER_NAME --yes

Validate Protokube

SSH'd into the master, and it seems to be working.

It installed correctly

Oct  7 23:45:01 ip-172-60-0-183 nodeup[734]: I1007 23:45:01.077672     734 executor.go:157] Executing task "LoadImage.protokube": LoadImageTask: https://s3.amazonaws.com/redacted/kops/1.8.0-alpha.1/images/protokube.tar.gz

And protokube mounted the volumes correctly:

admin@ip-172-60-0-183:~$ sudo docker logs f14247974dbc
protokube version 0.1
I1007 23:45:20.483998       1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.485084       1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.485921       1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.488424       1 aws_volume.go:71] AWS API Request: ec2/DescribeInstances
I1007 23:45:20.581503       1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:20.581692       1 dnscontroller.go:101] starting DNS controller
I1007 23:45:20.582481       1 dnscache.go:75] querying all DNS zones (no cached results)
I1007 23:45:20.582571       1 route53.go:50] AWS request: route53 ListHostedZones
I1007 23:45:20.650251       1 volume_mounter.go:227] Trying to mount master volume: "vol-0ee184d96cdd5ff49"
I1007 23:45:20.650410       1 aws_volume.go:71] AWS API Request: ec2/AttachVolume
I1007 23:45:20.861699       1 aws_volume.go:329] AttachVolume request returned {
  AttachTime: 2017-10-07 23:45:20.859 +0000 UTC,
  Device: "/dev/xvdu",
  InstanceId: "i-0da8fbcbac9b3763e",
  State: "attaching",
  VolumeId: "vol-0ee184d96cdd5ff49"
}
I1007 23:45:20.861923       1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:20.934395       1 volume_mounter.go:227] Trying to mount master volume: "vol-0cc5454b7fbdae79b"
I1007 23:45:20.934568       1 aws_volume.go:71] AWS API Request: ec2/AttachVolume
I1007 23:45:21.150469       1 aws_volume.go:329] AttachVolume request returned {
  AttachTime: 2017-10-07 23:45:21.15 +0000 UTC,
  Device: "/dev/xvdv",
  InstanceId: "i-0da8fbcbac9b3763e",
  State: "attaching",
  VolumeId: "vol-0cc5454b7fbdae79b"
}
I1007 23:45:21.150737       1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:21.226847       1 volume_mounter.go:246] Currently attached volumes: [{"ID":"vol-0ee184d96cdd5ff49","LocalDevice":"/dev/xvdu","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-0ee184d96cdd5ff49","EtcdClusters":[{"clusterKey":"main","nodeName":"d","nodeNames":["d"]}]}} {"ID":"vol-0cc5454b7fbdae79b","LocalDevice":"/dev/xvdv","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-0cc5454b7fbdae79b","EtcdClusters":[{"clusterKey":"events","nodeName":"d","nodeNames":["d"]}]}}]
I1007 23:45:21.226944       1 volume_mounter.go:58] Master volume "vol-0ee184d96cdd5ff49" is attached at "/dev/xvdu"
I1007 23:45:21.227583       1 volume_mounter.go:72] Doing safe-format-and-mount of /dev/xvdu to /mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:21.227616       1 volume_mounter.go:103] Waiting for device "/dev/xvdu" to be attached
I1007 23:45:22.227746       1 volume_mounter.go:106] Found device "/dev/xvdu"
I1007 23:45:22.230991       1 mount_linux.go:168] Detected OS with systemd
I1007 23:45:22.231582       1 volume_mounter.go:135] Creating mount directory "/rootfs/mnt/master-vol-0ee184d96cdd5ff49"
I1007 23:45:22.231719       1 volume_mounter.go:140] Mounting device "/rootfs/dev/xvdu" on "/rootfs/mnt/master-vol-0ee184d96cdd5ff49"
I1007 23:45:22.231733       1 mount_linux.go:366] Checking for issues with fsck on disk: /rootfs/dev/xvdu
I1007 23:45:22.247351       1 mount_linux.go:379] `fsck` error fsck from util-linux 2.25.2
fsck.ext2: Bad magic number in super-block while trying to open /rootfs/dev/xvdu
/rootfs/dev/xvdu:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

I1007 23:45:22.247377       1 mount_linux.go:384] Attempting to mount disk:  /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:22.247500       1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49])
E1007 23:45:22.266151       1 mount_linux.go:140] Mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
Output: Running as unit run-14.scope.
mount: wrong fs type, bad option, bad superblock on /rootfs/dev/xvdu,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

I1007 23:45:22.266211       1 mount_linux.go:430] Attempting to determine if disk "/rootfs/dev/xvdu" is formatted using lsblk with args: ([-n -o FSTYPE /rootfs/dev/xvdu])
I1007 23:45:22.268133       1 mount_linux.go:433] Output: "\n"
I1007 23:45:22.268153       1 mount_linux.go:404] Disk "/rootfs/dev/xvdu" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /rootfs/dev/xvdu]
I1007 23:45:23.777400       1 mount_linux.go:408] Disk successfully formatted (mkfs): ext4 - /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:23.777522       1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -t ext4 -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49])
I1007 23:45:23.858021       1 volume_mounter.go:80] mounted master volume "vol-0ee184d96cdd5ff49" on /mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:23.858057       1 volume_mounter.go:58] Master volume "vol-0cc5454b7fbdae79b" is attached at "/dev/xvdv"
I1007 23:45:23.858096       1 volume_mounter.go:72] Doing safe-format-and-mount of /dev/xvdv to /mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:23.858127       1 volume_mounter.go:106] Found device "/dev/xvdv"
I1007 23:45:23.875391       1 mount_linux.go:168] Detected OS with systemd
I1007 23:45:23.876357       1 volume_mounter.go:135] Creating mount directory "/rootfs/mnt/master-vol-0cc5454b7fbdae79b"
I1007 23:45:23.876762       1 volume_mounter.go:140] Mounting device "/rootfs/dev/xvdv" on "/rootfs/mnt/master-vol-0cc5454b7fbdae79b"
I1007 23:45:23.876780       1 mount_linux.go:366] Checking for issues with fsck on disk: /rootfs/dev/xvdv
I1007 23:45:23.895085       1 mount_linux.go:379] `fsck` error fsck from util-linux 2.25.2
fsck.ext2: Bad magic number in super-block while trying to open /rootfs/dev/xvdv
/rootfs/dev/xvdv:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

I1007 23:45:23.895112       1 mount_linux.go:384] Attempting to mount disk:  /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:23.895209       1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b])
E1007 23:45:23.913945       1 mount_linux.go:140] Mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
Output: Running as unit run-22.scope.
mount: wrong fs type, bad option, bad superblock on /rootfs/dev/xvdv,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

I1007 23:45:23.913986       1 mount_linux.go:430] Attempting to determine if disk "/rootfs/dev/xvdv" is formatted using lsblk with args: ([-n -o FSTYPE /rootfs/dev/xvdv])
I1007 23:45:23.916778       1 mount_linux.go:433] Output: "\n"
I1007 23:45:23.916802       1 mount_linux.go:404] Disk "/rootfs/dev/xvdv" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /rootfs/dev/xvdv]
I1007 23:45:27.468125       1 mount_linux.go:408] Disk successfully formatted (mkfs): ext4 - /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:27.468169       1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -t ext4 -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b])
I1007 23:45:27.512945       1 volume_mounter.go:80] mounted master volume "vol-0cc5454b7fbdae79b" on /mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:27.512981       1 kube_boot.go:103] Found etcd cluster spec on volume "vol-0cc5454b7fbdae79b": {"clusterKey":"events","nodeName":"d","nodeNames":["d"]}
I1007 23:45:27.513069       1 kube_boot.go:103] Found etcd cluster spec on volume "vol-0ee184d96cdd5ff49": {"clusterKey":"main","nodeName":"d","nodeNames":["d"]}

Validate Cluster

The cluster name is redacted.

$ kops validate cluster
Using cluster from kubectl context: mycluster

Validating cluster mycluster

INSTANCE GROUPS
NAME			ROLE	MACHINETYPE	MIN	MAX	SUBNETS
master-us-west-2c	Master	c4.large	1	1	us-west-2c
nodes			Node	m3.medium	2	2	us-west-2c

NODE STATUS
NAME						ROLE	READY
ip-172-60-0-183.us-west-2.compute.internal	master	True
ip-172-60-0-206.us-west-2.compute.internal	node	True
ip-172-60-0-228.us-west-2.compute.internal	node	True

Your cluster mycluster is ready

@KashifSaadat (Contributor, Author)

Hey Chris, thanks for looking into this!

Did you build protokube fresh off the master branch? Could you try with a new image?

When I built a cluster with protokube v1.8.0-alpha.1 the volume mounts were successful, but with a custom protokube image (built off commit hash 51d3cde, no local changes) I started seeing this problem. I think it has possibly regressed between those commits.

@chrislovecnm (Contributor)

Tested on master.

@KashifSaadat (Contributor, Author)

KashifSaadat commented Oct 13, 2017

Apologies I forgot to mention, I had this issue using CoreOS (which it seems this issue may be limited to). It may be fixed by the following PR (I'll test): #3621

@KashifSaadat (Contributor, Author)

PR #3621 has been merged into master now. I gave this another test and the cluster builds successfully on CoreOS (nodes healthy, kuberang tests pass), however I noticed the following unit failures (CoreOS, etcd v3.0.17, tls enabled):

Failed Units: 2
  run-1360.scope
  run-1411.scope

<hostname> systemd[1]: run-1360.scope: Failed to add PIDs to scope's control group: No such process
<hostname> systemd[1]: Failed to start Kubernetes systemd probe.
<hostname> systemd[1]: run-1360.scope: Unit entered failed state.

<hostname> systemd[1]: run-1411.scope: Failed to add PIDs to scope's control group: No such process
<hostname> systemd[1]: Failed to start Kubernetes systemd probe.
<hostname> systemd[1]: run-1411.scope: Unit entered failed state.

It doesn't appear to be causing any issues, but I figured I'd raise it just in case. @justinsb any thoughts?

@KashifSaadat (Contributor, Author)

I've been testing recently and have not spotted any more issues since the fix in #3621 was merged. I'm not sure why the above service errors occur, but they don't appear to have any negative impact on cluster function.

Will close this issue, thanks for the help! 👍
