New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protokube fails to mount etcd master volume in Kubernetes v1.8 #3552
Comments
Thanks I will test a full build. The interesting thing is that e2e should be running this same test continuously ... interesting meaning that if you are correct, e2e is busted ;( I appreciate your help! |
NoteI found some challenges in the Makefile, I have put in one PR, about to drop another Create clusterSo here is what I did, I use scripts so this bash is untested. I have standard manifests that I use, that have k8s version 1.8.0 set. export MY_BUCKET=whatyourbucknameis
export KOPS_FILE=myclutster.yaml
S3_BUCKET=s3://$MY_BUCKET make kops-install upload`
export KOPS_BASE_URL=https://s3.amazonaws.com/$MY_BUCKET/kops/1.8.0-alpha.1/
# probably do not have to do this
export NODEUP_URL=${KOPS_BASE_URL}linux/amd64/nodeup
export KOPS=kops
CLUSTER_NAME=$(head -n 10 $KOPS_FILE | grep name | awk '{ print $2 }')
$KOPS create -f $KOPS_FILE
$KOPS create secret --name $CLUSTER_NAME sshpublickey admin -i ~/.ssh/id_rsa.pub
$KOPS update cluster $CLUSTER_NAME --yes Validate Protokubessh'ed into the master and it seems to be working It installed correctly Oct 7 23:45:01 ip-172-60-0-183 nodeup[734]: I1007 23:45:01.077672 734 executor.go:157] Executing task "LoadImage.protokube": LoadImageTask: https://s3.amazonaws.com/redacted/kops/1.8.0-alpha.1/images/protokube.tar.gz And it mounted protokube correctly admin@ip-172-60-0-183:~$ sudo docker logs f14247974dbc
protokube version 0.1
I1007 23:45:20.483998 1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.485084 1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.485921 1 aws_volume.go:71] AWS API Request: ec2metadata/GetMetadata
I1007 23:45:20.488424 1 aws_volume.go:71] AWS API Request: ec2/DescribeInstances
I1007 23:45:20.581503 1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:20.581692 1 dnscontroller.go:101] starting DNS controller
I1007 23:45:20.582481 1 dnscache.go:75] querying all DNS zones (no cached results)
I1007 23:45:20.582571 1 route53.go:50] AWS request: route53 ListHostedZones
I1007 23:45:20.650251 1 volume_mounter.go:227] Trying to mount master volume: "vol-0ee184d96cdd5ff49"
I1007 23:45:20.650410 1 aws_volume.go:71] AWS API Request: ec2/AttachVolume
I1007 23:45:20.861699 1 aws_volume.go:329] AttachVolume request returned {
AttachTime: 2017-10-07 23:45:20.859 +0000 UTC,
Device: "/dev/xvdu",
InstanceId: "i-0da8fbcbac9b3763e",
State: "attaching",
VolumeId: "vol-0ee184d96cdd5ff49"
}
I1007 23:45:20.861923 1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:20.934395 1 volume_mounter.go:227] Trying to mount master volume: "vol-0cc5454b7fbdae79b"
I1007 23:45:20.934568 1 aws_volume.go:71] AWS API Request: ec2/AttachVolume
I1007 23:45:21.150469 1 aws_volume.go:329] AttachVolume request returned {
AttachTime: 2017-10-07 23:45:21.15 +0000 UTC,
Device: "/dev/xvdv",
InstanceId: "i-0da8fbcbac9b3763e",
State: "attaching",
VolumeId: "vol-0cc5454b7fbdae79b"
}
I1007 23:45:21.150737 1 aws_volume.go:71] AWS API Request: ec2/DescribeVolumes
I1007 23:45:21.226847 1 volume_mounter.go:246] Currently attached volumes: [{"ID":"vol-0ee184d96cdd5ff49","LocalDevice":"/dev/xvdu","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-0ee184d96cdd5ff49","EtcdClusters":[{"clusterKey":"main","nodeName":"d","nodeNames":["d"]}]}} {"ID":"vol-0cc5454b7fbdae79b","LocalDevice":"/dev/xvdv","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-0cc5454b7fbdae79b","EtcdClusters":[{"clusterKey":"events","nodeName":"d","nodeNames":["d"]}]}}]
I1007 23:45:21.226944 1 volume_mounter.go:58] Master volume "vol-0ee184d96cdd5ff49" is attached at "/dev/xvdu"
I1007 23:45:21.227583 1 volume_mounter.go:72] Doing safe-format-and-mount of /dev/xvdu to /mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:21.227616 1 volume_mounter.go:103] Waiting for device "/dev/xvdu" to be attached
I1007 23:45:22.227746 1 volume_mounter.go:106] Found device "/dev/xvdu"
I1007 23:45:22.230991 1 mount_linux.go:168] Detected OS with systemd
I1007 23:45:22.231582 1 volume_mounter.go:135] Creating mount directory "/rootfs/mnt/master-vol-0ee184d96cdd5ff49"
I1007 23:45:22.231719 1 volume_mounter.go:140] Mounting device "/rootfs/dev/xvdu" on "/rootfs/mnt/master-vol-0ee184d96cdd5ff49"
I1007 23:45:22.231733 1 mount_linux.go:366] Checking for issues with fsck on disk: /rootfs/dev/xvdu
I1007 23:45:22.247351 1 mount_linux.go:379] `fsck` error fsck from util-linux 2.25.2
fsck.ext2: Bad magic number in super-block while trying to open /rootfs/dev/xvdu
/rootfs/dev/xvdu:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
I1007 23:45:22.247377 1 mount_linux.go:384] Attempting to mount disk: /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:22.247500 1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49])
E1007 23:45:22.266151 1 mount_linux.go:140] Mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
Output: Running as unit run-14.scope.
mount: wrong fs type, bad option, bad superblock on /rootfs/dev/xvdu,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
I1007 23:45:22.266211 1 mount_linux.go:430] Attempting to determine if disk "/rootfs/dev/xvdu" is formatted using lsblk with args: ([-n -o FSTYPE /rootfs/dev/xvdu])
I1007 23:45:22.268133 1 mount_linux.go:433] Output: "\n"
I1007 23:45:22.268153 1 mount_linux.go:404] Disk "/rootfs/dev/xvdu" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /rootfs/dev/xvdu]
I1007 23:45:23.777400 1 mount_linux.go:408] Disk successfully formatted (mkfs): ext4 - /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:23.777522 1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0ee184d96cdd5ff49 --scope -- mount -t ext4 -o defaults /rootfs/dev/xvdu /rootfs/mnt/master-vol-0ee184d96cdd5ff49])
I1007 23:45:23.858021 1 volume_mounter.go:80] mounted master volume "vol-0ee184d96cdd5ff49" on /mnt/master-vol-0ee184d96cdd5ff49
I1007 23:45:23.858057 1 volume_mounter.go:58] Master volume "vol-0cc5454b7fbdae79b" is attached at "/dev/xvdv"
I1007 23:45:23.858096 1 volume_mounter.go:72] Doing safe-format-and-mount of /dev/xvdv to /mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:23.858127 1 volume_mounter.go:106] Found device "/dev/xvdv"
I1007 23:45:23.875391 1 mount_linux.go:168] Detected OS with systemd
I1007 23:45:23.876357 1 volume_mounter.go:135] Creating mount directory "/rootfs/mnt/master-vol-0cc5454b7fbdae79b"
I1007 23:45:23.876762 1 volume_mounter.go:140] Mounting device "/rootfs/dev/xvdv" on "/rootfs/mnt/master-vol-0cc5454b7fbdae79b"
I1007 23:45:23.876780 1 mount_linux.go:366] Checking for issues with fsck on disk: /rootfs/dev/xvdv
I1007 23:45:23.895085 1 mount_linux.go:379] `fsck` error fsck from util-linux 2.25.2
fsck.ext2: Bad magic number in super-block while trying to open /rootfs/dev/xvdv
/rootfs/dev/xvdv:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
I1007 23:45:23.895112 1 mount_linux.go:384] Attempting to mount disk: /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:23.895209 1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b])
E1007 23:45:23.913945 1 mount_linux.go:140] Mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
Output: Running as unit run-22.scope.
mount: wrong fs type, bad option, bad superblock on /rootfs/dev/xvdv,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
I1007 23:45:23.913986 1 mount_linux.go:430] Attempting to determine if disk "/rootfs/dev/xvdv" is formatted using lsblk with args: ([-n -o FSTYPE /rootfs/dev/xvdv])
I1007 23:45:23.916778 1 mount_linux.go:433] Output: "\n"
I1007 23:45:23.916802 1 mount_linux.go:404] Disk "/rootfs/dev/xvdv" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /rootfs/dev/xvdv]
I1007 23:45:27.468125 1 mount_linux.go:408] Disk successfully formatted (mkfs): ext4 - /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:27.468169 1 mount_linux.go:135] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /rootfs/mnt/master-vol-0cc5454b7fbdae79b --scope -- mount -t ext4 -o defaults /rootfs/dev/xvdv /rootfs/mnt/master-vol-0cc5454b7fbdae79b])
I1007 23:45:27.512945 1 volume_mounter.go:80] mounted master volume "vol-0cc5454b7fbdae79b" on /mnt/master-vol-0cc5454b7fbdae79b
I1007 23:45:27.512981 1 kube_boot.go:103] Found etcd cluster spec on volume "vol-0cc5454b7fbdae79b": {"clusterKey":"events","nodeName":"d","nodeNames":["d"]}
I1007 23:45:27.513069 1 kube_boot.go:103] Found etcd cluster spec on volume "vol-0ee184d96cdd5ff49": {"clusterKey":"main","nodeName":"d","nodeNames":["d"]} Validate Clusterecluster name is redacted $ kops validate cluster
Using cluster from kubectl context: mycluster
Validating cluster mycluster
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
master-us-west-2c Master c4.large 1 1 us-west-2c
nodes Node m3.medium 2 2 us-west-2c
NODE STATUS
NAME ROLE READY
ip-172-60-0-183.us-west-2.compute.internal master True
ip-172-60-0-206.us-west-2.compute.internal node True
ip-172-60-0-228.us-west-2.compute.internal node True
Your cluster mycluster is ready |
Hey Chris, thanks for looking into this! Did you build protokube fresh off the master branch? Could you try with a new image? When I built a cluster with protokube |
Tested on master. |
Apologies I forgot to mention, I had this issue using CoreOS (which it seems this issue may be limited to). It may be fixed by the following PR (I'll test): #3621 |
PR #3621 has been merged into master now. I gave this another test and the cluster builds successfully on CoreOS (nodes healthy, kuberang tests pass), however I noticed the following unit failures (CoreOS, etcd v3.0.17, tls enabled):
It doesn't appear to be causing any issues but figured I'd raise just incase. @justinsb any thoughts? |
Been testing recently and not spotted any more issues since the fix in #3621 got merged. Not sure why the above service errors are occurring but they don't appear to have any negative impact on the cluster functioning. Will close this issue, thanks for the help! 👍 |
Steps to recreate:
NODEUP_URL
,PROTOKUBE_IMAGE
)The cluster will fail to start. Logging into a master node, you can see some run.scope service failures and one of the etcd volumes failed to mount (
events
is fine,main
does not mount successfully). Viewing from near the top of the protokube logs I can see:Testing with
protokube-1.8.0-alpha.1
the volumes mount and cluster builds successfully, so something merged into master since then appears to have caused the regression. I believe it may be related to the following change, but haven't managed to verify: https://github.com/kubernetes/kops/blame/51d3cde6970bfc29018eab9c8f05753079115f7c/protokube/pkg/protokube/volume_mounter.go#L110The text was updated successfully, but these errors were encountered: