
About CAPV related to CAPI #1700

Closed
andyzheung opened this issue Nov 24, 2022 · 23 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@andyzheung

andyzheung commented Nov 24, 2022

From this picture, does CAPV not support CAPI v1.2? @srm09

(screenshot)

@andyzheung
Author

andyzheung commented Nov 24, 2022

Another question:
(screenshot)
This OVA cannot be set up:
(screenshot)

I followed this guide:
https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9

Versions used:
CAPI: 1.2.0
CAPV: 1.2
management cluster: 1.21.0
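
For context, a minimal sketch of how a management cluster like this is typically initialized with clusterctl; the version pins and the VSPHERE_* credential variables are assumptions based on a default CAPV setup, not taken from this thread:

    # assumed vCenter credentials consumed by the CAPV provider
    export VSPHERE_USERNAME='administrator@vsphere.local'
    export VSPHERE_PASSWORD='<password>'
    # install core CAPI plus the vSphere infrastructure provider into the existing cluster
    clusterctl init \
      --core cluster-api:v1.2.0 \
      --bootstrap kubeadm:v1.2.0 \
      --control-plane kubeadm:v1.2.0 \
      --infrastructure vsphere:v1.2.0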

@andyzheung
Author

andyzheung commented Nov 24, 2022

I tried this OVA, and with it I can set up the workload cluster...
(screenshot)

But a new problem is that only one control plane node is ready.
(screenshot)
(screenshot)

kubectl logs -n capi-system capi-controller-manager-fbd594dc6-frfj8
(screenshot)

Are there any logs I should check to find out the problem?
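
For reference, these are the controller logs usually worth checking for a stuck control plane scale-up; the namespaces and deployment names assume a default clusterctl install:

    kubectl logs -n capi-system deploy/capi-controller-manager
    kubectl logs -n capi-kubeadm-bootstrap-system deploy/capi-kubeadm-bootstrap-controller-manager
    kubectl logs -n capi-kubeadm-control-plane-system deploy/capi-kubeadm-control-plane-controller-manager
    kubectl logs -n capv-system deploy/capv-controller-manager
    # and the CAPI view of the machines themselves
    kubectl get machines -A
    kubectl get kubeadmcontrolplane -A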

@andyzheung
Author

andyzheung commented Nov 24, 2022

I am using the template from this repo:
demo-template.zip
CONTROL_PLANE_MACHINE_COUNT='3'
Is there anything else I need to consider?
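
As a hedged sketch of how such a template is usually rendered with that variable; the cluster name and output file below are placeholders, not from this thread:

    export CONTROL_PLANE_MACHINE_COUNT=3
    export WORKER_MACHINE_COUNT=3
    # render the downloaded template into a concrete manifest and apply it
    clusterctl generate cluster my-cluster --from ./demo-template.yaml > my-cluster.yaml
    kubectl apply -f my-cluster.yaml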

@andyzheung
Author

I logged into the abnormal node and checked the kubelet:
journalctl -xefu kubelet
(screenshot)
Is there a problem with kubeadm in this environment?
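
For what it's worth, a few generic node-level checks that can narrow this down; nothing here is specific to this environment:

    sudo systemctl status kubelet
    sudo journalctl -u kubelet --no-pager | tail -n 100
    sudo crictl ps -a              # are the control plane containers running?
    ls /etc/kubernetes/manifests   # static pod manifests written by kubeadm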

@andyzheung
Author

andyzheung commented Nov 24, 2022

I tried SSHing into control plane node 2 and running:
kubeadm reset
kubeadm join 10.250.71.221:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx --control-plane --certificate-key xxxxx

A few minutes later, I can see:
(screenshot)
But it still has no node name:
(screenshot)
And I can't see the third control plane node.

Logs in capi-kubeadm-control-plane-system:
kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc
(screenshot)

@andyzheung
Author

andyzheung commented Nov 24, 2022

Could the two maintainers from VMware help me with the problem above or give me some ideas? Thanks a lot.
@srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have set up a single control plane + autoscaler as in the following picture; it runs fine and provides the CA capability.
(screenshot)

Next, I want to solve the 3-node control plane HA issue above, and then manage more workload clusters like this. I don't know whether this architecture is right.
(screenshot)
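
For reference, the management cluster view of a multi-workload-cluster setup like this can be inspected as follows; the cluster name below is a placeholder:

    kubectl get clusters -A
    kubectl get machinedeployments,machines -A
    clusterctl describe cluster my-workload-cluster --namespace default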

@andyzheung
Author

I saw this issue; do I need to deploy CAPD? What is it? I only deployed CAPI and CAPV.
kubernetes-sigs/cluster-api#4027

@andyzheung
Author

kubectl get kubeadmcontrolplanes
image

kubectl describe kubeadmcontrolplanes
Name: autonomy-elastic-dev-cluster
Namespace: default
Labels: cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster
Annotations:
API Version: controlplane.cluster.x-k8s.io/v1beta1
Kind: KubeadmControlPlane
Metadata:
Creation Timestamp: 2022-11-24T09:29:42Z
Finalizers:
kubeadm.controlplane.cluster.x-k8s.io
Generation: 1
Managed Fields:
API Version: controlplane.cluster.x-k8s.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:kubeadmConfigSpec:
.:
f:clusterConfiguration:
.:
f:apiServer:
.:
f:extraArgs:
.:
f:cloud-provider:
f:controllerManager:
.:
f:extraArgs:
.:
f:cloud-provider:
f:files:
f:initConfiguration:
.:
f:nodeRegistration:
.:
f:criSocket:
f:kubeletExtraArgs:
.:
f:cloud-provider:
f:name:
f:joinConfiguration:
.:
f:nodeRegistration:
.:
f:criSocket:
f:kubeletExtraArgs:
.:
f:cloud-provider:
f:name:
f:preKubeadmCommands:
f:users:
f:machineTemplate:
.:
f:infrastructureRef:
f:replicas:
f:rolloutStrategy:
.:
f:rollingUpdate:
.:
f:maxSurge:
f:type:
f:version:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2022-11-24T09:29:42Z
API Version: controlplane.cluster.x-k8s.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"kubeadm.controlplane.cluster.x-k8s.io":
f:labels:
.:
f:cluster.x-k8s.io/cluster-name:
f:ownerReferences:
.:
k:{"uid":"c90f68e8-9764-4c69-8ec3-e5771b2304d7"}:
.:
f:apiVersion:
f:blockOwnerDeletion:
f:controller:
f:kind:
f:name:
f:uid:
f:status:
.:
f:conditions:
f:initialized:
f:observedGeneration:
f:ready:
f:readyReplicas:
f:replicas:
f:selector:
f:unavailableReplicas:
f:updatedReplicas:
f:version:
Manager: manager
Operation: Update
Time: 2022-11-24T09:38:51Z
Owner References:
API Version: cluster.x-k8s.io/v1beta1
Block Owner Deletion: true
Controller: true
Kind: Cluster
Name: autonomy-elastic-dev-cluster
UID: c90f68e8-9764-4c69-8ec3-e5771b2304d7
Resource Version: 111433
UID: 42624926-090a-47ca-8e77-911e3f59c996
Spec:
Kubeadm Config Spec:
Cluster Configuration:
API Server:
Extra Args:
Cloud - Provider: external
Controller Manager:
Extra Args:
Cloud - Provider: external
Dns:
Etcd:
Networking:
Scheduler:
Files:
Content:
  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: null
    name: kube-vip
    namespace: kube-system
  spec:
    containers:
    - args:
      - manager
      env:
      - name: cp_enable
        value: "true"
      - name: vip_interface
        value:
      - name: address
        value: 10.250.71.221
      - name: port
        value: "6443"
      - name: vip_arp
        value: "true"
      - name: vip_leaderelection
        value: "true"
      - name: vip_leaseduration
        value: "15"
      - name: vip_renewdeadline
        value: "10"
      - name: vip_retryperiod
        value: "2"
      image: ghcr.io/kube-vip/kube-vip:v0.5.5
      imagePullPolicy: IfNotPresent
      name: kube-vip
      resources: {}
      securityContext:
        capabilities:
          add:
          - NET_ADMIN
          - NET_RAW
      volumeMounts:
      - mountPath: /etc/kubernetes/admin.conf
        name: kubeconfig
    hostAliases:
    - hostnames:
      - kubernetes
      ip: 127.0.0.1
    hostNetwork: true
    volumes:
    - hostPath:
        path: /etc/kubernetes/admin.conf
        type: FileOrCreate
      name: kubeconfig
  status: {}
Owner: root:root
Path: /etc/kubernetes/manifests/kube-vip.yaml
Format: cloud-config
Init Configuration:
Local API Endpoint:
Node Registration:
Cri Socket: /var/run/containerd/containerd.sock
Kubelet Extra Args:
Cloud - Provider: external
Name: {{ ds.meta_data.hostname }}
Join Configuration:
Discovery:
Node Registration:
Cri Socket: /var/run/containerd/containerd.sock
Kubelet Extra Args:
Cloud - Provider: external
Name: {{ ds.meta_data.hostname }}
Pre Kubeadm Commands:
hostname "{{ ds.meta_data.hostname }}"
echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
echo "127.0.0.1 localhost" >>/etc/hosts
echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
echo "{{ ds.meta_data.hostname }}" >/etc/hostname
Users:
Name: capv
Ssh Authorized Keys:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjXZum6TwE2qL5wWgp38YA51C2fyfFHYQR7+jFrxq9QW1k3KKIPIc1wA8yMhbA3OMEeaM2/ry37ZdNUsMbATBKSvezhWs77OkZXoWPEWXTvydWf1Nze/Ny9GJAeYIPI8WfTeAo7b7+JpIqQGDMaTK4qX8wLOjTUWJ+ztWAUrXdsHMvhIEKZOUoBBiK+QELrWAS/PKT+UPf/LHnJf4VQ1cGGA/uRjjvcQTdB/XQMzT2GsbuCIDWRX6JIm3+l9VD1Q3Ehv1+zXpjVK7eU9k8XB5iTbFldDLroUlbOcgl7e8BHWUiC2iig7k4Co3Ae4+ubALIlPKXoEaFmK16j9PI+Ajp root@mgmt-master01
Sudo: ALL=(ALL) NOPASSWD:ALL
Machine Template:
Infrastructure Ref:
API Version: infrastructure.cluster.x-k8s.io/v1beta1
Kind: VSphereMachineTemplate
Name: autonomy-elastic-dev-cluster
Namespace: default
Metadata:
Replicas: 3
Rollout Strategy:
Rolling Update:
Max Surge: 1
Type: RollingUpdate
Version: v1.21.11
Status:
Conditions:
Last Transition Time: 2022-11-24T09:31:31Z
Message: Scaling up control plane to 3 replicas (actual 2)
Reason: ScalingUp
Severity: Warning
Status: False
Type: Ready
Last Transition Time: 2022-11-24T09:31:04Z
Status: True
Type: Available
Last Transition Time: 2022-11-24T09:29:43Z
Status: True
Type: CertificatesAvailable
Last Transition Time: 2022-11-24T09:31:30Z
Status: True
Type: ControlPlaneComponentsHealthy
Last Transition Time: 2022-11-24T12:48:26Z
Message: etcd member autonomy-elastic-dev-cluster-bmwb9 does not have a corresponding machine
Reason: EtcdClusterUnhealthy
Severity: Error
Status: False
Type: EtcdClusterHealthy
Last Transition Time: 2022-11-24T09:32:21Z
Status: True
Type: MachinesReady
Last Transition Time: 2022-11-24T09:31:31Z
Message: Scaling up control plane to 3 replicas (actual 2)
Reason: ScalingUp
Severity: Warning
Status: False
Type: Resized
Initialized: true
Observed Generation: 1
Ready: true
Ready Replicas: 2
Replicas: 2
Selector: cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster,cluster.x-k8s.io/control-plane
Unavailable Replicas: 0
Updated Replicas: 2
Version: v1.21.11
Events:
Type Reason Age From Message
---- ------ --- ---- -------
Warning ControlPlaneUnhealthy 2m24s (x4435 over 17h) kubeadm-control-plane-controller Waiting for control plane to pass preflight checks to continue reconciliation: [machine autonomy-elastic-dev-cluster-bmwb9 does not have APIServerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have ControllerManagerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have SchedulerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdMemberHealthy condition]
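
The EtcdClusterHealthy error above ("etcd member ... does not have a corresponding machine") usually points at a stale etcd member left behind by a failed join. A hedged diagnostic sketch, run against the workload cluster from a healthy control plane node; the node and kubeconfig names are placeholders:

    # list etcd members from inside the etcd static pod on the healthy control plane node
    kubectl --kubeconfig workload.kubeconfig -n kube-system exec etcd-<healthy-node> -- etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      member list -w table
    # if a member has no matching Machine/node, it can be removed by ID (use with care)
    # etcdctl ... member remove <MEMBER_ID>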

@andyzheung
Author

While struggling to solve this, I found some related issues and am pasting them here:
kubernetes-sigs/cluster-api#5477
kubernetes-sigs/cluster-api#5509
vmware-tanzu/tanzu-framework#954

@andyzheung
Author

andyzheung commented Nov 25, 2022

I tried changing --bootstrap-token-ttl=90m, and then ran:
kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc
(screenshot)

I also SSHed into my second control plane node to check the cloud-init logs:
vi cloud-init-output.log
(screenshot)

The cloud-init-output.log on the first (healthy) control plane node looks like this:
(screenshot)

I tried ignoring preflight errors:
sudo kubeadm join xxxxx --token xxxx
--discovery-token-ca-cert-hash xxxx
--control-plane
--ignore-preflight-errors=all

Still:
(screenshot)

I tried removing the manifests:
(screenshot)

But it seems that cloud-init is hanging and cannot execute.
(screenshot)

Where are the cloud-init files? I think I need to change them to solve this problem.
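
For reference, the places where cloud-init keeps the rendered bootstrap data and its logs on a node (these are standard cloud-init paths, nothing specific to this image):

    cloud-init status --long                         # is cloud-init still running / did it error?
    sudo cat /var/lib/cloud/instance/user-data.txt   # the rendered kubeadm bootstrap data
    sudo tail -n 200 /var/log/cloud-init.log
    sudo tail -n 200 /var/log/cloud-init-output.log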

@andyzheung
Author

andyzheung commented Nov 26, 2022

Could the two maintainers from VMware help me with the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have set up a single control plane + autoscaler as in the following picture; it runs fine and provides the CA capability. (screenshot)

Next, I want to solve the 3-node control plane HA issue above, and then manage more workload clusters like this. I don't know whether this architecture is right. (screenshot)

===> I have got this multi-workload-cluster autoscaler setup running.
So the remaining problem is:
How can I create the 3 control plane nodes using CAPI and CAPV?
And an additional question:
Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet, and maybe other things do too. If cloud-init needs the internet, how do I solve that?

@srm09
Contributor

srm09 commented Nov 28, 2022

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which will show kubeadm removing this script.

@srm09
Contributor

srm09 commented Nov 28, 2022

How can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?

@srm09 srm09 closed this as completed Nov 28, 2022
@srm09 srm09 reopened this Nov 28, 2022
@srm09
Contributor

srm09 commented Nov 29, 2022

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:

  1. Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest:
    1. Edit the csi-vsphere-config ConfigMap to include insecure-flag = true under the [VirtualCenter x.x.x.x] heading, so that insecure connections to vCenter from the CSI pods are allowed (see the sketch after this list).
    2. For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
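
A rough sketch of the edit described in step 2.1, assuming the csi-vsphere.conf layout used by CAPV templates of that era; the vCenter address is a placeholder, and in some template versions this object is a Secret rather than a ConfigMap:

    # fragment of csi-vsphere.conf inside the csi-vsphere-config object
    [VirtualCenter "x.x.x.x"]
    insecure-flag = "true"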

@andyzheung
Author

How can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?


Yes, I can set up a single control plane workload cluster; it runs well and the cluster autoscaler works.

@andyzheung
Author

andyzheung commented Nov 29, 2022

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which will show kubeadm removing this script.

I have read cloud-init-output.log. The second control plane node's log looks like this; I just don't know how it got into this state:
(screenshot)

@andyzheung
Author

andyzheung commented Nov 29, 2022

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:

  1. Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml

  2. Edit the generated manifest:

    1. Edit the csi-vsphere-config ConfigMap to include insecure-flag = true under the [VirtualCenter x.x.x.x] heading, so that insecure connections to vCenter from the CSI pods are allowed.
    2. For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.

  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.

I am not using clusterctl. I just used the template from
#1700 (comment)
The template was downloaded from the repo, and I just installed CAPI and CAPV into my existing cluster, which serves as the management cluster:
CAPI: 1.2.0
CAPV: 1.2
management cluster: 1.21.0
Do you have any suggestions in case there is something I haven't considered?

In fact, my only remaining problems are:
1. How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just some small thing I haven't considered, because I can create a single control plane.
2. Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet, and maybe other things do too. If cloud-init needs the internet, how do I solve that?

@srm09
Contributor

srm09 commented Nov 29, 2022

CAPV: 1.2

Could you use the latest CAPV version, v1.5.0?

how can I create the 3 controller plane nodes use CAPI and CAPV.. I think all things are ready, maybe only a little things that I don't consider? because I can create a single controller plane.

The replica count on the KubeadmControlPlane object needs to be set to 3 for a 3-node control plane.
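
A minimal sketch of what that looks like on an existing cluster; the object name below is a placeholder, and on a fresh cluster the same result comes from setting CONTROL_PLANE_MACHINE_COUNT=3 before rendering the template:

    kubectl patch kubeadmcontrolplane my-cluster-control-plane \
      --type merge -p '{"spec":{"replicas":3}}'
    # or edit the manifest directly and set spec.replicas: 3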

If I can use the this CA capibility in a internal network environment that can't access internet? cloud-init or image pull may need internet? and any others?if cloud-init need internet, how to solve it..

Could you raise this question in the kubeadm repo or Slack channel? They might have a documented approach for this. Essentially, you'd need a custom registry in the internal network hosting the images, and the nodes would need to be able to access it, which you configure by updating the containerd settings in the /etc/containerd/config.toml file. Here is a rough blog I found for that.
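
A hedged sketch of the containerd change being described, assuming the legacy CRI mirror syntax used by containerd 1.5/1.6 and a hypothetical internal registry name:

    # /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
      endpoint = ["https://registry.internal.example.com"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"]
      endpoint = ["https://registry.internal.example.com"]

    # then restart containerd on the node
    sudo systemctl restart containerd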

@andyzheung
Author

Could you use the latest CAPV version, v1.5.0?

---> Is this related to the CAPV version? I think v1.2.0 should be recent enough.
The replica count on the KubeadmControlPlane object needs to be set to 3 for a 3-node control plane. ---> I have already set it to 3, but it seems to have the problems I mentioned above.

@srm09
Contributor

srm09 commented Feb 16, 2023

Were you able to resolve this issue? Is there anything else I can do to help?

@srm09
Contributor

srm09 commented Feb 16, 2023

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 16, 2023
@srm09
Contributor

srm09 commented Mar 16, 2023

/close
Closing due to inactivity

@k8s-ci-robot
Contributor

@srm09: Closing this issue.

In response to this:

/close
Closing due to inactivity

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
