
About CAPV related to CAPI #1700

Closed
andyzheung opened this issue Nov 24, 2022 · 23 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@andyzheung

andyzheung commented Nov 24, 2022

From this picture, does CAPV not support CAPI v1.2? @srm09

(screenshot)

@andyzheung
Author

andyzheung commented Nov 24, 2022

Another question:
(screenshot)
This OVA cannot be set up:
(screenshot)

I followed this guide:
https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9

Versions used:
CAPI: 1.2.0
CAPV: 1.2
management cluster: 1.21.0
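
For context, a minimal sketch of how a management cluster like this is typically initialized with clusterctl; the version pins and the VSPHERE_* credential variables are assumptions based on a default CAPV setup, not taken from this thread:

    # assumed vCenter credentials consumed by the CAPV provider
    export VSPHERE_USERNAME='administrator@vsphere.local'
    export VSPHERE_PASSWORD='<password>'
    # install core CAPI plus the vSphere infrastructure provider into the existing cluster
    clusterctl init \
      --core cluster-api:v1.2.0 \
      --bootstrap kubeadm:v1.2.0 \
      --control-plane kubeadm:v1.2.0 \
      --infrastructure vsphere:v1.2.0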

@andyzheung
Author

andyzheung commented Nov 24, 2022

I tried this OVA, and with it I can set up the workload cluster...
(screenshot)

But a new problem is that only one control plane node is ready.
(screenshot)
(screenshot)

kubectl logs -n capi-system capi-controller-manager-fbd594dc6-frfj8
(screenshot)

Are there any logs I should check to find out the problem?
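
For reference, these are the controller logs usually worth checking for a stuck control plane scale-up; the namespaces and deployment names assume a default clusterctl install:

    kubectl logs -n capi-system deploy/capi-controller-manager
    kubectl logs -n capi-kubeadm-bootstrap-system deploy/capi-kubeadm-bootstrap-controller-manager
    kubectl logs -n capi-kubeadm-control-plane-system deploy/capi-kubeadm-control-plane-controller-manager
    kubectl logs -n capv-system deploy/capv-controller-manager
    # and the CAPI view of the machines themselves
    kubectl get machines -A
    kubectl get kubeadmcontrolplane -A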

@andyzheung
Author

andyzheung commented Nov 24, 2022

I am using the template from this repo:
demo-template.zip
CONTROL_PLANE_MACHINE_COUNT='3'
Is there anything else I need to consider?
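
As a hedged sketch of how such a template is usually rendered with that variable; the cluster name and output file below are placeholders, not from this thread:

    export CONTROL_PLANE_MACHINE_COUNT=3
    export WORKER_MACHINE_COUNT=3
    # render the downloaded template into a concrete manifest and apply it
    clusterctl generate cluster my-cluster --from ./demo-template.yaml > my-cluster.yaml
    kubectl apply -f my-cluster.yaml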

@andyzheung
Author

I logged into the abnormal node and checked the kubelet:
journalctl -xefu kubelet
(screenshot)
Is there a problem with kubeadm in this environment?
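
For what it's worth, a few generic node-level checks that can narrow this down; nothing here is specific to this environment:

    sudo systemctl status kubelet
    sudo journalctl -u kubelet --no-pager | tail -n 100
    sudo crictl ps -a              # are the control plane containers running?
    ls /etc/kubernetes/manifests   # static pod manifests written by kubeadm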

@andyzheung
Author

andyzheung commented Nov 24, 2022

I tried SSHing into control plane node 2 and running:
kubeadm reset
kubeadm join 10.250.71.221:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx --control-plane --certificate-key xxxxx

A few minutes later, I can see:
(screenshot)
But it still has no node name:
(screenshot)
And I can't see the third control plane node.

Logs in capi-kubeadm-control-plane-system:
kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc
(screenshot)

@andyzheung
Author

andyzheung commented Nov 24, 2022

Could the two maintainers from VMware help me with the problem above or give me some ideas? Thanks a lot.
@srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have set up a single control plane + autoscaler as in the following picture; it runs fine and provides the CA capability.
(screenshot)

Next, I want to solve the 3-node control plane HA issue above, and then manage more workload clusters like this. I don't know whether this architecture is right.
(screenshot)
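
For reference, the management cluster view of a multi-workload-cluster setup like this can be inspected as follows; the cluster name below is a placeholder:

    kubectl get clusters -A
    kubectl get machinedeployments,machines -A
    clusterctl describe cluster my-workload-cluster --namespace default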

@andyzheung
Author

I saw this issue; do I need to deploy CAPD? What is it? I only deployed CAPI and CAPV.
kubernetes-sigs/cluster-api#4027

@andyzheung
Author

kubectl get kubeadmcontrolplanes
image

kubectl describe kubeadmcontrolplanes
Name: autonomy-elastic-dev-cluster
Namespace: default
Labels: cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster
Annotations:
API Version: controlplane.cluster.x-k8s.io/v1beta1
Kind: KubeadmControlPlane
Metadata:
Creation Timestamp: 2022-11-24T09:29:42Z
Finalizers:
kubeadm.controlplane.cluster.x-k8s.io
Generation: 1
Managed Fields:
API Version: controlplane.cluster.x-k8s.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:kubeadmConfigSpec:
.:
f:clusterConfiguration:
.:
f:apiServer:
.:
f:extraArgs:
.:
f:cloud-provider:
f:controllerManager:
.:
f:extraArgs:
.:
f:cloud-provider:
f:files:
f:initConfiguration:
.:
f:nodeRegistration:
.:
f:criSocket:
f:kubeletExtraArgs:
.:
f:cloud-provider:
f:name:
f:joinConfiguration:
.:
f:nodeRegistration:
.:
f:criSocket:
f:kubeletExtraArgs:
.:
f:cloud-provider:
f:name:
f:preKubeadmCommands:
f:users:
f:machineTemplate:
.:
f:infrastructureRef:
f:replicas:
f:rolloutStrategy:
.:
f:rollingUpdate:
.:
f:maxSurge:
f:type:
f:version:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2022-11-24T09:29:42Z
API Version: controlplane.cluster.x-k8s.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"kubeadm.controlplane.cluster.x-k8s.io":
f:labels:
.:
f:cluster.x-k8s.io/cluster-name:
f:ownerReferences:
.:
k:{"uid":"c90f68e8-9764-4c69-8ec3-e5771b2304d7"}:
.:
f:apiVersion:
f:blockOwnerDeletion:
f:controller:
f:kind:
f:name:
f:uid:
f:status:
.:
f:conditions:
f:initialized:
f:observedGeneration:
f:ready:
f:readyReplicas:
f:replicas:
f:selector:
f:unavailableReplicas:
f:updatedReplicas:
f:version:
Manager: manager
Operation: Update
Time: 2022-11-24T09:38:51Z
Owner References:
API Version: cluster.x-k8s.io/v1beta1
Block Owner Deletion: true
Controller: true
Kind: Cluster
Name: autonomy-elastic-dev-cluster
UID: c90f68e8-9764-4c69-8ec3-e5771b2304d7
Resource Version: 111433
UID: 42624926-090a-47ca-8e77-911e3f59c996
Spec:
Kubeadm Config Spec:
Cluster Configuration:
API Server:
Extra Args:
Cloud - Provider: external
Controller Manager:
Extra Args:
Cloud - Provider: external
Dns:
Etcd:
Networking:
Scheduler:
Files:
Content:
  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: null
    name: kube-vip
    namespace: kube-system
  spec:
    containers:
    - args:
      - manager
      env:
      - name: cp_enable
        value: "true"
      - name: vip_interface
        value:
      - name: address
        value: 10.250.71.221
      - name: port
        value: "6443"
      - name: vip_arp
        value: "true"
      - name: vip_leaderelection
        value: "true"
      - name: vip_leaseduration
        value: "15"
      - name: vip_renewdeadline
        value: "10"
      - name: vip_retryperiod
        value: "2"
      image: ghcr.io/kube-vip/kube-vip:v0.5.5
      imagePullPolicy: IfNotPresent
      name: kube-vip
      resources: {}
      securityContext:
        capabilities:
          add:
          - NET_ADMIN
          - NET_RAW
      volumeMounts:
      - mountPath: /etc/kubernetes/admin.conf
        name: kubeconfig
    hostAliases:
    - hostnames:
      - kubernetes
      ip: 127.0.0.1
    hostNetwork: true
    volumes:
    - hostPath:
        path: /etc/kubernetes/admin.conf
        type: FileOrCreate
      name: kubeconfig
  status: {}
Owner: root:root
Path: /etc/kubernetes/manifests/kube-vip.yaml
Format: cloud-config
Init Configuration:
Local API Endpoint:
Node Registration:
Cri Socket: /var/run/containerd/containerd.sock
Kubelet Extra Args:
Cloud - Provider: external
Name: {{ ds.meta_data.hostname }}
Join Configuration:
Discovery:
Node Registration:
Cri Socket: /var/run/containerd/containerd.sock
Kubelet Extra Args:
Cloud - Provider: external
Name: {{ ds.meta_data.hostname }}
Pre Kubeadm Commands:
hostname "{{ ds.meta_data.hostname }}"
echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
echo "127.0.0.1 localhost" >>/etc/hosts
echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
echo "{{ ds.meta_data.hostname }}" >/etc/hostname
Users:
Name: capv
Ssh Authorized Keys:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjXZum6TwE2qL5wWgp38YA51C2fyfFHYQR7+jFrxq9QW1k3KKIPIc1wA8yMhbA3OMEeaM2/ry37ZdNUsMbATBKSvezhWs77OkZXoWPEWXTvydWf1Nze/Ny9GJAeYIPI8WfTeAo7b7+JpIqQGDMaTK4qX8wLOjTUWJ+ztWAUrXdsHMvhIEKZOUoBBiK+QELrWAS/PKT+UPf/LHnJf4VQ1cGGA/uRjjvcQTdB/XQMzT2GsbuCIDWRX6JIm3+l9VD1Q3Ehv1+zXpjVK7eU9k8XB5iTbFldDLroUlbOcgl7e8BHWUiC2iig7k4Co3Ae4+ubALIlPKXoEaFmK16j9PI+Ajp root@mgmt-master01
Sudo: ALL=(ALL) NOPASSWD:ALL
Machine Template:
Infrastructure Ref:
API Version: infrastructure.cluster.x-k8s.io/v1beta1
Kind: VSphereMachineTemplate
Name: autonomy-elastic-dev-cluster
Namespace: default
Metadata:
Replicas: 3
Rollout Strategy:
Rolling Update:
Max Surge: 1
Type: RollingUpdate
Version: v1.21.11
Status:
Conditions:
Last Transition Time: 2022-11-24T09:31:31Z
Message: Scaling up control plane to 3 replicas (actual 2)
Reason: ScalingUp
Severity: Warning
Status: False
Type: Ready
Last Transition Time: 2022-11-24T09:31:04Z
Status: True
Type: Available
Last Transition Time: 2022-11-24T09:29:43Z
Status: True
Type: CertificatesAvailable
Last Transition Time: 2022-11-24T09:31:30Z
Status: True
Type: ControlPlaneComponentsHealthy
Last Transition Time: 2022-11-24T12:48:26Z
Message: etcd member autonomy-elastic-dev-cluster-bmwb9 does not have a corresponding machine
Reason: EtcdClusterUnhealthy
Severity: Error
Status: False
Type: EtcdClusterHealthy
Last Transition Time: 2022-11-24T09:32:21Z
Status: True
Type: MachinesReady
Last Transition Time: 2022-11-24T09:31:31Z
Message: Scaling up control plane to 3 replicas (actual 2)
Reason: ScalingUp
Severity: Warning
Status: False
Type: Resized
Initialized: true
Observed Generation: 1
Ready: true
Ready Replicas: 2
Replicas: 2
Selector: cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster,cluster.x-k8s.io/control-plane
Unavailable Replicas: 0
Updated Replicas: 2
Version: v1.21.11
Events:
Type Reason Age From Message
---- ------ --- ---- -------
Warning ControlPlaneUnhealthy 2m24s (x4435 over 17h) kubeadm-control-plane-controller Waiting for control plane to pass preflight checks to continue reconciliation: [machine autonomy-elastic-dev-cluster-bmwb9 does not have APIServerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have ControllerManagerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have SchedulerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdMemberHealthy condition]
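
The EtcdClusterHealthy error above ("etcd member ... does not have a corresponding machine") usually points at a stale etcd member left behind by a failed join. A hedged diagnostic sketch, run against the workload cluster from a healthy control plane node; the node and kubeconfig names are placeholders:

    # list etcd members from inside the etcd static pod on the healthy control plane node
    kubectl --kubeconfig workload.kubeconfig -n kube-system exec etcd-<healthy-node> -- etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      member list -w table
    # if a member has no matching Machine/node, it can be removed by ID (use with care)
    # etcdctl ... member remove <MEMBER_ID>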

@andyzheung
Author

While struggling to solve this, I found some related issues and am pasting them here:
kubernetes-sigs/cluster-api#5477
kubernetes-sigs/cluster-api#5509
vmware-tanzu/tanzu-framework#954

@andyzheung
Author

andyzheung commented Nov 25, 2022

I tried changing --bootstrap-token-ttl=90m, and then ran:
kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc
(screenshot)

I also SSHed into my second control plane node to check the cloud-init logs:
vi cloud-init-output.log
(screenshot)

The cloud-init-output.log on the first (healthy) control plane node looks like this:
(screenshot)

I tried ignoring preflight errors:
sudo kubeadm join xxxxx --token xxxx
--discovery-token-ca-cert-hash xxxx
--control-plane
--ignore-preflight-errors=all

Still:
(screenshot)

I tried removing the manifests:
(screenshot)

But it seems that cloud-init is hanging and cannot execute.
(screenshot)

Where are the cloud-init files? I think I need to change them to solve this problem.
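
For reference, the places where cloud-init keeps the rendered bootstrap data and its logs on a node (these are standard cloud-init paths, nothing specific to this image):

    cloud-init status --long                         # is cloud-init still running / did it error?
    sudo cat /var/lib/cloud/instance/user-data.txt   # the rendered kubeadm bootstrap data
    sudo tail -n 200 /var/log/cloud-init.log
    sudo tail -n 200 /var/log/cloud-init-output.log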

@andyzheung
Author

andyzheung commented Nov 26, 2022

Could the two maintainers from VMware help me with the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have set up a single control plane + autoscaler as in the following picture; it runs fine and provides the CA capability. (screenshot)

Next, I want to solve the 3-node control plane HA issue above, and then manage more workload clusters like this. I don't know whether this architecture is right. (screenshot)

===> I have got this multi-workload-cluster autoscaler setup running.
So the remaining problem is:
How can I create the 3 control plane nodes using CAPI and CAPV?
And an additional question:
Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet, and maybe other things do too. If cloud-init needs the internet, how do I solve that?

@srm09
Contributor

srm09 commented Nov 28, 2022

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which will show kubeadm removing this script.

@srm09
Contributor

srm09 commented Nov 28, 2022

How can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?

@srm09 srm09 closed this as completed Nov 28, 2022
@srm09 srm09 reopened this Nov 28, 2022
@srm09
Contributor

srm09 commented Nov 29, 2022

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:

  1. Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest:
    1. Edit the csi-vsphere-config ConfigMap to include insecure-flag = true under the [VirtualCenter x.x.x.x] heading, so that insecure connections to vCenter from the CSI pods are allowed (see the sketch after this list).
    2. For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
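
A rough sketch of the edit described in step 2.1, assuming the csi-vsphere.conf layout used by CAPV templates of that era; the vCenter address is a placeholder, and in some template versions this object is a Secret rather than a ConfigMap:

    # fragment of csi-vsphere.conf inside the csi-vsphere-config object
    [VirtualCenter "x.x.x.x"]
    insecure-flag = "true"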

@andyzheung
Author

How can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?


Yes, I can set up a single control plane workload cluster; it runs well and the cluster autoscaler works.

@andyzheung
Author

andyzheung commented Nov 29, 2022

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which will show kubeadm removing this script.

I have read cloud-init-output.log. The second control plane node's log looks like this; I just don't know how it got into this state:
(screenshot)

@andyzheung
Author

andyzheung commented Nov 29, 2022

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:

  1. Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml

  2. Edit the generated manifest:

    1. Edit the csi-vsphere-config ConfigMap to include insecure-flag = true under the [VirtualCenter x.x.x.x] heading, so that insecure connections to vCenter from the CSI pods are allowed.
    2. For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.

  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.

I am not using clusterctl. I just used the template from
#1700 (comment)
The template was downloaded from the repo, and I just installed CAPI and CAPV into my existing cluster, which serves as the management cluster:
CAPI: 1.2.0
CAPV: 1.2
management cluster: 1.21.0
Do you have any suggestions in case there is something I haven't considered?

In fact, my only remaining problems are:
1. How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just some small thing I haven't considered, because I can create a single control plane.
2. Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet, and maybe other things do too. If cloud-init needs the internet, how do I solve that?

@srm09
Contributor

srm09 commented Nov 29, 2022

CAPV: 1.2

Could you use the latest CAPV version, v1.5.0?

how can I create the 3 controller plane nodes use CAPI and CAPV.. I think all things are ready, maybe only a little things that I don't consider? because I can create a single controller plane.

The replica count on the KubeadmControlPlane object needs to be set to 3 for a 3-node control plane.
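
A minimal sketch of what that looks like on an existing cluster; the object name below is a placeholder, and on a fresh cluster the same result comes from setting CONTROL_PLANE_MACHINE_COUNT=3 before rendering the template:

    kubectl patch kubeadmcontrolplane my-cluster-control-plane \
      --type merge -p '{"spec":{"replicas":3}}'
    # or edit the manifest directly and set spec.replicas: 3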

If I can use the this CA capibility in a internal network environment that can't access internet? cloud-init or image pull may need internet? and any others?if cloud-init need internet, how to solve it..

Could you raise this question in the kubeadm repo or Slack channel? They might have a documented approach for this. Essentially, you'd need a custom registry in the internal network hosting the images, and the nodes would need to be able to access it, which you configure by updating the containerd settings in the /etc/containerd/config.toml file. Here is a rough blog I found for that.
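
A hedged sketch of the containerd change being described, assuming the legacy CRI mirror syntax used by containerd 1.5/1.6 and a hypothetical internal registry name:

    # /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
      endpoint = ["https://registry.internal.example.com"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"]
      endpoint = ["https://registry.internal.example.com"]

    # then restart containerd on the node
    sudo systemctl restart containerd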

@andyzheung
Author

Could you use the latest CAPV version, v1.5.0?

---> Is this related to the CAPV version? I think v1.2.0 should be recent enough.
The replica count on the KubeadmControlPlane object needs to be set to 3 for a 3-node control plane. ---> I have already set it to 3, but it seems to have the problems I mentioned above.

@srm09
Contributor

srm09 commented Feb 16, 2023

Were you able to resolve this issue? Is there anything else I can do to help?

@srm09
Contributor

srm09 commented Feb 16, 2023

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 16, 2023
@srm09
Contributor

srm09 commented Mar 16, 2023

/close
Closing due to inactivity

@k8s-ci-robot
Contributor

@srm09: Closing this issue.

In response to this:

/close
Closing due to inactivity

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
