vCluster creating more trouble than helping(due to different causes) #1787

Open

MichaelKora opened this issue May 22, 2024 · 12 comments

@MichaelKora
MichaelKora commented May 22, 2024

What happened?

I honestly don't know if it is supposed to be this hard, but vcluster is creating more trouble than it solves. I've been working on getting a production-ready cluster for over a week and it's not working. Right now the cluster is up and running and I connect to it using a NodePort service. The issues:

  1. When I deploy the cluster, it takes over 60 minutes for the cluster pods to reach a healthy Running state so that I can communicate with the vcluster.
  2. The coredns pod, although in state Running, is full of errors:
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized

[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Unauthorized 
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized 

[INFO] plugin/kubernetes: Trace[944124959]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (24-May-2024 11:55:04.692) (total time: 16726ms): Trace[944124959]: ---"Objects listed" error:<nil> 16726ms (11:55:21.419) Trace[944124959]: [16.726980037s] [16.726980037s] END

[INFO] plugin/kubernetes: Trace[1216864093]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (24-May-2024 11:54:50.867) (total time: 30559ms): Trace[1216864093]: ---"Objects listed" error:<nil> 30559ms (11:55:21.426) Trace[1216864093]: [30.55934034s] [30.55934034s] END

[INFO] plugin/kubernetes: Trace[1087029931]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (24-May-2024 11:54:49.127) (total time: 32304ms): Trace[1087029931]: ---"Objects listed" error:<nil> 32304ms (11:55:21.432) Trace[1087029931]: [32.304771091s] [32.304771091s] END
  3. When connected to the vcluster, identical requests deliver different responses each time (see the loop sketched below this list):
    E.g.: running kubectl get namespaces might show 4 namespaces, then 4, and then 6, etc.

  4. Running helm against the vcluster is nearly impossible; it times out nearly every single time.
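
A loop like the following (nothing vcluster-specific, just plain kubectl against the current kubeconfig) makes the inconsistency easy to demonstrate:

# repeat the same query and compare the counts; they should not change between runs
$ for i in $(seq 1 10); do kubectl get namespaces --no-headers | wc -l; done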

This is all very frustrating, because I was expecting vCluster to be much easier to use.

What did you expect to happen?

I deployed a cluster and expected the CoreDNS pods to be deployed.

How can we reproduce it (as minimally and precisely as possible)?

# vcluster.yaml
exportKubeConfig:
  context: "sharedpool-context"
controlPlane:
  coredns:
    enabled: true
    embedded: false
    deployment:
      replicas: 2
      nodeSelector:
        workload: wk1
  statefulSet:
    highAvailability:
      replicas: 2
    persistence:
      volumeClaim:
        enabled: true
    scheduling:
      nodeSelector:
        workload: wk1
    resources:
      limits:
        ephemeral-storage: 20Gi
        memory: 10Gi
      requests:
        ephemeral-storage: 200Mi
        cpu: 200m
        memory: 256Mi
    
  proxy:
    bindAddress: "0.0.0.0"
    port: 8443
    extraSANs:
      - XX.XX.XX.XXX
      - YY.YY.YY.YYY
helm upgrade -i my-vcluster vcluster \
  --repo https://charts.loft.sh \
  --namespace vcluster-ns --create-namespace \
  --repository-config='' \
  -f vcluster.yaml \
  --version 0.20.0-beta.5
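
After the install, the slow startup can be observed with plain kubectl (a sketch; it assumes the release name my-vcluster from the command above, and the StatefulSet name may differ between chart versions):

$ kubectl -n vcluster-ns get pods -w
$ kubectl -n vcluster-ns rollout status statefulset/my-vcluster
$ kubectl -n vcluster-ns logs statefulset/my-vcluster --all-containers --tail=100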

Anything else we need to know?

I used a NodePort service to connect to the cluster.

# nodeport.yaml

apiVersion: v1
kind: Service
metadata:
  name: vcluster-nodeport
  namespace: vcluster-ns
spec:
  selector:
    app: vcluster
    release: shared-pool-vcluster
  ports:
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
      nodePort: 31222
  type: NodePort
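
For reference, a sketch of how this NodePort can be used from a workstation, assuming XX.XX.XX.XXX is a reachable node IP listed under extraSANs above, that the vcluster is named my-vcluster as in the helm command above (the Service selector references a different release name, so adjust accordingly), and that the installed vcluster CLI supports the --server flag:

# point the generated kubeconfig at the NodePort instead of port-forwarding
$ vcluster connect my-vcluster -n vcluster-ns --server=https://XX.XX.XX.XXX:31222
$ kubectl get namespaces
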
$ helm upgrade -i solr-operator apache-solr/solr-operator --version 0.8.1 -n solr-cloud

Release "solr-operator" does not exist. Installing it now.
Error: failed post-install: 1 error occurred:
        * timed out waiting for the condition


$ helm upgrade -i hz-operator hazelcast/hazelcast-platform-operator -n hz-vc-ns --create-namespace -f operator.yaml

Release "hz-operator" does not exist. Installing it now.
Error: 9 errors occurred:
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Internal error occurred: resource quota evaluation timed out

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11

Host cluster Kubernetes distribution

vcluster version

$ vcluster --version
vcluster version 0.19.5

Vcluster Kubernetes distribution (k3s (default), k8s, k0s)

default
# I did not specify a specific distribution

OS and Arch

OS: talos
Arch: metal-amd64

@MichaelKora MichaelKora changed the title CoreDNS pods not deployed CoreDNS pods not working May 24, 2024
@MichaelKora MichaelKora reopened this May 24, 2024
@MichaelKora MichaelKora changed the title CoreDNS pods not working vCluster creating more trouble than helping(due to different causes) May 24, 2024
@heiko-braun
Contributor

Hey @MichaelKora, it's unfortunate that you have to experience these troubles.

One thing that I'd recommend is to use the latest vcluster CLI together with 0.20.0-beta.5. From the description it appears that you are using the 0.19.5 one instead.

Regarding the other issues: it's a bit hard to say from the outset what might be causing them. You seem to be leveraging Talos. What Kubernetes distro is running on top of it?

@MichaelKora
Author

Hey @heiko-braun, 0.19.5 is the latest according to the vcluster CLI:

$ sudo vcluster upgrade
15:55:48 info Current binary is the latest version: 0.19.5

I have Talos running there (the default image); it's based on k3s.

@heiko-braun
Contributor

heiko-braun commented Jun 5, 2024

Hi @MichaelKora, you can get the latest CLI (the one to be used with 0.20 vcluster.yaml) here:
https://github.com/loft-sh/vcluster/releases/tag/v0.20.0-beta.6
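
For a Linux amd64 machine, grabbing it could look roughly like this (the asset name is an assumption; check the release page for your OS/arch):

$ curl -fL -o vcluster "https://github.com/loft-sh/vcluster/releases/download/v0.20.0-beta.6/vcluster-linux-amd64"
$ chmod +x vcluster && sudo mv vcluster /usr/local/bin/vcluster
$ vcluster --version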

@heiko-braun
Contributor

@MichaelKora regarding the hazelcast and solr examples in the description: did you run the commands against the host cluster or the virtual one?

@MichaelKora
Author

@heiko-braun thanks for your response. I ran the commands against the vcluster; when run against the host cluster, I have no issues.

@MichaelKora
Author

MichaelKora commented Jun 5, 2024

@heiko-braun when the cluster is being created, the logs show:

 2024-06-05 14:38:37    INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"}

 2024-06-05 14:38:38    INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"}

 2024-06-05 14:38:39    INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"}

 2024-06-05 14:38:40    INFO    commandwriter/commandwriter.go:126    error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 127.0.0.1:6443: connect: connection refused    {"component": "vcluster", "component": "controller-manager", "location": "leaderelection.go:332"}

and it takes more than 60 minutes before the cluster reaches a healthy state. It seems very odd to me that creating a virtual cluster takes that long.
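
If it helps, this is roughly what can be run to inspect the control plane while it is stuck (plain kubectl; the pod name is assumed from the StatefulSet name my-vcluster used earlier and may differ):

$ kubectl -n vcluster-ns get events --sort-by=.lastTimestamp
$ kubectl -n vcluster-ns describe pod my-vcluster-0
$ kubectl -n vcluster-ns logs my-vcluster-0 --all-containers --tail=200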

@everflux

@MichaelKora how many nodes does your host cluster have, and what capacity? Do you use network policies?

@MichaelKora
Author

Hey @everflux, I dedicated 2 nodes of the host cluster to the vcluster (8 CPU / 32 GB). I am not using any restrictive network policies.

@everflux

This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters? (Check kubectl get all -n vcluster-ns and kubectl get ns.)
I am afraid a GitHub issue might not be the right place to discuss this; perhaps the Slack channel would be better suited.

@deniseschannon
Contributor

@MichaelKora Are you still having issues or were you able to resolve them?

@MichaelKora
Author

Hey @deniseschannon, yes, I am still having the issue!

@MichaelKora
Author

MichaelKora commented Sep 6, 2024

This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters? (Check kubectl get all -n vcluster-ns and kubectl get ns.) I am afraid a GitHub issue might not be the right place to discuss this; perhaps the Slack channel would be better suited.

@everflux I have just one setup.
