
deploy kubermatic-seed action does not know about origin cluster name #10589

Closed
almereyda opened this issue Jul 29, 2022 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale.


@almereyda

What happened?

After linking a seed cluster to a master cluster twice, once within the same cluster and once with another KubeOne inception, the deploy kubermatic-seed action did not know about its origin cluster's DNS zone:

[screenshot: kubermatic-installer DNS output showing an empty domain for the seed]

The IP address shown is correct (though now outdated), and after adding the *.kubermatic. subdomain record below the cluster's domain, everything eventually works correctly.

Expected behavior

Kubermatic knows the name and the DNS name of the origin cluster, and displays it accordingly.

How to reproduce the issue?

Follow the installation instructions.

How is your environment configured?

  • KKP version: v2.20.5
  • Shared master/seed cluster: yes

What cloud provider are you running on?

DigitalOcean

@almereyda almereyda added the kind/bug Categorizes issue or PR as related to a bug. label Jul 29, 2022
@xrstf
Contributor

xrstf commented Aug 8, 2022

Can you show your KubermaticConfiguration? The domain should be taken from there (spec.ingress.domain); KKP does not do any DNS zone magic in the background.
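
A quick way to inspect that field in a running setup (assuming the KubermaticConfiguration lives in the default kubermatic namespace) would be something like:

$ kubectl -n kubermatic get kubermaticconfigurations \
    -o jsonpath='{.items[0].spec.ingress.domain}'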

@almereyda
Author

almereyda commented Aug 10, 2022

Thank you for helping me to investigate this:

`KubermaticConfiguration`
# Copyright 2020 The Kubermatic Kubernetes Platform contributors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# compare against kubermatic.example.yaml from the release tarball and `kubermatic-installer print kubermaticconfiguration 2> kubermaticconfiguration.yaml`
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticConfiguration
metadata:
    name: example-fra1-control
    namespace: kubermatic
spec:
    featureGates:
        KonnectivityService: true
        UserClusterMLA: true
    ingress:
        domain: cluster.example.com
        certificateIssuer:
            apiGroup: null
            kind: ClusterIssuer
            name: letsencrypt-prod
        className: nginx
    auth:
        clientID: kubermatic
        issuerClientID: kubermaticIssuer
        skipTokenIssuerTLSVerify: false
        tokenIssuer: https://cluster.example.com/dex
        issuerClientSecret: ENC[AES256_GCM,data:,type:str]
        issuerCookieKey: ENC[AES256_GCM,data:,type:str]
        serviceAccountKey: ENC[AES256_GCM,data:,type:str]
    ui:
        replicas: 2
    api:
        replicas: 2
        accessibleAddons:
            - cluster-autoscaler
            - node-exporter
            - kube-state-metrics
            - multus
            - hubble
            - metallb
    userCluster:
        apiserverReplicas: 2
        addons:
            default: null
            defaultManifests: |-
                apiVersion: v1
                kind: List
                items:
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: cilium
                    labels:
                      addons.kubermatic.io/ensure: true
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: csi
                    labels:
                      addons.kubermatic.io/ensure: true
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: rbac
                    labels:
                      addons.kubermatic.io/ensure: true
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: kubeadm-configmap
                    labels:
                      addons.kubermatic.io/ensure: true
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: kubelet-configmap
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: default-storage-class
                - apiVersion: kubermatic.k8c.io/v1
                  kind: Addon
                  metadata:
                    name: pod-security-policy
                    labels:
                      addons.kubermatic.io/ensure: true
        monitoring:
            customScrapingConfigs: |-
                - job_name: 'crunchy-postgres-exporter'
                  kubernetes_sd_configs:
                  - role: pod

                  relabel_configs:
                  - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_crunchy_postgres_exporter,__meta_kubernetes_pod_label_crunchy_postgres_exporter]
                    action: keep
                    regex: true
                    separator: ""
                  - source_labels: [__meta_kubernetes_pod_container_port_number]
                    action: drop
                    regex: 5432
                  - source_labels: [__meta_kubernetes_pod_container_port_number]
                    action: drop
                    regex: 10000
                  - source_labels: [__meta_kubernetes_pod_container_port_number]
                    action: drop
                    regex: 8009
                  - source_labels: [__meta_kubernetes_pod_container_port_number]
                    action: drop
                    regex: 2022
                  - source_labels: [__meta_kubernetes_pod_container_port_number]
                    action: drop
                    regex: ^$
                  - source_labels: [__meta_kubernetes_namespace]
                    action: replace
                    target_label: kubernetes_namespace
                  - source_labels: [__meta_kubernetes_pod_name]
                    target_label: pod
                  - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_cluster,__meta_kubernetes_pod_label_pg_cluster]
                    target_label: cluster
                    separator: ""
                    replacement: '$1'
                  - source_labels: [__meta_kubernetes_namespace,cluster]
                    target_label: pg_cluster
                    separator: ":"
                    replacement: '$1$2'
                  - source_labels: [__meta_kubernetes_pod_ip]
                    target_label: ip
                    replacement: '$1'
                  - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_instance,__meta_kubernetes_pod_label_deployment_name]
                    target_label: deployment
                    replacement: '$1'
                    separator: ""
                  - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_role,__meta_kubernetes_pod_label_role]
                    target_label: role
                    replacement: '$1'
                    separator: ""
                  - source_labels: [dbname]
                    target_label: dbname
                    replacement: '$1'
                  - source_labels: [relname]
                    target_label: relname
                    replacement: '$1'
                  - source_labels: [schemaname]
                    target_label: schemaname
                    replacement: '$1'
    versions:
        default: v1.23.9
        versions:
            - v1.22.12
            - v1.23.9
sops:
    kms: []
    gcp_kms: []
    azure_kv: []
    hc_vault: []
    age:
        - recipient: 
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            -----END AGE ENCRYPTED FILE-----
        - recipient: 
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----

            -----END AGE ENCRYPTED FILE-----
    lastmodified: ""
    mac: ENC[AES256_GCM,data:,type:str]
    pgp: []
    encrypted_regex: secret|Secret|key|Key|password|hash
    version: 3.7.3

I need to make the following notes:

The installation process of KKP main and seed went a little hacky-whacky:

  1. First invocation of kubermatic-installer deploy --config cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean created a partly working installation, due to missing DNS entries (as the load balancer had not yet been provisioned).
  2. Second invocation of kubermatic-installer deploy --config cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean set everything up completely once DNS had propagated.
  3. First invocation of kubermatic-installer deploy kubermatic-seed --config seed-cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean already produced the above output.
    Here we are reusing the cluster.values.yaml, which also contains the keys the Seed needs.
    • Then kubermatic-installer convert-kubeconfig cluster-kubeconfig > seed-cluster-kubeconfig helped to adapt the seed-cluster.yaml to inject the Kubeconfig from KubeOne with the KKP service account.
  4. Second invocation of kubermatic-installer deploy kubermatic-seed --config seed-cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean now finished the seed's deployment, with the same output as above.
    • Yet when checking the resources in the kubermatic namespace, all master components were gone.
  5. Only running the installer deploy job again would yield a working configuration of a shared master/seed cluster. Earlier KKP versions had an isMaster key in their configuration, but that seems to have been removed.

@almereyda
Author

almereyda commented Aug 10, 2022

When reinstalling Kubermatic v2.20.6 on top, we still get a valid output from the regular deploy job:

$ kubermatic-installer deploy --config cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean
...
INFO[16:29:56]    📡 Determining DNS settings…               
INFO[16:29:56]       The main LoadBalancer is ready.        
INFO[16:29:56]                                              
INFO[16:29:56]         Service             : nginx-ingress-controller / nginx-ingress-controller 
INFO[16:29:56]         Ingress via IP      : 1.2.3.4  
INFO[16:29:56]                                              
INFO[16:29:56]       Please ensure your DNS settings for "cluster.example.com" include the following records: 
INFO[16:29:56]                                              
INFO[16:29:56]          cluster.example.com.    IN  A  1.2.3.4 
INFO[16:29:56]          *.cluster.example.com.  IN  A  1.2.3.4
INFO[16:29:56]                                              
INFO[16:29:56] 🛬 Installation completed successfully. Thank you for using Kubermatic ❤ 

But again with the seed, it does not recognise the KubermaticConfiguration present in the cluster:

$ kubermatic-installer deploy kubermatic-seed --config seed-cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean
...
INFO[16:34:57]    📡 Determining DNS settings…               
INFO[16:34:57]       The main LoadBalancer is ready.        
INFO[16:34:57]                                              
INFO[16:34:57]         Service             : kubermatic / nodeport-proxy 
INFO[16:34:57]         Ingress via IP      : 2.3.4.5 
INFO[16:34:57]                                              
INFO[16:34:57]       Please ensure your DNS settings for "" includes the following record: 
INFO[16:34:57]                                              
INFO[16:34:57]          *.kubermatic..  IN  A  2.3.4.5 
INFO[16:34:57]                                              
INFO[16:34:57] 🛬 Installation completed successfully. Have a nice day! 

@xrstf
Contributor

xrstf commented Aug 11, 2022

I have adapted the metadata.name field. In kubermatic.example.yaml from the Kubermatic tarball, it is set to kubermatic, yet not so in kubermatic-installer print kubermaticconfiguration or on https://docs.kubermatic.com/kubermatic/v2.20/tutorials_howtos/kkp_configuration/, which asks one to set it oneself. Can this cause issues?

KKP lists all KubermaticConfigurations in its namespace and takes the first one it finds (if there are multiple, the operator will crash on purpose). So in effect the name you give your KubermaticConfiguration does not matter, as long as you have exactly 1 in your namespace.
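
A quick way to verify this (assuming the default kubermatic namespace) would be something like:

# There should be exactly one KubermaticConfiguration in the namespace.
$ kubectl -n kubermatic get kubermaticconfigurations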

But again with the seed, it does not recognise the KubermaticConfiguration present in the cluster:

Let me investigate further, this might just be a simple bug in the installer.

@xrstf
Contributor

xrstf commented Aug 11, 2022

In your list of whackiness, you mention

kubermatic-installer deploy --config cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean
kubermatic-installer deploy kubermatic-seed --config seed-cluster.yaml

Why are you using a different KubermaticConfiguration on the seed (--config is the KubermaticConfiguration, not the kubeconfig)? The KC is global for the entire KKP setup and needs to be identical on every master/seed cluster.

Yet when checking the resources in the kubermatic namespace, all master components were gone.

That sounds weird... never experienced or seen this one. If that happens again, check if you

  1. have a KubermaticConfiguration in your kubermatic namespace, then
  2. check the kubermatic-operator logs

If there is no operator, check if the Deployment is gone or just for whatever reason scaled to 0. If an operator is running and you have a KubermaticConfiguration, then the operator should output errors if it cannot reconcile the master or seed clusters.
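
For example (the Deployment name kubermatic-operator is an assumption here; adjust it to whatever your Helm release created):

# Is the operator Deployment present and scaled up?
$ kubectl -n kubermatic get deployment kubermatic-operator
# Any reconcile errors in its logs?
$ kubectl -n kubermatic logs deployment/kubermatic-operator
# Is a KubermaticConfiguration present?
$ kubectl -n kubermatic get kubermaticconfigurations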

@almereyda
Author

Thank you! This was me reading/interpreting the documentation wrongly. It just felt right to have one configuration file for the master and one for the seed. Applying the seed manifest later on through kubectl seemed to break the flow with the kubermatic-installer. I believe many of these difficulties emerge from using a shared master/seed cluster, which mixes not only these two components but, with KubeOne, a third, all in a single cluster deployment.

Many of your clear statements would have helped if they had appeared in the documentation:

KKP lists all KubermaticConfigurations in its namespace and takes the first one it finds (if there are multiple, the operator will crash on purpose). So in effect the name you give your KubermaticConfiguration does not matter, as long as you have exactly 1 in your namespace.

The KC is global for the entire KKP setup and needs to be identical on every master/seed cluster.

Of course I should not pass the Seed object to the kubermatic-seed action, but the master's configuration.

Here we go with an updated run, and now this issue disappears:

$ kubermatic-installer deploy kubermatic-seed --config cluster.yaml --helm-values cluster.values.yaml --storageclass digitalocean 
INFO[16:55:46] 🚀 Initializing installer…                     edition="Community Edition" version=v2.20.6
INFO[16:55:46] 🚦 Validating the provided configuration…     
INFO[16:55:46] ✅ Provided configuration is valid.           
INFO[16:55:47] 🚦 Validating existing installation…          
INFO[16:55:47] ✅ Existing installation is valid.            
INFO[16:55:47] 🛫 Deploying KKP seed stack…                  
INFO[16:55:47]    💾 Deploying kubermatic-fast StorageClass… 
INFO[16:55:47]    ✅ StorageClass exists, nothing to do.     
INFO[16:55:47]    📦 Deploying Minio…                        
INFO[16:55:48]       Release is up-to-date, nothing to do. Set --force to re-install anyway. 
INFO[16:55:48]    ✅ Success.                                
INFO[16:55:48]    📦 Deploying S3 Exporter…                  
INFO[16:55:48]       Release is up-to-date, nothing to do. Set --force to re-install anyway. 
INFO[16:55:48]    ✅ Success.                                
INFO[16:55:48]    📦 Deploying KKP Dependencies…             
INFO[16:55:51]    ✅ Success.                                
INFO[16:55:51]    📡 Determining DNS settings…               
INFO[16:55:51]       The main LoadBalancer is ready.        
INFO[16:55:51]                                              
INFO[16:55:51]         Service             : kubermatic / nodeport-proxy 
INFO[16:55:51]         Ingress via IP      : 3.4.5.6 
INFO[16:55:51]                                              
INFO[16:55:51]       Please ensure your DNS settings for "cluster.example.com" includes the following record: 
INFO[16:55:51]                                              
INFO[16:55:51]          *.kubermatic.cluster.example.com.  IN  A  3.4.5.6 
INFO[16:55:51]                                              
INFO[16:55:51] 🛬 Installation completed successfully. Time for a break, maybe?

Thank you for your help, this was human error in the end.

To ease this for further iterations, could the deploy kubermatic-seed job fail if it doesn't find a KubermaticConfiguration in the provided config file? In other words: would it be good for deploy jobs to validate that a KubermaticConfiguration is present in the provided config file?

Else feel free to close here.

@xrstf
Contributor

xrstf commented Aug 11, 2022

To ease this for further iterations, could the deploy kubermatic-seed job fail if it doesn't find a KubermaticConfiguration in the provided config file?

That is exactly what I was wondering as well. Reading a YAML file, totally failing to find anything useful or resembling a KubermaticConfiguration, and then still continuing is both scary and amazing, given how nothing really "exploded". I remember that we switched to strict YAML unmarshalling, but maybe this broke when we updated to yaml.v3.

Just to be sure, your seed-cluster.yaml is actually a Seed as YAML and not a KubermaticConfiguration, right?

@almereyda
Author

almereyda commented Aug 11, 2022

Yes, it is a Seed configuration, thank you for asking. Happy to have found that slip.

I'm not sure how much YAML validation is built into Kubermatic; at least I've now learnt the term unmarshalling, which probably builds on native Go types. But when I validate my configuration with my own custom setup, I get valid results, without revealing the contents of the file seed-cluster.yaml:

$ ./seed.validate.sh seed-cluster.yaml split
Wrote split/secret-kubeconfig-kubermatic.yaml -- 9017 bytes.
Wrote split/seed-kubermatic.yaml -- 1131 bytes.
2 files generated.
Validating ./split/secret-kubeconfig-kubermatic.yaml...
Validation success! 👍
Validating ./split/seed-kubermatic.yaml...
Validation success! 👍
`seed.validate.sh`
#!/usr/bin/env bash
# Prerequisites:
#   pip install yamale
#   kubectl krew install slice
set -euo pipefail

# Split the combined manifest ($1) into one file per resource in the output directory ($2).
kubectl slice -f "$1" -o "$2"

files=( $(ls "$2"/*.yaml | cut -d"/" -f2) )

# Validate each file against the schema matching its resource type
# (the part of the file name before the first dash).
for i in "${!files[@]}"
do
  pattern=${files[i]%%-*}
  yamale -s "schema/seed-$pattern.schema.yaml" "$2/${files[i]}"
done

Plus these schema files.

`schema/seed-secret.schema.yaml`
apiVersion: regex('v1')
kind: regex('Secret')
metadata:
  name: regex('kubeconfig-kubermatic')
  namespace: regex('kubermatic')
type: regex('Opaque')
data:
  kubeconfig: str()
`schema/seed-seed.schema.yaml`
apiVersion: regex('kubermatic.k8c.io/v1')
kind: regex('Seed')
metadata:
    name: regex('kubermatic')
    namespace: regex('kubermatic')
spec:
    country: str(required=False)
    location: str(required=False)
    datacenters: map(include('datacenters_spec'),min=1)
    metering: include('metering_spec',required=False)
    mla: include('mla_spec',required=False)
    kubeconfig:
      name: regex('kubeconfig-kubermatic')
      namespace: regex('kubermatic')

---

datacenters_spec:
  country: str(required=False)
  location: str(required=False)
  spec: include('datacenter_spec')

---

datacenter_spec: map(any(), min=1)

---

metering_spec:
  enabled: bool()
  storageClassName: str()
  storageSize: str()

---

mla_spec:
  userClusterMLAEnabled: bool()

Maybe at some point Kubermatic could offer and/or use similar schemas to let users validate their manifests before applying them: think of a --dry-run flag for the deploy actions, or independent validate master and validate seed jobs.
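
As a stop-gap, and assuming the KKP CRDs are already installed in the target cluster, a server-side dry-run can at least validate the manifests against the installed CRD schemas before anything is applied (file names are the ones from this thread; the sops step assumes they are SOPS-encrypted like the configuration above):

$ sops -d cluster.yaml | kubectl apply --dry-run=server -f -
$ sops -d seed-cluster.yaml | kubectl apply --dry-run=server -f -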


If you don't need anything else from me, I'm happy to close here, since we already put a mitigation into place.

If you would like to continue with extended manual validation, or a separate --dry-run, I'm happy to create a follow-up story and close here, too.

@kubermatic-bot
Contributor

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubermatic-bot kubermatic-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2022
@kubermatic-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubermatic-bot kubermatic-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 10, 2022
@kubermatic-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@kubermatic-bot
Contributor

@kubermatic-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
