
Cilium CLI missing (symlink) when it should actually include the tool #6247

Closed
BloodyIron opened this issue Jun 25, 2024 · 16 comments

@BloodyIron

Environmental Info:
RKE2 Version: v1.26.15+rke2r1

The Cilium CNI installed via the Helm charts (using Rancher to provision the cluster) produces cluster node pods that do not have the cilium CLI application installed at all; instead, the cilium command is a symlink to cilium-dbg. The problem with this is that there is diagnostic and troubleshooting functionality I need in the cilium CLI tool that is not available in the other cilium-related commands.

I read the documentation and searched on Google, and cannot find a way to get the cilium CLI properly installed via this method. Can we please have this corrected? It makes most of the Cilium troubleshooting documentation useless, since it constantly refers to troubleshooting steps and functions that are only provided by the proper cilium CLI tool.
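
For reference, one way to see the symlink in question from a running cluster (a sketch; the daemonset name kube-system/cilium and the binary path are assumptions that may vary by install):

# exec into one of the cilium agent pods and inspect the binary
kubectl -n kube-system exec ds/cilium -- ls -l /usr/bin/cilium
# illustrative output: /usr/bin/cilium -> cilium-dbg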

@rbrtbnfgl
Contributor

rbrtbnfgl commented Jun 25, 2024

We mirror the cilium image from the Cilium repo. cilium-cli should be used when Cilium itself is installed using the CLI and not the Helm chart. What type of command do you need from the CLI?
I was able to use the Cilium CLI by building it from https://github.com/cilium/cilium-cli
I just modified the chart release name in the code:

--- a/defaults/defaults.go
+++ b/defaults/defaults.go
@@ -126,7 +126,7 @@ const (
        IngressSecretsNamespace = "cilium-secrets"
 
        // HelmReleaseName is the default Helm release name for Cilium.
-       HelmReleaseName               = "cilium"
+       HelmReleaseName               = "rke2-cilium"
        HelmValuesSecretName          = "cilium-cli-helm-values"
        HelmValuesSecretKeyName       = "io.cilium.cilium-cli"
        HelmChartVersionSecretKeyName = "io.cilium.chart-version"
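
A rough sketch of those build steps (assuming a Go toolchain and the repo's standard Makefile; the sed command is just one way to apply the one-line change shown above):

git clone https://github.com/cilium/cilium-cli
cd cilium-cli
# point the CLI at the rke2-cilium Helm release instead of the default "cilium"
sed -i 's/HelmReleaseName               = "cilium"/HelmReleaseName               = "rke2-cilium"/' defaults/defaults.go
make                  # builds the cilium binary in the repo root
./cilium version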

@BloodyIron
Author

We mirror the cilium image from the Cilium repo. cilium-cli should be used when Cilium itself is installed using the CLI and not the Helm chart. What type of command do you need from the CLI? I was able to use the Cilium CLI by building it from https://github.com/cilium/cilium-cli — I just modified the chart release name in the code

One quick example is checking clustermesh status, but there are others too: https://docs.cilium.io/en/stable/operations/troubleshooting/#automatic-verification

The point of me using Rancher and RKE2 is not having to go and recompile code or rehost my own variants of the tooling presented. I don't even know how I would make that kind of a modification without compromising future Rancher/RKE2/related updates, which is a significant concern of mine.

@rbrtbnfgl
Contributor

Are you sure that the status command is not working?
It should work. I can do some tests tomorrow and I'll give you some feedback.

@brandond
Member

To rephrase what @rbrtbnfgl said

Cilium-cli should only be used when Cilium itself is installed using the cli and not the helmchart.

The cilium status can be read from various CRDs, right?
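
For example, much of the agent and cluster state is exposed through Cilium's CRDs and can be inspected with kubectl alone (a sketch; resource names as defined by upstream Cilium):

# per-node agent state
kubectl get ciliumnodes.cilium.io
# per-pod endpoint state, across all namespaces
kubectl get ciliumendpoints.cilium.io -A
# the rendered agent configuration
kubectl -n kube-system get configmap cilium-config -o yaml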

@BloodyIron
Author

[screenshot omitted]

I don't even see the Cilium documentation mentioning CRDs for such troubleshooting, so I would be going in blind. And yes, I am sure that the command doesn't work @rbrtbnfgl, as the cilium command is a symlink to cilium-dbg, which has a completely different set of capabilities and commands.

@rbrtbnfgl
Contributor

Are you running the CLI from the cilium pod? Per the Cilium docs you have to install it directly on the node: https://docs.cilium.io/en/stable/operations/troubleshooting/#install-the-cilium-cli
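
The linked docs install the CLI roughly along these lines (a sketch of the upstream instructions; check the docs for the current version, OS, and architecture):

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}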

@rbrtbnfgl
Contributor

I was able to enable the clustermesh and get the right status.

cilium clustermesh status
⚠  Cluster not configured for clustermesh, use '--set cluster.id' and '--set cluster.name' with 'cilium install'. External workloads may still be configured.
⚠  Service type NodePort detected! Service may fail when nodes are removed from the cluster!
✅ Service "clustermesh-apiserver" of type "NodePort" found
✅ Cluster access information is available:
  - 10.1.1.11:32379
✅ Deployment clustermesh-apiserver is ready
ℹ  KVStoreMesh is disabled


🔌 No cluster connected

🔀 Global services: [ min:-1 / avg:0.0 / max:0 ]

@brandond
Member

@rbrtbnfgl can you perhaps show an example HelmChartConfig to enable and configure clustermesh via chart values?

@BloodyIron
Author

Are you running the CLI from the cilium pod? Per the Cilium docs you have to install it directly on the node: https://docs.cilium.io/en/stable/operations/troubleshooting/#install-the-cilium-cli

Yeah I'm not installing software on my nodes that doesn't come from a package manager source. That's just creating future problems I'm not interested in having (namely the package manager not being aware of it and never updating it).

I'm in agreement with @brandond that it seems preferable to have a helmchart config setting to enable this, or, I dunno, have it be present by default instead of how it is now. Any chance we can make that happen? (I'd prefer it just be there by default)

@brandond
Member

brandond commented Jun 27, 2024

Yeah I'm not installing software on my nodes that doesn't come from a package manager

This would be something to raise with the cilium team. We just consume their chart and images; we don't control how they package and distribute the node binaries.

I hope you also realize that RKE2 and Cilium are both already "installing software on your nodes" by extracting binaries from images and placing them on the root fs, without using the package manager.

it seems preferable to have a helmchart config setting to enable this, or I dunno... have it being present by default

Are you talking about enabling clustermesh by default? I don't think everyone would want that enabled by default. It also requires additional configuration:

Each cluster must be assigned a unique human-readable name as well as a numeric cluster ID (1-255). It is best to assign both these attributes at installation time of Cilium:
Helm options cluster.name and cluster.id

@rbrtbnfgl
Copy link
Contributor

rbrtbnfgl commented Jun 28, 2024

OK, I found a way to configure it. It wasn't so easy to do through Helm.

Create Cluster1:

RKE2 config:

write-kubeconfig-mode: 644
cluster-cidr: "10.42.0.0/16"
service-cidr: "10.43.0.0/16"
cni: "cilium"

Cilium value config:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster1
      id: 1
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster1
          ips:
          - <ip for the cluster one node>
          port: 32379

Once the first cluster is up, configure the second cluster with some information from the first.
You need to get the clustermesh apiserver certificate with:
kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o yaml
Get ca.crt, tls.crt and tls.key from the output
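
A shorthand for pulling those three fields out of the Secret individually (a sketch; the values come back base64-encoded, exactly as they appear in the -o yaml output):

kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o jsonpath='{.data.ca\.crt}'
kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o jsonpath='{.data.tls\.crt}'
kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o jsonpath='{.data.tls\.key}'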

Configure Cluster2:

RKE2 config:

write-kubeconfig-mode: 644
cluster-cidr: "10.44.0.0/16"
service-cidr: "10.45.0.0/16"
cni: "cilium"

Cilium Config:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster2
      id: 2
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster2
          ips:
          - <Ip for cluster2>
          port: 32379
        - name: cluster1
          ips:
          - <ip for cluster1>
          port: 32379
          tls:
            cert: "The content of tls.crt from cluster1"
            key: "The content of tls.key from cluster1"
            caCert: "The content of ca.crt from cluster1"

Once the second cluster has also started, get the same information from it that was previously taken from the first one with
kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o yaml
Get ca.crt, tls.crt and tls.key from the output

Edit the Cluster1 Cilium config

Add the new information from the second cluster to the Cilium config of the first cluster.
The new config should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster1
      id: 1
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster1
          ips:
          - <ip for the cluster one node>
          port: 32379
        - name: cluster2
          ips:
          - <ip for cluster2>
          port: 32379
          tls:
            cert: "The content of tls.crt from cluster2"
            key: "The content of tls.key from cluster2"
            caCert: "The content of ca.crt from cluster2"

Restart RKE2

sudo service rke2-server restart
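
Equivalently, on systemd-based nodes (an assumption about the init system in use):

sudo systemctl restart rke2-server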

The new configuration should then be applied. I checked the status and it was fine:

cilium clustermesh status
⚠  Service type NodePort detected! Service may fail when nodes are removed from the cluster!
✅ Service "clustermesh-apiserver" of type "NodePort" found
✅ Cluster access information is available:
  - 10.1.1.11:32379
✅ Deployment clustermesh-apiserver is ready
ℹ  KVStoreMesh is disabled

✅ All 1 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]

🔌 Cluster Connections:
  - cluster2: 1/1 configured, 1/1 connected

🔀 Global services: [ min:0 / avg:0.0 / max:0 ]

I got this from the first cluster.

@brandond
Member

That's great, we should add that to the docs @rbrtbnfgl !

@rbrtbnfgl
Contributor

rbrtbnfgl commented Jun 28, 2024

Maybe there is a way to generate and add the certificates beforehand, so you don't need to restart the nodes and can instead configure Cilium with the already-generated ones.

@BloodyIron
Author

This isn't about clustermesh specifically; that was simply an example of a function that is exclusive to the cilium CLI tool.

And to clarify, I did not mean that clustermesh should be on by default, but that the cilium CLI binary should be present by default, and not a symlink to cilium-dbg as it is currently.

Sorry for any time this may have cost you @rbrtbnfgl, but yeah, I wasn't specifically talking about (only) clustermesh, but about the cilium CLI tool being present so that it can be used, namely in scenarios such as the official Cilium troubleshooting documentation.

The reason I engaged rke2 on this topic first is that it seemed plausible to me (as an outsider) that the way rke2 implements cilium "made" this change (not having the cilium CLI application), since it would be silly for the Cilium team themselves to do that (it would break a good chunk of the troubleshooting documentation, as we're seeing).

But if there's no way from an rke2 "perspective" to get the cilium CLI app existing (as in, not a symlink), well, I can appeal to the Cilium people (or maybe the rke2 team could?). But... are all options exhausted?

There's still plenty for me to learn when it comes to k8s ;)

@rbrtbnfgl
Contributor

We don't modify anything in the Cilium image; I think the image ships cilium-dbg by design. If you don't want to install anything on the node, you can run the Cilium CLI from any client machine; like kubectl, it only needs credentials for the cluster.
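
For instance, from a workstation (a sketch; the kubeconfig path is a hypothetical example, and note the earlier caveat that the CLI's default Helm release name may need to match rke2-cilium):

export KUBECONFIG=$HOME/.kube/rke2-cluster1.yaml   # hypothetical path to this cluster's kubeconfig
cilium status
cilium clustermesh status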

@github-actions

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@github-actions github-actions bot closed this as not planned (stale) Aug 30, 2024