Problems initializing SRLinux #494

Open
yennym3 opened this issue Feb 9, 2024 · 9 comments

Comments

@yennym3

yennym3 commented Feb 9, 2024

Hi,
I'm deploying the topology 2node-srl-ixr6-with-oc-services.pbtxt, but the containers never stay 'ready' and 'running'; they keep restarting constantly.

Deploying topology:

kne create 2node-srl-ixr6-with-oc-services.pbtxt
I0209 10:00:54.369740 3846339 root.go:119] /home/mw/kne/examples/nokia/srlinux-services
I0209 10:00:54.371543 3846339 topo.go:117] Trying in-cluster configuration
I0209 10:00:54.371573 3846339 topo.go:120] Falling back to kubeconfig: "/home/mw/.kube/config"
I0209 10:00:54.374046 3846339 topo.go:253] Adding Link: srl1:e1-1 srl2:e1-1
I0209 10:00:54.374077 3846339 topo.go:291] Adding Node: srl1:NOKIA
I0209 10:00:54.424631 3846339 topo.go:291] Adding Node: srl2:NOKIA
I0209 10:00:54.459290 3846339 topo.go:358] Creating namespace for topology: "2-srl-ixr6"
I0209 10:00:54.484813 3846339 topo.go:368] Server Namespace: &Namespace{ObjectMeta:{2-srl-ixr6    4b34dc30-d2b2-4340-a901-8967fb08c69e 82945402 0 2024-02-09 10:00:54 +0000 UTC <nil> <nil> map[kubernetes.io/metadata.name:2-srl-ixr6] map[] [] [] [{kne Update v1 2024-02-09 10:00:54 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
I0209 10:00:54.485491 3846339 topo.go:395] Getting topology specs for namespace 2-srl-ixr6
I0209 10:00:54.485510 3846339 topo.go:324] Getting topology specs for node srl1
I0209 10:00:54.485574 3846339 topo.go:324] Getting topology specs for node srl2
I0209 10:00:54.485610 3846339 topo.go:402] Creating topology for meshnet node srl1
I0209 10:00:54.507333 3846339 topo.go:402] Creating topology for meshnet node srl2
I0209 10:00:54.522376 3846339 topo.go:375] Creating Node Pods
I0209 10:00:54.522726 3846339 nokia.go:201] Creating Srlinux node resource srl1
I0209 10:00:54.537059 3846339 nokia.go:206] Created SR Linux node srl1 configmap
I0209 10:00:54.631596 3846339 nokia.go:265] Created Srlinux resource: srl1
I0209 10:00:54.764968 3846339 topo.go:380] Node "srl1" resource created
I0209 10:00:54.765040 3846339 nokia.go:201] Creating Srlinux node resource srl2
I0209 10:00:54.780052 3846339 nokia.go:206] Created SR Linux node srl2 configmap
I0209 10:00:54.910542 3846339 nokia.go:265] Created Srlinux resource: srl2
I0209 10:00:55.028768 3846339 topo.go:380] Node "srl2" resource created
I0209 10:04:15.460792 3846339 topo.go:448] Node "srl1": Status RUNNING

Status of the pods:

k get pods -n 2-srl-ixr6 
NAME   READY   STATUS    RESTARTS     AGE
srl1   0/1     Running   1 (9s ago)   13s
srl2   0/1     Running   1 (9s ago)   13s

k get pods -n 2-srl-ixr6 
NAME   READY   STATUS                  RESTARTS     AGE
srl1   0/1     Init:CrashLoopBackOff   1 (8s ago)   16s
srl2   0/1     Init:CrashLoopBackOff   1 (8s ago)   16s

k get pods -n 2-srl-ixr6 
NAME   READY   STATUS   RESTARTS   AGE
srl1   0/1     Error    2          32s
srl2   0/1     Error    2          32s

Events for the pod srl1:

Events:
 Type     Reason          Age                   From               Message
 ----     ------          ----                  ----               -------
 Normal   Scheduled       6m9s                  default-scheduler  Successfully assigned 2-srl-ixr6/srl1 to k8worker4
 Normal   Killing         5m59s (x2 over 6m5s)  kubelet            Stopping container srl1
 Warning  BackOff         5m56s                 kubelet            Back-off restarting failed container init-srl1 in pod srl1_2-srl-ixr6(600952b1-695d-44c3-95a0-a68ba2f9be5a)
 Normal   SandboxChanged  5m55s (x3 over 6m5s)  kubelet            Pod sandbox changed, it will be killed and re-created.
 Normal   Pulled          5m50s (x3 over 6m8s)  kubelet            Container image "ghcr.io/srl-labs/init-wait:latest" already present on machine
 Normal   Created         5m50s (x3 over 6m8s)  kubelet            Created container init-srl1
 Normal   Started         5m50s (x3 over 6m7s)  kubelet            Started container init-srl1
 Warning  BackOff         5m50s                 kubelet            Back-off restarting failed container srl1 in pod srl1_2-srl-ixr6(600952b1-695d-44c3-95a0-a68ba2f9be5a)
 Normal   Pulled          5m49s (x3 over 6m6s)  kubelet            Container image "ghcr.io/nokia/srlinux" already present on machine
 Normal   Created         5m48s (x3 over 6m6s)  kubelet            Created container srl1
 Normal   Started         5m48s (x3 over 6m6s)  kubelet            Started container srl1

@LimeHat

LimeHat commented Feb 10, 2024

You need to have a license for srlinux

@yennym3
Author

yennym3 commented Feb 12, 2024

> You need to have a license for srlinux

The documentation at https://learn.srlinux.dev/tutorials/infrastructure/kne/installation/#license says it is possible to use SR Linux without a license by removing certain fields. I have tried that, but the error described above still occurs.

@hellt
Contributor

hellt commented Feb 12, 2024

Without the exact topology you are trying to start, it is not possible to answer any questions.

@yennym3
Author

yennym3 commented Feb 12, 2024

> Without sharing the exact topology you try to start it is not possible to answer any questions

The topology I am testing is exactly the same as the one provided in the example repository, https://github.com/openconfig/kne/blob/main/examples/nokia/srlinux-services/2node-srl-ixr6-with-oc-services.pbtxt

@hellt
Contributor

hellt commented Feb 12, 2024

It can't be the same, since you should have removed the ixr6e model from the topology and the OpenConfig models from the config.

@yennym3
Author

yennym3 commented Feb 12, 2024

> it can't be the same, since you should have removed the ixr6e model from the topology and openconfig models from the config

I'm sorry for the confusion. I meant that the topology I'm using is based on the example from the repository. I've tested it both with and without the 'ixr6e' model and the OpenConfig models in the configuration, and I get the same result in both cases.

@LimeHat

LimeHat commented Feb 13, 2024

You need to investigate the pod logs to understand the reason. Most likely you need more changes than simply removing the model and the OpenConfig container; there are a few other things in the config that are not supported on other platforms.

Starting with the default config is your best bet; reusing the configs from the ixr6/10 examples on other platforms is unlikely to give you good results.

@yennym3
Author

yennym3 commented Mar 14, 2024

Hi,

The pod restarts I described earlier are still happening. The tutorial at https://learn.srlinux.dev/tutorials/infrastructure/kne/installation/#__tabbed_2_1 uses kind as the test Kubernetes cluster; in my case the restarts occur when I deploy the pods on an external cluster created with kubeadm rather than kind.

When the pods are created, I see the following errors in the srlinux-controller logs:

1.7104141231090307e+09  INFO    updating srlinux status {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "srlinux-status": {"status":"Pending","image":"ghcr.io/nokia/srlinux:latest","startup-config":{}}}
1.7104141231321757e+09  ERROR   failed to update Srlinux status {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "error": "Operation cannot be fulfilled on srlinuxes.kne.srlinux.dev \"srl1\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/srl-labs/srl-controller/controllers.(*SrlinuxReconciler).updateSrlinuxStatus
        /workspace/controllers/srlinux_controller.go:265
github.com/srl-labs/srl-controller/controllers.(*SrlinuxReconciler).Reconcile
        /workspace/controllers/srlinux_controller.go:123
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234
1.7104141231323476e+09  ERROR   Reconciler error        {"controller": "srlinux", "controllerGroup": "kne.srlinux.dev", "controllerKind": "Srlinux", "Srlinux": {"name":"srl1","namespace":"2srl-prueba-2"}, "namespace": "2srl-prueba-2", "name": "srl1", "reconcileID": "f0b1efe1-1c56-44a9-a205-6dd38b58f561", "error": "Operation cannot be fulfilled on srlinuxes.kne.srlinux.dev \"srl1\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234

I have tried deploying SR Linux on a kind cluster, and this problem does not happen there.

Has anyone had this problem on a cluster other than kind, and does anyone know how to solve it?

@hellt
Contributor

hellt commented Mar 14, 2024

This error on its own doesn't cause any issues; the reconciliation still happens.
If your pods are not coming up, something else is preventing it, not the reconciliation error.
I have seen this error in my clusters too, but it is transient and goes away.


3 participants