Container Stuck in ContainerCreating | no runtime for "spin" is configured #209

Closed

megamxl opened this issue Apr 21, 2024 · 6 comments

megamxl commented Apr 21, 2024

This issue is already resolved, but I didn't know where else to share the information. I'm posting it here so that others who get stuck on this can find the fix.

kubectl describe pods

The failure was: Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured

Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               4m38s                default-scheduler  Successfully assigned default/simple-spinapp-56687588d9-w9h9k to node
  Warning  FailedCreatePodSandBox  3s (x23 over 4m38s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured

I fixed it with a comment from the Fermyon Discord server, which isn't easy for a search engine to crawl.
The fix is from @stevesloka on Discord.

Edit this file on each host: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

and add this:

{{ template "base" . }}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
runtime_type = "io.containerd.spin.v1"

https://docs.k3s.io/advanced?_highlight=config.toml.tmp#configuring-containerd

I then rebooted all nodes and was finally able to deploy apps.
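
A quick way to verify that the change took effect (a sketch; it assumes k3s re-renders config.toml.tmpl into config.toml on restart, as described in the k3s docs linked above):

    # restart k3s so the template is re-rendered
    $ sudo systemctl restart k3s
    # check the rendered config for the new runtime entry
    $ grep -A1 'runtimes.spin' /var/lib/rancher/k3s/agent/etc/containerd/config.toml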

megamxl commented Apr 21, 2024

Update: it works fine for applications without autoscaling. When I apply the HPA example, I still have the same problem. Does anyone know why this is happening?

@endocrimes (Contributor) commented:

@kate-goldenring had issues scaling on k3s that looked like they were tied to concurrent pull limits in the kubelet when scaling up by more than a couple of replicas at a time.

I haven't ever dug in further to see if they were easily fixable (or specifically related to SpinKube) though.

If you could attach kubelet logs they would be super helpful for seeing if it's the same issue and further triage. Thanks 🎉
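
For collecting those logs on k3s: the kubelet runs embedded in the k3s binary, so its output ends up in the k3s service journal (a sketch, assuming systemd-managed nodes; agent-only nodes use the k3s-agent unit instead):

    # server nodes
    $ sudo journalctl -u k3s --since "1 hour ago"
    # agent nodes
    $ sudo journalctl -u k3s-agent --since "1 hour ago"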


megamxl commented Apr 22, 2024

I have done some further tests on a new cluster and documented exactly how I got normal Spin apps (without scaling) working:

  1. Set up the k3s cluster.
  2. Install all required dependencies, following the "Installing with Helm" section: https://www.spinkube.dev/docs/spin-operator/installation/installing-with-helm/
  3. Check /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl, delete all spin settings, and add only the following to the end of the file:

... standard k3s containerd configuration ...

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
    runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"

  4. Then I restarted the k3s service (see https://docs.k3s.io/upgrades/killall):
    $ sudo systemctl restart k3s

Afterward, I was able to run the example.
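
A quick sanity check at this point (a sketch; the shim path comes from the config above, and the RuntimeClass name wasmtime-spin-v2 is assumed from the SpinKube quickstart - its handler has to match the "spin" runtime name in containerd):

    # the shim binary installed by the kwasm node installer
    $ ls -l /opt/kwasm/bin/containerd-shim-spin-v2
    # the RuntimeClass whose handler must be "spin"
    $ kubectl get runtimeclass wasmtime-spin-v2 -o yaml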

The next comment will describe the scaling part


megamxl commented Apr 22, 2024

When I manually check the pods with kubectl, I get values:

$ kubectl top pod

NAME                              CPU(cores)   MEMORY(bytes)
hpa-spinapp-d75d89476-w2d9m       1m           19Mi
simple-spinapp-56687588d9-r46kd   1m           13Mi

When I do kubectl get hpa, I also get values:

NAME                 REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spinapp-autoscaler   Deployment/hpa-spinapp   1%/50%    1         10        1          101s

After trying the ingress I noticed that it was unable to scale, and when I removed and re-added the HPA I got:

kubectl get  hpa
NAME                 REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
spinapp-autoscaler   Deployment/hpa-spinapp   <unknown>/50%   1         10        1          28s

kubectl describe hpa

Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       2s    horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  2s    horizontal-pod-autos

I hope you can work with this info

When I tried 5 concurrent invocations, the result was 1% CPU all the time. I also tried to change the interval of the metrics server, but this didn't fix anything either.
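
For reference, a load-generation sketch along the lines of the Kubernetes HPA walkthrough (the hpa-spinapp service name and port 80 are assumptions here; adjust them to however the app is actually exposed):

    $ kubectl run load-generator --rm -i --tty --image=busybox --restart=Never -- \
        /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-spinapp:80; done"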

I can confirm that the communication between the metrics server and the Spin apps does not work, because when I put the pods under load nothing happens and the reading stays at 1%.
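
One way to narrow this down (a sketch that queries the resource metrics API served by metrics-server) is to ask the API directly and see whether the Spin app pods report any CPU at all:

    $ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"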


kate-goldenring commented Apr 22, 2024

@megamxl, as @endocrimes mentioned, I had some issues with k3s. One was the one you described: the shim wasn't being used until I restarted the k3s service (which contains containerd). However, a change was added to the node installer to run systemctl restart k3s: https://github.com/spinkube/containerd-shim-spin/pull/43/files#diff-716e6e24b7a494f27721cbbd94d70fba41081dcfefbd8a5ca81eea88a7b3de17R49. These changes should be in the latest node installer version (v0.13.1).

The other issue I was experiencing was pods restarting with ErrImagePull and ImagePullBackOff due to exceeding the max pulls on k3s when using a latest-tagged image. Switching to a versioned tag resolved that issue, since latest changes the pull policy to Always and I was scaling to 50 replicas.
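
If you want to confirm which pull policy your pods ended up with, a small sketch (the deployment name is taken from the HPA example above; the jsonpath is the standard Pod spec field):

    $ kubectl get deployment hpa-spinapp \
        -o jsonpath='{.spec.template.spec.containers[0].imagePullPolicy}'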

For HPA, I believe I also had delays in calculating CPU usage, with it showing unknown for a minute, and I could rarely get it to scale above 2%. I assumed this was because I was not incurring enough load, but it could be an issue with the metrics server that comes with k3s.
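
To rule out the bundled metrics-server itself, a quick sketch (the k8s-app=metrics-server label and the deployment name are assumed from the upstream metrics-server manifests that k3s bundles):

    $ kubectl -n kube-system get pods -l k8s-app=metrics-server
    $ kubectl -n kube-system logs deploy/metrics-server --tail=50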

@endocrimes (Contributor) commented:

Closing this, as it's a cluster configuration issue more than an operator issue that we can act on - but definitely something to keep in mind when using k3s.
