Kata Pods with more than 5-6 containers fail to fully start under CRI-O when using runtime_type=vm #2795
Evan, I'll try to reproduce your issue at some point tomorrow. I'd just like to mention that:
I upgraded Kata to 1.11.2. I'm still getting the same failure, but I did get some extra information out of the logs:
@evanfoster Could you try with small chunks of memory? The following YAML works for me; it's based on your example but with small chunks of memory.
EDIT: set `apiVersion: apps/v1`

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qemu-guest-empty-dir
spec:
  selector:
    matchLabels:
      app: qemu-guest-empty-dir
  replicas: 1
  template:
    metadata:
      labels:
        app: qemu-guest-empty-dir
    spec:
      containers:
      - image: alpine
        name: qemu-0
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-1
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-2
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-3
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-4
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-5
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-6
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-7
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-8
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-9
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      - image: alpine
        name: qemu-10
        command: ["sh"]
        stdin: true
        volumeMounts:
        - name: test-volume
          mountPath: /test-volume
        tty: true
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      runtimeClassName: kata
      restartPolicy: Always
      volumes:
      - name: test-volume
        emptyDir: {}
```
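The manifest above is repetitive, which makes experimenting with different container counts tedious. As a convenience (not part of the original report), a small generator that emits the equivalent Deployment in JSON form, which `kubectl apply -f` also accepts; the names and resource values mirror the manifest above:

```python
import json

def make_deployment(n_containers, image="alpine", mem="128Mi"):
    """Build the repro Deployment with n identical containers."""
    def container(i):
        return {
            "image": image,
            "name": f"qemu-{i}",
            "command": ["sh"],
            "stdin": True,
            "tty": True,
            "volumeMounts": [{"name": "test-volume", "mountPath": "/test-volume"}],
            "resources": {
                "requests": {"memory": mem, "cpu": "250m"},
                "limits": {"memory": mem, "cpu": "500m"},
            },
        }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "qemu-guest-empty-dir"},
        "spec": {
            "selector": {"matchLabels": {"app": "qemu-guest-empty-dir"}},
            "replicas": 1,
            "template": {
                "metadata": {"labels": {"app": "qemu-guest-empty-dir"}},
                "spec": {
                    "containers": [container(i) for i in range(n_containers)],
                    "runtimeClassName": "kata",
                    "restartPolicy": "Always",
                    "volumes": [{"name": "test-volume", "emptyDir": {}}],
                },
            },
        },
    }

# Write a manifest with 11 containers, then `kubectl apply -f repro.json`
with open("repro.json", "w") as f:
    json.dump(make_deployment(11), f, indent=2)
```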
No joy, unfortunately. I set
I realized I wasn't capturing debug logs in my original post. This gist has the output of
@evanfoster thanks, that's a good log! Could you resize
@evanfoster make sure
No joy again, unfortunately.
Here are the logs from this attempt: https://gist.github.com/evanfoster/12a3c8c4fe41068a50fc764c6816df84
`a used vhost backend has no free memory slots left`
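This error is QEMU's vhost layer reporting that it cannot register another guest RAM region with the backend. As a rough, hypothetical model of why this bites at around 5-6 containers (every number here is an assumption for illustration, not a value measured on this cluster): legacy vhost-user backends advertise only 8 memory slots, the guest's boot RAM already occupies a couple of them, and Kata hotplugs one memory DIMM (one region, one slot) per added container:

```python
# Back-of-the-envelope model of the "no free memory slots" failure.
# All constants are assumptions for illustration, not values read from Kata.
LEGACY_VHOST_USER_SLOTS = 8  # historical vhost-user region limit (assumed)
BOOT_MEMORY_REGIONS = 2      # slots used by the guest's boot RAM (assumed)

def containers_before_failure(slots=LEGACY_VHOST_USER_SLOTS,
                              boot_regions=BOOT_MEMORY_REGIONS):
    """If each container hotplugs one DIMM (one slot), how many fit?"""
    return slots - boot_regions

print(containers_before_failure())  # 6 under these assumptions
```

Under this model, raising the kernel `vhost` module's limit or QEMU's `memory_slots` would not help, because the bottleneck is the slot count negotiated with the vhost-user backend, which is consistent with the later comments about needing newer QEMU.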
@evanfoster Curious if you were able to try this out with containerd.
@amshinde I haven't done so yet. The nodes I'm testing on have containerd, but all of the plumbing is set up to use CRI-O. I'll see if I can give that a whirl, but it may take me a bit. @dagrh I'm guessing I'd also want
EDIT: Oops, I might have a hard time trying out
EDIT: Wait a moment, I can just run using 9p instead of virtio-fs. I'll give that a whirl tomorrow.
@evanfoster Just gave this a shot with containerd (reducing cpu limits due to a smaller machine). I got the error you were seeing initially - not enough memory slots. I then increased
@evanfoster I confirm I can see the
@dagrh @fidencio I tried building qemu from master, but I am seeing errors.
@fidencio This ties into my previous request for moving to qemu 5.0 for virtiofs and having kata-runtime work with the upstream virtiofsd.
Is there any way I can be of assistance? I'm happy to help out however I can.
@evanfoster We need to port qemu to 5.0 with any upstream changes required.
I can trigger the same error on qemu 4.2 (without kata) by adding:

and it gives me the error:

and that's working on current head qemu; so it does look like that got fixed.
I've just created a branch based off qemu 5.0 but with the extra slot stuff and our dax code; please try and let me know if that fixes it.
It works! I'm seeing a pod with 11 containers fully start.
Excellent! @amshinde what do you want us to do here, do you want to hold this open until kata updates the qemu version?
Thanks @dagrh for creating the branch. Yes, let's keep this open until we move to the branch. I'll close this once the move to that branch is complete.
The qemu branch with the fix has been merged in kata. I have verified that the pod now works with more than 10 containers with virtio-fs. I am going to close this. Since this fix is not yet implemented for clh, I am going to open a separate issue to track it. cc @dagrh @jcvenegas
Description of problem
When deploying a pod with a large number of containers (example repro case here), several containers will fail to come up (in k8s, the state is `CreateContainerError`). This only happens when `runtime_type` is set to `vm` (which we need to improve virtio-fs performance). When these containers fail, we see the following message in CRI-O's logs:

Here's a non-exhaustive list of the options I've played with to try to address this:
- Increasing `hypervisor.qemu.memory_slots` from 50 to 100
- Increasing `hypervisor.qemu.default_memory`
- Using `virtio-9p` instead of `virtio-fs`
- Switching the storage driver from `devicemapper` to `vfs`
- Setting `hypervisor.qemu.enable_mem_prealloc` to `true`
- Loading the `vhost` kernel module with `max_mem_regions=512`
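For reference, the last item uses the kernel `vhost` module's `max_mem_regions` parameter (the current value is visible at `/sys/module/vhost/parameters/max_mem_regions`). The persistent form of that option is a one-line modprobe fragment like the following; the filename is a conventional choice:

```
# /etc/modprobe.d/vhost.conf -- reload the vhost modules (or reboot)
# for the new value to take effect
options vhost max_mem_regions=512
```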
Unfortunately, none of this had any effect. The only way I can get all containers in a large pod to run is to set `runtime_type` to `oci` in CRI-O.

Show kata-collect-data.sh details
Meta details
Running `kata-collect-data.sh` version `1.10.0-adobe (commit 2d2b8878c511d6f5ba65a6152c577608ee27e45d-dirty)` at `2020-06-25.17:00:58.766166856+0000`.
Runtime is `/opt/kata/bin/kata-runtime`.

kata-env
Output of "`/opt/kata/bin/kata-runtime kata-env`":

Runtime config files
Runtime default config files
Runtime config file contents
Output of "`cat "/etc/kata-containers/configuration.toml"`":

Output of "`cat "/opt/kata/share/defaults/kata-containers/configuration.toml"`":

Removed, since it's not being used.
KSM throttler
version
Output of "`--version`":

systemd service
Image details
Initrd details
No initrd
Logfiles
Runtime logs
Recent runtime problems found in system journal:
Proxy logs
No recent proxy problems found in system journal.
Shim logs
Recent shim problems found in system journal:
Throttler logs
No recent throttler problems found in system journal.
Container manager details
Have `docker`, but it's not in use. Removing this section.
Have `kubectl`

Kubernetes
Output of "`kubectl version`":

Output of "`kubectl config view`":

Output of "`systemctl show kubelet`":

Have `crio`

crio
Output of "`crio --version`":

Output of "`systemctl show crio`":

Output of "`cat /etc/crio/crio.conf`":

Have `containerd`, but it's not being used. Removing this section.

Packages
No `dpkg`
No `rpm`
Additional details
When
runtime_type
is set tovm
, pods don't appear to clean up correctly. I had initially thought this was due todevmapper
cleanup failures, but I see this same issue when we're usingvfs
as our storage driver. Deleted pods look like this:The text was updated successfully, but these errors were encountered: