Qemu errors when virtiofsd is killed (race condition) #6757

Closed
beraldoleal opened this issue Apr 28, 2023 · 3 comments · Fixed by #6959
Labels
bug Incorrect behaviour needs-review Needs to be assessed by the team.

Comments

@beraldoleal
Member

Description of problem

Occasionally, when halting a pod, systemctl displays multiple error messages originating from QEMU. Although these errors are legitimate, they clutter the log files and may create a negative impression for users.

Expected result

Upon stopping a pod, systemctl should display only relevant and essential messages without cluttering the log files. Ideally, the QEMU-related error messages should be filtered or minimized, or we should investigate the root cause (perhaps by giving virtiofsd more time to stop). This would give a cleaner, more user-friendly log and a better user experience.

Actual result

level=error msg="Failed to read guest console logs" console-protocol=unix console-url=/run/vc/vm/058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514/console.sock error="read unix @->/run/vc/vm/058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514/console.sock: use of closed network connection" name=containerd-shim-v2 pid=181775 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers subsystem=sandbox
level=error msg="qemu-kvm: Failed to write msg. Wrote -1 instead of 20." name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: Failed to set msg fds." name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)" name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: Failed to set msg fds." name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)" name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: Failed to set msg fds." name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: vhost_set_vring_call failed 22" name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: Failed to set msg fds." name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu
level=error msg="qemu-kvm: vhost_set_vring_call failed 22" name=containerd-shim-v2 pid=181775 qemuPid=181784 sandbox=058f54c49d389618b0db45a838d9729f19b0681b055eb25fef160736c8928514 source=virtcontainers/hypervisor subsystem=qemu

Further information

From @gkurz:

"The QEMU errors are the consequence of QEMU detecting that virtiofsd is gone, which is legitimate."

An option could be to send a SIGTERM instead of a SIGKILL to virtiofsd, the same way nydusd is doing.

Another option could be to simply turn QEMU logging off before terminating virtiofsd.
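A softer variant of this second option is to drop only the messages that are expected once teardown has begun, rather than silencing QEMU entirely. The Go sketch below shows one way that could look. The type `qemuLogFilter`, its methods, and the fragment list (copied from the log lines in this report) are hypothetical and do not correspond to existing runtime code.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// expectedOnTeardown holds message fragments taken from the errors reported
// in this issue. The list is illustrative, not exhaustive.
var expectedOnTeardown = []string{
	"Failed to write msg.",
	"Failed to set msg fds.",
	"ring restore failed",
	"vhost_set_vring_call failed",
}

// qemuLogFilter (hypothetical) tracks whether sandbox teardown has started.
type qemuLogFilter struct {
	stopping atomic.Bool
}

// BeginTeardown should be called just before virtiofsd is terminated.
func (f *qemuLogFilter) BeginTeardown() {
	f.stopping.Store(true)
}

// Suppress reports whether a QEMU log line should be dropped: only lines that
// match a known fragment, and only after teardown has begun.
func (f *qemuLogFilter) Suppress(line string) bool {
	if !f.stopping.Load() {
		return false
	}
	for _, frag := range expectedOnTeardown {
		if strings.Contains(line, frag) {
			return true
		}
	}
	return false
}

func main() {
	f := &qemuLogFilter{}
	line := "qemu-kvm: Failed to set msg fds."

	fmt.Println(f.Suppress(line)) // before teardown: line is kept
	f.BeginTeardown()
	fmt.Println(f.Suppress(line)) // after teardown: line is dropped
}
```

Matching on fragments keeps unrelated QEMU errors visible during shutdown, at the cost of maintaining the list as QEMU's wording changes.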

@beraldoleal beraldoleal added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels Apr 28, 2023
@beraldoleal
Member Author

Show kata-collect-data.sh details

Meta details

Running kata-collect-data.sh version 3.2.0-alpha0 (commit 8653b3acd5da506bbf13b8fadb5eb0d06caaf2d2) at 2023-04-28.12:04:40.762413289-0400.


Runtime

Runtime is /usr/local/bin/kata-runtime.

kata-env

/usr/local/bin/kata-runtime kata-env

[Kernel]
  Path = "/tmp/opt/kata/share/kata-containers/vmlinux-5.19.2-100"
  Parameters = "scsi_mod.scan=none agent.log=debug"

[Meta]
  Version = "1.0.26"

[Image]
  Path = ""

[Initrd]
  Path = "/usr/share/kata-containers/kata-containers-initrd-2023-04-21-17:09:16.716634643-0400-97291d88e"

[Hypervisor]
  MachineType = "q35"
  Version = "QEMU emulator version 7.2.0 (qemu-7.2.0-6.fc38)\nCopyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  SharedFS = "virtio-fs"
  VirtioFSDaemon = "/usr/libexec/virtiofsd"
  SocketPath = ""
  Msize9p = 8192
  MemorySlots = 10
  PCIeRootPort = 0
  HotplugVFIOOnRootBus = false
  Debug = true

[Runtime]
  Path = "/usr/local/bin/kata-runtime"
  GuestSeLinuxLabel = ""
  Debug = true
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  SandboxCgroupOnly = false
  [Runtime.Config]
    Path = "/etc/kata-containers/configuration.toml"
  [Runtime.Version]
    OCI = "1.0.2-dev"
    [Runtime.Version.Version]
      Semver = "3.2.0-alpha0"
      Commit = "8653b3acd5da506bbf13b8fadb5eb0d06caaf2d2"
      Major = 3
      Minor = 2
      Patch = 0

[Host]
  AvailableGuestProtections = ["sev"]
  Kernel = "6.2.11-300.fc38.x86_64"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Fedora Linux"
    Version = "38"
  [Host.CPU]
    Vendor = "AuthenticAMD"
    Model = "AMD EPYC 7302P 16-Core Processor"
    CPUs = 32
  [Host.Memory]
    Total = 131478820
    Free = 116603172
    Available = 127897508

[Agent]
  Debug = true
  Trace = false


Runtime config files

Runtime default config files

/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml

Runtime config file contents

cat "/etc/kata-containers/configuration.toml"

# Copyright (c) 2017-2019 Intel Corporation
# Copyright (c) 2021 Adobe Inc.
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration-qemu.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/qemu-system-x86_64"
#kernel = "/usr/share/kata-containers/vmlinux.container"
kernel = "/tmp/opt/kata/share/kata-containers/vmlinux.container"
#image = "/usr/share/kata-containers/kata-containers.img"
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
machine_type = "q35"

# rootfs filesystem type:
#   - ext4 (default)
#   - xfs
#   - erofs
rootfs_type="ext4"

# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
#   - CPU Hotplug 
#   - Memory Hotplug
#   - NVDIMM devices
#
# Default false
# confidential_guest = true

# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true

# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true

# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for "io.katacontainers.config.hypervisor.path"
enable_annotations = ["enable_iommu"]

# List of valid annotations values for the hypervisor
# Each member of the list is a path pattern as described by glob(3).
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/usr/bin/qemu-system-x86_64"]
valid_hypervisor_paths = ["/usr/bin/qemu-system-x86_64"]

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Path to the firmware volume.
# firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables
# as configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI variables
# can be customized per each user while UEFI code is kept same.
firmware_volume = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
# For example, `seccompsandbox= "on,obsolete=deny,spawn=deny,resourcecontrol=deny"`
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="on,obsolete=deny,spawn=deny,resourcecontrol=deny"

# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"`
cpu_features="pmu=off"

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU
# the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what are you doing.
# NOTICE: on arm platform with gicv2 interrupt controller, set it to 8.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set 2048 MiB.
default_memory = 2048
#
# Default memory slots per SB/VM.
# If unspecified then it will be set 10.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = 10

# Default maximum memory in MiB per SB / VM
# unspecified or == 0           --> will be set to the actual amount of physical RAM
# > 0 <= amount of physical RAM --> will be set to the specified number
# > amount of physical RAM      --> will be set to the actual amount of physical RAM
default_maxmemory = 0

# The size in MiB will be added to the max memory of the hypervisor.
# It is the memory address space for the NVDIMM device.
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0

# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# virtio-fs is used instead to pass the rootfs.
disable_block_device_use = false

# Shared file system type:
#   - virtio-fs (default)
#   - virtio-9p
#   - virtio-fs-nydus
shared_fs = "virtio-fs"

# Path to vhost-user-fs daemon.
virtio_fs_daemon = "/usr/libexec/virtiofsd"

# List of valid annotations values for the virtiofs daemon
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/usr/libexec/virtiofsd"]
valid_virtio_fs_daemon_paths = ["/usr/libexec/virtiofsd"]

# Default size of DAX cache in MiB
virtio_fs_cache_size = 0

# Default size of virtqueues
virtio_fs_queue_size = 1024

# Extra args for virtiofsd daemon
#
# Format example:
#   ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"]
# Examples:
#   Set virtiofsd log level to debug : ["-o", "log_level=debug"] or ["-d"]
#
# see `virtiofsd -h` for possible options.
virtio_fs_extra_args = ["--thread-pool-size=1", "-o", "announce_submounts"]

# Cache mode:
#
#  - never
#    Metadata, data, and pathname lookup are not cached in guest. They are
#    always fetched from host and any changes are immediately pushed to host.
#
#  - auto
#    Metadata and pathname lookup cache expires after a configured amount of
#    time (default is 1 second). Data is cached while the file is open (close
#    to open consistency).
#
#  - always
#    Metadata, data, and pathname lookup are cached in guest and never expire.
virtio_fs_cache = "auto"

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is virtio-scsi, virtio-blk
# or nvdimm.
block_device_driver = "virtio-scsi"

# aio is the I/O mechanism used by qemu
# Options:
#
#   - threads
#     Pthread based disk I/O.
#
#   - native
#     Native Linux I/O.
#
#   - io_uring
#     Linux io_uring API. This provides the fastest I/O operations on Linux, requires kernel>5.1 and
#     qemu >=5.0.
block_device_aio = "io_uring"

# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true

# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true

# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true

# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
# major range 240-254 being chosen to represent vhost-user devices.
enable_vhost_user_store = false

# The base directory specifically used for vhost-user devices.
# Its sub-path "block" is used for block devices; "block/sockets" is
# where we expect vhost-user sockets to live; "block/devices" is where
# simulated block device nodes for vhost-user devices to live.
vhost_user_store_path = "/var/run/kata-containers/vhost-user"

# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true

# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true

# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/var/run/kata-containers/vhost-user"]
valid_vhost_user_store_paths = ["/var/run/kata-containers/vhost-user"]

# The timeout for reconnecting on non-server spdk sockets when the remote end goes away.
# qemu will delay this many seconds and then attempt to reconnect.
# Zero disables reconnecting, and the default is zero.
vhost_user_reconnect_timeout_sec = 0

# Enable file based guest memory support. The default is an empty string which
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = ""

# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: [""]
valid_file_mem_backends = [""]

# -pflash can add image file to VM. The arguments of it should be in format
# of ["/path/to/flash0.img", "/path/to/flash1.img"]
pflashes = []

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = 8192

# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
#
# nvdimm is not supported when `confidential_guest = true`.
#
# Default is false
#disable_image_nvdimm = true

# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true

# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
#pcie_root_port = 2

# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true

#
# Default entropy source.
# The path to a host source of entropy (including a real hardware RNG)
# /dev/urandom and /dev/random are two main options.
# Be aware that /dev/random is a blocking source of entropy.  If the host
# runs out of entropy, the VMs boot time will increase leading to get startup
# timeouts.
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "/dev/urandom"

# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/dev/urandom","/dev/random",""]
valid_entropy_sources = ["/dev/urandom","/dev/random",""]

# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
# the OCI spec passed to the runtime.
#
# You can create a rootfs with hooks by customizing the osbuilder scripts:
# https://github.com/kata-containers/kata-containers/tree/main/tools/osbuilder
#
# Hooks must be stored in a subdirectory of guest_hook_path according to their
# hook type, i.e. "guest_hook_path/{prestart,poststart,poststop}".
# The agent will scan these directories for executable files and add them, in
# lexicographical order, to the lifecycle of the guest container.
# Hooks are executed in the runtime namespace of the guest. See the official documentation:
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0

# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
# guest memory will be dumped to host filesystem under guest_memory_dump_path,
# This directory will be created automatically if it does not exist.
#
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
#   Dump guest’s memory can take very long depending on the amount of guest memory
#   and use much disk space.
#guest_memory_dump_path="/var/crash/kata"

# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
# or need the guest-virtual addresses in the ELF vmcore,
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false

# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
# if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness")
# is bigger than 0.
# The size of the swap device should be
# swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true

# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true

# disable applying SELinux on the VMM process (default false)
disable_selinux=false

# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=true


[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true

# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"

# The number of caches of VMCache:
# unspecified or == 0   --> VMCache is disabled
# > 0                   --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before using it.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket.  The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert the VM to gRPC format and transport it when gets
# requestion from clients.
# Factory grpccache is the VMCache client.  It will request gRPC format
# VM and convert it back to a VM.  If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0

# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"

[agent.kata]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
enable_debug = true

# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
#   associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
#   increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true

# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
# The following example can be used to load two kernel modules with parameters
#  - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"]
# The first word is considered as the module name and the rest as its parameters.
# Container will not be started when:
#  * A kernel module is specified and the modprobe command is not installed in the guest
#    or it fails loading the module.
#  * The module is not available in the guest or it doesn't met the guest kernel
#    requirements, like architecture and version.
#
kernel_modules=[]

# Enable debug console.

# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command

#debug_console_enabled = true

# Agent connection dialing timeout value in seconds
# (default: 45)
dial_timeout = 45

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
#
#   - none
#     Used when customize network. Only creates a tap device. No veth pair.
#
#   - tcfilter
#     Uses tc filter rules to redirect traffic from the network interface
#     provided by plugin to a tap interface connected to the VM.
#
internetworking_model="tcfilter"

# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=true

# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false

# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
# so general users should not uncomment and apply it.
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="system_u:system_r:container_t"

# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true

# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""

# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""

# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""

# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
# `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true

# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
# The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox.
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=false

# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
# when a hardware architecture or hypervisor solutions is utilized which does not support CPU and/or memory hotplug.
# Compatibility for determining appropriate sandbox (VM) size:
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
#   does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=false

# If specified, sandbox_bind_mounts identifies host paths to be mounted (ro) into the sandbox's shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=[]

# VFIO Mode
# Determines how VFIO devices should be presented to the container.
# Options:
#
#  - vfio
#    Matches behaviour of OCI runtimes (e.g. runc) as much as
#    possible.  VFIO devices will appear in the container as VFIO
#    character devices under /dev/vfio.  The exact names may differ
#    from the host (they need to match the VM's IOMMU group numbers
#    rather than the host's)
#
#  - guest-kernel
#    This is a Kata-specific behaviour that's useful in certain cases.
#    The VFIO device is managed by whatever driver in the VM kernel
#    claims it.  This means it will appear as one or more device nodes
#    or network interfaces depending on the nature of the device.
#    Using this mode requires specially built workloads that know how
#    to locate the relevant device interfaces within the VM.
#
vfio_mode="guest-kernel"

# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=false

# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=[]

# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true

# WARNING: All the options in the following section have not been implemented yet.
# This section was added as a placeholder. DO NOT USE IT!
[image]
# Container image service.
#
# Offload the CRI image management service to the Kata agent.
# (default: false)
#service_offload = true

# Container image decryption keys provisioning.
# Applies only if service_offload is true.
# Keys can be provisioned locally (e.g. through a special command or
# a local file) or remotely (usually after the guest is remotely attested).
# The provision setting is a complete URL that lets the Kata agent decide
# which method to use in order to fetch the keys.
#
# Keys can be stored in a local file, in a measured and attested initrd:
#provision=data:///local/key/file
#
# Keys could be fetched through a special command or binary from the
# initrd (guest) image, e.g. a firmware call:
#provision=file:///path/to/bin/fetcher/in/guest
#
# Keys can be remotely provisioned. The Kata agent fetches them from e.g.
# a HTTPS URL:
#provision=https://my-key-broker.foo/tenant/<tenant-id>
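
The `[runtime]` comments above state that `disable_new_netns` works only with `internetworking_model=none`. A minimal Python sketch of that rule, for anyone cross-checking a configuration, could look like the following. The function name `netns_conflict` and the dict-shaped input are illustrative assumptions and not part of Kata Containers; the check mirrors only the wording of the comments, not the runtime's actual validation code.

```python
def netns_conflict(runtime: dict) -> bool:
    """Return True if the [runtime] table enables disable_new_netns
    together with an internetworking model other than "none".

    `runtime` is assumed to be the already-parsed [runtime] table,
    for example the result of a TOML parser. A missing
    internetworking_model is treated as "not none" (an assumption).
    """
    disabled = bool(runtime.get("disable_new_netns", False))
    model = runtime.get("internetworking_model")
    return disabled and model != "none"


# Values taken from the configuration shown above, with the
# commented-out disable_new_netns option hypothetically enabled.
print(netns_conflict({"internetworking_model": "tcfilter",
                      "disable_new_netns": True}))
```

With the file as posted, `disable_new_netns` is commented out, so this check would not report a conflict.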

cat "/usr/share/defaults/kata-containers/configuration.toml"

# Copyright (c) 2017-2019 Intel Corporation
# Copyright (c) 2021 Adobe Inc.
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration-qemu.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/qemu-system-x86_64"
kernel = "/usr/share/kata-containers/vmlinux.container"
image = "/usr/share/kata-containers/kata-containers.img"
# initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
machine_type = "q35"

# rootfs filesystem type:
#   - ext4 (default)
#   - xfs
#   - erofs
rootfs_type="ext4"

# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
#   - CPU Hotplug 
#   - Memory Hotplug
#   - NVDIMM devices
#
# Default false
# confidential_guest = true

# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true

# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM runs as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true

# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for "io.katacontainers.config.hypervisor.path"
enable_annotations = ["enable_iommu"]

# List of valid annotations values for the hypervisor
# Each member of the list is a path pattern as described by glob(3).
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/usr/bin/qemu-system-x86_64"]
valid_hypervisor_paths = ["/usr/bin/qemu-system-x86_64"]

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# To have qemu use the default firmware, leave this option empty
firmware = ""

# Path to the firmware volume.
# firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables
# as configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI variables
# can be customized per each user while UEFI code is kept same.
firmware_volume = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
# For example, `seccompsandbox= "on,obsolete=deny,spawn=deny,resourcecontrol=deny"`
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="on,obsolete=deny,spawn=deny,resourcecontrol=deny"

# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"`
cpu_features="pmu=off"

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending on the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and
# the CPU hotplug functionality. For example, `default_maxvcpus = 240` specifies that up to 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what you are doing.
# NOTICE: on arm platform with gicv2 interrupt controller, set it to 8.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Up to 30 devices per bridge can be hot plugged.
# * Up to 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set to 2048 MiB.
default_memory = 2048
#
# Default memory slots per SB/VM.
# If unspecified then it will be set to 10.
# This determines how many times memory can be hot-added to the sandbox/VM.
#memory_slots = 10

# Default maximum memory in MiB per SB / VM
# unspecified or == 0           --> will be set to the actual amount of physical RAM
# > 0 <= amount of physical RAM --> will be set to the specified number
# > amount of physical RAM      --> will be set to the actual amount of physical RAM
default_maxmemory = 0

# The size in MiB that will be added to the maximum memory of the hypervisor.
# It is the memory address space for the NVDIMM device.
# If the block storage driver (block_device_driver) is set to "nvdimm",
# memory_offset should be set to the size of the block device.
# Default 0
#memory_offset = 0

# Specifies whether virtio-mem is enabled.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# virtio-fs is used instead to pass the rootfs.
disable_block_device_use = false

# Shared file system type:
#   - virtio-fs (default)
#   - virtio-9p
#   - virtio-fs-nydus
shared_fs = "virtio-fs"

# Path to vhost-user-fs daemon.
virtio_fs_daemon = "/usr/libexec/virtiofsd"

# List of valid annotations values for the virtiofs daemon
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/usr/libexec/virtiofsd"]
valid_virtio_fs_daemon_paths = ["/usr/libexec/virtiofsd"]

# Default size of DAX cache in MiB
virtio_fs_cache_size = 0

# Default size of virtqueues
virtio_fs_queue_size = 1024

# Extra args for virtiofsd daemon
#
# Format example:
#   ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"]
# Examples:
#   Set virtiofsd log level to debug : ["-o", "log_level=debug"] or ["-d"]
#
# see `virtiofsd -h` for possible options.
virtio_fs_extra_args = ["--thread-pool-size=1", "-o", "announce_submounts"]

# Cache mode:
#
#  - never
#    Metadata, data, and pathname lookup are not cached in guest. They are
#    always fetched from host and any changes are immediately pushed to host.
#
#  - auto
#    Metadata and pathname lookup cache expires after a configured amount of
#    time (default is 1 second). Data is cached while the file is open (close
#    to open consistency).
#
#  - always
#    Metadata, data, and pathname lookup are cached in guest and never expire.
virtio_fs_cache = "auto"

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is virtio-scsi, virtio-blk
# or nvdimm.
block_device_driver = "virtio-scsi"

# aio is the I/O mechanism used by qemu
# Options:
#
#   - threads
#     Pthread based disk I/O.
#
#   - native
#     Native Linux I/O.
#
#   - io_uring
#     Linux io_uring API. This provides the fastest I/O operations on Linux, requires kernel>5.1 and
#     qemu >=5.0.
block_device_aio = "io_uring"

# Specifies whether cache-related options are set for block devices.
# Default false
#block_device_cache_set = true

# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true

# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true

# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
# major range 240-254 being chosen to represent vhost-user devices.
enable_vhost_user_store = false

# The base directory specifically used for vhost-user devices.
# Its sub-path "block" is used for block devices; "block/sockets" is
# where we expect vhost-user sockets to live; "block/devices" is where
# simulated block device nodes for vhost-user devices live.
vhost_user_store_path = "/var/run/kata-containers/vhost-user"

# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true

# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true

# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/var/run/kata-containers/vhost-user"]
valid_vhost_user_store_paths = ["/var/run/kata-containers/vhost-user"]

# The timeout for reconnecting on non-server spdk sockets when the remote end goes away.
# qemu will delay this many seconds and then attempt to reconnect.
# Zero disables reconnecting, and the default is zero.
vhost_user_reconnect_timeout_sec = 0

# Enable file based guest memory support. The default is an empty string which
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = ""

# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: [""]
valid_file_mem_backends = [""]

# -pflash can add image file to VM. The arguments of it should be in format
# of ["/path/to/flash0.img", "/path/to/flash1.img"]
pflashes = []

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. Debug also enables the HMP socket.
#
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top of a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = 8192

# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
#
# nvdimm is not supported when `confidential_guest = true`.
#
# Default is false
#disable_image_nvdimm = true

# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true

# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
#pcie_root_port = 2

# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true

#
# Default entropy source.
# The path to a host source of entropy (including a real hardware RNG)
# /dev/urandom and /dev/random are two main options.
# Be aware that /dev/random is a blocking source of entropy.  If the host
# runs out of entropy, the VM's boot time will increase, leading to startup
# timeouts.
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "/dev/urandom"

# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends: ["/dev/urandom","/dev/random",""]
valid_entropy_sources = ["/dev/urandom","/dev/random",""]

# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
# the OCI spec passed to the runtime.
#
# You can create a rootfs with hooks by customizing the osbuilder scripts:
# https://github.com/kata-containers/kata-containers/tree/main/tools/osbuilder
#
# Hooks must be stored in a subdirectory of guest_hook_path according to their
# hook type, i.e. "guest_hook_path/{prestart,poststart,poststop}".
# The agent will scan these directories for executable files and add them, in
# lexicographical order, to the lifecycle of the guest container.
# Hooks are executed in the runtime namespace of the guest. See the official documentation:
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0

# Set where to save the guest memory dump file.
# If set, when a GUEST_PANICKED event occurs,
# guest memory will be dumped to the host filesystem under guest_memory_dump_path.
# This directory will be created automatically if it does not exist.
#
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
#   Dumping the guest's memory can take a very long time depending on the amount of guest memory
#   and use a lot of disk space.
#guest_memory_dump_path="/var/crash/kata"

# Whether to enable paging.
# Basically, if you want to use "gdb" rather than "crash",
# or need the guest-virtual addresses in the ELF vmcore,
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false

# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
# if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness")
# is bigger than 0.
# The size of the swap device should be
# swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes are not set, the size should
# be default_memory.
#enable_guest_swap = true

# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true

# disable applying SELinux on the VMM process (default false)
disable_selinux=false

# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=true


[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true

# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"

# The number of caches of VMCache:
# unspecified or == 0   --> VMCache is disabled
# > 0                   --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before they are used.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket.  The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert the VM to gRPC format and transport it when it gets
# a request from clients.
# Factory grpccache is the VMCache client.  It will request gRPC format
# VM and convert it back to a VM.  If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0

# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"

[agent.kata]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true

# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
#   associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
#   increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true

# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
# The following example can be used to load two kernel modules with parameters
#  - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"]
# The first word is considered as the module name and the rest as its parameters.
# Container will not be started when:
#  * A kernel module is specified and the modprobe command is not installed in the guest
#    or it fails loading the module.
#  * The module is not available in the guest or it doesn't meet the guest kernel
#    requirements, like architecture and version.
#
kernel_modules=[]

# Enable debug console.

# If enabled, users can connect to the guest OS running inside the hypervisor
# through the "kata-runtime exec <sandbox-id>" command

#debug_console_enabled = true

# Agent connection dialing timeout value in seconds
# (default: 45)
dial_timeout = 45

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to
# the container network interface
# Options:
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
#
#   - none
#     Used for a custom network setup. Only creates a tap device. No veth pair.
#
#   - tcfilter
#     Uses tc filter rules to redirect traffic from the network interface
#     provided by plugin to a tap interface connected to the VM.
#
internetworking_model="tcfilter"

# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=true

# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false

# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
# so general users should not uncomment and apply it.
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="system_u:system_r:container_t"

# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true

# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""

# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""

# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""

# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
# `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true

# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
# The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox.
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=false

# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
# when a hardware architecture or hypervisor solution is used that does not support CPU and/or memory hotplug.
# Compatibility for determining appropriate sandbox (VM) size:
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
#   does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=false

# If specified, sandbox_bind_mounts identifies host paths to be mounted (ro) into the sandbox's shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=[]

# VFIO Mode
# Determines how VFIO devices should be presented to the container.
# Options:
#
#  - vfio
#    Matches behaviour of OCI runtimes (e.g. runc) as much as
#    possible.  VFIO devices will appear in the container as VFIO
#    character devices under /dev/vfio.  The exact names may differ
#    from the host (they need to match the VM's IOMMU group numbers
#    rather than the host's)
#
#  - guest-kernel
#    This is a Kata-specific behaviour that's useful in certain cases.
#    The VFIO device is managed by whatever driver in the VM kernel
#    claims it.  This means it will appear as one or more device nodes
#    or network interfaces depending on the nature of the device.
#    Using this mode requires specially built workloads that know how
#    to locate the relevant device interfaces within the VM.
#
vfio_mode="guest-kernel"

# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=false

# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=[]

# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true

# WARNING: All the options in the following section have not been implemented yet.
# This section was added as a placeholder. DO NOT USE IT!
[image]
# Container image service.
#
# Offload the CRI image management service to the Kata agent.
# (default: false)
#service_offload = true

# Container image decryption keys provisioning.
# Applies only if service_offload is true.
# Keys can be provisioned locally (e.g. through a special command or
# a local file) or remotely (usually after the guest is remotely attested).
# The provision setting is a complete URL that lets the Kata agent decide
# which method to use in order to fetch the keys.
#
# Keys can be stored in a local file, in a measured and attested initrd:
#provision=data:///local/key/file
#
# Keys could be fetched through a special command or binary from the
# initrd (guest) image, e.g. a firmware call:
#provision=file:///path/to/bin/fetcher/in/guest
#
# Keys can be remotely provisioned. The Kata agent fetches them from e.g.
# a HTTPS URL:
#provision=https://my-key-broker.foo/tenant/<tenant-id>


Containerd shim v2

Containerd shim v2 is /usr/local/bin/containerd-shim-kata-v2.

containerd-shim-kata-v2 --version

Kata Containers containerd shim (Golang): id: "io.containerd.kata.v2", version: 3.2.0-alpha0, commit: 8653b3acd5da506bbf13b8fadb5eb0d06caaf2d2


KSM throttler

version

systemd service

Image details

No image


Initrd details

---
osbuilder:
  url: "https://github.com/kata-containers/kata-containers/tools/osbuilder"
  version: "unknown"
rootfs-creation-time: "2023-04-21T19:15:39.567813432+0000Z"
description: "osbuilder rootfs"
file-format-version: "0.0.2"
architecture: "x86_64"
base-distro:
  name: "Alpine"
  version: "3.15"
  packages:
    default:
      - "ip6tables"
      - "iptables"
      - "libseccomp"
    extra:

agent:
  url: "https://github.com/kata-containers/kata-containers"
  name: "kata-agent"
  version: "3.2.0-alpha0"
  agent-is-init-daemon: "yes"

Logfiles

Runtime logs

No recent runtime problems found in system journal.

Throttler logs

No recent throttler problems found in system journal.

Kata Containerd Shim v2 logs

Recent problems found in system journal:

time="2023-04-27T16:58:38.237044635-04:00" level=info msg="scanner return error: read unix @->/run/vc/vm/hello3/qmp.sock: use of closed network connection" name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:58:38.237494983-04:00" level=debug msg="Sandbox OCI spec not found. Sandbox DNS will not be set." name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=kata_agent
time="2023-04-27T16:58:38.621932608-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"kataShared\\\", dest=\\\"/run/kata-containers/shared/containers/\\\", fs_type=\\\"virtiofs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:38.615539152Z\",\"pid\":\"1\",\"storage-type\":\"virtio-fs\",\"name\":\"kata-agent\",\"subsystem\":\"baremount\",\"source\":\"agent\",\"version\":\"0.1.0\"}"
time="2023-04-27T16:58:38.62200185-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"shm\\\", dest=\\\"/run/kata-containers/sandbox/shm\\\", fs_type=\\\"tmpfs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:38.615601178Z\",\"storage-type\":\"ephemeral\",\"pid\":\"1\",\"version\":\"0.1.0\",\"name\":\"kata-agent\",\"subsystem\":\"baremount\",\"source\":\"agent\"}"
time="2023-04-27T16:58:38.622430947-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"hotplug memory error: ENOENT\",\"level\":\"WARN\",\"ts\":\"2023-04-27T20:58:38.616141251Z\",\"subsystem\":\"rpc\",\"name\":\"kata-agent\",\"version\":\"0.1.0\",\"pid\":\"1\",\"source\":\"agent\"}"
time="2023-04-27T16:58:38.653788236-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"child exited unexpectedly\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:38.647430559Z\",\"subsystem\":\"signals\",\"name\":\"kata-agent\",\"pid\":\"1\",\"child-pid\":\"47\",\"source\":\"agent\",\"version\":\"0.1.0\"}"
time="2023-04-27T16:58:39.735737217-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"failed to handle signal\",\"level\":\"ERRO\",\"ts\":\"2023-04-27T20:58:39.729439757Z\",\"subsystem\":\"signals\",\"name\":\"kata-agent\",\"source\":\"agent\",\"pid\":\"1\",\"version\":\"0.1.0\",\"error\":\"waitpid reaper failed\\n\\nCaused by:\\n    ECHILD: No child processes\"}"
time="2023-04-27T16:58:39.735889417-04:00" level=info msg="watchSandbox gets an error or stop signal" error="<nil>" name=containerd-shim-v2 pid=916157 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:39.737344579-04:00" level=info msg="failed to get OOM event from sandbox" error="rpc error: code = Internal desc = \"\"" name=containerd-shim-v2 pid=916157 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:39.738157608-04:00" level=error msg="Failed to read guest console logs" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock error="read unix @->/run/vc/vm/hello3/console.sock: use of closed network connection" name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:58:39.760745774-04:00" level=info msg="scanner return error: <nil>" name=containerd-shim-v2 pid=916157 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:58:48.701044879-04:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=cgroups
time="2023-04-27T16:58:48.713912439-04:00" level=debug msg="restore sandbox failed" error="open /run/vc/sbs/hello3/persist.json: no such file or directory" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:58:48.82205133-04:00" level=info msg="scanner return error: read unix @->/run/vc/vm/hello3/qmp.sock: use of closed network connection" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:58:48.822515174-04:00" level=debug msg="Sandbox OCI spec not found. Sandbox DNS will not be set." name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=kata_agent
time="2023-04-27T16:58:49.191028837-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"kataShared\\\", dest=\\\"/run/kata-containers/shared/containers/\\\", fs_type=\\\"virtiofs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:49.183292581Z\",\"pid\":\"1\",\"subsystem\":\"baremount\",\"source\":\"agent\",\"version\":\"0.1.0\",\"storage-type\":\"virtio-fs\",\"name\":\"kata-agent\"}"
time="2023-04-27T16:58:49.192330146-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"shm\\\", dest=\\\"/run/kata-containers/sandbox/shm\\\", fs_type=\\\"tmpfs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:49.183347784Z\",\"subsystem\":\"baremount\",\"storage-type\":\"ephemeral\",\"source\":\"agent\",\"name\":\"kata-agent\",\"pid\":\"1\",\"version\":\"0.1.0\"}"
time="2023-04-27T16:58:49.192461255-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"hotplug memory error: ENOENT\",\"level\":\"WARN\",\"ts\":\"2023-04-27T20:58:49.183739549Z\",\"version\":\"0.1.0\",\"source\":\"agent\",\"pid\":\"1\",\"subsystem\":\"rpc\",\"name\":\"kata-agent\"}"
time="2023-04-27T16:58:49.202836728-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"child exited unexpectedly\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:49.194094809Z\",\"pid\":\"1\",\"source\":\"agent\",\"version\":\"0.1.0\",\"name\":\"kata-agent\",\"child-pid\":\"47\",\"subsystem\":\"signals\"}"
time="2023-04-27T16:58:50.242209741-04:00" level=info msg="watchSandbox gets an error or stop signal" error="<nil>" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:50.242238206-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"failed to handle signal\",\"level\":\"ERRO\",\"ts\":\"2023-04-27T20:58:50.23449192Z\",\"subsystem\":\"signals\",\"source\":\"agent\",\"version\":\"0.1.0\",\"pid\":\"1\",\"name\":\"kata-agent\",\"error\":\"waitpid reaper failed\\n\\nCaused by:\\n    ECHILD: No child processes\"}"
time="2023-04-27T16:58:50.243617844-04:00" level=info msg="failed to get OOM event from sandbox" error="rpc error: code = Internal desc = \"\"" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:50.244517377-04:00" level=error msg="Failed to read guest console logs" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock error="read unix @->/run/vc/vm/hello3/console.sock: use of closed network connection" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:58:50.260360234-04:00" level=info msg="scanner return error: <nil>" name=containerd-shim-v2 pid=916344 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:58:57.433851734-04:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=cgroups
time="2023-04-27T16:58:57.445932094-04:00" level=debug msg="restore sandbox failed" error="open /run/vc/sbs/hello3/persist.json: no such file or directory" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:58:57.568054873-04:00" level=info msg="scanner return error: read unix @->/run/vc/vm/hello3/qmp.sock: use of closed network connection" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:58:57.568465936-04:00" level=debug msg="Sandbox OCI spec not found. Sandbox DNS will not be set." name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=kata_agent
time="2023-04-27T16:58:57.941986075-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"kataShared\\\", dest=\\\"/run/kata-containers/shared/containers/\\\", fs_type=\\\"virtiofs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:57.933623275Z\",\"storage-type\":\"virtio-fs\",\"subsystem\":\"baremount\",\"source\":\"agent\",\"name\":\"kata-agent\",\"pid\":\"1\",\"version\":\"0.1.0\"}"
time="2023-04-27T16:58:57.94251345-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"shm\\\", dest=\\\"/run/kata-containers/sandbox/shm\\\", fs_type=\\\"tmpfs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:57.933677697Z\",\"name\":\"kata-agent\",\"pid\":\"1\",\"version\":\"0.1.0\",\"subsystem\":\"baremount\",\"storage-type\":\"ephemeral\",\"source\":\"agent\"}"
time="2023-04-27T16:58:57.942724241-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"hotplug memory error: ENOENT\",\"level\":\"WARN\",\"ts\":\"2023-04-27T20:58:57.934090291Z\",\"subsystem\":\"rpc\",\"pid\":\"1\",\"version\":\"0.1.0\",\"source\":\"agent\",\"name\":\"kata-agent\"}"
time="2023-04-27T16:58:57.953749712-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"child exited unexpectedly\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:58:57.944861701Z\",\"pid\":\"1\",\"version\":\"0.1.0\",\"subsystem\":\"signals\",\"child-pid\":\"47\",\"name\":\"kata-agent\",\"source\":\"agent\"}"
time="2023-04-27T16:58:59.527173554-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"failed to handle signal\",\"level\":\"ERRO\",\"ts\":\"2023-04-27T20:58:59.51935982Z\",\"name\":\"kata-agent\",\"subsystem\":\"signals\",\"version\":\"0.1.0\",\"pid\":\"1\",\"source\":\"agent\",\"error\":\"waitpid reaper failed\\n\\nCaused by:\\n    ECHILD: No child processes\"}"
time="2023-04-27T16:58:59.527305926-04:00" level=info msg="watchSandbox gets an error or stop signal" error="<nil>" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:59.5289095-04:00" level=info msg="failed to get OOM event from sandbox" error="rpc error: code = Internal desc = \"\"" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:58:59.529972385-04:00" level=error msg="Failed to read guest console logs" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock error="read unix @->/run/vc/vm/hello3/console.sock: use of closed network connection" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:58:59.548743289-04:00" level=info msg="scanner return error: <nil>" name=containerd-shim-v2 pid=916527 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:59:05.742648417-04:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=cgroups
time="2023-04-27T16:59:05.753926099-04:00" level=debug msg="restore sandbox failed" error="open /run/vc/sbs/hello3/persist.json: no such file or directory" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:59:05.864430057-04:00" level=info msg="scanner return error: read unix @->/run/vc/vm/hello3/qmp.sock: use of closed network connection" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
time="2023-04-27T16:59:05.864871277-04:00" level=debug msg="Sandbox OCI spec not found. Sandbox DNS will not be set." name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=kata_agent
time="2023-04-27T16:59:06.241080088-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"kataShared\\\", dest=\\\"/run/kata-containers/shared/containers/\\\", fs_type=\\\"virtiofs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:59:06.233916354Z\",\"name\":\"kata-agent\",\"subsystem\":\"baremount\",\"pid\":\"1\",\"version\":\"0.1.0\",\"storage-type\":\"virtio-fs\",\"source\":\"agent\"}"
time="2023-04-27T16:59:06.24115484-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"baremount source=\\\"shm\\\", dest=\\\"/run/kata-containers/sandbox/shm\\\", fs_type=\\\"tmpfs\\\", options=\\\"\\\", flags=(empty)\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:59:06.233971167Z\",\"version\":\"0.1.0\",\"subsystem\":\"baremount\",\"pid\":\"1\",\"source\":\"agent\",\"storage-type\":\"ephemeral\",\"name\":\"kata-agent\"}"
time="2023-04-27T16:59:06.241372254-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"hotplug memory error: ENOENT\",\"level\":\"WARN\",\"ts\":\"2023-04-27T20:59:06.234341812Z\",\"version\":\"0.1.0\",\"source\":\"agent\",\"name\":\"kata-agent\",\"subsystem\":\"rpc\",\"pid\":\"1\"}"
time="2023-04-27T16:59:06.251142043-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"child exited unexpectedly\",\"level\":\"INFO\",\"ts\":\"2023-04-27T20:59:06.244056049Z\",\"version\":\"0.1.0\",\"subsystem\":\"signals\",\"source\":\"agent\",\"name\":\"kata-agent\",\"pid\":\"1\",\"child-pid\":\"47\"}"
time="2023-04-27T16:59:07.223312489-04:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox vmconsole="{\"msg\":\"failed to handle signal\",\"level\":\"ERRO\",\"ts\":\"2023-04-27T20:59:07.216063207Z\",\"version\":\"0.1.0\",\"name\":\"kata-agent\",\"subsystem\":\"signals\",\"pid\":\"1\",\"source\":\"agent\",\"error\":\"waitpid reaper failed\\n\\nCaused by:\\n    ECHILD: No child processes\"}"
time="2023-04-27T16:59:07.223397741-04:00" level=info msg="watchSandbox gets an error or stop signal" error="<nil>" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:59:07.225090896-04:00" level=info msg="failed to get OOM event from sandbox" error="rpc error: code = Internal desc = \"\"" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=containerd-kata-shim-v2
time="2023-04-27T16:59:07.226218965-04:00" level=error msg="Failed to read guest console logs" console-protocol=unix console-url=/run/vc/vm/hello3/console.sock error="read unix @->/run/vc/vm/hello3/console.sock: use of closed network connection" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers subsystem=sandbox
time="2023-04-27T16:59:07.243406704-04:00" level=info msg="scanner return error: <nil>" name=containerd-shim-v2 pid=916708 sandbox=hello3 source=virtcontainers/hypervisor subsystem=qmp
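
The journal excerpt above mixes debug, info, warning and error entries, so the error lines this issue is about are easy to miss. As a rough aid for triage, the sketch below parses lines in this `key=value` layout and tallies them by `level`. The names `parse_logfmt`, `count_by_level` and `errors_only` are hypothetical helpers written for this illustration; they are not part of Kata Containers or its tooling, and the parser assumes only the quoting seen in the lines above (double quotes with `\"` and `\\` escapes).

```python
import re
from collections import Counter

# One key=value pair: the value is either a double-quoted string
# (which may contain backslash escapes) or a run of non-space characters.
_PAIR = re.compile(r'([\w.-]+)=("(?:[^"\\]|\\.)*"|\S*)')
# Only \" and \\ are unescaped; other escapes (e.g. \n) are left as written.
_ESCAPE = re.compile(r'\\(["\\])')


def parse_logfmt(line):
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = _ESCAPE.sub(r"\1", raw[1:-1])
        fields[key] = raw
    return fields


def count_by_level(lines):
    """Count non-empty log lines per 'level' field."""
    return Counter(
        parse_logfmt(line).get("level", "unknown")
        for line in lines
        if line.strip()
    )


def errors_only(lines):
    """Return the parsed fields of every line logged at level=error."""
    parsed = (parse_logfmt(line) for line in lines if line.strip())
    return [fields for fields in parsed if fields.get("level") == "error"]


if __name__ == "__main__":
    import sys

    # Example: journalctl -t kata | python3 this_script.py
    journal = sys.stdin.read().splitlines()
    print(dict(count_by_level(journal)))
    for entry in errors_only(journal):
        print(entry.get("time"), entry.get("pid"), entry.get("msg"))
```

Grouping the error entries by `pid` in this way makes it easier to see that each shim process in the excerpt logs the same "Failed to read guest console logs" error once, shortly after its "watchSandbox gets an error or stop signal" entry.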


Container manager details

Docker

docker version

Client:
 Version:           20.10.23
 API version:       1.41
 Go version:        go1.20rc3
 Git commit:        %{shortcommit_cli}
 Built:             Sun Jan 29 17:25:05 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.23
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.20rc3
  Git commit:       %{shortcommit_moby}
  Built:            Sun Jan 29 17:25:05 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.19
  GitCommit:        
 runc:
  Version:          1.1.4
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 10
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns
 Kernel Version: 6.2.11-300.fc38.x86_64
 Operating System: Fedora Linux 38 (Thirty Eight)
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 125.4GiB
 Name: foo-bar
 ID: 6AOH:WXCO:VNBJ:2SOL:XAJG:ENQA:GD4D:O4E4:G5CE:7OL5:RGPJ:FARH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: foo-bar
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

systemctl show docker

Type=notify
ExitType=main
Restart=on-failure
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=infinity
TimeoutStopUSec=45s
TimeoutAbortUSec=45s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=abort
RuntimeMaxUSec=infinity
RuntimeRandomizedExtraUSec=0
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=734040
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=stop
ReloadSignal=1
ExecMainStartTimestamp=Fri 2023-04-21 14:33:38 EDT
ExecMainStartTimestampMonotonic=183076994209
ExecMainExitTimestampMonotonic=0
ExecMainPID=734040
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd --host=fd:// --exec-opt native.cgroupdriver=systemd $OPTIONS ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd --host=fd:// --exec-opt native.cgroupdriver=systemd $OPTIONS ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReloadEx={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/docker.service
ControlGroupId=746945
MemoryCurrent=899719168
MemoryAvailable=infinity
CPUUsageNSec=161470355000
EffectiveCPUs=0-31
EffectiveMemoryNodes=0
TasksCurrent=39
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryZSwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=154043
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimit=0
ManagedOOMPreference=none
EnvironmentFiles=/etc/sysconfig/docker (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=infinity
LimitNOFILESoft=infinity
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=513479
LimitSIGPENDINGSoft=513479
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=2
IOSchedulingPriority=4
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore
DynamicUser=no
RemoveIPC=no
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
PrivateIPC=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=process
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=docker.service
Names=docker.service
Requires=docker.socket system.slice sysinit.target
Wants=network-online.target
Conflicts=shutdown.target
Before=shutdown.target
After=systemd-journald.socket basic.target firewalld.service network-online.target docker.socket sysinit.target system.slice
TriggeredBy=docker.socket
Documentation=https://docs.docker.com
Description=Docker Application Container Engine
AccessSELinuxContext=system_u:object_r:container_unit_file_t:s0
LoadState=loaded
ActiveState=active
FreezerState=running
SubState=running
FragmentPath=/usr/lib/systemd/system/docker.service
DropInPaths=/usr/lib/systemd/system/service.d/10-timeout-abort.conf
UnitFileState=disabled
UnitFilePreset=disabled
StateChangeTimestamp=Fri 2023-04-21 14:33:39 EDT
StateChangeTimestampMonotonic=183077983453
InactiveExitTimestamp=Fri 2023-04-21 14:33:38 EDT
InactiveExitTimestampMonotonic=183076994395
ActiveEnterTimestamp=Fri 2023-04-21 14:33:39 EDT
ActiveEnterTimestampMonotonic=183077983453
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnSuccessJobMode=fail
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Fri 2023-04-21 14:33:38 EDT
ConditionTimestampMonotonic=183076992300
AssertTimestamp=Fri 2023-04-21 14:33:38 EDT
AssertTimestampMonotonic=183076992303
Transient=no
Perpetual=no
StartLimitIntervalUSec=1min
StartLimitBurst=3
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=80831f362dc04a19ad3cb5448d61edca
CollectMode=inactive

Kubernetes

kubectl version

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"archive", BuildDate:"2023-04-04T00:00:00Z", GoVersion:"go1.20.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
The connection to the server localhost:8080 was refused - did you specify the right host or port?

kubectl config view

apiVersion: v1
clusters: null
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null

systemctl show kubelet

Type=simple
ExitType=main
Restart=always
NotifyAccess=none
RestartUSec=10s
TimeoutStartUSec=45s
TimeoutStopUSec=45s
TimeoutAbortUSec=45s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=abort
RuntimeMaxUSec=infinity
RuntimeRandomizedExtraUSec=0
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=stop
ReloadSignal=1
ExecMainStartTimestamp=Fri 2023-04-21 12:05:27 EDT
ExecMainStartTimestampMonotonic=174185831861
ExecMainExitTimestamp=Fri 2023-04-21 12:58:08 EDT
ExecMainExitTimestampMonotonic=177346790002
ExecMainPID=622111
ExecMainCode=1
ExecMainStatus=0
ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS $KUBELET_KUBEADM_ARGS ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS $KUBELET_KUBEADM_ARGS ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroupId=0
MemoryCurrent=[not set]
MemoryAvailable=infinity
CPUUsageNSec=46470330000
TasksCurrent=[not set]
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryZSwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=154043
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimit=0
ManagedOOMPreference=none
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --fail-swap-on=false" KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests "KUBELET_DNS_ARGS=--cluster-dns=foo-bar --cluster-domain=cluster.local" "KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt" KUBELET_EXTRA_ARGS=--cgroup-driver=systemd
EnvironmentFiles=/etc/kubernetes/config (ignore_errors=yes)
EnvironmentFiles=/etc/kubernetes/kubelet (ignore_errors=yes)
EnvironmentFiles=/var/lib/kubelet/kubeadm-flags.env (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=524288
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=513479
LimitNPROCSoft=513479
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=513479
LimitSIGPENDINGSoft=513479
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
WorkingDirectory=/var/lib/kubelet
OOMScoreAdjust=0
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=2
IOSchedulingPriority=4
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore
DynamicUser=no
RemoveIPC=no
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
PrivateIPC=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=process
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=kubelet.service
Names=kubelet.service
Requires=sysinit.target system.slice -.mount
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=shutdown.target multi-user.target
After=system.slice containerd.service -.mount systemd-journald.socket crio.service basic.target sysinit.target
RequiresMountsFor=/var/lib/kubelet
Documentation=https://kubernetes.io/docs/concepts/overview/components/#kubelet https://kubernetes.io/docs/reference/generated/kubelet/
Description=Kubernetes Kubelet Server
AccessSELinuxContext=system_u:object_r:systemd_unit_file_t:s0
LoadState=loaded
ActiveState=inactive
FreezerState=running
SubState=dead
FragmentPath=/usr/lib/systemd/system/kubelet.service
DropInPaths=/usr/lib/systemd/system/service.d/10-timeout-abort.conf /etc/systemd/system/kubelet.service.d/kubeadm.conf
UnitFileState=enabled
UnitFilePreset=disabled
StateChangeTimestamp=Fri 2023-04-21 12:58:08 EDT
StateChangeTimestampMonotonic=177346790095
InactiveExitTimestamp=Fri 2023-04-21 12:05:27 EDT
InactiveExitTimestampMonotonic=174185832111
ActiveEnterTimestamp=Fri 2023-04-21 12:05:27 EDT
ActiveEnterTimestampMonotonic=174185832111
ActiveExitTimestamp=Fri 2023-04-21 12:58:08 EDT
ActiveExitTimestampMonotonic=177346770091
InactiveEnterTimestamp=Fri 2023-04-21 12:58:08 EDT
InactiveEnterTimestampMonotonic=177346790095
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnSuccessJobMode=fail
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Fri 2023-04-21 12:05:27 EDT
ConditionTimestampMonotonic=174185809748
AssertTimestamp=Fri 2023-04-21 12:05:27 EDT
AssertTimestampMonotonic=174185809750
Transient=no
Perpetual=no
StartLimitIntervalUSec=0
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=0f2fd016aa1a457a8a2aded0b254dec0
CollectMode=inactive

crio

crio --version

crio version 1.26.1
Version:        1.26.1
GitCommit:      unknown
GitCommitDate:  unknown
GitTreeState:   clean
GoVersion:      go1.20rc3
Compiler:       gc
Platform:       linux/amd64
Linkmode:       dynamic
BuildTags:      
  rpm_crashtraceback
  seccomp
  selinux
LDFlags:           -X github.com/cri-o/cri-o/version=1.26.1 -B 0xcca2fed69ece72f605cda1245ffc1e1ebf2c2292 -compressdwarf=false -linkmode=external -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes '
SeccompEnabled:   true
AppArmorEnabled:  false
Dependencies:     
  

systemctl show crio

Type=notify
ExitType=main
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
TimeoutStartUSec=infinity
TimeoutStopUSec=45s
TimeoutAbortUSec=45s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=abort
RuntimeMaxUSec=infinity
RuntimeRandomizedExtraUSec=0
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=853502
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=stop
ReloadSignal=1
ExecMainStartTimestamp=Tue 2023-04-25 13:58:01 EDT
ExecMainStartTimestampMonotonic=526540013851
ExecMainExitTimestampMonotonic=0
ExecMainPID=853502
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/crio ; argv[]=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS ; ignore_errors=no ; start_time=[Tue 2023-04-25 13:58:01 EDT] ; stop_time=[n/a] ; pid=853502 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/crio ; argv[]=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS ; flags= ; start_time=[Tue 2023-04-25 13:58:01 EDT] ; stop_time=[n/a] ; pid=853502 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReloadEx={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/crio.service
ControlGroupId=758600
MemoryCurrent=18546688
MemoryAvailable=infinity
CPUUsageNSec=17105607000
EffectiveCPUs=0-31
EffectiveMemoryNodes=0
TasksCurrent=27
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryZSwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=infinity
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimit=0
ManagedOOMPreference=none
Environment=GOTRACEBACK=crash
EnvironmentFiles=/etc/sysconfig/crio (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=1048576
LimitNOFILESoft=1048576
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=1048576
LimitNPROCSoft=1048576
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=513479
LimitSIGPENDINGSoft=513479
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=-999
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=2
IOSchedulingPriority=4
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore
DynamicUser=no
RemoveIPC=no
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
PrivateIPC=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=control-group
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=crio.service
Names=crio.service
Requires=sysinit.target system.slice
Wants=network-online.target
Conflicts=shutdown.target
Before=shutdown.target kubelet.service
After=sysinit.target system.slice network-online.target systemd-journald.socket basic.target
Documentation=https://github.com/cri-o/cri-o
Description=Container Runtime Interface for OCI (CRI-O)
AccessSELinuxContext=system_u:object_r:systemd_unit_file_t:s0
LoadState=loaded
ActiveState=active
FreezerState=running
SubState=running
FragmentPath=/usr/lib/systemd/system/crio.service
DropInPaths=/usr/lib/systemd/system/service.d/10-timeout-abort.conf
UnitFileState=disabled
UnitFilePreset=disabled
StateChangeTimestamp=Tue 2023-04-25 13:58:01 EDT
StateChangeTimestampMonotonic=526540091924
InactiveExitTimestamp=Tue 2023-04-25 13:58:01 EDT
InactiveExitTimestampMonotonic=526540014066
ActiveEnterTimestamp=Tue 2023-04-25 13:58:01 EDT
ActiveEnterTimestampMonotonic=526540091924
ActiveExitTimestamp=Tue 2023-04-25 13:58:00 EDT
ActiveExitTimestampMonotonic=526538706429
InactiveEnterTimestamp=Tue 2023-04-25 13:58:00 EDT
InactiveEnterTimestampMonotonic=526538714746
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnSuccessJobMode=fail
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Tue 2023-04-25 13:58:01 EDT
ConditionTimestampMonotonic=526539991205
AssertTimestamp=Tue 2023-04-25 13:58:01 EDT
AssertTimestampMonotonic=526539991206
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=a2f81c6b74944af29e5b123d60436be2
CollectMode=inactive

crio config

time="2023-04-28 12:04:42.019751617-04:00" level=info msg="Starting CRI-O, version: 1.26.1, git: unknown(clean)"
level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
# The CRI-O configuration file specifies all of the available configuration
# options and command-line flags for the crio(8) OCI Kubernetes Container Runtime
# daemon, but in a TOML format that can be more easily modified and versioned.
#
# Please refer to crio.conf(5) for details of all configuration options.

# CRI-O supports partial configuration reload during runtime, which can be
# done by sending SIGHUP to the running process. Currently supported options
# are explicitly mentioned with: 'This option supports live configuration
# reload'.

# CRI-O reads its storage defaults from the containers-storage.conf(5) file
# located at /etc/containers/storage.conf. Modify this storage configuration if
# you want to change the system's defaults. If you want to modify storage just
# for CRI-O, you can change the storage configuration options here.
[crio]

# Path to the "root directory". CRI-O stores all of its data, including
# containers images, in this directory.
# root = "/var/lib/containers/storage"

# Path to the "run directory". CRI-O stores all of its state in this directory.
# runroot = "/run/containers/storage"

# Storage driver used to manage the storage of images and containers. Please
# refer to containers-storage.conf(5) to see all available storage drivers.
# storage_driver = "overlay"

# List to pass options to the storage driver. Please refer to
# containers-storage.conf(5) to see all available storage options.
# storage_option = [
# 	"overlay.mountopt=nodev,metacopy=on",
# ]

# The default log directory where all logs will go unless directly specified by
# the kubelet. The log directory specified must be an absolute directory.
# log_dir = "/var/log/crio/pods"

# Location for CRI-O to lay down the temporary version file.
# It is used to check if crio wipe should wipe containers, which should
# always happen on a node reboot
# version_file = "/var/run/crio/version"

# Location for CRI-O to lay down the persistent version file.
# It is used to check if crio wipe should wipe images, which should
# only happen when CRI-O has been upgraded
# version_file_persist = ""

# InternalWipe is whether CRI-O should wipe containers and images after a reboot when the server starts.
# If set to false, one must use the external command 'crio wipe' to wipe the containers and images in these situations.
# internal_wipe = true

# Location for CRI-O to lay down the clean shutdown file.
# It is used to check whether crio had time to sync before shutting down.
# If not found, crio wipe will clear the storage directory.
# clean_shutdown_file = "/var/lib/crio/clean.shutdown"

# The crio.api table contains settings for the kubelet/gRPC interface.
[crio.api]

# Path to AF_LOCAL socket on which CRI-O will listen.
# listen = "/var/run/crio/crio.sock"

# IP address on which the stream server will listen.
# stream_address = "127.0.0.1"

# The port on which the stream server will listen. If the port is set to "0", then
# CRI-O will allocate a random free port number.
# stream_port = "0"

# Enable encrypted TLS transport of the stream server.
# stream_enable_tls = false

# Length of time until open streams terminate due to lack of activity
# stream_idle_timeout = ""

# Path to the x509 certificate file used to serve the encrypted stream. This
# file can change, and CRI-O will automatically pick up the changes within 5
# minutes.
# stream_tls_cert = ""

# Path to the key file used to serve the encrypted stream. This file can
# change and CRI-O will automatically pick up the changes within 5 minutes.
# stream_tls_key = ""

# Path to the x509 CA(s) file used to verify and authenticate client
# communication with the encrypted stream. This file can change and CRI-O will
# automatically pick up the changes within 5 minutes.
# stream_tls_ca = ""

# Maximum grpc send message size in bytes. If not set or <=0, then CRI-O will default to 16 * 1024 * 1024.
# grpc_max_send_msg_size = 83886080

# Maximum grpc receive message size. If not set or <= 0, then CRI-O will default to 16 * 1024 * 1024.
# grpc_max_recv_msg_size = 83886080

# The crio.runtime table contains settings pertaining to the OCI runtime used
# and options for how to set up and manage the OCI runtime.
[crio.runtime]

# A list of ulimits to be set in containers by default, specified as
# "<ulimit name>=<soft limit>:<hard limit>", for example:
# "nofile=1024:2048"
# If nothing is set here, settings will be inherited from the CRI-O daemon
# default_ulimits = [
# ]

# If true, the runtime will not use pivot_root, but instead use MS_MOVE.
# no_pivot = false

# decryption_keys_path is the path where the keys required for
# image decryption are stored. This option supports live configuration reload.
# decryption_keys_path = "/etc/crio/keys/"

# Path to the conmon binary, used for monitoring the OCI runtime.
# Will be searched for using $PATH if empty.
# This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorEnv.
# conmon = ""

# Cgroup setting for conmon
# This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorCgroup.
# conmon_cgroup = ""

# Environment variable list for the conmon process, used for passing necessary
# environment variables to conmon or the runtime.
# This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorEnv.
# conmon_env = [
# ]

# Additional environment variables to set for all the
# containers. These are overridden if set in the
# container image spec or in the container runtime configuration.
# default_env = [
# ]

# If true, SELinux will be used for pod separation on the host.
# selinux = true

# Path to the seccomp.json profile which is used as the default seccomp profile
# for the runtime. If not specified, then the internal default seccomp profile
# will be used. This option supports live configuration reload.
# seccomp_profile = ""

# Changes the meaning of an empty seccomp profile. By default
# (and according to CRI spec), an empty profile means unconfined.
# This option tells CRI-O to treat an empty profile as the default profile,
# which might increase security.
# seccomp_use_default_when_empty = true

# Used to change the name of the default AppArmor profile of CRI-O. The default
# profile name is "crio-default". This profile only takes effect if the user
# does not specify a profile via the Kubernetes Pod's metadata annotation. If
# the profile is set to "unconfined", then this equals to disabling AppArmor.
# This option supports live configuration reload.
# apparmor_profile = "crio-default"

# Path to the blockio class configuration file for configuring
# the cgroup blockio controller.
# blockio_config_file = ""

# Used to change irqbalance service config file path which is used for configuring
# irqbalance daemon.
# irqbalance_config_file = "/etc/sysconfig/irqbalance"

# Path to the RDT configuration file for configuring the resctrl pseudo-filesystem.
# This option supports live configuration reload.
# rdt_config_file = ""

# Cgroup management implementation used for the runtime.
# cgroup_manager = "systemd"

# Specify whether the image pull must be performed in a separate cgroup.
# separate_pull_cgroup = ""

# List of default capabilities for containers. If it is empty or commented out,
# only the capabilities defined in the containers json file by the user/kube
# will be added.
# default_capabilities = [
# 	"CHOWN",
# 	"DAC_OVERRIDE",
# 	"FSETID",
# 	"FOWNER",
# 	"SETGID",
# 	"SETUID",
# 	"SETPCAP",
# 	"NET_BIND_SERVICE",
# 	"KILL",
# ]

# Add capabilities to the inheritable set, as well as the default group of permitted, bounding and effective.
# If capabilities are expected to work for non-root users, this option should be set.
# add_inheritable_capabilities = false

# List of default sysctls. If it is empty or commented out, only the sysctls
# defined in the container json file by the user/kube will be added.
# default_sysctls = [
# ]

# List of devices on the host that a
# user can specify with the "io.kubernetes.cri-o.Devices" allowed annotation.
# allowed_devices = [
# 	"/dev/fuse",
# ]

# List of additional devices. specified as
# "<device-on-host>:<device-on-container>:<permissions>", for example: "--device=/dev/sdc:/dev/xvdc:rwm".
# If it is empty or commented out, only the devices
# defined in the container json file by the user/kube will be added.
# additional_devices = [
# ]

# List of directories to scan for CDI Spec files.
# cdi_spec_dirs = [
# 	"/etc/cdi",
# 	"/var/run/cdi",
# ]

# Change the default behavior of setting container devices uid/gid from CRI's
# SecurityContext (RunAsUser/RunAsGroup) instead of taking host's uid/gid.
# Defaults to false.
# device_ownership_from_security_context = false

# Path to OCI hooks directories for automatically executed hooks. If one of the
# directories does not exist, then CRI-O will automatically skip them.
# hooks_dir = [
# 	"/usr/share/containers/oci/hooks.d",
# ]

# Path to the file specifying the defaults mounts for each container. The
# format of the config is /SRC:/DST, one mount per line. Notice that CRI-O reads
# its default mounts from the following two files:
#
#   1) /etc/containers/mounts.conf (i.e., default_mounts_file): This is the
#      override file, where users can either add in their own default mounts, or
#      override the default mounts shipped with the package.
#
#   2) /usr/share/containers/mounts.conf: This is the default file read for
#      mounts. If you want CRI-O to read from a different, specific mounts file,
#      you can change the default_mounts_file. Note, if this is done, CRI-O will
#      only add mounts it finds in this file.
#
# default_mounts_file = ""

# Maximum number of processes allowed in a container.
# This option is deprecated. The Kubelet flag '--pod-pids-limit' should be used instead.
# pids_limit = 0

# Maximum size allowed for the container log file. Negative numbers indicate
# that no size limit is imposed. If it is positive, it must be >= 8192 to
# match/exceed conmon's read buffer. The file is truncated and re-opened so the
# limit is never exceeded. This option is deprecated. The Kubelet flag '--container-log-max-size' should be used instead.
# log_size_max = -1

# Whether container output should be logged to journald in addition to the Kubernetes log file
# log_to_journald = false

# Path to directory in which container exit files are written to by conmon.
# container_exits_dir = "/var/run/crio/exits"

# Path to directory for container attach sockets.
# container_attach_socket_dir = "/var/run/crio"

# The prefix to use for the source of the bind mounts.
# bind_mount_prefix = ""

# If set to true, all containers will run in read-only mode.
# read_only = false

# Changes the verbosity of the logs based on the level it is set to. Options
# are fatal, panic, error, warn, info, debug and trace. This option supports
# live configuration reload.
# log_level = "info"

# Filter the log messages by the provided regular expression.
# This option supports live configuration reload.
# log_filter = ""

# The UID mappings for the user namespace of each container. A range is
# specified in the form containerUID:HostUID:Size. Multiple ranges must be
# separated by comma.
# uid_mappings = ""

# The GID mappings for the user namespace of each container. A range is
# specified in the form containerGID:HostGID:Size. Multiple ranges must be
# separated by comma.
# gid_mappings = ""

# If set, CRI-O will reject any attempt to map host UIDs below this value
# into user namespaces.  A negative value indicates that no minimum is set,
# so specifying mappings will only be allowed for pods that run as UID 0.
# minimum_mappable_uid = -1

# If set, CRI-O will reject any attempt to map host GIDs below this value
# into user namespaces.  A negative value indicates that no minimum is set,
# so specifying mappings will only be allowed for pods that run as UID 0.
# minimum_mappable_gid = -1

# The minimal amount of time in seconds to wait before issuing a timeout
# regarding the proper termination of the container. The lowest possible
# value is 30s, whereas lower values are not considered by CRI-O.
# ctr_stop_timeout = 30

# drop_infra_ctr determines whether CRI-O drops the infra container
# when a pod does not have a private PID namespace, and does not use
# a kernel separating runtime (like kata).
# It requires manage_ns_lifecycle to be true.
# drop_infra_ctr = true

# infra_ctr_cpuset determines what CPUs will be used to run infra containers.
# You can use linux CPU list format to specify desired CPUs.
# To get better isolation for guaranteed pods, set this parameter to be equal to kubelet reserved-cpus.
# infra_ctr_cpuset = ""

# The directory where the state of the managed namespaces gets tracked.
# Only used when manage_ns_lifecycle is true.
# namespaces_dir = "/var/run"

# pinns_path is the path to find the pinns binary, which is needed to manage namespace lifecycle
# pinns_path = ""

# Globally enable/disable CRIU support which is necessary to
# checkpoint and restore container or pods (even if CRIU is found in $PATH).
# enable_criu_support = false

# Enable/disable the generation of the container,
# sandbox lifecycle events to be sent to the Kubelet to optimize the PLEG
# enable_pod_events = false

# default_runtime is the _name_ of the OCI runtime to be used as the default.
# The name is matched against the runtimes map below.
# default_runtime = "runc"

# A list of paths that, when absent from the host,
# will cause a container creation to fail (as opposed to the current behavior being created as a directory).
# This option is to protect from source locations whose existence as a directory could jeopardize the health of the node, and whose
# creation as a file is not desired either.
# An example is /etc/hostname, which will cause failures on reboot if it's created as a directory, but often doesn't exist because
# the hostname is being managed dynamically.
# absent_mount_sources_to_reject = [
# ]

# The "crio.runtime.runtimes" table defines a list of OCI compatible runtimes.
# The runtime to use is picked based on the runtime handler provided by the CRI.
# If no runtime handler is provided, the "default_runtime" will be used.
# Each entry in the table should follow the format:
#
# [crio.runtime.runtimes.runtime-handler]
# runtime_path = "/path/to/the/executable"
# runtime_type = "oci"
# runtime_root = "/path/to/the/root"
# monitor_path = "/path/to/container/monitor"
# monitor_cgroup = "/cgroup/path"
# monitor_exec_cgroup = "/cgroup/path"
# monitor_env = []
# privileged_without_host_devices = false
# allowed_annotations = []
# Where:
# - runtime-handler: Name used to identify the runtime.
# - runtime_path (optional, string): Absolute path to the runtime executable in
#   the host filesystem. If omitted, the runtime-handler identifier should match
#   the runtime executable name, and the runtime executable should be placed
#   in $PATH.
# - runtime_type (optional, string): Type of runtime, one of: "oci", "vm". If
#   omitted, an "oci" runtime is assumed.
# - runtime_root (optional, string): Root directory for storage of containers
#   state.
# - runtime_config_path (optional, string): the path for the runtime configuration
#   file. This can only be used when using the VM runtime_type.
# - privileged_without_host_devices (optional, bool): an option for restricting
#   host devices from being passed to privileged containers.
# - allowed_annotations (optional, array of strings): an option for specifying
#   a list of experimental annotations that this runtime handler is allowed to process.
#   The currently recognized values are:
#   "io.kubernetes.cri-o.userns-mode" for configuring a user namespace for the pod.
#   "io.kubernetes.cri-o.cgroup2-mount-hierarchy-rw" for mounting cgroups writably when set to "true".
#   "io.kubernetes.cri-o.Devices" for configuring devices for the pod.
#   "io.kubernetes.cri-o.ShmSize" for configuring the size of /dev/shm.
#   "io.kubernetes.cri-o.UnifiedCgroup.$CTR_NAME" for configuring the cgroup v2 unified block for a container.
#   "io.containers.trace-syscall" for tracing syscalls via the OCI seccomp BPF hook.
#   "io.kubernetes.cri.rdt-class" for setting the RDT class of a container
# - monitor_path (optional, string): The path of the monitor binary. Replaces
#   deprecated option "conmon".
# - monitor_cgroup (optional, string): The cgroup the container monitor process will be put in.
#   Replaces deprecated option "conmon_cgroup".
# - monitor_exec_cgroup (optional, string): If set to "container", indicates exec probes
#   should be moved to the container's cgroup
# - monitor_env (optional, array of strings): Environment variables to pass to the monitor.
#   Replaces deprecated option "conmon_env".
#
# Using the seccomp notifier feature:
#
# This feature can help you to debug seccomp related issues, for example if
# blocked syscalls (permission denied errors) have negative impact on the workload.
#
# To be able to use this feature, configure a runtime which has the annotation
# "io.kubernetes.cri-o.seccompNotifierAction" in the allowed_annotations array.
#
# It also requires at least runc 1.1.0 or crun 0.19 which support the notifier
# feature.
#
# If everything is setup, CRI-O will modify chosen seccomp profiles for
# containers if the annotation "io.kubernetes.cri-o.seccompNotifierAction" is
# set on the Pod sandbox. CRI-O will then get notified if a container is using
# a blocked syscall and then terminate the workload after a timeout of 5
# seconds if the value of "io.kubernetes.cri-o.seccompNotifierAction=stop".
#
# This also means that multiple syscalls can be captured during that period,
# while the timeout will get reset once a new syscall has been discovered.
#
# This also means that the Pods "restartPolicy" has to be set to "Never",
# otherwise the kubelet will restart the container immediately.
#
# Please be aware that CRI-O is not able to get notified if a syscall gets
# blocked based on the seccomp defaultAction, which is a general runtime
# limitation.


# [crio.runtime.runtimes.runc]
# runtime_path = ""
# runtime_type = "oci"
# runtime_root = "/run/runc"
# runtime_config_path = ""
# monitor_path = ""
# monitor_cgroup = "system.slice"
# monitor_exec_cgroup = ""
# monitor_env = [
# 	"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
# ]
# allowed_annotations = [
# 	"io.containers.trace-syscall",
# ]
# privileged_without_host_devices = false

# The workloads table defines ways to customize containers with different resources
# that work based on annotations, rather than the CRI.
# Note, the behavior of this table is EXPERIMENTAL and may change at any time.
# Each workload, has a name, activation_annotation, annotation_prefix and set of resources it supports mutating.
# The currently supported resources are "cpu" (to configure the cpu shares) and "cpuset" to configure the cpuset.
# Each resource can have a default value specified, or be empty.
# For a container to opt-into this workload, the pod should be configured with the annotation $activation_annotation (key only, value is ignored).
# To customize per-container, an annotation of the form $annotation_prefix.$resource/$ctrName = "value" can be specified
# signifying for that resource type to override the default value.
# If the annotation_prefix is not present, every container in the pod will be given the default values.
# Example:
# [crio.runtime.workloads.workload-type]
# activation_annotation = "io.crio/workload"
# annotation_prefix = "io.crio.workload-type"
# [crio.runtime.workloads.workload-type.resources]
# cpuset = "0-1"
# cpushares = 0
# Where:
# The workload name is workload-type.
# To specify, the pod must have the "io.crio/workload" annotation (this is a precise string match).
# This workload supports setting cpuset and cpu resources.
# annotation_prefix is used to customize the different resources.
# To configure the cpu shares a container gets in the example above, the pod would have to have the following annotation:
# "io.crio.workload-type/$container_name = {"cpushares": "value"}"

# The crio.image table contains settings pertaining to the management of OCI images.
#
# CRI-O reads its configured registries defaults from the system wide
# containers-registries.conf(5) located in /etc/containers/registries.conf. If
# you want to modify just CRI-O, you can change the registries configuration in
# this file. Otherwise, leave insecure_registries and registries commented out to
# use the system's defaults from /etc/containers/registries.conf.
[crio.image]

# Default transport for pulling images from a remote container storage.
# default_transport = "docker://"

# The path to a file containing credentials necessary for pulling images from
# secure registries. The file is similar to that of /var/lib/kubelet/config.json
# global_auth_file = ""

# The image used to instantiate infra containers.
# This option supports live configuration reload.
# pause_image = "registry.k8s.io/pause:3.6"

# The path to a file containing credentials specific for pulling the pause_image from
# above. The file is similar to that of /var/lib/kubelet/config.json
# This option supports live configuration reload.
# pause_image_auth_file = ""

# The command to run to have a container stay in the paused state.
# When explicitly set to "", it will fallback to the entrypoint and command
# specified in the pause image. When commented out, it will fallback to the
# default: "/pause". This option supports live configuration reload.
# pause_command = "/pause"

# Path to the file which decides what sort of policy we use when deciding
# whether or not to trust an image that we've pulled. It is not recommended that
# this option be used, as the default behavior of using the system-wide default
# policy (i.e., /etc/containers/policy.json) is most often preferred. Please
# refer to containers-policy.json(5) for more details.
# signature_policy = ""

# List of registries to skip TLS verification for pulling images. Please
# consider configuring the registries via /etc/containers/registries.conf before
# changing them here.
# insecure_registries = [
# ]

# Controls how image volumes are handled. The valid values are mkdir, bind and
# ignore; the latter will ignore volumes entirely.
# image_volumes = "mkdir"

# Temporary directory to use for storing big files
# big_files_temporary_dir = ""

# The crio.network table contains settings pertaining to the management of
# CNI plugins.
[crio.network]

# The default CNI network name to be selected. If not set or "", then
# CRI-O will pick-up the first one found in network_dir.
# cni_default_network = ""

# Path to the directory where CNI configuration files are located.
# network_dir = "/etc/cni/net.d/"

# Paths to directories where CNI plugin binaries are located.
plugin_dirs = [
	"/opt/cni/bin",
	"/usr/libexec/cni",
]

# A necessary configuration for Prometheus based metrics retrieval
[crio.metrics]

# Globally enable or disable metrics support.
enable_metrics = true

# Specify enabled metrics collectors.
# Per default all metrics are enabled.
# It is possible, to prefix the metrics with "container_runtime_" and "crio_".
# For example, the metrics collector "operations" would be treated in the same
# way as "crio_operations" and "container_runtime_crio_operations".
# metrics_collectors = [
# 	"operations",
# 	"operations_latency_microseconds_total",
# 	"operations_latency_microseconds",
# 	"operations_errors",
# 	"image_pulls_by_digest",
# 	"image_pulls_by_name",
# 	"image_pulls_by_name_skipped",
# 	"image_pulls_failures",
# 	"image_pulls_successes",
# 	"image_pulls_layer_size",
# 	"image_layer_reuse",
# 	"containers_oom_total",
# 	"containers_oom",
# 	"processes_defunct",
# 	"operations_total",
# 	"operations_latency_seconds",
# 	"operations_latency_seconds_total",
# 	"operations_errors_total",
# 	"image_pulls_bytes_total",
# 	"image_pulls_skipped_bytes_total",
# 	"image_pulls_failure_total",
# 	"image_pulls_success_total",
# 	"image_layer_reuse_total",
# 	"containers_oom_count_total",
# 	"containers_seccomp_notifier_count_total",
# ]
# The port on which the metrics server will listen.
metrics_port = 9537

# Local socket path to bind the metrics server to
# metrics_socket = ""

# The certificate for the secure metrics server.
# If the certificate is not available on disk, then CRI-O will generate a
# self-signed one. CRI-O also watches for changes of this path and reloads the
# certificate on any modification event.
# metrics_cert = ""

# The certificate key for the secure metrics server.
# Behaves in the same way as the metrics_cert.
# metrics_key = ""

# A necessary configuration for OpenTelemetry trace data exporting
[crio.tracing]

# Globally enable or disable exporting OpenTelemetry traces.
# enable_tracing = false

# Address on which the gRPC trace collector listens on.
# tracing_endpoint = "0.0.0.0:4317"

# Number of samples to collect per million spans. Set to 1000000 to always sample.
# tracing_sampling_rate_per_million = 0

# CRI-O NRI configuration.
[crio.nri]

# Globally enable or disable NRI.
# enable_nri = false

# NRI configuration file to use.
# nri_config_file = "/etc/nri/nri.conf"

# NRI socket to listen on.
# nri_listen = "/var/run/nri.sock"

# NRI plugin directory to use.
# nri_plugin_dir = "/opt/nri/plugins"

# Necessary information pertaining to container and pod stats reporting.
[crio.stats]

# The number of seconds between collecting pod and container stats.
# If set to 0, the stats are collected on-demand instead.
# stats_collection_period = 0

containerd

containerd --version

containerd github.com/containerd/containerd 1.6.19 

systemctl show containerd

Type=notify
ExitType=main
Restart=always
NotifyAccess=main
RestartUSec=5s
TimeoutStartUSec=45s
TimeoutStopUSec=45s
TimeoutAbortUSec=45s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=abort
RuntimeMaxUSec=infinity
RuntimeRandomizedExtraUSec=0
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=874889
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=continue
ReloadSignal=1
ExecMainStartTimestamp=Thu 2023-04-27 12:17:20 EDT
ExecMainStartTimestampMonotonic=693299466835
ExecMainExitTimestampMonotonic=0
ExecMainPID=874889
ExecMainCode=0
ExecMainStatus=0
ExecStartPre={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=yes ; start_time=[Thu 2023-04-27 12:17:20 EDT] ; stop_time=[Thu 2023-04-27 12:17:20 EDT] ; pid=874888 ; code=exited ; status=0 }
ExecStartPreEx={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; flags=ignore-failure ; start_time=[Thu 2023-04-27 12:17:20 EDT] ; stop_time=[Thu 2023-04-27 12:17:20 EDT] ; pid=874888 ; code=exited ; status=0 }
ExecStart={ path=/usr/bin/containerd ; argv[]=/usr/bin/containerd ; ignore_errors=no ; start_time=[Thu 2023-04-27 12:17:20 EDT] ; stop_time=[n/a] ; pid=874889 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/containerd ; argv[]=/usr/bin/containerd ; flags= ; start_time=[Thu 2023-04-27 12:17:20 EDT] ; stop_time=[n/a] ; pid=874889 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/containerd.service
ControlGroupId=21590
MemoryCurrent=1615441920
MemoryAvailable=infinity
CPUUsageNSec=132715694000
EffectiveCPUs=0-31
EffectiveMemoryNodes=0
TasksCurrent=196
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryZSwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=infinity
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimit=0
ManagedOOMPreference=none
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=infinity
LimitNOFILESoft=infinity
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=513479
LimitSIGPENDINGSoft=513479
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=-999
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=2
IOSchedulingPriority=4
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore
DynamicUser=no
RemoveIPC=no
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
PrivateIPC=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=process
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=containerd.service
Names=containerd.service
Requires=sysinit.target system.slice
Conflicts=shutdown.target
Before=shutdown.target kubelet.service
After=local-fs.target sysinit.target basic.target systemd-journald.socket system.slice network.target
Documentation=https://containerd.io
Description=containerd container runtime
AccessSELinuxContext=system_u:object_r:container_unit_file_t:s0
LoadState=loaded
ActiveState=active
FreezerState=running
SubState=running
FragmentPath=/usr/lib/systemd/system/containerd.service
DropInPaths=/usr/lib/systemd/system/service.d/10-timeout-abort.conf
UnitFileState=disabled
UnitFilePreset=disabled
StateChangeTimestamp=Thu 2023-04-27 12:17:20 EDT
StateChangeTimestampMonotonic=693299516222
InactiveExitTimestamp=Thu 2023-04-27 12:17:20 EDT
InactiveExitTimestampMonotonic=693299461063
ActiveEnterTimestamp=Thu 2023-04-27 12:17:20 EDT
ActiveEnterTimestampMonotonic=693299516222
ActiveExitTimestamp=Thu 2023-04-27 12:17:20 EDT
ActiveExitTimestampMonotonic=693299416897
InactiveEnterTimestamp=Thu 2023-04-27 12:17:20 EDT
InactiveEnterTimestampMonotonic=693299422025
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnSuccessJobMode=fail
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Thu 2023-04-27 12:17:20 EDT
ConditionTimestampMonotonic=693299441314
AssertTimestamp=Thu 2023-04-27 12:17:20 EDT
AssertTimestampMonotonic=693299441316
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=084171217340476fa533d75fefdcadb7
CollectMode=inactive

cat /etc/containerd/config.toml

version = 2

[debug]
  address = "/run/containerd/debug.sock"
  level = "debug"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/usr/libexec/cni/"
      conf_dir = "/etc/cni/net.d"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"
      privileged_without_host_devices = true
  [plugins."io.containerd.internal.v1.opt"]
    path = "/var/lib/containerd/opt"


Packages

No dpkg
Have rpm

rpm -qa|egrep "(cc-oci-runtime|cc-runtime|runv|kata-runtime|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"

egrep: warning: egrep is obsolescent; using grep -E
qemu-virtiofsd-7.2.0-6.fc38.x86_64
qemu-img-7.2.0-6.fc38.x86_64
qemu-tools-7.2.0-6.fc38.x86_64
ipxe-roms-qemu-20220210-3.git64113751.fc38.noarch
qemu-common-7.2.0-6.fc38.x86_64
qemu-ui-opengl-7.2.0-6.fc38.x86_64
qemu-ui-spice-core-7.2.0-6.fc38.x86_64
qemu-char-spice-7.2.0-6.fc38.x86_64
qemu-ui-spice-app-7.2.0-6.fc38.x86_64
qemu-audio-spice-7.2.0-6.fc38.x86_64
qemu-device-display-qxl-7.2.0-6.fc38.x86_64
qemu-ui-egl-headless-7.2.0-6.fc38.x86_64
qemu-ui-gtk-7.2.0-6.fc38.x86_64
qemu-ui-sdl-7.2.0-6.fc38.x86_64
qemu-audio-alsa-7.2.0-6.fc38.x86_64
qemu-audio-dbus-7.2.0-6.fc38.x86_64
qemu-audio-jack-7.2.0-6.fc38.x86_64
qemu-audio-oss-7.2.0-6.fc38.x86_64
qemu-audio-pa-7.2.0-6.fc38.x86_64
qemu-audio-sdl-7.2.0-6.fc38.x86_64
qemu-block-blkio-7.2.0-6.fc38.x86_64
qemu-block-curl-7.2.0-6.fc38.x86_64
qemu-block-dmg-7.2.0-6.fc38.x86_64
qemu-block-gluster-7.2.0-6.fc38.x86_64
qemu-block-iscsi-7.2.0-6.fc38.x86_64
qemu-block-nfs-7.2.0-6.fc38.x86_64
qemu-block-rbd-7.2.0-6.fc38.x86_64
qemu-block-ssh-7.2.0-6.fc38.x86_64
qemu-char-baum-7.2.0-6.fc38.x86_64
qemu-device-display-vhost-user-gpu-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-gpu-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-gpu-ccw-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-gpu-gl-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-gpu-pci-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-gpu-pci-gl-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-vga-7.2.0-6.fc38.x86_64
qemu-device-display-virtio-vga-gl-7.2.0-6.fc38.x86_64
qemu-device-usb-host-7.2.0-6.fc38.x86_64
qemu-device-usb-redirect-7.2.0-6.fc38.x86_64
qemu-device-usb-smartcard-7.2.0-6.fc38.x86_64
qemu-ui-curses-7.2.0-6.fc38.x86_64
qemu-system-aarch64-core-7.2.0-6.fc38.x86_64
qemu-system-alpha-core-7.2.0-6.fc38.x86_64
qemu-system-arm-core-7.2.0-6.fc38.x86_64
qemu-system-avr-core-7.2.0-6.fc38.x86_64
qemu-system-cris-core-7.2.0-6.fc38.x86_64
qemu-system-loongarch64-core-7.2.0-6.fc38.x86_64
qemu-system-m68k-core-7.2.0-6.fc38.x86_64
qemu-system-microblaze-core-7.2.0-6.fc38.x86_64
qemu-system-mips-core-7.2.0-6.fc38.x86_64
qemu-system-nios2-core-7.2.0-6.fc38.x86_64
qemu-system-or1k-core-7.2.0-6.fc38.x86_64
qemu-system-riscv-core-7.2.0-6.fc38.x86_64
qemu-system-rx-core-7.2.0-6.fc38.x86_64
qemu-system-s390x-core-7.2.0-6.fc38.x86_64
qemu-system-sh4-core-7.2.0-6.fc38.x86_64
qemu-system-sparc-core-7.2.0-6.fc38.x86_64
qemu-system-tricore-core-7.2.0-6.fc38.x86_64
qemu-system-xtensa-core-7.2.0-6.fc38.x86_64
qemu-user-7.2.0-6.fc38.x86_64
qemu-system-ppc-core-7.2.0-6.fc38.x86_64
qemu-system-x86-core-7.2.0-6.fc38.x86_64
qemu-pr-helper-7.2.0-6.fc38.x86_64
qemu-system-aarch64-7.2.0-6.fc38.x86_64
qemu-system-alpha-7.2.0-6.fc38.x86_64
qemu-system-arm-7.2.0-6.fc38.x86_64
qemu-system-avr-7.2.0-6.fc38.x86_64
qemu-system-cris-7.2.0-6.fc38.x86_64
qemu-system-loongarch64-7.2.0-6.fc38.x86_64
qemu-system-m68k-7.2.0-6.fc38.x86_64
qemu-system-microblaze-7.2.0-6.fc38.x86_64
qemu-system-mips-7.2.0-6.fc38.x86_64
qemu-system-nios2-7.2.0-6.fc38.x86_64
qemu-system-or1k-7.2.0-6.fc38.x86_64
qemu-system-ppc-7.2.0-6.fc38.x86_64
qemu-system-riscv-7.2.0-6.fc38.x86_64
qemu-system-rx-7.2.0-6.fc38.x86_64
qemu-system-s390x-7.2.0-6.fc38.x86_64
qemu-system-sh4-7.2.0-6.fc38.x86_64
qemu-system-sparc-7.2.0-6.fc38.x86_64
qemu-system-tricore-7.2.0-6.fc38.x86_64
qemu-system-x86-7.2.0-6.fc38.x86_64
qemu-system-xtensa-7.2.0-6.fc38.x86_64
qemu-7.2.0-6.fc38.x86_64


Kata Monitor

kata-monitor --version

kata-monitor
 Version:	0.3.0
 Go version:	go1.20.3
 Git commit:	8653b3acd5da506bbf13b8fadb5eb0d06caaf2d2
 OS/Arch:	linux/amd64


@gkurz
Member

gkurz commented May 9, 2023

An option could be to send a SIGTERM instead of a SIGKILL to virtiofsd, the same way nydusd is doing.

nydusd uses WaitLocalProcess: it first sends SIGTERM and then waits for the process to terminate. If the process is still running after some time, it is forcibly killed with SIGKILL. This might introduce some latency and give QEMU more time to process its QMP quit command, but:

  • it isn't guaranteed in any way: virtiofsd could still terminate before QEMU stops monitoring the connection
  • it might increase pod deletion time substantially if virtiofsd takes time to terminate

Another option could be to simply turn QEMU logging off before terminating virtiofsd.

Yet another option could be to just SIGKILL the QEMU process instead of doing QMP quit.

beraldoleal added a commit to beraldoleal/kata-containers that referenced this issue May 24, 2023
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes kata-containers#6757.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
@beraldoleal
Member Author

Thank you @gkurz for your comments. Just sent #6959, let me know if that works for you.

beraldoleal added a commit to beraldoleal/kata-containers that referenced this issue May 25, 2023
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes kata-containers#6757.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
beraldoleal added a commit to beraldoleal/kata-containers that referenced this issue May 25, 2023
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes kata-containers#6757.
Backport of kata-containers#6959.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
beraldoleal added a commit to beraldoleal/kata-containers that referenced this issue May 25, 2023
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes kata-containers#6757.
Backport of kata-containers#6959.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
(cherry picked from commit 0e47cfc)