If image auto-update fails due to no space, it stays and corrupts new containers #404

Closed
2 of 6 tasks
andrey-utkin opened this issue Jan 19, 2024 · 2 comments · Fixed by #435


andrey-utkin commented Jan 19, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04.3 LTS
  • The output of "incus info"
incus info
config:
  core.https_address: '[::]:8443'
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: bluecherryteam
auth_user_method: unix
environment:
  addresses:
  - 192.168.86.151:8443
  - 10.224.252.1:8443
  - '[fd42:a96a:f32e:f14a::1]:8443'
  - 10.181.144.1:8443
  - '[fd42:a046:4d35:f6dd::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICDzCCAZWgAwIBAgIQXVoaCx77/vGcN5L2q6gljzAKBggqhkjOPQQDAzA3MRkw
    FwYDVQQKExBMaW51eCBDb250YWluZXJzMRowGAYDVQQDDBFyb290QGZvY2FsLTEt
    Ni0yNDAeFw0yNDAxMDYyMDUzNTBaFw0zNDAxMDMyMDUzNTBaMDcxGTAXBgNVBAoT
    EExpbnV4IENvbnRhaW5lcnMxGjAYBgNVBAMMEXJvb3RAZm9jYWwtMS02LTI0MHYw
    EAYHKoZIzj0CAQYFK4EEACIDYgAEI0LvCfJq47k1Jov/I7n+yXF9UqUtEFn2YNmA
    0vpKE6Kgeon4zhQ1WLm1x2iz6yaWitnVdj/hTwK+FzQKZVNFDiW6ectxZxlMbyT+
    7+BePUedxm3XT+/2VsJeivWyU3wao2YwZDAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0l
    BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAvBgNVHREEKDAmggxmb2NhbC0x
    LTYtMjSHBH8AAAGHEAAAAAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaAAwZQIw
    JFO7HPjo/RojG0vpv7C7UQGjw7X1m6vHpQa+aw+kR5zSDgv0qGxf09HBhmW7SDfk
    AjEArfaeKLzkqgwMQluRvLGeeQewxpBR7tuM/EC1WquYozt6jf/s1hRYN3Dja/+w
    Uyfs
    -----END CERTIFICATE-----
  certificate_fingerprint: e9c32ebfd473892cb5728b977929f8b45ddcd0adac7457685ee6098baa5af826
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.1.3
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-91-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: incus
  server_pid: 866
  server_version: "0.4"
  storage: btrfs
  storage_version: 5.16.2
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false

Issue description

When an image auto-update operation fills the storage pool, the unpack fails partway, but the incomplete (inconsistent) image is kept and ends up being used for new containers.
In my case it resulted in containers failing to start, but the bug can be less visible.

incusd.log:

time="2024-01-13T10:27:37Z" level=warning msg="Unpack failed" allowedCmds="[xz]" err="Failed to run: tar --wildcards --exclude=dev/* --exclude=./dev/* --exclude=rootfs/dev/* --exclude=rootfs/./dev/* --restrict --force-local -C /var/lib/incus/storage-pools/default/images/54df95801a0bdbdd981401884bbdec09f6b959170877df2e71f0677d3f220319 --numeric-owner --xattrs-include=* -Jxf -: exit status 2 (tar: metadata.yaml: Cannot write: No space left on device\ntar: templates/hostname.tpl: Cannot write: No space left on device\ntar: templates/hosts.tpl: Cannot write: No space left on device\ntar: Exiting with failure status due to previous errors)" extension=.tar.xz file=/var/lib/incus/images/54df95801a0bdbdd981401884bbdec09f6b959170877df2e71f0677d3f220319 path=/var/lib/incus/storage-pools/default/images/54df95801a0bdbdd981401884bbdec09f6b959170877df2e71f0677d3f220319

container-name/console.log:

/sbin/init: error while loading shared libraries: /lib/x86_64-linux-gnu/libseccomp.so.2: file too short

journalctl -u incus:

level=error msg="Failed to retrieve PID of executing child process" instance=huge-cluster-server-1 instanceType=container project=default

Steps to reproduce

  1. Pull an image from a public repository and make sure it is configured to be auto-updated
  2. Fill the storage pool until it is almost full
  3. Wait for the image auto-update to happen (I don't know how to trigger it manually)
  4. Try to incus launch a container from the previously downloaded image that has auto-update enabled
  5. Observe the container failing to start

Workaround

Manually delete all images from the storage pool: run incus image list, then incus image delete ... for each image.

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (incus info NAME --show-log)
  • Container configuration (incus config show NAME --expanded)
  • Main daemon log (at /var/log/incus/incusd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of incus monitor --pretty while reproducing the issue)
@andrey-utkin andrey-utkin changed the title from "Auto-updated image is corrupted and can fail due to no space, but leave corrupetd image" to "If image auto-update fails due to no space, it stays and corrupts new containers" Jan 19, 2024
@stgraber
Member

What storage pool driver are you using?

@stgraber stgraber added the Bug Confirmed to be a bug label Jan 19, 2024
@stgraber stgraber added this to the incus-0.5 milestone Jan 19, 2024
@andrey-utkin
Author

btrfs

@stgraber stgraber self-assigned this Jan 19, 2024
stgraber added a commit to stgraber/incus that referenced this issue Jan 25, 2024
Closes lxc#404

Signed-off-by: Stéphane Graber <stgraber@stgraber.org>