Nvidia Vulkan ICD is not exposed via nvidia.runtime #5879

Closed
flexiondotorg opened this issue Jun 24, 2019 · 1 comment

@flexiondotorg commented Jun 24, 2019

Required information

  • Distribution: Ubuntu
  • Distribution version: 19.04
  • Output of lxc info:
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICAjCCAYegAwIBAgIRAOfLyH9frIOUVkTkGiVRNe8wCgYIKoZIzj0EAwMwNzEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEXMBUGA1UEAwwOcm9vdEBkZXNp
    Z25hcmUwHhcNMTkwMzAzMTYyNDAyWhcNMjkwMjI4MTYyNDAyWjA3MRwwGgYDVQQK
    ExNsaW51eGNvbnRhaW5lcnMub3JnMRcwFQYDVQQDDA5yb290QGRlc2lnbmFyZTB2
    MBAGByqGSM49AgEGBSuBBAAiA2IABODcjp+Omoi8AFHrPmlYmY9i/CuiQsbZFS0x
    waK51zCENA3eZ74mXAeaenyHsXCthqO35MRC/cm7Dagle1fQ4Kp7mpDK/QfPU+cm
    exZ0h0/q7ZQpd81o4pe5Ctd1qeHdwaNXMFUwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
    JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwIAYDVR0RBBkwF4IJZGVzaWdu
    YXJlhwTAqAJyhwQKSqIBMAoGCCqGSM49BAMDA2kAMGYCMQC2I3ysCDyjYBz0OpPk
    FuX3LEgDXlYSAl2AKU6STdK6a5t0OzSiAQonYXa1wLfNeZQCMQDBvp+RzStPtJi3
    Il1ms/ufEa1XvKAGCG5Jn6GdOhuLZUnz2wfrPKCETy5kWBmSKBI=
    -----END CERTIFICATE-----
  certificate_fingerprint: 1849f1eab064ac8b1e9d918a83bd81e46b981752c6238aadc9dc1fe8e8488bf4
  driver: lxc
  driver_version: 3.1.0 (devel)
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.0.0-17-generic
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: designare
  server_pid: 20328
  server_version: "3.14"
  storage: dir
  storage_version: "1"

Issue description

Vulkan applications do not work in LXD containers. I have the nvidia-418 drivers installed on my host, both 64-bit and 32-bit. I've created a container using a profile that includes nvidia.driver.capabilities: all and nvidia.runtime: true. The container has mounted all the required Nvidia userspace libraries with the exception of:

  • /usr/share/vulkan/icd.d/nvidia_icd.json
  • /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.418.56
  • /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.418.56

Consequently, applications requiring the Vulkan API segfault.
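
One way to confirm which of these files are absent is to test for them from inside the container. The following is a minimal sketch, assuming Python 3 is available in the container; the `missing_files` helper and its `root` parameter are illustrative and not part of LXD or the Nvidia tooling.

```python
import os

# Paths reported as missing in this issue (driver version 418.56).
EXPECTED = [
    "/usr/share/vulkan/icd.d/nvidia_icd.json",
    "/usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.418.56",
    "/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.418.56",
]


def missing_files(paths, root="/"):
    """Return the entries of `paths` that do not exist under `root`."""
    missing = []
    for path in paths:
        # Strip the leading slash so the path can be re-rooted (hypothetical
        # convenience for checking a container rootfs from the host).
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(candidate):
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_files(EXPECTED):
        print("missing:", path)
```

Run inside the container, this prints one line per absent file; with a different driver version the version suffix in `EXPECTED` would need adjusting.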

Steps to reproduce

  1. Install nvidia drivers on the host.
  2. Create a container and include nvidia.driver.capabilities: all and nvidia.runtime: true in the profile.
  3. Install vulkan-utils in the container and execute vulkaninfo, which fails with:
     /build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:2700: failed with VK_ERROR_INITIALIZATION_FAILED
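
The steps above can be sketched as a sequence of `lxc` commands. This is not the exact set of commands used in the report: the profile name `nvidia`, the container name `vulkan-test`, and the image `ubuntu:19.04` are hypothetical, and the original profile may have contained further settings beyond the two keys named here.

```python
import shlex


def repro_commands(profile="nvidia", container="vulkan-test", image="ubuntu:19.04"):
    """Build the lxc command lines for the reproduction steps (names are assumptions)."""
    return [
        # Step 2: a profile carrying the two keys named in the issue.
        ["lxc", "profile", "create", profile],
        ["lxc", "profile", "set", profile, "nvidia.runtime", "true"],
        ["lxc", "profile", "set", profile, "nvidia.driver.capabilities", "all"],
        ["lxc", "launch", image, container, "-p", "default", "-p", profile],
        # Step 3: install vulkan-utils in the container and run vulkaninfo.
        ["lxc", "exec", container, "--", "apt-get", "update"],
        ["lxc", "exec", container, "--", "apt-get", "install", "-y", "vulkan-utils"],
        ["lxc", "exec", container, "--", "vulkaninfo"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in repro_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the sketch side-effect free; the output can be pasted into a shell on a host that already has the Nvidia drivers installed (step 1).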

Information to attach

This bug looks related:

@stgraber (Member) commented Jun 24, 2019

Yeah, we use and ship the latest nvidia-container release, so that's where this would have to get fixed first; then things will just work in LXD.

Closing as there's nothing for us to do in LXD, but I'll watch and comment in the nvidia-container issue.
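
Since the comment above places the fix in nvidia-container, one way to see whether a given release covers the Vulkan files is to look at what `nvidia-container-cli list` reports on the host. A rough sketch, assuming `nvidia-container-cli` is installed on the host and prints one path per line; the `vulkan_files` helper and the name fragments it matches are illustrative choices based on the files named in this issue.

```python
import subprocess

# Name fragments of the files this issue reports as missing (assumption:
# matching on these substrings is enough to spot them in the listing).
VULKAN_MARKERS = ("nvidia_icd.json", "libnvidia-glvkspirv")


def vulkan_files(listing):
    """Return the lines of `listing` that mention a Vulkan-related driver file."""
    return [
        line.strip()
        for line in listing.splitlines()
        if any(marker in line for marker in VULKAN_MARKERS)
    ]


if __name__ == "__main__":
    result = subprocess.run(
        ["nvidia-container-cli", "list"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    found = vulkan_files(result.stdout)
    if found:
        print("\n".join(found))
    else:
        print("no Vulkan ICD or glvkspirv files in the nvidia-container listing")
```

An empty result would be consistent with the behaviour described in this issue, though it does not by itself prove the cause.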

@stgraber closed this Jun 24, 2019
