iommufd: vfio container FD ioctl compatibility
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into io_pagetable operations.

A userspace application can test against iommufd and confirm
compatibility, then simply make a small change to open /dev/iommu instead
of /dev/vfio/vfio.

For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and
then all applications will use the compatibility path with no code
changes. It is unclear if this could ever be a production configuration.

This series just provides the iommufd side of compatibility. Actually
linking this to VFIO_SET_CONTAINER is a followup series, with a link in
the cover letter.

Internally the compatibility API uses a normal IOAS object that, like
vfio, is automatically allocated when the first device is attached.

Userspace can also query or set this IOAS object directly using the
IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only
features while still using the VFIO style map/unmap ioctls.

While this is enough to operate qemu, it is still a bit of a WIP with a
few gaps:

 - Only the TYPE1v2 mode is supported where unmap cannot punch holes or
   split areas. The old mode can be implemented with a new operation to
   split an iopt_area into two without disturbing the iopt_pages or the
   domains, then unmapping a whole area as normal.

 - Resource limits rely on memory cgroups to bound what userspace can do
   instead of the module parameter dma_entry_limit.

 - Pinned page accounting uses the same system as io_uring, not the
   mm_struct based system vfio uses.

 - VFIO P2P is not implemented. The DMABUF patches for vfio are a start at
   a solution where iommufd would import a special DMABUF. This is to
   avoid further propagating the follow_pfn() security problem.

 - Indefinite suspend of SW access (VFIO_DMA_MAP_FLAG_VADDR) is not
   implemented.

 - A full audit for pedantic compatibility details (eg errnos, etc) has
   not yet been done.

 - powerpc SPAPR is left out, as it is not connected to the iommu_domain
   framework. My hope is that SPAPR will be moved into the iommu_domain
   framework as a special HW specific type, and I would expect power to
   support the generic interface through a normal iommu_domain.

The following are not going to be implemented and we expect to remove them
from VFIO type1:

 - SW access 'dirty tracking'. As discussed in the cover letter this will
   be done in VFIO.

 - VFIO_TYPE1_NESTING_IOMMU
    https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
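
As a rough userspace illustration of the "open /dev/iommu instead of
/dev/vfio/vfio" test described above, the sketch below probes a container
node with the existing VFIO container ioctls from <linux/vfio.h>. The
helper name probe_container() and its return convention are hypothetical,
and it is an assumption (not stated in this message) that the
compatibility path answers VFIO_GET_API_VERSION and VFIO_CHECK_EXTENSION
the same way the vfio container does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: probe a VFIO-style container node.
 *
 * Returns  1 if the node reports the expected VFIO API version and
 *            claims TYPE1v2 support (the only mode listed as supported),
 *          0 if the node opens but does not answer as expected,
 *         -1 if the node cannot be opened at all.
 */
int probe_container(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    int ok = 1;

    /* The container API version is a fixed constant in the uapi header. */
    if (ioctl(fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        ok = 0;
    /* VFIO_CHECK_EXTENSION returns > 0 when the extension is supported. */
    else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU) <= 0)
        ok = 0;

    close(fd);
    return ok;
}
```

An application could call probe_container("/dev/vfio/vfio") and
probe_container("/dev/iommu") and compare the results before switching the
path it opens. This checks only the two query ioctls, not the map/unmap
behaviour or the gaps listed above.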