iommufd: vfio container FD ioctl compatibility
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into io_pagetable operations.

A userspace application can test against iommufd and confirm compatibility,
then simply make a small change to open /dev/iommu instead of
/dev/vfio/vfio.
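
As a rough illustration of how small that change can be, the sketch below
opens a container node by path and issues VFIO_GET_API_VERSION on it. The
helper names container_path() and probe_container() are hypothetical and
exist only for this example; the two device paths come from this message
and the ioctl from the VFIO uapi. It assumes <linux/vfio.h> is installed.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical helper: pick which container node to open. */
static const char *container_path(int use_iommufd)
{
	return use_iommufd ? "/dev/iommu" : "/dev/vfio/vfio";
}

/*
 * Hypothetical helper: open the node and ask for the VFIO API version.
 * Returns whatever the ioctl reports, or -1 if the node cannot be opened
 * or the ioctl fails.
 */
static int probe_container(const char *path)
{
	int fd = open(path, O_RDWR);
	int version;

	if (fd < 0)
		return -1;
	version = ioctl(fd, VFIO_GET_API_VERSION);
	close(fd);
	return version;
}
```

With these helpers, probe_container(container_path(1)) exercises the
iommufd path and probe_container(container_path(0)) the legacy one; the
rest of the application would stay unchanged.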

For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and
then all applications will use the compatibility path with no code
changes. It is unclear if this could ever be a production configuration.

This series just provides the iommufd side of compatibility. Actually
linking this to VFIO_SET_CONTAINER is a followup series, with a link in
the cover letter.

Internally the compatibility API uses a normal IOAS object that, like
vfio, is automatically allocated when the first device is
attached.

Userspace can also query or set this IOAS object directly using the
IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only
features while still using the VFIO style map/unmap ioctls.

While this is enough to operate qemu, it is still a bit of a WIP with a
few gaps:

 - Only the TYPE1v2 mode is supported where unmap cannot punch holes or
   split areas. The old mode can be implemented with a new operation to
   split an iopt_area into two without disturbing the iopt_pages or the
   domains, then unmapping a whole area as normal.

 - Resource limits rely on memory cgroups to bound what userspace can do
   instead of the module parameter dma_entry_limit.

 - Pinned page accounting uses the same system as io_uring, not the
   mm_struct based system vfio uses.

 - VFIO P2P is not implemented. The DMABUF patches for vfio are a start at
   a solution where iommufd would import a special DMABUF. This is to avoid
   further propagating the follow_pfn() security problem.

 - Indefinite suspend of SW access (VFIO_DMA_MAP_FLAG_VADDR) is not
   implemented.

 - A full audit for pedantic compatibility details (e.g. errnos) has
   not yet been done.

 - powerpc SPAPR is left out, as it is not connected to the iommu_domain
   framework. My hope is that SPAPR will be moved into the iommu_domain
   framework as a special HW specific type, and I would expect powerpc to
   support the generic interface through a normal iommu_domain.
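
The TYPE1v2 restriction in the first item above can be pictured with a toy
model. This is only a sketch: struct toy_area and toy_unmap_v2() are
invented for illustration and are not the kernel's iopt_area code, and the
-EINVAL return is an assumption chosen for the example. The point it shows
is that an unmap request must cover every area it touches in full, or it
unmaps nothing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a mapped area; not the kernel's struct iopt_area. */
struct toy_area {
	uint64_t iova;
	uint64_t length;	/* assumed to be non-zero */
	bool mapped;
};

/*
 * Unmap [iova, iova + length) with TYPE1v2-like rules: every mapped area
 * that overlaps the range must lie entirely inside it, otherwise nothing
 * is unmapped. Returns the number of bytes unmapped or a negative errno
 * (-EINVAL is used here for illustration only).
 */
static int64_t toy_unmap_v2(struct toy_area *areas, size_t n,
			    uint64_t iova, uint64_t length)
{
	uint64_t last, unmapped = 0;
	size_t i;

	/* Reject empty ranges and ranges that wrap around. */
	if (!length || iova + length - 1 < iova)
		return -EINVAL;
	last = iova + length - 1;

	/* First pass: refuse any request that would split an area. */
	for (i = 0; i < n; i++) {
		uint64_t a_last = areas[i].iova + areas[i].length - 1;

		if (!areas[i].mapped || a_last < iova || areas[i].iova > last)
			continue;
		if (areas[i].iova < iova || a_last > last)
			return -EINVAL;
	}

	/* Second pass: every overlapping area is fully covered, drop them. */
	for (i = 0; i < n; i++) {
		uint64_t a_last = areas[i].iova + areas[i].length - 1;

		if (!areas[i].mapped || a_last < iova || areas[i].iova > last)
			continue;
		areas[i].mapped = false;
		unmapped += areas[i].length;
	}
	return (int64_t)unmapped;
}
```

Supporting the old mode would amount to adding a split step before the
first pass, so that a partially covered area becomes two areas and the
covered one can then be dropped whole.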

The following are not going to be implemented and we expect to remove them
from VFIO type1:

 - SW access 'dirty tracking'. As discussed in the cover letter this will
   be done in VFIO.

 - VFIO_TYPE1_NESTING_IOMMU
    https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
jgunthorpe committed Oct 12, 2022
1 parent 4297c7d commit 8fa4c87
Showing 6 changed files with 498 additions and 6 deletions.
3 changes: 2 additions & 1 deletion drivers/iommu/iommufd/Makefile
@@ -5,6 +5,7 @@ iommufd-y := \
 	io_pagetable.o \
 	ioas.o \
 	main.o \
-	pages.o
+	pages.o \
+	vfio_compat.o

 obj-$(CONFIG_IOMMUFD) += iommufd.o
6 changes: 6 additions & 0 deletions drivers/iommu/iommufd/iommufd_private.h
@@ -75,6 +75,8 @@ int iopt_disable_large_pages(struct io_pagetable *iopt);
 struct iommufd_ctx {
 	struct file *file;
 	struct xarray objects;
+
+	struct iommufd_ioas *vfio_ioas;
 };

 struct iommufd_ucmd {
@@ -84,6 +86,9 @@ struct iommufd_ucmd {
 	void *cmd;
 };

+int iommufd_vfio_ioctl(struct iommufd_ctx *ictx, unsigned int cmd,
+		       unsigned long arg);
+
 /* Copy the response in ucmd->cmd back to userspace. */
 static inline int iommufd_ucmd_respond(struct iommufd_ucmd *ucmd,
 				       size_t cmd_len)
@@ -210,6 +215,7 @@ int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);

 /*
  * A HW pagetable is called an iommu_domain inside the kernel. This user object
16 changes: 11 additions & 5 deletions drivers/iommu/iommufd/main.c
@@ -134,6 +134,8 @@ bool iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 		return false;
 	}
 	__xa_erase(&ictx->objects, obj->id);
+	if (ictx->vfio_ioas && &ictx->vfio_ioas->obj == obj)
+		ictx->vfio_ioas = NULL;
 	xa_unlock(&ictx->objects);
 	up_write(&obj->destroy_rwsem);

@@ -241,27 +243,31 @@ static struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
 		 length),
+	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
+		 __reserved),
 };

 static long iommufd_fops_ioctl(struct file *filp, unsigned int cmd,
 			       unsigned long arg)
 {
+	struct iommufd_ctx *ictx = filp->private_data;
 	struct iommufd_ucmd ucmd = {};
 	struct iommufd_ioctl_op *op;
 	union ucmd_buffer buf;
 	unsigned int nr;
 	int ret;

-	ucmd.ictx = filp->private_data;
+	nr = _IOC_NR(cmd);
+	if (nr < IOMMUFD_CMD_BASE ||
+	    (nr - IOMMUFD_CMD_BASE) >= ARRAY_SIZE(iommufd_ioctl_ops))
+		return iommufd_vfio_ioctl(ictx, cmd, arg);
+
+	ucmd.ictx = ictx;
 	ucmd.ubuffer = (void __user *)arg;
 	ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
 	if (ret)
 		return ret;

-	nr = _IOC_NR(cmd);
-	if (nr < IOMMUFD_CMD_BASE ||
-	    (nr - IOMMUFD_CMD_BASE) >= ARRAY_SIZE(iommufd_ioctl_ops))
-		return -ENOIOCTLCMD;
 	op = &iommufd_ioctl_ops[nr - IOMMUFD_CMD_BASE];
 	if (op->ioctl_num != cmd)
 		return -ENOIOCTLCMD;
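
The routing decision that iommufd_fops_ioctl() makes in the hunk above can
be sketched in isolation. This is a userspace toy, not the kernel code: the
TOY_* names are invented for the example, and TOY_IOMMUFD_NUM_OPS is a
made-up stand-in for ARRAY_SIZE(iommufd_ioctl_ops). The values ';' for the
shared ioctl type, 0x80 for the first iommufd command number and 100 for
the first VFIO command number are taken, to the best of my knowledge, from
the respective uapi headers and should be read as assumptions here.

```c
#include <linux/ioctl.h>

/* Both uapis use ';' as the ioctl type, so only the number separates them. */
#define TOY_IOCTL_TYPE		';'
/* Assumed first iommufd command number (VFIO numbers start at 100). */
#define TOY_IOMMUFD_CMD_BASE	0x80
/* Hypothetical table size, standing in for ARRAY_SIZE(iommufd_ioctl_ops). */
#define TOY_IOMMUFD_NUM_OPS	8

enum toy_route {
	TOY_ROUTE_IOMMUFD,	/* handled by the native iommufd op table */
	TOY_ROUTE_VFIO_COMPAT,	/* handed to the VFIO compatibility path */
};

/*
 * Mirror of the range check: any command number outside the native table
 * is passed to the compat handler instead of failing immediately.
 */
static enum toy_route toy_route_ioctl(unsigned int cmd)
{
	unsigned int nr = _IOC_NR(cmd);

	if (nr < TOY_IOMMUFD_CMD_BASE ||
	    (nr - TOY_IOMMUFD_CMD_BASE) >= TOY_IOMMUFD_NUM_OPS)
		return TOY_ROUTE_VFIO_COMPAT;
	return TOY_ROUTE_IOMMUFD;
}
```

Doing the range check before get_user() matters because VFIO ioctls such as
VFIO_GET_API_VERSION carry no argument buffer, so reading a size field from
arg first would fail for them.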
