swiotlb: Split up single swiotlb lock
Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs, this can lead to
significant lock contention on the swiotlb lock. We've seen 20+% CPU
time in locks in some extreme cases.

This patch splits the swiotlb into individual areas, each with its
own lock. Each CPU tries to allocate in its own area first, and only
if that fails does it search the other areas. On freeing, the
allocation is returned to the area the memory was originally
allocated from.
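
In rough outline, the allocation path looks like the sketch below.
This is illustrative only: helper names such as
swiotlb_area_find_slots() and area_alloc_slot() are mine, not the
patch's exact code.

/*
 * Sketch: try the current CPU's own area first, then fall back to
 * scanning the remaining areas. Each area has its own lock, so CPUs
 * working in different areas no longer contend.
 */
static int swiotlb_area_find_slots(struct io_tlb_mem *mem, int area_index,
				   size_t alloc_size)
{
	struct io_tlb_area *area = &mem->areas[area_index];
	unsigned long flags;
	int slot;

	spin_lock_irqsave(&area->lock, flags);
	slot = area_alloc_slot(area, alloc_size);	/* hypothetical helper */
	spin_unlock_irqrestore(&area->lock, flags);
	return slot;
}

static int swiotlb_find_slots(struct io_tlb_mem *mem, size_t alloc_size,
			      int nareas)
{
	/* nareas is a power of two, so the modulo reduces to a mask. */
	int start = raw_smp_processor_id() & (nareas - 1);
	int i = start;

	do {
		int slot = swiotlb_area_find_slots(mem, i, alloc_size);

		if (slot != -1)
			return slot;
		i = (i + 1) & (nareas - 1);
	} while (i != start);

	return -1;	/* all areas are full */
}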

To avoid doing a full modulo in the main path, the number of swiotlb
areas is always rounded up to the next power of two. I believe that's
not really needed anymore on modern CPUs (which have fast enough
dividers), but it is still a good idea on older parts.
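
A minimal sketch of that trick (illustrative variable names;
roundup_pow_of_two() is the kernel helper from <linux/log2.h>):

/* Round the area count up so the hot path can mask instead of divide. */
nareas = roundup_pow_of_two(nareas);

/* Equivalent to raw_smp_processor_id() % nareas, without a division. */
area_index = raw_smp_processor_id() & (nareas - 1);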

The number of areas can be set using the swiotlb option. But to avoid
every user having to set this option, the default is the number of
available CPUs. Unfortunately, on x86 swiotlb is initialized before
num_possible_cpus() is available, which is why a custom hook called
from the early ACPI code is used.
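
For example (illustrative values), booting with

	swiotlb=262144,8

would request 262144 I/O TLB slabs split into 8 areas, each with its
own lock; if the second integer is omitted, the number of areas
defaults to the CPU count, rounded up to the next power of two.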

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Andi Kleen authored and Kuppuswamy Sathyanarayanan committed Oct 11, 2021
1 parent 3d6a937 commit 998ba0e
Showing 4 changed files with 177 additions and 29 deletions.
4 changes: 3 additions & 1 deletion Documentation/admin-guide/kernel-parameters.txt
@@ -5581,8 +5581,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+			areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 			wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
4 changes: 4 additions & 0 deletions arch/x86/kernel/acpi/boot.c
@@ -22,6 +22,7 @@
 #include <linux/efi-bgrt.h>
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
+#include <linux/swiotlb.h>
 
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
@@ -1129,6 +1130,9 @@ static int __init acpi_parse_madt_lapic_entries(void)
 		return count;
 	}
 
+	/* This does not take overrides into consideration */
+	swiotlb_hint_cpus(max(count, x2count));
+
 	x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
 				      acpi_parse_x2apic_nmi, 0);
 	count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI,
27 changes: 21 additions & 6 deletions include/linux/swiotlb.h
@@ -38,6 +38,7 @@ enum swiotlb_force {
 
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
+void swiotlb_hint_cpus(int cpus);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
@@ -70,7 +71,25 @@ struct io_tlb_slot {
 };
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used: The number of used IO TLB blocks.
+ * @free_slots: The free list describing the number of free entries available
+ *	from each index.
+ * @lock: The lock to protect the above data structures in the map and
+ *	unmap calls.
+ */
+
+struct io_tlb_area {
+	unsigned long used;
+	struct list_head free_slots;
+	spinlock_t lock;
+};
+
+/**
+ * struct io_tlb_mem - IO TLB memory pool descriptor
  *
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  *	range check to see if the memory was in fact allocated by this
@@ -85,8 +104,6 @@ struct io_tlb_slot {
  * @index: The index to start searching in the next round.
  * @orig_addr: The original address corresponding to a mapped entry.
  * @alloc_size: Size of the allocated buffer.
- * @lock: The lock to protect the above data structures in the map and
- *	unmap calls.
  * @debugfs: The dentry to debugfs.
  * @late_alloc: %true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -98,13 +115,11 @@ struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
 	unsigned long nslabs;
-	unsigned long used;
-	struct list_head free_slots;
-	spinlock_t lock;
 	struct dentry *debugfs;
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot *slots;
 	unsigned long *bitmap;
 };
