Skip to content

Commit a096b02

Browse files
Kanglong Wanggregkh
authored andcommitted
LoongArch: Optimize module load time by optimizing PLT/GOT counting
[ Upstream commit 63dbd8f ] When enabling CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and CONFIG_PREEMPT_VOLUNTARY at the same time, there will be soft deadlock, the relevant logs are as follows: rcu: INFO: rcu_sched self-detected stall on CPU ... Call Trace: [<900000000024f9e4>] show_stack+0x5c/0x180 [<90000000002482f4>] dump_stack_lvl+0x94/0xbc [<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280 [<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88 [<9000000000396c34>] update_process_times+0xb4/0x150 [<90000000003b2474>] tick_nohz_handler+0xf4/0x250 [<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428 [<9000000000399b2c>] hrtimer_interrupt+0x214/0x538 [<9000000000253634>] constant_timer_interrupt+0x64/0x80 [<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0 [<9000000000349a78>] handle_irq_event_percpu+0x18/0x88 [<9000000000354c00>] handle_percpu_irq+0x90/0xf0 [<9000000000348c74>] handle_irq_desc+0x94/0xb8 [<9000000001012b28>] handle_cpu_irq+0x68/0xa0 [<9000000001def8c0>] handle_loongarch_irq+0x30/0x48 [<9000000001def958>] do_vint+0x80/0xd0 [<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0 [<90000000006344f4>] __asan_load8+0x4c/0x120 [<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8 [<90000000003895f0>] load_module+0x9e0/0x2958 [<900000000038b770>] __do_sys_init_module+0x208/0x2d0 [<9000000001df0c34>] do_syscall+0x94/0x190 [<900000000024d6fc>] handle_syscall+0xbc/0x158 After analysis, this is because the slow speed of loading the amdgpu module leads to the long time occupation of the cpu and then the soft deadlock. When loading a module, module_frob_arch_sections() tries to figure out the number of PLTs/GOTs that will be needed to handle all the RELAs. It will call the count_max_entries() to find in an out-of-order date which counting algorithm has O(n^2) complexity. To make it faster, we sort the relocation list by info and addend. That way, to check for a duplicate relocation, it just needs to compare with the previous entry. This reduces the complexity of the algorithm to O(n log n), as done in commit d4e0340 ("arm64/module: Optimize module load time by optimizing PLT counting"). This gives sinificant reduction in module load time for modules with large number of relocations. After applying this patch, the soft deadlock problem has been solved, and the kernel starts normally without "Call Trace". Using the default configuration to test some modules, the results are as follows: Module Size ip_tables 36K fat 143K radeon 2.5MB amdgpu 16MB Without this patch: Module Module load time (ms) Count(PLTs/GOTs) ip_tables 18 59/6 fat 0 162/14 radeon 54 1221/84 amdgpu 1411 4525/1098 With this patch: Module Module load time (ms) Count(PLTs/GOTs) ip_tables 18 59/6 fat 0 162/14 radeon 22 1221/84 amdgpu 45 4525/1098 Fixes: fcdfe9d ("LoongArch: Add ELF and module support") Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent c83e687 commit a096b02

File tree

1 file changed

+18
-18
lines changed

1 file changed

+18
-18
lines changed

arch/loongarch/kernel/module-sections.c

Lines changed: 18 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
#include <linux/module.h>
99
#include <linux/moduleloader.h>
1010
#include <linux/ftrace.h>
11+
#include <linux/sort.h>
1112

1213
Elf_Addr module_emit_got_entry(struct module *mod, Elf_Shdr *sechdrs, Elf_Addr val)
1314
{
@@ -61,39 +62,38 @@ Elf_Addr module_emit_plt_entry(struct module *mod, Elf_Shdr *sechdrs, Elf_Addr v
6162
return (Elf_Addr)&plt[nr];
6263
}
6364

64-
static int is_rela_equal(const Elf_Rela *x, const Elf_Rela *y)
65-
{
66-
return x->r_info == y->r_info && x->r_addend == y->r_addend;
67-
}
65+
#define cmp_3way(a, b) ((a) < (b) ? -1 : (a) > (b))
6866

69-
static bool duplicate_rela(const Elf_Rela *rela, int idx)
67+
static int compare_rela(const void *x, const void *y)
7068
{
71-
int i;
69+
int ret;
70+
const Elf_Rela *rela_x = x, *rela_y = y;
7271

73-
for (i = 0; i < idx; i++) {
74-
if (is_rela_equal(&rela[i], &rela[idx]))
75-
return true;
76-
}
72+
ret = cmp_3way(rela_x->r_info, rela_y->r_info);
73+
if (ret == 0)
74+
ret = cmp_3way(rela_x->r_addend, rela_y->r_addend);
7775

78-
return false;
76+
return ret;
7977
}
8078

8179
static void count_max_entries(Elf_Rela *relas, int num,
8280
unsigned int *plts, unsigned int *gots)
8381
{
84-
unsigned int i, type;
82+
unsigned int i;
83+
84+
sort(relas, num, sizeof(Elf_Rela), compare_rela, NULL);
8585

8686
for (i = 0; i < num; i++) {
87-
type = ELF_R_TYPE(relas[i].r_info);
88-
switch (type) {
87+
if (i && !compare_rela(&relas[i-1], &relas[i]))
88+
continue;
89+
90+
switch (ELF_R_TYPE(relas[i].r_info)) {
8991
case R_LARCH_SOP_PUSH_PLT_PCREL:
9092
case R_LARCH_B26:
91-
if (!duplicate_rela(relas, i))
92-
(*plts)++;
93+
(*plts)++;
9394
break;
9495
case R_LARCH_GOT_PC_HI20:
95-
if (!duplicate_rela(relas, i))
96-
(*gots)++;
96+
(*gots)++;
9797
break;
9898
default:
9999
break; /* Do nothing. */

0 commit comments

Comments
 (0)