[gccld] When seeing -l<lib>, should add a corresponding -load=<lib> to script #512

Closed
llvmbot opened this issue Nov 20, 2003 · 5 comments

Comments

@llvmbot
Collaborator

llvmbot commented Nov 20, 2003

Bugzilla Link 140
Resolution FIXED
Resolved on Feb 22, 2010 12:48
Version 1.0
OS All
Reporter LLVM Bugzilla Contributor
CC @lattner

Extended Description

I'm getting tired of manually adding -load=/usr/lib... lines to every new lli
runner script to run applications. This is made worse when an application
decides to run an intermediate executable during its full program compilation cycle.

In essence, since gccld has all this information on the command line, it should
be nice and spit out an appropriate runner file.
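
For illustration, here is a minimal sketch of the mapping being requested: every -l<lib> that gccld sees becomes a -load= option in the emitted lli runner script. This is a hypothetical model, not gccld's actual implementation; the fixed /usr/lib prefix and the a.out.bc output name are assumptions made for the example.

  // Hypothetical sketch (not gccld's real code): turn each -l<lib> linker
  // option into a -load=/usr/lib/lib<lib>.so option for an lli runner script.
  #include <cstdio>
  #include <cstring>
  #include <string>
  #include <vector>

  int main(int argc, char **argv) {
    std::vector<std::string> loads;
    for (int i = 1; i < argc; ++i) {
      // A real linker would also honour -L search paths and skip libraries
      // that resolve to LLVM bytecode archives rather than native shared
      // objects; this sketch assumes a fixed /usr/lib prefix.
      if (std::strncmp(argv[i], "-l", 2) == 0 && argv[i][2] != '\0')
        loads.push_back(std::string("-load=/usr/lib/lib") +
                        (argv[i] + 2) + ".so");
    }

    // Emit the runner line. Invoked with -lX11 -lXext, this prints:
    //   lli -load=/usr/lib/libX11.so -load=/usr/lib/libXext.so a.out.bc "$@"
    std::printf("lli");
    for (const std::string &opt : loads)
      std::printf(" %s", opt.c_str());
    std::printf(" a.out.bc \"$@\"\n");
    return 0;
  }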

@lattner
Collaborator

lattner commented Nov 20, 2003

That would be a cool feature. :)

-Chris

@llvmbot
Collaborator Author

llvmbot commented Nov 20, 2003

As they say, if it scratches your itch, well, you know... :)
This one's mine. Big time. The fix is pretty much done.

@llvmbot
Collaborator Author

llvmbot commented Nov 20, 2003

Fixed:

http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031117/009712.html

With this patch, programs magically start working without manual intervention!
Life is good.

@llvmbot
Collaborator Author

llvmbot commented Dec 12, 2003

This is really an enhancement.

@llvmbot
Collaborator Author

llvmbot commented Dec 12, 2003

One could argue that it's not an enhancement since the programs wouldn't even
run in the JIT without this. In testing Xlib apps, I had to manually insert
-load= lines into the runner script to run the program, which may be
considered an inconvenience, but in reality, if you can't run the resulting file
after compiling it, it's broken.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 1, 2021
trevor-m pushed a commit to trevor-m/llvm-project that referenced this issue Apr 20, 2023
[AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.

Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction: the post-index immediate of a 64-bit STP is a signed 7-bit value scaled by 8, so it must lie in the range [-512, 504].

Differential revision: https://reviews.llvm.org/D23368

llvm-svn: 278559
trevor-m pushed a commit to trevor-m/llvm-project that referenced this issue Apr 20, 2023
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies; see below for an example.  The -consider-local-interval-cost
option (which X86 already enables by default) fixes this.  We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.

We evaluated the impact of this change on compile time, code size and
performance benchmarks.

This option has a small impact on compile time, measured on CTMark: a 0.1%
geomean regression at -O1 and -O2, and 0.2% geomean at -O3, with at most 0.5%
on individual benchmarks.

The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:

                 best     worst
  size..text     -3.3%    +0.0%
  exec_time      -5.8%    +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean.  Neither intrate nor fprate show any change in performance.

This patch makes the following changes (see the sketch after this list).

- For the AArch64 target, enableAdvancedRASplitCost() now returns true.

- Ensures that -consider-local-interval-cost=false can disable the new
  behaviour if necessary.
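
The following is a minimal, self-contained model of those two changes, not the
actual LLVM code: a subtarget hook that opts AArch64 into the advanced
split-cost behaviour by default, with an explicit command-line value still
taking precedence. The type and hook names mirror the ones named above; the
cmdLineValue encoding is an assumption for the example.

  // Hedged model (not real LLVM code) of the two changes described above.
  #include <cstdio>

  struct TargetSubtargetInfo {
    // Default: the greedy allocator does not weigh local interval costs.
    virtual bool enableAdvancedRASplitCost() const { return false; }
    virtual ~TargetSubtargetInfo() = default;
  };

  struct AArch64Subtarget : TargetSubtargetInfo {
    // Change 1: the AArch64 target now opts in by default.
    bool enableAdvancedRASplitCost() const override { return true; }
  };

  // Change 2: an explicit -consider-local-interval-cost=false on the command
  // line still wins, so the new behaviour can be disabled if necessary.
  // cmdLineValue: -1 = flag not given, 0 = forced off, 1 = forced on.
  bool considerLocalIntervalCost(const TargetSubtargetInfo &ST,
                                 int cmdLineValue) {
    if (cmdLineValue >= 0)
      return cmdLineValue != 0;
    return ST.enableAdvancedRASplitCost();
  }

  int main() {
    AArch64Subtarget ST;
    std::printf("default:    %d\n", considerLocalIntervalCost(ST, -1)); // 1
    std::printf("forced off: %d\n", considerLocalIntervalCost(ST, 0));  // 0
    return 0;
  }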

This matrix multiply example:

   $ cat test.c
   long A[8][8];
   long B[8][8];
   long C[8][8];

   void run_test() {
     for (int k = 0; k < 8; k++) {
       for (int i = 0; i < 8; i++) {
         for (int j = 0; j < 8; j++) {
           C[i][j] += A[i][k] * B[k][j];
         }
       }
     }
   }

results in the following generated code on AArch64:

  $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
  [...]
                                        // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        str     q0, [sp, #16]           // 16-byte Folded Spill
        ldr     q0, [x14]
        mov     v2.16b, v15.16b
        mov     v15.16b, v14.16b
        mov     v14.16b, v13.16b
        mov     v13.16b, v12.16b
        mov     v12.16b, v11.16b
        mov     v11.16b, v10.16b
        mov     v10.16b, v9.16b
        mov     v9.16b, v8.16b
        mov     v8.16b, v31.16b
        mov     v31.16b, v30.16b
        mov     v30.16b, v29.16b
        mov     v29.16b, v28.16b
        mov     v28.16b, v27.16b
        mov     v27.16b, v26.16b
        mov     v26.16b, v25.16b
        mov     v25.16b, v24.16b
        mov     v24.16b, v23.16b
        mov     v23.16b, v22.16b
        mov     v22.16b, v21.16b
        mov     v21.16b, v20.16b
        mov     v20.16b, v19.16b
        mov     v19.16b, v18.16b
        mov     v18.16b, v17.16b
        mov     v17.16b, v16.16b
        mov     v16.16b, v7.16b
        mov     v7.16b, v6.16b
        mov     v6.16b, v5.16b
        mov     v5.16b, v4.16b
        mov     v4.16b, v3.16b
        mov     v3.16b, v1.16b
        mov     x12, v0.d[1]
        fmov    x15, d0
        ldp     q1, q0, [x14, #16]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldr     q1, [x14, #48]
        mul     x3, x15, x1
        mov     x14, v0.d[1]
        fmov    x17, d0
        mov     x18, v1.d[1]
        fmov    x0, d1
        mov     v1.16b, v3.16b
        mov     v3.16b, v4.16b
        mov     v4.16b, v5.16b
        mov     v5.16b, v6.16b
        mov     v6.16b, v7.16b
        mov     v7.16b, v16.16b
        mov     v16.16b, v17.16b
        mov     v17.16b, v18.16b
        mov     v18.16b, v19.16b
        mov     v19.16b, v20.16b
        mov     v20.16b, v21.16b
        mov     v21.16b, v22.16b
        mov     v22.16b, v23.16b
        mov     v23.16b, v24.16b
        mov     v24.16b, v25.16b
        mov     v25.16b, v26.16b
        mov     v26.16b, v27.16b
        mov     v27.16b, v28.16b
        mov     v28.16b, v29.16b
        mov     v29.16b, v30.16b
        mov     v30.16b, v31.16b
        mov     v31.16b, v8.16b
        mov     v8.16b, v9.16b
        mov     v9.16b, v10.16b
        mov     v10.16b, v11.16b
        mov     v11.16b, v12.16b
        mov     v12.16b, v13.16b
        mov     v13.16b, v14.16b
        mov     v14.16b, v15.16b
        mov     v15.16b, v2.16b
        ldr     q2, [sp]                // 16-byte Folded Reload
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

With -consider-local-interval-cost the same section of code results in the
following:

  $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
  [...]
  .LBB0_1:                              // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        ldp     q0, q1, [x14]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x12, v0.d[1]
        fmov    x15, d0
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldp     q0, q1, [x14, #32]
        mul     x3, x15, x1
        cmp     x9, #512                // =512
        mov     x14, v0.d[1]
        fmov    x17, d0
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet

Reviewed By: dmgreen

Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69437
ChunyuLiao pushed a commit to ruyisdk/llvm-project that referenced this issue Jun 21, 2023
------------------------------------------------------------------------
r278559 | efriedma | 2016-08-12 13:28:02 -0700 (Fri, 12 Aug 2016) | 7 lines

[AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.

Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.

Differential revision: https://reviews.llvm.org/D23368


------------------------------------------------------------------------

llvm-svn: 279123
This issue was closed.