[gccld] When seeing -l<lib>, should add a corresponding -load=<lib> to script #512

Closed
llvmbot opened this issue Nov 20, 2003 · 5 comments

Comments

@llvmbot
Collaborator

llvmbot commented Nov 20, 2003

Bugzilla Link 140
Resolution FIXED
Resolved on Feb 22, 2010 12:48
Version 1.0
OS All
Reporter LLVM Bugzilla Contributor
CC @lattner

Extended Description

I'm getting tired of manually adding -load=/usr/lib... lines to every new lli
runner script to run applications. This is made worse when an application
decides to run an intermediate executable during its full program compilation cycle.

In essence, since gccld has all this information on the command line, it should
be nice and spit out an appropriate runner file.
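
For illustration, here is a minimal sketch of the mapping being requested: every -l<lib> that gccld sees becomes a -load= option in the emitted lli runner script. This is a hypothetical model, not gccld's actual implementation; the fixed /usr/lib prefix and the a.out.bc output name are assumptions made for the example.

  // Hypothetical sketch (not gccld's real code): turn each -l<lib> linker
  // option into a -load=/usr/lib/lib<lib>.so option for an lli runner script.
  #include <cstdio>
  #include <cstring>
  #include <string>
  #include <vector>

  int main(int argc, char **argv) {
    std::vector<std::string> loads;
    for (int i = 1; i < argc; ++i) {
      // A real linker would also honour -L search paths and skip libraries
      // that resolve to LLVM bytecode archives rather than native shared
      // objects; this sketch assumes a fixed /usr/lib prefix.
      if (std::strncmp(argv[i], "-l", 2) == 0 && argv[i][2] != '\0')
        loads.push_back(std::string("-load=/usr/lib/lib") +
                        (argv[i] + 2) + ".so");
    }

    // Emit the runner line. Invoked with -lX11 -lXext, this prints:
    //   lli -load=/usr/lib/libX11.so -load=/usr/lib/libXext.so a.out.bc "$@"
    std::printf("lli");
    for (const std::string &opt : loads)
      std::printf(" %s", opt.c_str());
    std::printf(" a.out.bc \"$@\"\n");
    return 0;
  }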

@lattner
Collaborator

lattner commented Nov 20, 2003

That would be a cool feature. :)

-Chris

@llvmbot
Collaborator Author

llvmbot commented Nov 20, 2003

As they say, if it scratches your itch, well, you know... :)
This one's mine. Big time. The fix is pretty much done.

@llvmbot
Collaborator Author

llvmbot commented Nov 20, 2003

Fixed:

http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031117/009712.html

With this patch, programs magically start working without manual intervention!
Life is good.

@llvmbot
Collaborator Author

llvmbot commented Dec 12, 2003

This is really an enhancement.

@llvmbot
Collaborator Author

llvmbot commented Dec 12, 2003

One could argue that it's not an enhancement since the programs wouldn't even
run in the JIT without this. In testing Xlib apps, I had to manually insert
-load= lines into the runner script to run the program, which may be
considered an inconvenience, but in reality, if you can't run the resulting file
after compiling it, it's broken.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 1, 2021
trevor-m pushed a commit to trevor-m/llvm-project that referenced this issue Apr 20, 2023
[AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.

Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction: the post-index immediate of a 64-bit STP is a signed 7-bit value scaled by 8, so it must lie in the range [-512, 504].

Differential revision: https://reviews.llvm.org/D23368

llvm-svn: 278559
trevor-m pushed a commit to trevor-m/llvm-project that referenced this issue Apr 20, 2023
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies; see below for an example.  The -consider-local-interval-cost
option (which X86 already enables by default) fixes this.  We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.

We evaluated the impact of this change on compile time, code size and
performance benchmarks.

This option has a small impact on compile time, measured on CTMark: a 0.1%
geomean regression at -O1 and -O2, and 0.2% geomean at -O3, with at most 0.5%
on individual benchmarks.

The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:

                 best     worst
  size..text     -3.3%    +0.0%
  exec_time      -5.8%    +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean.  Neither intrate nor fprate show any change in performance.

This patch makes the following changes (see the sketch after this list).

- For the AArch64 target, enableAdvancedRASplitCost() now returns true.

- Ensures that -consider-local-interval-cost=false can disable the new
  behaviour if necessary.
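
The following is a minimal, self-contained model of those two changes, not the
actual LLVM code: a subtarget hook that opts AArch64 into the advanced
split-cost behaviour by default, with an explicit command-line value still
taking precedence. The type and hook names mirror the ones named above; the
cmdLineValue encoding is an assumption for the example.

  // Hedged model (not real LLVM code) of the two changes described above.
  #include <cstdio>

  struct TargetSubtargetInfo {
    // Default: the greedy allocator does not weigh local interval costs.
    virtual bool enableAdvancedRASplitCost() const { return false; }
    virtual ~TargetSubtargetInfo() = default;
  };

  struct AArch64Subtarget : TargetSubtargetInfo {
    // Change 1: the AArch64 target now opts in by default.
    bool enableAdvancedRASplitCost() const override { return true; }
  };

  // Change 2: an explicit -consider-local-interval-cost=false on the command
  // line still wins, so the new behaviour can be disabled if necessary.
  // cmdLineValue: -1 = flag not given, 0 = forced off, 1 = forced on.
  bool considerLocalIntervalCost(const TargetSubtargetInfo &ST,
                                 int cmdLineValue) {
    if (cmdLineValue >= 0)
      return cmdLineValue != 0;
    return ST.enableAdvancedRASplitCost();
  }

  int main() {
    AArch64Subtarget ST;
    std::printf("default:    %d\n", considerLocalIntervalCost(ST, -1)); // 1
    std::printf("forced off: %d\n", considerLocalIntervalCost(ST, 0));  // 0
    return 0;
  }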

This matrix multiply example:

   $ cat test.c
   long A[8][8];
   long B[8][8];
   long C[8][8];

   void run_test() {
     for (int k = 0; k < 8; k++) {
       for (int i = 0; i < 8; i++) {
         for (int j = 0; j < 8; j++) {
           C[i][j] += A[i][k] * B[k][j];
         }
       }
     }
   }

results in the following generated code on AArch64:

  $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
  [...]
                                        // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        str     q0, [sp, #16]           // 16-byte Folded Spill
        ldr     q0, [x14]
        mov     v2.16b, v15.16b
        mov     v15.16b, v14.16b
        mov     v14.16b, v13.16b
        mov     v13.16b, v12.16b
        mov     v12.16b, v11.16b
        mov     v11.16b, v10.16b
        mov     v10.16b, v9.16b
        mov     v9.16b, v8.16b
        mov     v8.16b, v31.16b
        mov     v31.16b, v30.16b
        mov     v30.16b, v29.16b
        mov     v29.16b, v28.16b
        mov     v28.16b, v27.16b
        mov     v27.16b, v26.16b
        mov     v26.16b, v25.16b
        mov     v25.16b, v24.16b
        mov     v24.16b, v23.16b
        mov     v23.16b, v22.16b
        mov     v22.16b, v21.16b
        mov     v21.16b, v20.16b
        mov     v20.16b, v19.16b
        mov     v19.16b, v18.16b
        mov     v18.16b, v17.16b
        mov     v17.16b, v16.16b
        mov     v16.16b, v7.16b
        mov     v7.16b, v6.16b
        mov     v6.16b, v5.16b
        mov     v5.16b, v4.16b
        mov     v4.16b, v3.16b
        mov     v3.16b, v1.16b
        mov     x12, v0.d[1]
        fmov    x15, d0
        ldp     q1, q0, [x14, #16]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldr     q1, [x14, #48]
        mul     x3, x15, x1
        mov     x14, v0.d[1]
        fmov    x17, d0
        mov     x18, v1.d[1]
        fmov    x0, d1
        mov     v1.16b, v3.16b
        mov     v3.16b, v4.16b
        mov     v4.16b, v5.16b
        mov     v5.16b, v6.16b
        mov     v6.16b, v7.16b
        mov     v7.16b, v16.16b
        mov     v16.16b, v17.16b
        mov     v17.16b, v18.16b
        mov     v18.16b, v19.16b
        mov     v19.16b, v20.16b
        mov     v20.16b, v21.16b
        mov     v21.16b, v22.16b
        mov     v22.16b, v23.16b
        mov     v23.16b, v24.16b
        mov     v24.16b, v25.16b
        mov     v25.16b, v26.16b
        mov     v26.16b, v27.16b
        mov     v27.16b, v28.16b
        mov     v28.16b, v29.16b
        mov     v29.16b, v30.16b
        mov     v30.16b, v31.16b
        mov     v31.16b, v8.16b
        mov     v8.16b, v9.16b
        mov     v9.16b, v10.16b
        mov     v10.16b, v11.16b
        mov     v11.16b, v12.16b
        mov     v12.16b, v13.16b
        mov     v13.16b, v14.16b
        mov     v14.16b, v15.16b
        mov     v15.16b, v2.16b
        ldr     q2, [sp]                // 16-byte Folded Reload
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

With -consider-local-interval-cost the same section of code results in the
following:

  $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
  [...]
  .LBB0_1:                              // %for.cond1.preheader
                                        // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        ldp     q0, q1, [x14]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64             // =64
        mov     x12, v0.d[1]
        fmov    x15, d0
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldp     q0, q1, [x14, #32]
        mul     x3, x15, x1
        cmp     x9, #512                // =512
        mov     x14, v0.d[1]
        fmov    x17, d0
        fmov    d0, x3
        mul     x3, x12, x1
  [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet

Reviewed By: dmgreen

Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69437
ChunyuLiao pushed a commit to ruyisdk/llvm-project that referenced this issue Jun 21, 2023
------------------------------------------------------------------------
r278559 | efriedma | 2016-08-12 13:28:02 -0700 (Fri, 12 Aug 2016) | 7 lines

[AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.

Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.

Differential revision: https://reviews.llvm.org/D23368


------------------------------------------------------------------------

llvm-svn: 279123
This issue was closed.