Make LLVM better at using XMM registers to perform structure moves #35093

@pcwalton

Here's a small snippet of Servo code from block flow fragmentation:

servo[0x100bf974e] <+6638>:  mov    r10b, byte ptr [rbp - 0x729]
servo[0x100bf9755] <+6645>:  mov    r9, qword ptr [rbp - 0x728]
servo[0x100bf975c] <+6652>:  mov    rax, qword ptr [rbp - 0x720]
servo[0x100bf9763] <+6659>:  mov    rdi, qword ptr [rbp - 0x718]
servo[0x100bf976a] <+6666>:  mov    r8, qword ptr [rbp - 0x6c8]
servo[0x100bf9771] <+6673>:  mov    rdx, qword ptr [rbp - 0x710]
servo[0x100bf9778] <+6680>:  mov    qword ptr [rbp - 0x710], rdx
servo[0x100bf977f] <+6687>:  mov    qword ptr [r13 + 0x78], r8
servo[0x100bf9783] <+6691>:  mov    qword ptr [r13 + 0x80], rdi
servo[0x100bf978a] <+6698>:  mov    qword ptr [r13 + 0x88], rax
servo[0x100bf9791] <+6705>:  mov    qword ptr [r13 + 0x90], r9
servo[0x100bf9798] <+6712>:  mov    byte ptr [r13 + 0x98], r10b
servo[0x100bf979f] <+6719>:  mov    al, byte ptr [rbp - 0x34a]
servo[0x100bf97a5] <+6725>:  mov    byte ptr [r13 + 0x9f], al
servo[0x100bf97ac] <+6732>:  mov    ax, word ptr [rbp - 0x34c]
servo[0x100bf97b3] <+6739>:  mov    word ptr [r13 + 0x9d], ax
servo[0x100bf97bb] <+6747>:  mov    eax, dword ptr [rbp - 0x350]
servo[0x100bf97c1] <+6753>:  mov    dword ptr [r13 + 0x99], eax
servo[0x100bf97c8] <+6760>:  mov    eax, dword ptr [r13 + 0x15c]
servo[0x100bf97cf] <+6767>:  test   ah, 0x6
servo[0x100bf97d2] <+6770>:  je     0x100bf9949               ; <+7145>

I see this pattern all over the place. LLVM should be using XMM registers for these structure moves instead. The scalar sequence is bad because: (a) it clogs up the instruction stream; (b) it's an inefficient way to perform structure moves; (c) it kills tons of registers, resulting in spills elsewhere (notice that rax, rdx, rdi, r8, r9, and r10 are all tied up for no good reason); (d) it puts pressure on the register allocator, making compile times worse.

Is there some way to get LLVM to emit the right thing here?
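
For anyone who wants to poke at this outside of Servo, here is a minimal sketch of the kind of structure move in question. The `Payload` and `Flow` types and both function names are hypothetical, made up for illustration and not taken from Servo. The sketch only assumes that the listing above comes from copying a small struct from a stack slot into a field of a larger structure. Whether either form ends up using XMM registers depends on the LLVM version, the optimization level, and the target features, so the assembly needs to be inspected rather than assumed.

```rust
// Hypothetical types for illustration; these are not Servo's definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Payload {
    pub a: u64,
    pub b: u64,
    pub c: u64,
    pub d: u64,
    pub flag: u8,
}

pub struct Flow {
    pub header: [u64; 15], // filler so `payload` sits at a nonzero offset
    pub payload: Payload,
    pub flags: u32,
}

/// Plain assignment: the compiler decides how to lower the copy.
#[inline(never)]
pub fn store_by_assignment(flow: &mut Flow, src: &Payload) {
    flow.payload = *src;
}

/// Explicit copy of exactly one `Payload` via `ptr::copy_nonoverlapping`.
#[inline(never)]
pub fn store_by_copy(flow: &mut Flow, src: &Payload) {
    // SAFETY: both pointers come from live references, so they are valid and
    // aligned for one `Payload`; a shared and a unique borrow cannot overlap.
    unsafe {
        std::ptr::copy_nonoverlapping(
            src as *const Payload,
            &mut flow.payload as *mut Payload,
            1,
        );
    }
}
```

Building this as a library with something like `rustc -O --crate-type lib --emit asm` and comparing the two functions should show whether the copy is lowered to scalar `mov` pairs or to wider vector loads and stores on a given toolchain.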

cc @eddyb @brson

Metadata

    Labels

    A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
    A-codegen (Area: Code generation)
    C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
    C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
    E-help-wanted (Call for participation: Help is requested to fix this issue.)
    I-compiletime (Issue: Problems and improvements with respect to compile times.)
    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    O-x86_32 (Target: x86 processors, 32 bit (like i686-*) (also known as IA-32, i386, i586, i686))
    O-x86_64 (Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64))
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
