
[AMDGPU] Optimize True16 16-bit mov and 16-bit zero #190796

@Sisyph

On the true16 code generation path, we often see two v_mov_b16 instructions where the fake16 path uses a single lshlrev/lshrrev. One v_mov_b16 copies 16 bits from some register into the high or low half of the destination (equivalent to a 16-bit shift), and the other v_mov_b16 writes 0 to the remaining half. A single lshlrev or lshrrev by 16 can do both at once: the shift moves the 16 bits of data into the other half and fills the vacated 16 bits with zeros. The fake16 path already emits that one-instruction form. We can probably get the same result with an isel pattern change. See an example here:
#190036 (comment)
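The following is a minimal Python model of why one shift can replace the v_mov_b16 pair. It assumes a 32-bit VGPR whose true16 `.l`/`.h` views are its low and high 16 bits, and that the shifts are 32-bit shifts by 16. All function names are hypothetical, and the assembly in the comments is illustrative rather than actual compiler output.

```python
import random

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def lo(r: int) -> int:
    """Low 16 bits of a 32-bit register (the true16 .l half)."""
    return r & MASK16


def hi(r: int) -> int:
    """High 16 bits of a 32-bit register (the true16 .h half)."""
    return (r >> 16) & MASK16


def set_lo(r: int, v: int) -> int:
    """Write v into the low half, keeping the high half."""
    return (r & 0xFFFF_0000) | (v & MASK16)


def set_hi(r: int, v: int) -> int:
    """Write v into the high half, keeping the low half."""
    return (r & MASK16) | ((v & MASK16) << 16)


def true16_lo_to_hi(dst: int, src: int) -> int:
    # Two-instruction true16 form (illustrative):
    #   v_mov_b16 dst.h, src.l
    #   v_mov_b16 dst.l, 0
    dst = set_hi(dst, lo(src))
    return set_lo(dst, 0)


def shift_left_16(src: int) -> int:
    # One-instruction form: 32-bit logical shift left by 16,
    # which moves src.l into the high half and zeroes the low half.
    return (src << 16) & MASK32


def true16_hi_to_lo(dst: int, src: int) -> int:
    # Two-instruction true16 form (illustrative):
    #   v_mov_b16 dst.l, src.h
    #   v_mov_b16 dst.h, 0
    dst = set_lo(dst, hi(src))
    return set_hi(dst, 0)


def shift_right_16(src: int) -> int:
    # One-instruction form: 32-bit logical shift right by 16,
    # which moves src.h into the low half and zeroes the high half.
    return (src & MASK32) >> 16


def check_equivalence(trials: int = 10_000, seed: int = 0) -> bool:
    """Compare both forms on random register contents.

    The old destination value is random too: because both halves are
    overwritten, it never affects the result.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        dst = rng.getrandbits(32)
        src = rng.getrandbits(32)
        if true16_lo_to_hi(dst, src) != shift_left_16(src):
            return False
        if true16_hi_to_lo(dst, src) != shift_right_16(src):
            return False
    return True


if __name__ == "__main__":
    print(check_equivalence())
```

The model compares only register values. It does not account for other factors an isel pattern may need to consider, such as encoding size or whether the true16 halves are being allocated separately.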
